This shows you the differences between two versions of the page.
alicecrunchwip [2020/06/03 14:56] florido created |
alicecrunchwip [2020/06/12 16:31] florido [2020612] |
||
---|---|---|---|
Line 12: | Line 12: | ||
See the description in the graph below. | See the description in the graph below. | ||
+ | |||
+ | <code> | ||
+ | |||
+ | |||
+ | .------------------------------------------. | ||
+ | | M.2 Disk 250GB | | ||
+ | |------------------------------------------| | ||
+ | .----------| Main system disk - boot / | SW RAID1 | ||
+ | | | operating system |-------------------------------------------------------------. | ||
+ | | | /swap | | | ||
+ | | | /ahome (admin home separated from users) | | | ||
+ | | '------------------------------------------' | | ||
+ | | | | ||
+ | | SW RAID1 | | ||
+ | | .----------------------------------------. .----------------------------------------. | | ||
+ | | | U.2 disk 1 960GB | | U.2 disk 2 960GB | | | ||
+ | | |----------------------------------------| |----------------------------------------| | | ||
+ | | | - User folders /home 700GB |<---------------->| - User folders /home 700GB | | | ||
+ | | | -------------------------------------- | SW RAID1 | -------------------------------------- | | | ||
+ | --------->| - copy of boot disk (bootable) / 100GB | | - copy of boot disk (bootable) / 100GB |<---' | ||
+ | | copy of /ahome 100GB260GB |<---------------->| copy of /ahome 100GB | | ||
+ | '----------------------------------------' '----------------------------------------' | ||
+ | |||
+ | |||
+ | |||
+ | .------------------------------------------------. | ||
+ | __________ | RAID6 storage - 8x 12TB disks /disk | __________ | ||
+ | [_|||||||_°]|------------------------------------------------|[_|||||||_°] | ||
+ | [_|||||||_°]| - 7x 12TB in RAID6 -> 60TB (50-55 usable) |[_|||||||_°] | ||
+ | [_|||||||_°]| - 1x 12TB spare |[_|||||||_°] | ||
+ | '------------------------------------------------' | ||
+ | </code> | ||
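The capacity figures in the storage box can be checked with a little arithmetic. The sketch below assumes that RAID6 reserves two disks' worth of space for parity and that the "50-55 usable" range roughly reflects the conversion from decimal TB to binary TiB; that reading is an assumption, not something stated in the diagram. The helper names ''raid6_usable_tb'' and ''tb_to_tib'' are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: usable capacity of a RAID6 array.
# Assumption: two disks' worth of capacity is consumed by parity.
# raid6_usable_tb and tb_to_tib are hypothetical helper names.

raid6_usable_tb() {
    local disks=$1 size_tb=$2
    echo $(( (disks - 2) * size_tb ))
}

# Convert decimal terabytes (10^12 bytes) to whole tebibytes (2^40 bytes),
# rounding down because bash arithmetic is integer-only.
tb_to_tib() {
    local tb=$1
    echo $(( tb * 1000000000000 / 1099511627776 ))
}

usable_tb=$(raid6_usable_tb 7 12)
echo "RAID6 over 7x 12TB: ${usable_tb} TB"
echo "Roughly $(tb_to_tib "$usable_tb") TiB before filesystem overhead"
```

For the 7x 12TB layout this gives 60 TB, or about 54 TiB, which is consistent with the range quoted in the diagram.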
One can log in to the machine the same way as with aurora. | One can log in to the machine the same way as with aurora. | ||
Additionally, admins can log in to a separate monitor machine to check the status of hardware components. This machine will also send warnings in case of issues. | Additionally, admins can log in to a separate monitor machine to check the status of hardware components. This machine will also send warnings in case of issues. | ||
+ | |||
+ | <code> | ||
+ | |||
+ | ____ __ | ||
+ | | | |==| | ||
+ | .------Admins------>|____| | | HW Monitor | ||
+ | | /::::/ |__| | ||
+ | | | | ||
+ | __ _ |<monitoring> | ||
+ | o [__]|=| v | ||
+ | /|\ /::/|_| ________ | ||
+ | / \ |==|=====| | ||
+ | Own laptop / teddi | | | | ||
+ | | | | | | ||
+ | | | | | ALICE new machine | ||
+ | '--Researchers-->| | | | ||
+ | | |====°| | ||
+ | |__|_____| | ||
+ | | ||
+ | | ||
+ | |||
+ | </code> | ||
Florido will take care of OS installation and hardware maintenance/monitoring. This info will be shared with ALICE admins. | Florido will take care of OS installation and hardware maintenance/monitoring. This info will be shared with ALICE admins. | ||
ALICE admins will have root access and can control any parameter they want. | ALICE admins will have root access and can control any parameter they want. | ||
+ | |||
+ | ===== TODO ===== | ||
+ | |||
+ | * Configure networking: DONE :-D | ||
+ | * Configure maintenance interface: DONE :-D | ||
+ | * Configure monitoring server: DONE :-D | ||
+ | * Install system: DONE :-D | ||
+ | * Create RAID6: DONE :-D | ||
+ | * Find best settings, discuss with users: DONE :-D | ||
+ | * Configure system: ongoing FIXME | ||
+ | * Configure firewall: DONE :-D | ||
+ | * Test fallback operating system in case of failure: FIXME | ||
+ | * Configure config management system. Possibly a chance to test Ansible :?: | ||
+ | * Create admin users :?: | ||
+ | * Format and connect RAID6 disk: DONE :-D | ||
+ | * User services (tasks mainly to be done by admin users): :?: | ||
+ | * Create users :?: | ||
+ | * Create folders :?: | ||
+ | * Share folders across servers (requires installing new services) :?: | ||
+ | * Install ALICE software :?: | ||
+ | * More details here :?: | ||
+ | * Move data :?: | ||
===== Logbook ===== | ===== Logbook ===== | ||
+ | |||
+ | |||
+ | ==== 202006** ==== | ||
+ | * new entry | ||
+ | |||
+ | |||
+ | ==== 20200612 ==== | ||
+ | * Formatted RAID volume as xfs <code bash>mkfs.xfs -f -d su=1m,sw=5 -L alicedisk /dev/sda</code> | ||
+ | * Shows a warning but should be harmless | ||
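+ | | su=1m | 1 MiB stripe unit (matches the 1024k strip size set on the RAID controller) | | ||
+ | | sw=5 | 5 data disks (7 disks in RAID6 minus 2 parity) | | ||
+ | | -L alicedisk | xfs label | | ||
+ | | /dev/sda | the RAID volume as presented to the kernel as a block device | | ||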
+ | * created folder ''/disk'' | ||
+ | * created entry in fstab for disk based on UUID:<code ini># RAID disk | ||
+ | UUID=0d4a40e5-084e-404f-9219-6c3645929ec2 /disk xfs rw,seclabel,relatime,attr2,inode64,sunit=2048,swidth=10240,noquota 0 0</code> | ||
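The ''sunit'' and ''swidth'' values in the fstab entry follow from the ''su'' and ''sw'' values given to ''mkfs.xfs'', since the XFS mount options express both in 512-byte sectors. A minimal sketch of that conversion, with hypothetical helper names ''xfs_sunit'' and ''xfs_swidth'':

```shell
#!/usr/bin/env bash
# Sketch: derive the sunit/swidth mount values (in 512-byte sectors)
# from the mkfs.xfs su (stripe unit) and sw (number of data disks) settings.
# xfs_sunit and xfs_swidth are hypothetical helper names.

SECTOR_BYTES=512

# Stripe unit in sectors: the strip written to one data disk.
xfs_sunit() {
    local su_bytes=$1
    echo $(( su_bytes / SECTOR_BYTES ))
}

# Stripe width in sectors: stripe unit times the number of data disks.
xfs_swidth() {
    local su_bytes=$1 sw=$2
    echo $(( su_bytes / SECTOR_BYTES * sw ))
}

su_bytes=$(( 1024 * 1024 ))   # su=1m
sw=5                          # 7 disks in RAID6 minus 2 parity

echo "sunit=$(xfs_sunit "$su_bytes"),swidth=$(xfs_swidth "$su_bytes" "$sw")"
```

With ''su=1m'' and ''sw=5'' this prints ''sunit=2048,swidth=10240'', matching the fstab line above.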
+ | ==== 20200609 ==== | ||
+ | * configured server network and hostname: beast | ||
+ | * Installed the storcli RAID management tool from the vendor website | ||
+ | * https://www.broadcom.com/products/storage/raid-controllers/megaraid-9460-8i | ||
+ | * https://docs.broadcom.com/docs/007.1316.0000.0000_Unified_StorCLI_PUL.zip | ||
+ | * Software installed in ''/opt/MegaRAID/storcli/'' | ||
+ | * For brevity, added the alias ''storcli'' to root's .bashrc | ||
+ | * Created array volume with 1024k strip size for very large files <code bash>storcli /c0 add vd r6 name=alicedisk drives=133:0-6 Strip=1024</code> | ||
+ | * Note: initialization takes 13h | ||
+ | * RAID volume name in storcli: ''/c0/v0'' | ||
+ | * Created hotspare (disk that kicks in if one breaks)<code bash>storcli /c0/e133/s7 add hotsparedrive</code> | ||
+ | * Wrote a few notes on storcli in [[it_tips:storcliqr|MegaRAID storcli QR]] | ||
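The storcli commands in this entry address hardware by controller (''/c0''), enclosure (''e133'') and slot (''s7''), and ''drives=133:0-6'' selects slots 0 to 6 of enclosure 133. The dry-run sketch below only assembles and prints the two commands from named parameters so that the addressing is easier to read; it executes nothing. The helper names ''build_add_vd'' and ''build_add_hotspare'' are hypothetical and not part of storcli.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble the storcli commands from named parameters and
# print them. Nothing is executed against the controller.
# build_add_vd and build_add_hotspare are hypothetical helper names.

# RAID6 virtual drive over a range of slots in one enclosure.
build_add_vd() {
    local ctrl=$1 name=$2 enclosure=$3 first=$4 last=$5 strip_kb=$6
    echo "storcli /c${ctrl} add vd r6 name=${name} drives=${enclosure}:${first}-${last} Strip=${strip_kb}"
}

# Hot spare on a single slot.
build_add_hotspare() {
    local ctrl=$1 enclosure=$2 slot=$3
    echo "storcli /c${ctrl}/e${enclosure}/s${slot} add hotsparedrive"
}

build_add_vd 0 alicedisk 133 0 6 1024
build_add_hotspare 0 133 7
```

The printed lines reproduce the commands logged above: seven disks in slots 0 to 6 form the RAID6 volume, and the disk in slot 7 is the spare.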
+ | |||
+ | ==== 20200603 ==== | ||
+ | * Configured server network. Current hostname: alice | ||
+ | * Configured remote access to monitor using X2GO client-server technology. Works nicely. | ||
+ | * Reset maintenance interface. Tested ok. Custom tools not working but web browser interface ok. | ||
+ | * Installed system. Configured RAID1 on system disks. Upgraded system. | ||
+ | * Configured basic firewall. | ||
+ | |||
+ | ==== 20200602 ==== | ||
+ | * Created monitor machine | ||
+ | * Placed monitor machine in C165 | ||
+ | * Connected monitor machine to server | ||
+ | * Configured RAID6. Experimental. More info needed from users. Disks working but disk lights not blinking, contacted Compliq. | ||
+ | * Configured remote access to monitor. Not working, better solution required | ||
+ | * Configured remote access to server maintenance interface. Not working, requires reset. First attempt failed due to network config missing. | ||
+ | |||
+ | ==== 20200529 ==== | ||
+ | * Server arrived. Placed in C165. | ||
+ | * Inspected hardware and disk bays | ||
+ | * Provided electric connectivity | ||
+ | * Provided network connectivity | ||
+ | * First boot successful |