This shows you the differences between two versions of the page.
alicecrunchwip [2020/06/03 15:31] florido |
alicecrunchwip [2021/01/19 17:09] (current) florido [TODO] |
||
---|---|---|---|
Line 48: | Line 48: | ||
Additionally, admins can log in to a separate monitor machine to check the status of hardware components. This machine will also send warnings in case of issues. | Additionally, admins can log in to a separate monitor machine to check the status of hardware components. This machine will also send warnings in case of issues. | ||
+ | |||
+ | <code> | ||
+ | |||
+ | ____ __ | ||
+ | | | |==| | ||
+ | .------Admins------>|____| | | HW Monitor | ||
+ | | /::::/ |__| | ||
+ | | | | ||
+ | __ _ |<monitoring> | ||
+ | o [__]|=| v | ||
+ | /|\ /::/|_| ________ | ||
+ | / \ |==|=====| | ||
+ | Own laptop / teddi | | | | ||
+ | | | | | | ||
+ | | | | | ALICE new machine | ||
+ | '--Researchers-->| | | | ||
+ | | |====°| | ||
+ | |__|_____| | ||
+ | | ||
+ | | ||
+ | |||
+ | </code> | ||
Florido will take care of OS installation and hardware maintenance/monitoring. This info will be shared with ALICE admins. | Florido will take care of OS installation and hardware maintenance/monitoring. This info will be shared with ALICE admins. | ||
Line 54: | Line 76: | ||
===== TODO ===== | ===== TODO ===== | ||
- | * Configure networking: DONE | + | * Configure networking: DONE :-D |
- | * Configure maintenance interface: DONE | + | * Configure maintenance interface: DONE :-D |
- | * Configure monitoring server: DONE | + | * Configure HW monitoring server: ongoing FIXME |
- | * Install system: DONE | + | * Configure machine OS and services DONE :-D |
- | * Create RAID6: ongoing | + | * Configure hardware monitoring FIXME |
- | * Find best settings, discuss with users | + | * Configure user access to monitor FIXME |
- | * Configure system: ongoing | + | * Add NIS auth but with local folders, or a dedicated user with access to specific hardware? |
- | * Configure firewall | + | * Install system: DONE :-D |
- | * Configure config management system. Maybe chance to test ansible | + | * Create RAID6: DONE :-D |
- | * Create admin users | + | * Find best settings, discuss with users DONE :-D |
- | * Format and connect RAID6 disk | + | * Configure system: ongoing DONE :-D |
+ | * Configure firewall DONE :-D | ||
+ | * Test fallback operating system in case of failure FIXME | ||
+ | * Configure config management system. Testing **ansible** DONE :-D | ||
+ | * Enable snmp monitoring DONE :-D | ||
+ | * add server to monitor DONE :-D | ||
+ | * Create admin users ongoing DONE :-D | ||
+ | * generic alice user DONE :-D | ||
+ | * give admin permissions to relevant alice users DONE :-D | ||
+ | * Format and connect RAID6 disk DONE :-D | ||
+ | * Configure RAID monitoring DONE :-D | ||
+ | * User services (tasks mainly to be done by admin users): DONE :-D | ||
+ | * Create users DONE :-D | ||
+ | * Create folders DONE :-D | ||
+ | * Share folders across servers (requires installing new services) :?: | ||
+ | * Install ALICE software DONE :-D by ALICE members | ||
+ | * More details here :?: | ||
+ | * Move data DONE :-D helped with a better-performing script | ||
+ | * Documentation | ||
+ | * Write a page on how to manage the system FIXME | ||
+ | * Started on [[:alicebeast]] | ||
+ | * Must be only accessible by admins | ||
+ | * Must include system description above here | ||
+ | * Must describe common procedures | ||
+ | * Ask if users want remote X11 - NO | ||
===== Logbook ===== | ===== Logbook ===== | ||
- | + | ==== 202101** ==== | |
- | ==== 20206** ==== | + | |
* new entry | * new entry | ||
+ | |||
+ | ==== 20210119 ==== | ||
+ | * Configured RAID monitoring | ||
+ | |||
+ | ==== 20200616 ==== | ||
+ | * Configured system to be managed remotely via ansible | ||
+ | * Changed sshd_config | ||
+ | * Added management keys to root user | ||
+ | * Management currently done via my workstation, may change in the future. | ||
+ | * Installed basic software | ||
+ | * Created alice admin user | ||
+ | * Installed basic build software | ||
+ | |||
+ | ==== 20200612 ==== | ||
+ | * Formatted RAID volume as xfs <code bash>mkfs.xfs -f -d su=1m,sw=5 -L alicedisk /dev/sda</code> | ||
+ | * Shows a warning but should be harmless | ||
+ | |||
+ | ^ su=1m | stripe unit: 1 MiB strip size per disk | ||
+ | ^ sw=5 | stripe width: 5 data disks (+2 parity) | ||
+ | ^ -L alicedisk | xfs label | ||
+ | ^ /dev/sda | the RAID volume as presented to the kernel as a block device | ||
+ | |||
+ | * created folder ''/disk'' | ||
+ | * created entry in fstab for disk based on UUID:<code ini># RAID disk | ||
+ | UUID=0d4a40e5-084e-404f-9219-6c3645929ec2 /disk xfs rw,seclabel,relatime,attr2,inode64,sunit=2048,swidth=10240,noquota 0 0</code> | ||
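The ''mkfs.xfs'' options and the fstab mount options describe the same stripe geometry in different units: ''su'' is a size per disk and ''sw'' a number of data disks, while ''sunit'' and ''swidth'' are counted in 512-byte sectors. A minimal sketch of the conversion, assuming a POSIX shell; the function name ''xfs_geometry'' is hypothetical and used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert mkfs.xfs stripe geometry into the
# sunit/swidth mount option values, which are in 512-byte sectors.
xfs_geometry() {
    su_kib=$1      # stripe unit per disk in KiB (su=1m -> 1024)
    sw_disks=$2    # number of data disks (sw=5)
    sunit=$(( su_kib * 1024 / 512 ))   # bytes per strip / 512
    swidth=$(( sunit * sw_disks ))     # one full stripe across data disks
    echo "sunit=$sunit,swidth=$swidth"
}

# Values from the mkfs.xfs call above: su=1m, sw=5
xfs_geometry 1024 5   # prints sunit=2048,swidth=10240
```

The result matches the ''sunit=2048,swidth=10240'' values in the fstab entry, which is a quick way to check that the filesystem geometry agrees with the RAID layout.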
+ | ==== 20200609 ==== | ||
+ | * configured server network and hostname: beast | ||
+ | * Installed the storcli RAID management tool from the vendor website | ||
+ | * https://www.broadcom.com/products/storage/raid-controllers/megaraid-9460-8i | ||
+ | * https://docs.broadcom.com/docs/007.1316.0000.0000_Unified_StorCLI_PUL.zip | ||
+ | * Software installed in ''/opt/MegaRAID/storcli/'' | ||
+ | * For brevity, added an alias ''storcli'' in root's .bashrc | ||
+ | * Created array volume with 1024k strip size for very large files <code bash>storcli /c0 add vd r6 name=alicedisk drives=133:0-6 Strip=1024</code> | ||
+ | * Note: initialization takes 13h | ||
+ | * RAID volume name in storcli: ''/c0/v0'' | ||
+ | * Created hotspare (disk that kicks in if one breaks)<code bash>storcli /c0/e133/s7 add hotsparedrive</code> | ||
+ | * Wrote a few notes on storcli in [[it_tips:storcliqr|MegaRAID storcli QR]] | ||
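The drive range given to storcli and the ''sw=5'' value given to ''mkfs.xfs'' are linked: RAID6 uses two drives' worth of capacity for parity, so the 7 drives in ''133:0-6'' leave 5 for data. A small sketch of that arithmetic, assuming a POSIX shell and a simple ''enclosure:first-last'' range; the helper ''raid6_data_disks'' is hypothetical and not part of storcli:

```shell
#!/bin/sh
# Hypothetical helper: count the data disks of a RAID6 array from a
# storcli-style drive range such as "133:0-6" (enclosure:first-last).
raid6_data_disks() {
    range=${1#*:}                    # drop the enclosure id -> "0-6"
    first=${range%-*}                # first slot -> "0"
    last=${range#*-}                 # last slot  -> "6"
    total=$(( last - first + 1 ))    # drives in the array
    echo $(( total - 2 ))            # RAID6: two drives' worth of parity
}

raid6_data_disks 133:0-6   # prints 5
```

With a 1024 KiB strip size, 5 data disks give a full stripe of 5 MiB, which is the geometry the filesystem should be aligned to.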
==== 20200603 ==== | ==== 20200603 ==== | ||
Line 77: | Line 158: | ||
* Reset maintenance interface. Tested ok. Custom tools not working but web browser interface ok. | * Reset maintenance interface. Tested ok. Custom tools not working but web browser interface ok. | ||
* Installed system. Configured RAID1 on system disks. Upgraded system. | * Installed system. Configured RAID1 on system disks. Upgraded system. | ||
+ | * Configured basic firewall. | ||
==== 20200602 ==== | ==== 20200602 ==== | ||