===== Roadmap =====
  
:!: new schedule set September 9th 2015 :!:

  * 1) Reconfigure service disks on frontend with a more flexible technology, e.g. lvm.
    * Due w38
  * 2) Reinstall service machines with lxc instead of kvm virtualization.
    * Due w38
  * 3) Connect to the newly bought storage at Lunarc.
    * Due October 2015
  * 4) Connect to the default DDN storage at Lunarc.
    * Due October 2015
  * 5) Configure the new nodes to integrate into Lunarc.
    * Due January 2016
  * 6) Configure access to be integrated into Lunarc.
    * Due January 2016
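Item 1) above can be sketched with the standard LVM workflow (see also the lvm guide under "System related" below). A minimal sketch, assuming a spare disk ''/dev/sdb''; the volume group, logical volume and mount point names are placeholders, not the real frontend layout:

```shell
# Initialize the spare disk as an LVM physical volume
# (/dev/sdb and all names below are placeholders -- adapt to the frontend).
pvcreate /dev/sdb

# Group physical volumes into a volume group.
vgcreate srvvg /dev/sdb

# Carve out a logical volume for a service area, leaving free extents
# in the VG so it can be grown later with lvextend + resize2fs.
lvcreate -L 100G -n servicelv srvvg

# Put a filesystem on it and mount it where the services expect it.
mkfs.ext4 /dev/srvvg/servicelv
mount /dev/srvvg/servicelv /srv
```

The point of the exercise is the last step: with free extents left in the volume group, a service disk that turns out to be too small can be grown online instead of being repartitioned.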
  
==== DONE (as of 9th Sep 2015) ====

  * 1) understand how to interact with existing grid storage present at Lunarc. Update: Lunarc suggested benchmarking the current connection; if that is not enough, interact with Jens to understand how a direct connection can be achieved. Lunarc only has data, no metadata or index.
    * Outcome: users prefer to interact directly with the data and move it onto the cluster storage. Use of FAX is not possible yet, both because it is an experimental technology and because FAX is only installed at Tier0 or Tier1 sites; the storage would have to be one of these.
  * 4) Start batch/grid job tests with users from particle physics and nuclear physics. In this phase we will be able to see how resource management should be done for an optimal use of the cluster. Missing: ATLAS RTEs.
    * Outcome: not so successful, because the cluster is not a Tier2 and its connection to grid storage is not optimal. Also, disk problems prevented optimal operation.
  * 5) Setup n1 and n2 as test nodes. This includes:
    * Configure direct connection of nodes to the internet. DONE
    * LDAP+OTP authentication on nodes. Status: sent domain name to Lunarc; received first bit of info from Lunarc. NOT DONE: impractical for use, and there was no time to set it up. To be scheduled for the next iteration and the new Iridium nodes.
==== DONE (as of 17th Mar 2014) ====

  * 3) install grid interfaces for researchers to run test grid jobs.

==== DONE (as of 31st Jan 2014) ====

  * 2) install batch system.
    * add to salt config:
      * create folders for slurm
      * add slurm ports to iptables, wrt machines/services: 6817,6818,7321
    * set correct node values in ''slurm.conf''
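The firewall and ''slurm.conf'' steps above can be sketched as follows. Only the three port numbers come from this page; the subnet, node names and hardware figures are placeholder values, not the real Iridium configuration:

```shell
# Open the slurm ports listed above (6817,6818,7321),
# restricted to the cluster subnet (10.0.0.0/24 is a placeholder).
for p in 6817 6818 7321; do
    iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport "$p" -j ACCEPT
done

# Example node/partition entries for slurm.conf -- hostnames and
# hardware figures are placeholders to be replaced with real values.
cat >> /etc/slurm/slurm.conf <<'EOF'
NodeName=n[1-2] CPUs=16 RealMemory=64000 State=UNKNOWN
PartitionName=test Nodes=n[1-2] Default=YES MaxTime=INFINITE State=UP
EOF
```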
==== DONE (as of 29th Jan 2014) ====
  
  * Faulty PSU on node Chassis 1 replaced with an Alarik spare. Waiting for the replacement part to arrive.
  * Faulty motherboard on n11 completely replaced.
  
----

===== Useful links =====

==== Hardware related ====
  * Information on the RAID controller on storage:
    * http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
    * The MegaCli utility mentioned there is painful to find on the LSI Logic website; it is available here:
      * http://www.lsi.com/support/Pages/download-results.aspx?keyword=megacli
    * Cheat sheet for the above utility:
      * http://erikimh.com/megacli-cheatsheet/
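For quick reference, a few common health checks done with the MegaCli utility linked above; a sketch assuming the binary is installed as ''MegaCli64'' and that the controller is adapter 0:

```shell
# Show all logical drives (RAID sets) on adapter 0.
MegaCli64 -LDInfo -Lall -a0

# List every physical disk; check the "Firmware state" field
# for failed or rebuilding drives.
MegaCli64 -PDList -a0

# Battery backup unit status (relevant for write-back caching).
MegaCli64 -AdpBbuCmd -GetBbuStatus -a0
```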
  
==== System related ====
  * CENTOS DHCP config: http://www.krizna.com/centos/install-configure-dhcp-server-centos-6/
  * :!: this is for NFS3. It would be better to use NFSv4 and limit portmapper to NIS. CENTOS NFS config: http://www.malaya-digital.org/setup-a-minimal-centos-6-64-bit-nfs-server/
  * lvm volume groups guide: https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/index.html
  * lxc-libvirt containers creation: https://wiki.centos.org/HowTos/LXC-on-CentOS6
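Roadmap item 2) (moving the service machines from kvm to lxc) follows the lxc-libvirt HowTo linked above. A minimal sketch, assuming libvirt with the LXC driver is installed on the host; the container name and rootfs path are placeholders:

```shell
# Populate a minimal CentOS 6 rootfs for the container
# (path and package set are placeholders; see the HowTo for the
# full bootstrap procedure).
yum -y --installroot=/var/lib/libvirt/lxc/service1 --releasever=6 \
    install centos-release coreutils

# Define and start the container through the libvirt LXC driver.
virt-install --connect lxc:/// --name service1 --ram 1024 \
    --filesystem /var/lib/libvirt/lxc/service1,/ --noautoconsole

# Containers are then managed like any other libvirt domain.
virsh -c lxc:/// list --all
```

Keeping the containers under libvirt means the existing kvm management habits (''virsh start/stop/autostart'') carry over unchanged.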
  
==== Software related ====
iridium_cluster:wip · Last modified: 2015/09/15 16:47 by florido