Iridium cluster Work In Progress

The Iridium cluster is a computing facility serving researchers in Particle Physics and Nuclear Physics at Lund University.

It is currently maintained by Florido Paganelli and Luis Sarmiento (aka Pico) with the help of the Lunarc team.

Roadmap

Note: new schedule set on September 9th, 2015.

  • 1) Reconfigure the service disks on the frontend with a more flexible technology, e.g. LVM.
    • Due w38
  • 2) Reinstall the service machines with LXC instead of KVM virtualization.
    • Due w38
  • 3) Connect to the newly bought storage at Lunarc
    • Due October 2015
  • 4) Connect to the default storage DDN at Lunarc
    • Due October 2015
  • 5) Configure the new nodes to integrate into Lunarc
    • Due January 2016
  • 6) Configure access to be integrated into Lunarc
    • Due January 2016

DONE (as of 9th Sep 2015)

  • 1) Understand how to interact with the existing grid storage at Lunarc. Update: Lunarc suggested benchmarking the current connection first. If it is not fast enough, talk to Jens to understand how a direct connection can be achieved. Lunarc only holds the data, with no metadata or index.
    • Outcome: users prefer to interact directly with the data and move it to the cluster storage. Use of FAX is not possible yet, both because it is an experimental technology and because FAX is only installed at Tier0 and Tier1 sites, so the storage would have to be one of these.
  • 4) Start batch/grid job tests with users from particle physics and nuclear physics. In this phase we will be able to see how resource management should be done for optimal use of the cluster. Missing: ATLAS RTEs.
    • Outcome: not very successful, because the cluster is not a Tier2 and its connection to grid storage is not optimal. Disk problems also prevented optimal operation.
  • 5) Setup n1 and n2 as test nodes. This includes:
    • Configure direct connection of nodes to the internet DONE
    • LDAP+OTP authentication on nodes. Status: domain name sent to Lunarc; first bit of information received from Lunarc. NOT DONE: impractical to use, and there was no time to set it up. To be scheduled for the next iteration and the new Iridium nodes.
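
The connection benchmark that Lunarc suggested in item 1 above could be approximated with a simple timed copy. This is only a rough sketch: it assumes the grid storage is reachable as an ordinary file path, and the function name and both paths are hypothetical, not part of the actual setup. Caching on either side can inflate the result, so a large test file gives a more realistic number.

```python
import shutil
import time
from pathlib import Path


def copy_throughput_mb_s(src, dst):
    """Copy src to dst and return the observed throughput in MB/s.

    src and dst are plain file paths; pointing src at the remote
    storage is an assumption about how that storage is mounted.
    """
    size_mb = Path(src).stat().st_size / 1e6
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return size_mb / elapsed if elapsed > 0 else float("inf")
```

Running it a few times on the same file and comparing the numbers gives a first idea of whether the current connection is sufficient.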

DONE (as of 17th Mar 2014)

  • 3) install grid interfaces for researchers to run test grid jobs.

DONE (as of 31st Jan 2014)

  • 2) install batch system.
    • add to salt config:
      • create folders for slurm
      • open the slurm ports in iptables for the relevant machines/services: 6817, 6818, 7321
    • set correct node values in slurm.conf
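
The batch-system steps above can be sketched as a small generator for the firewall rules and a node definition. Only the three port numbers come from this page. The `INPUT` chain, the TCP protocol, the node range and the CPU and memory figures are assumptions chosen for illustration, and both function names are hypothetical.

```python
# Ports are taken from the list above; everything else is an assumption.
SLURM_PORTS = (6817, 6818, 7321)


def iptables_rules(ports=SLURM_PORTS, chain="INPUT", proto="tcp"):
    """Return one iptables ACCEPT rule per port (chain and protocol assumed)."""
    return [
        f"iptables -A {chain} -p {proto} --dport {port} -j ACCEPT"
        for port in ports
    ]


def node_line(first, last, cpus, real_memory_mb):
    """Format a slurm.conf NodeName line for nodes n<first>..n<last>.

    The CPU count and memory passed in are placeholders, not the
    real Iridium hardware values.
    """
    return (
        f"NodeName=n[{first}-{last}] CPUs={cpus} "
        f"RealMemory={real_memory_mb} State=UNKNOWN"
    )


if __name__ == "__main__":
    for rule in iptables_rules():
        print(rule)
    # Hypothetical node range and hardware values.
    print(node_line(1, 12, cpus=16, real_memory_mb=64000))
```

Generating the lines from one place keeps the firewall rules and slurm.conf consistent when they are pushed out through the salt config.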

DONE (as of 29th Jan 2014)

  • Physical direct connection of nodes n1/n2 to the internet done by Robert
  • started batch system installation
    • created user for slurm
    • used n10 as guinea pig for configuration

DONE (as of 15th Oct 2013)

  • 1) Understand LDAP authentication as it is done in Lunarc

DONE (as of 15th Sept 2013)

  • 1) Understand LDAP authentication as it is done in Lunarc
  • start testing programme. Each early tester gets one or two nodes.

DONE (as of 20th August 2013)

  • 1) setup a direct ssh access to at least one of the computing nodes
  • 2) configure storage server with minimal services for users (e.g. home folders)
  • 3) install application software for researchers to run test jobs. This will include the use of CERNVM and needs some coordination with Lunarc.
    • Independent from Lunarc. The meeting suggested finding alternative solutions. Luis set up SALT.
    • Luis prepared the automation of the installation on all nodes.
    • cvmfs installation on one node was successful. Automating the installation is an ongoing task.

Tech documents

Description of the cluster. Some of these documents might have restricted access. Contact me if you need to access those.

  • Iridium cluster summary [to be uploaded]
  • graphical view of cluster elements: cluster_plan.png (obsolete)
  • graphical view of planned cluster software: cluster_plan-sw.png (obsolete)

Captain's log

All the work that has been done, day by day.

Moved to another page because it was too big: captainslog

Issues

  • Faulty PSU on node Chassis 1 replaced with an Alarik spare. Waiting for the replacement part to arrive.
  • Faulty motherboard on n11 was completely replaced.

Howtos

Various other stuff

Wanted packages on nodes

iridium_cluster/wip.1441808173.txt.gz · Last modified: 2015/09/09 14:16 by florido