Iridium cluster Work In Progress

The Iridium cluster is a computing facility serving researchers from Particle Physics and Nuclear Physics at Lund University.

It is currently maintained by Florido Paganelli and Luis Sarmiento (aka Pico) with the help of the Lunarc team.

Roadmap

:!: new schedule set March 17th, 2014 :!:

TODO

QUERY STATUS: By the end of August:

  • understand how to access other kinds of datasets (e.g. where/how to store the PRESPEC-AGATA dataset from Nuclear Physics)
    • ONGOING Luis is working on Nuclear Physics data. Waiting for update.

All the following tasks were due at the end of November 2013. Note: priorities changed. The batch system and grid interface come before the new Lunarc authentication method. Rescheduled to the end of March 2014:

  • 1) understand how to interact with the existing grid storage at Lunarc. Update: Lunarc suggested benchmarking the current connection. If it is not sufficient, interact with Jens to understand how a direct connection can be achieved. Lunarc only has the data, no metadata or index. To be done once the batch system is in place.
  • 4) Start batch/grid jobs tests with users from particle physics and nuclear physics. In this phase we will be able to see how resource management should be done for an optimal use of the cluster. Missing: ATLAS RTEs
  • 5) Set up n1 and n2 as test nodes. This includes:
    • Configure direct connection of nodes to the internet
    • LDAP+OTP authentication on nodes. Status: sent domain name to Lunarc. Received first bit of info from Lunarc.
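
The connection benchmark mentioned in task 1) could start from something like the sketch below. It is only an illustration: the function name `measure_throughput` is hypothetical, and an in-memory buffer stands in for a file opened on the grid storage, since the actual access method is not decided yet.

```python
import io
import time


def measure_throughput(stream, chunk_size=1024 * 1024, clock=time.perf_counter):
    """Read `stream` to the end; return (bytes_read, seconds, MB_per_second)."""
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = clock() - start
    # Guard against a zero interval on very small reads.
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    # Stand-in for a file on the grid storage: 8 MiB of zeros held in memory.
    sample = io.BytesIO(bytes(8 * 1024 * 1024))
    nbytes, seconds, rate = measure_throughput(sample)
    print(f"read {nbytes} bytes in {seconds:.4f} s ({rate:.1f} MB/s)")
```

To test the real connection, the buffer would be replaced by a file object opened on a mounted storage path, and the read repeated a few times to smooth out caching effects.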

DONE (as of 17th Mar 2014)

  • 3) install grid interfaces for researchers to run test grid jobs.

DONE (as of 31st Jan 2014)

  • 2) install batch system.
    • add to salt config:
      • create folders for slurm
      • add slurm ports to iptables, wrt machines/services: 6817,6818,7321
    • set correct node values in slurm.conf
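
As an illustration of the iptables step above, here is a minimal sketch that generates one rule per slurm port from the list (6817, 6818, 7321). It assumes TCP traffic, the INPUT chain and an ACCEPT target; none of these are stated in the notes. The helper name `iptables_rules` and the optional `source` restriction are hypothetical, and the real rules live in the salt config.

```python
# Ports listed in the salt config notes for slurm.
SLURM_PORTS = (6817, 6818, 7321)


def iptables_rules(ports=SLURM_PORTS, source=None):
    """Return iptables command lines that accept TCP traffic on `ports`.

    `source` optionally restricts the rule to one address or subnet
    (assumption: the cluster nodes share a private subnet).
    """
    rules = []
    for port in ports:
        parts = ["iptables", "-A", "INPUT", "-p", "tcp"]
        if source:
            parts += ["-s", source]
        parts += ["--dport", str(port), "-j", "ACCEPT"]
        rules.append(" ".join(parts))
    return rules


if __name__ == "__main__":
    for rule in iptables_rules():
        print(rule)
```

The function only prints the commands, so the output can be reviewed before anything is added to the salt state or run on a node.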

DONE (as of 29th Jan 2014)

  • Physical direct connection of nodes n1/n2 to the internet done by Robert
  • started batch system installation
    • created user for slurm
    • used n10 as guinea pig for configuration

DONE (as of 15th Oct 2013)

  • 1) Understand LDAP authentication as it is done in Lunarc

DONE (as of 15th Sept 2013)

  • 1) Understand LDAP authentication as it is done in Lunarc
  • start testing programme. Each early tester gets one or two nodes.

DONE (as of 20th August 2013)

  • 1) set up direct ssh access to at least one of the computing nodes
  • 2) configure storage server with minimal services for users (e.g. home folders)
  • 3) install application software for researchers to run test jobs. This will include the use of CERNVM and needs some coordination with Lunarc.
    • independent from Lunarc. The meeting suggested finding alternative solutions. Luis set up SALT.
    • Luis prepared automation of installation on all nodes.
    • cvmfs installation on one node successful. Installation automation is an ongoing task.

Tech documents

Description of the cluster. Some of these documents might have restricted access. Contact me if you need access to them.

  • Iridium cluster summary [to be uploaded]
  • graphical view of cluster elements cluster_plan.png obsolete
  • graphical view of planned cluster software cluster_plan-sw.png obsolete

Captain's log

All the work that has been done, day by day.

Moved to another page because it was too big: captainslog

Issues

  • Faulty PSU on node Chassis 1 replaced with an Alarik spare. Waiting for the replacement part to arrive.

Howtos

Various other stuff

Wanted packages on nodes

iridium_cluster/wip.1397653013.txt.gz · Last modified: 2014/04/16 12:56 by florido