Moving data to and from the cluster

Please read the section Common files organization before going through this section.

Rules of thumb

:!: Please read this carefully. :!:

When moving data to the shared folders, please follow these common sense rules:

  • Create folders for everything you want to share.
  • If you produced the data yourself, it is nice to create a folder with your name and place everything in it.
  • If the data belongs to a specific experiment, dataset or the like, give the folder a matching name that everybody can easily understand.
  • Don't overdo it. Only copy data you or your colleagues need. This is a shared facility.
  • Don't remove other users' files unless you have told them and they're ok with it. This is a shared facility.
  • Don't expect contents of the scratch folder to be always there. We still have no policy for that but we will have meetings in which we decide about it.
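
As a minimal sketch of the first two rules, the helper below creates a personal folder under a shared directory. The function name make_my_folder is hypothetical, and /nfs/shared/pp in the usage line is only the path that appears in the scp example on this page; adjust it to the shared folder you actually use.

```shell
# Hypothetical helper: create a folder named after the current user
# inside a shared directory and print its path.
make_my_folder() {
    shared_root="$1"
    # Fall back to `id -un` if $USER is not set.
    owner="${USER:-$(id -un)}"
    mkdir -p "$shared_root/$owner" && printf '%s\n' "$shared_root/$owner"
}

# Usage on the cluster (path assumed, see above):
#   make_my_folder /nfs/shared/pp
```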

Data transfer solutions

Here are some solutions to move data to the cluster. Solutions 1-3 are generic data transfer tools. Solutions 4-5 are GRID-oriented data transfer tools (mostly for particle physicists).

The ones marked with 8-) are my favourites (Florido Paganelli, 2013/08/27 20:20)

Generic storage

Solution 1: scp,sftp,lsftp

  • Pros:
    • easy
    • only needs terminal
    • available almost everywhere
    • progress indicator
  • Cons:
    • not reliable: if the connection goes down, the entire transfer must be restarted.
    • does not work with GRID storage
    • slow

Example:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to n12.iridium shared folders

  scp ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/
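
Because scp cannot resume an interrupted transfer, it can be worth comparing checksums afterwards. This is a minimal sketch and not part of the original recipe: checksum_of is a hypothetical helper, and the ssh line assumes that cksum is installed on the remote node.

```shell
# Hypothetical helper: print the CRC and size of a file as computed by cksum.
# Reading from stdin keeps the file name out of the output.
checksum_of() {
    cksum < "$1"
}

# Compare the local file with the remote copy (assumes ssh access to
# n12.iridium and that cksum exists there):
#   local_sum=$(checksum_of ubuntu-12.04.2-desktop-amd64.iso)
#   remote_sum=$(ssh n12.iridium 'cksum < /nfs/shared/pp/ubuntu-12.04.2-desktop-amd64.iso')
#   [ "$local_sum" = "$remote_sum" ] && echo "copy OK" || echo "copy differs, transfer again"
```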

Solution 2: rsync

8-)

  • Pros:
    • Reliable. If the connection goes down, it resumes from where it stopped.
    • Minimizes amount of transferred data by compressing it
    • only needs terminal
    • available on most GNU/Linux platforms
    • a bit faster
  • Cons:
    • Awkward command line
    • bad logs
    • poor progress indicator on many files
    • available on windows but needs special installation
    • does not work with GRID storage

Syntax:

  rsync -avz -e 'ssh -l <username>' --progress source destination

However, the progress indicator is not very good, and most of the time it slows down the transfer because of the time spent writing to standard output. Therefore I suggest you either redirect standard error and standard output to a file:

  rsync -avz -e 'ssh -l <username>' --progress source destination &> rsyncoutput.log

Or even better, use rsync's own log file instead:

  rsync -avz -e 'ssh -l <username>' --log-file=rsyncoutput.log source destination

Check the contents of the log file now and then to see the status:

  tail rsyncoutput.log
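
Since rsync picks up where it stopped, an unattended transfer can simply be retried until it succeeds. The following is a sketch under assumptions: retry is a hypothetical helper, the number of attempts and the pause are arbitrary, and --partial (keep partially transferred files) is added so that a single large file does not restart from zero.

```shell
# Hypothetical helper: run a command until it succeeds,
# at most $1 times, sleeping $2 seconds between attempts.
retry() {
    max_attempts="$1"
    pause="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 0
}

# Usage (source and destination as in the syntax above):
#   retry 5 60 rsync -avz --partial -e 'ssh -l <username>' \
#       --log-file=rsyncoutput.log source destination
```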

Examples:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to pptest-iridium shared folders

  rsync -avz -e 'ssh -l pflorido' --progress ubuntu-12.04.2-desktop-amd64.iso pptest-iridium.lunarc.lu.se:/nfs/software/pp/

Note on the trailing slashes /:

Without a trailing slash on the source, localdir itself is created remotely:

  rsync -avz -e 'ssh -l pflorido' --progress localdir pptest-iridium.iridium:/nfs/software/pp/

With a trailing slash on the source, localdir is NOT created remotely; only its contents are copied:

  rsync -avz -e 'ssh -l pflorido' --progress localdir/ pptest-iridium.iridium:/nfs/software/pp/

Trailing slash on destination doesn't have any effect.
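
The difference is easy to try out locally before touching the cluster. A small sketch, assuming rsync is installed on your machine; the function name slash_demo and the directory names with_dir and contents_only are made up for the illustration.

```shell
# Hypothetical demo: copy the same directory with and without a trailing
# slash on the source, using local directories only.
slash_demo() {
    work="$1"
    mkdir -p "$work/localdir" "$work/with_dir" "$work/contents_only"
    echo "data" > "$work/localdir/file.txt"

    # No trailing slash: localdir itself is created in the destination.
    rsync -a "$work/localdir" "$work/with_dir/"

    # Trailing slash: only the contents of localdir are copied.
    rsync -a "$work/localdir/" "$work/contents_only/"
}

# Usage:
#   work=$(mktemp -d) && slash_demo "$work" && find "$work" -name file.txt
```

The find command at the end lists where file.txt ended up in each case.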

Solution 3: FileZilla

  • Pros:
    • Reliable. Tries to resume if the connection goes down.
    • Visual interface
    • Available for both GNU/Linux and Windows
    • good logs
    • progress bar ^_^
  • Cons:
    • Visual interface :D
    • does not work with GRID storage

More about it: https://filezilla-project.org/download.php?type=client

GRID storage

Solution 4: NorduGrid ARC tools (arccp, arcls, arcrm)

  • Pros:
    • works with GRID storage
    • similar to cp
  • Cons:
    • doesn't work with ATLAS datasets (yet ;-) )

See also http://www.hep.lu.se/grid/localgroupdisk.html for more information on how to use Lund local GRID storage.

Example:

To copy files to/from the storage, use the srm: protocol and arccp tool:

  arccp srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/data11_7TeV/NTUP_SUSY/f354_m765_p486/data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00/NTUP_SUSY.292683._000131.root.1 file:///tmp/NTUP_SUSY.292683._000131.root.1
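
With storage URLs this long it helps to keep the remote location in a variable and derive the local file name from it. This is only a sketch: local_target is a hypothetical helper, the names in angle brackets are placeholders, and the commands assume that the ARC client tools are installed and that you hold a valid proxy.

```shell
# Hypothetical helper: build the file:// URL under a local directory
# for the last component of a storage URL.
local_target() {
    remote_url="$1"
    local_dir="$2"
    printf 'file://%s/%s\n' "$local_dir" "$(basename "$remote_url")"
}

# Usage (placeholders in angle brackets, not real names):
#   base='srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund'
#   arcls "$base/<dataset path>/"                  # list the files
#   src="$base/<dataset path>/<file name>"
#   arccp "$src" "$(local_target "$src" /tmp)"     # copy one file to /tmp
```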

Solution 5: Rucio or dq2 tools

  • Pros:
    • works with GRID storage
  • Cons:
    • works only with ATLAS datasets

If you have an ATLAS dataset, the best approach is to transfer it to the local Lund GRID storage first, and then to the cluster directly if needed. To do that you need to submit a DaTRi request.

This page contains all you need to know on how to use the local storage: http://www.hep.lu.se/grid/localgroupdisk.html

To move a dataset from any ATLAS GRID storage to Iridium, the recommended tool is Rucio, the successor of DQ2.

To enable RUCIO tools, you'll need to:

  1. copy and configure your GRID certificate on Iridium.
  2. run setupATLAS
  3. run localSetupRucioClients
  4. login to the GRID using arcproxy -S atlas or voms-proxy-init as one would do on lxplus.cern.ch.
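
Put together, a Rucio session might look like the sketch below. The certificate check is a hypothetical helper, and it assumes the usual convention of keeping usercert.pem and userkey.pem in ~/.globus; the dataset name is a placeholder.

```shell
# Hypothetical helper for step 1: check that both certificate files
# exist in the given directory (by convention ~/.globus).
have_grid_cert() {
    cert_dir="$1"
    [ -f "$cert_dir/usercert.pem" ] && [ -f "$cert_dir/userkey.pem" ]
}

# Steps 1-4, followed by a download (run these interactively on Iridium):
#   have_grid_cert "$HOME/.globus" || echo "certificate files missing"
#   chmod 400 "$HOME/.globus/userkey.pem"   # the private key must not be readable by others
#   setupATLAS
#   localSetupRucioClients
#   arcproxy -S atlas
#   rucio download <scope>:<dataset name>
```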

The RUCIO official documentation is here: http://rucio.cern.ch/cli_examples.html

If you still want to use dq2 tools, here's how:

To enable dq2 tools, you'll need to:

  1. copy and configure your GRID certificate on Iridium.
  2. run setupATLAS
  3. run localSetupDQ2Client
  4. login to the GRID using arcproxy or voms-proxy-init as one would do on lxplus.cern.ch.

Information about dq2 on CERN Twiki (only visible if you have a CERN account): https://twiki.cern.ch/twiki/bin/view/AtlasComputing/DQ2ClientsHowTo


iridium_cluster/data.txt · Last modified: 2016/06/28 13:34 by florido