Moving data to and from the cluster

Please read the section Common files organization before going through this section.

Rules of thumb

:!: Please read this carefully. :!:

When moving data to the shared folders, please follow these common sense rules:

  • Create folders for everything you want to share.
  • If you produced the data yourself, it is nice to create a folder with your name and place everything in it.
  • If the data belongs to a specific experiment, dataset or the like, give the folder a matching name that everybody can easily understand.
  • Don't overdo it. Only copy data you or your colleagues need. This is a shared facility.
  • Don't remove other users' files unless you have told them and they're OK with it. This is a shared facility.
  • Don't expect the contents of the scratch folder to always be there. We have no policy for that yet, but we will hold meetings to decide on one.
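
As a minimal sketch of the first three rules, the snippet below creates a per-user folder with a per-dataset subfolder inside it. The helper name make_share_dir and the dataset name are made up for illustration. The shared root defaults to a throwaway temporary directory so the sketch can be tried safely; on the cluster you would set SHARED_ROOT to your group's shared folder instead (for example /nfs/shared/pp, as used in the examples on this page).

```shell
#!/bin/bash
# Sketch: create <shared root>/<owner>/<dataset> and print the path.
# make_share_dir is a hypothetical helper, not a tool provided by the cluster.
make_share_dir() {
    local root="$1" owner="$2" dataset="$3"
    local dir="$root/$owner/$dataset"
    mkdir -p "$dir" && echo "$dir"
}

# Assumption: a temporary directory stands in for the shared root here.
SHARED_ROOT="${SHARED_ROOT:-$(mktemp -d)}"

# "my_dataset_2013" is a placeholder; pick a name your colleagues will recognise.
make_share_dir "$SHARED_ROOT" "${USER:-$(id -un)}" "my_dataset_2013"
```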

Data transfer solutions

Here are some solutions for moving data to the cluster. Solutions 1-3 are generic data transfer tools. Solutions 4-5 are GRID-oriented data transfer tools (mostly for particle physicists).

Those marked with 8-) are my favourites (Florido Paganelli, 2013/08/27 20:20)

Generic storage

Solution 1: scp, sftp, lsftp

  • Pros:
    • easy
    • only needs terminal
    • available almost everywhere
    • progress indicator
  • Cons:
    • not reliable. If the connection goes down, the entire transfer must be restarted.
    • does not work with GRID storage
    • slow

Example:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to the n12.iridium shared folders:

  scp ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/

Solution 2: rsync

8-)

  • Pros:
    • Reliable. If the connection goes down, re-running the same command picks up from where it stopped.
    • Minimizes amount of transferred data by compressing it
    • only needs terminal
    • available on most GNU/Linux platforms
    • a bit faster
  • Cons:
    • Awkward command line
    • bad logs
    • poor progress indicator on many files
    • available on Windows, but needs a special installation
    • does not work with GRID storage

Syntax:

  rsync -avz -e 'ssh -l <username>' --progress source destination

Examples:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to pptest-iridium shared folders

  rsync -avz -e 'ssh -l pflorido' --progress ubuntu-12.04.2-desktop-amd64.iso pptest-iridium.lunarc.lu.se:/nfs/software/pp/
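
Because rsync skips files that have already arrived intact, an interrupted transfer can simply be run again. One way to automate that is a small retry loop, sketched below. The retry function, the number of attempts and the pause are assumptions made for illustration and are not part of rsync. The --partial option is a standard rsync option that keeps partially transferred files, so a large file can continue instead of starting over.

```shell
#!/bin/bash
# retry MAX PAUSE CMD...: run CMD until it succeeds, at most MAX times,
# sleeping PAUSE seconds between attempts.
# Hypothetical helper, written for this example only.
retry() {
    local max="$1" pause="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${pause}s" >&2
        attempt=$((attempt + 1))
        sleep "$pause"
    done
}

# Usage on a real transfer (commented out because it needs network access):
# retry 5 30 rsync -avz --partial -e 'ssh -l pflorido' --progress \
#     ubuntu-12.04.2-desktop-amd64.iso pptest-iridium.lunarc.lu.se:/nfs/software/pp/
```

The loop only looks at the exit status of the command, so it retries on any failure, including a wrong path or a refused login. Keep the attempt count small.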

Note on trailing slashes (/):

A source without a trailing slash will create localdir remotely:

  rsync -avz -e 'ssh -l pflorido' --progress localdir pptest-iridium.iridium:/nfs/software/pp/

A source with a trailing slash will NOT create localdir remotely, but will copy the contents of localdir into the remote directory:

  rsync -avz -e 'ssh -l pflorido' --progress localdir/ pptest-iridium.iridium:/nfs/software/pp/

A trailing slash on the destination directory doesn't have any effect.
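
rsync also copies between two local directories, so the trailing slash rule can be checked without touching the cluster. The sketch below does that in a temporary scratch area. All directory and file names in it are stand-ins chosen for the demonstration, and it skips the copy if rsync is not installed.

```shell
#!/bin/bash
# Demonstrate the trailing-slash rule with local copies in a scratch area.
work="$(mktemp -d)"
mkdir -p "$work/localdir" "$work/dest_noslash" "$work/dest_slash"
echo "hello" > "$work/localdir/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash on the source: the directory itself is copied.
    rsync -a "$work/localdir" "$work/dest_noslash/"
    # Trailing slash on the source: only the contents are copied.
    rsync -a "$work/localdir/" "$work/dest_slash/"
    # List what ended up where.
    find "$work/dest_noslash" "$work/dest_slash" -type f
else
    echo "rsync not found, skipping demonstration" >&2
fi
```

With rsync available, the listing should show file.txt under dest_noslash/localdir/ in the first case and directly under dest_slash/ in the second.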

Solution 3: FileZilla

  • Pros:
    • Reliable. Tries to resume if the connection goes down.
    • Visual interface
    • Available for both GNU/Linux and Windows
    • good logs
    • progress bar ^_^
  • Cons:
    • Visual interface :D
    • does not work with GRID storage

More about it: https://filezilla-project.org/download.php?type=client

GRID storage

Solution 4: NorduGrid ARC tools (arccp, arcls, arcrm)

  • Pros:
    • works with GRID storage
    • similar to cp
  • Cons:
    • doesn't work with ATLAS datasets (yet ;-) )

See also http://www.hep.lu.se/grid/localgroupdisk.html for more information on how to use Lund local GRID storage.

Example:

To copy files to/from the storage, use the srm: protocol and the arccp tool:

  arccp srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/data11_7TeV/NTUP_SUSY/f354_m765_p486/data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00/NTUP_SUSY.292683._000131.root.1 file:///tmp/NTUP_SUSY.292683._000131.root.1
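
Since arccp takes a source URL and a destination URL much like cp, the local destination can be derived from the remote file name instead of typing it twice. The sketch below only prints the resulting command (a dry run), so it can be tried without the ARC tools installed. arccp_cmd is a hypothetical helper, and /tmp is just the directory used in the example above.

```shell
#!/bin/bash
# arccp_cmd SRC_URL [DEST_DIR]: print the arccp command that would copy
# SRC_URL into DEST_DIR (an absolute path, default /tmp), keeping the
# remote file name. Hypothetical helper; it does not run arccp itself.
arccp_cmd() {
    local src="$1" dest_dir="${2:-/tmp}"
    local name="${src##*/}"    # everything after the last slash
    echo "arccp $src file://$dest_dir/$name"
}

arccp_cmd "srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/data11_7TeV/NTUP_SUSY/f354_m765_p486/data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00/NTUP_SUSY.292683._000131.root.1"
```

Once the printed command looks right, it can be copied to the prompt and run as is.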

Solution 5: dq2 tools

  • Pros:
    • works with GRID storage
  • Cons:
    • only works with ATLAS datasets

To enable dq2 tools, you'll need to:

  1. copy and configure your GRID certificate on Iridium.
  2. run setupATLAS
  3. run localSetupDQ2Client
  4. log in to the GRID using arcproxy or voms-proxy-init, as one would do on lxplus.cern.ch.

Information about dq2 on CERN Twiki (only visible if you have a CERN account): https://twiki.cern.ch/twiki/bin/view/AtlasComputing/DQ2ClientsHowTo


iridium_cluster/data.1401880605.txt.gz · Last modified: 2014/06/04 11:16 by florido
