Moving data to and from the cluster

Please read the section Common files organization before going through this section.

Rules of thumb

:!: Please read this carefully. :!:

When moving data to the shared folders, please follow these common sense rules:

  • Create folders for everything you want to share.
  • If the data has been produced by you, it is nice to create a folder with your name and place everything in it.
  • If the data belongs to a specific experiment, dataset or the like, give the folder a name that is consistent with it and that makes it easy for everybody to understand what it contains.
  • Don't overdo it. Only copy data you or your colleagues need. This is a shared facility.
  • Don't remove other users' files unless you have advised them and they're OK with it. This is a shared facility.
  • Don't expect the contents of the scratch folder to always be there. We have no policy for that yet, but we will hold meetings to decide on one.
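
The folder rules above can be sketched as a small shell helper. This is only an illustration: make_my_folder is a made-up name, and the default root /nfs/shared/pp is assumed from the scp example on this page, so adjust it to the shared area you actually use.

```shell
# Create (if missing) a folder under a shared root and print its path.
# make_my_folder is a hypothetical helper, not an existing cluster command.
# $1: shared root (assumed default: /nfs/shared/pp)
# $2: folder name (default: your user name)
make_my_folder() {
    root="${1:-/nfs/shared/pp}"
    name="${2:-${USER:-$(id -un)}}"
    mkdir -p "$root/$name" || return 1
    printf '%s\n' "$root/$name"
}
```

Running make_my_folder with no arguments would create a folder named after your user; make_my_folder /nfs/shared/pp my_experiment would create one named after an experiment instead.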

Data transfer solutions

Here are some solutions for moving data to the cluster. Solutions 1-3 are generic data transfer tools. Solutions 4-5 are GRID-oriented data transfer tools (mostly for particle physicists).

Those marked with 8-) are my favourites -- Florido Paganelli 2013/08/27 20:20

Generic storage

Solution 1: scp, sftp, lftp

  • Pros:
    • easy
    • only needs terminal
    • available almost everywhere
    • progress indicator
  • Cons:
    • not reliable. If the connection goes down, one must restart the entire transfer.
    • does not work with GRID storage
    • slow

Example:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to n12.iridium

  scp ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/

Solution 2: rsync

8-)

  • Pros:
    • Reliable. If the connection goes down, a rerun can resume from where it stopped.
    • Minimizes the amount of transferred data by compressing it
    • only needs terminal
    • available on most GNU/Linux platforms
    • a bit faster
  • Cons:
    • Awkward command line
    • bad logs
    • poor progress indicator on many files
    • available on Windows but needs a special installation
    • does not work with GRID storage

Example:

Moving ubuntu-12.04.2-desktop-amd64.iso from my local machine to n12.iridium

  rsync -avz --progress ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/software/pp/
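
To lean on the resume behaviour without babysitting the transfer, one option is to rerun rsync automatically when it fails. The sketch below is an assumption-laden illustration: retry is a made-up helper and RETRY_DELAY is a made-up variable. It also adds --partial, which tells rsync to keep a partially transferred file so that the next attempt does not start that file from scratch.

```shell
# Run a command up to max_tries times, pausing between failed attempts.
# retry is a hypothetical helper; RETRY_DELAY (seconds, default 10) is an
# assumed knob, not an rsync option.
retry() {
    max_tries="$1"
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-10}"
    done
}

# Usage (not executed here):
# retry 5 rsync -avz --partial --progress ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/software/pp/
```

The helper is deliberately generic: it only looks at the exit status of the command it is given, so it does not depend on any rsync internals.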

Solution 3: FileZilla

  • Pros:
    • Reliable. Tries to resume if the connection goes down.
    • Visual interface
    • good logs
    • progress bar ^_^
    • Available for both GNU/Linux and Windows
  • Cons:
    • Visual interface :D
    • does not work with GRID storage

More about it: https://filezilla-project.org/download.php?type=client

GRID storage

Solution 4: NorduGrid ARC tools (arccp, arcls, arcrm)

  • Pros:
    • works with GRID storage
    • similar to cp
  • Cons:
    • doesn't work with ATLAS datasets (yet ;-) )

See also http://www.hep.lu.se/grid/localgroupdisk.html for more information on how to use Lund local GRID storage.

Example:

To copy files to/from the storage, use the srm: protocol and arccp tool:

arccp srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/data11_7TeV/NTUP_SUSY/f354_m765_p486/data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00/NTUP_SUSY.292683._000131.root.1 file:///tmp/NTUP_SUSY.292683._000131.root.1 
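
Since the source URL is long, it can be convenient to derive the local destination from it instead of typing the file name twice. A minimal sketch, assuming you want the file under /tmp as in the example above; local_url_for is a made-up helper name, not part of the ARC tools.

```shell
# Print a file:// URL that keeps the file name of a remote URL.
# local_url_for is a hypothetical helper, not an ARC command.
# $1: source URL (for example an srm:// URL)
# $2: absolute local directory (assumed default: /tmp)
local_url_for() {
    src="$1"
    dest_dir="${2:-/tmp}"
    # ${src##*/} strips everything up to the last slash, leaving the file name
    printf 'file://%s/%s\n' "$dest_dir" "${src##*/}"
}

# Usage (not executed here), with src set to the srm:// URL of the file:
# arccp "$src" "$(local_url_for "$src")"
```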

Solution 5: dq2 tools

  • Pros:
    • works with GRID storage
    • works with ATLAS datasets
  • Cons:
    • uncommon command line interface (but some are used to it)

Example:



iridium_cluster/data.1399993393.txt.gz · Last modified: 2014/05/13 15:03 by florido
