====== Moving data to and from the cluster ======

Please read the section [[iridium_cluster:basic_information#Common files organization]] before going through this section.

===== Rules of thumb =====

:!: **Please read this carefully.** :!:

When moving data to the shared folders, please follow these common sense rules:

  * Create folders for everything you want to share.
  * If the data has been produced by you, it is nice to create a folder with your name and place everything in it.
  * If the data belongs to some specific experiment, dataset or the like, give the folder a name that is consistent with that, so that it is easy for everybody to understand what it is about.
  * Don't overdo it. Only copy data you/your colleagues need. This is a shared facility.
  * Don't remove other users' files unless you advise them and they're ok with it. This is a shared facility.
  * Don't expect the contents of the ''scratch'' folder to always be there. We still have no policy for that, but we will have meetings in which we decide about it.

===== Data transfer solutions =====

Here are some solutions to move data to the cluster. 1-3 are generic data transfer tools. 4-5 are GRID oriented data transfer tools (mostly for Particle Physicists).

Those marked with 8-) are my favourites --- //[[:Florido Paganelli]] 2013/08/27 20:20//

==== Generic storage ====

=== Solution 1: scp, sftp, lftp ===

  * **Pros:**
    * easy
    * only needs a terminal
    * available almost everywhere
    * progress indicator
  * **Cons:**
    * not reliable. If the connection goes down one must restart the entire transfer (but see the ''lftp'' resume sketch after the rsync examples below).
    * does **not work** with GRID storage
    * slow

//Example:// Moving ''ubuntu-12.04.2-desktop-amd64.iso'' from my local machine to the ''n12.iridium'' shared folders:

<code bash>
scp ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/
</code>

=== Solution 2: rsync 8-) ===

  * **Pros:**
    * Reliable. If the connection goes down it will resume from where it stopped.
    * Minimizes the amount of transferred data by compressing it
    * only needs a terminal
    * available on most GNU/Linux platforms
    * a bit faster
  * **Cons:**
    * Awkward command line
    * bad logs
    * poor progress indicator on many files
    * available on windows but needs special installation
    * does **not work** with GRID storage

Syntax:

<code bash>
rsync -avz -e 'ssh -l <username>' --progress source destination
</code>

However, the progress indicator is not very good and most of the time it slows down the transfer because of the constant writing to standard output. Therefore I suggest you either **redirect standard error and output**:

<code bash>
rsync -avz -e 'ssh -l <username>' --progress source destination &> rsyncoutput.log
</code>

Or, even better, use **rsync's own log file** instead:

<code bash>
rsync -avz -e 'ssh -l <username>' --log-file=rsyncoutput.log source destination
</code>

Check the contents of the logfile now and then to see the status:

<code bash>
tail rsyncoutput.log
</code>

//Examples:// Moving ''ubuntu-12.04.2-desktop-amd64.iso'' from my local machine to the ''pptest-iridium'' shared folders:

<code bash>
rsync -avz -e 'ssh -l pflorido' --progress ubuntu-12.04.2-desktop-amd64.iso pptest-iridium.lunarc.lu.se:/nfs/software/pp/
</code>

Note on the trailing slashes **/**:

A source **without** a trailing slash **will create** //localdir// remotely:

<code bash>
rsync -avz -e 'ssh -l pflorido' --progress localdir pptest-iridium.lunarc.lu.se:/nfs/software/pp/
</code>

A source **with** a trailing slash **will NOT create** //localdir// remotely but will **copy the contents** of //localdir// remotely:

<code bash>
rsync -avz -e 'ssh -l pflorido' --progress localdir/ pptest-iridium.lunarc.lu.se:/nfs/software/pp/
</code>

A trailing slash on the destination doesn't have any effect.
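If you're unsure which of the two trailing-slash behaviours you'll get, you can preview the transfer first. A minimal sketch, reusing the hypothetical ''localdir'' and destination from the examples above; ''-n'' (''--dry-run'') lists what would be copied without transferring anything:

<code bash>
# -n / --dry-run: show what would be transferred, but do not copy anything
rsync -avzn -e 'ssh -l pflorido' localdir pptest-iridium.lunarc.lu.se:/nfs/software/pp/
</code>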
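Also, going back to Solution 1: plain ''scp'' cannot resume, but ''lftp'' can continue an interrupted transfer. A minimal sketch, assuming the same host and file as in the ''scp'' example above (''put -c'' continues a partial upload, if the server supports it):

<code bash>
lftp sftp://pflorido@n12.iridium
# inside the lftp session:
lftp> cd /nfs/shared/pp
lftp> put -c ubuntu-12.04.2-desktop-amd64.iso
</code>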
=== Solution 3: FileZilla ===

  * **Pros:**
    * Reliable. Tries to resume if the connection went down.
    * Visual interface
    * Available for both GNU/Linux and windows
    * good logs
    * progress bar ^_^
  * **Cons:**
    * Visual interface :D
    * does **not work** with GRID storage

More about it: https://filezilla-project.org/download.php?type=client

==== GRID storage ====

=== Solution 4: NorduGrid ARC tools (arccp, arcls, arcrm) ===

  * **Pros:**
    * works with GRID storage
    * similar to cp
  * **Cons:**
    * doesn't work with ATLAS datasets (yet ;-) )

See also http://www.hep.lu.se/grid/localgroupdisk.html for more information on how to use the Lund local GRID storage.

//Example:// To copy files to/from the storage, use the ''srm:'' protocol and the ''arccp'' tool (a sketch of ''arcls''/''arcrm'' usage follows at the end of this page):

<code bash>
arccp srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/data11_7TeV/NTUP_SUSY/f354_m765_p486/data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00/NTUP_SUSY.292683._000131.root.1 file:///tmp/NTUP_SUSY.292683._000131.root.1
</code>

=== Solution 5: Rucio or dq2 tools ===

  * **Pros:**
    * works with GRID storage
  * **Cons:**
    * works only with ATLAS datasets

If you have an ATLAS dataset, the best is to transfer it to the local Lund GRID storage first, and then to the cluster directly if needed. To do that you need to submit a DaTRi request. This page contains all you need to know on how to use the local storage: http://www.hep.lu.se/grid/localgroupdisk.html

To move a dataset from any ATLAS GRID storage to Iridium, you are recommended to use Rucio, the successor of DQ2. To enable the Rucio tools, you'll need to:

  - copy and configure your GRID certificate on Iridium.
  - run ''setupATLAS''
  - run ''localSetupRucioClients''
  - log in to the GRID using ''arcproxy -S atlas'' or ''voms-proxy-init'' as one would do on //lxplus.cern.ch//.

The official Rucio documentation is here: http://rucio.cern.ch/cli_examples.html

If you still want to use the dq2 tools, here's how. To enable them, you'll need to:

  - copy and configure your GRID certificate on Iridium.
  - run ''setupATLAS''
  - run ''localSetupDQ2Client''
  - log in to the GRID using ''arcproxy'' or ''voms-proxy-init'' as one would do on //lxplus.cern.ch//.

Information about dq2 on the CERN Twiki (only visible if you have a CERN account): https://twiki.cern.ch/twiki/bin/view/AtlasComputing/DQ2ClientsHowTo

----
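For completeness, here is a sketch of the other two ARC tools named in Solution 4. The commands are the standard ''arcls''/''arcrm'' invocations, but ''somefile.root'' is a made-up placeholder; only the base SRM path is taken from the ''arccp'' example above:

<code bash>
# list the contents of a directory on GRID storage
arcls srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/

# remove a file from GRID storage -- somefile.root is a placeholder; only delete files you own!
arcrm srm://srm.swegrid.se/atlas/disk/atlaslocalgroupdisk/lund/somefile.root
</code>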
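And a rough sketch of what a session with either toolset might look like once the setup steps in Solution 5 are done. The dataset name is borrowed from the ''arccp'' example; the exact options can differ between client versions, so double-check against the documentation linked above:

<code bash>
# Rucio (after localSetupRucioClients): download a dataset, scope:name syntax
rucio download data11_7TeV:data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00

# dq2 (after localSetupDQ2Client): list the replicas, then fetch the dataset
dq2-ls -r data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00
dq2-get data11_7TeV.00178109.physics_JetTauEtmiss.merge.NTUP_SUSY.f354_m765_p486_tid292683_00
</code>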