This is an old revision of the document!


Moving data to and from the cluster

Please read the section Storage before reading this section.

When moving data to the shared folders, please follow these common-sense rules:

  • Create folders for everything you want to share.
  • If the data has been produced by you, it is good practice to create a folder with your username and place everything in it.
  • If the data belongs to a specific experiment, dataset or the like, create a folder whose name reflects that and makes its contents easy for everybody to understand.
  • Don't overdo it. Only copy data you or your colleagues need. This is a shared facility.
  • Don't remove other users' files unless you have advised them and they're ok with it. This is a shared facility.
  • Don't expect the contents of any scratch folder to always be there. At the moment, however, there is no deletion policy for scratch folders.
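
As a minimal sketch of the first rules above, the following shell function creates a per-user (or per-dataset) folder inside a shared area. The function name make_share_dir is hypothetical, and the shared base path is passed in by the caller, since the right shared folder depends on your division.

```shell
# Sketch only: create a personal or per-dataset folder inside a shared area.
# make_share_dir is a hypothetical helper, not a tool provided on Aurora.
# Usage: make_share_dir <shared_base_dir> [folder_name]
make_share_dir() {
    local base name
    base="$1"
    name="${2:-$(id -un)}"   # default to the current username
    if [ ! -d "$base" ]; then
        echo "shared folder not found: $base" >&2
        return 1
    fi
    # -p makes the call harmless if the folder already exists
    mkdir -p "$base/$name" && echo "$base/$name"
}
```

Called with only the shared base path, it creates a folder named after your username; a second argument gives a dataset or experiment name instead.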

Moving data for users of Mathematical Physics and generic Lunarc users

Users of Mathematical Physics, as well as any other Lunarc user, can use their favorite tool to download and upload from their own workstation, the Aurora front-end or the Aurora computing nodes. Some of those tools are described on the Move data to and from the Iridium Cluster pages.

Moving data for users of Nuclear, Theoretical and Particle Physics

Users of these divisions can access the special node fs2-hep, which is to be used for downloads and uploads.

These users (in particular Particle and Theoretical Physics) might need to download huge amounts of data. Our objective was therefore to offload the Lunarc internal network and to avoid computing nodes being used as mere download nodes.

fs2-hep has a direct, very fast connection to the internet for downloads and uploads.

NOTE: Incoming connections from the internet are rejected. This node can download FROM and upload TO the internet but cannot be accessed directly as a server to retrieve or upload data from OUTSIDE Lunarc. In other words, it is not possible to directly connect TO fs2-hep from the internet via sftp/ssh/rsync. You can only run those on fs2-hep itself. Read more about this in Uploading/Downloading data to/from Aurora from your laptop or workstation.

An overview of the upload/download components is shown in the slide below. Source: https://docs.google.com/presentation/d/1agBLlMrMe3Pu1RGou5ut5LE0dgzeXFGztKu4Gjn_QBE/edit?usp=sharing

Using the downloader node

  1. Log in to aurora.lunarc.lu.se
  2. Log in to fs2-hep:
    ssh fs2-hep
  3. Start screen:
    screen
  4. Choose one of the upload/download methods below.
  5. The download destination MUST be one of the /projects/hep/ folders or your home folder. No other folder is writable by your user. Everything in /tmp will be deleted regularly.
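
The destination rule in step 5 can be checked before starting a long transfer. The sketch below is only an illustration: is_allowed_dest is a hypothetical helper, and it compares path prefixes as plain strings, so it does not resolve symlinks or ".." components and does not test actual write permission.

```shell
# Sketch only: succeed if a path looks like an allowed download destination
# on fs2-hep, i.e. it is under /projects/hep/ or under your home folder.
# is_allowed_dest is a hypothetical helper; it only compares path prefixes.
is_allowed_dest() {
    case "$1" in
        /projects/hep/?*) return 0 ;;
    esac
    # Without a home folder set, nothing else can match
    [ -n "$HOME" ] || return 1
    case "$1" in
        "$HOME"|"$HOME"/*) return 0 ;;
    esac
    return 1
}

# Example, inside a screen session on fs2-hep (mydata is a made-up folder):
#   is_allowed_dest /projects/hep/fs2/mydata && cd /projects/hep/fs2/mydata
```

A path under /tmp is rejected by this check, matching the warning that /tmp is cleaned regularly.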

The picture below shows the various steps.

Uploading/Downloading data to/from an external source from Aurora

  1. Use your favourite download software. Some suggestions are available at Moving data to and from Iridium.
  2. Use your home folder or one of the /projects/hep folders as a destination folder. Any other path is not writable by your user. The /tmp folder is emptied regularly, so you should not use it. /projects/hep/fs2 is accessible by everyone, while /projects/hep/fs3 and /projects/hep/fs4 are dedicated storage for the ATLAS project.
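
As one possible illustration of the two steps above, the sketch below wraps wget so that a download lands in a chosen destination folder and can be resumed if interrupted. It assumes wget is installed on fs2-hep, which this page does not confirm; fetch_to, the example URL and the mydata folder are hypothetical.

```shell
# Sketch only: download a URL into a destination folder with wget.
# Assumes wget is available on the node. fetch_to is a hypothetical helper.
# Set DRY_RUN=1 to print the command instead of running it.
fetch_to() {
    local url dest
    url="$1"
    dest="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wget -c -P $dest $url"
    else
        # -c resumes a partial download, -P sets the folder files are saved in
        wget -c -P "$dest" "$url"
    fi
}

# Example (made-up URL and folder), run inside screen on fs2-hep:
#   fetch_to https://example.org/data.tar /projects/hep/fs2/mydata
```

Running it inside the screen session lets the transfer continue if your ssh connection drops.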

Uploading/Downloading data to/from Aurora from your laptop or workstation

This can be done only for small files (on the order of tens of gigabytes) and small data rates (slow transfers). You don't need to use fs2-hep; you can go through Aurora's frontend instead. For example, from your laptop:

    scp myfile aurora.lunarc.lu.se:/projects/hep/fs2/shared/np/myfolder/myfile

For big files (hundreds of gigabytes and up) you should use fs2-hep as described above. Aurora is not a storage facility, so it is not meant to be accessed by external sources for data movement. Moving big data via the Aurora frontend is extremely slow and will slow down your colleagues' work. Also, Aurora frontend managers might interrupt your transfers if they see they are taking too much time. I strongly recommend following the instructions at Uploading/Downloading data to/from an external source from Aurora above instead, possibly running an ssh/ftp server on your own laptop or workstation, or asking the sysadmin for a convenient form of online storage.
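
The size guidance above can be summarised as a small decision helper. This is a sketch under stated assumptions: the page only says "tens of gigabytes" for small and "hundreds of gigabytes" for big, so the 100 GB default cutoff is an assumption, not a Lunarc rule, and suggest_route is a hypothetical name.

```shell
# Sketch only: suggest a transfer route from a size in whole gigabytes.
# The default 100 GB cutoff is an assumption, not an official limit.
# Usage: suggest_route <size_gb> [cutoff_gb]
suggest_route() {
    local size_gb cutoff_gb
    size_gb="$1"
    cutoff_gb="${2:-100}"
    if [ "$size_gb" -lt "$cutoff_gb" ]; then
        echo "frontend"   # small transfer: the Aurora frontend is acceptable
    else
        echo "fs2-hep"    # big transfer: pull the data from fs2-hep instead
    fi
}
```

For instance, suggest_route 20 prints frontend, while suggest_route 500 prints fs2-hep.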

For resources that can be stored on the GRID, you should definitely stage them on the Lund GRID storage instead, using one of the ways described under Downloading/Uploading data to/from the GRID from Aurora, so that you can access them from all over the world in the fastest way possible.

Downloading/Uploading data to/from the GRID from Aurora

Please read the dedicated page Moving data between GRID and Aurora.

aurora_cluster/moving_data.1501523259.txt.gz · Last modified: 2017/07/31 17:47 by florido