Differences

This shows you the differences between two versions of the page.
iridium_cluster:testingphase2 [2014/03/21 18:30]
florido created
iridium_cluster:testingphase2 [2014/03/21 19:37] (current)
florido
Line 5:
 ===== Purpose of the testing phase starting March 2014 =====
  
-:!: **Please read this section carefully.** :!:
+Testing phase 1 showed that it is possible to run jobs interactively by logging into the single nodes directly.
  
-During this testing phase, we will allow direct access to nodes of the cluster.
-
-For **security reasons**, it is not easy to access the cluster now, but **don't be scared!** By the end of September we will find a solution to access the cluster facilities in a better way.
-
+In this second testing phase:
+  * two nodes will be used for interactive testing of code, and will be directly accessible from the internet
+  * a batch system will be installed and powered by the [[http://www.nordugrid.org/arc|ARC software]], so that researchers are able to submit multiple jobs. Access to the cluster and allocation on the nodes will be taken care of by ARC and the underlying batch system.
-**This test round has the following objectives:**
-  * Test the performance of the single nodes
-  * Test the validity of a single node as a testing environment for analysis code development
-  * Address any issues about missing software or functionality on each node
-  * Get feedback from users to make the service better
-
-**How will that work?**
-  * During this test round, every researcher will be assigned a **set of nodes** (typically two) that he/she can use for computational tasks.
-  * Any kind of drastic change (adding missing libraries or per-node software) cannot be done by the user, but will be done in cooperation with the administrators.
-  * Administrators will help researchers to solve issues step by step.
-
-**How will it change AFTER the testing phase?**
-  * Users will be able to directly access, in an easier way, **only two** of the nodes for the purpose of **testing their code** and **submitting batch jobs**.
-  * Batch jobs will be processed by the **cluster batch interface**, enabling use of the **full power of all the nodes altogether**, in a fair way for all users.
-In short, the current setup does not maximize the computing power the cluster offers, and does not allow fair sharing, but it allows researchers to start using the facility.
-
-We plan to finish the node testing phase by **mid-September**, and to start testing the batch interface by the **end of September**.
-
-Thanks for your help during this testing time. We hope to have a lot of fun!
-
-===== Things one needs to know =====
-
-==== Short summary of what the cluster is ====
-
-The cluster is currently composed of three elements: a **gateway**, a **storage server** and a set of **12 nodes**.
-
-  * the **gateway** is used by users to access the cluster.
-  * the **storage server** is used to maintain user home folders, software and data. See [[#Common files organization]] below for details.
-  * the **nodes** are the place where you will actually run your code. Each node has:
-    * a simple name: //nX//, where //X// is the number of the node, e.g. //n1// is node number //1//.
-    * 16 cores
-    * 64GB of RAM
-    * access to all folders served by the storage server. This means that a researcher will have her own home folder regardless of the node she's logging in to (see the sketch below).
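-Once logged in to a node (see [[#Accessing Testing nodes]] below), one can quickly check these specifications from the shell. A minimal sketch:
-<code>
-nproc                        # should print 16 (cores)
-grep MemTotal /proc/meminfo  # should report about 64GB of RAM
-echo $HOME                   # the same home folder path on every node
-</code>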
-
-For the time being, there has been no time to set up direct access to the cluster from the internet. Only two machines are allowed to access the cluster, and these are:
-  * for Nuclear Physicists, //alpha.nuclear.lu.se//. Contact Pico to get access.
-  * for Particle Physicists, //tjatte.hep.lu.se//. Contact [[:Florido Paganelli]] to get access. Detailed instructions are given later in this page.
-
-==== Common files organization ====
-
-Every node of the cluster can access the shared storage. All users can access the shared storage, but can only access areas assigned to the working groups they belong to.
-The shared storage is organized as follows (see the listing sketch after the table):
-^ Folder name ^ Folder location ^ Folder purpose ^ Expected file size ^ Description ^ Subfolders ^
-^ **users** | ''/nfs/users'' | User homes | files smaller than 100MB each | This folder contains each user's private home folder. In this folder one should save one's own code and possibly private data. Data that can also be used by others, or whose single files are bigger than 100MB, should **not** be kept in this folder: use the **shared** folder instead. | ''/<username>'', each user has her own folder |
-^ **software** | ''/nfs/software'' | Application software | files smaller than 100MB each | This folder hosts software that is not accessible via cvmfs (see later). This usually includes user/project specific libraries and frameworks. | ''/np'' for Nuclear Physics users |
-^ ::: | ::: | ::: | ::: | ::: | ''/pp'' for Particle Physics users |
-^ **shared** | ''/nfs/shared/'' | Data that will stay for the long term | Any file, especially big ones | This folder should be used for long-term stored data, for example data needed for the whole duration of a PhD project or shared among people belonging to the same research group. | ''/np'' for Nuclear Physics users |
-^ ::: | ::: | ::: | ::: | ::: | ''/pp'' for Particle Physics users |
-^ **scratch** | ''/nfs/scratch/'' | Data that will stay for the short term | Any file, especially big ones | This folder should be used for short-term stored data, for example data needed for a week-long calculation or temporary calculations. This folder should be considered unreliable, as its contents will be purged from time to time. The cleanup interval is yet to be decided. | ''/np'' for Nuclear Physics users |
-^ ::: | ::: | ::: | ::: | ::: | ''/pp'' for Particle Physics users |
-^ **cvmfs** | ''/cvmfs'' | Special folder containing CERN-maintained software | users cannot write | This special folder is dedicated to software provided by CERN. This folder is read-only. Usually the contents of this folder are managed via specific scripts that a user can run. If you need some software that you cannot find, contact the administrators. | ''/geant4.cern.ch'' for Nuclear Physics users |
-^ ::: | ::: | ::: | ::: | ::: | ''/atlas.cern.ch'' for Particle Physics users |
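-As a quick orientation, this is what the top-level layout looks like from a node (a sketch based on the table above; ''/cvmfs'' is mounted separately from the NFS folders):
-<code>
-$ ls /nfs
-scratch  shared  software  users
-$ ls /nfs/shared
-np  pp
-</code>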
-
-==== User groups ====
-
-Three main UNIX user groups are defined, as follows:
-^ User group ^ Who belongs to it ^ Group hierarchy ^
-^ npusers | Researchers belonging to **Nuclear Physics** | primary |
-^ ppusers | Researchers belonging to **Particle Physics** | primary |
-^ clusterusers | All users accessing the cluster | secondary |
-
-Group hierarchy tells how your files are created. Whenever you create a file, its default ownership will be the following (an example follows the list):
-  * user: your username
-  * group: the primary group you belong to
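-For example (a sketch with a hypothetical Particle Physics user ''florido''):
-<code>
-$ touch analysis.C
-$ ls -l analysis.C
--rw-r--r-- 1 florido ppusers 0 Mar 21 19:37 analysis.C
-</code>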
-
-----
-
-===== Accessing Testing nodes =====
-
-As said, access to the nodes currently must happen via the special machines mentioned above.
-
-A typical access routine is the following:
-  - Access the special machines for your division.
-  - Log in to the iridium access gateway.
-  - Log in to one of the nodes you are assigned to.
-  - Set up the work environment.
-
-Let's look at these in detail.
-
-==== 1) Access the special machines for your division ====
-
-=== Particle Physics ===
-
-Simply run:
-<code>
-ssh <username>@tjatte.hep.lu.se
-</code>
-where ''<username>'' is the same one used to log in to //teddi// or to your own laptop.
-
-=== Nuclear Physics ===
-
-Coming soon.
-
-==== 2) Log in to the iridium access gateway ====
-
-Simply run:
-<code>
-ssh <username>@iridium.lunarc.lu.se
-</code>
-where ''<username>'' is your username on the cluster as given by the administrators.
-
-You will be accessing a special shell in which you'll see which nodes you are assigned to. Assigned nodes can also be seen in [[#Assigned Nodes]].
-
-:!: For **Particle Physicists**, the username is the same one used to log in to //teddi// or to your own laptop.
-
-==== 3) Log in to one of the nodes you are assigned to ====
-
-Simply run:
-<code>
-ssh <username>@nX
-</code>
-where
-  * ''<username>'' is your username on the cluster as given by the administrators.
-  * //X// is one of the nodes you're assigned to in [[#Assigned Nodes]].
-
-:!: **NOTE** :!: There is no check upon login: you **can** log into a node that is **not** assigned to you. **PLEASE DON'T DO THAT**: please check [[#Assigned Nodes]] first. Security enforcement is possible, but it is not the purpose of this testing phase; if issues arise, we will be able to reduce access accordingly.
-
-==== 4) Set up the work environment ====
-
-Administrators have provided scripts for a quick setup of your work environment.
-Just execute the command in the column //Script to run// at the shell prompt, or add it to your ''.bashrc'' or ''.bash_profile'' file so that it is executed every time you log in (see the sketch below).
-
-The following are active now:
-^ Environment ^ Script to run ^ Description ^
-^ ATLAS Experiment environment | ''setupATLAS'' | Will set up all the needed environment variables for the ATLAS experiment, and present a selection of other environments that the user can set up. |
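-For example, to have the ATLAS environment prepared at every login (a minimal sketch; adjust to your shell configuration):
-<code>
-# append the setup command to ~/.bashrc so it runs at each login
-echo 'setupATLAS' >> ~/.bashrc
-</code>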
-
-===== Tips'n'Tricks =====
-
-Suggestions on how to make your life easier when using the cluster.
-
-==== Tips to speed up logging in ====
-
-One can speed up logging in by configuring one's own ssh client. This will also help when scp-ing data to the cluster.
-
-=== Particle Physics ===
-
-My suggestion for Particle Physicists is to copy this piece of code into your own ''.ssh/config'' file, and change it to your specific needs:
-
-<code>
-# access tjatte
-Host tjatte
-HostName tjatte.hep.lu.se
-User <username on tjatte>
-ForwardX11 yes
-
-# directly access iridium gateway
-Host iridiumgw
-User <Username on iridium>
-ForwardX11 yes
-ProxyCommand ssh -q tjatte nc iridium.lunarc.lu.se 22
-
-# directly access node X
-Host nX.iridium
-User <Username on iridium>
-ForwardX11 yes
-ProxyCommand ssh -q iridiumgw nc nX 22
-
-# directly access node Y
-Host nY.iridium
-User <Username on iridium>
-ForwardX11 yes
-ProxyCommand ssh -q iridiumgw nc nY 22
-</code>
-where X and Y are the nodes you're allowed to run on.
-
-**Example:** My user is ''florido''. In the template above, I would change all the ''<Username ...>'' to ''florido'', and ''nX'' to ''n12''.
-
-Then to log in to //n12// I would do:
-
-  ssh n12.iridium
-
-and I would have to input 3 passwords: one for tjatte, one for the gateway and one for the node.
-
-If you want to access the cluster nodes from outside the division, you must go through //teddi//, possibly copying the above setup into the ''.ssh'' folder of your home there.
-If you don't have an account on teddi or direct access to some other division machine, you should ask me to create one.
-
-Note that with the above you will be requested to input as many passwords as there are machines in the connection chain. A way to ease this pain is to [[##speedup_login_by_using_ssh_keys|copy ssh keys to the nodes]].
-Copying ssh keys to the gateway is not (yet) possible, hence you will always need two passwords: one for the ssh key and one for the gateway.
-
-=== Nuclear Physics ===
-
-Coming soon.
-
-== References: ==
-  * http://sshmenu.sourceforge.net/articles/transparent-mulithop.html
-
-==== Speedup login by using ssh keys ====
-
-An alternative method of authenticating via ssh is by using ssh keys. It will ease the pain of typing many passwords: the only password you will need is the one to unlock your key.
-
-:!: **PLEASE DO NOT USE PASSWORDLESS KEYS. THEY ARE A GREAT SECURITY RISK.** :!:
-
-Read about them here:
-
-https://wiki.archlinux.org/index.php/SSH_Keys
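-A minimal sketch of creating and installing a key (assuming OpenSSH; the node alias ''n12.iridium'' comes from the config template above):
-<code>
-# generate a key pair; choose a non-empty passphrase when prompted
-ssh-keygen -t rsa -b 4096
-
-# install the public key on a node so it accepts key authentication
-ssh-copy-id n12.iridium
-</code>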
-
-==== How not to lose all your jobs because you closed an ssh terminal ====
-
-Use **screen**. //GNU screen// is an amazing tool that opens a remote terminal session that is independent of your ssh connection. If the connection drops or you accidentally close the ssh window, your jobs will keep running on the cluster.
-
-A quick and dirty tutorial can be read [[:it_tips#screen|here]], but there's plenty more on the internet.
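-A minimal sketch of a typical session (''myjob'' is an arbitrary session name):
-<code>
-screen -S myjob        # start a session named "myjob" on the node
-# ... launch your long-running program, then detach with Ctrl-a d ...
-screen -r myjob        # reattach later, even from a new ssh connection
-</code>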
-
-----
-
-===== Moving data to the cluster =====
-
-Please read the section [[#Common files organization]] before going through this section.
-
-==== Rules of thumb ====
-
-:!: **Please read this carefully.** :!:
-
-When moving data to the shared folders, please follow these common sense rules (a sketch follows the list):
-  * Create folders for everything you want to share.
-  * If the data has been produced by you, it is nice to create a folder with your name and place everything in it.
-  * If the data belongs to some specific experiment, dataset or the like, choose a folder name that is consistent with that, so that it is easy for everybody to understand what it is about.
-  * Don't overdo it. Only copy data that you or your colleagues need. This is a shared facility.
-  * Don't remove other users' files unless you advise them and they're OK with it. This is a shared facility.
-  * Don't expect the contents of the ''scratch'' folder to always be there. We still have no policy for that, but we will have meetings in which we decide about it.
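-For example (a sketch; the folder names are hypothetical):
-<code>
-# data produced by me, in a folder carrying my name
-mkdir -p /nfs/shared/pp/florido/2014-03_test_samples
-
-# data belonging to a specific, recognizable dataset
-mkdir -p /nfs/shared/np/geant4_simulations
-</code>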
-
-==== Data transfer solutions ====
-
-Here are some solutions for moving data to the cluster. Solutions 1-3 are generic data transfer tools; solutions 4-5 are GRID-oriented data transfer tools (mostly for Particle Physicists).
-
-Those marked with 8-) are my favourites. --- //[[:Florido Paganelli]] 2013/08/27 20:20//
-
-=== Solution 1: scp, sftp, lftp ===
-
-  * **Pros:**
-    * easy
-    * only needs a terminal
-    * available almost everywhere
-    * progress indicator
-  * **Cons:**
-    * not reliable: if the connection goes down, one must restart the entire transfer
-    * does **not work** with GRID storage
-    * slow
-
-//Example://
-
-Moving ''ubuntu-12.04.2-desktop-amd64.iso'' from my local machine to ''n12.iridium'':
-
-<code>
-scp ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/
-</code>
-
-=== Solution 2: rsync ===
-
-8-)
-
-  * **Pros:**
-    * reliable: if the connection goes down, it will resume from where it stopped
-    * minimizes the amount of transferred data by compressing it
-    * only needs a terminal
-    * available on most GNU/Linux platforms
-    * a bit faster
-  * **Cons:**
-    * awkward command line
-    * bad logs
-    * poor progress indicator on many files
-    * available on Windows but needs a special installation
-    * does **not work** with GRID storage
-
-//Example://
-
-Moving ''ubuntu-12.04.2-desktop-amd64.iso'' from my local machine to ''n12.iridium'':
-
-<code>
-rsync -avz --progress ubuntu-12.04.2-desktop-amd64.iso n12.iridium:/nfs/shared/pp/
-</code>
-
-=== Solution 3: FileZilla ===
-
-  * **Pros:**
-    * reliable: tries to resume if the connection goes down
-    * visual interface
-    * good logs
-    * progress bar ^_^
-    * available for both GNU/Linux and Windows
-  * **Cons:**
-    * visual interface :D
-    * does **not work** with GRID storage
-
-More about it: https://filezilla-project.org/download.php?type=client
-
-=== Solution 4: NorduGrid ARC tools (arccp, arcls, arcrm) ===
-
-  * **Pros:**
-    * works with GRID storage
-  * **Cons:**
-    * doesn't work with ATLAS datasets (yet ;-) )
-    * uncommon command line interface
-
-//Example// (a sketch only; the GRID storage endpoint below is a placeholder, replace it with a real one you are allowed to write to):
-<code>
-# copy a local file to GRID storage, then list and remove it
-arccp localfile.root srm://srm.example.org/some/path/localfile.root
-arcls srm://srm.example.org/some/path/
-arcrm srm://srm.example.org/some/path/localfile.root
-</code>
-
-=== Solution 5: dq2 tools ===
-
-  * **Pros:**
-    * works with GRID storage
-  * **Cons:**
-    * only works with ATLAS datasets
-    * uncommon command line interface (but some are used to it)
-
-//Example// (a sketch only; ''<datasetname>'' is a placeholder for a real ATLAS dataset name):
-<code>
-# download an ATLAS dataset to the current directory
-dq2-get <datasetname>
-</code>
-
-----
-
-===== Assigned Nodes =====
-
-^ Researcher ^ Nodes ^
-^ Pico | n1, [[n7]] |
-^ Lene | n2, n8 |
-^ Oleksandr | n6, n12 |
-^ Anthony | n3, n9 |
-^ Anders | n4, n10 |
-^ Inga | n5, n11 |
-
-  * nodes n1-n6 run //CentOS 6.4//
-  * nodes n7-n12 run //Scientific Linux 6.4//
  
+During this transition phase, researchers will still be able to log in directly to the single nodes, but through the testing nodes.