Last modified by flenners on 2025-06-24 16:56

From version 17.1
edited by flenners
on 2022-05-20 09:17
Change comment: There is no comment for this version
To version 11.1
edited by greving
on 2021-10-26 11:23
Change comment: There is no comment for this version

Summary

Details

Page properties
Title
... ... @@ -1,1 +1,1 @@
1 -00 - How to login to Maxwell
1 +02 - How to login to Maxwell and start RecoGUI
Author
... ... @@ -1,1 +1,1 @@
1 -XWiki.flenners
1 +XWiki.greving
Content
... ... @@ -1,212 +1,160 @@
1 -DESY has a quite powerful compute cluster, the Maxwell cluster. The documentation can be found here [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.Maxwell Cluster.WebHome||shape="rect"]]; however, as it can be confusing at times, we will try to condense it into a step-by-step manual.
1 += {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}**Short Version: ** =
2 2  
3 +Terminal:
3 3  
5 +(% class="code" %)
6 +(((
7 +salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
8 +\\(if you need gpu: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}(%%) )
9 +\\ssh max-bla123
10 +\\module load anaconda
11 +\\source activate ~~/envs/tomopy
12 +\\spyder&
13 +)))
4 4  
5 -{{toc/}}
15 +\\
6 6  
7 -= {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =
17 +\\
8 8  
9 -During your beamtime you will encounter multiple systems, and you will need two different types of accounts:
19 +{{code linenumbers="true" collapse="true"}}
20 +salloc --partition=psx --nodes=1 --time=06:00:00
10 10  
11 -== {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==
22 +ssh max-bla123
12 12  
13 -Before you arrive, you have to create a DOOR account and complete all the safety trainings. This account is also used for the gamma-portal, where you can manage your beamtime data, grant access to other users and manage FTP access. However, this account does not work with the other resources; for those you will have to request a second account:
24 +module load anaconda
14 14  
15 -== {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==
26 +source activate ~/envs/tomopy
16 16  
17 -If you decide during a beamtime that you want access to the cluster, tell your local contact, and they will request a PSX account for you. With this you get access to the Kerberos, Windows and AFS resources at DESY, which include the cluster.
18 -
19 -= {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =
20 -
21 -== {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==
22 -
23 -=== {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===
24 -
25 -The Maxwell cluster has (as of 2021) more than 750 nodes. To keep this organized, you cannot access any node directly; you first have to request compute resources. You can then connect from an entrance node to your compute node.
26 -
27 -=== {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===
28 -
29 -If you have successfully obtained a PSX account, you can get started. The entrance nodes are:
30 -\\[[https:~~/~~/max-nova.desy.de:3443/auth/ssh>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]  (if you have access to the nova resources, most likely the case if your beamtime was in cooperation with the Helmholtz Zentrum Hereon)
31 -
32 -[[https:~~/~~/max-display.desy.de:3443/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]]  (in any case)
33 -
34 -These nodes are **not **for processing, as you share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is ok.
35 -
36 -=== {{id name="00-HowtologintoMaxwell-FastX2"/}}Fast X2 ===
37 -
38 -The cluster uses the software FastX2 for connections and virtual desktops. To get the right version, use the web interface: log in, and in the bottom right corner you will find a download link for the desktop client. The versions have to match exactly to work properly.
39 -
40 -If you want to add a connection in the desktop client, click the plus sign, select "web", enter the address above (including the port) and your username, and force ssh authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.
41 -
42 -=== {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===
43 -
44 -Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priority, the nodes are organized into partitions. You can only access some of these. To see which ones, open a terminal and use the command:
45 -
46 -{{code}}
47 -my-partitions
28 +spyder&
48 48  {{/code}}
49 49  
50 -Your result will look something like this:
31 +\\
51 51  
52 -[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2" alt="image2021-5-4_10-28-14.png"]]
33 +Spyder:
53 53  
54 -== {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==
35 +Open RecoGUI,
55 55  
56 -Access to the resources of the cluster is managed via a scheduler, SLURM.
37 +(Right click on tab: "Set console working directory") (to be removed)
57 57  
58 -SLURM schedules access to the nodes and can revoke access if higher-priority jobs arrive.
39 +Click the green arrow to start the program
59 59  
60 -=== {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===
41 +\\
61 61  
62 -Here you cannot be kicked out of your allocation. However, only a few nodes are in this partition and you can only allocate a few of them in parallel (2021: 5). Some of them have GPUs available.
43 += {{id name="00-HowtologintoMaxwell-LongVersion:"/}}**Long Version: ** =
63 63  
64 -=== {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===
45 +\\
65 65  
66 -A very large number of nodes is available and you can allocate many in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority requests the node. This happens frequently. If you want to use this partition, be sure to design your job accordingly. CPU nodes only.
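One way to design for this is to let SLURM requeue a preempted job and to make the script skip work that is already finished. A minimal sketch (only the ~-~-requeue flag is standard SLURM; the helper script process_slice.sh and the result files are made-up names for illustration):

{{code}}
#!/bin/bash
#SBATCH --partition all
#SBATCH --requeue              # put the job back into the queue if it gets preempted
#SBATCH --time 0-02:00:00

# Idempotent work loop: skip slices whose output already exists, so a
# requeued run continues roughly where the preempted one stopped.
for i in $(seq 1 100); do
    [ -f "result_$i.txt" ] && continue
    ./process_slice.sh "$i" > "result_$i.tmp" && mv "result_$i.tmp" "result_$i.txt"
done
{{/code}}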
47 +**Login to max-nova**, e.g. from a browser: [[https:~~/~~/max-nova.desy.de:3443/>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]
67 67  
68 -=== {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===
49 +\\
69 69  
70 -Like all, but with GPUs
51 +Click on "**Launch Session**" and the "**XFCE**" icon
71 71  
72 -=== {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===
53 +[[image:attach:image2021-4-27_13-55-52.png||height="250"]]
73 73  
74 -For Jupyter Hub
75 -
76 76  \\
77 77  
78 -== {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==
57 +**Open a Terminal**, e.g. from the icon at the bottom of your desktop. You can also open it via right click → "Open Terminal here" directly on your desktop or from any folder.
79 79  
80 -Connect to an entrance node via FastX. When you start a session, a load balancer automatically assigns you to one of the nodes (max-display001-003, max-nova001-002).
59 +[[image:attach:image2021-4-27_13-58-35.png||height="250"]]
81 81  
82 -[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2" alt="image2021-4-27_13-55-52.png"]]
83 -
84 -Choose a graphical interface and look around.
85 -
86 86  \\
87 87  
88 -== {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==
63 +Now you can **allocate a node** for yourself, so you will have enough memory and power for your reconstruction.
89 89  
90 -The Maxwell cluster offers many storage systems. The most important ones are:
65 +(% class="code" %)
66 +(((
67 +salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
68 +)))
91 91  
92 -Your User Folder: This has a hard limit of 30 GB. Be sure not to exceed this.
70 +\\
93 93  
94 -The GPFS: here all the beamtime data are stored.
72 +You will get a node for 6 hours; you can also choose longer or shorter times.
95 95  
96 -=== {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===
74 +It can take some time before you get a node; once the allocation is granted, SLURM tells you which node is reserved for you (example: max-exfl069).
97 97  
98 -Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>
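As a concrete illustration (beamline, year and beamtime ID below are made up; substitute your own):

{{code}}
# hypothetical values for <beamline>, <year> and <beamtime_id>
cd /asap3/petra3/gpfs/p05/2021/data/11012345
ls    # shows the substructure described below
{{/code}}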
76 +\\
99 99  
100 -In there you will find a substructure:
78 +Now you can **log in via ssh** to this node:
101 101  
102 -* raw: raw measurement data. Only applicant and beamtime leader can write/delete there
103 -* processed: for all processed data
104 -* scratch_cc: scratch folder w/o backup
105 -* shared: for everything else
80 +(% class="code" %)
81 +(((
82 +ssh max-exfl069
83 +)))
106 106  
107 -The GPFS has regular snapshots. Its total capacity is huge (several PB).
85 +Enter your password.
108 108  
109 -== {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==
87 +\\
110 110  
111 -If you want to do some processing, there are two ways to start a job in SLURM:
89 +EXAMPLE:
112 112  
113 -1. Interactive
114 -1. Batch
91 +[[image:attach:image2021-4-27_13-52-11.png||height="125"]]
115 115  
116 -In both cases you are the only person working on the node, so use it as much as you like.
93 +Hint: Please use partition=psx. If you use partition=all, the connection might be closed while you are working if someone with higher priority needs the node you are working on.
117 117  
118 -=== {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===
95 +\\
119 119  
120 -To get a node, you have to allocate one via SLURM, e.g.:
97 +Now you are on a different node. :-)
121 121  
122 -{{code}}
123 -salloc -N 1 -p psx -t 1-05:00:00
124 -{{/code}}
99 +\\
125 125  
126 -Looking at the individual options:
101 +You first have to **load the anaconda module:**
127 127  
128 -* salloc: specifies you want a live allocation
129 -* -N 1: for one node
130 -* -p psx: on the psx partition. You can also list multiple partitions separated by a comma: -p psx,all
131 -* -t 1-05:00:00: for a duration of 1 day and 5 hours
132 -* Other options could be: ~-~-mem=500GB to request a node with at least 500 GB of memory
133 -* ... see the SLURM documentation for more options
103 +(% class="code" %)
104 +(((
105 +module load anaconda/3
106 +\\
107 +)))
134 134  
135 -If your job is scheduled, you will see your assigned node and can connect to it via ssh. (In the rare case where nothing is shown, use my-jobs to find out the host name.)
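A minimal example of how such an interactive session could look (the job ID and the node name max-p3a001 are made up):

{{code}}
salloc -N 1 -p psx -t 1-05:00:00
# salloc: Granted job allocation 1234567
# salloc: Nodes max-p3a001 are ready for job

my-jobs           # lists your jobs and the assigned host name
ssh max-p3a001    # connect to the allocated node
{{/code}}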
109 +and **activate your virtual environment**, depending on where you installed it:
136 136  
137 -=== {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a batch job ===
111 +(% class="code" %)
112 +(((
113 +source activate ~~/envs/tomopy
114 +)))
138 138  
139 -For a batch job you need a small shell script describing what you want to do. You do not see the job run directly; its output is written to a log file (and results can be stored on disk).
116 +\\
140 140  
141 -With a batch job, you can also start an array job, where the same task is executed on multiple servers in parallel.
118 +~~/ refers to your home directory. In this case, the environment "tomopy" was installed in the folder "envs" in the home directory.
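If you are not sure where your environment lives, you can list the environments conda knows about (assuming the anaconda module is already loaded):

(% class="code" %)
(((
conda env list
)))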
142 142  
143 -An example of such a script:
120 +\\
144 144  
145 -{{code}}
146 -#!/bin/bash
147 -#SBATCH --time 0-01:00:00
148 -#SBATCH --nodes 1
149 -#SBATCH --partition all,psx
150 -#SBATCH --array 1-80
151 -#SBATCH --mem 250GB
152 -#SBATCH --job-name ExampleScript
122 +Now you can **start spyder**:
153 153  
124 +(% class="code" %)
125 +(((
126 +spyder&
127 +)))
154 154  
155 -source /etc/profile.d/modules.sh
156 -echo "SLURM_JOB_ID $SLURM_JOB_ID"
157 -echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
158 -echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
159 -echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
160 -echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
161 -echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"
129 +\\
162 162  
163 -module load maxwell gcc/8.2
131 +EXAMPLE: (virtual environment in "envs/p36")
164 164  
165 -.local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID
133 +[[image:attach:image2021-4-27_13-53-35.png||height="71"]]
166 166  
167 -exit
168 -
169 -
170 -{{/code}}
171 -
172 172  \\
173 173  
174 -To run this, use:
137 +You can also start another terminal, e.g. if you want to look at your data / reconstructions in Fiji.
175 175  
176 -{{code}}
177 -sbatch ./your_script.sh
178 -{{/code}}
179 -
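The job then runs in the background. By default, SLURM writes the output of each array task to a file like slurm-<jobid>_<taskid>.out in the directory you submitted from (unless you set #SBATCH ~-~-output), so you can follow it with, for example (IDs made up):

{{code}}
tail -f slurm-1234567_1.out
{{/code}}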
180 180  \\
181 181  
182 -=== {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing your allocations ===
141 +Hint: You can check your partition via
183 183  
184 -To view your pending or running allocations you can use:
143 +(% class="code" %)
144 +(((
145 +my-partitions
146 +)))
185 185  
186 -{{code}}
187 -squeue -u <username>
148 +You should have access to psx.
188 188  
189 -or
150 +[[image:attach:image2021-5-4_10-28-14.png]]
190 190  
191 -my-jobs
192 -{{/code}}
193 -
194 194  \\
195 195  
196 -=== {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What is realistic in terms of resources ===
154 +For further information also check:
197 197  
198 -To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant to sudden cancellation, it is still worth trying when you can profit from massive parallelization.
156 +[[doc:IS.Maxwell.WebHome]]
199 199  
200 -If you want to do some small processing, use one of the psx nodes. This should work most of the time.
201 -
202 202  \\
203 203  
204 -== {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to other Beamtimes ==
205 -
206 -If you have to add other users to a past beamtime, this can be done via the gamma-portal. After adding the accounts, these users have to make sure to log out of **all** FastX sessions, etc., so that the permissions are updated.
207 -
208 208  \\
209 -
210 -\\
211 -
212 -\\
Confluence.Code.ConfluencePageClass[0]
Id
... ... @@ -1,1 +1,1 @@
1 -305838808
1 +235027533
Title
... ... @@ -1,1 +1,1 @@
1 -00 - How to login to Maxwell
1 +02 - How to login to Maxwell and start RecoGUI
URL
... ... @@ -1,1 +1,1 @@
1 -https://confluence.desy.de/spaces/P5I/pages/305838808/00 - How to login to Maxwell
1 +https://confluence.desy.de/spaces/P5I/pages/235027533/02 - How to login to Maxwell and start RecoGUI