Changes for page 00 - How to login to Maxwell
Last modified by flenners on 2025-06-24 16:56
Summary
Page properties (2 modified, 0 added, 0 removed)
Objects (1 modified, 0 added, 0 removed)
Details
- Page properties
- Author
... ... @@ -1,1 +1,1 @@
-XWiki.riedelmi1
+XWiki.greving
- Content
... ... @@ -1,268 +1,103 @@
-DESY has a quite powerful compute cluster called the Maxwell cluster. The documentation can be found here [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.MaxwellCluster.WebHome||shape="rect"]], however, as this can be confusing at times, we will try to condense it into a step-by-step manual.
+= {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}**Short Version: ** =

+Terminal:

+(% class="code" %)
+(((
+salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
+\\(if you need a GPU: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}(%%) )
+\\ssh max-bla123
+\\module load anaconda
+\\source activate ~~/envs/tomopy
+\\spyder&
+)))

-{{toc/}}
+\\

-= {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =
+\\

-During your beamtime you will encounter multiple systems, for which you will need two different types of accounts:
+{{code linenumbers="true" collapse="true"}}
+salloc --partition=psx --nodes=1 --time=06:00:00

-== {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==
+ssh max-bla123

-Before you arrive you have to create a DOOR account (Institution: Physik Department E17) and complete all the safety trainings. This account is also used for the Gamma portal, where you can manage your beamtime data, grant access to other users and manage FTP access. However, this account does not work with other resources; for those you will have to request a second account:
+module load anaconda

-== {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==
+source activate ~/envs/tomopy

-If you decide during a beamtime that you want access to the cluster, tell your local contact, and they will request a PSX account for you. With this you will get access to the Kerberos, Windows and AFS resources at DESY, which includes the cluster.

-= {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =

-== {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==

-=== {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===

-The Maxwell cluster has (status 2021) more than 750 nodes. To keep this organized, you cannot access any node directly; you first have to request compute resources. You can then connect from an entrance node to your compute node.

-=== {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===

-If you have successfully obtained a PSX account you can get started. The entrance nodes are:
-\\[[https:~~/~~/max-nova.desy.de:3443/auth/ssh>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]] (if you have access to the nova resources, most likely the case if your beamtime was in cooperation with the Helmholtz-Zentrum Hereon)

-[[https:~~/~~/max-display.desy.de:3443/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]] (in any case)

-These nodes are **not **for processing, as you will share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is OK.
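If you prefer a plain terminal over the web interface, the entrance nodes can usually also be reached with a standard SSH client. A minimal sketch, assuming your PSX account is already active; the user name below is a placeholder:

{{code}}
# Plain SSH login to an entrance node; replace "yourpsxaccount"
# with your own PSX account name. The -X flag enables X11
# forwarding, in case you want to open graphical tools later.
ssh -X yourpsxaccount@max-display.desy.de
{{/code}}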
-=== {{id name="00-HowtologintoMaxwell-FastX2"/}}Fast X2 ===

-The cluster uses the software FastX2 for connections and virtual desktops. To get the right version, use the web interface, log in, and find the download link for the desktop client in the bottom right corner.
-The version has to match exactly to work properly.

-If you want to add a connection in the desktop client, click the plus, select web, use the address above (including the port) and your username, and force SSH authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.

-=== {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===

-Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priorities etc., the nodes are organized into partitions. You can only access some of these. To view which ones, open a terminal and use the command:

-{{code}}
-my-partitions
+spyder&
{{/code}}

-Your result will look something like this:

-[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2" alt="image2021-5-4_10-28-14.png"]]
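To judge how busy an accessible partition currently is, the standard SLURM tools can be used from the same terminal. A small sketch, assuming a default SLURM setup; psx is only an example, substitute a partition from your my-partitions output:

{{code}}
# Node states (idle/mixed/allocated) of a single partition.
sinfo --partition=psx

# One-line summary per partition, over everything visible to you.
sinfo --summarize
{{/code}}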
-== {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==

-Access to the resources of the cluster is managed via a scheduler, SLURM.

-SLURM schedules the access to nodes and can revoke access if higher-priority jobs come in.

-=== {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===

-Here you cannot be kicked out of your allocation. However, only a few nodes are in this partition and you can also only allocate a few in parallel (2021: 5). Some of them have GPUs available.

-=== {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===

-A very large number of nodes is available and you can allocate many in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority comes along. This happens very often. If you want to use this partition, be sure to design your job accordingly. Only CPU nodes.

-=== {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===

-Like all, but with GPUs.

-=== {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===

-For Jupyter Hub.

-See also the section on Python.

-=== {{id name="00-HowtologintoMaxwell-HzgPartition"/}}Hzg Partition ===

-Ask someone from Hereon how to use them (e.g. [[Riedel, Mirko>>url:https://wiki.tum.de/display/~~ga78nig||shape="rect"]])

\\

-== {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==
+Spyder:

-Connect to an entrance node via FastX. You will automatically be assigned to a node by a load balancer when you start a session (max-display001-003, max-nova001-002).
+Open RecoGUI,

-[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2" alt="image2021-4-27_13-55-52.png"]]
+(Right click on tab: "Set console working directory") (to be removed)

-Choose a graphical interface and look around.
+Green Arrow to start the program

\\

-== {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==
+= {{id name="00-HowtologintoMaxwell-LongVersion:"/}}**Long Version: ** =

-The Maxwell cluster has many storage systems. The most important are:

-Your user folder: this has a hard limit of 30 GB. Be sure not to exceed it.

-The GPFS: here all the beamtime data are stored.

-=== {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===

-Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>

-In there you will find a substructure:

-* raw: raw measurement data. Only the applicant and the beamtime leader can write/delete there
-* processed: for all processed data
-* scratch_cc: scratch folder w/o backup
-* shared: for everything else

-The GPFS has regular snapshots. The whole capacity is huge (several PB).
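To make the path template above concrete, an illustrative example; the beamline p05 and the beamtime ID 11012345 are made-up placeholders, yours will differ:

{{code}}
# List the standard substructure of a beamtime folder
# (beamline and beamtime ID below are placeholders).
ls /asap3/petra3/gpfs/p05/2021/data/11012345
# expected subfolders: raw  processed  scratch_cc  shared
{{/code}}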
-== {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==

-If you want to do some processing, there are two ways to start a job in SLURM:

-1. Interactive
-1. Batch

-In both cases you are the only person working on the node, so use it as much as you like.

-=== {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===

-To get a node you have to allocate one via SLURM, e.g. use:

-{{code}}
-salloc -N 1 -p psx -t 1-05:00:00
-{{/code}}

-Looking at the individual options:

-* salloc: specifies that you want a live allocation
-* -N 1: for one node
-* -p psx: on the psx partition. You can also add multiple partitions separated by a comma: -p psx,all
-* -t 1-05:00:00: for a duration of 1 day and 5 h
-* Other options could be: ~-~-mem=500GB for at least 500 GB of memory
-* ... see the SLURM documentation for more options

-If your job is scheduled you will see your assigned node and can connect to it via ssh. (In the rare case where you do not see anything, use my-jobs to find out the host name.)

-=== {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a batch job ===

-For a batch job you need a small shell script describing what you want to do. You do not see the job directly; the output is written to a log file (and results can be stored on disk).

-With a batch job you can also start an array job, where the same task is executed on multiple servers in parallel.

-An example of such a script:

-{{code}}
-#!/bin/bash
-#SBATCH --time 0-01:00:00
-#SBATCH --nodes 1
-#SBATCH --partition all,ps
-#SBATCH --array 1-80
-#SBATCH --mem 250GB
-#SBATCH --job-name ExampleScript

-source /etc/profile.d/modules.sh
-echo "SLURM_JOB_ID $SLURM_JOB_ID"
-echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
-echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
-echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
-echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
-echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"

-module load maxwell gcc/8.2

-.local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID

-exit
-{{/code}}

\\

-To run this, use:
+**Login to max-nova**: E.g. from browser [[https:~~/~~/max-nova.desy.de:3443/>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]

-{{code}}
-sbatch ./your_script.sh
-{{/code}}

\\

-=== {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing your allocations ===
+Click on "**Launch Session**" and the "**XFCE**" icon

-To view your pending or running allocations you can use:
+[[image:attach:image2021-4-27_13-55-52.png||height="250"]]

-{{code}}
-squeue -u <username>

-or

-my-jobs
-{{/code}}

\\
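If you are done early, you can also free an allocation yourself instead of waiting for the time limit. scancel is a standard SLURM command, so the following sketch should apply here as well; the job ID is a placeholder taken from the squeue output:

{{code}}
# Cancel one specific allocation by job ID (placeholder value).
scancel 1234567

# Cancel all of your own pending and running jobs at once.
scancel --user=$USER
{{/code}}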
-=== {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What is realistic in terms of resources ===
+**Open a Terminal**, e.g. from the icon at the bottom of your desktop. You can also open it via right click → "Open Terminal here" directly on your desktop or from any folder.

-To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant of sudden cancellation, it is still worth trying whether you profit from massive parallelization.
+[[image:attach:image2021-4-27_13-58-35.png||height="250"]]

-If you want to do some small processing, use one of the psx nodes. This should work most of the time.

\\

-== {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to other Beamtimes ==
+Now you can **allocate a node** for yourself, so you will have enough memory and power for your reconstruction.

-If you have to add other users to a past beamtime, this can be done via the Gamma portal. After adding the accounts, these people have to make sure to log off from **all **FastX sessions, etc., to update the permissions.

-= {{id name="00-HowtologintoMaxwell-StartingarecoGUI"/}}Starting a reco GUI =

-== {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}Short Version: ==

-Terminal:

(% class="code" %)
(((
-salloc ~-~-partition=psx ~-~-nodes=1 –-time=06:00:00
-\\(if you need gpu: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}(%%) )
-\\ssh max-bla123
-\\module load anaconda
-\\source activate ~~/envs/tomopy
-\\spyder&
+salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
)))

\\

-\\
+You will get a node for 6 hours; you can also choose longer or shorter times.

-{{code linenumbers="true" collapse="true"}}
-salloc --partition=psx --nodes=1 –-time=06:00:00
+It can take some time before you get a node; then it will tell you which node is reserved for you. (Example: max-exfl069)

-ssh max-bla123
+\\

-module load anaconda
+Now you can **login via ssh** on this node:

-source activate ~/envs/tomopy
+(% class="code" %)
+(((
+ssh max-exfl069
+)))

-spyder&
-{{/code}}
+Enter your password.

\\

-Spyder:
+EXAMPLE:

-Open RecoGUI,
+[[image:attach:image2021-4-27_13-52-11.png||height="125"]]

-(Right click on tab: "Set console working directory") (to be removed)
+Hint: Please use partition=psx; if you use =all, the connection might close while you are working if someone with higher priority needs the node you are working on.

-Green Arrow to start the program

\\

-== {{id name="00-HowtologintoMaxwell-LongerVersion"/}}Longer Version ==
+Now you are on a different node [[image:http://confluence.desy.de/s/de_DE/7901/4635873c8e185dc5df37b4e2487dfbef570b5e2c/_/images/icons/emoticons/smile.svg||title="(Smile)" border="0" class="emoticon emoticon-smile"]].

\\

-Login to Maxwell and allocate a node from the psx partition.

-\\

You first have to **load the anaconda module:**

(% class="code" %)
... ... @@ -303,4 +303,23 @@

\\

+Hint: You can check your partitions via
+
+(% class="code" %)
+(((
+my-partitions
+)))
+
+You should have access to psx.
+
+[[image:attach:image2021-5-4_10-28-14.png]]

\\

+For further information also check:
+
+[[doc:IS.Maxwell.WebHome]]

+\\

+\\
- Confluence.Code.ConfluencePageClass[0]
- Id
... ... @@ -1,1 +1,1 @@
-235027534
+235027533
- URL
... ... @@ -1,1 +1,1 @@
-https://confluence.desy.de/spaces/P5I/pages/235027534/02 - How to login to Maxwell and start RecoGUI
+https://confluence.desy.de/spaces/P5I/pages/235027533/02 - How to login to Maxwell and start RecoGUI