Changes for page 00 - How to login to Maxwell
Last modified by flenners on 2025-06-24 16:56
Summary

- Page properties (3 modified, 0 added, 0 removed)
- Objects (1 modified, 0 added, 0 removed)
Details
- Page properties

- Title

... ... @@ -1,1 +1,1 @@
- 00 - How to login to Maxwell
+ 02 - How to login to Maxwell and start RecoGUI

- Author

... ... @@ -1,1 +1,1 @@
- XWiki.flenners
+ XWiki.greving

- Content
... ... @@ -1,223 +1,160 @@
- DESY has a quite powerful compute cluster called the Maxwell cluster. The documentation can be found at [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.MaxwellCluster.WebHome||shape="rect"]]; however, as this can be confusing at times, we will try to condense it into a step-by-step manual.
+ = {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}**Short Version:** =
+
+ Terminal:
+
+ (% class="code" %)
+ (((
+ salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
+ \\(if you need a GPU: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}(%%) )
+ \\ssh max-bla123
+ \\module load anaconda
+ \\source activate ~~/envs/tomopy
+ \\spyder&
+ )))
- {{toc/}}
+ \\
- = {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =
-
- During your beamtime you will encounter multiple systems, for which you will need two different types of accounts:
-
- == {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==
-
- Before you arrive, you have to create a DOOR account and complete all the safety trainings. This account is also used for the gamma-portal, where you can manage your beamtime data, grant access to other users, and manage FTP access. However, this account does not work with the other resources; for those you will have to request a second account:
-
- == {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==
-
- If you decide during a beamtime that you want access to the cluster, tell your local contact and they will request a PSX account for you. With this you will get access to the Kerberos, Windows, and AFS resources at DESY, which include the cluster.
-
- After you get the account, you have to change the initial password within 6 days.
- For this, go to [[https:~~/~~/passwd.desy.de/>>url:https://passwd.desy.de/||shape="rect"]] and log in with your user name and initial password (you do not need any OTP when you sign in for the first time). Then agree to the terms and change your password.
+ {{code linenumbers="true" collapse="true"}}
+ salloc --partition=psx --nodes=1 --time=06:00:00
- = {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =
+ ssh max-bla123
- == {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==
+ module load anaconda
- === {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===
+ source activate ~/envs/tomopy
- The Maxwell Cluster has (status 2021) more than 750 nodes. To organize this, you cannot access any node directly; you first have to request compute resources. You can then connect from an entrance node to your compute node.
-
- === {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===
-
- If you have successfully obtained a PSX account, you can get started. The entrance nodes are:
- \\[[https:~~/~~/max-nova.desy.de:3443/auth/ssh>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]] (if you have access to the nova resources, most likely the case if your beamtime was in cooperation with the Helmholtz-Zentrum Hereon)
-
- [[https:~~/~~/max-display.desy.de:3443/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]] (in any case)
-
- These nodes are **not** for processing, as you will share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is OK.
-
- === {{id name="00-HowtologintoMaxwell-FastX2"/}}FastX2 ===
-
- The cluster uses the software FastX2 for connections and virtual desktops.
- To get the right version of it, use the web interface: log in, and in the bottom right corner there is a download link for the desktop client. The version has to match exactly to work properly.
-
- If you want to add a connection in the desktop client, click the plus, select "web", use the address above (including the port) and your username, and force ssh authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.
-
- === {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===
-
- Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priority etc., the nodes are organized in partitions. You can only access some of these. To view which ones, open a terminal and use the command:
-
- {{code}}
- my-partitions
+ spyder&
{{/code}}
- Your result will look something like this:
+ \\
- [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2" alt="image2021-5-4_10-28-14.png"]]
+ Spyder:
- == {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==
+ Open RecoGUI,
- Access to the resources of the cluster is managed via a scheduler, SLURM.
+ (Right click on tab: "Set console working directory") (to be removed)
- SLURM schedules the access to nodes and can revoke access if higher-priority jobs come.
+ Green Arrow to start the program
- === {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===
+ \\
- Here you cannot be kicked out of your allocation.
- However, only a few nodes are in this partition and you can also only allocate a few in parallel (2021: 5). Some of them have GPUs available.
+ = {{id name="00-HowtologintoMaxwell-LongVersion:"/}}**Long Version:** =
- === {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===
+ \\
- A very large number of nodes is available and you can allocate many in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority comes along; this happens very often. If you want to use this partition, be sure to design your job accordingly. Only CPU nodes.
+ **Login to max-nova**: e.g. from a browser: [[https:~~/~~/max-nova.desy.de:3443/>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]
- === {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===
+ \\
- Like all, but with GPUs.
+ Click on "**Launch Session**" and the "**XFCE**" icon:
- === {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===
+ [[image:attach:image2021-4-27_13-55-52.png||height="250"]]
- For Jupyter Hub
-
- == {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==
+ **Open a Terminal**, e.g. from the icon at the bottom of your desktop. You can also open it via right click → "Open Terminal here" directly on your desktop or from any folder.
- Connect to an entrance node via FastX. You will automatically be assigned to a node by a load balancer when you start a session (max-display001-003, max-nova001-002).
+ [[image:attach:image2021-4-27_13-58-35.png||height="250"]]
- [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2" alt="image2021-4-27_13-55-52.png"]]
-
- Choose a graphic interface and look around.
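The partition trade-off described in the removed section above (psx cannot be revoked but has few nodes; all is large but preemptible) can be sketched as a small shell helper. This is an illustrative sketch and not part of the original page: the `can_resume` flag and the echoed command are hypothetical; only the partition names come from the page.

```shell
# Hypothetical helper: choose SLURM partitions depending on whether
# the job tolerates revocation ('all' can be preempted, 'psx' cannot).
can_resume=false   # set to true if your job checkpoints and can restart
if [ "$can_resume" = true ]; then
  PARTITIONS="all,psx"   # large pool first, safe pool as fallback
else
  PARTITIONS="psx"       # never preempted, but only a few nodes
fi
echo "salloc --partition=${PARTITIONS} --nodes=1"
```

With `can_resume=false` this prints `salloc --partition=psx --nodes=1`, matching the page's advice to prefer psx for interactive work.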
- == {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==
+ Now you can **allocate a node** for yourself, so you will have enough memory and power for your reconstruction.
- The Maxwell cluster has many storage systems. The most important are:
+ (% class="code" %)
+ (((
+ salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
+ )))
- Your user folder: this has a hard limit of 30 GB. Be sure not to exceed it.
+ \\
- The GPFS: here all the beamtime data are stored.
+ You will get a node for 6 hours; you can also choose longer or shorter times.
- === {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===
+ It can take some time before you get a node; it will then tell you which node is reserved for you. (Example: max-exfl069)
- Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>
+ \\
- In there you will find a substructure:
+ Now you can **log in via ssh** to this node:
- * raw: raw measurement data. Only the applicant and the beamtime leader can write/delete there
- * processed: for all processed data
- * scratch_cc: scratch folder w/o backup
- * shared: for everything else
+ (% class="code" %)
+ (((
+ ssh max-exfl069
+ )))
- The GPFS has regular snapshots. Its total capacity is huge (several PB).
+ Enter your password.
- == {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==
+ \\
- If you want to do some processing, there are two ways to start a job in SLURM:
+ EXAMPLE:
- 1. Interactive
- 1. Batch
+ [[image:attach:image2021-4-27_13-52-11.png||height="125"]]
- In both cases you are the only person working on the node, so use it as much as you like.
+ Hint: Please use partition=psx; if you use =all, the connection might close while you are working if someone with higher priority needs the node you are working on.
- === {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===
+ \\
- To get a node you have to allocate one via SLURM, e.g. use:
+ Now you are on a different node [[image:http://confluence.desy.de/s/de_DE/7901/4635873c8e185dc5df37b4e2487dfbef570b5e2c/_/images/icons/emoticons/smile.svg||title="(Lächeln)" border="0" class="emoticon emoticon-smile"]].
- {{code}}
- salloc -N 1 -p psx -t 1-05:00:00
- {{/code}}
+ \\
- Looking at the individual options:
+ You first have to **load the anaconda module:**
- * salloc: specifies that you want a live allocation
- * -N 1: for one node
- * -p psx: on the psx partition. You can also add multiple partitions separated with a comma: -p psx,all
- * -t 1-05:00:00: for a duration of 1 day and 5 h
- * (((
- Other options could be: ~-~-mem=500GB for at least 500 GB of memory,
(% class="code" %)
(((
- if you need a GPU: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}
+ module load anaconda/3
+ \\
)))
- )))
- * ... see the SLURM documentation for more options
- If your job is scheduled, you see your assigned node and can connect to it via ssh. (In the rare case where you do not see anything, use my-jobs to find out the host name.)
+ and **activate your virtual environment**, depending on where you installed it:
- === {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a batch job ===
+ (% class="code" %)
+ (((
+ source activate ~~/envs/tomopy
+ )))
- For a batch job you need a small shell script describing what you want to do. You do not see the job directly, but the output is written to a log file (and results can be stored on disk).
+ \\
- With a batch job, you can also start an array job, where the same task is executed on multiple servers in parallel.
+ ~~/ takes you back to your home directory.
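The interactive-allocation options discussed above can be combined as in the sketch below. This is only an illustration: the flag values are the page's examples, and the command is assembled as a string rather than executed, since `salloc` exists only on a SLURM system.

```shell
# Assemble the interactive allocation command from the options
# explained above (values are the page's examples, not defaults).
NODES=1
PARTITIONS="psx,all"      # multiple partitions, comma-separated
WALLTIME="1-05:00:00"     # 1 day and 5 hours
SALLOC_CMD="salloc -N ${NODES} -p ${PARTITIONS} -t ${WALLTIME}"
echo "${SALLOC_CMD}"
# On Maxwell you would then ssh to the node SLURM assigns you,
# e.g.: ssh max-exfl069
```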
+ In this case, the environment "tomopy" was installed in the home directory, in the folder "envs".
- An example for such a script:
+ \\
+ Now you can **start spyder**:
+ (% class="code" %)
+ (((
+ spyder&
+ )))
- {{code}}
- #!/bin/bash
- #SBATCH --time 0-01:00:00
- #SBATCH --nodes 1
- #SBATCH --partition all,psx
- #SBATCH --array 1-80
- #SBATCH --mem 250GB
- #SBATCH --job-name ExampleScript
-
- source /etc/profile.d/modules.sh
- echo "SLURM_JOB_ID $SLURM_JOB_ID"
- echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
- echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
- echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
- echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
- echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"
-
- module load maxwell gcc/8.2
-
- .local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID
-
- exit
- {{/code}}
+ \\
+ EXAMPLE: (virtual environment in "envs/p36")
+ [[image:attach:image2021-4-27_13-53-35.png||height="71"]]
- To run this, use:
+ You can also start another terminal, e.g. if you want to look at your data / reconstructions in Fiji.
- {{code}}
- sbatch ./your_script.sh
- {{/code}}
- === {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing your allocations ===
+ Hint: You can check your partitions via
- To view your pending or running allocations you can use:
+ (% class="code" %)
+ (((
+ my-partitions
+ )))
- {{code}}
- squeue -u <username>
+ You should have access to psx.
- or
+ [[image:attach:image2021-5-4_10-28-14.png]]
- my-jobs
- {{/code}}
- === {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What is realistic in terms of resources ===
+ For further information also check:
- To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant of sudden cancellation, it is still worth trying whether you profit from massive parallelization.
+ [[doc:IS.Maxwell.WebHome]]
- If you want to do some small processing, use one of the psx nodes. This should work most of the time.
- == {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to other Beamtimes ==
- If you have to add other users to a past beamtime, this can be done via the gamma-portal. After adding the accounts, these people have to make sure to log off from **all** FastX sessions, etc. to update the permissions.
- Confluence.Code.ConfluencePageClass[0]
- Id

... ... @@ -1,1 +1,1 @@
- 361457332
+ 235027533

- Title

... ... @@ -1,1 +1,1 @@
- 00 - How to login to Maxwell
+ 02 - How to login to Maxwell and start RecoGUI

- URL

... ... @@ -1,1 +1,1 @@
- https://confluence.desy.de/spaces/P5I/pages/361457332/00 - How to login to Maxwell
+ https://confluence.desy.de/spaces/P5I/pages/235027533/02 - How to login to Maxwell and start RecoGUI
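The array-job batch script shown in the content diff above can be condensed to the pattern below. A sketch under assumptions: the SBATCH values are the page's examples, the SLURM array variables are defaulted so the script can be dry-run outside SLURM, and the final ipython call is kept as a comment because the script path in the page is a placeholder.

```shell
#!/bin/bash
# Sketch of the array-job pattern from the page (values are examples):
#SBATCH --time 0-01:00:00
#SBATCH --nodes 1
#SBATCH --partition all,psx
#SBATCH --array 1-80
#SBATCH --job-name ExampleSketch

# Outside SLURM these variables are unset; default them so the
# script can be dry-run locally.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
TASK_COUNT="${SLURM_ARRAY_TASK_COUNT:-80}"
echo "array task ${TASK_ID} of ${TASK_COUNT}"
# Each array task would then process its own chunk, e.g.:
# ipython3 --pylab=qt5 PathToYourScript/Script.py "$TASK_ID"
```

Submitted with `sbatch ./your_script.sh`, SLURM would start 80 copies of this script, each seeing its own SLURM_ARRAY_TASK_ID.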