Changes for page 00 - How to login to Maxwell
Last modified by flenners on 2025-06-24 16:56
Summary
-
Page properties (2 modified, 0 added, 0 removed)
-
Objects (1 modified, 0 added, 0 removed)
Details
- Page properties
-
- Title
-
... ... @@ -1,1 +1,1 @@
1 - 00 - How to login to Maxwell
1 + How to start the RecoGUI
- Content
-
... ... @@ -1,217 +1,126 @@
1 - DESY has a quite powerful compute cluster, called the Maxwell cluster. The documentation can be found here [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.MaxwellCluster.WebHome||shape="rect"]]; however, as this can be confusing sometimes, we will try to condense it into a step-by-step manual.
1 + = {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}**Short Version:** =
2 2
3 + Terminal:
3 3
5 + (% class="code" %)
6 + (((
7 + salloc ~-~-partition=all ~-~-nodes=1 ~-~-time=06:00:00
8 + \\ssh max-bla123
9 + \\module load anaconda
10 + \\source activate ~~/envs/tomopy
11 + \\spyder&
12 + )))
4 4
5 - {{toc/}}
14 + \\
6 6
7 - = {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =
16 + Spyder:
8 8
9 - During your beamtime you will encounter multiple systems, for which you will need two different types of accounts:
18 + Open RecoGUI.
10 10
11 - == {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==
20 + (Right click on tab: "Set console working directory") (to be removed)
12 12
13 - Before you arrive you have to create a DOOR account and complete all the safety trainings. This account is also used for the gamma-portal, where you can manage your beamtime data, grant access to other users and manage FTP access. However, this account does not work with other resources. For those you will have to request a second account:
22 + Green Arrow to start the program
14 14
15 - == {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==
24 + \\
16 16
17 - If you decide during a beamtime that you want access to the cluster, tell your local contact and they will request a PSX account for you. With this you will get access to the Kerberos, Windows and AFS resources at DESY, which includes the cluster.
26 + = {{id name="00-HowtologintoMaxwell-LongVersion:"/}}**Long Version:** =
18 18
19 - = {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =
28 + \\
20 20
21 - == {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==
30 + **Login to max-nova**: e.g. from a browser: [[https:~~/~~/max-nova.desy.de:3443/>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]
22 22
23 - === {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===
32 + \\
24 24
25 - The Maxwell Cluster has (as of 2021) more than 750 nodes. To organize this, you cannot access any node directly, but have to request compute resources first. You can then connect from an entrance node to your compute node.
34 + \\
26 26
27 - === {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===
36 + Click on "**Launch Session**" and the "**XFCE**" icon
28 28
29 - If you have successfully obtained a PSX account you can get started. The entrance nodes are:
30 - \\[[https:~~/~~/max-nova.desy.de:3443/auth/ssh>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]] (if you have access to the nova resources, most likely the case if your beamtime was in cooperation with the Helmholtz-Zentrum Hereon)
38 + \\
31 31
32 - [[https:~~/~~/max-display.desy.de:3443/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]] (in any case)
40 + **Open a Terminal**, e.g. from the icon at the bottom of your desktop. You can also open it via right click → "Open Terminal here" directly on your desktop or from any folder.
33 33
34 - These nodes are **not** for processing, as you will share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is OK.
42 + [[image:attach:image2021-4-27_13-58-35.png||height="250"]]
35 35
36 - === {{id name="00-HowtologintoMaxwell-FastX2"/}}Fast X2 ===
44 + \\
37 37
38 - The cluster uses the software FastX2 for connections and virtual desktops. To get the right version of it, use the web interface, log in, and in the bottom right corner there is a download link for the desktop client. The version has to match exactly to work properly.
46 + Now you can **allocate a node** for yourself, so you will have enough memory and power for your reconstruction.
39 39
40 - If you want to add a connection in the desktop client, click the plus, select web, use the address above (including the port) and your username, and force ssh authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.
48 + (% class="code" %)
49 + (((
50 + salloc ~-~-partition=all ~-~-nodes=1 ~-~-time=06:00:00
51 + )))
41 41
42 - === {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===
53 + \\
43 43
44 - Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priority etc., the nodes are organized in partitions. You can only access some of these. To view which ones, open a terminal and use the command:
55 + You will get a node for 6 hours; you can also choose longer or shorter times.
45 45
46 - {{code}}
47 - my-partitions
48 - {{/code}}
57 + It can take some time before you get a node; then it will tell you which node is reserved for you. (Example: max-exfl069)
49 49
50 - Your result will look something like this:
59 + \\
51 51
52 - [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2" alt="image2021-5-4_10-28-14.png"]]
61 + Now you can **login via ssh** on this node:
53 53
54 - == {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==
63 + (% class="code" %)
64 + (((
65 + ssh max-exfl069
66 + )))
55 55
56 - Access to the resources of the cluster is managed via a scheduler, SLURM.
68 + Enter your password.
57 57
58 - SLURM schedules the access to nodes and can revoke access if higher-priority jobs come.
70 + \\
59 59
60 - === {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===
72 + EXAMPLE:
61 61
62 - Here you cannot be kicked out of your allocation. However, only a few nodes are in this partition and you can also only allocate a few in parallel (2021: 5). Some of them have GPUs available.
74 + [[image:attach:image2021-4-27_13-52-11.png||height="125"]]
63 63
64 - === {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===
76 + \\
65 65
66 - A very large number of nodes is available and you can allocate many in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority comes. This happens very often. If you want to use this partition, be sure to design your job accordingly. Only CPU nodes.
78 + \\
67 67
68 - === {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===
80 + Now you are on a different node [[image:http://confluence.desy.de/s/de_DE/7901/4635873c8e185dc5df37b4e2487dfbef570b5e2c/_/images/icons/emoticons/smile.svg||title="(Smile)" border="0" class="emoticon emoticon-smile"]].
69 69
70 - Like all, but with GPUs.
71 -
72 - === {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===
73 -
74 - For Jupyter Hub.
75 -
76 76 \\
77 77
78 - == {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==
84 + You first have to **load the anaconda module:**
79 79
80 - Connect to an entrance node via FastX. You will automatically be assigned to a node by a load balancer when you start a session (max-display001-003, max-nova001-002).
81 -
82 - [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2" alt="image2021-4-27_13-55-52.png"]]
83 -
84 - Choose a graphic interface and look around.
85 -
86 + (% class="code" %)
87 + (((
88 + module load anaconda/3
86 86 \\
90 + )))
87 87
88 - == {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==
92 + and **activate your virtual environment**, depending on where you installed it:
89 89
90 - The Maxwell cluster knows many storage systems. The most important are:
91 -
92 - Your user folder: this has a hard limit of 30 GB. Be sure not to exceed it.
93 -
94 - The GPFS: here all the beamtime data are stored.
95 -
96 - === {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===
97 -
98 - Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>
99 -
100 - In there you will find a substructure:
101 -
102 - * raw: raw measurement data. Only the applicant and beamtime leader can write/delete there
103 - * processed: for all processed data
104 - * scratch_cc: scratch folder w/o backup
105 - * shared: for everything else
106 -
107 - The GPFS has regular snapshots. Its whole capacity is huge (several PB).
108 -
109 - == {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==
110 -
111 - If you want to do some processing, there are two ways to start a job in SLURM:
112 -
113 - 1. Interactive
114 - 1. Batch
115 -
116 - In both cases you are the only person working on the node, so use it as much as you like.
117 -
118 - === {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===
119 -
120 - To get a node you have to allocate one via SLURM, e.g. use:
121 -
122 - {{code}}
123 - salloc -N 1 -p psx -t 1-05:00:00
124 - {{/code}}
125 -
126 - Looking at the individual options:
127 -
128 - * salloc: specifies you want a live allocation
129 - * -N 1: for one node
130 - * -p psx: on the psx partition. You can also add multiple separated with a comma: -p psx,all
131 - * -t 1-05:00:00: for a duration of 1 day and 5 h
132 - * (((
133 - Other options could be: ~-~-mem=500GB for at least 500 GB of memory,
134 -
135 135 (% class="code" %)
136 136 (((
137 - if you need a GPU: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}
96 + source activate ~~/envs/tomopy
138 138 )))
139 - )))
140 - * ... see the SLURM documentation for more options
141 141
142 - If your job is scheduled you see your assigned node and can connect to it via ssh. (In the rare case where you do not see anything, use my-jobs to find out the host name.)
99 + \\
143 143
144 - === {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a batch job ===
101 + ~~/ takes you back to your home directory. In this case, the environment "tomopy" was installed in the home directory in the folder "envs".
145 145
146 - For a batch job you need a small shell script describing what you want to do. You do not see the job directly, but the output is written to a log file (and results can be stored on disk).
147 -
148 - With a batch job, you can also start an array job, where the same task is executed on multiple servers in parallel.
149 -
150 - An example of such a script:
151 -
152 - {{code}}
153 - #!/bin/bash
154 - #SBATCH --time 0-01:00:00
155 - #SBATCH --nodes 1
156 - #SBATCH --partition all,psx
157 - #SBATCH --array 1-80
158 - #SBATCH --mem 250GB
159 - #SBATCH --job-name ExampleScript
160 -
161 -
162 - source /etc/profile.d/modules.sh
163 - echo "SLURM_JOB_ID $SLURM_JOB_ID"
164 - echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
165 - echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
166 - echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
167 - echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
168 - echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"
169 -
170 - module load maxwell gcc/8.2
171 -
172 - .local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID
173 -
174 - exit
175 -
176 -
177 - {{/code}}
178 -
179 179 \\
180 180
181 - To run this, use
105 + Now you can **start spyder**:
182 182
183 - {{code}}
184 - sbatch ./your_script.sh
185 - {{/code}}
107 + (% class="code" %)
108 + (((
109 + spyder&
110 + )))
186 186
187 187 \\
188 188
189 - === {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing your allocations ===
114 + EXAMPLE: (virtual environment in "envs/p36")
190 190
191 - To view your pending or running allocations you can use:
116 + [[image:attach:image2021-4-27_13-53-35.png||height="71"]]
192 192
193 - {{code}}
194 - squeue -u <username>
195 -
196 - or
197 -
198 - my-jobs
199 - {{/code}}
200 -
201 201 \\
202 202
203 - === {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What is realistic in terms of resources ===
120 + You can also start another terminal, e.g. if you want to look at your data / reconstructions in Fiji.
204 204
205 - To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant of sudden cancellation, it is still worth trying whether you profit from massive parallelization.
206 -
207 - If you want to do some small processing, use one of the psx nodes. This should work most of the time.
208 -
209 209 \\
210 210
211 - == {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to other Beamtimes ==
212 -
213 - If you have to add other users to a past beamtime, this can be done via the gamma-portal. After adding the accounts, these people have to make sure to log off from **all** FastX sessions, etc. to update the permissions.
214 -
215 215 \\
216 216
217 217 \\
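For quick reference, the escaped XWiki markup in the "Short Version" above corresponds to the plain shell commands sketched below. This is a dry-run sketch that only prints the sequence (salloc/ssh/module only work on the Maxwell cluster itself); the node name max-bla123 and the environment path ~/envs/tomopy are the page's own placeholders and will differ for you.

```shell
#!/bin/sh
# Dry-run sketch of the "Short Version" login sequence from this page.
# Prints each command instead of running it; max-bla123 and ~/envs/tomopy
# are placeholders taken from the page, not real values.
reco_steps() {
    printf '%s\n' \
        'salloc --partition=all --nodes=1 --time=06:00:00' \
        'ssh max-bla123' \
        'module load anaconda' \
        'source activate ~/envs/tomopy' \
        'spyder &'
}

reco_steps
```

In the wiki markup, `~-~-` is XWiki escaping for a literal `--`, so e.g. `~-~-partition=all` is typed as `--partition=all` in the terminal.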
- Confluence.Code.ConfluencePageClass[0]
-
- Id
-
... ... @@ -1,1 +1,1 @@
1 - 361457307
1 + 204941724
- Title
-
... ... @@ -1,1 +1,1 @@
1 - 00 - How to login to Maxwell
1 + How to start the RecoGUI
- URL
-
... ... @@ -1,1 +1,1 @@
1 - https://confluence.desy.de/spaces/P5I/pages/361457307/00 - How to login to Maxwell
1 + https://confluence.desy.de/spaces/P5I/pages/204941724/How to start the RecoGUI