Changes for page 00 - How to login to Maxwell
Last modified by flenners on 2025-06-24 16:56
Summary
- Page properties (2 modified, 0 added, 0 removed)
- Attachments (0 modified, 0 added, 8 removed)
- Objects (1 modified, 0 added, 0 removed)
Details
- Page properties
- Title
... ... @@ -1,1 +1,1 @@
1 - 00 - How to login to Maxwell
1 + How to start the RecoGUI
- Content
... ... @@ -1,214 +1,128 @@
- DESY has a quite powerful compute cluster called the Maxwell cluster. The documentation can be found here [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.MaxwellCluster.WebHome||shape="rect"]]; however, as this can be confusing sometimes, we will try to condense it into a step-by-step manual.
-
- {{toc/}}
-
- = {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =
- During your beamtime you will encounter multiple systems, for which you will need two different types of accounts:
- == {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==
- Before you arrive you have to create a DOOR account and complete all the safety trainings. This account is also used for the Gamma portal, where you can manage your beamtime data, grant access to other users and manage FTP access. However, this account does not work with the other resources; for those you will have to request a second account:
- == {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==
- If you decide during a beamtime that you want access to the cluster, tell your local contact and they will request a PSX account for you. With this you get access to the Kerberos, Windows and AFS resources at DESY, which includes the cluster.
- After you have received the account, you have to change the initial password within 6 days. For this, go to [[https:~~/~~/passwd.desy.de/>>url:https://passwd.desy.de/||shape="rect"]] and log in with your user name and initial password (you do not need an OTP when you sign in for the first time). Then agree to the terms and change your password.
- = {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =
- == {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==
- === {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===
- The Maxwell cluster has (as of 2021) more than 750 nodes. To organize this, you cannot access any node directly; you first have to request compute resources, and you can then connect from an entrance node to your compute node.
- === {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===
- If you have successfully obtained a PSX account you can get started. The entrance nodes are:
- [[https:~~/~~/max-display.desy.de:3389/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]] (in any case)
- These nodes are **not** for processing, as you will share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is OK.
- === {{id name="00-HowtologintoMaxwell-FastX2"/}}FastX3 ===
- The cluster uses the software FastX3 for connections and virtual desktops. To get the right version, use the web interface, log in, and use the download link for the desktop client in the bottom right corner. The version has to match exactly to work properly.
- If you want to add a connection in the desktop client, click the plus, select "web", use the address above (including the port) and your user name, and force SSH authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.
- === {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===
- Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priority, the nodes are organized in partitions, and you can only access some of these. To view which ones, open a terminal and use the command:
- {{code}}
- my-partitions
- {{/code}}
- Your result will look something like this:
- [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2"]]
- == {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==
- Access to the resources of the cluster is managed via a scheduler, SLURM. SLURM schedules the access to nodes and can revoke it if higher-priority jobs arrive.
- === {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===
- Here you cannot be kicked out of your allocation. However, only a few nodes are in this partition and you can only allocate a few in parallel (2021: 5). Some of them have GPUs available.
- === {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===
- A very large number of nodes is available and you can allocate many in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority comes along, which happens frequently. If you want to use this partition, be sure to design your job accordingly. Only CPU nodes.
- === {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===
- Like "all", but with GPUs.
- === {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===
- For Jupyter Hub.
- == {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==
- Connect to an entrance node via FastX. You will automatically be assigned to a node by a load balancer when you start a session (max-display001-003, max-nova001-002).
- [[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2"]]
- Choose a graphical interface and look around.
- == {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==
- The Maxwell cluster has many storage systems. The most important are:
- Your user folder: this has a hard limit of 30 GB. Be sure not to exceed it.
- The GPFS: here all the beamtime data are stored.
- === {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===
- Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>
- In there you will find a substructure:
- * raw: raw measurement data. Only the applicant and the beamtime leader can write/delete there.
- * processed: for all processed data
- * scratch_cc: scratch folder without backup
- * shared: for everything else
- The GPFS has regular snapshots. Its total capacity is huge (several PB).
- == {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==
- If you want to do some processing, there are two ways to start a job in SLURM:
- 1. Interactive
- 1. Batch
- In both cases you are the only person working on the node, so use it as much as you like.
- === {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===
- To get a node you have to allocate one via SLURM, e.g.:
- {{code}}
- salloc -N 1 -p psx -t 1-05:00:00
- {{/code}}
- Looking at the individual options:
- * salloc: requests a live allocation
- * -N 1: for one node
- * -p psx: on the psx partition. You can also give multiple partitions separated by commas: -p psx,all
- * -t 1-05:00:00: for a duration of 1 day and 5 hours
- * Other options could be: ~-~-mem=500GB for at least 500 GB of memory, or ~-~-constraint=P100 if you need a GPU
- * ... see the SLURM documentation for more options
- If your job is scheduled you will see your assigned node and can connect to it via ssh. (In the rare case where you do not see anything, use my-jobs to find out the host name.)
- === {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a Batch Job ===
- For a batch job you need a small shell script describing what you want to do. You do not see the job directly; the output is written to a log file (and results can be stored on disk).
- With a batch job you can also start an array job, where the same task is executed on multiple servers in parallel.
- An example of such a script:
- {{code}}
- #!/bin/bash
- #SBATCH --time 0-01:00:00
- #SBATCH --nodes 1
- #SBATCH --partition all,psx
- #SBATCH --array 1-80
- #SBATCH --mem 250GB
- #SBATCH --job-name ExampleScript
-
- source /etc/profile.d/modules.sh
- echo "SLURM_JOB_ID $SLURM_JOB_ID"
- echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
- echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
- echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
- echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
- echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"
-
- module load maxwell gcc/8.2
-
- .local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID
-
- exit
- {{/code}}
- To run this use
- {{code}}
- sbatch ./your_script.sh
- {{/code}}
- === {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing Your Allocations ===
- To view your pending or running allocations you can use:
- {{code}}
- squeue -u <username>
- {{/code}}
- or
- {{code}}
- my-jobs
- {{/code}}
- === {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What Is Realistic in Terms of Resources ===
- To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant of sudden cancellation, it is still worth trying whether you profit from massive parallelization.
- If you want to do some small processing, use one of the psx nodes. This should work most of the time.
- == {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to Other Beamtimes ==
- If you have to add other users to a past beamtime, this can be done via the Gamma portal (by the PI, the beamtime leader or a beamline scientist). After adding the accounts, these people have to make sure to log off from **all** FastX sessions etc. to update the permissions.
+ = {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}**Short Version:** =
+ Terminal:
+ (% class="code" %)
+ (((
+ salloc ~-~-partition=all ~-~-nodes=1 ~-~-time=06:00:00
+ ssh max-bla123
+ module load anaconda
+ source activate ~~/envs/tomopy
+ spyder&
+ )))
+ Spyder:
+ Open the RecoGUI,
+ (Right click on tab: "Set console working directory") (to be removed)
+ Green arrow to start the program
+ = {{id name="00-HowtologintoMaxwell-LongVersion:"/}}**Long Version:** =
+ **Log in to max-nova**, e.g. from a browser: [[https:~~/~~/max-nova.desy.de:3443/>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]
+ Click on "**Launch Session**" and the "**XFCE**" icon.
+ [[image:attach:image2021-4-27_13-55-52.png||height="250"]]
+ **Open a terminal**, e.g. from the icon at the bottom of your desktop. You can also open it via right click → "Open Terminal here" directly on your desktop or from any folder.
+ [[image:attach:image2021-4-27_13-58-35.png||height="250"]]
+ Now you can **allocate a node** for yourself, so you will have enough memory and power for your reconstruction:
+ (% class="code" %)
+ (((
+ salloc ~-~-partition=all ~-~-nodes=1 ~-~-time=06:00:00
+ )))
+ You will get a node for 6 hours; you can also choose longer or shorter times.
+ It can take some time before you get a node; then SLURM will tell you which node is reserved for you. (Example: max-exfl069)
+ Now you can **log in via ssh** to this node:
+ (% class="code" %)
+ (((
+ ssh max-exfl069
+ )))
+ Enter your password.
+ EXAMPLE:
+ [[image:attach:image2021-4-27_13-52-11.png||height="125"]]
+ Now you are on a different node. :)
+ You first have to **load the anaconda module**:
+ (% class="code" %)
+ (((
+ module load anaconda/3
+ )))
+ and **activate your virtual environment**, depending on where you installed it:
+ (% class="code" %)
+ (((
+ source activate ~~/envs/tomopy
+ )))
+ ~~/ takes you back to your home directory. In this case, the environment "tomopy" was installed in the folder "envs" in the home directory.
+ Now you can **start spyder**:
+ (% class="code" %)
+ (((
+ spyder&
+ )))
+ EXAMPLE: (virtual environment in "envs/p36")
+ [[image:attach:image2021-4-27_13-53-35.png||height="71"]]
+ You can also start another terminal, e.g. if you want to look at your data / reconstructions in Fiji.
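The batch-job example in the content diff above launches an 80-task array job and hands each task its $SLURM_ARRAY_TASK_ID. As a sketch of how such a task could pick its own chunk of work, here is a minimal bash fragment; the slice total (1600) and the chunking scheme are made-up illustrations, not part of the original page, and the defaults only exist so the sketch also runs outside SLURM:

```shell
#!/bin/bash
# Sketch: map a SLURM array task to a contiguous slice range.
# Inside a job, SLURM sets SLURM_ARRAY_TASK_ID / SLURM_ARRAY_TASK_COUNT;
# the :-defaults below are fallbacks for running this outside SLURM.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}      # array IDs in the example run 1-80
N_TASKS=${SLURM_ARRAY_TASK_COUNT:-80}
N_SLICES=1600                          # hypothetical total number of slices
CHUNK=$(( (N_SLICES + N_TASKS - 1) / N_TASKS ))  # ceiling division
START=$(( (TASK_ID - 1) * CHUNK ))
END=$(( START + CHUNK - 1 ))
echo "task ${TASK_ID}: reconstructing slices ${START}-${END}"
```

A reconstruction script invoked as in the sbatch example could then process only the slices in its own range, so cancelling or rerunning a single array task affects only that chunk.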
- image2021-4-27_13-51-12.png (Author: XWiki.flenners, Size: 349.0 KB, removed)
- image2021-4-27_13-51-35.png (Author: XWiki.flenners, Size: 349.0 KB, removed)
- image2021-4-27_13-52-11.png (Author: XWiki.flenners, Size: 47.1 KB, removed)
- image2021-4-27_13-53-35.png (Author: XWiki.flenners, Size: 24.7 KB, removed)
- image2021-4-27_13-55-52.png (Author: XWiki.flenners, Size: 99.9 KB, removed)
- image2021-4-27_13-58-35.png (Author: XWiki.flenners, Size: 285.1 KB, removed)
- image2021-5-4_10-27-13.png (Author: XWiki.flenners, Size: 278.7 KB, removed)
- image2021-5-4_10-28-14.png (Author: XWiki.flenners, Size: 278.8 KB, removed)
- Confluence.Code.ConfluencePageClass[0]
- Id
... ... @@ -1,1 +1,1 @@
1 - 204941497
1 + 204941724
- Title
... ... @@ -1,1 +1,1 @@
1 - 00 - How to login to Maxwell
1 + How to start the RecoGUI
- URL
... ... @@ -1,1 +1,1 @@
1 - https://confluence.desy.de/spaces/P5I/pages/204941497/00 - How to login to Maxwell
1 + https://confluence.desy.de/spaces/P5I/pages/204941724/How to start the RecoGUI