DESY has a quite powerful compute cluster called the Maxwell cluster. The documentation can be found at [[https:~~/~~/confluence.desy.de/display/MXW/Maxwell+Cluster>>doc:MXW.Maxwell Cluster.WebHome||shape="rect"]]; however, since it can be confusing at times, this page condenses it into a step-by-step manual.


{{toc/}}

= {{id name="00-HowtologintoMaxwell-GettingaDESYAccount"/}}Getting a DESY Account =

During your beamtime you will encounter multiple systems, for which you will need two different types of accounts:

== {{id name="00-HowtologintoMaxwell-TheDOORAccount"/}}The DOOR Account ==

Before you arrive you have to create a DOOR account (Institution: Physik Department E17) and complete all the safety trainings. This account is also used for the gamma-portal, where you can manage your beamtime data, grant access to other users and manage FTP access. However, this account does not work with the other resources; for those you will have to request a second account:

== {{id name="00-HowtologintoMaxwell-ThePSXAccount"/}}The PSX Account ==

If you decide during a beamtime that you want access to the cluster, tell your local contact and they will request a PSX account for you. With this account you get access to the Kerberos, Windows and AFS resources at DESY, which includes the cluster.

= {{id name="00-HowtologintoMaxwell-UsingtheCluster"/}}Using the Cluster =

== {{id name="00-HowtologintoMaxwell-StructureoftheCluster"/}}Structure of the Cluster ==

=== {{id name="00-HowtologintoMaxwell-Overview"/}}Overview ===

The Maxwell cluster has (as of 2021) more than 750 nodes. To keep this organized, you cannot access any node directly; you first have to request compute resources. You can then connect from an entrance node to your compute node.

=== {{id name="00-HowtologintoMaxwell-EntranceNodes"/}}Entrance Nodes ===

If you have successfully obtained a PSX account you can get started. The entrance nodes are:
\\[[https:~~/~~/max-nova.desy.de:3443/auth/ssh>>url:https://max-nova.desy.de:3443/auth/ssh||shape="rect"]]  (if you have access to the nova resources, most likely the case if your beamtime was in cooperation with the Helmholtz-Zentrum Hereon)

[[https:~~/~~/max-display.desy.de:3443/auth/ssh>>url:https://max-display.desy.de:3443/auth/ssh||shape="rect"]]  (in any case)

These nodes are **not **for processing, as you share them with many other users. So please do not do anything computationally intensive on them, like reconstruction or visualization. Viewing images is ok.
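
If you only need a shell rather than a virtual desktop, you can usually also reach an entrance node with a plain ssh client (a hedged example; replace the username with your own PSX account name):

{{code}}
# plain ssh login to the display entrance node, using your PSX credentials
ssh your-psx-username@max-display.desy.de
{{/code}}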

=== {{id name="00-HowtologintoMaxwell-FastX2"/}}Fast X2 ===

The cluster uses the software FastX2 for the connection and the virtual desktop. To get the right version of it, use the web interface, log in, and use the download link for the desktop client in the bottom right corner. The versions have to match exactly to work properly.

If you want to add a connection in the desktop client, click the plus sign, select "web", enter the address above (including the port) and your username, and force SSH authentication. Then you can choose whether you want a virtual desktop (XFCE) or a terminal.

=== {{id name="00-HowtologintoMaxwell-Partitions"/}}Partitions ===

Starting from an entrance node, you can connect to a compute node. As there are multiple levels of priority etc., the nodes are organized in partitions, and you can only access some of them. To view which ones, open a terminal and use the command:

{{code}}
my-partitions
{{/code}}

Your result will look something like this:

[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-5-4_10-28-14.png||queryString="version=1&modificationDate=1620116894626&api=v2" alt="image2021-5-4_10-28-14.png"]]

== {{id name="00-HowtologintoMaxwell-SLURM"/}}SLURM ==

Access to the resources of the cluster is managed via a scheduler, SLURM.

SLURM schedules the access to nodes and can revoke access if higher-priority jobs come in.
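
Besides my-partitions, the standard SLURM commands work as usual. A couple of hedged examples for getting an overview (sinfo and squeue are generic SLURM tools, not Maxwell-specific):

{{code}}
# show the state of the nodes in the partitions you are interested in
sinfo -p psx,all

# show what is currently queued or running on a partition
squeue -p psx
{{/code}}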

=== {{id name="00-HowtologintoMaxwell-PSXPartition"/}}PSX Partition ===

Here you cannot be kicked out of your allocation. However, only a few nodes are in this partition and you can only allocate a few of them in parallel (2021: 5). Some of them have GPUs available.
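
If you need one of the GPU nodes, you can add a constraint to your allocation. A hedged sketch (~-~-constraint=P100 is the option used in the reco GUI example further down; other GPU types may be available):

{{code}}
# request one psx node with a P100 GPU for six hours
salloc -N 1 -p psx -t 06:00:00 --constraint=P100
{{/code}}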

=== {{id name="00-HowtologintoMaxwell-AllPartition"/}}All Partition ===

A very large number of nodes is available and you can allocate many of them in parallel (2021: 100). However, each allocation can be revoked without warning if someone with higher priority comes along, which happens very often. If you want to use this partition, be sure to design your jobs accordingly. CPU nodes only.
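
"Design your jobs accordingly" mostly means: split the work into small, restartable pieces and let SLURM requeue them if the node is taken away. A minimal, hedged sketch of the relevant batch options (see the batch job section below for a complete script):

{{code}}
#!/bin/bash
#SBATCH --partition all
#SBATCH --time 0-01:00:00
#SBATCH --requeue            # put the job back into the queue if the node is revoked
#SBATCH --open-mode=append   # keep appending to the same log file after a requeue

# keep each task short and idempotent, e.g. skip slices that were already reconstructed
echo "running on $(hostname), restart count: $SLURM_RESTART_COUNT"
{{/code}}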

=== {{id name="00-HowtologintoMaxwell-AllgpuPartition"/}}Allgpu Partition ===

Like the all partition, but with GPUs.

=== {{id name="00-HowtologintoMaxwell-JhubPartition"/}}Jhub Partition ===

For JupyterHub.

See also the section on Python.

=== {{id name="00-HowtologintoMaxwell-HzgPartition"/}}Hzg Partition ===

Ask someone from Hereon how to use it (e.g. [[Riedel, Mirko>>url:https://wiki.tum.de/display/~~ga78nig||shape="rect"]]).

\\

== {{id name="00-HowtologintoMaxwell-ConnectingtotheCluster"/}}Connecting to the Cluster ==

Connect to an entrance node via FastX. When you start a session, a load balancer automatically assigns you to one of the nodes (max-display001-003, max-nova001-002).

[[image:attach:P5I.User Guide\: NanoCT.4\. Reconstruction Guide.00 - How to login to Maxwell.WebHome@image2021-4-27_13-55-52.png||queryString="version=1&modificationDate=1619524552546&api=v2" alt="image2021-4-27_13-55-52.png"]]

Choose a graphical interface and look around.

\\

== {{id name="00-HowtologintoMaxwell-DataStorage"/}}Data Storage ==

The Maxwell cluster has many storage systems. The most important ones are:

* Your user folder: this has a hard limit of 30 GB. Be sure not to exceed it.
* The GPFS: this is where all the beamtime data are stored.

=== {{id name="00-HowtologintoMaxwell-GPFS"/}}GPFS ===

Usually you can find your data at: /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>

In there you will find a substructure:

* raw: raw measurement data. Only the applicant and the beamtime leader can write/delete there
* processed: for all processed data
* scratch_cc: scratch folder w/o backup
* shared: for everything else

The GPFS takes regular snapshots, and its total capacity is huge (several PB).
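
A hedged example of how this looks in a terminal (the beamline, year and beamtime ID are placeholders you have to fill in yourself):

{{code}}
# check how much of the 30 GB home quota you are currently using
du -sh ~

# list the standard substructure of a beamtime on the GPFS
ls /asap3/petra3/gpfs/<beamline>/<year>/data/<beamtime_id>
# -> processed  raw  scratch_cc  shared
{{/code}}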

== {{id name="00-HowtologintoMaxwell-HowtoGetaComputeNode"/}}How to Get a Compute Node ==

If you want to do some processing, there are two ways to start a job in SLURM:

1. Interactive
1. Batch

In both cases you are the only person working on the node, so use it as much as you like.

=== {{id name="00-HowtologintoMaxwell-StartinganInteractiveJob"/}}Starting an Interactive Job ===

To get a node you have to allocate one via SLURM, e.g. use:

{{code}}
salloc -N 1 -p psx -t 1-05:00:00
{{/code}}

Looking at the individual options:

* salloc: specifies that you want a live allocation
* -N 1: for one node
* -p psx: on the psx partition. You can also list multiple partitions separated by commas: -p psx,all
* -t 1-05:00:00: for a duration of 1 day and 5 h
* Other options could be: ~-~-mem=500GB for at least 500 GB of memory
* ... see the SLURM documentation for more options

Once your job is scheduled you will see your assigned node and can connect to it via ssh. (In the rare case where you do not see anything, use my-jobs to find out the host name.)
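
A complete, hedged example of such an interactive session (the node name max-wn123 is made up; use whatever salloc or my-jobs reports for you):

{{code}}
# allocate one psx node for 8 hours
salloc -N 1 -p psx -t 08:00:00
# salloc reports the assigned node, e.g. something like "Nodes max-wn123 are ready for job"

# connect to the node and work there
ssh max-wn123

# when you are done, leave the node and release the allocation
exit    # ends the ssh session on the compute node
exit    # ends the salloc shell and frees the allocation
{{/code}}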

=== {{id name="00-HowtologintoMaxwell-Startingabatchjob"/}}Starting a batch job ===

For a batch job you need a small shell script describing what you want to do. You do not see the job directly, but its output is written to a log file (and results can be stored on disk).

With a batch job you can also start an array job, where the same task is executed on multiple servers in parallel.

An example of such a script:

{{code}}
#!/bin/bash
#SBATCH --time 0-01:00:00          # wall time: 1 hour per task
#SBATCH --nodes 1                  # one node per task
#SBATCH --partition all,psx        # try the all and psx partitions
#SBATCH --array 1-80               # run 80 tasks with IDs 1..80
#SBATCH --mem 250GB                # request 250 GB of memory (per node)
#SBATCH --job-name ExampleScript

# make the module command available in the batch shell
source /etc/profile.d/modules.sh

# log the SLURM variables that identify this (array) task
echo "SLURM_JOB_ID $SLURM_JOB_ID"
echo "SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID"
echo "SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID"
echo "SLURM_ARRAY_TASK_COUNT $SLURM_ARRAY_TASK_COUNT"
echo "SLURM_ARRAY_TASK_MAX $SLURM_ARRAY_TASK_MAX"
echo "SLURM_ARRAY_TASK_MIN $SLURM_ARRAY_TASK_MIN"

module load maxwell gcc/8.2

# run the actual processing script; the task ID is passed as an argument
# so that each array task can work on its own part of the data
~/.local/bin/ipython3 --pylab=qt5 PathToYourScript/Script.py $SLURM_ARRAY_TASK_ID

exit
{{/code}}

\\

To run this, use:

{{code}}
sbatch ./your_script.sh
{{/code}}

\\
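
sbatch prints the ID of the submitted job. If you need to stop it again, scancel works on single jobs as well as on whole array jobs (a hedged example with a made-up job ID):

{{code}}
# cancel the whole (array) job
scancel 1234567

# cancel only a single task of an array job
scancel 1234567_5
{{/code}}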

=== {{id name="00-HowtologintoMaxwell-Viewingyouallocations"/}}Viewing your allocations ===

To view your pending or running allocations you can use:

{{code}}
squeue -u <username>
# or
my-jobs
{{/code}}

\\
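
If you want more details about a single job (requested resources, assigned node, or the reason why it is still pending), scontrol can show them (a hedged example with a made-up job ID):

{{code}}
scontrol show job 1234567
{{/code}}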

=== {{id name="00-HowtologintoMaxwell-Whatisrealisticintermsofresources"/}}What is realistic in terms of resources ===

To be fair, you will not get 100 nodes every time you want them. Especially during a user run, the machines are often quite busy. But if you design your scripts to be tolerant of sudden cancellation, it is still worth trying, as you can profit from massive parallelization.

If you just want to do some small processing, use one of the psx nodes. This should work most of the time.

\\

== {{id name="00-HowtologintoMaxwell-GrantingDataAccesstootherBeamtimes"/}}Granting Data Access to other Beamtimes ==

If you have to add other users to a past beamtime, this can be done via the gamma-portal. After the accounts have been added, these people have to make sure to log off from **all **FastX sessions, etc., so that the permissions are updated.

= {{id name="00-HowtologintoMaxwell-StartingarecoGUI"/}}Starting a reco GUI =

== {{id name="00-HowtologintoMaxwell-ShortVersion:"/}}Short Version: ==

Terminal:

(% class="code" %)
(((
salloc ~-~-partition=psx ~-~-nodes=1 ~-~-time=06:00:00
\\(if you need a GPU: (% class="bash plain" %){{code language="none"}}--constraint=P100{{/code}}(%%) )
\\ssh max-bla123
\\module load anaconda
\\source activate ~~/envs/tomopy
\\spyder&
)))

\\

\\

{{code linenumbers="true" collapse="true"}}
salloc --partition=psx --nodes=1 --time=06:00:00

ssh max-bla123

module load anaconda

source activate ~/envs/tomopy

spyder&
{{/code}}

\\

Spyder:

Open the RecoGUI.

(Right click on the tab: "Set console working directory") (to be removed)

Press the green arrow to start the program.

\\

== {{id name="00-HowtologintoMaxwell-LongerVersion"/}}Longer Version ==

\\

Log in to Maxwell and allocate a node from the psx partition.

\\

You first have to **load the anaconda module:**

(% class="code" %)
(((
module load anaconda/3
\\
)))

and **activate your virtual environment**, depending on where you installed it:

(% class="code" %)
(((
source activate ~~/envs/tomopy
)))

\\

~~/ refers to your home directory. In this case, the environment "tomopy" was installed in the folder "envs" in the home directory.
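
If you have not set up such an environment yet, a minimal, hedged sketch of how one could be created at that location (the package selection is only an assumption; adapt it to your reconstruction scripts):

{{code}}
module load anaconda/3

# create the environment in ~/envs/tomopy and activate it
conda create --prefix ~/envs/tomopy python=3.8
source activate ~/envs/tomopy

# install the packages you need, e.g. tomopy and spyder from conda-forge
conda install -c conda-forge tomopy spyder
{{/code}}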

\\

Now you can **start Spyder**:

(% class="code" %)
(((
spyder&
)))

\\

Example: (virtual environment in "envs/p36")

[[image:attach:image2021-4-27_13-53-35.png||height="71"]]

\\

You can also start another terminal, e.g. if you want to look at your data / reconstructions in Fiji.
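
Whether Fiji is provided as a module may change over time; a hedged way to check for it from that second terminal:

{{code}}
# search the module system for a Fiji/ImageJ installation
module avail 2>&1 | grep -i -e fiji -e imagej

# if a matching module shows up, load it and start the GUI, e.g.:
# module load fiji
# fiji &
{{/code}}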

\\

\\