Usage of the Linux Clusters at DESY Zeuthen
At Zeuthen, two clusters are available, one with 16 dual Opteron machines connected by Infiniband and one with 8 dual Xeon machines connected by Myrinet. They are integrated into the SGE batch system. The documentation in ["Batch System Usage"] applies to them.
Building Applications
Infiniband
Applications for this cluster must be compiled on a 64-bit machine; at the moment, this means lx64 only. There are MPI versions for the GCC, Intel and PGI compilers installed:
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpicc
/usr/local/ibgd/mpi/osu/intel/mvapich-0.9.5/bin/mpicc
/usr/local/ibgd/mpi/osu/pgi/mvapich-0.9.5/bin/mpicc
Compilers for C++ and Fortran are available as well.
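For example, a simple MPI program could be built with the GCC version like this (the source and binary names yourapp.c and yourapp are hypothetical):
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpicc -o yourapp yourapp.c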
Myrinet
Applications for this cluster can be compiled on the pub.ifh.de machines. There are MPI versions for the GCC, Intel and PGI compilers installed:
/opt/mpich/gcc/bin/mpicc
/opt/mpich/intel/bin/mpicc
/opt/mpich/pgi/bin/mpicc
Compilers for C++ and Fortran are available as well.
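The corresponding call on the Myrinet cluster would be (again with hypothetical file names):
/opt/mpich/gcc/bin/mpicc -o yourapp yourapp.c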
Batch System Access
A job script for a parallel job needs to specify the parallel environment and the number of required CPUs. For the Infiniband cluster, the parameter looks like this:
#$ -pe mpich-ppn2 4
On the Myrinet cluster, it is similar:
#$ -pe mpichgm-ppn2 4
Be sure to call the right mpirun version. On the Infiniband cluster use:
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines yourapp
On the Myrinet cluster use:
/opt/mpich/gcc/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines yourapp
It is important to request the right memory limit with the parameter h_vmem.
The Opteron machines have 3.3G of RAM, and by default two jobs are executed on one node, so the maximum amount of memory is 1650M per process:
#$ -l h_vmem=1650M
The Xeon machines have 922.5M of RAM, so request h_vmem accordingly there.
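Putting the pieces together, a minimal job script for the Infiniband cluster could look like this sketch (the shell and the binary name yourapp are assumptions; the parallel environment, memory limit and mpirun call are the ones described above):
#!/bin/sh
# request the MPI parallel environment with 4 slots, 2 processes per node
#$ -pe mpich-ppn2 4
# per-process memory limit as discussed above
#$ -l h_vmem=1650M
# run from the submission directory
#$ -cwd
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines yourapp
For the Myrinet cluster, the same script applies with the parallel environment mpichgm-ppn2, the mpirun binary /opt/mpich/gcc/bin/mpirun and an h_vmem value matching the Xeon memory.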
AFS Access
The application binary must be available to all nodes, which is why it should be placed in an AFS directory.
Be aware that the batch system renews the AFS token, but only on the node that starts the first process (node 0). That is why you should access AFS only from that node. An example scenario looks like this; a script sketch follows the list:
- Copy data from AFS to node 0.
- Copy it with scp to the directory $TMPDIR on the nodes that need it; the machine names are listed in $TMPDIR/machines
- Run your MPI job.
- Copy the results with scp from the local disks to node 0.
- Copy the data from node 0 to AFS.
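The scenario above as a shell script sketch, assuming $TMPDIR resolves to the same path on every node of the job; the file names and the AFS path are hypothetical:
# stage input data from AFS to the local disk of node 0
cp /afs/ifh.de/group/you/input.dat $TMPDIR/
# distribute it to all nodes listed in the machine file
for host in $(sort -u $TMPDIR/machines); do
  scp $TMPDIR/input.dat $host:$TMPDIR/
done
# run the MPI job
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines yourapp
# collect the results from the local disks back to node 0
for host in $(sort -u $TMPDIR/machines); do
  scp $host:$TMPDIR/result.dat $TMPDIR/result.$host
done
# copy the results from node 0 back to AFS
cp $TMPDIR/result.* /afs/ifh.de/group/you/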
Further documentation
[http://www-zeuthen.desy.de/technisches_seminar/texte/Technisches_Seminar_Waschk.pdf HPC-Clusters at DESY Zeuthen], 11/22/06, technical seminar