{"id":1788,"date":"2017-11-28T14:19:50","date_gmt":"2017-11-28T14:19:50","guid":{"rendered":"http:\/\/35.198.183.193\/?page_id=1788"},"modified":"2018-04-25T12:41:30","modified_gmt":"2018-04-25T12:41:30","slug":"slurm-job-scheduler","status":"publish","type":"page","link":"http:\/\/escience.sdu.dk\/index.php\/slurm-job-scheduler\/","title":{"rendered":"Slurm job scheduler"},"content":{"rendered":"<p>At ABACUS 2.0 we use <a href=\"http:\/\/slurm.schedmd.com\/\">Slurm<\/a> for scheduling jobs.<\/p>\n<p>In general, ABACUS 2.0 is intended to be used for batch jobs, i.e. jobs which run without any user intervention. It is also possible to run interactive jobs&#8212;this is described <a href=\"interactive\">here<\/a>.<\/p>\n<p>A typical user usage scenario is the following<\/p>\n<ul>\n<li>The user logs in to <code>fe.deic.sdu.dk<\/code>.<\/li>\n<li>A previous job script is edited with new parameters.<\/li>\n<li>The job script is submitted to the job queue.<\/li>\n<li>The user logs out<\/li>\n<li>Later, after the job has completed, the user logs in again to retrieve the result.<\/li>\n<\/ul>\n<p>Job scripts as mentioned in Steps 2 and 3 contains both details of which computer resources are needed (number and types of nodes, etc.) 
and details on which application should be run and how (name and version of the application, input and output, etc.).<\/p>\n<h4 id=\"general-commands\">General commands<\/h4>\n<p>You can use <code>man<\/code> to get further documentation on the commands mentioned below:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> man COMMAND\r\n<\/code><\/pre>\n<\/div>\n<p>Try the following commands:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> man sbatch\r\n<span class=\"gp\">testuser@fe1:~$<\/span> man squeue\r\n<span class=\"gp\">testuser@fe1:~$<\/span> man scancel\r\n<\/code><\/pre>\n<\/div>\n<h4 id=\"accounts\">Accounts<\/h4>\n<p>To see which accounts are available to you, including how many node hours are available, use the command <code>abc-quota<\/code>:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> abc-quota\r\n\r\n<span class=\"go\">Available node hours per account\/user<\/span>\r\n<span class=\"go\">=====================================<\/span>\r\n\r\n<span class=\"go\">Account\/user |   Quota   Avail | UsedPeriod   % of Qt | UsedMonth<\/span>\r\n<span class=\"go\">------------ + ------- ------- + ---------- --------- + ---------<\/span>\r\n\r\n<span class=\"go\">test00_gpu   |   2,000   1,220 |        780    39.4 % |       650<\/span>\r\n<span class=\"go\"> otheruser   |                 |         80     4.4 % |        50<\/span>\r\n<span class=\"go\"> testuser *  |                 |        700    35.0 % |       600<\/span>\r\n\r\n<span class=\"go\">...<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p>In this case, <code>testuser<\/code> can use the account&nbsp;<code>test00_gpu<\/code>. Within this accounting period, the user&nbsp;<code>testuser<\/code> has used 700 node hours, and the <code>test00_gpu<\/code> account has used 780 node hours in total. 1,220 node hours are still available. 
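The node-hour cost of a job is easy to estimate before submitting; assuming the usual accounting convention that a job is charged the number of nodes reserved multiplied by the wall-clock hours used, a 4-node job running for 2 hours costs 8 node hours:<\/p>

```shell
# Back-of-the-envelope node-hour cost of a job (assumed convention:
# nodes reserved x wall-clock hours used; example values, not from abc-quota).
nodes=4    # e.g. from --nodes 4
hours=2    # e.g. from --time 2:00:00
echo $((nodes * hours))   # prints 8
```

<p>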
As shown in the column <code>UsedMonth<\/code>, most node hours have been used during this month.<\/p>\n<h4 id=\"submitting-jobs\">Submitting jobs<\/h4>\n<p>The following is a minimal job script &#8211; it generates 10,000 random numbers and then sorts them numerically. See later for a more realistic job script. For any job script, you should specify the account to use, the number of nodes you want (default 1), and the maximum wall time (at most 24 hours):<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"c\">#! \/bin\/bash<\/span>\r\n<span class=\"c\">#<\/span>\r\n<span class=\"c\">#SBATCH --account test00_gpu      # account<\/span>\r\n<span class=\"c\">#SBATCH --nodes 1                 # number of nodes<\/span>\r\n<span class=\"c\">#SBATCH --time 2:00:00            # max time (HH:MM:SS)<\/span>\r\n\r\n<span class=\"k\">for<\/span> i in <span class=\"o\">{<\/span>1..10000<span class=\"o\">}<\/span><span class=\"p\">;<\/span> <span class=\"k\">do<\/span>\r\n  <span class=\"nb\">echo<\/span> <span class=\"nv\">$RANDOM<\/span> &gt;&gt; random.txt\r\n<span class=\"k\">done<\/span>\r\n\r\nsort -n random.txt\r\n<\/code><\/pre>\n<\/div>\n<p>Note that Slurm parameters must be specified at the top of the file before any real commands. 
Further, each directive must appear at the start of a line and be written exactly as <code>#SBATCH<\/code>.<\/p>\n<p>To submit the job, write the above contents to a file, e.g. <code>myscript.sh<\/code>, and run the command:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> sbatch myscript.sh\r\n<\/code><\/pre>\n<\/div>\n<p>You can also add extra options to <code>sbatch<\/code>, overriding the values in the script itself, e.g.,<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> sbatch --time 4:00:00 myscript.sh\r\n<\/code><\/pre>\n<\/div>\n<h4 id=\"information-on-jobs\">Information on jobs<\/h4>\n<p>List all jobs, only running jobs, or only pending jobs for the user <code>testuser<\/code>:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> squeue -u testuser\r\n<span class=\"gp\">testuser@fe1:~$<\/span> squeue -u testuser -t RUNNING\r\n<span class=\"gp\">testuser@fe1:~$<\/span> squeue -u testuser -t PENDING\r\n<\/code><\/pre>\n<\/div>\n<p>List detailed information for a job (sometimes useful for troubleshooting):<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> scontrol show jobid -dd &lt;jobid&gt;\r\n<\/code><\/pre>\n<\/div>\n<p>To cancel a single job, all jobs, or all pending jobs for the user&nbsp;<code>testuser<\/code>:<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"gp\">testuser@fe1:~$<\/span> scancel &lt;jobid&gt;\r\n<span class=\"gp\">testuser@fe1:~$<\/span> scancel -u testuser\r\n<span class=\"gp\">testuser@fe1:~$<\/span> scancel -u testuser -t PENDING\r\n<\/code><\/pre>\n<\/div>\n<h4 id=\"interactive-jobs\">Interactive jobs<\/h4>\n<p>It is also possible to run interactive jobs on ABACUS 2.0, i.e. 
jobs where you, through a GUI or the command line, use one or more of our compute nodes as if you were sitting at your own computer.<\/p>\n<p>How to do this is described <a href=\"interactive\">here<\/a>.<\/p>\n<h4 id=\"jobscript-tips\">Jobscript tips<\/h4>\n<ul>\n<li><em>Walltime<\/em>, <code>--time<\/code>: Setting the maximum wall time as low as possible enables Slurm to pack your job onto idle nodes that are currently waiting for a large job to start.<\/li>\n<li><em>Nodes<\/em>, <code>--nodes<\/code>: If your job can be flexible, specify a range for the number of nodes needed to run the job, e.g. <code>--nodes=4-6<\/code>. In this case, your job starts running when at least 4 nodes are available. If 5 or 6 nodes are available at that time, your job gets all of them.<\/li>\n<li><em>Tasks per node<\/em>, <code>--ntasks-per-node<\/code>: Use this to select how many MPI ranks you want per node, e.g., 24 if you want one rank per CPU core or 2 if you want one MPI rank per GPU card.<\/li>\n<\/ul>\n<p>Note that you do not need to specify the following in your job scripts:<\/p>\n<ul>\n<li><em>Partition<\/em>: The partition is automatically derived from the account you use, e.g., <code>test00_gpu<\/code> implies the partition <code>gpu<\/code>.<\/li>\n<li><em>Memory<\/em> use, e.g., <code>--mem<\/code> or <code>--mem-per-cpu<\/code>: By default you get all the RAM on the nodes your job is running on.<\/li>\n<li><em>GPU cards<\/em>, i.e., <code>--gres=gpu:2<\/code>: If you are running on a gpu node, you automatically get access to both GPU cards.<\/li>\n<\/ul>\n<h4 id=\"mpi-jobs\">MPI jobs<\/h4>\n<p>For MPI jobs, you should use a combination of <code>--nodes<\/code> and&nbsp;<code>--ntasks-per-node<\/code> to get the number of nodes and MPI ranks per node you want. 
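The total number of MPI ranks started by <code>srun<\/code> is the product of these two values; for example, with <code>--nodes 4<\/code> and <code>--ntasks-per-node 24<\/code> (the values used in the example script below):<\/p>

```shell
# Total MPI ranks = nodes x tasks per node
# (example values: --nodes 4, --ntasks-per-node 24)
echo $((4 * 24))   # prints 96
```

<p>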
Both have a default value of 1.<\/p>\n<p>For all MPI implementations available as a module at ABACUS 2.0, the recommended way to start MPI applications is using <code>srun<\/code>, i.e.,&nbsp;<em>not<\/em> <code>mpirun<\/code> or similar.<\/p>\n<div class=\"codehilite\">\n<pre><code><span class=\"c\">#! \/bin\/bash<\/span>\r\n<span class=\"c\">#<\/span>\r\n<span class=\"c\">#SBATCH --account test00_gpu      # account<\/span>\r\n<span class=\"c\">#SBATCH --nodes 4                 # number of nodes<\/span>\r\n<span class=\"c\">#SBATCH --ntasks-per-node 24      # number of MPI tasks per node<\/span>\r\n<span class=\"c\">#SBATCH --time 2:00:00            # max time (HH:MM:SS)<\/span>\r\n\r\n<span class=\"nb\">echo <\/span>Running on <span class=\"s2\">\"<\/span><span class=\"k\">$(<\/span>hostname<span class=\"k\">)<\/span><span class=\"s2\">\"<\/span>\r\n<span class=\"nb\">echo <\/span>Available nodes: <span class=\"s2\">\"<\/span><span class=\"nv\">$SLURM_NODELIST<\/span><span class=\"s2\">\"<\/span>\r\n<span class=\"nb\">echo <\/span>Slurm_submit_dir: <span class=\"s2\">\"<\/span><span class=\"nv\">$SLURM_SUBMIT_DIR<\/span><span class=\"s2\">\"<\/span>\r\n<span class=\"nb\">echo <\/span>Start <span class=\"nb\">time<\/span>: <span class=\"s2\">\"<\/span><span class=\"k\">$(<\/span>date<span class=\"k\">)<\/span><span class=\"s2\">\"<\/span>\r\n\r\n<span class=\"c\"># Load the modules previously used when compiling the application<\/span>\r\nmodule purge\r\nmodule add gcc\/4.8-c7 openmpi\/1.8.4\r\n\r\n<span class=\"c\"># Start in total 4*24 MPI ranks on all available CPU cores<\/span>\r\nsrun my-mpi-application -i input.txt -o output.txt\r\n\r\n<span class=\"nb\">echo <\/span>Done.\r\n<\/code><\/pre>\n<\/div>\n<h4 id=\"further-jobscript-examples\">Further jobscript examples<\/h4>\n<h4 id=\"purely-sequential-job\">Purely sequential job<\/h4>\n<div class=\"codehilite\">\n<pre><code><span class=\"c\">#!\/bin\/bash<\/span>\r\n\r\n<span class=\"c\">#SBATCH --account 
test00_gpu      # account<\/span>\r\n<span class=\"c\">#SBATCH --nodes 1                 # number of nodes<\/span>\r\n<span class=\"c\">#SBATCH --time 2:00:00            # max time (HH:MM:SS)<\/span>\r\n\r\n.\/serial.exe\r\n<\/code><\/pre>\n<\/div>\n<h4 id=\"amber-gaussian-gromacs-namd-etc\">Amber, Gaussian, Gromacs, Namd, etc.<\/h4>\n<p>You can find sample sbatch job scripts in the folder&nbsp;<code>\/opt\/sys\/documentation\/sbatch-scripts\/<\/code> on the ABACUS 2.0 frontend nodes. For the software packages installed on ABACUS 2.0, you can also look at our <a href=\"\/index.php\/supported-applications\/\">software page<\/a> for further information.<\/p>\n<h4 id=\"switches\">Using as few switches as possible<\/h4>\n<p>The InfiniBand switches in ABACUS 2.0 are connected using a&nbsp;<a href=\"\/index.php\/hpc\/\">3D torus<\/a>. By default, Slurm always starts your job as soon as possible. When enough nodes are available for the job, i.e. the job is ready to start, Slurm packs the job onto the available nodes as well as possible.<\/p>\n<p>If you have a very network-intensive job, you may want to ensure that your job is packed as tightly as possible, even at the cost of the job possibly starting later than would otherwise be possible.<\/p>\n<p>For all the possible <code>sbatch --switches<\/code> options below, there is a time limit of one hour, i.e., after one hour, the&nbsp;<code>--switches<\/code> option is ignored.<\/p>\n<h6 id=\"sbatch-switches-1\"><code>sbatch --switches 1<\/code><\/h6>\n<p>Run everything using nodes from one switch (at most 16 slim\/fat nodes or 18 gpu nodes).<\/p>\n<h6 id=\"sbatch-switches-2\"><code>sbatch --switches 2<\/code><\/h6>\n<p>Run everything using nodes from at most two neighbour switches (at most 32 slim\/fat nodes or 34 gpu nodes).<\/p>\n<h6 id=\"sbatch-switches-3\"><code>sbatch --switches 3<\/code><\/h6>\n<p>Run everything using nodes from at most 2&#215;2 neighbour switches (at most 64\/72 nodes). 
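Like other sbatch options, <code>--switches<\/code> can also be given on the command line at submission time, e.g. to request placement on a single switch (reusing the example script name <code>myscript.sh<\/code> from earlier):<\/p>

```shell
# Ask Slurm to place the job on nodes from a single switch;
# as noted above, the request is ignored after one hour of waiting.
sbatch --switches 1 myscript.sh
```

<p>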
For fat and gpu nodes, there is no need to specify this, as there are only 64 and 72 such nodes available, respectively.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At ABACUS 2.0 we use Slurm for scheduling jobs. In general, ABACUS 2.0 is intended to be used for batch jobs, i.e. jobs which run without any user intervention. It is also possible to run<a class=\"moretag\" href=\"http:\/\/escience.sdu.dk\/index.php\/slurm-job-scheduler\/\"> Read more&hellip;<\/a><\/p>\n","protected":false},"author":1,"featured_media":3986,"parent":0,"menu_order":26,"comment_status":"closed","ping_status":"closed","template":"page-templates\/template-fullwidth.php","meta":[],"_links":{"self":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/pages\/1788"}],"collection":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/comments?post=1788"}],"version-history":[{"count":10,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/pages\/1788\/revisions"}],"predecessor-version":[{"id":6398,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/pages\/1788\/revisions\/6398"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/media\/3986"}],"wp:attachment":[{"href":"http:\/\/escience.sdu.dk\/index.php\/wp-json\/wp\/v2\/media?parent=1788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}