How can MATLAB be used?

In the following we describe two main ways to use MATLAB at Abacus.

  • Non-interactive Slurm script (recommended)
  • Semi-interactive using a MATLAB GUI running on your own computer

Note that MATLAB is currently only available for users from some of the Danish universities. For further information, have a look here.

Running MATLAB via a Slurm script

The recommended way to use MATLAB on Abacus is to run it as a non-interactive Slurm batch script.

To do this, your MATLAB code must run successfully from the command line without using any graphics, i.e., the following must complete without errors (assuming your MATLAB code is saved in the file matlab_code.m).

sysop@fe1:~$ module add matlab/R2016a # Use any of the available MATLAB versions
sysop@fe1:~$ matlab -nodisplay -r matlab_code
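A minimal sketch of such a batch-friendly matlab_code.m (the computation here is only an illustration; the important points are that it makes no graphics calls and ends with exit, so that the MATLAB session, and with it the job, terminates when the script is done):

```matlab
% matlab_code.m - minimal sketch of a script that runs without graphics.
% Print results with disp/fprintf instead of plotting, and end with
% exit so the MATLAB session terminates when the script is done.
A = magic(4);
fprintf('Sum of all elements: %d\n', sum(A(:)));
exit;
```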

Next, to actually run your code, copy all the relevant files to Abacus and then write a Slurm script as shown below. A sample sbatch script can be found on the Abacus frontend node at /opt/sys/documentation/sbatch-scripts/matlab/matlab-R2016a.sh.

#!/bin/bash
#
#SBATCH --nodes 1                 # number of nodes
#SBATCH --time 2:00:00            # max time (HH:MM:SS)

echo Running on "$(hostname)"
echo Available nodes: "$SLURM_NODELIST"
echo Slurm_submit_dir: "$SLURM_SUBMIT_DIR"
echo Start time: "$(date)"

# Load relevant modules
module purge
module add matlab/R2016a

# Run the MATLAB code available in matlab_code.m
# (note the missing .m)
matlab -nodisplay -r matlab_code

echo Done.

Note that by default this runs on only one compute node, even if multiple nodes are requested with e.g., --nodes 8. To use more nodes, you must use MDCS as shown further below.

Running MATLAB via a MATLAB GUI on your own computer

This guide describes how to use MATLAB on Abacus in combination with a MATLAB GUI running on your own computer/laptop.

Requirements
Running MATLAB in this way requires a MATLAB MDCS (MATLAB Distributed Computing Server) license. Most users have such a license available (including e.g., SDU users), but some do not (currently e.g., AAU users).

To check whether you have a valid MDCS license available run the following commands. If no errors are shown, you are ready to go. Otherwise you are welcome to contact us at support@escience.sdu.dk.

testuser@fe1:~$ module purge ; module add matlab/R2017a
...
testuser@fe1:~$ matlab -dmlworker -nodisplay -r exit

< M A T L A B (R) >
Copyright 1984-2017 The MathWorks, Inc.
R2017a (9.2.0.538062) 64-bit (glnxa64)
February 23, 2017

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


Configuration

Download the configuration files corresponding to the MATLAB version running on your own computer.

Unzip/untar the file and place the contents in one of these two locations.

  • If you have write access to the main MATLAB program folder on your computer, use its subfolder toolbox/local. You can use the command matlabroot inside MATLAB to get the root folder of MATLAB, and then add toolbox/local, e.g.,
    • C:\Program Files\MATLAB\R2017a\toolbox\local
    • /Applications/MATLAB_R2017a.app/toolbox/local
    • /usr/local/matlab/r2017a/toolbox/local
  • Otherwise, use the folder returned by the command userpath inside MATLAB (usually My Documents\MATLAB or Documents\MATLAB).

  • Start/restart MATLAB.
  • Configure MATLAB to run parallel jobs on the SDU cluster by calling configCluster. configCluster only needs to be called once per version of MATLAB (e.g., R2015a, R2015b, etc.).
>> configCluster
Username on ABACUS (e.g. joe): testuser
Clearing all ClusterInfo settings.

Before submitting a job to ABACUS, you must specify the account name.

>> % E.g. set account name to sdutest_slim
>> ClusterInfo.setProjectName('sdutest_slim')

Before submitting a job to ABACUS, you must specify the wall time.

>> % E.g. set wall time to 1 hour
>> ClusterInfo.setWallTime('01:00:00')

>> ClusterInfo.setProjectName('sdutest_slim')
>> ClusterInfo.setWallTime('01:00:00')
>>
>> % Jobs will now default to the cluster rather than running locally

Note that the projectName as specified in ClusterInfo.setProjectName('sdutest_slim') is actually the Slurm account name, and must contain the node type as a suffix, i.e., _slim, _gpu, or _fat.

Credentials
At Abacus we only support access using SSH keys. The first time you submit a job to the cluster, you are asked for the location of your SSH private key file (usually id_rsa, found in ~/.ssh) – see our guide on how to use SSH keys. You are also asked for the passphrase of the key. Note that MATLAB _cannot_ use a private key in the Windows PuTTY ppk format. You must use a key in OpenSSH format (the default format on Mac and Linux). For further information, look at our Windows SSH setup guide.

Both the user name and the location of the private key are stored with MATLAB, so you are not prompted for them again later. The passphrase is not saved, i.e., you are asked for it each time you start a new MATLAB session. If your key does not have a passphrase, simply leave the passphrase field blank when asked.
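If you are unsure which format your key is in, its first line gives it away. The following sketch classifies a key file by that line; the marker strings are the standard OpenSSH/PEM and PuTTY headers, while the function name classify_key is our own, not part of any tool:

```shell
# Sketch: classify a private-key file by its first line.
# classify_key is a hypothetical helper name, not an existing command.
classify_key() {
    case "$(head -n 1 "$1")" in
        "-----BEGIN OPENSSH PRIVATE KEY-----"|"-----BEGIN RSA PRIVATE KEY-----")
            echo "OpenSSH/PEM format - usable from MATLAB" ;;
        PuTTY-User-Key-File-*)
            echo "PuTTY ppk format - convert it first, e.g. with PuTTYgen (Conversions > Export OpenSSH key)" ;;
        *)
            echo "Unrecognized format" ;;
    esac
}

# Example:
#   classify_key ~/.ssh/id_rsa
```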

Serial jobs
Use the batch command to submit asynchronous jobs to the cluster. The batch command will return a job object which is used to access the output of the submitted job. See the example below and see the MATLAB documentation for more help on batch.

Note: In the example below, wait is used to ensure that the job has completed before requesting results. In regular use, one would not use wait, since a job might take a long time, and the MATLAB session can be used for other work while the submitted job executes.

>> % Get a handle to the cluster
>> c = parcluster;
>>
>> % Submit a batch job to query where MATLAB is running on the cluster
>> % The first time you do this, you are asked for your credentials (see above)
>> j = c.batch(@pwd, 1, {}, 'CurrentFolder', '.');

additionalSubmitArgs =

-n 1 -A sdutest_slim -t 01:00:00 --licenses=matlab:1

>>
>> % Wait for the job to finish before querying for results
>> j.wait
>>
>> % Now that the job has completed, fetch the results
>> j.fetchOutputs{:}

ans =

/gpfs/gss1/home/testuser

>>
>> % No longer need the results, so delete the job
>> j.delete
>>

If you leave out the 'CurrentFolder', '.' part of c.batch, you get a warning about MATLAB not being able to change to a nonexistent directory. This can be ignored.

To retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run. This allows us to fetch the results of completed jobs. Retrieve and view the list of jobs as shown below.

>> % Retrieve the results of past jobs from the cluster
>> jobs = c.Jobs

jobs =

4x1 Job array:

ID  Type         State     FinishDateTime        Username  Tasks
----------------------------------------------------------------
 1  independent  finished  23-Mar-2017 15:27:47  testuser      1
 2  pool         finished  23-Mar-2017 15:32:03  testuser      1
 3  pool         finished  07-Apr-2017 14:38:36  testuser     17
 4  independent  queued                          testuser      1

>>

Once we’ve identified the job we want, we can retrieve the results as we’ve done previously. If the job produces an error, we can call the getDebugLog method to view the error log file. The error log can be lengthy and is not shown here. The example below will retrieve the results of job #3.

NOTE: fetchOutputs is used to retrieve function output arguments. Data that has been written to files on the cluster needs to be retrieved directly from the file system.

>> % Retrieve the results from the 3rd job
>> j3 = jobs(3);
>> j3.fetchOutputs{:}

ans =

/gpfs/gss1/home/testuser

>> % For debugging, retrieve the output/error log file
>> j3.Parent.getDebugLog(j3.Tasks(1))
LOG FILE OUTPUT:
Executing: /opt/sys/apps/matlab/R2017a/bin/worker

< M A T L A B (R) >
Copyright 1984-2017 The MathWorks, Inc.
R2017a (9.2.0.538062) 64-bit (glnxa64)
February 23, 2017

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

2017-04-07 14:38:34 | About to exit MATLAB normally
2017-04-07 14:38:34 | About to exit with code: 0

>>


Parallel jobs

You can also submit parallel workflows with batch. Let’s use the following example for a parallel job. The file is also available on the cluster as /opt/sys/documentation/sbatch-scripts/matlab/parallel_example.m.

%
% parallel_example.m
%

function t = parallel_example(iter)

if nargin==0, iter = 16; end

disp('Start sim');

t0 = tic;
parfor idx = 1:iter
    A(idx) = idx;
    pause(2);
end
t = toc(t0);

disp('Sim completed');

We’ll use the batch command again, but since we’re running a parallel job, we’ll also specify a MATLAB Pool.

>> % 8 workers for 16 sims
>> j = c.batch(@parallel_example, 1, {}, 'pool', 8, 'CurrentFolder', '.');

additionalSubmitArgs =

-n 9 -A sysops_workq -t 01:00:00 --licenses=matlab:9

>>
>> % Wait for the job to finish before querying for results
>> j.wait
>>
>> % Now that the job has completed, fetch the results
>> j.fetchOutputs{:}

ans =

4.6571

>>

The job ran in 4.66 seconds using eight workers. Note that these jobs will always request N+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will consume nine CPU cores.

We’ll run the same simulation, but increase the pool size. Note, for some applications, there will be a diminishing return when allocating too many workers. This time, to retrieve the results at a later time, we’ll keep track of the job ID.

>> % 16 workers for 16 sims
>> j = c.batch(@parallel_example, 1, {}, 'pool', 16, 'CurrentFolder', '.');

additionalSubmitArgs =

-n 17 -A sysops_workq -t 0-1 --licenses=matlab:17

>> % Get the job ID so that we can retrieve the results of the job after quitting
>> id = j.ID

id =

5

>>
>> % clear the "j" variable as if we quit MATLAB
>> clear j
>>

Once we have a handle to the cluster, we can later call the findJob method to search for the job with the specified job ID.

>> % Get a handle to the cluster
>> c = parcluster;
>>
>> % Find the old job
>> j = c.findJob('ID', 5);
>>
>> % Check that the state is finished
>> j.State

ans =

finished

>>
>> % Now that the job has completed, fetch the results
>> j.fetchOutputs{:}

ans =

2.4176

>>
>> % For debugging, retrieve the output/error log file (not included here)
>> j.Parent.getDebugLog(j)

The job now runs in 2.42 seconds using 16 workers. Run the code with different numbers of workers to determine the ideal number to use.
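The scaling experiment suggested above can be sketched as a loop over pool sizes (assuming the cluster profile and the parallel_example.m file shown earlier; the chosen pool sizes are arbitrary):

```matlab
% Sketch: submit the same simulation with several pool sizes and
% compare run times to find a reasonable worker count.
c = parcluster;
for p = [4 8 16]
    j = c.batch(@parallel_example, 1, {}, 'pool', p, 'CurrentFolder', '.');
    j.wait;                                    % block until this job finishes
    fprintf('%2d workers: %.2f s\n', p, j.fetchOutputs{:});
    j.delete;                                  % discard the job and its results
end
```

Remember that each submission consumes p+1 cores and p+1 MATLAB licenses, as described above.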

Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor, which can be found in the MATLAB desktop under Parallel | Monitor Jobs.


Configuring jobs
Prior to submitting the job, we can specify:

  • Account/project, e.g. sdutest_slim
  • Email Notification (when the job is running, exiting, or aborting), and
  • Wall time

Specification is done with ClusterInfo. The ClusterInfo class supports tab completion to ease recollection of method names.

>> % configure job: Account, Email Notification, and Wall Time
>> c = parcluster;
>>
>> % use sdutest_slim account
>> ClusterInfo.setProjectName('sdutest_slim')
>>
>> % Request an email notification at the end of the job
>> % (you can also use e.g., ALL, see --mail-type in man sbatch)
>> ClusterInfo.setEmailNotification('END')
>>
>> % Request 2 hours of wall time
>> ClusterInfo.setWallTime('02:00:00')
>>
>> % 8 workers for 32 sims
>> j = c.batch(@parallel_example, 1, {32}, 'pool', 8, 'CurrentFolder', '.');

additionalSubmitArgs =

-n 9 -A sdutest_slim -t 02:00:00 --mail-type="END" --licenses=matlab:9

Any parameters set with ClusterInfo persist across both jobs and MATLAB sessions. To see the values of the current configuration options, call the state method. To clear a value, assign the property the appropriate empty value ('', [], or false).

>> ClusterInfo.state

Arch :
ClusterHost :
DataParallelism :
DiskSpace :
EmailNotification : END
GpusPerNode :
MemUsage :
PrivateKeyFile : /Users/testuser/.ssh/id_rsa
PrivateKeyFileHasPassPhrase : 1
ProcsPerNode :
ProjectName : sdutest_slim
QueueName :
RequireExclusiveNode : 0
Reservation :
SshPort :
UseGpu : 0
UserDefinedOptions :
UserNameOnCluster : testuser
WallTime : 02:00:00
>>
>> % Turn off email notification
>> ClusterInfo.setEmailNotification('')
>>


To learn more
To learn more about the MATLAB Parallel Computing Toolbox, check out these resources from MathWorks:


MATLAB Hosting Provider Agreement

Abacus 2.0 has a Hosting Provider Agreement with MathWorks that allows us to install MATLAB on our system and have users provide their own licence to access it. In particular, if your university has a licence for MATLAB that you can use on the computer in your office then it will most likely be possible to use it on our systems.

To do this, you need to check out a licence directly from your university’s licence servers – contact us at support@escience.sdu.dk to determine whether this has already been set up. Please use your official university e-mail address for this.

If it’s not already set up, we will ask you to contact your licence administrator to ask them to provide us with the details of their licence server.

Currently we have a setup for users from AU, AAU, DTU, KU, and SDU.


Known issues

List of currently known issues.

Parpool does not work
If you try to submit a job using parpool in MATLAB (including if you do a “Parallel pool test” in the MATLAB Cluster Profile Manager), this will fail with an error about parpool not being supported. This is an expected error, and as shown in the error message, you should instead use batch as shown in the examples above.

>> parpool('abacus_remote_r2017a',12)
...
Starting parallel pool (parpool) using the 'abacus_remote_r2017a' profile ...
Error using parpool (line 104). Failed to start a parallel pool.
...
****************************************************
abacus_remote_r2017a does not support calling
>> parpool('abacus_remote_r2017a',12)
Instead, use batch()
>> job = batch(...,'pool',12);
Call
>> doc batch
for more help on using batch.
****************************************************


Warning: Unable to change to requested folder
When submitting jobs using batch, you must also supply two additional parameters ('CurrentFolder', '.'); otherwise you get a warning about MATLAB not being able to change to the requested folder. When submitting the job, MATLAB records which folder the job was started from on your own computer/laptop and tries to start the job on Abacus in the same folder. As that folder most probably does not exist on Abacus, you get a warning. The warning can be ignored.

>> j = c.batch(@pwd, 1, {}); j.wait; j.fetchOutputs{:}

additionalSubmitArgs =

'-n 1 -A sysops_workq -t 01:00:00 --licenses=matlab:1'

Warning: The task with ID 1 issued the following warnings:
Warning: Unable to change to requested folder:
'/Users/testuser/Documents/MATLAB'. Current folder is:
'/gpfs/gss1/home/testuser'.
Reason: Cannot CD to /Users/testuser/Documents/MATLAB
(Name is nonexistent or not a directory).

ans =

'/gpfs/gss1/home/testuser'


Could not connect session fe.deic.sdu.dk
When you try to connect to Abacus from MATLAB on your own computer, you might get an error similar to the following. The error can be due to several factors, including in particular:

  • The username is not correct: check the output of ClusterInfo.state.
  • The public key matching your SSH private key has not been uploaded to the Abacus admin home page.
  • The SSH private key is not in the right format: check the output of ClusterInfo.state, and check that the referenced file is a text file similar in format to the (much shorter) SSH _public_ key you have uploaded to Abacus.
  • In particular for Windows users: check that the private key is in OpenSSH format and not PuTTY ppk format (which does not work with MATLAB).

If you have checked the above and still cannot make it work, restart MATLAB, try to connect again, and send us the output of ClusterInfo.state together with the exact time you tried to connect.

>> j = c.batch(@pwd, 1, {});
Error using parallel.Cluster/batch (line 154)
Job submission failed because the user supplied IndependentSubmitFcn
(profiles.sdu.abacus.independentSubmitFcn) errored.

Caused by:
Error using parallel.cluster.RemoteClusterAccess.getConnectedAccessWithMirror (line 262)
Could not connect to remote host fe.deic.sdu.dk.
Error using parallel.cluster.RemoteClusterAccess/connect (line 380)
Could not connect session fe.deic.sdu.dk: .