Singularity Containers on Spiedie (advanced)

This guide illustrates the use of Singularity containers to run a job on a GPU compute node on Spiedie.

If you’d like to run the same job using Conda instead of Singularity, click here!

Click here for more information on when to use Conda vs. Singularity vs. Modules.

Things covered in this guide:

  1. Creating or pulling Singularity containers
  2. Binding module directories to the container environment
  3. Running a GPU-enabled workload

Requirements to complete this guide:

  1. Familiarity with Spiedie (try the quick start if you haven’t)
  2. Local installation of Singularity
  3. Familiarity with shell commands and Python

Creating or Pulling Singularity Containers

For this tutorial, we will use a custom Singularity image to run GPU-enabled TensorFlow on a GPU compute node.

We have two options for creating our container image: build the image on a local machine and transfer it to Spiedie, or generate the image on the Singularity Hub container repository.

Creating Singularity Containers

If you would like to create the image locally first, you must install Singularity on a machine on which you have root access.

We will use the verified TensorFlow-GPU recipe available on the Singularity Recipe Hub.

Once you have downloaded the recipe, build the image by simply running:

sudo singularity build spiedie_tf_gpu.simg spiedie_tensorflow_gpu.def 

** Note: This may take a few minutes depending on the speed of your machine. The resulting .simg file may be larger than 3 GB. **

Once the container image is created, we will need to transfer the image to Spiedie.

Log in to Spiedie and create a new directory for our new project.

mkdir GPU_Compute_Example

Transfer the image using scp from your local machine:

scp spiedie_tf_gpu.simg

For more data transfer instructions, click here.
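Because the image can be several gigabytes, it may be worth confirming that the copy on Spiedie matches the local file. The sketch below is not part of the original guide; it assumes sha256sum is available on both machines, and the helper names file_digest and digest_matches are made up for illustration.

```shell
# Print only the SHA-256 hex digest of a file
# (sha256sum also prints the file name, which we drop).
file_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Succeed when the file's digest equals the expected digest,
# i.e. the value file_digest printed on the other machine.
digest_matches() {
    local file="$1" expected="$2"
    [ "$(file_digest "$file")" = "$expected" ]
}
```

One way to use it: run file_digest spiedie_tf_gpu.simg on the local machine, then call digest_matches spiedie_tf_gpu.simg with that digest on Spiedie after the transfer finishes.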

We will be running a simple 5000-element dot product on our P100 GPUs and logging the device placement. You can download the source code here.

Download the source code and transfer it to the project directory from your local machine with:


Once the source code is uploaded, we can write the batch script to submit our job request.

Create a new file in the same directory on Spiedie called,


Using your preferred editor such as nano, emacs, or vim, edit the new file directly on Spiedie.

You can also locally write the script and transfer it once you are done.

The first line in the batch script must be the shebang, so we start with:

#!/bin/bash

Next, we will name our job so that we can monitor it on the SLURM queue. To assign a job name, add:

#SBATCH --job-name=CUBLASTEST

This will name the job CUBLASTEST.

Next, we will assign an output file to log all the standard output from our program.

#SBATCH --output=tf_gpu_output.log

This will direct the output of the program to the tf_gpu_output.log file.

Next, we must request the correct partition so that our program has access to the P100 GPUs available on Spiedie. We therefore request the gpucompute partition with:

#SBATCH --partition=gpucompute

We can use the default number of nodes (1) and default memory for this program.

Finally, we should also let SLURM know how many tasks we will require for our program. Since we will not be using any parallel CPU computation, we will only request one.

#SBATCH --ntasks=1

We’ve finished defining our resource allocation parameters for our job.

Binding Module Directories

We must first load the Singularity module installed on Spiedie. Append the following line to the batch script:

module load singularity/3.1.1

We need the CUDA drivers available on Spiedie to successfully access the GPUs.

We must load the CUDA toolkit and the associated driver with:

module load cuda10.0/toolkit/10.0.130

The loaded library will not be available to the container automatically, so before we run the container, we must expose the module directory to the container.

We can make additional directories available inside the container by binding them. We add the paths by setting the SINGULARITY_BINDPATH variable.

We can also bind paths using the -B/--bind flag. Click here for more details.

export SINGULARITY_BINDPATH="/cm/shared/apps/cuda10.0/toolkit/current/lib64/,/cm/local/apps/cuda-driver/libs/418.40.04/lib64/"
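The same bind path can also be assembled from separate variables, which may be easier to maintain if the toolkit or driver version changes. This is only a sketch: the helper name join_bind_paths and the two variable names are made up for illustration, and the existence check is an optional extra rather than something Singularity requires.

```shell
# Join the given directories with commas, the separator
# used in the SINGULARITY_BINDPATH value above.
join_bind_paths() {
    local IFS=','
    printf '%s\n' "$*"
}

# The two library directories used in this guide.
CUDA_TOOLKIT_LIBS="/cm/shared/apps/cuda10.0/toolkit/current/lib64/"
CUDA_DRIVER_LIBS="/cm/local/apps/cuda-driver/libs/418.40.04/lib64/"

# Warn (on stderr) about any directory missing on the current node;
# a missing directory usually points to a typo or a changed module version.
for dir in "$CUDA_TOOLKIT_LIBS" "$CUDA_DRIVER_LIBS"; do
    [ -d "$dir" ] || echo "warning: $dir not found on this node" >&2
done

export SINGULARITY_BINDPATH="$(join_bind_paths "$CUDA_TOOLKIT_LIBS" "$CUDA_DRIVER_LIBS")"
```

The resulting value is identical to the single-line export shown above.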

Running a GPU-enabled Workload

Finally, our environment is set up and we can run our test file. Add the line:

singularity run spiedie_tf_gpu.simg python3.6

The complete file:

#!/bin/bash
#SBATCH --job-name=CUBLASTEST
#SBATCH --output=tf_gpu_output.log
#SBATCH --partition=gpucompute
#SBATCH --ntasks=1

module load singularity/3.1.1
module load cuda10.0/toolkit/10.0.130
export SINGULARITY_BINDPATH="/cm/shared/apps/cuda10.0/toolkit/current/lib64/,/cm/local/apps/cuda-driver/libs/418.40.04/lib64/"
singularity run spiedie_tf_gpu.simg python3.6

Click here to download the complete batch file.

We can now queue the job to SLURM with:


The job should be queued and the output logged in tf_gpu_output.log.
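When a job is submitted with sbatch, SLURM prints a confirmation line of the form "Submitted batch job <id>". The sketch below shows one way to capture that ID for later monitoring. The helper name parse_job_id and the ID 123456 are made up for illustration.

```shell
# Extract the numeric job ID from sbatch's confirmation line,
# which has the form "Submitted batch job <id>".
# Prints nothing if the line does not match (e.g. an error message).
parse_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job [0-9]+/ {print $4}'
}

# Example with a made-up confirmation line (the ID is hypothetical).
job_id="$(parse_job_id "Submitted batch job 123456")"
echo "job id: $job_id"

# On Spiedie the ID could then be passed to standard SLURM commands:
#   squeue -j "$job_id"    # show the job while it is pending or running
#   scancel "$job_id"      # cancel the job
```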

Once the job has finished, you can view the log with:

cat tf_gpu_output.log

You should see the following output:

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla P100-PCIE-12GB, pci bus id: 0000:83:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
2019-07-15 13:42:49.767029: I tensorflow/core/common_runtime/] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla P100-PCIE-12GB, pci bus id: 0000:83:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768241: I tensorflow/core/common_runtime/] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768275: I tensorflow/core/common_runtime/] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768300: I tensorflow/core/common_runtime/] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
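Rather than reading the log by eye, the placement can be checked with grep. This is a minimal sketch that assumes the log has the format shown above. The helper names count_gpus and matmul_on_gpu are made up for illustration.

```shell
# Count the distinct physical GPUs listed in a device-placement log.
# Matches entries such as ".../device:GPU:0 -> device: 0, name: Tesla ...".
# XLA_GPU entries are not counted: "device:XLA_GPU:" does not contain "device:GPU:".
count_gpus() {
    grep -o 'device:GPU:[0-9]* -> device' "$1" | sort -u | wc -l | tr -d ' '
}

# Succeed if the MatMul operation was placed on a GPU.
matmul_on_gpu() {
    grep -q 'MatMul.*device:GPU:' "$1"
}

# Only inspect the log if it exists in the current directory.
log="tf_gpu_output.log"
if [ -f "$log" ]; then
    echo "GPUs visible: $(count_gpus "$log")"
    if matmul_on_gpu "$log"; then
        echo "MatMul was placed on a GPU"
    else
        echo "MatMul was not placed on a GPU"
    fi
fi
```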