Create a Grid Engine cluster with Preemptible VM workers¶
The properly rendered version of this document can be found at Read The Docs.
If you are reading this on github, you should instead click here.
With Compute Engine Preemptible Virtual Machines you can create and run VMs in the cloud at a much lower price than normal instances. The trade-off for the lower price is that individual instances will run for at most 24 hours.
This trade-off is often a very good fit for distributed batch compute jobs, such as a large Grid Engine job consisting of many small stateless tasks. When a preemptible VM is terminated, only the work of the current task running on it is lost, and the lost task can be requeued for execution on another worker node. The preempted VM can also often be replaced to bring the cluster back to full strength.
This document builds on the instructions to Create a Grid Engine cluster on Compute Engine to create a Grid Engine cluster of preemptible VMs.
The instructions presented here are guidelines that have been used to create clusters up to 100 nodes. However when preemption rates are high, Elasticluster’s re-provisioning of clusters (via Ansible) often converges too slowly due to repeated failures.
For best success with the instructions here, it is recommended to keep cluster sizes to 20 compute nodes or fewer. For larger clusters, use regular (non-preemptible) virtual machines.
To run a Grid Engine workload on preemptible VMs, the instructions here employ three tools:
Setting up your cluster¶
To create a Grid Engine cluster with preemptible VMs, follow the instructions provided in Create a Grid Engine cluster on Compute Engine and configure the compute nodes of your cluster to be preemptible.
A gridengine cluster is composed of one
frontend node, and multiple
compute nodes. The
frontend node should NOT be preemptible. Only the
compute nodes should be.
If your cluster is named
gridengine then after you
Create your cluster definition file,
configure the compute nodes to be preemptible by adding the following to
Monitoring your cluster¶
As the cluster runs, Compute Engine instances will be automatically terminated independently some time within 24 hours of being created. You typically will want the following to happen:
- TERMINATED instances get removed from the cluster
- New instances are created to replace TERMINATED instances
This is what the cluster_monitor.sh script does.
The cluster monitoring script is available in the grid-computing-tools github project.
Downloading the grid-computing-tools¶
To clone the grid-computing-tools github project, issue the following:
git clone https://github.com/googlegenomics/grid-computing-tools.git
It is recommended that you clone the
grid-computing-tools project into a
sibling directory of your elasticluster directory.
Be sure that the elasticluster executable is in your PATH. You can do this by setting the PATH explicitly or by activating the elasticluster virtualenv in your shell:
Then run the cluster monitor script:
The script will run continuously; to terminate the script, hit
By default, the monitor will check the cluster status and then sleep for 10 minutes. To change the sleep interval, you can pass an additional argument on the command line, for example:
grid-computing-tools/bin/cluster_monitor.sh gridengine 5
would sleep for 5 minutes between checks.
To grow your cluster¶
To increase the number of workers in your cluster while it is running,
compute_nodes value in
For example, to increase the number of compute nodes from the 3
specified in the Create a Grid Engine cluster on Compute Engine instructions to 10, set:
[cluster/gridengine] ... compute_nodes=10 ...
The next time the cluster monitor wakes up, it will add nodes to the cluster to reach the new value.
To shrink your cluster¶
To reduce the number of workers in your cluster while it is running,
compute_nodes value in
As the preemptible VMs are terminated, the cluster monitor will remove them from the cluster, and will only replace instances if the total number in the cluster is less than the configured value. You can also manually terminate instances if desired.
Monitoring your job¶
When nodes are TERMINATED, any tasks running on those nodes need to be restarted. If the TERMINATED node is re-added by the cluster monitor, and the task is NOT submitted for restart, then the new node may sit idle (if the new node has the same name as the TERMINATED node).
Independent of node terminations, tasks can also stall due to programming bugs or unexpected resource contention. Failing to restart stalled tasks results in a node effectively sitting idle.
- For each task allocated to a node:
- Get the associated node’s uptime
- Restart the task if
- the node is down
- the node’s uptime is less than the task’s running time (meaning that the node has been replaced since the task started)
- the task runtime is longer than a configurable timeout interval (optional)
Note: when you launch your job on the Grid Engine cluster, be sure to mark
the job as “restartable”. This can be done by passing the flag
-r y to
Upload the job monitor script¶
The job monitor script must be run on the cluster’s
elasticluster sftp gridengine << EOF mkdir tools put tools/array_job_monitor tools/ EOF
Run the job monitor script¶
To run the
array_job_monitor.sh, ssh to the frontend instance:
elasticluster ssh gridengine
- Grid Engine job ID to monitor
Minutes to sleep between checks of running tasks
Default: 15 minutes
Number of minutes a task may run before it is considered stalled, and is eligible to be resubmitted.
Grid Engine job queue the job_id is associated with
For example, to monitor job 1, every 5 minutes, for jobs that should not take more than 10 minutes:
./tools/array_job_monitor.sh 1 5 10