Workflow Services Overview

The workflow service runs Common Workflow Language (CWL) executions. CWL is an open standard for describing analysis workflows and tools. The workflow service uses Cromwell as the orchestration layer to manage the steps within a workflow, and a custom plugin ensures that step executions are performed in a highly secure environment by the Task Service. You can initiate CWL executions through the LifeOmic CLI (Command Line Interface), the PHC Console Add Workflow page, or the PHC SDK for Python.

Before using the CLI or the PHC web console to run workflows, log in to your PHC account.

General concept

The workflow service requires that all CWL resource and dependency files exist within the PHC File Service. Once all resources are in place, you execute a run of the CWL through the CLI or the PHC web console. The Automation page lists all workflows along with their current states, start times, and run times.
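The end-to-end flow with the CLI looks roughly like the following sketch. The commands are the ones used in the walkthrough later on this page; <datasetId> and the file IDs are placeholders you must fill in with your own values:

```shell
# Upload the master CWL, its step dependencies, and the inputs file
# to the PHC File Service, then start the workflow run.
lo files upload ./bam_master.cwl <datasetId>
lo files upload ./bamindex.cwl <datasetId>
lo files upload ./bam_inputs.json <datasetId>
lo workflows create <datasetId> -n "BAM Indexing" \
  -w <masterCwlFileId> -f <inputsFileId> -d <cwlDependenciesFileId>
```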

Automation Landing Page

From this page, select an individual execution. This view displays a graph of the workflow using the Rabix CWL-SVG open source library for generating visualizations. It also lists the individual steps of the workflow as run by the Task Service.

View Workflow

Note: Click the folder icon at the top right of this view to display the workflow files. The files include all of the outputs generated by the workflow. Each output is saved in a directory with the name of the CWL step that generated it.

View Workflow File Outputs

Use the CLI to run a basic workflow

The following steps demonstrate how to run a simple workflow that generates an index for a BAM file. Note that CWL has a fairly broad syntax, and this is just a simple example; see Common Workflow Language for more general information. The workflow service implements only a subset of the full CWL feature set; see Workflow Service limitations below for the current limitations.

To use the PHC web console to run a workflow, see Add a Workflow.

Generate and upload the CWL resources

Here is a sample master CWL file.

This file describes:

  • Two inputs, a bamfile and a filename for the index of the BAM file

  • One output, the index file that will have the name provided by the input above

  • One step, named index_bam, which runs the CWL file bamindex.cwl

Generate this file, then upload it using the CLI, e.g. lo files upload ./bam_master.cwl <datasetId>

cwlVersion: v1.0
class: Workflow

inputs:
  bamfile: File
  bamindexfilename: string

outputs:
  bamindexout:
    type: File
    outputSource: index_bam/bamindexout

steps:
  index_bam:
    run: bamindex.cwl
    in:
      bamfile: bamfile
      bamindexfilename: bamindexfilename
    out: [bamindexout]
Here is a sample CWL file for the step.

This file describes:

  • The type of tool used, in this case CommandLineTool

  • The Docker container that will run the step

    Note: The workflow service requires that all steps use a Docker container for execution. This allows for secure execution within the Task Service.

  • Two inputs, the BAM file and the index filename

    Note: In this example, the two inputs are also used as arguments to the baseCommand; notice the inputBinding and position values.

  • One output, in this case the input filename is re-used to name the output file

  • The command that the container runs

Generate this file, then upload it using the CLI, e.g. lo files upload ./bamindex.cwl <datasetId>

cwlVersion: v1.0
class: CommandLineTool
requirements:
  DockerRequirement:
    dockerPull: genomicpariscentre/samtools

inputs:
  bamfile:
    type: File
    inputBinding:
      position: 1
  bamindexfilename:
    type: string
    inputBinding:
      position: 2

outputs:
  bamindexout:
    type: File
    outputBinding:
      glob: $(inputs.bamindexfilename)

baseCommand: ['samtools', 'index']
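As a rough illustration of how the inputBinding positions order the arguments after the baseCommand (a sketch only; the actual binding is performed by the CWL runner, and the filenames here are hypothetical):

```python
# Sketch: CWL-style argument assembly by inputBinding position.
# The real binding is done by the CWL runner; values are hypothetical.
base_command = ["samtools", "index"]
bound_inputs = [
    ("HG00463.bam", 1),      # bamfile, position 1
    ("HG00463.bam.bai", 2),  # bamindexfilename, position 2
]
# Sort bound inputs by position, then append them to the base command.
args = [value for value, _pos in sorted(bound_inputs, key=lambda b: b[1])]
command = base_command + args
# command == ["samtools", "index", "HG00463.bam", "HG00463.bam.bai"]
```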
Next, a JSON file provides the inputs.

This file describes:

  • A file input, using the class File and the ID of the file

  • The name of the desired output file

{
  "bamfile": {
    "class": "File",
    "fileId": "805209e1-35cb-49f3-a5cc-327a93d1f72d"
  },
  "bamindexfilename": "HG00463.bam.bai"
}

Generate this file, then upload it using the CLI, e.g. lo files upload ./bam_inputs.json <datasetId>
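If you prefer to generate the inputs file programmatically, a minimal sketch follows; the fileId shown is the example value from above:

```python
import json

# Build the workflow inputs shown above and write them to bam_inputs.json.
inputs = {
    "bamfile": {
        "class": "File",
        "fileId": "805209e1-35cb-49f3-a5cc-327a93d1f72d",
    },
    "bamindexfilename": "HG00463.bam.bai",
}

with open("bam_inputs.json", "w") as f:
    json.dump(inputs, f, indent=2)
```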

Finally, we are ready to run the workflow using the CLI:

lo workflows create <datasetId> -n "BAM Indexing" -w <masterCwlFileId> -f <inputsFileId> -d <cwlDependenciesFileId>

And that's it: the workflow is running. Go to the Automation view to see the list of workflows, and select the one you started to examine it in detail.

Using a non-public image

The Task Service can use non-public Docker images; see Using a non-public image. The syntax for this in the workflow service is as follows:

In the required DockerRequirement, prefix the name of the private container with lifeomic_private/. This tells the workflow service to handle the image as a non-public image. Then add a file input type to the CWL master and step files, treating it as any other file input.

dockerPull: lifeomic_private/my_private_image
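In context, the requirements block in the step CWL might look like this sketch, where my_private_image is a placeholder:

```yaml
requirements:
  DockerRequirement:
    dockerPull: lifeomic_private/my_private_image
```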

Using an image from the Tool Registry Service

The Task Service can also use an image stored within the Tool Registry Service. The syntax for this in the workflow service is as follows:

In the required DockerRequirement, prefix the tool name with lifeomic_tool/, followed by the account that owns the image (account_owning_image/), and lastly the image name with an optional version (my_tool_image:1.0.0). If no version is supplied, the version currently marked as default is used. The complete path gives the workflow service all the details needed to pull the image of that name, owned by that account, from the Tool Registry Service.

dockerPull: lifeomic_tool/account_owning_image/my_tool_image:1.0.0
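The naming convention decomposes as shown in this hypothetical parser (illustrative only; the workflow service's actual parsing is internal, and parse_tool_image is not a real API):

```python
# Hypothetical parser for the lifeomic_tool/ dockerPull convention.
# Not a real API; for illustration of the path structure only.
def parse_tool_image(docker_pull):
    prefix = "lifeomic_tool/"
    if not docker_pull.startswith(prefix):
        return None  # not a Tool Registry Service image
    account, _, image_ref = docker_pull[len(prefix):].partition("/")
    image, _, version = image_ref.partition(":")
    # If no version is supplied, the version marked as default is used.
    return {"account": account, "image": image, "version": version or "default"}
```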

Glob Pattern Handling

The supported syntax for handling glob patterns in output files has limitations when the pattern includes multiple unknown directories. The following examples explain this limitation in detail.

  • The pattern /tmp/**/*.txt will look for a *.txt file within any one subdirectory. This is the best-supported case.
    • For example, the pattern /tmp/**/*.txt finds output.txt at location /tmp/foo/output.txt.
  • The pattern /tmp/**/*.txt should also find *.txt files under multiple subdirectories, but currently a single ** matches only one subdirectory level.
    • For example, the pattern /tmp/**/*.txt does not find output.txt at location /tmp/foo/bar/output.txt.
  • If the number of subdirectories is known, you can work around this limitation by including /** for each directory level.
    • For example, the pattern /tmp/**/**/*.txt finds output.txt at location /tmp/foo/bar/output.txt.
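Python's non-recursive glob behaves analogously, with each * wildcard (like each ** here) matching exactly one directory level; a small sketch:

```python
import glob
import os
import tempfile

# Create a layout like the examples above:
#   <root>/foo/output.txt  and  <root>/foo/bar/nested.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo", "bar"))
open(os.path.join(root, "foo", "output.txt"), "w").close()
open(os.path.join(root, "foo", "bar", "nested.txt"), "w").close()

# One wildcard level finds only the file one directory deep...
one_level = glob.glob(os.path.join(root, "*", "*.txt"))
# ...and a second wildcard level is needed to reach the deeper file.
two_level = glob.glob(os.path.join(root, "*", "*", "*.txt"))
```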

Workflow Service limitations

The full CWL syntax is not currently supported. Some CWL Requirements are required (e.g. DockerRequirement), while others will most likely not be supported due to security concerns (e.g. InlineJavascriptRequirement). As support for other Requirements is added, they will be listed here. Due to the explicit nature of file-service handling, CWL secondary files are also not supported: each file must be explicitly listed as a file input, with its ID provided in the inputs.

Supported Requirements

  • DockerRequirement (also a required value)