This repository contains the BioWDL workflow for quality control, preprocessing, and reporting of sequencing data.
This workflow is part of BioWDL developed by the SASC team at Leiden University Medical Center.
Usage
This workflow can be run using Cromwell.
First, download the latest version of the workflow WDL file(s) from the GitHub page.
The workflow can then be started with the following command:
java \
-jar cromwell-<version>.jar \
run \
-o options.json \
-i inputs.json \
QC.wdl
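The command above can also be assembled from a script, which helps when launching many runs. The following is a minimal Python sketch that builds the same argument list. The helper name `cromwell_run_command` and the jar name `cromwell-XY.jar` are hypothetical placeholders; substitute the Cromwell version you actually downloaded.

```python
import shlex
from pathlib import Path
from typing import List


def cromwell_run_command(cromwell_jar: Path, workflow: Path,
                         inputs: Path, options: Path) -> List[str]:
    """Build the `java -jar cromwell-<version>.jar run ...` argument list."""
    return [
        "java",
        "-jar", str(cromwell_jar),
        "run",
        "-o", str(options),  # workflow options file (e.g. output directory)
        "-i", str(inputs),   # workflow inputs file
        str(workflow),       # the QC.wdl file downloaded from GitHub
    ]


if __name__ == "__main__":
    # "cromwell-XY.jar" is a hypothetical placeholder for the real jar name.
    cmd = cromwell_run_command(Path("cromwell-XY.jar"), Path("QC.wdl"),
                               Path("inputs.json"), Path("options.json"))
    print(shlex.join(cmd))
    # To actually start the run: subprocess.run(cmd, check=True)
```

Passing a list (rather than one shell string) to `subprocess.run` avoids quoting problems when paths contain spaces.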
Inputs
Inputs are provided through a JSON file. The minimally required inputs are described below, but additional inputs are available. A template containing all possible inputs can be generated using Womtool, as described in the Womtool documentation. For an overview of all available inputs, see this page.
{
"QC.read1": "Path to file with forward reads / unpaired reads"
}
QC.read1 is the only required input. For paired-end data, the file with the reverse reads can be set with QC.read2.
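Because QC.read1 is the only required key and QC.read2 is only added for paired-end data, an inputs file can be generated programmatically. A minimal Python sketch, assuming hypothetical file names `R1.fq.gz` and `R2.fq.gz`; the helper `qc_inputs` is illustrative and not part of BioWDL:

```python
import json
from typing import Optional


def qc_inputs(read1: str, read2: Optional[str] = None) -> dict:
    """Return a minimal QC inputs mapping.

    Only QC.read1 is required; QC.read2 is added only for paired-end data.
    """
    inputs = {"QC.read1": read1}
    if read2 is not None:
        inputs["QC.read2"] = read2
    return inputs


if __name__ == "__main__":
    # Single-end and paired-end variants; either could be saved as inputs.json.
    print(json.dumps(qc_inputs("R1.fq.gz"), indent=2))
    print(json.dumps(qc_inputs("R1.fq.gz", "R2.fq.gz"), indent=2))
```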
Optional settings:
{
"QC.read2": "Path to file with reverse reads.",
"QC.adapterForward": "The adapter for the forward reads (read1), default = \"AGATCGGAAGAG\".",
"QC.adapterReverse": "The adapter for the reverse reads (read2), default = \"AGATCGGAAGAG\".",
"QC.contaminations": "A list of contamination sequences to be removed with cutadapt.",
"QC.runAdapterClipping": "Can be set to false to prevent cutadapt from running.",
"QC.readgroupName": "The basename used when saving the FASTQ files. By default, the name of the input FASTQ file is used (<name> in <name>.fq.gz)."
}
An output directory can be set using an options.json file. See the Cromwell documentation for more information.
Example options.json file:
{
"final_workflow_outputs_dir": "my-analysis-output",
"use_relative_output_paths": true,
"default_runtime_attributes": {
"docker_user": "$EUID"
}
}
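The options file can likewise be generated from a script. The Python sketch below reproduces the example options.json. One assumption differs from the example: instead of the literal "$EUID" placeholder, it fills docker_user with the numeric effective user ID from `os.geteuid()` (a POSIX-only call). Whether that is what your setup needs is an assumption to check against the Cromwell documentation; the helper name `cromwell_options` is hypothetical.

```python
import json
import os


def cromwell_options(output_dir: str) -> dict:
    """Build a workflow options mapping shaped like the example options.json."""
    return {
        "final_workflow_outputs_dir": output_dir,
        "use_relative_output_paths": True,
        "default_runtime_attributes": {
            # Assumption: write the numeric effective user ID directly
            # instead of the "$EUID" placeholder (POSIX-only).
            "docker_user": str(os.geteuid()),
        },
    }


if __name__ == "__main__":
    print(json.dumps(cromwell_options("my-analysis-output"), indent=2))
```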
Alternatively, an output directory can be set with QC.outputDir.
QC.outputDir must be mounted in the Docker container, and Cromwell will need a custom configuration to allow this.
Example
The following is an example of what an inputs JSON might look like:
{
"QC.read1":"/home/user/samples/sample_1/lib_1/rg_1/R1.fq.gz",
"QC.read2":"/home/user/samples/sample_1/lib_1/rg_1/R2.fq.gz",
"QC.adapterForward": ["AGATCGGAAGAG"],
"QC.adapterReverse": ["AGATCGGAAGAG"]
}
Note that QC.adapterForward and QC.adapterReverse take a list of strings instead of a single string in this example, because cutadapt accepts multiple adapters.
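For many read groups laid out like the example paths, inputs can be built per directory. A minimal Python sketch, assuming (as in the example above) that each read-group directory holds `R1.fq.gz` and `R2.fq.gz` and that the adapter inputs take lists. A single adapter string is wrapped in a list so that one or several adapters can be passed. The helpers `as_adapter_list` and `read_group_inputs` are hypothetical.

```python
import json
from pathlib import PurePosixPath
from typing import List, Union

DEFAULT_ADAPTER = "AGATCGGAAGAG"  # default adapter named in this README


def as_adapter_list(adapters: Union[str, List[str]]) -> List[str]:
    """Wrap a single adapter sequence in a list; copy lists unchanged."""
    return [adapters] if isinstance(adapters, str) else list(adapters)


def read_group_inputs(rg_dir: PurePosixPath,
                      adapters: Union[str, List[str]] = DEFAULT_ADAPTER) -> dict:
    """Build paired-end inputs, assuming R1.fq.gz/R2.fq.gz file names."""
    adapter_list = as_adapter_list(adapters)
    return {
        "QC.read1": str(rg_dir / "R1.fq.gz"),
        "QC.read2": str(rg_dir / "R2.fq.gz"),
        "QC.adapterForward": adapter_list,
        "QC.adapterReverse": list(adapter_list),  # separate copy
    }


if __name__ == "__main__":
    rg = PurePosixPath("/home/user/samples/sample_1/lib_1/rg_1")
    print(json.dumps(read_group_inputs(rg), indent=2))
```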
Dependency requirements and tool versions
BioWDL workflows use Docker images to ensure reproducibility. This means that BioWDL workflows will run on any system that has Docker installed. Alternatively, they can be run with Singularity.
For more advanced configuration of Docker or Singularity, please check the Cromwell documentation on containers.
Images from BioContainers are preferred for BioWDL workflows. The list of default images for this workflow can be found in the default value of the dockerImages input.
Output
The workflow outputs a new set of FASTQ files, from which detected adapters have been clipped, and a set of quality reports.
Contact
For questions about running this workflow, or for feature requests (such as adding tools and options), please use the GitHub issue tracker or contact the SASC team directly at sasc@lumc.nl.