Please be aware that the page you are currently viewing is not for the latest available version!
This repository contains the BioWDL workflow which can be used for quality control, preprocessing and reporting of sequencing data.
These workflows are part of BioWDL, developed by the SASC team at Leiden University Medical Center.
Usage
QC.wdl can be run using Cromwell:
java -jar cromwell-<version>.jar run -i inputs.json QC.wdl
Input
Inputs are provided through a JSON file. The minimally required inputs are described below, but additional inputs are available. A template containing all possible inputs can be generated using Womtool, as described in the Womtool documentation. For an overview of all available inputs, see this page.
{
  "QC.read1": "Path to file with forward reads / unpaired reads"
}
QC.read1 is the only required input. In case of read pairs, the reverse read can be set with QC.read2.
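As a rough sketch, an inputs file for single-end or paired-end data could be assembled with a few lines of Python. The helper name `build_inputs` and the file paths are illustrative assumptions and not part of the workflow; only the `QC.read1` and `QC.read2` keys come from the description above.

```python
import json


def build_inputs(read1, read2=None):
    """Return a dict of QC workflow inputs (hypothetical helper).

    QC.read1 is always set; QC.read2 is added only for paired-end data.
    """
    inputs = {"QC.read1": read1}
    if read2 is not None:
        inputs["QC.read2"] = read2
    return inputs


# Placeholder paths; replace them with real FASTQ files.
paired = build_inputs("R1.fq.gz", "R2.fq.gz")

# Print the JSON that would be saved as inputs.json.
print(json.dumps(paired, indent=2))
```

Leaving out the second argument yields an inputs dictionary for unpaired reads.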
Optional inputs:
{
  "QC.read2": "Path to file with reverse reads",
  "QC.adapterForward": "The adapter for the forward reads (read1), default = \"AGATCGGAAGAG\"",
  "QC.adapterReverse": "The adapter for the reverse reads (read2), default = \"AGATCGGAAGAG\"",
  "QC.contaminations": "A list of contaminations to be cleaned with cutadapt (Optional)",
  "QC.runAdapterClipping": "Can be set to false to prevent cutadapt from running.",
  "QC.readgroupName": "What basename should be used to save the fastq files. By default will use the name of the fastq as in <name>.fq.gz"
}
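When working from a Womtool-generated template, it can help to separate required from optional inputs. The sketch below assumes that Womtool marks optional inputs with the text "(optional" in the type description. The function name `split_template` and the type strings in the example are illustrative and were not taken from a real template of this workflow.

```python
def split_template(template):
    """Split a Womtool-style inputs template into required and optional keys.

    Assumption: optional inputs carry "(optional" in their type description.
    """
    required, optional = [], []
    for key, type_description in template.items():
        if "(optional" in str(type_description):
            optional.append(key)
        else:
            required.append(key)
    return sorted(required), sorted(optional)


# Hand-written stand-in for a template; the type strings are illustrative.
example_template = {
    "QC.read1": "File",
    "QC.read2": "File? (optional)",
    "QC.runAdapterClipping": "Boolean (optional, default = true)",
}

required, optional = split_template(example_template)
print("required:", required)
print("optional:", optional)
```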
An output directory can be set using an options.json file. See the Cromwell documentation for more information.
Example options.json file:
{
  "final_workflow_outputs_dir": "my-analysis-output",
  "use_relative_output_paths": true,
  "default_runtime_attributes": {
    "docker_user": "$EUID"
  }
}
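An options file is passed to Cromwell's `run` command with the `-o` (`--options`) flag. The following sketch builds such a command line. The helper `cromwell_command` is a hypothetical name, and only the two output-related options from the example above are reproduced.

```python
import json
import shlex


def cromwell_command(jar, workflow, inputs, options=None):
    """Build the argument list for `cromwell run` (hypothetical helper)."""
    command = ["java", "-jar", jar, "run", "-i", inputs]
    if options is not None:
        # -o / --options points Cromwell at the workflow options file.
        command += ["-o", options]
    command.append(workflow)
    return command


# The two output-related keys from the example options.json.
options = {
    "final_workflow_outputs_dir": "my-analysis-output",
    "use_relative_output_paths": True,
}
print(json.dumps(options, indent=2))

# <version> is left as a placeholder, as in the usage line of this document.
command = cromwell_command(
    "cromwell-<version>.jar", "QC.wdl", "inputs.json", "options.json"
)
print(shlex.join(command))
```

Returning a list rather than a single string means the command can be handed to `subprocess.run` without shell quoting concerns.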
Alternatively, an output directory can be set with QC.outputDir. QC.outputDir must be mounted in the Docker container. Cromwell will need a custom configuration to allow this.
Example
An example of an inputs.json might look like this:
{
  "QC.read1": "/home/user/samples/sample_1/lib_1/rg_1/R1.fq.gz",
  "QC.read2": "/home/user/samples/sample_1/lib_1/rg_1/R2.fq.gz",
  "QC.Cutadapt.adapter": ["AGATCGGAAGAG"],
  "QC.Cutadapt.adapterRead2": ["AGATCGGAAGAG"]
}
Note that adapterBoth uses a list of strings instead of a single string. This is because cutadapt accepts multiple adapters.
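Because the adapter inputs expect lists, a single adapter sequence has to be wrapped in a list before it is written to the inputs file. A minimal sketch, in which the helper name `as_adapter_list` is an assumption made for illustration:

```python
import json


def as_adapter_list(value):
    """Wrap a single adapter string in a list; copy lists unchanged."""
    if isinstance(value, str):
        return [value]
    return list(value)


# Keys taken from the example inputs.json above.
inputs = {
    "QC.Cutadapt.adapter": as_adapter_list("AGATCGGAAGAG"),
    "QC.Cutadapt.adapterRead2": as_adapter_list(["AGATCGGAAGAG"]),
}
print(json.dumps(inputs, indent=2))
```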
Dependency requirements and tool versions
BioWDL pipelines use Docker images to ensure reproducibility. This means that BioWDL pipelines will run on any system that has Docker installed. Alternatively, they can be run with Singularity.
For more advanced configuration of Docker or Singularity, please check the Cromwell documentation on containers.
Images from BioContainers are preferred for BioWDL pipelines. The list of default images for this pipeline can be found in the default for the dockerImages input.
Output
A new set of FASTQ files from which detected adapters have been clipped and a set of quality reports.
Contact
For any questions related to these workflows, please use the GitHub issue tracker or contact the SASC team directly at sasc@lumc.nl.