BioWDL: QC

A BioWDL workflow for gathering QC metrics on and preprocessing FASTQ files.

This repository contains the BioWDL workflow that can be used for quality control, preprocessing and reporting of sequencing data.

This workflow is part of BioWDL developed by the SASC team at Leiden University Medical Center.

Usage

This workflow can be run using Cromwell:

First, download the latest version of the workflow WDL file(s) from the GitHub page.

The workflow can then be started with the following command:

java \
    -jar cromwell-<version>.jar \
    run \
    -o options.json \
    -i inputs.json \
    QC.wdl

Inputs

Inputs are provided through a JSON file. The minimally required inputs are described below, but additional inputs are available. A template containing all possible inputs can be generated using Womtool, as described in the Womtool documentation. For an overview of all available inputs, see this page.

{
    "QC.read1": "Path to file with forward reads / unpaired reads"
}

QC.read1 is the only required input. For paired-end data, the file with reverse reads can be set with QC.read2.
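When inputs files are generated per sample, a small script can write the minimal JSON. This sketch only uses the two keys named above (`QC.read1`, plus `QC.read2` for paired-end data); the function names `make_qc_inputs` and `write_inputs` are hypothetical.

```python
import json
from typing import Dict, Optional


def make_qc_inputs(read1: str, read2: Optional[str] = None) -> Dict[str, str]:
    """Build the minimal QC inputs; QC.read2 is added only for paired-end data."""
    inputs = {"QC.read1": read1}
    if read2 is not None:
        inputs["QC.read2"] = read2
    return inputs


def write_inputs(path: str, inputs: Dict[str, str]) -> None:
    """Serialize the inputs to a JSON file that can be passed to Cromwell with -i."""
    with open(path, "w") as handle:
        json.dump(inputs, handle, indent=4)
```

For example, `write_inputs("inputs.json", make_qc_inputs("R1.fq.gz", "R2.fq.gz"))` writes a paired-end inputs file.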

Optional settings:

{
    "QC.read2": "Path to file with reverse reads.",
    "QC.adapterForward":  "The adapter for the forward reads (read1), default = \"AGATCGGAAGAG\".",
    "QC.adapterReverse": "The adapter for the reverse reads (read2), default = \"AGATCGGAAGAG\".",
    "QC.contaminations": "A list of contaminant sequences to be removed with cutadapt.",
    "QC.runAdapterClipping": "Can be set to false to prevent cutadapt from running.",
    "QC.readgroupName": "The basename used when saving the FASTQ files. By default, the name of the input FASTQ file is used, as in <name>.fq.gz."
}
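A quick pre-flight check can catch a missing QC.read1 or a misspelled key before a run is submitted. This is a hypothetical helper, not part of BioWDL. It only knows the keys mentioned in this README, so inputs that are valid but not documented here will be reported as unknown and should be checked against the full input overview.

```python
from typing import Dict, List

# Keys mentioned in this README only; the workflow accepts more inputs.
REQUIRED_KEYS = {"QC.read1"}
OPTIONAL_KEYS = {
    "QC.read2",
    "QC.adapterForward",
    "QC.adapterReverse",
    "QC.contaminations",
    "QC.runAdapterClipping",
    "QC.readgroupName",
    "QC.outputDir",
}


def check_input_keys(inputs: Dict[str, object]) -> List[str]:
    """Raise KeyError if a required key is missing.

    Returns the sorted list of keys not mentioned in this README, which are
    either typos or inputs documented elsewhere.
    """
    missing = REQUIRED_KEYS - set(inputs)
    if missing:
        raise KeyError(f"missing required input(s): {sorted(missing)}")
    return sorted(set(inputs) - REQUIRED_KEYS - OPTIONAL_KEYS)
```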

An output directory can be set using an options.json file. See the Cromwell documentation for more information.

Example options.json file:

{
    "final_workflow_outputs_dir": "my-analysis-output",
    "use_relative_output_paths": true,
    "default_runtime_attributes": {
        "docker_user": "$EUID"
    }
}
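The same options file can be produced from a script. This sketch writes exactly the keys shown in the example above and stores `$EUID` as a literal string, as the example does. The helper names `make_options` and `options_json` are hypothetical.

```python
import json
from typing import Any, Dict


def make_options(output_dir: str) -> Dict[str, Any]:
    """Cromwell workflow options matching the example options.json above."""
    return {
        "final_workflow_outputs_dir": output_dir,
        "use_relative_output_paths": True,
        "default_runtime_attributes": {
            # Kept as the literal string "$EUID", exactly as in the example file.
            "docker_user": "$EUID",
        },
    }


def options_json(output_dir: str) -> str:
    """Render the options as JSON text, ready to be saved as options.json."""
    return json.dumps(make_options(output_dir), indent=4)
```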

Alternatively, an output directory can be set with QC.outputDir. This directory must be mounted in the Docker container, which requires a custom Cromwell configuration.

Example

The following is an example of what an inputs JSON might look like:

{
    "QC.read1":"/home/user/samples/sample_1/lib_1/rg_1/R1.fq.gz",
    "QC.read2":"/home/user/samples/sample_1/lib_1/rg_1/R2.fq.gz",
    "QC.adapterForward": ["AGATCGGAAGAG"],
    "QC.adapterReverse": ["AGATCGGAAGAG"]
}

Note that QC.adapterForward and QC.adapterReverse are given as lists of strings instead of single strings. This is because cutadapt accepts multiple adapters.
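If adapter sequences come from a sample sheet as single strings, they can be wrapped into lists to match the example above. This is a minimal sketch that assumes the workflow version you use expects lists for these two inputs; `as_adapter_list` and `normalize_adapters` are hypothetical helpers.

```python
from typing import Dict, List, Union

AdapterValue = Union[str, List[str]]


def as_adapter_list(value: AdapterValue) -> List[str]:
    """Wrap a single adapter sequence in a list; return a copy of a list unchanged."""
    if isinstance(value, str):
        return [value]
    return list(value)


def normalize_adapters(inputs: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of inputs with QC.adapterForward/QC.adapterReverse as lists."""
    result = dict(inputs)
    for key in ("QC.adapterForward", "QC.adapterReverse"):
        if key in result:
            result[key] = as_adapter_list(result[key])  # type: ignore[arg-type]
    return result
```

Keys that are absent stay absent, so the workflow's own defaults still apply.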

Dependency requirements and tool versions

BioWDL workflows use Docker images to ensure reproducibility. This means that BioWDL workflows will run on any system that has Docker installed. Alternatively, they can be run with Singularity.

For more advanced configuration of Docker or Singularity, please check the Cromwell documentation on containers.

Images from BioContainers are preferred for BioWDL workflows. The default images for this workflow are listed in the default value of the dockerImages input.

Output

The workflow produces a new set of FASTQ files from which detected adapters have been clipped, along with a set of quality reports.

Contact

For any questions about running this workflow and feature requests (such as adding additional tools and options), please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.