This is not a stable version!
You are currently viewing the documentation for a development version. This documentation is not guaranteed to be up to date, and things will likely change without announcement or a version increment. If there is no other documentation available, there are likely no releases available for this repository. The content is, therefore, likely still in development and not production ready. Use at your own risk!
Please be aware that the page you are currently viewing is not for the latest available version!
There are two workflows provided for SV detection. The first operates on a single sample and merges the SV predictions from Delly, Manta, GRIDSS, Smoove and, optionally, Mateclever using SURVIVOR merge. The second uses the somatic SV calling capabilities of Delly, Manta and GRIDSS to call somatic structural variants across multiple tumor-normal pairs.
This pipeline is part of BioWDL developed by the SASC team at Leiden University Medical Center.
Usage
You can run the pipeline using Cromwell:
# germline
java -jar cromwell-<version>.jar run -i inputs.json structural-variantcalling.wdl
# somatic
java -jar cromwell-<version>.jar run -i inputs.json somatic.wdl
Inputs are provided through a JSON file. The minimally required inputs are described below, but additional inputs are available. A template containing all possible inputs can be generated using Womtool as described in the Womtool documentation.
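Generating the template can be scripted. The following is a minimal sketch, assuming a Womtool release jar has already been downloaded; the `WOMTOOL_JAR` variable and the `make_input_template` function are hypothetical names, not part of BioWDL. It relies only on Womtool's `inputs` command, which writes a JSON template of a workflow's inputs to standard output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around Womtool's "inputs" command. WOMTOOL_JAR is an
# assumption: point it at the Womtool jar you downloaded.
WOMTOOL_JAR="${WOMTOOL_JAR:-womtool.jar}"

make_input_template() {
  local wdl="$1" out="$2"
  # Fail early, without creating the output file, if the jar is missing
  if [[ ! -f "$WOMTOOL_JAR" ]]; then
    echo "Womtool jar not found: $WOMTOOL_JAR" >&2
    return 1
  fi
  # Write a JSON template of the workflow's inputs to "$out"
  java -jar "$WOMTOOL_JAR" inputs "$wdl" > "$out"
}

# Usage: make_input_template structural-variantcalling.wdl inputs.json
```

The generated template still has to be filled in with real values before it can be passed to Cromwell with `-i`.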
structural-variantcalling.wdl
For an overview of all available inputs, see this page.
{
"SVcalling.outputDir": "The directory the output should be written to.",
"SVcalling.referenceFasta": "The reference FASTA file.",
"SVcalling.referenceFastaFai": "The FASTA index (.fai) file of the reference.",
"SVcalling.referenceFastaDict": "The sequence dictionary (.dict) file of the reference.",
"SVcalling.bamFile": "The sorted BAM file.",
"SVcalling.bamIndex": "The BAM index (.bai) file.",
"SVcalling.bwaIndex": "Struct containing the BWA reference files.",
"SVcalling.sample": "The name of the sample."
}
Example
The following is an example of what an inputs JSON might look like:
{
"SVcalling.bamIndex": "tests/data/ref_with_svs.bai",
"SVcalling.sample": "sample",
"SVcalling.referenceFastaFai": "tests/data/reference/reference.fasta.fai",
"SVcalling.referenceFasta": "tests/data/reference/reference.fasta",
"SVcalling.bamFile": "tests/data/ref_with_svs.bam",
"SVcalling.referenceFastaDict": "tests/data/reference/reference.dict",
"SVcalling.bwaIndex": {
"fastaFile": "tests/data/reference/bwa/reference.fasta",
"indexFiles": [
"tests/data/reference/bwa/reference.fasta.sa",
"tests/data/reference/bwa/reference.fasta.amb",
"tests/data/reference/bwa/reference.fasta.ann",
"tests/data/reference/bwa/reference.fasta.bwt",
"tests/data/reference/bwa/reference.fasta.pac"
]
}
}
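When preparing inputs for several samples, the JSON can be written from a script. This is a minimal sketch, assuming the file-naming conventions of the example above: `<name>.fasta` has `<name>.fasta.fai` and `<name>.dict` next to it, `<name>.bam` has `<name>.bai`, and the BWA index files are the BWA FASTA path plus `.sa`, `.amb`, `.ann`, `.bwt` and `.pac`. The `write_germline_inputs` function is hypothetical, not part of BioWDL, and paths containing quotes or backslashes would need JSON escaping that this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an inputs JSON for structural-variantcalling.wdl.
# Index file names are derived using the naming conventions assumed above.
write_germline_inputs() {
  local sample="$1" bam="$2" fasta="$3" bwa_fasta="$4" out_dir="$5" json="$6"
  # Unquoted heredoc, so the ${...} expansions below are performed
  cat > "$json" <<EOF
{
  "SVcalling.outputDir": "${out_dir}",
  "SVcalling.sample": "${sample}",
  "SVcalling.bamFile": "${bam}",
  "SVcalling.bamIndex": "${bam%.bam}.bai",
  "SVcalling.referenceFasta": "${fasta}",
  "SVcalling.referenceFastaFai": "${fasta}.fai",
  "SVcalling.referenceFastaDict": "${fasta%.fasta}.dict",
  "SVcalling.bwaIndex": {
    "fastaFile": "${bwa_fasta}",
    "indexFiles": [
      "${bwa_fasta}.sa",
      "${bwa_fasta}.amb",
      "${bwa_fasta}.ann",
      "${bwa_fasta}.bwt",
      "${bwa_fasta}.pac"
    ]
  }
}
EOF
}

write_germline_inputs sample tests/data/ref_with_svs.bam \
  tests/data/reference/reference.fasta \
  tests/data/reference/bwa/reference.fasta \
  output inputs.json
# Then run: java -jar cromwell-<version>.jar run -i inputs.json structural-variantcalling.wdl
```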
somatic.wdl
For an overview of all available inputs, see this page.
{
"SomaticSvCalling.normalIds": "The IDs of the normal samples in the same order as the BAM files.",
"SomaticSvCalling.normalBams": "The BAM files for the normal samples in the same order as IDs.",
"SomaticSvCalling.normalBamIndexes": "The indexes for the normal BAM files in the same order as the BAM files.",
"SomaticSvCalling.tumorIds": "The IDs of the tumor samples in the same order as the BAM files.",
"SomaticSvCalling.tumorBams": "The BAM files for the tumor samples in the same order as IDs.",
"SomaticSvCalling.tumorBamIndexes": "The indexes for the tumor BAM files in the same order as the BAM files.",
"SomaticSvCalling.pairs": "The tumor-normal pairs. The left element is the ID for the tumor and the right element is the ID for the associated normal.",
"SomaticSvCalling.referenceFasta": "The FASTA file for the reference genome.",
"SomaticSvCalling.referenceFastaFai": "The index for the reference genome's FASTA file.",
"SomaticSvCalling.bwaIndex": "The BWA index for the reference genome.",
"SomaticSvCalling.outputDir": "The directory the output should be written to."
}
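Because the somatic inputs are matched both by position (IDs to BAM files) and by ID (pairs to samples), a mismatch is easy to introduce. The following is a minimal pre-flight check, sketched under assumptions: the `tumorID:normalID` notation, the sample names and the `check_somatic_inputs`/`pairs_json` functions are hypothetical and not part of BioWDL. The pairs are printed in the JSON form Cromwell uses for a WDL `Pair`, an object with `Left` and `Right` keys, with the tumor on the left and the normal on the right as described for `SomaticSvCalling.pairs`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for SomaticSvCalling inputs (not part of BioWDL).
tumor_ids=(tumor1 tumor2)
tumor_bams=(tumor1.bam tumor2.bam)
normal_ids=(normal1)
normal_bams=(normal1.bam)
# Pairs written as "tumorID:normalID", a notation used only by this script
pairs=(tumor1:normal1 tumor2:normal1)

# Succeed if the first argument equals any of the remaining arguments
contains() {
  local needle="$1" item
  shift
  for item in "$@"; do
    [[ "$item" == "$needle" ]] && return 0
  done
  return 1
}

# IDs are matched to BAMs by position, and pairs must reference known IDs
check_somatic_inputs() {
  local pair tumor normal
  if (( ${#tumor_ids[@]} != ${#tumor_bams[@]} || ${#normal_ids[@]} != ${#normal_bams[@]} )); then
    echo "ID and BAM arrays differ in length" >&2
    return 1
  fi
  for pair in "${pairs[@]}"; do
    tumor="${pair%%:*}"
    normal="${pair#*:}"
    contains "$tumor" "${tumor_ids[@]}" || { echo "unknown tumor ID: $tumor" >&2; return 1; }
    contains "$normal" "${normal_ids[@]}" || { echo "unknown normal ID: $normal" >&2; return 1; }
  done
}

# Print the pairs as a JSON array of {"Left": tumor, "Right": normal} objects
pairs_json() {
  local pair sep=""
  printf '['
  for pair in "${pairs[@]}"; do
    printf '%s{"Left": "%s", "Right": "%s"}' "$sep" "${pair%%:*}" "${pair#*:}"
    sep=", "
  done
  printf ']\n'
}

check_somatic_inputs && pairs_json
```

The same length check could be extended to the `tumorBamIndexes` and `normalBamIndexes` arrays.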
The somatic workflow uses all normal samples as a panel of normals (PON) for both Delly and GRIDSS. For GRIDSS, a premade PON can be provided through the SomaticSvCalling.gridssPonBed and SomaticSvCalling.gridssPonBedpe inputs.
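A premade PON is passed as two extra entries in the somatic inputs JSON. This is a minimal sketch that prints those entries; the `gridss_pon_inputs` function and the file paths are hypothetical placeholders, and the sketch assumes the BED and BEDPE files are supplied together.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BioWDL): print the two optional GRIDSS PON
# inputs as JSON members to paste into a SomaticSvCalling inputs file.
# Assumption: the BED and BEDPE files are supplied together.
gridss_pon_inputs() {
  local pon_bed="$1" pon_bedpe="$2"
  if [[ -z "$pon_bed" || -z "$pon_bedpe" ]]; then
    echo "usage: gridss_pon_inputs <pon.bed> <pon.bedpe>" >&2
    return 1
  fi
  printf '"SomaticSvCalling.gridssPonBed": "%s",\n' "$pon_bed"
  printf '"SomaticSvCalling.gridssPonBedpe": "%s"\n' "$pon_bedpe"
}

# Placeholder paths; substitute the location of your own PON files
gridss_pon_inputs /path/to/pon.bed /path/to/pon.bedpe
```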
Dependency requirements and tool versions
BioWDL pipelines use Docker images to ensure reproducibility. This means that BioWDL pipelines will run on any system that has Docker installed. Alternatively, they can be run with Singularity.
For more advanced configuration of Docker or Singularity, please check the Cromwell documentation on containers.
Images from BioContainers are preferred for BioWDL pipelines. The list of default images for this pipeline can be found in the default value of the dockerImages input.
Output
The single sample pipeline produces VCF files from Delly, Manta, (optionally) Mateclever, GRIDSS, and Smoove, as well as the merged VCF file from SURVIVOR.
The somatic pipeline produces VCF/BCF files for Delly, Manta and GRIDSS. If no GRIDSS PON was provided, PON BED and BEDPE files will also be generated.
Contact
For any questions about running this pipeline and for feature requests (such as adding additional tools and options), please use the GitHub issue tracker or contact the SASC team directly at sasc@lumc.nl.