Workflows for somatic copy number variant (CNV) calling, based on the GATK CNV calling workflow/best practices.
There are two workflows:
- CNV-PON: generates the panel of normals required for pairedCnvCalling
- pairedCnvCalling: calls CNVs on a case-control pair
These workflows are part of BioWDL developed by the SASC team at Leiden University Medical Center.
Usage
You can run the workflows using Cromwell:
java -jar cromwell-<version>.jar run -i inputs.json CNV-PON.wdl
java -jar cromwell-<version>.jar run -i inputs.json pairedCnvCalling.wdl
Run CNV-PON.wdl first to generate the panel of normals and the preprocessed interval list. Both can then be used as inputs for pairedCnvCalling.wdl.
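The ordering described above can be sketched as a small Python wrapper. This is a minimal illustration and not part of BioWDL: the function names and the `dry_run` flag are hypothetical, and the path to the Cromwell jar is left for you to supply.

```python
import subprocess
from typing import List


def cromwell_command(jar: str, workflow: str, inputs_json: str) -> List[str]:
    """Build the `java -jar <jar> run -i <inputs> <workflow>` command shown above."""
    return ["java", "-jar", jar, "run", "-i", inputs_json, workflow]


def run_in_order(jar: str, pon_inputs: str, paired_inputs: str,
                 dry_run: bool = True) -> List[List[str]]:
    """Run CNV-PON.wdl before pairedCnvCalling.wdl (hypothetical helper).

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [
        cromwell_command(jar, "CNV-PON.wdl", pon_inputs),
        cromwell_command(jar, "pairedCnvCalling.wdl", paired_inputs),
    ]
    if not dry_run:
        for command in commands:
            # check=True raises if a workflow fails, so the second
            # workflow never starts without a finished panel of normals.
            subprocess.run(command, check=True)
    return commands
```

With the default `dry_run=True` the helper only returns the two command lines, which is a convenient way to inspect them before anything is executed.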
Inputs
Inputs are provided through a JSON file. The minimum required inputs are described below, but additional inputs are available. A template containing all possible inputs can be generated using Womtool, as described in the Womtool documentation. For an overview of all available inputs, see the following pages:
CNV-PON
required inputs
{
"PanelOfNormals.referenceFasta": "The reference fasta file.",
"PanelOfNormals.referenceFastaFai": "The index for the reference fasta file.",
"PanelOfNormals.referenceFastaDict": "The sequence dictionary associated with the reference fasta file.",
"PanelOfNormals.inputBams": "The BAM files for the samples to include in the PON. May be empty, in which case only the annotated intervals are created.",
"PanelOfNormals.inputBamIndexes": "The indexes for the input BAM files."
}
other useful inputs
{
"PanelOfNormals.regions": "The regions to operate on.",
"PanelOfNormals.outputDir": "The directory the output should be written to.",
"PanelOfNormals.PONname": "The name the PON file should be given."
}
Example
The following is an example of what an inputs JSON might look like:
{
"PanelOfNormals.referenceFasta": "/references/GRCh38/reference.fasta",
"PanelOfNormals.referenceFastaFai": "/references/GRCh38/reference.fasta.fai",
"PanelOfNormals.referenceFastaDict": "/references/GRCh38/reference.dict",
"PanelOfNormals.inputBams": [
"/samples/s1/control.bam",
"/samples/s2/control.bam",
"/samples/s3/control.bam"
],
"PanelOfNormals.inputBamIndexes": [
"/samples/s1/control.bai",
"/samples/s2/control.bai",
"/samples/s3/control.bai"
],
"PanelOfNormals.regions": "/capture-kit/targets.bed",
"PanelOfNormals.outputDir": "/results/CNV/PON",
"PanelOfNormals.PONname": "MyFancyPanelOfNormals"
}
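When the PON contains many samples, writing the two parallel lists by hand is error-prone, because each entry in `inputBamIndexes` has to line up with the BAM at the same position in `inputBams`. The following is a minimal sketch of generating the inputs JSON from (BAM, index) pairs. The function name `build_pon_inputs` and its parameters are hypothetical; only the JSON keys are taken from the templates above.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_pon_inputs(reference_fasta: str,
                     reference_fai: str,
                     reference_dict: str,
                     samples: List[Tuple[str, str]],
                     regions: Optional[str] = None,
                     output_dir: Optional[str] = None,
                     pon_name: Optional[str] = None) -> Dict[str, object]:
    """Build a CNV-PON inputs dictionary (hypothetical helper).

    `samples` is a list of (bam, index) pairs, so both lists in the
    resulting JSON always have the same length and order.
    """
    inputs: Dict[str, object] = {
        "PanelOfNormals.referenceFasta": reference_fasta,
        "PanelOfNormals.referenceFastaFai": reference_fai,
        "PanelOfNormals.referenceFastaDict": reference_dict,
        "PanelOfNormals.inputBams": [bam for bam, _ in samples],
        "PanelOfNormals.inputBamIndexes": [index for _, index in samples],
    }
    optional = {
        "PanelOfNormals.regions": regions,
        "PanelOfNormals.outputDir": output_dir,
        "PanelOfNormals.PONname": pon_name,
    }
    # Only include the optional inputs that were actually provided.
    inputs.update({key: value for key, value in optional.items() if value is not None})
    return inputs


if __name__ == "__main__":
    example = build_pon_inputs(
        "/references/GRCh38/reference.fasta",
        "/references/GRCh38/reference.fasta.fai",
        "/references/GRCh38/reference.dict",
        [("/samples/s1/control.bam", "/samples/s1/control.bai"),
         ("/samples/s2/control.bam", "/samples/s2/control.bai")],
        output_dir="/results/CNV/PON",
    )
    print(json.dumps(example, indent=2))
```

Passing the samples as pairs, rather than as two separate lists, is a design choice that makes a mismatched BAM and index impossible to express.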
pairedCnvCalling
required inputs
{
"PairedCnvCalling.caseBam": "The BAM file for the case sample.",
"PairedCnvCalling.caseBamIndex": "The index for the case sample's BAM file.",
"PairedCnvCalling.caseSampleName": "The name of the case sample.",
"PairedCnvCalling.controlBam": "The BAM file for the control sample.",
"PairedCnvCalling.controlBamIndex": "The index for the control sample's BAM file.",
"PairedCnvCalling.controlSampleName": "The name of the control sample.",
"PairedCnvCalling.referenceFasta": "The reference fasta file.",
"PairedCnvCalling.referenceFastaFai": "The index for the reference fasta file.",
"PairedCnvCalling.referenceFastaDict": "The sequence dictionary associated with the reference fasta file.",
"PairedCnvCalling.preprocessedIntervals": "Intervals to operate on. Should be produced by gatk PreprocessIntervals (e.g. using CNV-PON.wdl).",
"PairedCnvCalling.commonVariantSites": "Interval list of common variant sites (to retrieve the allelic counts for). Preferably a list of variants from the sample being analysed. For targeted/exome sequencing the list should be limited to variants within the sequenced regions, due to memory usage.",
"PairedCnvCalling.PON": "A read counts panel of normals as generated by gatk CreateReadCountPanelOfNormals (e.g. using CNV-PON.wdl)."
}
other useful inputs
{
"PairedCnvCalling.outputDir": "The directory the output should be written to.",
"PairedCnvCalling.annotatedIntervals": "An annotated set of intervals as generated by AnnotateIntervals (e.g. using CNV-PON.wdl). Will be ignored if PON is provided."
}
Example
The following is an example of what an inputs JSON might look like:
{
"PairedCnvCalling.caseBam": "/samples/s1/case.bam",
"PairedCnvCalling.caseBamIndex": "/samples/s1/case.bai",
"PairedCnvCalling.caseSampleName": "s1_case",
"PairedCnvCalling.controlBam": "/samples/s1/control.bam",
"PairedCnvCalling.controlBamIndex": "/samples/s1/control.bai",
"PairedCnvCalling.controlSampleName": "s1_control",
"PairedCnvCalling.referenceFasta": "/references/GRCh38/reference.fasta",
"PairedCnvCalling.referenceFastaFai": "/references/GRCh38/reference.fasta.fai",
"PairedCnvCalling.referenceFastaDict": "/references/GRCh38/reference.dict",
"PairedCnvCalling.preprocessedIntervals": "/results/CNV/PON/preprocessed.interval_list",
"PairedCnvCalling.commonVariantSites": "/capture-kit/dbSNP_subset_positions.interval_list",
"PairedCnvCalling.PON": "/results/CNV/PON/MyFancyPanelOfNormals.hdf5"
}
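Before launching Cromwell it can help to confirm that the inputs file contains every required key listed above. The following is a minimal sketch of such a check. The name `missing_required_inputs` is hypothetical, and the check only looks at key presence; it does not verify that the files exist or that Cromwell will accept them.

```python
import json
from typing import Dict, List

# Required keys as listed in the "required inputs" template above.
REQUIRED_PAIRED_INPUTS = [
    "PairedCnvCalling.caseBam",
    "PairedCnvCalling.caseBamIndex",
    "PairedCnvCalling.caseSampleName",
    "PairedCnvCalling.controlBam",
    "PairedCnvCalling.controlBamIndex",
    "PairedCnvCalling.controlSampleName",
    "PairedCnvCalling.referenceFasta",
    "PairedCnvCalling.referenceFastaFai",
    "PairedCnvCalling.referenceFastaDict",
    "PairedCnvCalling.preprocessedIntervals",
    "PairedCnvCalling.commonVariantSites",
    "PairedCnvCalling.PON",
]


def missing_required_inputs(inputs: Dict[str, object]) -> List[str]:
    """Return the required keys that are absent from `inputs` (hypothetical helper)."""
    return [key for key in REQUIRED_PAIRED_INPUTS if key not in inputs]


def check_inputs_file(path: str) -> List[str]:
    """Load an inputs JSON file and report its missing required keys."""
    with open(path) as handle:
        return missing_required_inputs(json.load(handle))
```

An empty list means every required key is present. Any returned names point at the entries still to be filled in, for instance the PON and preprocessed interval list produced by CNV-PON.wdl.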
Dependency requirements and tool versions
BioWDL pipelines use Docker images to ensure reproducibility. This means that BioWDL pipelines will run on any system that has Docker installed. Alternatively, they can be run with Singularity.
For more advanced configuration of Docker or Singularity, please check the Cromwell documentation on containers.
Images from BioContainers are preferred for BioWDL pipelines. The list of default images for this pipeline can be found in the default value of the dockerImages input.
Output
CNV-PON
A PON hdf5 file, preprocessed interval list and annotated interval list.
pairedCnvCalling
Two folders containing the CNV calling results, one for the case sample and one for the control sample.
Contact
For any questions about running these workflows and for feature requests (such as adding additional tools and options), please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.