BioWDL: gatk-preprocess

A BioWDL workflow for preprocessing BAM files for variant calling, based on the GATK Best Practices.

Please be aware that the page you are currently viewing is not for the latest available version!

Inputs for GatkPreprocess

The following is an overview of all available inputs in GatkPreprocess.

Required inputs

GatkPreprocess.bam
File
The BAM file which should be processed
GatkPreprocess.bamIndex
File
The index for the BAM file
GatkPreprocess.dbsnpVCF
File
A dbSNP VCF file.
GatkPreprocess.dbsnpVCFIndex
File
The index for the dbSNP VCF file.
GatkPreprocess.referenceFasta
File
The reference fasta file
GatkPreprocess.referenceFastaDict
File
Sequence dictionary (.dict) for the reference fasta file
GatkPreprocess.referenceFastaFai
File
Fasta index (.fai) for the reference fasta file
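
As an illustration, the required inputs listed above could be collected into an inputs JSON file keyed by the fully qualified input names. The following is a minimal sketch, not an official template: every file path is a placeholder, and the helper `render_inputs` is a hypothetical name introduced only for this example.

```python
import json

# Placeholder paths (assumptions): substitute the locations of your own files.
required_inputs = {
    "GatkPreprocess.bam": "/data/sample.bam",
    "GatkPreprocess.bamIndex": "/data/sample.bai",
    "GatkPreprocess.dbsnpVCF": "/refs/dbsnp.vcf.gz",
    "GatkPreprocess.dbsnpVCFIndex": "/refs/dbsnp.vcf.gz.tbi",
    "GatkPreprocess.referenceFasta": "/refs/reference.fasta",
    "GatkPreprocess.referenceFastaDict": "/refs/reference.dict",
    "GatkPreprocess.referenceFastaFai": "/refs/reference.fasta.fai",
}


def render_inputs(inputs):
    """Serialize an inputs mapping to pretty-printed JSON text."""
    return json.dumps(inputs, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Print the JSON so it can be redirected to a file such as inputs.json.
    print(render_inputs(required_inputs))
```

The printed JSON can be saved to a file and passed to the workflow engine as its inputs file.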

Other common inputs

GatkPreprocess.bamName
String — Default: "recalibrated"
The basename for the produced BAM files. This should not include any parent directories; use `outputDir` to change the output directory.
GatkPreprocess.outputDir
String — Default: "."
The directory to which the outputs will be written.
GatkPreprocess.regions
File?
A bed file describing the regions to operate on.
GatkPreprocess.splitSplicedReads
Boolean — Default: false
Whether or not GATK's SplitNCigarReads should be run to split spliced reads. This should be enabled for RNA-seq samples.
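
Because `bamName` must be a bare basename, a quick check before launching the workflow can catch a misplaced directory component. The sketch below is only an illustration: `check_bam_name` is a hypothetical helper, not part of the workflow, and the values in `common_inputs` are made-up settings for an RNA-seq sample.

```python
import os


def check_bam_name(bam_name):
    """Return True if bam_name is a non-empty basename without directory parts."""
    return bam_name != "" and os.path.basename(bam_name) == bam_name


# Hypothetical values (assumptions) for an RNA-seq sample.
common_inputs = {
    "GatkPreprocess.bamName": "sample1.recalibrated",
    "GatkPreprocess.outputDir": "results/sample1",
    "GatkPreprocess.splitSplicedReads": True,
}

if __name__ == "__main__":
    name = common_inputs["GatkPreprocess.bamName"]
    if not check_bam_name(name):
        raise SystemExit("bamName must not contain directories; use outputDir instead")
    print("bamName looks valid:", name)
```

Here the directory goes into `outputDir` and only the file's basename goes into `bamName`, matching the descriptions above.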

Advanced inputs

GatkPreprocess.applyBqsr.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.applyBqsr.memory
String — Default: "12G"
The amount of memory this job will use.
GatkPreprocess.baseRecalibrator.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.baseRecalibrator.knownIndelsSitesVCFIndexes
Array[File] — Default: []
The indexes for the known variant VCFs.
GatkPreprocess.baseRecalibrator.knownIndelsSitesVCFs
Array[File] — Default: []
VCF files with known indels.
GatkPreprocess.baseRecalibrator.memory
String — Default: "12G"
The amount of memory this job will use.
GatkPreprocess.dockerImages
Map[String,String] — Default: {"picard": "quay.io/biocontainers/picard:2.20.5--0", "gatk4": "quay.io/biocontainers/gatk4:4.1.0.0--0", "biopet-scatterregions": "quay.io/biocontainers/biopet-scatterregions:0.2--0"}
The docker images used. Changing this may result in errors which the developers may choose not to address.
GatkPreprocess.gatherBamFiles.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.gatherBamFiles.memory
String — Default: "12G"
The amount of memory this job will use.
GatkPreprocess.gatherBqsr.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.gatherBqsr.memory
String — Default: "12G"
The amount of memory this job will use.
GatkPreprocess.scatterList.bamFile
File?
Equivalent to biopet scatterregions' `--bamfile` option.
GatkPreprocess.scatterList.bamIndex
File?
The index for the bamfile given through bamFile.
GatkPreprocess.scatterList.javaXmx
String — Default: "8G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.scatterList.memory
String — Default: "24G"
The amount of memory this job will use.
GatkPreprocess.scatterSize
Int — Default: scatterSizeMillions * 1000000
The size of the scattered regions in bases. Scattering is used to speed up certain processes. The genome will be separated into multiple chunks (scatters), each of which will be processed in its own job, allowing for parallel processing. Higher values will result in a lower number of jobs. The optimal value here will depend on the available resources.
GatkPreprocess.scatterSizeMillions
Int — Default: 1000
Same as scatterSize, but expressed in millions of bases: this value is multiplied by 1000000 to get scatterSize. This allows for setting larger values more easily.
GatkPreprocess.splitNCigarReads.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
GatkPreprocess.splitNCigarReads.memory
String — Default: "16G"
The amount of memory this job will use.
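
Two relationships among the advanced inputs above lend themselves to a quick check: `scatterSize` defaults to `scatterSizeMillions * 1000000`, and each `javaXmx` should be lower than the corresponding `memory`. The sketch below illustrates both under stated assumptions. All function names are hypothetical, the job count is only a rough estimate (the actual number depends on how the scatter tool divides contigs and regions), and the genome length of 3,100,000,000 bases is a made-up round figure.

```python
def resolve_scatter_size(scatter_size_millions=1000, scatter_size=None):
    """Mirror the documented default: scatterSize = scatterSizeMillions * 1000000."""
    if scatter_size is not None:
        return scatter_size
    return scatter_size_millions * 1_000_000


def estimate_scatter_count(total_bases, scatter_size):
    """Rough estimate only: ceiling of total_bases / scatter_size."""
    return -(-total_bases // scatter_size)


def parse_gigabytes(value):
    """Parse a string such as "4G" into an integer number of gigabytes."""
    if not value.endswith("G"):
        raise ValueError("expected a value like '4G', got %r" % value)
    return int(value[:-1])


def xmx_below_memory(java_xmx, memory):
    """Return True if the JVM heap limit leaves headroom below the job memory."""
    return parse_gigabytes(java_xmx) < parse_gigabytes(memory)


if __name__ == "__main__":
    size = resolve_scatter_size()  # default scatterSizeMillions of 1000
    print("scatterSize:", size)
    # Hypothetical genome length, used purely for illustration.
    print("estimated jobs:", estimate_scatter_count(3_100_000_000, size))
    print("4G below 12G:", xmx_below_memory("4G", "12G"))
```

Lowering `scatterSizeMillions` increases the estimated number of jobs, which is the trade-off described for `scatterSize`.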