BioWDL: gatk-variantcalling

A pipeline that calls variants on a group of BAM files.

This is not a stable version!
You are currently viewing the documentation for a development version. This documentation is not guaranteed to be up to date, and things will likely change without announcement or a version increment. If no other documentation is available, there are likely no releases of this repository yet, so its content is probably still in development and not production ready. Use at your own risk!

Please be aware that the page you are currently viewing is not for the latest available version!

Inputs for MultisampleCalling

The following is an overview of all available inputs in MultisampleCalling.

Required inputs

MultisampleCalling.bamFilesAndGenders
Array[BamAndGender]
List of structs, each containing a BAM file, its BAM index and a gender. The BAM files should be recalibrated beforehand if required. The gender string is optional. Actionable values are 'female', 'f', 'F', 'male', 'm' and 'M'.
MultisampleCalling.referenceFasta
File
The reference fasta file
MultisampleCalling.referenceFastaDict
File
Sequence dictionary (.dict) file of the reference
MultisampleCalling.referenceFastaFai
File
Fasta index (.fai) file of the reference
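The required inputs can be collected into a Cromwell-style inputs JSON file. The Python sketch below is a minimal illustration, not part of the workflow. It assumes that the BamAndGender struct uses the field names file, index and gender, that each BAM index is named <bam>.bai, and that the .dict and .fai files sit next to the FASTA under their conventional names; all paths are hypothetical. The gender_category helper only reports whether a gender string is one of the actionable values listed above; it does not reproduce how the workflow itself handles other values.

```python
import json
from typing import List, Optional, Tuple

# Gender strings listed as actionable in this documentation.
FEMALE = {"female", "f", "F"}
MALE = {"male", "m", "M"}


def gender_category(gender: Optional[str]) -> Optional[str]:
    """Classify a gender string as 'female', 'male', or None (not actionable)."""
    if gender in FEMALE:
        return "female"
    if gender in MALE:
        return "male"
    return None


def bam_and_gender(bam: str, gender: Optional[str] = None) -> dict:
    """Build one BamAndGender entry (field names file/index/gender are assumed)."""
    entry = {"file": bam, "index": bam + ".bai"}  # assumes <bam>.bai index naming
    if gender is not None:
        entry["gender"] = gender  # the gender string is optional
    return entry


def required_inputs(samples: List[Tuple[str, Optional[str]]], reference: str) -> dict:
    """Assemble the four required MultisampleCalling inputs."""
    # Conventional naming: ref.fa -> ref.dict (Picard) and ref.fa.fai (samtools).
    base = reference.rsplit(".", 1)[0]
    return {
        "MultisampleCalling.bamFilesAndGenders": [bam_and_gender(b, g) for b, g in samples],
        "MultisampleCalling.referenceFasta": reference,
        "MultisampleCalling.referenceFastaDict": base + ".dict",
        "MultisampleCalling.referenceFastaFai": reference + ".fai",
    }


if __name__ == "__main__":
    samples = [("sample1.bam", "F"), ("sample2.bam", "male"), ("sample3.bam", None)]
    for _, g in samples:
        print(g, "->", gender_category(g))
    print(json.dumps(required_inputs(samples, "reference.fasta"), indent=2))
```

Note that the listed values are case-sensitive as documented, so a string such as 'Female' is reported as not actionable by this sketch.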

Other common inputs

MultisampleCalling.dbsnpVCF
File?
dbSNP VCF file used for checking known sites.
MultisampleCalling.dbsnpVCFIndex
File?
Index (.tbi) file for the dbsnp VCF
MultisampleCalling.dontUseSoftClippedBases
Boolean — Default: false
Whether soft-clipped bases should be excluded from the haplotype caller analysis (should be set to 'true' for RNA).
MultisampleCalling.jointgenotyping
Boolean — Default: true
Whether to perform joint genotyping, i.e. calling GVCFs with HaplotypeCaller and merging them with GenotypeGVCFs.
MultisampleCalling.JointGenotyping.genotypeGvcfs.pedigree
File?
Pedigree file for determining the population "founders".
MultisampleCalling.JointGenotyping.regions
File?
A bed file describing the regions to operate on.
MultisampleCalling.JointGenotyping.Stats.compareVcf
File?
When both inputVcf and compareVcf are given, the program generates separate stats for the intersection and the complements. By default only sites are compared; samples must be given to also include sample columns.
MultisampleCalling.JointGenotyping.Stats.compareVcfIndex
File?
Index for the compareVcf.
MultisampleCalling.outputDir
String — Default: "."
The directory where the output files should be located
MultisampleCalling.regions
File?
A bed file describing the regions to operate on.
MultisampleCalling.singleSampleCalling.callAutosomal.excludeIntervalList
Array[File]+?
Bed files or interval lists describing the regions to NOT operate on.
MultisampleCalling.singleSampleCalling.callAutosomal.pedigree
File?
Pedigree file for determining the population "founders".
MultisampleCalling.singleSampleCalling.callAutosomal.ploidy
Int?
The ploidy with which the variants should be called.
MultisampleCalling.singleSampleCalling.callX.excludeIntervalList
Array[File]+?
Bed files or interval lists describing the regions to NOT operate on.
MultisampleCalling.singleSampleCalling.callX.pedigree
File?
Pedigree file for determining the population "founders".
MultisampleCalling.singleSampleCalling.callY.excludeIntervalList
Array[File]+?
Bed files or interval lists describing the regions to NOT operate on.
MultisampleCalling.singleSampleCalling.callY.pedigree
File?
Pedigree file for determining the population "founders".
MultisampleCalling.singleSampleCalling.Stats.compareVcf
File?
When both inputVcf and compareVcf are given, the program generates separate stats for the intersection and the complements. By default only sites are compared; samples must be given to also include sample columns.
MultisampleCalling.singleSampleCalling.Stats.compareVcfIndex
File?
Index for the compareVcf.
MultisampleCalling.singleSampleGvcf
Boolean — Default: false
Whether to output single-sample GVCFs.
MultisampleCalling.vcfBasename
String — Default: "multisample"
The basename of the VCF and GVCF files output by the workflow.
MultisampleCalling.XNonParRegions
File?
Bed file with the non-PAR regions of X
MultisampleCalling.YNonParRegions
File?
Bed file with the non-PAR regions of Y
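The common inputs above are usually merged into the same inputs file as the required ones. As a hedged sketch, the hypothetical Python helper common_inputs below returns a dictionary of these optional inputs, using the documented defaults for outputDir, vcfBasename, jointgenotyping and singleSampleGvcf. It switches dontUseSoftClippedBases on for RNA data, as recommended above, and only passes the dbSNP VCF together with its index; the <vcf>.tbi index naming is an assumption.

```python
import json
from typing import Optional


def common_inputs(rna: bool = False,
                  dbsnp_vcf: Optional[str] = None,
                  regions_bed: Optional[str] = None,
                  joint_genotyping: bool = True,
                  single_sample_gvcf: bool = False,
                  vcf_basename: str = "multisample",
                  output_dir: str = ".") -> dict:
    """Collect optional MultisampleCalling inputs (hypothetical helper)."""
    inputs = {
        "MultisampleCalling.outputDir": output_dir,
        "MultisampleCalling.vcfBasename": vcf_basename,
        "MultisampleCalling.jointgenotyping": joint_genotyping,
        "MultisampleCalling.singleSampleGvcf": single_sample_gvcf,
        # Soft-clipped bases should be excluded for RNA data.
        "MultisampleCalling.dontUseSoftClippedBases": rna,
    }
    if dbsnp_vcf is not None:
        # Pass the dbSNP VCF together with its index (assumed <vcf>.tbi naming).
        inputs["MultisampleCalling.dbsnpVCF"] = dbsnp_vcf
        inputs["MultisampleCalling.dbsnpVCFIndex"] = dbsnp_vcf + ".tbi"
    if regions_bed is not None:
        inputs["MultisampleCalling.regions"] = regions_bed
    return inputs


if __name__ == "__main__":
    # Hypothetical file name; merge the result with the required inputs.
    print(json.dumps(common_inputs(rna=True, dbsnp_vcf="dbsnp.vcf.gz"), indent=2))
```

Leaving an optional input out of the dictionary keeps the workflow's own default or unset state, which is why regions and the dbSNP files are only added when given.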

Advanced inputs

MultisampleCalling.calculateRegions.intersectAutosomalRegions.memory
String — Default: "~{512 + ceil(size([regionsA, regionsB],"MiB"))}MiB"
The amount of memory needed for the job.
MultisampleCalling.calculateRegions.intersectAutosomalRegions.timeMinutes
Int — Default: 1 + ceil(size([regionsA, regionsB],"GiB"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.calculateRegions.intersectX.memory
String — Default: "~{512 + ceil(size([regionsA, regionsB],"MiB"))}MiB"
The amount of memory needed for the job.
MultisampleCalling.calculateRegions.intersectX.timeMinutes
Int — Default: 1 + ceil(size([regionsA, regionsB],"GiB"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.calculateRegions.intersectY.memory
String — Default: "~{512 + ceil(size([regionsA, regionsB],"MiB"))}MiB"
The amount of memory needed for the job.
MultisampleCalling.calculateRegions.intersectY.timeMinutes
Int — Default: 1 + ceil(size([regionsA, regionsB],"GiB"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.calculateRegions.inverseBed.memory
String — Default: "~{512 + ceil(size([inputBed, faidx],"MiB"))}MiB"
The amount of memory needed for the job.
MultisampleCalling.calculateRegions.inverseBed.timeMinutes
Int — Default: 1 + ceil(size([inputBed, faidx],"G"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.calculateRegions.mergeBeds.memory
String — Default: "~{512 + ceil(size(bedFiles,"MiB"))}MiB"
The amount of memory needed for the job.
MultisampleCalling.calculateRegions.mergeBeds.outputBed
String — Default: "merged.bed"
The path to write the output to.
MultisampleCalling.calculateRegions.mergeBeds.timeMinutes
Int — Default: 1 + ceil(size(bedFiles,"G"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.calculateRegions.scatterAutosomalRegions.memory
String — Default: "256MiB"
The amount of memory this job will use.
MultisampleCalling.calculateRegions.scatterAutosomalRegions.prefix
String — Default: "scatters/scatter-"
The prefix of the output files. Output will be named like <PREFIX><N>.bed, in which N is an incrementing number.
MultisampleCalling.calculateRegions.scatterAutosomalRegions.splitContigs
Boolean — Default: false
If set, contigs are allowed to be split up over multiple files.
MultisampleCalling.calculateRegions.scatterAutosomalRegions.timeMinutes
Int — Default: 2
The maximum amount of time the job will run in minutes.
MultisampleCalling.dockerImages
Map[String,String] — Default: {"bedtools": "quay.io/biocontainers/bedtools:2.23.0--hdbcaa40_3", "picard": "quay.io/biocontainers/picard:2.23.2--0", "gatk4": "quay.io/biocontainers/gatk4:4.1.8.0--py38h37ae868_0", "chunked-scatter": "quay.io/biocontainers/chunked-scatter:1.0.0--py_0", "bcftools": "quay.io/biocontainers/bcftools:1.10.2--h4f4756c_2"}
Specify which Docker images should be used for running this pipeline.
MultisampleCalling.JointGenotyping.gatherGvcfs.intervals
Array[File] — Default: []
Bed files or interval lists describing the regions to operate on.
MultisampleCalling.JointGenotyping.gatherGvcfs.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
MultisampleCalling.JointGenotyping.gatherGvcfs.memory
String — Default: "5GiB"
The amount of memory this job will use.
MultisampleCalling.JointGenotyping.gatherGvcfs.timeMinutes
Int — Default: 1 + ceil((size(gvcfFiles,"G") * 8))
The maximum amount of time the job will run in minutes.
MultisampleCalling.JointGenotyping.gatherVcfs.compressionLevel
Int — Default: 1
The compression level at which the output files are written.
MultisampleCalling.JointGenotyping.gatherVcfs.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
MultisampleCalling.JointGenotyping.gatherVcfs.memory
String — Default: "5GiB"
The amount of memory this job will use.
MultisampleCalling.JointGenotyping.gatherVcfs.timeMinutes
Int — Default: 1 + ceil(size(inputVCFs,"GiB")) * 2
The maximum amount of time the job will run in minutes.
MultisampleCalling.JointGenotyping.gatherVcfs.useJdkDeflater
Boolean — Default: true
If true, uses the Java deflater to compress the output files; if false, uses the optimized Intel deflater.
MultisampleCalling.JointGenotyping.gatherVcfs.useJdkInflater
Boolean — Default: false
If true, uses the Java inflater; if false, uses the optimized Intel inflater.
MultisampleCalling.JointGenotyping.genotypeGvcfs.annotationGroups
Array[String] — Default: ["StandardAnnotation"]
Which annotation groups will be used for the annotation.
MultisampleCalling.JointGenotyping.genotypeGvcfs.javaXmx
String — Default: "6G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
MultisampleCalling.JointGenotyping.genotypeGvcfs.memory
String — Default: "7GiB"
The amount of memory this job will use.
MultisampleCalling.JointGenotyping.genotypeGvcfs.timeMinutes
Int — Default: 120
The maximum amount of time the job will run in minutes.
MultisampleCalling.JointGenotyping.sampleIds
Array[String] — Default: []
Sample IDs which should be analysed by the stats tools.
MultisampleCalling.JointGenotyping.scatterRegions.memory
String — Default: "256MiB"
The amount of memory this job will use.
MultisampleCalling.JointGenotyping.scatterRegions.prefix
String — Default: "scatters/scatter-"
The prefix of the output files. Output will be named like <PREFIX><N>.bed, in which N is an incrementing number.
MultisampleCalling.JointGenotyping.scatterRegions.splitContigs
Boolean — Default: false
If set, contigs are allowed to be split up over multiple files.
MultisampleCalling.JointGenotyping.scatterRegions.timeMinutes
Int — Default: 2
The maximum amount of time the job will run in minutes.
MultisampleCalling.JointGenotyping.Stats.afBins
String?
Allele frequency bins, a list (0.1,0.5,1) or a file (0.1 0.5 1).
MultisampleCalling.JointGenotyping.Stats.applyFilters
String?
Require at least one of the listed FILTER strings (e.g. "PASS,.").
MultisampleCalling.JointGenotyping.Stats.collapse
String?
Treat as identical records with <snps|indels|both|all|some|none>; see the man page for details.
MultisampleCalling.JointGenotyping.Stats.depth
String?
Depth distribution: min,max,bin size [0,500,1].
MultisampleCalling.JointGenotyping.Stats.exclude
String?
Exclude sites for which the expression is true (see man page for details).
MultisampleCalling.JointGenotyping.Stats.exons
File?
Tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed).
MultisampleCalling.JointGenotyping.Stats.firstAlleleOnly
Boolean — Default: false
Include only 1st allele at multiallelic sites.
MultisampleCalling.JointGenotyping.Stats.include
String?
Select sites for which the expression is true (see man page for details).
MultisampleCalling.JointGenotyping.Stats.memory
String — Default: "256MiB"
The amount of memory this job will use.
MultisampleCalling.JointGenotyping.Stats.regions
String?
Restrict to comma-separated list of regions.
MultisampleCalling.JointGenotyping.Stats.samplesFile
File?
File of samples to include.
MultisampleCalling.JointGenotyping.Stats.splitByID
Boolean — Default: false
Collect stats for sites with ID separately (known vs novel).
MultisampleCalling.JointGenotyping.Stats.targets
String?
Similar to regions but streams rather than index-jumps.
MultisampleCalling.JointGenotyping.Stats.targetsFile
File?
Similar to regionsFile but streams rather than index-jumps.
MultisampleCalling.JointGenotyping.Stats.threads
Int — Default: 0
Number of extra decompression threads [0].
MultisampleCalling.JointGenotyping.Stats.timeMinutes
Int — Default: 1 + 2 * ceil(size(select_all([inputVcf, compareVcf]),"G"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.JointGenotyping.Stats.userTsTv
String?
<TAG[:min:max:n]>. Collect Ts/Tv stats for any tag using the given binning [0:1:100].
MultisampleCalling.JointGenotyping.Stats.verbose
Boolean — Default: false
Produce verbose per-site and per-sample output.
MultisampleCalling.scatterSize
Int?
The size of the scattered regions in bases. Scattering is used to speed up certain processes: the genome is separated into multiple chunks (scatters), each processed in its own job, allowing for parallel processing. Higher values result in fewer jobs. The optimal value depends on the available resources.
MultisampleCalling.scatterSizeMillions
Int — Default: 1000
Same as scatterSize, but expressed in millions of bases: the value is multiplied by 1,000,000 to obtain scatterSize. This makes it easier to set large values.
MultisampleCalling.singleSampleCalling.callAutosomal.contamination
Float?
Equivalent to HaplotypeCaller's `-contamination` option.
MultisampleCalling.singleSampleCalling.callAutosomal.emitRefConfidence
String — Default: if gvcf then "GVCF" else "NONE"
Whether to include reference calls. Three modes: 'NONE', 'BP_RESOLUTION' and 'GVCF'.
MultisampleCalling.singleSampleCalling.callAutosomal.javaXmxMb
Int — Default: 4096
The maximum memory available to the program in megabytes. Should be lower than `memoryMb` to accommodate JVM overhead.
MultisampleCalling.singleSampleCalling.callAutosomal.memoryMb
Int — Default: javaXmxMb + 512
The amount of memory this job will use in megabytes.
MultisampleCalling.singleSampleCalling.callAutosomal.outputMode
String?
Specifies which type of calls we should output. Same as HaplotypeCaller's `--output-mode` option.
MultisampleCalling.singleSampleCalling.callX.contamination
Float?
Equivalent to HaplotypeCaller's `-contamination` option.
MultisampleCalling.singleSampleCalling.callX.emitRefConfidence
String — Default: if gvcf then "GVCF" else "NONE"
Whether to include reference calls. Three modes: 'NONE', 'BP_RESOLUTION' and 'GVCF'.
MultisampleCalling.singleSampleCalling.callX.javaXmxMb
Int — Default: 4096
The maximum memory available to the program in megabytes. Should be lower than `memoryMb` to accommodate JVM overhead.
MultisampleCalling.singleSampleCalling.callX.memoryMb
Int — Default: javaXmxMb + 512
The amount of memory this job will use in megabytes.
MultisampleCalling.singleSampleCalling.callX.outputMode
String?
Specifies which type of calls we should output. Same as HaplotypeCaller's `--output-mode` option.
MultisampleCalling.singleSampleCalling.callY.contamination
Float?
Equivalent to HaplotypeCaller's `-contamination` option.
MultisampleCalling.singleSampleCalling.callY.emitRefConfidence
String — Default: if gvcf then "GVCF" else "NONE"
Whether to include reference calls. Three modes: 'NONE', 'BP_RESOLUTION' and 'GVCF'.
MultisampleCalling.singleSampleCalling.callY.javaXmxMb
Int — Default: 4096
The maximum memory available to the program in megabytes. Should be lower than `memoryMb` to accommodate JVM overhead.
MultisampleCalling.singleSampleCalling.callY.memoryMb
Int — Default: javaXmxMb + 512
The amount of memory this job will use in megabytes.
MultisampleCalling.singleSampleCalling.callY.outputMode
String?
Specifies which type of calls we should output. Same as HaplotypeCaller's `--output-mode` option.
MultisampleCalling.singleSampleCalling.mergeSingleSampleGvcf.intervals
Array[File] — Default: []
Bed files or interval lists describing the regions to operate on.
MultisampleCalling.singleSampleCalling.mergeSingleSampleGvcf.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
MultisampleCalling.singleSampleCalling.mergeSingleSampleGvcf.memory
String — Default: "5GiB"
The amount of memory this job will use.
MultisampleCalling.singleSampleCalling.mergeSingleSampleGvcf.timeMinutes
Int — Default: 1 + ceil((size(gvcfFiles,"G") * 8))
The maximum amount of time the job will run in minutes.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.compressionLevel
Int — Default: 1
The compression level at which the output files are written.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.javaXmx
String — Default: "4G"
The maximum memory available to the program. Should be lower than `memory` to accommodate JVM overhead.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.memory
String — Default: "5GiB"
The amount of memory this job will use.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.timeMinutes
Int — Default: 1 + ceil(size(inputVCFs,"GiB")) * 2
The maximum amount of time the job will run in minutes.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.useJdkDeflater
Boolean — Default: true
If true, uses the Java deflater to compress the output files; if false, uses the optimized Intel deflater.
MultisampleCalling.singleSampleCalling.mergeSingleSampleVcf.useJdkInflater
Boolean — Default: false
If true, uses the Java inflater; if false, uses the optimized Intel inflater.
MultisampleCalling.singleSampleCalling.Stats.afBins
String?
Allele frequency bins, a list (0.1,0.5,1) or a file (0.1 0.5 1).
MultisampleCalling.singleSampleCalling.Stats.applyFilters
String?
Require at least one of the listed FILTER strings (e.g. "PASS,.").
MultisampleCalling.singleSampleCalling.Stats.collapse
String?
Treat as identical records with <snps|indels|both|all|some|none>; see the man page for details.
MultisampleCalling.singleSampleCalling.Stats.depth
String?
Depth distribution: min,max,bin size [0,500,1].
MultisampleCalling.singleSampleCalling.Stats.exclude
String?
Exclude sites for which the expression is true (see man page for details).
MultisampleCalling.singleSampleCalling.Stats.exons
File?
Tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed).
MultisampleCalling.singleSampleCalling.Stats.firstAlleleOnly
Boolean — Default: false
Include only 1st allele at multiallelic sites.
MultisampleCalling.singleSampleCalling.Stats.include
String?
Select sites for which the expression is true (see man page for details).
MultisampleCalling.singleSampleCalling.Stats.memory
String — Default: "256MiB"
The amount of memory this job will use.
MultisampleCalling.singleSampleCalling.Stats.regions
String?
Restrict to comma-separated list of regions.
MultisampleCalling.singleSampleCalling.Stats.samplesFile
File?
File of samples to include.
MultisampleCalling.singleSampleCalling.Stats.splitByID
Boolean — Default: false
Collect stats for sites with ID separately (known vs novel).
MultisampleCalling.singleSampleCalling.Stats.targets
String?
Similar to regions but streams rather than index-jumps.
MultisampleCalling.singleSampleCalling.Stats.targetsFile
File?
Similar to regionsFile but streams rather than index-jumps.
MultisampleCalling.singleSampleCalling.Stats.threads
Int — Default: 0
Number of extra decompression threads [0].
MultisampleCalling.singleSampleCalling.Stats.timeMinutes
Int — Default: 1 + 2 * ceil(size(select_all([inputVcf, compareVcf]),"G"))
The maximum amount of time the job will run in minutes.
MultisampleCalling.singleSampleCalling.Stats.userTsTv
String?
<TAG[:min:max:n]>. Collect Ts/Tv stats for any tag using the given binning [0:1:100].
MultisampleCalling.singleSampleCalling.Stats.verbose
Boolean — Default: false
Produce verbose per-site and per-sample output.
MultisampleCalling.singleSampleCalling.statsRegions
File?
Which regions need to be analysed by the stats tools.
MultisampleCalling.singleSampleCalling.timeMinutes
Int — Default: ceil((size(bam,"G") * 120))
The expected time in minutes for each HaplotypeCaller task. This is exposed as the time_minutes runtime attribute.
MultisampleCalling.standardMinConfidenceThresholdForCalling
Float?
Minimum confidence threshold used by HaplotypeCaller.
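To get a feel for how scatterSize and scatterSizeMillions translate into jobs, the Python sketch below computes an effective scatter size and a rough number of scatter jobs from the reference .fai index, whose second tab-separated column is the sequence length. It assumes that an explicitly set scatterSize takes precedence over scatterSizeMillions. The job count ignores contig boundaries, so it is only an approximation: with splitContigs left at false, contigs are kept whole and the actual number of chunks can differ. The contig names and lengths in the example are hypothetical.

```python
import math
from typing import Iterable, Optional


def effective_scatter_size(scatter_size: Optional[int] = None,
                           scatter_size_millions: int = 1000) -> int:
    """Assumed rule: an explicit scatterSize wins, else scatterSizeMillions * 1,000,000."""
    if scatter_size is not None:
        return scatter_size
    return scatter_size_millions * 1_000_000


def genome_length_from_fai(fai_lines: Iterable[str]) -> int:
    """Sum the sequence lengths (second, tab-separated column) of a .fai index."""
    total = 0
    for line in fai_lines:
        if line.strip():  # skip blank lines
            total += int(line.split("\t")[1])
    return total


def approx_scatter_jobs(genome_length: int, scatter_size: int) -> int:
    """Rough number of scatter jobs, ignoring contig boundaries."""
    return max(1, math.ceil(genome_length / scatter_size))


if __name__ == "__main__":
    # Hypothetical .fai lines (name, length, offset, linebases, linewidth);
    # in practice, iterate over the file given as referenceFastaFai.
    fai = ["contig1\t150000000\t9\t60\t61", "contig2\t90000000\t152500018\t60\t61"]
    length = genome_length_from_fai(fai)
    chunk = effective_scatter_size(scatter_size_millions=100)
    print(f"{length} bp in chunks of {chunk} bp: about {approx_scatter_jobs(length, chunk)} jobs")
```

With the default scatterSizeMillions of 1000, a genome shorter than one billion bases would fall into a single chunk by this estimate, so smaller values are needed to get any parallelism on such genomes.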
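Several advanced defaults are WDL expressions over input file sizes. The Python sketch below re-evaluates a few of them for given byte counts, which can help decide whether an override is needed. It is an approximation, not the workflow's own code: it mimics WDL's size() by summing byte counts, and it assumes WDL's unit conventions, in which "G" is 10^9 bytes while "MiB" and "GiB" are binary units. It also reflects the memoryMb = javaXmxMb + 512 default, which leaves 512 MB of headroom above the Java heap.

```python
import math
from typing import Iterable

# Divisors for the size() units used in these defaults. Following the WDL
# convention, "G" is decimal (10^9 bytes) while "MiB" and "GiB" are binary.
UNITS = {"G": 1000 ** 3, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def size(byte_counts: Iterable[int], unit: str) -> float:
    """Approximate WDL size() over several files, given their sizes in bytes."""
    return sum(byte_counts) / UNITS[unit]


def intersect_memory(region_bytes: Iterable[int]) -> str:
    # calculateRegions.intersect*.memory: "~{512 + ceil(size([regionsA, regionsB],"MiB"))}MiB"
    return f"{512 + math.ceil(size(region_bytes, 'MiB'))}MiB"


def intersect_time_minutes(region_bytes: Iterable[int]) -> int:
    # calculateRegions.intersect*.timeMinutes: 1 + ceil(size([regionsA, regionsB],"GiB"))
    return 1 + math.ceil(size(region_bytes, "GiB"))


def gather_gvcfs_time_minutes(gvcf_bytes: Iterable[int]) -> int:
    # JointGenotyping.gatherGvcfs.timeMinutes: 1 + ceil((size(gvcfFiles,"G") * 8))
    return 1 + math.ceil(size(gvcf_bytes, "G") * 8)


def gather_vcfs_time_minutes(vcf_bytes: Iterable[int]) -> int:
    # JointGenotyping.gatherVcfs.timeMinutes: 1 + ceil(size(inputVCFs,"GiB")) * 2
    return 1 + math.ceil(size(vcf_bytes, "GiB")) * 2


def haplotypecaller_time_minutes(bam_bytes: int) -> int:
    # singleSampleCalling.timeMinutes: ceil((size(bam,"G") * 120))
    return math.ceil(size([bam_bytes], "G") * 120)


def memory_mb(java_xmx_mb: int = 4096) -> int:
    # call*.memoryMb: javaXmxMb + 512, i.e. 512 MB above the Java heap
    return java_xmx_mb + 512


if __name__ == "__main__":
    print(intersect_memory([1_048_576, 524_288]))       # 1.5 MiB of BED files
    print(haplotypecaller_time_minutes(5_000_000_000))  # a 5 GB BAM file
    print(memory_mb())
```

Run as a script, this prints 514MiB, 600 and 4608. When raising a javaXmxMb input, the matching memoryMb default follows automatically in this model, whereas the String-based javaXmx/memory pairs (for example "4G" and "5GiB") have to be adjusted together by hand.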