Genomic Annotation of Metagenomic Sequences
Category: Multi-Sample Experimental
View the Project on GitHub biowdl/gams
There are no releases available for this repository. The content is, therefore, likely still under development and not production ready. Use at own risk!
The GAMS (Genomic Annotation of Metagenomic Sequences) pipeline can be used to process metagenomic data. It performs preprocessing and quality control using cutadapt and fastqc, optionally merges overlapping reads using flash, and analyses the data using centrifuge.
In order to run the complete multisample pipeline, you can run pipeline.wdl using Cromwell:
java -jar cromwell-<version>.jar run -i inputs.json pipeline.wdl
The inputs JSON can be generated using WOMtools as described in the WOMtools documentation. Note that some inputs should not be used! See this page for more information.
The primary inputs are described below; additional inputs (such as precommands and JAR paths) are also available. Please use the above-mentioned WOMtools command to see all available inputs.
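The template generated by WOMtools lists every input, including the optional ones. The following is a minimal sketch of how the non-optional entries could be picked out of such a template. It assumes that optional inputs are marked with an "(optional" note in their type description, which should be checked against the output of the WOMtools version in use. The function name `required_inputs`, the example template and the input `pipeline.someOptionalInput` are hypothetical.

```python
import json


def required_inputs(template: dict) -> dict:
    """Return only the template entries that are not marked optional.

    Assumption: optional inputs carry an "(optional" note in their type
    description, e.g. "String? (optional)".
    """
    return {
        name: type_description
        for name, type_description in template.items()
        if "(optional" not in str(type_description)
    }


# Hypothetical excerpt of an inputs template; a real template would be
# read from the file generated by WOMtools instead.
example_template = json.loads("""
{
  "pipeline.sampleConfigFiles": "Array[File]",
  "pipeline.outputDir": "String",
  "pipeline.sample.centrifugeIndexPrefix": "String",
  "pipeline.someOptionalInput": "String? (optional)"
}
""")

print(json.dumps(required_inputs(example_template), indent=2))
```

The filtered result is only a starting point: the type descriptions still have to be replaced with actual values before the file can be passed to Cromwell.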
field | type | description |
---|---|---|
sampleConfigFiles | Array[File] | The sample configuration file. See more details below. |
outputDir | String | The output directory. |
sample.centrifugeIndexPrefix | String | Prefix of the Centrifuge index. |
All input names have to be preceded by the prefix pipeline. (for example, outputDir becomes pipeline.outputDir). Types are indicated according to the WDL data types: File inputs should be given as the location of the file (a string in JSON). Types ending in ? indicate that the input is optional; types ending in + indicate that at least one element is required.
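As an illustration of the naming rule above, the following sketch builds an inputs JSON for the three primary inputs and adds the pipeline. prefix to each name. The helper `build_inputs` and all paths are hypothetical placeholders, not values from the repository.

```python
import json

# Every input name must be preceded by this prefix.
PREFIX = "pipeline."


def build_inputs(sample_configs, output_dir, centrifuge_index_prefix):
    """Build the inputs mapping for the three primary inputs."""
    inputs = {
        "sampleConfigFiles": list(sample_configs),  # Array[File]
        "outputDir": output_dir,  # String
        "sample.centrifugeIndexPrefix": centrifuge_index_prefix,  # String
    }
    return {PREFIX + name: value for name, value in inputs.items()}


# Placeholder paths; replace them with real locations.
inputs = build_inputs(
    sample_configs=["samples.yml"],
    output_dir="output",
    centrifuge_index_prefix="/path/to/centrifuge/index/prefix",
)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

The printed JSON can be saved as inputs.json and passed to Cromwell with the -i option.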
The sample configuration should be a YAML file which adheres to the following structure:
samples:
  <sample>:
    libraries:
      <library>:
        readgroups:
          <readgroup>:
            R1: <Path to first-end FastQ file.>
            R1_md5: <MD5 checksum of first-end FastQ file.>
            R2: <Path to second-end FastQ file.>
            R2_md5: <MD5 checksum of second-end FastQ file.>
Replace the text between < > with appropriate values. MD5 values may be omitted and R2 values may be omitted in the case of single-end data. Multiple readgroups can be added per library and multiple libraries may be given per sample.
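A sample configuration with this structure could be generated with a small script. The sketch below uses only the Python standard library: it computes the optional MD5 checksums, leaves out the R2 fields for single-end data, and writes the nested structure as indented YAML. The helpers `md5sum`, `readgroup_entry` and `to_yaml`, as well as the sample, library and readgroup names, are hypothetical. The serializer is deliberately simple and assumes that names and paths contain no characters that would need quoting in YAML.

```python
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def readgroup_entry(r1, r2=None, with_md5=True):
    """Build one readgroup; R2 is left out for single-end data."""
    entry = {"R1": r1}
    if with_md5:
        entry["R1_md5"] = md5sum(r1)
    if r2 is not None:
        entry["R2"] = r2
        if with_md5:
            entry["R2_md5"] = md5sum(r2)
    return entry


def to_yaml(node, indent=0):
    """Serialize nested dicts of plain strings as block-style YAML lines."""
    lines = []
    pad = " " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Demonstration with a tiny single-end FastQ file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "reads_R1.fq")
    with open(r1, "wb") as handle:
        handle.write(b"@read1\nACGT\n+\nIIII\n")
    config = {
        "samples": {
            "sample1": {
                "libraries": {
                    "lib1": {"readgroups": {"rg1": readgroup_entry(r1)}}
                }
            }
        }
    }
    yaml_text = "\n".join(to_yaml(config))
    print(yaml_text)
```

For paired-end data, the path to the second-end file would be passed as `r2`, and `with_md5=False` skips the checksums entirely.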
Included in the repository is an environment.yml file. This file lists all the tool versions with which the workflow was tested. You can use conda and this file to create an environment with all the correct tools.
This pipeline will produce a number of directories and files:
For any questions related to this pipeline, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.