RNA-Seq pipeline
Here we provide the tools to perform paired-end or single-read RNA-Seq analysis, including raw data quality control, differential expression (DE) analysis and functional annotation. As input, you may use either gzipped FASTQ files (.fastq.gz) or mapped read data (.bam files). For paired-end reads, the corresponding FASTQ files should be named with the .R1.fastq.gz and .R2.fastq.gz suffixes.
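The paired-end naming rule can be checked before a run. The following is a minimal Python sketch, not part of the pipeline: the helper name pair_fastq_files is hypothetical, and it assumes all FASTQ files of a run are passed in together. It groups files by the prefix in front of the .R1/.R2 suffix and fails if a mate is missing.

```python
from pathlib import Path

# Suffixes required by the pipeline for paired-end FASTQ files.
R1_SUFFIX = ".R1.fastq.gz"
R2_SUFFIX = ".R2.fastq.gz"


def pair_fastq_files(filenames):
    """Group paired-end FASTQ files by sample prefix (hypothetical helper).

    Returns {prefix: (r1_name, r2_name)}. Files without either suffix are
    ignored. Raises ValueError if a prefix has only one mate.
    """
    r1, r2 = {}, {}
    for name in filenames:
        base = Path(name).name
        if base.endswith(R1_SUFFIX):
            r1[base[: -len(R1_SUFFIX)]] = base
        elif base.endswith(R2_SUFFIX):
            r2[base[: -len(R2_SUFFIX)]] = base
    # Prefixes present in only one of the two sets are missing a mate.
    unpaired = sorted(set(r1) ^ set(r2))
    if unpaired:
        raise ValueError(f"missing mate for: {', '.join(unpaired)}")
    return {prefix: (r1[prefix], r2[prefix]) for prefix in sorted(r1)}
```

For example, KO_1.R1.fastq.gz and KO_1.R2.fastq.gz are grouped under the prefix KO_1, while a lone KO_2.R1.fastq.gz raises an error.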
Pipeline Workflow
All analysis steps are illustrated in the pipeline flowchart. Specify the analysis details for your data in the essential.vars.groovy file (see below) and run the pipeline rnaseq.pipeline.groovy as described here. After the pipeline has run, a markdown file, DEreport.Rmd, is generated in the reports output folder. This file can then be rendered into the final HTML report using the knitr R package.
The pipeline includes:
- Quality control of raw data with FastQC and MultiQC
- Read mapping to the reference genome using STAR
- Generation of bigWig tracks for visualization of alignments with deepTools
- Characterization of insert size for paired-end libraries
- Read quantification with featureCounts (Subread)
- Library complexity assessment with dupRadar
- RNA class representation
- Check for strand specificity
- Visualization of gene body coverage
- Illustration of sample relatedness with MDS plots and heatmaps
- Differential expression analysis for the specified group comparisons with DESeq2
- Enrichment analysis of DE results with clusterProfiler and ReactomePA
- Additional DE analysis including multimapped reads
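The strand specificity check relates directly to the ESSENTIAL_STRANDED setting. As a hedged illustration, the sketch below assumes the check is read from the text output of RSeQC's infer_experiment.py, and that the values no/yes/reverse correspond to featureCounts' -s 0/1/2. The function name and the 0.8 cut-off are hypothetical choices, not values taken from the pipeline.

```python
import re

# featureCounts -s values: 0 = unstranded, 1 = stranded, 2 = reversely stranded.
# Assumed correspondence with the ESSENTIAL_STRANDED values.
FEATURECOUNTS_STRAND = {"no": 0, "yes": 1, "reverse": 2}

# Orientation patterns reported by infer_experiment.py
# (paired-end and single-end variants).
FORWARD_PATTERNS = {"1++,1--,2+-,2-+", "++,--"}
REVERSE_PATTERNS = {"1+-,1-+,2++,2--", "+-,-+"}

FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)": ([0-9.]+)')


def guess_strandedness(infer_experiment_output, threshold=0.8):
    """Return "yes", "reverse" or "no" from infer_experiment.py text output.

    `threshold` is a hypothetical cut-off: a library is called stranded when
    one orientation explains at least that fraction of the reads.
    """
    forward = reverse = 0.0
    for pattern, fraction in FRACTION_RE.findall(infer_experiment_output):
        if pattern in FORWARD_PATTERNS:
            forward = float(fraction)
        elif pattern in REVERSE_PATTERNS:
            reverse = float(fraction)
    if forward >= threshold:
        return "yes"
    if reverse >= threshold:
        return "reverse"
    return "no"
```

A typical dUTP-based library, where "1+-,1-+,2++,2--" explains nearly all reads, would be classified as "reverse", corresponding to featureCounts -s 2. Roughly equal fractions for both orientations indicate an unstranded library.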
Pipeline parameter settings
- targets.txt: tab-separated text file describing the analysed samples. The following columns are required:
  - sample: sample identifier used in plots and tables
  - file: read count file name (a unique sub-string of the file name is sufficient; this sub-string is grepped against the count file names produced by the pipeline)
  - group: variable for sample grouping (e.g. by condition)
  - replicate: replicate number of samples belonging to the same group
- contrasts.txt: the intended group comparisons for differential expression analysis, e.g. KOvsWT=(KO-WT) if targets.txt contains the groups KO and WT. Give one contrast per line.
- essential.vars.groovy: essential parameters describing the experiment, including:
  - ESSENTIAL_PROJECT: your project folder name
  - ESSENTIAL_STAR_REF: path to the STAR-indexed reference genome
  - ESSENTIAL_GENESGTF: genome annotation file in GTF format
  - ESSENTIAL_PAIRED: either paired-end ("yes") or single-read ("no") design
  - ESSENTIAL_STRANDED: strandedness of the library (no|yes|reverse)
  - ESSENTIAL_ORG: UCSC organism name
  - ESSENTIAL_READLENGTH: read length of the library
  - ESSENTIAL_THREADS: number of threads for parallel tasks
- Additional (more specialized) parameters can be set in the var.groovy files of the individual pipeline modules.
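The format of targets.txt and contrasts.txt can be validated before a run. This minimal Python sketch assumes a header line naming the four required columns and contrasts of the simple form KOvsWT=(KO-WT). The function names are hypothetical, and the plain sub-string test only approximates the pipeline's grep-based lookup.

```python
import csv
import io
import re

REQUIRED_COLUMNS = {"sample", "file", "group", "replicate"}
# Only the simple two-group form NAME=(A-B) is handled (an assumption).
CONTRAST_RE = re.compile(r"^(\w+)=\((\w+)-(\w+)\)$")


def read_targets(text):
    """Parse tab-separated targets.txt content and check required columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"targets.txt lacks columns: {sorted(missing)}")
    return list(reader)


def match_count_file(substring, count_files):
    """Approximate the grep lookup: the sub-string must hit exactly one file."""
    hits = [name for name in count_files if substring in name]
    if len(hits) != 1:
        raise ValueError(f"{substring!r} matches {len(hits)} count files")
    return hits[0]


def read_contrasts(text, groups):
    """Parse one contrast per line; both groups must occur in targets.txt."""
    contrasts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = CONTRAST_RE.match(line)
        if not match:
            raise ValueError(f"unsupported contrast: {line!r}")
        name, first, second = match.groups()
        unknown = {first, second} - set(groups)
        if unknown:
            raise ValueError(f"{name}: unknown groups {sorted(unknown)}")
        contrasts[name] = (first, second)
    return contrasts
```

The uniqueness check in match_count_file matters because a sub-string such as KO_1 would also match a count file for a replicate named KO_10.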
Programs required
- Bedtools
- deepTools
- dupRadar (provided by another project from imbforge)
- FastQC
- MultiQC
- Picard
- R packages DESeq2, clusterProfiler, ReactomePA
- RSeQC
- Samtools
- STAR
- Subread
- UCSC utilities
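Before starting, it can help to confirm that the command-line tools are on the PATH. This is a sketch only: the executable names below (for example bamCoverage for deepTools, infer_experiment.py for RSeQC, bedGraphToBigWig for the UCSC utilities and a picard wrapper script) are assumptions that depend on how the tools were installed, and R packages such as DESeq2 or dupRadar are not checked.

```python
import shutil

# Assumed executable names; adjust to your installation
# (Picard, for instance, is often run as a .jar instead of a wrapper).
TOOLS = {
    "Bedtools": "bedtools",
    "deepTools": "bamCoverage",
    "FastQC": "fastqc",
    "MultiQC": "multiqc",
    "Picard": "picard",
    "R": "Rscript",
    "RSeQC": "infer_experiment.py",
    "Samtools": "samtools",
    "STAR": "STAR",
    "Subread": "featureCounts",
    "UCSC utilities": "bedGraphToBigWig",
}


def missing_tools(tools=TOOLS):
    """Return the sorted names of tools whose executable is not on PATH."""
    return sorted(name for name, exe in tools.items() if shutil.which(exe) is None)


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```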
Version History
Version 1 (earliest) Created 7th Oct 2020 at 08:38 by Sergi Sayols
Added/updated 2 files
Branch: master (commit df5e16c)
Last updated: 10th Jan 2022 at 15:19