scRNA-Seq pipelines
Here we forge the tools to analyze single-cell RNA-Seq experiments. The analysis workflow is based on the Bioconductor packages scater and scran, as well as on two Bioconductor workflows: Lun ATL, McCarthy DJ, & Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Res. 2016 Aug 31 [revised 2016 Oct 31];5:2122, and Amezquita RA, Lun ATL et al. Orchestrating Single-Cell Analysis with Bioconductor. Nat Methods. 2020 Feb;17(2):137-145.
Implemented protocols
- MARS-Seq (massively parallel single-cell RNA-sequencing): The protocol is based on the publications of Jaitin DA, et al. (2014). Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science (New York, N.Y.), 343(6172), 776–779. https://doi.org/10.1126/science.1247651 and Keren-Shaul H., et al. (2019). MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing. Nature Protocols. https://doi.org/10.1038/s41596-019-0164-4. The MARS-Seq library preparation protocol is given here. The sequencing reads are demultiplexed according to the respective pool barcodes before they are used as input for the analysis pipeline.
- Smart-seq2: Libraries are generated using the Smart-seq2 kit.
Pipeline Workflow
All analysis steps are illustrated in the pipeline flowchart. Specify the desired analysis details for your data in the respective essential.vars.groovy file (see below) and run the selected pipeline, marsseq.pipeline.groovy or smartsseq.pipeline.groovy, as described here. Some parameters, e.g. for plotting and QC thresholding, can be fine-tuned after the initial analysis. For this purpose, a customisable sc.report.Rmd file is generated in the output reports folder after running the pipeline. Go through its steps and modify the default settings where appropriate. The sc.report.Rmd file can then be converted to a final html report using the knitr R package.
The pipelines include:
- FastQC, MultiQC and other tools for raw data quality control
- Adapter trimming with Cutadapt
- Mapping to the genome using STAR
- Generation of bigWig tracks for visualisation of alignment
- Quantification with featureCounts (Subread) and UMI-tools (if UMIs are used for deduplication)
- Downstream analysis in R using a pre-designed markdown report file (sc.report.Rmd). Modify this file to fit your custom parameters and thresholds and render it to your final html report. The Rmd file uses, among others, the following tools and methods:
Pipeline parameter settings
- essential.vars.groovy: essential parameters describing the experiment
  - project folder name
  - reference genome
  - experiment design
  - adapter sequence, etc.
- Additional (more specialized) parameters can be given in the var.groovy files of the individual pipeline modules.
- targets.txt: comma-separated text file giving information about the analysed samples. The following columns are required:
  - sample: sample identifier. Must be a unique substring of the input sample file name (e.g. common prefixes and suffixes may be removed). These names are grepped against the count file names to merge targets.txt with the count data.
  - plate: plate ID (number)
  - row: plate row (letter)
  - col: plate column (number)
  - cells: 0c/1c/10c (control wells)
  - group: default variable for cell grouping (e.g. by condition)

  For pool-based libraries such as MARS-Seq, the following columns are additionally required:
  - pool: the pool ID comprises all cells from one library pool (i.e. a set of unique cell barcodes; the cell barcodes are re-used in other pools). Must be a unique substring of the input sample file name. For pool-based designs, the pool ID is grepped against the respective count data file name instead of the sample name as stated above.
  - barcode: cell barcodes used as cell identifiers in the count files. After merging the count data with targets.txt, the barcodes are replaced with the sample IDs given in the sample column (i.e. here, sample names need not be a substring of the input sample file name).
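
The column requirements and the substring matching described above can be checked before starting a run. The following is a minimal Python sketch of that check, not the pipeline's own implementation (the pipeline performs the merge in its R report). The function names, the example rows and the count file names are hypothetical and chosen only for illustration.

```python
import csv
import io

# Columns required in every targets.txt, plus those needed for pool-based designs.
REQUIRED = ["sample", "plate", "row", "col", "cells", "group"]
POOL_REQUIRED = ["pool", "barcode"]


def read_targets(handle, pool_based=False):
    """Read a comma-separated targets file and check the required columns."""
    reader = csv.DictReader(handle)
    needed = REQUIRED + (POOL_REQUIRED if pool_based else [])
    missing = [c for c in needed if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"targets file lacks columns: {missing}")
    return list(reader)


def match_count_files(rows, count_files, key="sample"):
    """Map each distinct key value to the one count file whose name contains it."""
    matches = {}
    for value in sorted({row[key] for row in rows}):
        hits = [f for f in count_files if value in f]
        if len(hits) != 1:
            raise ValueError(
                f"{key} '{value}' matches {len(hits)} count files, expected 1"
            )
        matches[value] = hits[0]
    return matches


def barcode_to_sample(rows, pool):
    """For one pool, map cell barcodes to the sample IDs that replace them."""
    return {row["barcode"]: row["sample"] for row in rows if row["pool"] == pool}


# Hypothetical pool-based example: the barcode ACGT is re-used in pool2.
example = io.StringIO(
    "sample,plate,row,col,cells,group,pool,barcode\n"
    "cellA,1,A,1,1c,ctrl,pool1,ACGT\n"
    "cellB,1,A,2,1c,treated,pool1,TGCA\n"
    "cellC,1,B,1,0c,ctrl,pool2,ACGT\n"
)
targets = read_targets(example, pool_based=True)
files = ["run7_pool1.counts.tsv", "run7_pool2.counts.tsv"]  # assumed file names
pool_files = match_count_files(targets, files, key="pool")
renaming = barcode_to_sample(targets, "pool1")
```

For designs without pools, call `match_count_files` with the default `key="sample"`. A sample or pool ID that matches no count file, or more than one, raises an error, which is what the "unique substring" requirement is meant to prevent.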
Programs required
- FastQC
- STAR
- Samtools
- Bedtools
- Subread
- Picard
- UCSC utilities
- RSeQC
- UMI-tools
- R
Resources
- QC: the scater package.
- Normalization: the scran package.
- Trajectory analysis (pseudotime): the monocle package.
- A tutorial from Hemberg lab
- Luecken and Theis 2019 Current best practices in single‐cell RNA‐seq analysis: a tutorial
Version History
Version 1 (earliest) Created 7th Oct 2020 at 08:46 by Sergi Sayols
Created: 7th Oct 2020 at 08:46
Last updated: 10th Jan 2022 at 15:19