
Single cell RNAseq. Kathie Mihindukulasuriya, PhD, Senior Scientist, Cruchaga Lab


Presentation Transcript


  1. Single cell RNAseq. Kathie Mihindukulasuriya, PhD, Senior Scientist, Cruchaga Lab, Department of Psychiatry, Washington University in St. Louis

  2. Plan:
     • Single cell RNA-seq vs bulk RNA-seq
     • Current single cell protocols and platforms
     • Processing single cell RNA-seq data
     • Biology-based analysis
     • Current challenges in single cell RNA-seq processing and analysis

  3. Bulk RNAseq vs single cell RNASeq

  4. What are some types of questions that can be answered by scRNAseq?

  5. Fluidigm C1

  6. Fluidigm C1

  7. Fluidigm C1

  8. Fluidigm C1

  9. Fluidigm C1

  10. Fluidigm C1

  11. Fluidigm C1

  12. Fluidigm C1

  13. Droplet-based. Methods of single-cell isolation:
     • Limiting dilution: not very efficient
     • Micromanipulation: time consuming; low throughput
     • FACS: highly purified single cells, if the cells express a cell surface marker

  14. Droplet-based. Methods of single-cell isolation:
     • Microfluidic technology: low sample consumption; low analysis cost; precise fluid control; decreased risk of external contamination
     • CellSearch: antibody conjugated to magnetic particles to isolate desired cells; good for rare cell types
     • Laser capture microdissection: isolate cells from solid samples

  15. Droplet-based: cell lysis -> reverse transcription into first-strand cDNA -> second-strand synthesis -> cDNA amplification
     • UMIs:
       - 4–10 random nucleotides that are introduced with the primer used for cDNA generation before amplification
       - multiple reads with the same UMI sequence that map to the same gene = one molecule
     • Cell barcodes:
       - labeling of cDNA by a cell-specific DNA sequence that allows multiplexing at an early stage

  16. Droplet-based: cell lysis -> reverse transcription into first-strand cDNA -> second-strand synthesis -> cDNA amplification
     • UMIs:
       - 4–10 random nucleotides that are introduced with the primer used for cDNA generation before amplification
       - multiple reads with the same UMI sequence that map to the same gene = one molecule
     • Cell barcodes:
       - labeling of cDNA by a cell-specific DNA sequence that allows multiplexing at an early stage
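The UMI rule above (reads sharing a cell barcode, gene and UMI count as one molecule) can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular pipeline: the function name `count_molecules`, the tuple layout and the toy reads are all assumptions, and the sketch ignores sequencing errors within the UMI itself.

```python
from collections import defaultdict

def count_molecules(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read (hypothetical layout). Reads that share barcode, gene
    and UMI are treated as PCR copies of a single molecule.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    # The number of distinct UMIs is the molecule count.
    return {key: len(umis) for key, umis in umis_seen.items()}

# Toy input (made up): three PCR copies of one molecule, a second
# molecule in the same cell, and one molecule in another cell.
reads = [
    ("AAAC", "GTTA", "APOE"),
    ("AAAC", "GTTA", "APOE"),
    ("AAAC", "GTTA", "APOE"),
    ("AAAC", "CCGT", "APOE"),
    ("TTTG", "GTTA", "APOE"),
]
counts = count_molecules(reads)
print(counts)
```

Five reads collapse to three molecules here: two for barcode AAAC and one for barcode TTTG.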

  17. Plate-based Template Switching Oligonucleotide

  18. Processing scRNA-seq data
     • Map reads to genome, not transcriptome
       - Decreases multi-mapping reads
       - Critical for snRNA-seq
       - Splice-aware aligners (STAR)
       - Pseudoaligners (faster)
     • Associate reads with genes or transcripts
       - featureCounts
       - HTSeq
     • Remove PCR noise using UMIs
     • Demultiplex to identify cells
     • Remove barcodes from cell-free mRNA (much lower average read count than barcodes derived from intact cells)
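The last step above relies on barcodes from cell-free mRNA having far fewer reads than barcodes from intact cells. A minimal sketch of that idea follows, assuming a simple cutoff relative to the largest barcode; the function name `call_cells`, the rule itself and the default of 0.1 are illustrative assumptions, not a published threshold.

```python
import numpy as np

def call_cells(total_counts, fraction_of_top=0.1):
    """Return a boolean mask of barcodes kept as putative cells.

    A barcode is kept when its total count is at least `fraction_of_top`
    times the largest total count. Both the rule and the default value
    are assumptions made for illustration.
    """
    total_counts = np.asarray(total_counts)
    cutoff = fraction_of_top * total_counts.max()
    return total_counts >= cutoff

# Toy totals (made up): three cell-like barcodes, three background barcodes.
totals = np.array([5200, 4800, 6100, 35, 12, 90])
is_cell = call_cells(totals)
print(is_cell)
```

In real data the gap between the two groups is rarely this clean, so the cutoff is usually chosen after looking at the ranked barcode counts.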

  19. Processing scRNA-seq data
     • Remove low-quality ‘cells’ based on mapping statistics:
       - overrepresentation of mitochondrial RNAs, ribosomal RNAs (>40%), spike-ins, adapters
       - and/or reads that map outside of exons
     • Normalization to correct for unwanted variation among cells caused by technical variation
     • Remove batch effects
     • Biology-based analysis (like differential expression)
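The filtering and normalization steps above might look like the following sketch. Several details are assumptions rather than anything stated on the slide: mitochondrial genes are recognized by the human `MT-` symbol prefix, the 40% cutoff is applied to the mitochondrial fraction, and normalization is library-size scaling to 10,000 counts followed by log1p. The name `filter_and_normalize` is hypothetical.

```python
import numpy as np

def filter_and_normalize(counts, gene_names, max_mito_fraction=0.4, scale=10_000):
    """Drop low-quality cells, then library-size normalize and log-transform.

    counts: cells x genes array of raw counts.
    Returns (normalized matrix of kept cells, boolean mask of kept cells).
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: human gene symbols, where mitochondrial genes start with "MT-".
    is_mito = np.array([g.startswith("MT-") for g in gene_names])
    totals = counts.sum(axis=1)
    mito_fraction = counts[:, is_mito].sum(axis=1) / np.maximum(totals, 1)
    # Keep cells that have any counts and are not dominated by mitochondrial RNA.
    keep = (totals > 0) & (mito_fraction <= max_mito_fraction)
    kept = counts[keep]
    # Scale every cell to the same total, then log1p to compress the range.
    normalized = np.log1p(kept / kept.sum(axis=1, keepdims=True) * scale)
    return normalized, keep

# Toy data (made up): the second cell is 80% mitochondrial and is removed.
genes = ["MT-CO1", "APOE", "GFAP"]
raw = [[1, 5, 4], [8, 1, 1], [0, 3, 7]]
normalized, keep = filter_and_normalize(raw, genes)
print(keep)
print(normalized.shape)
```

Batch-effect removal is a separate step and is not covered by this sketch.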

  20. Some examples of biology-based analysis
     Purpose: to directly investigate AD brain changes in cell proportion and gene expression at single-cell resolution
     Del-Aguila, J.L. et al. A single-nuclei RNA sequencing study of Mendelian and sporadic AD in the human brain. bioRxiv. Mar. 30, 2019. doi: http://dx.doi.org/10.1101/593756.

  21. To identify different cell types in brain samples by a CGS approach (unsupervised graph-based clustering), then annotate the clusters by cell type using marker genes
     t-distributed Stochastic Neighbor Embedding (tSNE) is a dimensionality reduction technique; a tSNE plot displays its output.
     Differences from PCA:
     1. tSNE is typically used to produce a 2D separation
     2. tSNE is non-deterministic (you won't get exactly the same output each time you run it, unless the random seed is fixed)
     3. tSNE tends to cope better with non-linear signals in your data (less impact of outliers; visible separation between relevant groups is improved)
     4. After tSNE, input features are no longer identifiable, and you cannot make any inference based only on the output of tSNE
     NOTE: very computationally intensive (may need to apply another dimensionality reduction technique like PCA first)
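The note about applying PCA first can be illustrated with a small NumPy sketch. It computes PCA scores through a singular value decomposition; the function name `pca_scores`, the random toy matrix and the choice of 10 components are assumptions for illustration. The reduced matrix is what would then be passed to a tSNE implementation.

```python
import numpy as np

def pca_scores(matrix, n_components):
    """Project cells (rows) onto the top principal components via SVD."""
    matrix = np.asarray(matrix, dtype=float)
    # Center each gene (column) so components capture variance, not the mean.
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Scores are the left singular vectors scaled by the singular values.
    return u[:, :n_components] * s[:n_components]

# Toy data (made up): 50 cells x 200 genes of random values.
rng = np.random.default_rng(0)
expression = rng.normal(size=(50, 200))
scores = pca_scores(expression, n_components=10)
print(scores.shape)
```

Unlike tSNE, this projection contains no random step, so repeating it on the same matrix gives the same scores, and each component remains a weighted combination of the input genes.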

  22. To identify different cell types in brain samples: Classic Gene Set (CGS) from pooled subjects: Seurat FindVariableGenes -> 2,360 genes -> calculate 100 PCs -> identify the optimal number of PCs (65) -> 25 clusters, 6 cell types

  23. To identify different cell types in brain samples: Consensus Gene Set (ConGen) from each subject: Seurat FindVariableGenes -> 2,447 (S1); 2,354 (S2); 1,972 (S3) genes -> R function intersection to identify common genes (1,434) -> calculate 100 PCs -> identify the optimal number of PCs (25) -> 14 cell types; better resolution
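The consensus step (pick variable genes per subject, then keep only the genes common to all subjects) can be sketched in Python. This is not the Seurat procedure: ranking genes by plain variance is a simplified stand-in for FindVariableGenes, and the function names and toy matrices are assumptions.

```python
import numpy as np

def top_variable_genes(counts, gene_names, n_top):
    """Return the n_top genes with the highest variance across cells.

    Plain variance ranking is a simplified stand-in for a proper
    variable-gene selection method.
    """
    variances = np.asarray(counts, dtype=float).var(axis=0)
    top = np.argsort(variances)[::-1][:n_top]
    return {gene_names[i] for i in top}

def consensus_genes(per_subject_sets):
    """Keep only genes selected in every subject (set intersection)."""
    return set.intersection(*per_subject_sets)

# Toy data (made up): three subjects, cells x genes, four genes A-D.
genes = ["A", "B", "C", "D"]
subject_counts = [
    [[0, 0, 1, 1], [10, 8, 1, 1], [0, 9, 1, 2]],  # A and B vary most
    [[1, 0, 0, 1], [1, 7, 9, 1], [1, 9, 0, 1]],   # B and C vary most
    [[1, 0, 1, 0], [1, 6, 1, 9], [1, 9, 1, 0]],   # B and D vary most
]
per_subject = [top_variable_genes(c, genes, n_top=2) for c in subject_counts]
consensus = consensus_genes(per_subject)
print(consensus)
```

Only gene B is variable in all three toy subjects, so it alone survives the intersection.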

  24. Cluster annotation
     Evaluate the expression of marker genes for neurons, astrocytes, oligodendrocytes, microglia, oligodendrocyte precursor cells, endothelial cells, excitatory and inhibitory neurons (from literature) -> Seurat DotPlot to visualize the average expression of the marker genes in each cluster
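The numbers behind a dot plot can be computed directly: for each cluster and marker gene, the average expression and the fraction of cells with non-zero expression. The sketch below is a NumPy illustration of that summary, not Seurat's implementation; `dotplot_summary` and the toy values are assumptions.

```python
import numpy as np

def dotplot_summary(expression, clusters, gene_names, markers):
    """Per (cluster, marker gene): mean expression and fraction of cells expressing."""
    expression = np.asarray(expression, dtype=float)
    clusters = np.asarray(clusters)
    columns = [gene_names.index(g) for g in markers]
    summary = {}
    for cluster in np.unique(clusters):
        block = expression[clusters == cluster][:, columns]
        for j, gene in enumerate(markers):
            mean_expr = float(block[:, j].mean())
            frac_expressing = float((block[:, j] > 0).mean())
            summary[(cluster.item(), gene)] = (mean_expr, frac_expressing)
    return summary

# Toy data (made up): cluster 0 expresses GFAP, cluster 1 expresses SNAP25.
genes = ["GFAP", "SNAP25"]
expr = [[4, 0], [2, 0], [0, 3], [0, 5]]
cluster_ids = [0, 0, 1, 1]
summary = dotplot_summary(expr, cluster_ids, genes, markers=["GFAP", "SNAP25"])
print(summary)
```

With these toy values, cluster 0 would be read as astrocyte-like (GFAP) and cluster 1 as neuron-like (SNAP25).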

  25. Workflow Analysis Plan

  26. Single cell analysis: current challenges
     • Biggest challenge: missing data (excess zeros), known as “dropout”
       - technical (not captured)
       - biological (really no expression)
       - sampling (just not deep enough sequencing)
       - can’t distinguish between these
       - dropout = largest source of variation
     • How to deal with missing data?
       - Increase read depth
       - Impute the missing data based on clustered cells (DrImpute, CIDR, MAGIC, scImpute)
       - Impute the missing data based on bulk RNAseq data (SCRABBLE)
       - Use biological knowledge: gene-gene coexpression (netNMF-sc)
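The idea of imputing from clustered cells can be shown with a deliberately naive sketch: each zero is replaced by the mean of the non-zero values of that gene among cells in the same cluster. This is not the algorithm of DrImpute, CIDR, MAGIC or scImpute; the function name and toy data are assumptions, and the sketch treats every zero as a dropout, which is exactly the distinction the slide says cannot be made.

```python
import numpy as np

def impute_zeros_by_cluster(expression, clusters):
    """Replace zeros with the cluster-wise mean of non-zero values per gene."""
    expression = np.asarray(expression, dtype=float)
    clusters = np.asarray(clusters)
    imputed = expression.copy()
    for cluster in np.unique(clusters):
        rows = np.where(clusters == cluster)[0]
        block = expression[rows]
        nonzero_counts = (block > 0).sum(axis=0)
        # Mean over non-zero entries; stays 0 when the whole cluster is zero.
        nonzero_means = block.sum(axis=0) / np.maximum(nonzero_counts, 1)
        imputed[rows] = np.where(block == 0, nonzero_means, block)
    return imputed

# Toy data (made up): 5 cells x 2 genes in two clusters.
expr = [[4, 0], [0, 0], [2, 0], [0, 6], [0, 0]]
cluster_ids = [0, 0, 0, 1, 1]
imputed = impute_zeros_by_cluster(expr, cluster_ids)
print(imputed)
```

A gene that is zero in every cell of a cluster is left at zero, which keeps cluster-specific absence of expression intact.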

  27. Single cell analysis: current challenges
     Explosion of methods and software, but best practices are not yet clear
     https://github.com/seandavi/awesome-single-cell
     • Doublet identification
       - demuxlet - [shell] - Multiplexed droplet single-cell RNA-sequencing using natural genetic variation
       - DoubletFinder - [R] - Doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. BioRxiv
       - DoubletDecon - [R] - Cell-State Aware Removal of Single-Cell RNA-Seq Doublets. BioRxiv
       - DoubletDetection - [R, Python] - A Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. An R implementation is in development.
       - Scrublet - [Python] - Computational identification of cell doublets in single-cell transcriptomic data. BioRxiv
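Several of the tools listed above build on the same idea: simulate artificial doublets by adding the profiles of two random cells, then score each real cell by how many of its nearest neighbors are artificial. The sketch below illustrates that idea only and does not reproduce any listed tool; the function name, `k`, the log1p transform and the toy counts are assumptions.

```python
import numpy as np

def doublet_scores(counts, n_artificial=None, k=5, seed=0):
    """Score real cells by the fraction of artificial doublets among their k nearest neighbors."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    n_artificial = n_artificial or n_cells
    # Artificial doublet = sum of the count profiles of two randomly drawn cells.
    pairs = rng.integers(0, n_cells, size=(n_artificial, 2))
    artificial = counts[pairs[:, 0]] + counts[pairs[:, 1]]
    combined = np.log1p(np.vstack([counts, artificial]))
    is_artificial = np.r_[np.zeros(n_cells, dtype=bool), np.ones(n_artificial, dtype=bool)]
    # Euclidean distance from every real cell to every real or artificial profile.
    diffs = combined[:n_cells, None, :] - combined[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    dist[np.arange(n_cells), np.arange(n_cells)] = np.inf  # ignore self
    nearest = np.argsort(dist, axis=1)[:, :k]
    return is_artificial[nearest].mean(axis=1)

# Toy counts (made up): two cell types plus one profile expressing both.
cells = np.array([[10, 0], [12, 1], [9, 0], [0, 11], [1, 10], [0, 9], [10, 10]])
scores = doublet_scores(cells, k=3)
print(scores)
```

Scores lie between 0 and 1; choosing the cutoff above which a cell is called a doublet is a separate decision that this sketch leaves open.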

  28. Single cell analysis: current challenges
     • Assigning cell types to clusters of cells:
       - dimensionality reduction (tSNE, PCA, UMAP) -> unsupervised clustering -> annotation of clusters
     • Use of marker genes
       - Known marker genes
       - Expression high enough to be measured (not always true for known cell surface markers)
       - Subjective (different researchers choose different markers)
       - Novel cell types?
     • Use of annotated training data (e.g. reference atlas)
       - comparisons with annotated reference data using automatically chosen genes that optimally discriminate between cell types (scmap, SingleR)
       - allow the assignment of cells to an intermediate or unassigned type (CHETAH)
     • Challenge: human data often clusters by individual, rather than cell type
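Reference-based annotation with an "unassigned" outcome can be sketched as follows: correlate each cell with a reference profile per cell type and assign the best match only if the correlation is high enough. This is a simplified illustration, not the method of scmap, SingleR or CHETAH; the function name `annotate`, the Pearson correlation, the 0.5 cutoff and the toy profiles are all assumptions.

```python
import numpy as np

def annotate(cells, reference_profiles, min_correlation=0.5):
    """Assign each cell the best-correlated reference type, or 'unassigned'."""
    labels = list(reference_profiles)
    reference = np.array([reference_profiles[label] for label in labels], dtype=float)
    assigned = []
    for cell in np.asarray(cells, dtype=float):
        # Pearson correlation between the cell and each reference profile.
        corr = [np.corrcoef(cell, profile)[0, 1] for profile in reference]
        best = int(np.argmax(corr))
        assigned.append(labels[best] if corr[best] >= min_correlation else "unassigned")
    return assigned

# Toy reference and query cells (made up), four genes each.
reference = {"neuron": [9, 1, 0, 1], "astrocyte": [0, 1, 9, 1]}
query = [[8, 2, 1, 0], [1, 0, 7, 2], [3, 3, 3, 4]]
labels = annotate(query, reference)
print(labels)
```

The third toy cell resembles neither reference profile, so it is left unassigned rather than forced into a type.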

  29. Single cell analysis: current challenges
     • How to combine datasets for analysis:
       - scmap: projection of single-cell RNA-seq data across data sets
       - scMerge: uses genes that do not change across all samples and a robust algorithm to infer pseudoreplicates between datasets

  30. Single cell analysis: current challenges
     Look to advances in single cell RNA-seq cancer research for solutions to these problems
