EDACC Primary Analysis Pipelines


EDACC Primary Analysis Pipelines. Cristian Coarfa Bioinformatics Research Laboratory Molecular and Human Genetics. Data Levels. ChIP-Seq Shotgun Bisulfite Sequencing Methyl-C Reduced Representation Bisulfite Sequencing RRBS MRE-Seq MeDIP-Seq Chromatin Accessibility


Cristian Coarfa

Bioinformatics Research Laboratory

Molecular and Human Genetics

Data Types Submitted To EDACC

ChIP-Seq

Shotgun Bisulfite Sequencing (Methyl-C)

Reduced Representation Bisulfite Sequencing (RRBS)

MRE-Seq

MeDIP-Seq

Chromatin Accessibility

small RNA-Seq

Read Mapping
Common processing step to all pipelines

High throughput

Sequence space: Illumina

Color space: SOLiD

Quick and accurate anchoring

Read length varies from 36 to 76 bp

Short read aligners

1st generation: Maq, SOAP

Ungapped alignment

2nd generation: Bowtie, BWA, SOAP2

Trade some sensitivity for speed; good enough for many applications

Mapping tools

Robust to indels

Sensitive to variable number of mismatches

Pash 3.0
Positional Hashing

Regular reads mapping

Bisulfite sequencing mapping

Integrate basepair variation with epigenetic variation

SAM output, easy integration with other analysis tools

Accuracy without sacrificing efficiency

Bisulfite Sequencing
Current tools: BSMAP, RMAP-BS, mrsFAST, ZOOM

Pash 3.0

Integrate mutation discovery with basepair-level methylation discovery


General approach

Convert C's to T's in reads and/or reference

Use mappings, reads and reference to determine methylated sites

Pash 3

Generate and hash all possible k-mers for reads


Map against forward and reverse complement chromosome strands

Superior sensitivity to other tools, without loss of efficiency
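
The general approach above (convert C's to T's, map, then compare the read with the original reference) can be sketched in a few lines. This is a minimal illustration, not Pash's implementation: it handles only the forward strand, uses an exact string search as a stand-in for a real aligner, and the sequences and function names are hypothetical.

```python
def bisulfite_convert(seq: str) -> str:
    """In-silico bisulfite conversion: every C becomes T."""
    return seq.upper().replace("C", "T")


def call_methylation(reference: str, read: str, offset: int):
    """Compare an ungapped forward-strand alignment with the original reference.

    Returns (reference_position, is_methylated) for each reference C covered
    by the read. A C in the read means the base was protected (methylated);
    a T means it was converted (unmethylated). Other read bases are skipped.
    """
    calls = []
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if reference[pos].upper() != "C":
            continue
        if base == "C":
            calls.append((pos, True))
        elif base == "T":
            calls.append((pos, False))
    return calls


# Hypothetical toy data: the read comes from reference[4:12], with the C at
# position 5 retained and the C at position 9 converted to T.
reference = "ACGTTCGAACGT"
read = "TCGAATGT"

# Map in converted space so that C/T differences do not count as mismatches.
offset = bisulfite_convert(reference).find(bisulfite_convert(read))
calls = call_methylation(reference, read, offset)
print(offset, calls)  # 4 [(5, True), (9, False)]
```

A real pipeline would also map against the reverse complement strand and tolerate mismatches and indels, which is where a dedicated mapper is needed.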

Galaxy/Genboree
Galaxy: developed at Penn State University


Rapid deployment tool

Share pipelines w/ others

Alan Harris, Sriram Raghuram

Deployed Galaxy/Genboree

Integration w/ Genboree

API for upload/download

Adaptors for LFF file format support

EDACC XML validation tools

Sriram Raghuram, Andrew Jackson, Cristian Coarfa

Integration with compute clusters

Arpit Tandon, Sriram Raghuram

Deployed analysis tools



Primary Analysis Pipelines
Implemented & exposed via Galaxy/Genboree

Read mapping

Bisulfite Sequencing read mapping

Peak calling (ChIP-Seq, MeDIP-Seq)

MACS (Harvard), FindPeaks (UBC)

Chromatin accessibility

HotSpot (UW)

Small RNA-seq

Coming soon

mRNA seq

Expression, alternative splicing

Gene fusion

Typical user interaction

Use Galaxy for user input

Submit jobs to a cluster

Upload results to Genboree

ChIP-Seq
Select uniquely mapping reads

Build read density maps

Extend each read 200bp along the mapping strand

Remove monoclonal reads

Generate WIG data

Can be visualized in Genboree and UCSC

Peak calling

FindPeaks, MACS

Interpret peaks

Overlap with genomic features of interest: gene promoters, etc.
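
The density-map steps above (extend each read 200 bp along its strand, drop monoclonal reads, emit WIG) could look roughly like the sketch below. The input representation (a list of 0-based 5' positions with strand) and the function names are assumptions made for illustration; reads with the same start and strand are treated as monoclonal, which is one common convention and not necessarily the one the pipeline uses.

```python
EXTENSION = 200  # from the slide: extend each read to 200 bp


def density_map(reads, chrom_length, extension=EXTENSION):
    """Build per-base coverage from (five_prime_pos, strand) read mappings.

    Positions are 0-based. Reads sharing the same position and strand are
    treated as monoclonal (likely PCR duplicates) and counted once.
    """
    unique_reads = set(reads)          # remove monoclonal reads
    diff = [0] * (chrom_length + 1)    # difference array for range updates
    for pos, strand in unique_reads:
        if strand == "+":
            lo, hi = pos, pos + extension          # extend downstream
        else:
            lo, hi = pos - extension + 1, pos + 1  # extend toward lower coordinates
        lo, hi = max(lo, 0), min(hi, chrom_length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage


def to_fixed_step_wig(chrom, coverage):
    """Render coverage as a fixedStep WIG track body (WIG positions are 1-based)."""
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines.extend(str(value) for value in coverage)
    return "\n".join(lines)


# Hypothetical mappings: a duplicated '+' read at 100 and a '-' read at 350.
cov = density_map([(100, "+"), (100, "+"), (350, "-")], chrom_length=1000)
print(cov[100], cov[200], cov[300], cov[351])  # 1 2 1 0
```

The resulting WIG text is the kind of track a genome browser can display, and the same coverage values are what a peak caller summarizes into enriched regions.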

MeDIP-Seq
Select uniquely mapping reads

Build read density maps

Determine methylated CpGs


MRE-Seq
Select uniquely mapping reads

Determine unmethylated CpGs

Bisulfite Sequencing
Shotgun Bisulfite Sequencing


Genome wide

Reduced Representation Bisulfite Sequencing


Enzyme cocktail

Map using Pash

Build methylation maps

Methylation Maps

Position  Strand  CHHStatus  Methylation  Unmethylated  TotalReads
50100242  +       CG         1            0             1
50100243  -       CG         40           11            51
50100250  +       CG         1            0             1
50100251  -       CG         37           8             46
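
As a small worked example, the rows above can be turned into per-site methylation levels. The sketch assumes whitespace-separated columns in the order shown and divides by methylated plus unmethylated reads rather than TotalReads, because the two differ in the last row (37 + 8 = 45, while TotalReads is 46). The function name is hypothetical.

```python
ROWS = """\
50100242 + CG 1 0 1
50100243 - CG 40 11 51
50100250 + CG 1 0 1
50100251 - CG 37 8 46
"""


def methylation_levels(text):
    """Return {(position, strand): fraction of informative reads that are methylated}."""
    levels = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        position, strand, _context, meth, unmeth, _total = line.split()
        meth, unmeth = int(meth), int(unmeth)
        informative = meth + unmeth
        if informative:  # skip sites with no C/T evidence
            levels[(int(position), strand)] = meth / informative
    return levels


levels = methylation_levels(ROWS)
for (position, strand), level in sorted(levels.items()):
    print(position, strand, round(level, 3))
```

Sites covered by a single read (such as the two '+' strand rows) yield a level of 1.0 or 0.0 with little statistical support, so a minimum-coverage filter is a natural next step.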

Small RNA-Seq
Trim adapters

Map reads onto target genome

up to 100 locations per read


Overlap w/ miRNAs, piRNAs, sno/scaRNAs
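
The adapter-trimming step can be illustrated with a short sketch. It uses exact matching only (no mismatches or quality handling), and the adapter and read sequences below are hypothetical placeholders, not the ones used by the pipeline.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching.

    A full adapter occurrence anywhere in the read is cut off together with
    everything after it. Otherwise, a partial adapter (a prefix of at least
    `min_overlap` bases) sitting at the very end of the read is removed.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "TCGTATGCCG"               # hypothetical adapter sequence
INSERT = "TAGCTTATCAGACTGATGTTGA"    # hypothetical 22 nt small RNA

full = trim_adapter(INSERT + ADAPTER + "AAAA", ADAPTER)
partial = trim_adapter(INSERT + ADAPTER[:7], ADAPTER)
untouched = trim_adapter("ACGTTTT", ADAPTER)
print(full == INSERT, partial == INSERT, untouched)  # True True ACGTTTT
```

Because small RNAs are shorter than the read length, most reads run into the adapter, so trimming has to happen before mapping.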

Download the input MeDIP-Seq file from the workshop wiki

Analyze it using FindPeaks in Galaxy

Obtain results in Genboree LFF format

Upload the results to Genboree database

View the results in a tabular view

Find the largest peaks

Explore them in the Genboree browser