
Genomes with Ensembl

Dr. Giulietta M. Spudich

European Bioinformatics Institute Hinxton, UK

Today
  • Introduction to the Ensembl project
  • Walk-through of the browser
  • BioMart
  • Variation
  • Comparative Genomics
Introduction to Ensembl

Why do we have genome browsers?

Why Ensembl?

Ensembl genes and genomes

Help and tutorials

Genome browsers provide a map

[Figure: genome map with tracks for histone modification, DNase I sensitive site, conserved sequence, gene, and allele]

Figure adapted from the ENCODE project

www.nature.com/nature/focus/encode/

Genome Browsers
  • Ensembl Genome browser: http://www.ensembl.org
  • NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
  • UCSC Genome Browser: http://genome.ucsc.edu

What Distinguishes Ensembl from the UCSC and NCBI Browsers?
  • The gene set: automatic annotation based on mRNA and protein information
  • Programmatic access via the Perl API (open source)
  • BioMart
  • Integration with other databases (DAS)
  • Comparative analysis (gene trees)
Subjects

Why do we have genome browsers?

Why Ensembl?

How can we extract data from Ensembl?

Where can I find help?

To meet a challenge…

Ensembl’s aim: to provide freely available, high-quality annotation for the biological community

  • Started in 2000
  • Joint project between EBI and Sanger
  • Funded primarily by the Wellcome Trust, additional funding by EMBL, NIH-NIAID, EU, BBSRC and MRC
Vertebrates are available

Non-chordates: D. melanogaster, C. elegans, S. cerevisiae

Extension to other genomes (Plants, Microorganisms,…): www.ensemblgenomes.org

Extending Ensembl across the taxonomic space

[Figure: tree of life spanning Archaea, Eukaryota and Bacteria, labelled with the genomes covered]

  • 48 Chordates including: Human, Mouse, Zebrafish, Chicken, Chimpanzee, Pig, Platypus
  • 21 species
    • Drosophila (12)
    • Caenorhabditis (5)
    • Anopheles gambiae
  • 8 species
    • Arabidopsis thaliana
    • Arabidopsis lyrata
    • Oryza sativa
  • 8 Aspergilli
  • 2 yeast
    • S. cerevisiae
    • S. pombe
  • 3 Plasmodia: falciparum, knowlesi, vivax
  • 134 species
  • 6 bacterial clades
  • 1 prokaryotic clade

Slide design by Jeff Almeida-King

F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March 2006.

Exploring genomes
  • Vertebrates focus: www.ensembl.org
  • Other species: www.ensemblgenomes.org
Subjects

Why do we have genome browsers?

Why Ensembl?

Ensembl (vertebrate) genes & genomes

Help and tutorials

What is known?

Genomic assemblies from sequencing consortia

What is known?

Proteins and cDNA/mRNA sequences from the research community found in:

  • UniProt/Swiss-Prot (manually curated)
  • UniProt/TrEMBL

www.uniprot.org

  • NCBI RefSeq (manually curated)

www.ncbi.nlm.nih.gov/RefSeq

Combining genes and genomes

[Figure: a gene with three exons drawn on the genomic sequence (…tgcctgttag...); exon regions are marked as Coding, Untranslated, or Untranslated+Coding]
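The three exon labels in the figure (Coding, Untranslated, Untranslated+Coding) follow directly from where each exon sits relative to the coding region. The sketch below is only an illustration of that idea, not Ensembl code: the function name `classify_exon`, the 1-based inclusive coordinates, the forward-strand assumption, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple


def classify_exon(exon_start: int, exon_end: int,
                  cds_start: int, cds_end: int) -> str:
    """Label one exon relative to the coding region (CDS).

    Assumes 1-based inclusive coordinates on the forward strand.
    """
    # Exon lies entirely outside the coding region.
    if exon_end < cds_start or exon_start > cds_end:
        return "Untranslated"
    # Exon lies entirely inside the coding region.
    if exon_start >= cds_start and exon_end <= cds_end:
        return "Coding"
    # Exon straddles a CDS boundary.
    return "Untranslated+Coding"


def classify_transcript(exons: List[Tuple[int, int]],
                        cds_start: int, cds_end: int) -> List[str]:
    """Label every exon of a transcript in order."""
    return [classify_exon(s, e, cds_start, cds_end) for s, e in exons]


# Made-up transcript: four exons, coding region from 350 to 600.
example_exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
example_labels = classify_transcript(example_exons, 350, 600)
print(example_labels)
```

A reverse-strand transcript would need the same comparison with the exon order flipped; that case is left out to keep the sketch short.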

Too many pieces…

[Figure: cDNA and protein aligned to the genome; exons are marked as Coding, Untranslated, or Untranslated+Coding]
Ensembl shows one transcript with underlying evidence

VEGA/Havana
  • Ensembl: automatic annotation pipeline, gene building all at once (whole genome)
  • Havana / VEGA (Vertebrate Genome Annotation): manual curation on a case-by-case basis

HAVANA

http://www.sanger.ac.uk/HGP/havana/

Genes and Transcripts in Ensembl
  • Ensembl known transcripts
  • Ensembl novel transcripts
  • Ensembl merged transcripts (Havana)
  • EST clusters
  • More manual curation (SGD, WormBase, FlyBase)
Ensembl/Havana
  • Transcripts are labelled:
    • Ensembl
    • Havana
    • Ensembl/Havana merge

Names in Ensembl
  • ENSG### Ensembl Gene ID
  • ENST### Ensembl Transcript ID
  • ENSP### Ensembl Peptide ID
  • ENSE### Ensembl Exon ID
  • For species other than human, a species code is added after ENS:
    • MUS (Mus musculus) for mouse: ENSMUSG###
    • DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
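The naming scheme is regular enough to take apart mechanically. As a minimal sketch (not part of the original slides), the Python function below splits an ID into species code, feature type, and number. The names `parse_stable_id` and `StableId` are hypothetical, and the pattern assumes a three-letter species code as in the MUS and DAR examples; real species codes may differ in length. The IDs in the usage lines are for illustration only.

```python
import re
from typing import NamedTuple, Optional

# Feature letters listed on the slide: G, T, P, E.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# ENS, optional three-letter species code (assumption), feature letter, digits.
_ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


class StableId(NamedTuple):
    species_code: Optional[str]  # None means no species code, i.e. human
    feature_type: str            # "gene", "transcript", "peptide" or "exon"
    number: str                  # digits kept as text to preserve leading zeros


def parse_stable_id(stable_id: str) -> StableId:
    """Split an Ensembl-style ID into its parts, or raise ValueError."""
    match = _ID_PATTERN.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style ID: {stable_id!r}")
    species, letter, digits = match.groups()
    return StableId(species, FEATURE_TYPES[letter], digits)


print(parse_stable_id("ENSG00000139618"))
print(parse_stable_id("ENSMUSG00000000001"))
print(parse_stable_id("ENSDART00000000005"))
```

Keeping the number as a string avoids losing the leading zeros, which matter when the ID is written back out.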

Low-coverage genomes
  • High-coverage sequencing is time-consuming and expensive
    • BAC clones (>10x): Human, Mouse, Zebrafish
    • Whole Genome Shotgun (6x): Chimp, Rat, Chicken,...
  • Low (~2x) coverage genome sequencing
    • Faster, cheaper, but only useful when annotated
  • Assembled into lots of “scaffolds”
  • “Classic” Ensembl gene-build would result in many partial and fragmented genes
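To see why a ~2x genome ends up in many scaffolds, it helps to put numbers on "coverage". The sketch below is an illustration that is not taken from the slides: it uses the usual definition of coverage (total sequenced bases divided by genome size) and an idealised model in which reads land uniformly at random, so that the expected fraction of bases never sequenced is exp(-coverage). The function names and the example numbers are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(cov: float) -> float:
    """Fraction of bases expected to be missed entirely.

    Idealised assumption: reads are placed uniformly at random,
    so per-base depth is approximately Poisson with mean `cov`.
    """
    return math.exp(-cov)


# Made-up example: 2 million reads of 500 bases on a 500 Mb genome.
example_cov = coverage(2_000_000, 500, 500_000_000)
print(f"coverage: {example_cov:.1f}x")

for depth in (2, 6, 10):
    missed = expected_uncovered_fraction(depth)
    print(f"{depth}x: about {missed:.4%} of bases expected uncovered")
```

Under this simple model roughly 13.5% of bases are missed at 2x, which is one way to picture the gaps that break a low-coverage assembly into scaffolds and genes into fragments.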
Low-Coverage Gene-Build

Whole Genome Alignment to an annotated high-quality reference genome

Guided re-ordering of scaffolds

Annotation of longer, more complete gene structures

2X Genebuild

[Figure: a human gene on the human genome aligned to cat scaffold 2 and cat scaffold 1, which are joined by a gap (NNNNNN); the resulting annotation is labelled "Human or dog gene (projected)"]
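The guided re-ordering of scaffolds can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Ensembl pipeline: each scaffold is assumed to have exactly one alignment to the reference, described by a start position and an orientation, and the names `ScaffoldHit` and `build_super_scaffold` are hypothetical. The six-N gap string mirrors the NNNNNN shown in the figure.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class ScaffoldHit(NamedTuple):
    scaffold: str    # scaffold name
    sequence: str    # scaffold sequence (uppercase A, C, G, T, N)
    ref_start: int   # where the scaffold aligns on the reference genome
    reverse: bool    # True if it aligns to the reverse strand


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_super_scaffold(hits: List[ScaffoldHit], gap: str = "NNNNNN") -> str:
    """Order scaffolds by reference position and join them with gaps."""
    ordered = sorted(hits, key=lambda hit: hit.ref_start)
    pieces = [
        reverse_complement(hit.sequence) if hit.reverse else hit.sequence
        for hit in ordered
    ]
    return gap.join(pieces)


# Made-up data: scaffold 2 aligns upstream of scaffold 1, on the reverse strand.
example_hits = [
    ScaffoldHit("cat_scaffold_1", "TTAG", ref_start=500, reverse=False),
    ScaffoldHit("cat_scaffold_2", "ACGG", ref_start=100, reverse=True),
]
example_joined = build_super_scaffold(example_hits)
print(example_joined)
```

Once the scaffolds are joined in reference order, a gene model projected from the reference can span the gap instead of being split into two partial genes.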

What other annotation?
  • Non-coding (nc)RNAs
  • IDs in other databases
  • Microarray probes, clonesets, BAC maps
  • Other features of the genome:
    • repeats, CpG islands
  • Comparative data:
    • orthologues and paralogues, protein families, whole genome alignments, syntenic regions
  • Variation data:
    • SNPs, InDels
  • Regulatory data (a first guess at promoter and enhancer elements)
  • Data from external sources (DAS)
Sources of Variation
  • NCBI dbSNP
    • Import: alleles, flanking sequence, frequencies
    • Calculate: position, transcript effect
    • http://www.ncbi.nlm.nih.gov/SNP/
  • For human also:
    • HGVbase
    • Affy GeneChip 100K and 500K Mapping Array
    • Affy Genome-Wide SNP array 6.0
    • Ensembl-called SNPs (from Celera reads and Jim Watson’s and Craig Venter’s genomes)
  • For mouse, rat, dog and chicken also:
    • Sanger- and Ensembl-called SNPs (other strains / breeds)
    • STAR Project for rat, other projects
External Sources

Large-scale variations in…

DECIPHER

  • Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources

DGV loci

  • Database of Genomic Variants
  • CNVs, Inversions, InDels
Subjects

Why do we have genome browsers?

Why Ensembl?

Ensembl genes and genomes

Help and tutorials

How is this information organised?
  • Ensembl Views (Website)
  • Ensembl Database (open source)
  • BioMart ‘DataMining tool’
Help and Information
  • Comments and questions? helpdesk@ensembl.org
  • Check out our tutorials page: www.ensembl.org/info/website/tutorials/index.html
  • Videos: http://www.youtube.com/user/EnsemblHelpdesk
  • Mailing list: ensembl-announce@ebi.ac.uk
  • Come visit our blog! http://ensembl.blogspot.com/
  • FTP site: ftp://ftp.ensembl.org
  • Amazon Web Services: http://aws.amazon.com/publicdatasets