Joint EBI-Wellcome Trust

Summer School

14-18 June 2010


Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective

Teresa K. Attwood

University of Manchester


Concepts, historical milestones & the central place of bioinformatics in modern biology: a personal perspective from a European

Overview

  • Where the concept of bioinformatics originated

  • Some key milestones & key people

  • Its place in ‘the new biology’

Disclaimer

  • Bear in mind that this is a personal view

  • That it’s hard

    • to step out of a situation & look back in

      • & remain objective

    • to separate the European & American histories

  • Observers from different perspectives will see & tell the story differently!

  • So this is just my perspective…

    • & it’s bound up with sequences & dbs

Origin of bioinformatics

  • The origins of bioinformatics are rooted in sequence analysis

  • And driven by the desire to

    • collect sequences

    • annotate them

    • & analyse them

      • systematically (i.e., using computers)!

The concept ‘bioinformatics’ was barely known before 1990…

Key milestones

[Timeline figure. Labels: ARPAnet, insulin, ribonuclease, Dayhoff Atlas]

Margaret Dayhoff (1925-1983)

  • Pioneered development of computer methods to compare protein sequences

    • & to derive evolutionary histories from alignments

  • Particularly interested in deducing evolutionary connections from sequence evidence

Margaret Dayhoff

  • Collected all the known protein sequences

    • made them available to the scientific community

  • In 1965, she compiled a book

    • the 1st Atlas of Protein Sequence and Structure

Key milestones

[Timeline figure, as above, now also showing: Internet, email, DNA sequencing, Auto DNA sequencing, Auto protein sequencers, PDB; data counts: 7 structures, 65 sequences]

Data overload in the USA

Data overload in Europe

  • The data overload problem had also been noticed in Europe

  • The solution was to create the 1st nucleotide sequence database

    • this was the EMBL databank

      • this preceded the 1st release of GenBank by ~6 months

Key milestones

[Timeline figure, as above, now also showing: PIR-PSD, EMBL, GenBank; data counts: 859 sequences, 568 sequences]

Enter Amos Bairoch

  • A crazy postgrad student in Switzerland

    • interested in space exploration & the search for ET life

  • His project was to develop software to analyse protein & nucleotide sequences

    • the result was PC/Gene

Amos Bairoch

  • He published his 1st paper in 1982

  • A letter to the Biochemical Journal (BJ) suggesting the use of checksums to “facilitate the detection of typographical & keyboard errors”

    • a true computer nerd!
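The slide does not record which checksum Bairoch actually proposed. As a hypothetical illustration only, the Python sketch below (the function names, the example peptide and the modulus of 10,000 are all assumptions) contrasts a plain sum of character codes, which cannot see two swapped residues, with a position-weighted sum, which can:

```python
def plain_checksum(sequence: str, modulus: int = 10_000) -> int:
    """Hypothetical unweighted checksum: a plain sum of character codes.
    It is blind to transposed residues, since addition ignores order."""
    return sum(ord(residue) for residue in sequence.upper()) % modulus


def weighted_checksum(sequence: str, modulus: int = 10_000) -> int:
    """Hypothetical position-weighted checksum: each residue's character
    code is multiplied by its 1-based position before summing, so
    swapping two different residues changes the total."""
    total = 0
    for position, residue in enumerate(sequence.upper(), start=1):
        total += position * ord(residue)
    return total % modulus


original = "MKTAYIAK"  # hypothetical peptide, typed in correctly
mistyped = "MKTAYIKA"  # the same peptide with the last two residues swapped

print(plain_checksum(original) == plain_checksum(mistyped))      # True: the swap goes unnoticed
print(weighted_checksum(original), weighted_checksum(mistyped))  # 2677 2667: the swap is caught
```

A single mistyped residue also changes the weighted value, unless its position multiplied by the change in character code happens to be a multiple of the modulus. Weighting by position is the design choice that lets a checksum of this kind catch transpositions, a common keyboard error.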

Amos Bairoch

  • Why did he do this?

  • In the process of developing PC/Gene, he typed in >1,000 protein sequences

    • some from the literature, most from the Atlas

      • by 1981, the Atlas comprised a large book & several supplements, listing 1,660 proteins

      • it was not then available electronically

Amos Bairoch

  • In 1983, he acquired a computer tape of the EMBL databank

    • this was version 2, with 811 sequences

  • In 1984, he received the 1st available computer tape copy of the Atlas

    • (which quickly became the PIR-PSD)

    • but he was deeply unhappy with the PIR format

Amos Bairoch

  • So he decided to convert the PIR database into the semi-structured format of EMBL

    • part manually & part automatically

    • the result was PIR+

    • it was distributed as part of PC/Gene (by then a commercial package)

  • In summer 1986, he decided to release the database independently of PC/Gene

    • so that it would be available to all, free of charge

Amos Bairoch

  • The new database was called Swiss-Prot

  • The 1st release was made on 21 July 1986

    • the exact number of entries is unknown, as he can’t find the original floppy disks!

Key milestones

[Timeline figure, as above, now also showing: DDBJ, Swiss-Prot, PRINTS, PROSITE (PIR-PSD now labelled PIR); data counts: 30 entries, 58 entries, ~3,900 sequences]

Global data overload

  • The number of sequences was growing

  • The number of structures was growing

  • So was the number of protein family signatures

  • Two extraordinary developments had yet to take place

    • what were they?

Key milestones

[Timeline figure, as above, now also showing: www, FlyBase]

Key milestones

[Timeline figure, as above, now also showing: HT DNA sequencing; the H. influenzae, M. jannaschii, S. cerevisiae, C. elegans, D. melanogaster & H. sapiens genomes; Pfam, InterPro, TrEMBL; data counts: 2,423 entries, 70,000 sequences]

Original InterPro partners

[Figure. Labels: InterPro, PROSITE, Profiles, PRINTS, ProDom, Pfam]

What is InterPro?

“InterPro is an integrated documentation resource for protein families, domains & sites. By uniting databases that use different methodologies & a varying degree of biological information, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

The vision?

  • Naïvely, we wanted to make life easier!

  • We aimed to

    • simplify & rationalise protein family analysis

    • centralise & streamline the annotation process

      • & reduce manual annotation burdens

    • &, in the wake of all the genome projects, to facilitate automatic functional annotation of uncharacterised proteins

In fact (& now with 11 partners) we made life a lot harder!

But that’s another story…

Key milestones

[Timeline figure, as above, now also showing: UniProt]

Key milestones

[Timeline figure, as above, now also showing: ENA; data counts: 185,231,366 sequences, 517,100 sequences, 10,867,798 sequences]

Key milestones

[Timeline figure, as above, now also showing the annotations ‘billions more’ (once) & ‘hundreds more’ (twice)]

The central place of bioinformatics in modern biology

  • Hopefully, this potted history speaks for itself

  • In the last 30 years, bioinformatics has given us

    • the first ‘complete’ catalogues of DNA & protein sequences

      • including genomes & proteomes of organisms across the Tree of Life

    • software to analyse biological data on an unprecedented scale

    • & hence tools to help understand

      • more about evolutionary processes in general

      • our place on the Tree of Life in particular

      • &, ultimately, more about health & disease

  • It isn’t a panacea, but its contribution has been huge

Recommended reading

A.B. Richon. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)

A. Bairoch (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.

M. Ashburner (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Laboratory Press.

B.J. Strasser (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.
