Joint EBI-Wellcome Trust. Summer School 14-18 June 2010. Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective.

Joint EBI-Wellcome Trust

Summer School

14-18 June 2010


Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective

Teresa K. Attwood

University of Manchester


Concepts, historical milestones & the central place of bioinformatics in modern biology: a personal perspective from a European

Overview

  • Where the concept of bioinformatics originated

  • Some key milestones & key people

  • Its place in ‘the new biology’

Disclaimer

  • Bear in mind that this is a personal view

  • That it’s hard

    • to step out of a situation & look back in

      • & remain objective

    • to separate the European & American histories

  • Observers from different perspectives will see & tell the story differently!

  • So this is just my perspective…

    • & it’s bound up with sequences & dbs

Origin of bioinformatics

  • The origins of bioinformatics are rooted in sequence analysis

  • And driven by the desire to

    • collect them

    • annotate them

    • & analyse them

      • systematically (i.e., using computers)!

The concept of ‘bioinformatics’ was barely known before 1990…

Key milestones

[Timeline figure. Labels: ARPAnet, insulin, ribonuclease, Dayhoff Atlas]


Margaret Dayhoff, 1925-1983

  • Pioneered development of computer methods to compare protein sequences

    • & to derive evolutionary histories from alignments

  • Particularly interested in deducing evolutionary connections from sequence evidence

Margaret Dayhoff

  • Collected all the known protein sequences

    • made them available to the scientific community

  • In 1965, she compiled a book

    • the 1st Atlas of Protein Sequence and Structure

Key milestones

[Timeline figure. Counts shown: 7 structures; 65 sequences. Labels: ARPAnet, Internet, email, insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PDB]


Data overload in the USA

Data overload in Europe

  • The data overload problem had also been noticed in Europe

  • The solution was to create the 1st nucleotide sequence database

    • this was the EMBL databank

      • this preceded the 1st release of GenBank by ~6 months

Key milestones

[Timeline figure. Counts shown: 7 structures; 65 sequences; 859 sequences; 568 sequences. Labels: ARPAnet, Internet, email, insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PIR-PSD, EMBL, GenBank, PDB]


Enter Amos Bairoch

  • A crazy postgrad student in Switzerland

    • interested in space exploration & the search for ET life

  • His project was to develop software to analyse protein & nucleotide sequences

    • PC/Gene

Amos Bairoch

  • He published his 1st paper in 1982

  • A letter to the Biochemical Journal (BJ) suggesting the use of checksums to “facilitate the detection of typographical & keyboard errors”

    • a true computer nerd!
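
To illustrate the checksum idea mentioned above, here is a minimal Python sketch. It is not the scheme proposed in the 1982 letter: CRC-32 from Python's standard zlib module is swapped in as a stand-in, and the function name sequence_checksum and the example peptide are hypothetical.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return a CRC-32 checksum (8 hex digits) for a sequence string.

    Whitespace is removed and letters are upper-cased first, so layout
    differences between two copies do not change the checksum.
    CRC-32 is an assumption here, chosen only because it is in the
    standard library.
    """
    normalised = "".join(sequence.split()).upper()
    return format(zlib.crc32(normalised.encode("ascii")), "08X")


# Hypothetical peptide, and a copy retyped with one wrong final residue.
original = "MKTAYIAKQR"
retyped = "MKTAYIAKQK"

# Publishing the checksum next to the sequence lets a reader who retypes
# it check whether the two copies agree.
print(sequence_checksum(original))
print(sequence_checksum(retyped))
print("copies match:", sequence_checksum(original) == sequence_checksum(retyped))
```

A mismatch says only that the two copies differ; it does not say where the error is.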

Amos Bairoch

  • Why did he do this?

  • In the process of developing PC/Gene, he typed in >1,000 protein sequences

    • some from the literature, most from the Atlas

      • by 1981, this was a large book & several supplements, & listed 1,660 proteins

      • it was not then available electronically

Amos Bairoch

  • In 1983, he acquired a computer tape of the EMBL databank

    • this was version 2, with 811 sequences

  • In 1984, he received the 1st available computer tape copy of the Atlas

    • (which quickly became the PIR-PSD)

    • but he was deeply unhappy with the PIR format

Amos Bairoch

  • So he decided to convert the PIR database into the semi-structured format of EMBL

    • part manually & part automatically

    • the result was PIR+

    • it was distributed as part of PC/Gene (now commercial)

  • In summer 1986, he decided to release the database independently of PC/Gene

    • so that it would be available to all, free of charge

Amos Bairoch

  • The new database was called Swiss-Prot

  • The 1st release was made on 21 July 1986

    • the exact number of entries is unknown, as he can’t find the original floppy disks!

Key milestones

[Timeline figure. Counts shown: 30 entries; 58 entries; ~3,900 sequences; 7 structures; 859 sequences; 65 sequences; 568 sequences. Labels: ARPAnet, Internet, email, insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, DDBJ, Swiss-Prot, PRINTS, PROSITE, PIR, EMBL, GenBank, PDB]


Global data overload

  • The number of sequences was growing

  • The number of structures was growing

  • So was the number of protein family signatures

  • Two extraordinary developments had yet to take place

    • what were they?

Key milestones

[Timeline figure. Counts shown: ~3,900 sequences; 58 entries; 30 entries; 7 structures; 859 sequences; 65 sequences; 568 sequences. Labels: ARPAnet, Internet, www, email, insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, DDBJ, Swiss-Prot, PRINTS, PROSITE, FlyBase, PIR, EMBL, GenBank, PDB]


Key milestones

[Timeline figure. Counts shown: 2,423 entries; ~3,900 sequences; 70,000 sequences; 58 entries; 30 entries; 7 structures; 859 sequences; 65 sequences; 568 sequences. Labels: ARPAnet, Internet, www, email, insulin, DNA sequencing, C. elegans genome, H. sapiens genome, ribonuclease, Dayhoff Atlas, HT DNA sequencing, S. cerevisiae genome, M. jannaschii genome, H. influenzae genome, Auto DNA sequencing, Auto protein sequencers, D. melanogaster genome, DDBJ, Swiss-Prot, PRINTS, FlyBase, PROSITE, Pfam, InterPro, TrEMBL, PIR, EMBL, GenBank, PDB]


Original InterPro partners

[Diagram. Labels: Prosite, ProDom, PRINTS, Profiles, Pfam, InterPro]

What is InterPro?

“InterPro is an integrated documentation resource for protein families, domains & sites. By uniting databases that use different methodologies & a varying degree of biological information, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

The vision?

  • Naïvely, we wanted to make life easier!

  • We aimed to

    • simplify & rationalise protein family analysis

    • centralise & streamline the annotation process

      • & reduce manual annotation burdens

    • &, in the wake of all the genome projects, to facilitate automatic functional annotation of uncharacterised proteins

In fact (& now with 11 partners) we made life a lot harder!

But that’s another story…



Key milestones

[Timeline figure. Counts shown: 2,423 entries; ~3,900 sequences; 70,000 sequences; 58 entries; 30 entries; 7 structures; 859 sequences; 65 sequences; 568 sequences. Labels: ARPAnet, Internet, www, email, insulin, DNA sequencing, C. elegans genome, H. sapiens genome, ribonuclease, Dayhoff Atlas, HT DNA sequencing, S. cerevisiae genome, M. jannaschii genome, H. influenzae genome, Auto DNA sequencing, Auto protein sequencers, D. melanogaster genome, UniProt, DDBJ, Swiss-Prot, PRINTS, FlyBase, PROSITE, Pfam, InterPro, TrEMBL, PIR, EMBL, GenBank, PDB]


Key milestones

[Timeline figure. Counts shown: 185,231,366 sequences; 517,100 sequences; 10,867,798 sequences; 2,423 entries; ~3,900 sequences; 70,000 sequences; 58 entries; 30 entries; 7 structures; 859 sequences; 65 sequences; 568 sequences. Labels: ARPAnet, Internet, www, email, insulin, DNA sequencing, C. elegans genome, H. sapiens genome, ribonuclease, Dayhoff Atlas, HT DNA sequencing, S. cerevisiae genome, M. jannaschii genome, H. influenzae genome, Auto DNA sequencing, Auto protein sequencers, D. melanogaster genome, DDBJ, Swiss-Prot, PRINTS, FlyBase, UniProt, ENA, PROSITE, Pfam, InterPro, TrEMBL, PIR, EMBL, GenBank, PDB]


Key milestones

[Timeline figure. Counts shown: 185,231,366 sequences; 517,100 sequences; 10,867,798 sequences; 2,423 entries; ~3,900 sequences; 70,000 sequences; 58 entries; 30 entries; 7 structures; 859 sequences; 65 sequences; 568 sequences; billions more. Labels: ARPAnet, Internet, www, email, insulin, DNA sequencing, C. elegans genome, H. sapiens genome, ribonuclease, Dayhoff Atlas, HT DNA sequencing, S. cerevisiae genome, M. jannaschii genome, H. influenzae genome, Auto DNA sequencing, Auto protein sequencers, D. melanogaster genome, DDBJ, Swiss-Prot, PRINTS, FlyBase, UniProt, ENA, PROSITE, Pfam, InterPro, TrEMBL, PIR, EMBL, GenBank, PDB; two groups of labels are each marked “hundreds more”]


The central place of bioinformatics in modern biology

  • Hopefully, this potted history speaks for itself

  • In the last 30 years, bioinformatics has given us

    • the first ‘complete’ catalogues of DNA & protein sequences

      • including genomes & proteomes of organisms across the Tree of Life

    • software to analyse biological data on an unprecedented scale

    • & hence tools to help understand

      • more about evolutionary processes in general

      • our place on the Tree of Life in particular

      • &, ultimately, more about health & disease

  • It isn’t a panacea, but its contribution has been huge

Recommended reading

A. B. Richon. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)

A. Bairoch (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.

M. Ashburner (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Laboratory Press.

B. J. Strasser (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.
