Concepts, historical milestones & the central place of bioinformatics in modern biology:

a European perspective

Teresa K. Attwood

University of Manchester


Overview

  • Where the term bioinformatics originated

  • Where the ‘modern’ concept originated

  • Some key events & folk

  • Its place in ‘the new biology’



Origin of Bioinformatics

  • The origin of the term ‘bioinformatics’ has been attributed to Paulien Hogeweg

    • Dutch theoretical biologist

  • She & colleague Ben Hesper coined the term in the early ‘70s, defining it as

    • “the study of informatic processes in biotic systems”

      • Hogeweg, P. (2011) The roots of bioinformatics in theoretical biology. PLoS Computational Biology, 7(3), e1002021

  • The term failed to gain traction for ~20 years



Origin of Bioinformatics

  • The origins of the ‘modern’ concept of bioinformatics are rooted in sequence analysis

  • Driven by the desire to

    • collect

    • annotate

    • & analyse sequence data

      • systematically (i.e., using computers)!

This concept of ‘bioinformatics’ was barely known pre-1990…



Key milestones

[Timeline figure, 1950-2010. Events: insulin, ribonuclease, Dayhoff Atlas. Resources: CSD. Sequences shown: GIVEQCCASVCSLYQLENYCN; FVNQHLCGSHLVEALYLVCGERGFFYTPKA]


Margaret Dayhoff, 1925-1983

  • Pioneer of computer methods to compare proteins

    • & to derive evolutionary histories from alignments

  • Particular interest in deducing evolutionary connections from sequence evidence



Margaret Dayhoff

  • Collected all the known protein sequences

    • made them available to the scientific community

  • In 1965, she compiled a book

    • Atlas of Protein Sequence & Structure



Margaret Dayhoff

“There is a tremendous amount of information regarding the evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it”

M.O. Dayhoff to C. Berkley, February 27, 1967

Strasser, B. (2008) “GenBank – Natural history in the 21st century?” Science, 322, 537-538



Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers. Resources: CSD, PDB, ARPAnet. Counts marked on the timeline: 65, 7]

Exam 1

What pernicious, life-changing development occurred in 1971?


Data overload in the USA

“the rate limiting step in the process of nucleic acid sequencing is now shifting from data acquisition towards the organization and analysis of that data”

Gingeras, T.R. & Roberts, R.J. (1980) “Steps toward Computer Analysis of Nucleotide Sequences,” Science, 209, 1322-1328



Data overload in the USA

“a centralized data bank [is] essential for the efficient use of nucleic acid sequence information”

C. Anderson, Minutes, 1980



Data overload in Europe

  • While the US debated where to locate a new centralised resource, EMBL acted…

  • The 1st internationally funded, public ‘central’ nucleotide sequence database was thus European

    • the EMBL data library, Heidelberg

      • preceded the 1st release of GenBank by ~6 months

Attwood, T.K. et al. (2011) Concepts, Historical Milestones & the Central Place of Bioinformatics in Modern Biology: A European Perspective. In Bioinformatics - Trends & Methodologies, Intech Online Publishers



Data overload in Europe

  • Copies of the EMBL data library & GenBank were being maintained in Cambridge

    • together with their search tools, etc.

  • An integrated system gave access to the dbs & tools

    • “this system is presently being used by over 30 researchers in 8 departments in the University & in local research institutes. These users can keep in touch with each other via the MAIL command”!



Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers. Resources: PIR, EMBL, GenBank, CSD, PDB, ARPAnet, Internet, email. Counts marked on the timeline: 568, 65, 859, 7]


Enter Amos Bairoch

  • A crazy postgrad student in Switzerland

    • interested in space exploration & the search for ET life

  • His project was to develop software to analyse protein & nucleotide sequences

    • PC/Gene



Amos Bairoch

  • Published his 1st paper in 1982

    • a letter to the Biochemical Journal (BJ)

  • Suggested use of checksums

    • “to facilitate detection of typographical & keyboard errors”
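
As a rough illustration of the checksum idea, here is a minimal Python sketch. CRC-32 (via the standard-library zlib module) is an assumption chosen for convenience: the slide does not say which checksum scheme was proposed. The function name sequence_checksum is hypothetical, and the example sequence is a fragment shown earlier in the presentation.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return a CRC-32 checksum (8 hex digits) for a sequence.

    Whitespace and letter case are ignored, so layout differences between
    copies of the same sequence do not change the checksum.
    """
    normalised = "".join(sequence.split()).upper()
    return format(zlib.crc32(normalised.encode("ascii")), "08X")


published = "GIVEQCCASVCSLYQLENYCN"
retyped = "GIVEQCCASVCSLYQLENYCM"  # hypothetical typo: final N keyed in as M

# A mismatch between the published and recomputed checksums flags the error.
if sequence_checksum(published) != sequence_checksum(retyped):
    print("checksum mismatch: check the retyped sequence")
```

Publishing a checksum alongside each sequence would let anyone retyping it confirm the copy without comparing residues one by one.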



Amos Bairoch

  • Why?

  • Alongside PC/Gene, he needed to supply a db

  • The Atlas wasn’t available electronically

    • typed in >1,000 protein sequences

    • some from the literature

    • most from the Atlas

      • by 1981, this was a large book, plus several supplements, listing 1,660 proteins



Amos Bairoch

  • In 1983, he acquired a computer tape of the EMBL Data Library

    • version 2, with 811 sequences

  • In 1984, he received the 1st available computer tape copy of the Atlas

    • (which became known as the PIR-PSD)

    • but… he disliked the PIR format



Amos Bairoch

  • So he converted the PIR database into the semi-structured format of EMBL

    • part manually & part automatically

  • The result was PIR+

    • & was distributed as part of PC/Gene (now commercial)

  • In summer 1986, he finally released the database independently of PC/Gene

    • to make it available to all, free of charge



Amos Bairoch

  • This new database was called Swiss-Prot

  • 1st released on 21 July 1986

    • the exact number of entries is unknown, as he lost the original floppy disks!



Amos Bairoch

  • As part of his work on PC/Gene, he created another key database

    • diagnostic tool for characterising protein families

  • 1st released March 1989, with 58 entries

    • this was PROSITE

  • Philosophy of his approach

    • coupling high quality data analysis with manual annotation



Characterising protein families

PROSITE

[IVM]-[AS]-L-W-S-L-V(2)-L-A-[IV]-E-R-Y-[IV](3)-C-K-P-M

PRINTS
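
The PROSITE-style pattern above can be read as a compact regular expression. The Python sketch below assumes that the slide's "V2" and "[IV]3" denote repeat counts, written V(2) and [IV](3) in PROSITE pattern syntax. The function name prosite_to_regex and the test sequence are hypothetical, and the PROSITE anchors < and > are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix such as "(2)" or "(2,4)".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        # "[..]" (one of these residues) and single letters pass through as-is.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


PATTERN = "[IVM]-[AS]-L-W-S-L-V(2)-L-A-[IV]-E-R-Y-[IV](3)-C-K-P-M"
regex = prosite_to_regex(PATTERN)

# Hypothetical test sequence: two flanking glycines on each side of a match.
hit = re.search(regex, "GGMALWSLVVLAIERYVVVCKPMGG")
print(regex, hit.start() if hit else None)
```

A single-motif pattern like this is strict: one unexpected residue anywhere in the motif is enough to miss a true family member.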



The burden of maintenance

  • Database annotation…

[Figure labels: Database, Maintenance, Nirvana, Database annotation]



Amos Bairoch’s lament

“It is quite depressive to think that we are spending millions in grants for people to perform experiments, produce new knowledge, hide this knowledge in often badly written text and then spend some more millions trying to second guess what the authors really did and found”

Bairoch, A. (2009) The future of annotation/biocuration. Nature Precedings



Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers. Resources: PRINTS, PROSITE, Swiss-Prot, PIR, EMBL, GenBank, CSD, PDB, ARPAnet, Internet, email. Counts marked on the timeline: 568, 65, 859, 7, 3,900]


Global data overload

  • The number of sequences was growing

  • The number of structures was growing

  • The number of protein family signatures was growing

Exam 2

Two extraordinary developments had yet to take place. What were they?



Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers, HT DNA sequencing, and the H. influenzae, S. cerevisiae, C. elegans, D. melanogaster & H. sapiens genomes. Resources: PRINTS, PROSITE, Pfam, InterPro, Swiss-Prot, TrEMBL, FlyBase, PIR, EMBL, GenBank, CSD, PDB, ARPAnet, Internet, email, www. Counts marked on the timeline: 568, 65, 859, 7, 2,423, 3,900, 105,000]


[Figure labels: InterPro with PROSITE, HAMAP, PIRSF, PRINTS, ProDom, Gene3D, SUPERFAMILY, TIGRFAM, PANTHER, Pfam, Profiles, SMART]


Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers, HT DNA sequencing, and the H. influenzae, S. cerevisiae, C. elegans, D. melanogaster & H. sapiens genomes. Organisations: EMBnet, ELIXIR, NCBI, SIB, EBI. Resources: PRINTS, PROSITE, Pfam, InterPro, Swiss-Prot, TrEMBL, FlyBase, UniProt, ENA, PIR, EMBL, GenBank, CSD, PDB, ARPAnet, Internet, email, www. Counts marked on the timeline: 568, 65, 859, 7, 2,423, 3,900, 105,000, >500B, 36.0M]


Key milestones

[Timeline figure, 1950-2010. Events: insulin, DNA sequencing, ribonuclease, Dayhoff Atlas, auto DNA sequencing, auto protein sequencers, HT DNA sequencing, and the H. influenzae, S. cerevisiae, C. elegans, D. melanogaster & H. sapiens genomes. Organisations: EMBnet, ELIXIR, NCBI, SIB, EBI. Resources: PRINTS, PROSITE, Pfam, InterPro, Swiss-Prot, TrEMBL, FlyBase, UniProt, ENA, PIR, EMBL, GenBank, CSD, PDB, ARPAnet, Internet, email, www. Added labels: hundreds more, thousands more, billions more. Counts marked on the timeline: 568, 65, 859, 7, 2,423, 3,900, 105,000, >500B, 36.0M]


Scary monsters!

[Graph: database growth over time. Values marked on the graph: 282 M, 35 M, 540 K, 84 K]

  • Red line: growth of EMBL since its inception

  • Green line: growth of manually annotated Swiss-Prot

  • Blue line: growth of PDB

  • By 2020, NGS & 3Gen technologies will be producing data a million times faster than the current rate


The central place of bioinformatics in modern biology

  • Hopefully, this potted history speaks for itself

  • In the last 30 years, bioinformatics has given us

    • the first ‘complete’ catalogues of DNA & protein sequences

      • including genomes & proteomes of organisms across the Tree of Life

    • software to analyse biological data on an unprecedented scale

    • & hence tools to help understand

      • more about evolutionary processes in general

      • our place on the Tree of Life in particular

      • &, ultimately, more about health & disease

  • It isn’t a panacea, but its contribution has been huge



Recommended reading

Richon, A.B. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)

Bairoch, A. (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.

Ashburner, M. (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Lab. Press

Strasser, B.J. (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.

Attwood, T.K., Gisel, A., Eriksson, N-E. & Bongcam-Rudloff, E. (2011) Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective
