The Tree of Life: Challenges for Discrete Mathematics and Theoretical Computer Science. Fred S. Roberts DIMACS Rutgers University. The tree of life problem raises new challenges for mathematics and computer science just as it does for biological science.

Presentation Transcript
The Tree of Life: Challenges for Discrete Mathematics and Theoretical Computer Science

Fred S. Roberts

DIMACS

Rutgers University


The tree of life problem raises new challenges for mathematics and computer science just as it does for biological science.





These are some of the motivations for this meeting.

I will lay out some of the challenges for math and CS, with emphasis on discrete math and theoretical CS.


What are DM and TCS?

DM deals with:

arrangements

designs

codes

patterns

schedules

assignments


TCS deals with the theory of computer algorithms.

During the first 30-40 years of the computer age, TCS, aided by powerful mathematical methods, had a direct impact on technology, by developing models, data structures, algorithms, and lower bounds that are now at the core of computing.


DM and TCS have found extensive use in many areas of science and public policy, for example in Molecular Biology.

These tools seem especially relevant to problems of the tree of life.


DM and TCS Continued

These tools are made especially relevant to the tree of life problem because of:

Geographic Information Systems


DM and TCS Continued

Availability of large and disparate computerized databases on subjects relating to species and the relevance of modern methods of data mining.


Outline

  • Phylogenetic Tree Reconstruction

  • Database Issues

  • Nomenclature

  • Setting up a Species Bank

  • Digitization of Natural History Collections

  • Interoperability

  • The Many Applications of Research on the Tree of Life


Phylogenetic Tree Reconstruction


Phylogeny (continued)

New methods of phylogenetic tree reconstruction owe a significant amount to modern methods of DM/TCS.

Trees, supertrees, and consensus trees will all be discussed at length in this meeting.

I will only make a few brief remarks about them.


Phylogenetic Challenges for DM/TCS

Tailoring phylogenetic methods to describe the idiosyncrasies of viral evolution -- going beyond a binary tree with a small number of contemporaneous species appearing as leaves.

Dealing with trees of thousands of vertices, many of high degree.

Making use of data about species at internal vertices (e.g., when data comes from serial sampling of patients).


Phylogenetic Challenges for DM/TCS: Continued

Network representations of evolutionary history - if recombination has taken place.

Modeling viral evolution by a collection of trees -- to recognize the “quasispecies” nature of viruses.

Devising fast methods to average the quantities of interest over all likely trees.

Thanks to Eddie Holmes and Mike Steel for ideas.

DIMACS Working Group on Phylogenetic Trees and Rapidly Evolving Diseases, Sept. 3-6, 2003


Database Issues

  • Assembling the tree of life requires collecting massive amounts of data about the world’s scientific species.

  • Making it a collaborative project requires making such data universally available.

  • There are great challenges for Math and CS, specifically DM and TCS.

    Thanks to the Global Biodiversity Information Facility (GBIF) for many of the following ideas.


Complexity of Data

  • In many ways, data about the world’s species are far more complex than genetic or protein sequence data. (GBIF)


Complexity of Data (cont’d)

  • There are databases of images, databases in numerous forms, etc.

  • Data is heterogeneous.

  • Data has errors and inconsistencies.


Nomenclature

  • There are some 1.75M named species

  • By some estimates, there are up to 10M actual species.


Nomenclature (cont’d)

  • The same species is often named more than once.

  • On the average, each species has two additional names (synonyms) besides its own name. (GBIF)


Nomenclature (cont’d)

  • Thus, there is a need to assemble names in an electronic catalogue, with synonyms and common misspellings.

  • This would be of fundamental importance in aiding research on biodiversity.
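A minimal sketch of such a catalogue, assuming a simple in-memory mapping from every known variant (synonym or misspelling) to one accepted name. The class `NameCatalogue`, its methods, and the misspelled variant are hypothetical and chosen only for illustration; a real catalogue would also need authorities, dates, and conflict handling.

```python
from typing import Dict, Iterable, Optional


class NameCatalogue:
    """Toy catalogue mapping synonyms and misspellings to one accepted name."""

    def __init__(self) -> None:
        # Normalized variant -> accepted name.
        self._accepted: Dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Normalize case and whitespace so trivial variants collide.
        return " ".join(name.lower().split())

    def add(self, accepted: str, variants: Iterable[str] = ()) -> None:
        """Register an accepted name together with its known variants."""
        self._accepted[self._key(accepted)] = accepted
        for variant in variants:
            self._accepted[self._key(variant)] = accepted

    def resolve(self, name: str) -> Optional[str]:
        """Return the accepted name for any known variant, else None."""
        return self._accepted.get(self._key(name))


catalogue = NameCatalogue()
# "Felis concolor" is an older synonym; "Puma concolour" is an
# illustrative misspelling (assumption, not taken from the talk).
catalogue.add("Puma concolor", ["Felis concolor", "Puma concolour"])
print(catalogue.resolve("felis  CONCOLOR"))
```

An exact-match table like this only covers variants that someone has already recorded; unseen misspellings need approximate matching.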


Nomenclature (cont’d)

  • Because of errors, one major challenge for TCS is data cleaning.


Nomenclature (cont’d)

  • Another challenge is to search a database to see if two entries are similar.

  • This is a standard problem in database theory.

  • TCS algorithms involving k-nearest neighbor and other methods are very helpful here.
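One way to picture the similarity search is a k-nearest-neighbor query over names under edit distance. The sketch below is an assumption about how that could look, not a method described in the talk: it uses the standard Levenshtein dynamic program and a brute-force scan, and the function names `edit_distance` and `k_nearest` are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def k_nearest(query: str, names: List[str], k: int = 3) -> List[Tuple[int, str]]:
    """Return the k names closest to the query as (distance, name) pairs."""
    scored = sorted((edit_distance(query.lower(), n.lower()), n) for n in names)
    return scored[:k]


names = ["Puma concolor", "Panthera leo", "Felis catus"]
print(k_nearest("Puma concolour", names, k=1))
```

The linear scan compares the query against every entry, so it would not scale to millions of names; an index structure for approximate matching would be needed at that size.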


Setting up a Species Bank


Setting up a Species Bank (cont’d)

  • A species bank would provide not only names, but also data about a species:

    • Type

    • Distribution

    • Ecological role

    • Phylogenetic history

    • Physiology

    • Genomics

  • This involves issues about huge datasets.


Setting up a Species Bank (cont’d)

  • NASA earth science satellites alone beam home image data at the rate of 1.2 terabytes a day.

  • By 2010, this is expected to grow to 10 petabytes a day. (Kathleen Bergen, U. Michigan)


Setting up a Species Bank (cont’d)

  • The problem is even worse: We need to combine information from many databases.

  • There is no known way to catalogue all species of plants in one place given current database systems techniques. (Jessie Kennedy, Napier University, Edinburgh)


Setting up a Species Bank (cont’d)

  • One possible approach: Tree and graph methods to support overlapping classifications as directed acyclic graphs or with complex objects (taxa or specimens) as nodes. (Jessie Kennedy)
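A rough sketch of the directed-acyclic-graph idea, assuming each node (taxon or specimen) may have several parents, one per competing classification. The class `ClassificationDAG` and the node labels are placeholders invented for illustration, not part of the approach as cited.

```python
from collections import defaultdict
from typing import Dict, Set


class ClassificationDAG:
    """Nodes are taxa or specimens; an edge points from child to parent."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = defaultdict(set)

    def ancestors(self, node: str) -> Set[str]:
        """All nodes reachable by following parent edges from `node`."""
        seen: Set[str] = set()
        stack = list(self._parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return seen

    def add_edge(self, child: str, parent: str) -> None:
        """Place `child` under `parent`, refusing edges that create a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child} -> {parent} would create a cycle")
        self._parents[child].add(parent)


dag = ClassificationDAG()
dag.add_edge("specimen_1", "genus_X")  # placement under classification 1
dag.add_edge("specimen_1", "genus_Y")  # placement under classification 2
dag.add_edge("genus_X", "family_F")
dag.add_edge("genus_Y", "family_F")
print(sorted(dag.ancestors("specimen_1")))
```

Unlike a tree, this structure lets one specimen sit under two overlapping classifications at once while still ruling out circular placements.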


Digitizing Natural History Collections

  • It has been estimated that there are between 1.5 and 3 Billion specimens in the world’s natural history collections, including herbaria, living microorganism stock centers, and other repositories (GBIF).


Digitizing Natural History Collections (cont’d)

  • If we could digitize information about these specimens and make it available, we would “have a treasure trove of information about the world’s biota.” (GBIF)

  • Pilot projects have shown that utilizing digitized data from several institutions’ databases can be a powerful tool. (GBIF)


Digitizing Natural History Collections (cont’d)

  • Challenge: digitization and reference of non-standard data (photos, sonograms, field notes)


Digitizing Natural History Collections (cont’d)

  • Challenge: Develop methods for visualizing the data (e.g., species’ distributions)


Digitizing Natural History Collections (cont’d)

  • Challenge: Develop search engines for real-time searching of such extremely large data sets.


Digitizing Natural History Collections (cont’d)

  • Challenge: Make information access on the web more knowledge-based so humans and intelligent software can work together. (Susan Gauch, U. Kansas)


Digitizing Natural History Collections (cont’d)

  • Challenge: Use “intelligent agents” to organize and present relevant information on the web. (Susan Gauch)


Digitizing Natural History Collections (cont’d)

  • Challenge: Use partial information as “training data” for classification algorithms (Susan Gauch)

  • One approach: Use training data and classification algorithms with learning capabilities.

    (See: DIMACS project on Monitoring Message Streams)


Digitizing Natural History Collections (cont’d)

  • Another approach to problems posed by digitization: Use tools of “knowledge inferencing” (Yannis Ioannidis, University of Wisconsin)

  • Still another approach: Use methods of spatio-temporal data mining (Ioannidis; see work of Muthukrishnan at Rutgers)


Interoperability

  • Goal: Devise standards for datasets so as to allow researchers to collaborate across datasets – develop standards leading to database interoperability. (GBIF)


Interoperability

  • Challenge: How do we develop ways to more accurately represent observational or experimental data so that others may use them? (Jessie Kennedy)

  • Challenge: Deal with issues of inconsistency and scalability.

  • Challenge: Formalize issues of policy with regard to others’ databases.

  • Challenge: Interoperability over a diversity of users and types of equipment.


Interoperability

  • One approach: “Semantic Web” – the idea used to express the growing desire to make information access on the Web more knowledge-based so humans and intelligent software can work together. (Susan Gauch)


Interoperability

  • Another approach: Make use of languages such as XML developed to aid interoperability in business and military collaborations.
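As a small illustration of exchanging records through XML, the sketch below serializes a specimen record and reads it back using Python's standard `xml.etree.ElementTree` module. The element names, the record contents, and the helper functions `record_to_xml` and `xml_to_record` are assumptions made for the example, not a schema proposed in the talk.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def record_to_xml(record: Dict[str, str]) -> str:
    """Serialize a flat record as one <specimen> element with child fields."""
    root = ET.Element("specimen")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> Dict[str, str]:
    """Parse the XML produced above back into a flat record."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "") for child in root}


# Hypothetical record; field names and values are placeholders.
record = {
    "scientificName": "Puma concolor",
    "institution": "Example Museum",
    "catalogNumber": "12345",
}
xml_text = record_to_xml(record)
print(xml_text)
```

Two institutions that agree on the element names can exchange such records without sharing database software, which is the point of a common interchange format; agreeing on those names is the hard part.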


The Many Applications of Research on the Tree of Life

  • Side benefits in many fields:

    • Agriculture

    • Biomedicine

    • Biotechnology

    • Natural resource management

    • Pest control

    • Control of emergent diseases

    • Sustainable use of biodiversity resources

    • Global climate change


The Many Applications of Research on the Tree of Life

  • Let’s say you’re importing bananas from South America


The Many Applications of Research on the Tree of Life

  • A camera in the hold of the ship sees a spider.

  • What kind of spider is it?

  • Is it safe to unload your cargo of bananas?


The Many Applications of Research on the Tree of Life

  • Luckily, you have a digitized natural history database.

  • With an efficient search feature.

    (Thanks to Diana Lipscomb for this example)


The Many Applications of Research on the Tree of Life

