The Tree of Life: Challenges for Discrete Mathematics and Theoretical Computer Science

The Tree of Life: Challenges for Discrete Mathematics and Theoretical Computer Science. Fred S. Roberts DIMACS Rutgers University. The tree of life problem raises new challenges for mathematics and computer science just as it does for biological science.


Presentation Transcript



The Tree of Life: Challenges for Discrete Mathematics and Theoretical Computer Science

Fred S. Roberts

DIMACS

Rutgers University



The tree of life problem raises new challenges for mathematics and computer science just as it does for biological science.


  • For math and CS to become more effectively utilized, we need to:

    • develop new tools;

    • establish working partnerships between mathematical scientists and biological scientists;

    • introduce the two communities to each other’s problems, language, and tools;

    • introduce outstanding junior researchers from both sides to the issues, problems, and challenges arising from the tree of life;

    • involve biological and mathematical scientists together to define the agenda and develop the tools of this field.



These are some of the motivations for this meeting.

I will lay out some of the challenges for math and CS, with emphasis on discrete math and theoretical CS.



What are DM and TCS?

DM deals with:

arrangements

designs

codes

patterns

schedules

assignments



TCS deals with the theory of computer algorithms.

During the first 30-40 years of the computer age, TCS, aided by powerful mathematical methods, had a direct impact on technology, by developing models, data structures, algorithms, and lower bounds that are now at the core of computing.



DM and TCS have found extensive use in many areas of science and public policy, for example in Molecular Biology.

These tools seem especially relevant to problems of the tree of life.


DM and TCS Continued

These tools are made especially relevant to the tree of life problem because of:

  • Geographic Information Systems

  • Availability of large and disparate computerized databases on subjects relating to species and the relevance of modern methods of data mining.



Outline

  • Phylogenetic Tree Reconstruction

  • Database Issues

  • Nomenclature

  • Setting up a Species Bank

  • Digitization of Natural History Collections

  • Interoperability

  • The Many Applications of Research on the Tree of Life



Phylogenetic Tree Reconstruction



Phylogeny (continued)

New methods of phylogenetic tree reconstruction owe a significant amount to modern methods of DM/TCS.

Trees, supertrees, and consensus trees will all be discussed at length in this meeting.

I will only make a few brief remarks about them.
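
As a small illustration of the consensus trees mentioned above, the following Python sketch computes a strict consensus in its simplest form: it keeps only the clusters (the leaf sets below an internal vertex) that appear in every input tree. The nested-tuple encoding of rooted trees and the function names `clusters` and `strict_consensus_clusters` are assumptions made for this example, not something taken from the talk.

```python
def clusters(tree):
    """Return the set of leaf sets found below each internal vertex.

    A tree is a leaf name (str) or a tuple of subtrees (assumed encoding).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # Union of the leaf sets of all children of this internal vertex.
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def strict_consensus_clusters(trees):
    """Return the clusters that are present in every one of the input trees."""
    cluster_sets = [clusters(tree) for tree in trees]
    return set.intersection(*cluster_sets)
```

For the rooted trees (((A,B),C),D) and (((A,B),D),C), the only clusters shared by both are {A, B} and the full leaf set, so the strict consensus leaves the placement of C and D unresolved.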



Phylogenetic Challenges for DM/TCS

Tailoring phylogenetic methods to describe the idiosyncrasies of viral evolution -- going beyond a binary tree with a small number of contemporaneous species appearing as leaves.

Dealing with trees of thousands of vertices, many of high degree.

Making use of data about species at internal vertices (e.g., when data comes from serial sampling of patients).



Phylogenetic Challenges for DM/TCS: Continued

Network representations of evolutionary history - if recombination has taken place.

Modeling viral evolution by a collection of trees -- to recognize the “quasispecies” nature of viruses.

Devising fast methods to average the quantities of interest over all likely trees.

Thanks to Eddie Holmes and Mike Steel for ideas.

DIMACS Working Group on Phylogenetic Trees and Rapidly Evolving Diseases, Sept. 3-6, 2003



Database Issues

  • Assembling the tree of life requires collecting massive amounts of data about the world’s species.

  • Making it a collaborative project requires making such data universally available.

  • There are great challenges for Math and CS, specifically DM and TCS.

    Thanks to the Global Biodiversity Information Facility (GBIF) for many of the following ideas.



Complexity of Data

  • In many ways, data about the world’s species are far more complex than genetic or protein sequence data. (GBIF)



Complexity of Data (cont’d)

  • There are databases of images, databases in numerous forms, etc.

  • Data is heterogeneous.

  • Data has errors and inconsistencies.



Nomenclature

  • There are some 1.75M named species.

  • By some estimates, there are up to 10M actual species.



Nomenclature (cont’d)

  • The same species is often named more than once.

  • On average, each species has two additional names (synonyms) besides its accepted name. (GBIF)



Nomenclature (cont’d)

  • Thus, there is need to assemble names in an electronic catalogue, with synonyms and common misspellings.

  • This would be of fundamental importance in aiding research on biodiversity.
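
A minimal sketch of such an electronic catalogue, in Python, might map every known variant of a name (synonym or common misspelling) to one accepted name. The class name `NameCatalogue`, its methods, and the simple case-and-whitespace normalization are assumptions made for illustration; a real catalogue would need far richer rules.

```python
class NameCatalogue:
    """Toy catalogue mapping name variants to a single accepted name."""

    def __init__(self):
        # normalized variant -> accepted name
        self._accepted_for = {}

    @staticmethod
    def _normalize(name):
        # Lowercase and collapse runs of whitespace (assumed normalization).
        return " ".join(name.lower().split())

    def add(self, accepted, synonyms=(), misspellings=()):
        """Register an accepted name together with its known variants."""
        for variant in (accepted, *synonyms, *misspellings):
            self._accepted_for[self._normalize(variant)] = accepted

    def resolve(self, name):
        """Return the accepted name for a variant, or None if unknown."""
        return self._accepted_for.get(self._normalize(name))
```

For example, after `add("Puma concolor", synonyms=["Felis concolor"], misspellings=["Puma concolour"])`, both the older synonym and the (illustrative) misspelling resolve to "Puma concolor", while an unregistered name resolves to `None`.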



Nomenclature (cont’d)

  • Because of errors, one major challenge for TCS is data cleaning.



Nomenclature (cont’d)

  • Another challenge is to search a database to see if two entries are similar.

  • This is a standard problem in database theory.

  • TCS algorithms involving k-nearest neighbor and other methods are very helpful here.
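
To make the similarity search concrete, here is a small Python sketch of a k-nearest-neighbor lookup over names, using edit (Levenshtein) distance as the similarity measure. The choice of edit distance, the linear scan, and the function names are assumptions for this example; at the scale discussed in this talk one would presumably need an index structure rather than a scan over every entry.

```python
import heapq


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def k_nearest_names(query, names, k=3):
    """Return the k names closest to the query, ignoring case."""
    return heapq.nsmallest(
        k, names, key=lambda name: edit_distance(query.lower(), name.lower())
    )
```

With the candidate list `["Puma concolor", "Panthera leo", "Panthera onca", "Canis lupus"]`, the misspelled query "Pantera leo" has "Panthera leo" as its nearest neighbor, at distance 1.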



Setting up a Species Bank



Setting up a Species Bank (cont’d)

  • A species bank would provide not only names, but also data about a species:

    • Type

    • Distribution

    • Ecological role

    • Phylogenetic history

    • Physiology

    • Genomics

  • This involves issues about huge datasets.



Setting up a Species Bank (cont’d)

  • NASA earth science satellites alone beam home image data at the rate of 1.2 terabytes a day.

  • By 2010, this is expected to grow to 10 petabytes a day. (Kathleen Bergen, U. Michigan)



Setting up a Species Bank (cont’d)

  • The problem is even worse: We need to combine information from many databases.

  • There is no known way to catalogue all species of plants in one place given current database systems techniques. (Jessie Kennedy, Napier University, Edinburgh)



Setting up a Species Bank (cont’d)

  • One possible approach: Tree and graph methods to support overlapping classifications as directed acyclic graphs or with complex objects (taxa or specimens) as nodes. (Jessie Kennedy)
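
One way to picture overlapping classifications as a directed acyclic graph is the Python sketch below, in which a taxon may have several parents (one per competing classification) and any placement that would create a cycle is rejected. The class `ClassificationDAG`, its methods, and the placeholder taxon names are hypothetical; this is an illustration of the idea, not the approach actually proposed by Kennedy.

```python
class ClassificationDAG:
    """Taxa as nodes; a taxon may be placed under several parent taxa."""

    def __init__(self):
        # child taxon -> set of parent taxa
        self._parents = {}

    def ancestors(self, taxon):
        """Return every taxon reachable by following parent links upward."""
        seen = set()
        stack = [taxon]
        while stack:
            for parent in self._parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_placement(self, child, parent):
        """Place child under parent, refusing placements that create a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"placing {child!r} under {parent!r} creates a cycle")
        self._parents.setdefault(child, set()).add(parent)
```

If two classifications place a placeholder taxon "species_x" under "genus_a" and "genus_b" respectively, both placements coexist, and `ancestors("species_x")` returns the union of the two lineages.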



Digitizing Natural History Collections

  • It has been estimated that there are between 1.5 and 3 billion specimens in the world’s natural history collections, including herbaria, living microorganism stock centers, and other repositories. (GBIF)



Digitizing Natural History Collections (cont’d)

  • If we could digitize information about these specimens and make it available, we would “have a treasure trove of information about the world’s biota.” (GBIF)

  • Pilot projects have shown that utilizing digitized data from several institutions’ databases can be a powerful tool. (GBIF)



Digitizing Natural History Collections (cont’d)

  • Challenge: digitization and reference of non-standard data (photos, sonograms, field notes)



Digitizing Natural History Collections (cont’d)

  • Challenge: Develop methods for visualizing the data (e.g., species’ distributions)



Digitizing Natural History Collections (cont’d)

  • Challenge: Develop search engines for real-time searching of such extremely large data sets.



Digitizing Natural History Collections (cont’d)

  • Challenge: Make information access on the web more knowledge-based so humans and intelligent software can work together. (Susan Gauch, U. Kansas)



Digitizing Natural History Collections (cont’d)

  • Challenge: Use “intelligent agents” to organize and present relevant information on the web. (Susan Gauch)



Digitizing Natural History Collections (cont’d)

  • Challenge: Use partial information as “training data” for classification algorithms (Susan Gauch)

  • One approach: Use training data and classification algorithms with learning capabilities.

    (See: DIMACS project on Monitoring Message Streams)



Digitizing Natural History Collections (cont’d)

  • Another approach to problems posed by digitization: Use tools of “knowledge inferencing” (Yannis Ioannidis, University of Wisconsin)

  • Still another approach: Use methods of spatio-temporal data mining (Ioannidis; see work of Muthukrishnan at Rutgers)



Interoperability

  • Goal: Devise standards for datasets so as to allow researchers to collaborate across datasets – develop standards leading to database interoperability. (GBIF)



Interoperability

  • Challenge: How do we develop ways to more accurately represent observational or experimental data so that others may use them? (Jessie Kennedy)

  • Challenge: Deal with issues of inconsistency and scalability.

  • Challenge: Formalize issues of policy with regard to others’ databases.

  • Challenge: Interoperability over a diversity of users and types of equipment.



Interoperability

  • One approach: the “Semantic Web”, a term for the growing effort to make information access on the Web more knowledge-based so that humans and intelligent software can work together. (Susan Gauch)



Interoperability

  • Another approach: Make use of languages such as XML developed to aid interoperability in business and military collaborations.
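
As a hedged illustration of the XML approach, the Python sketch below serializes a species record to XML and parses it back with the standard library module `xml.etree.ElementTree`. The element names (`speciesRecord`, `acceptedName`, `synonym`) and the record layout are invented for this example and do not follow any published biodiversity schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a record dict to an XML string (hypothetical element names)."""
    root = ET.Element("speciesRecord")
    ET.SubElement(root, "acceptedName").text = record["accepted_name"]
    for synonym in record.get("synonyms", []):
        ET.SubElement(root, "synonym").text = synonym
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text):
    """Parse the XML produced by record_to_xml back into a record dict."""
    root = ET.fromstring(text)
    return {
        "accepted_name": root.findtext("acceptedName"),
        "synonyms": [element.text for element in root.findall("synonym")],
    }
```

Two institutions that agree on such a format can exchange records without sharing database software: whatever one side writes with `record_to_xml`, the other can read with `xml_to_record`.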



The Many Applications of Research on the Tree of Life

  • Side benefits in many fields:

    • Agriculture

    • Biomedicine

    • Biotechnology

    • Natural resource management

    • Pest control

    • Control of emergent diseases

    • Sustainable use of biodiversity resources

    • Global climate change



The Many Applications of Research on the Tree of Life

  • Let’s say you’re importing bananas from South America.



The Many Applications of Research on the Tree of Life

  • A camera in the hold of the ship sees a spider.

  • What kind of spider is it?

  • Is it safe to unload your cargo of bananas?



The Many Applications of Research on the Tree of Life

  • Luckily, you have a digitized natural history database.

  • With an efficient search feature.

    (Thanks to Diana Lipscomb for this example.)

