
Biological information management and analysis as illustrated by malaria research


ilyssa

Presentation Transcript


  1. Biological information management and analysis as illustrated by malaria research

  2. Problems • Managing data context • Managing and analyzing data

  3. Factors in combating malaria Economic Political/Ethical Scientific: biology, ecology, chemistry, etc. Cultural/Sociological Environmental

  4. Scientific layers • Psychology/Sociology: Emergent properties of brain and populations • Biology: All complexity of element interactions (macromolecules, cells, brain, populations) • Chemistry: Properties of "simple" element interactions • Physics: Properties without inter-element interactions

  5. The labyrinth of biological research • Which direction to follow and in what way? • What relevant information is available? • How to keep a good record of the path? • How to find useful collaborators? • What do the results imply? Researchers are drowning in the sea of information…

  6. Problems with "physicalization" of biology • Data richness • Data sharing and integration • Model-data correspondence • Understanding bioresearch problems • Understanding bioresearch constraints

  7. Information problems in the Nature journal • Tim Berners-Lee, James Hendler. Scientific publishing on the 'semantic web'. Nature Debates, April 2001. • Jonathan Knight. Negative Results: Null and void. Nature, April 2003. • Who'd want to work in a team (Editorial). Nature, July 2003. • Declan Butler. Open-access row leads paper to shed authors. Nature, September 2003.

  8. Information management needs of Anopheles GPH • Inform the scientific community (publications, database submissions, conferences, …) • Prevent loss of information (unpublished results, method details, …) • Report to administration (progress, problems, …) • Share and manage supplies (materials, equipment, …) • Share informational resources (protocols, bibliography, …) • Facilitate collaboration (share information, co-author documents, …) • Participants: IP Senegal, IP Madagascar, IP Korea, IP France, …; Columbia University, USA (outside collaboration)

  9. Sources of Research Information: Status quo • Temporary, individual information (100%): notebooks, computer files • Permanent, shared information (<30%): databases, journals

  10. Sources of Research Information: Ideal • Permanent, shared information (100%): integrated repositories of structured data

  11. Problems • Managing data context • Managing and analyzing data

  12. Flow of research information: at present • Nodes: Administration, Advisor, Researcher, Scientific community, Research group, Collaborators

  13. Flow of research information: proposed • Nodes: Administration, Advisor, Researcher, Database, Scientific community, Research group, Collaborators

  14. 2 types of information • Structured information (forms): GenBank, Medline, Employee database, Invoice database, … • Unstructured information (documents): Research notes, Contracts, Project reports, Clinical trials documentation, …

  15. Methods of contributing written information • Traditional documents: hard to search and manipulate • Traditional forms: overly constraining, hard to create documents • Structured documents (New!): best of both worlds

  16. Problems with forms • Project: Measurements of response to … The ability to resist Plasmodium falciparum malaria is an important adaptive trait of human populations living in … • Experiment: Entomological Observations of … The results of our comparative study show consistent interethnic differences in P. falciparum infection … • Method: Observations. Malaria surveys were carried out in two rural villages near the town of Ziniaré (35 km northeast of Ouagadougou) in a shrubby savanna of the Mossi plateau. An intense P. falciparum transmission is detected … • Different response to P. …
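The Project / Experiment / Method layout above can be held in a structured document that keeps free text at every level yet stays searchable by element. The sketch below assumes an XML encoding; the element names (`project`, `experiment`, `method`, `location`, `text`) and the helper `methods_at` are hypothetical and are not taken from the iPad system described in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured document mirroring the Project / Experiment / Method
# levels on the slide. Element and attribute names are illustrative only;
# each level keeps a free-text body, as a traditional document would.
DOC = """
<project title="Measurements of response to ...">
  <text>Free-text project description.</text>
  <experiment title="Entomological observations of ...">
    <text>Free-text description of results.</text>
    <method title="Observations">
      <location>Ziniaré (35 km northeast of Ouagadougou)</location>
      <text>Free-text account of how the malaria surveys were carried out.</text>
    </method>
  </experiment>
</project>
"""


def methods_at(xml_text: str, place: str) -> list[str]:
    """Return the titles of <method> elements whose <location> mentions place."""
    root = ET.fromstring(xml_text.strip())
    titles = []
    # iter() walks the whole tree, so methods nested at any depth are found.
    for method in root.iter("method"):
        location = method.findtext("location", default="")
        if place.lower() in location.lower():
            titles.append(method.get("title"))
    return titles


print(methods_at(DOC, "Ziniaré"))  # ['Observations']
```

Unlike a fixed form, nothing here limits how much prose each level holds; unlike a plain document, a query can target only the `location` of methods.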

  17. Summary 1 • Biological function is based on an infinity of interactions between basic elements • Biologists are drowning in the complexity of information • Need to understand biological problems and constraints before applying analytical approaches • Need to resolve the problem of information storage and retrieval

  18. Form constraints • Limited number of categories • Limited number of fields per category • Constrained field space • Limited editing (copy, move, delete, etc.) • No coherent document representation • Unable to represent complex hierarchical information

  19. "3-tier" architecture of the iPad system • Top tier: iPad Editor, iPad Web Portal • Middle tier: iPad middle-layer server • Bottom tier: Database
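A minimal sketch of the 3-tier idea, assuming the clients (editor and web portal) talk only to a middle layer, which in turn is the only component that touches the database. Every class and function name below is hypothetical, and an in-memory SQLite table stands in for whatever database the real system used.

```python
import sqlite3
import xml.etree.ElementTree as ET


class DatabaseTier:
    """Bottom tier: stores XML documents (an in-memory SQLite table here)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE documents (id INTEGER PRIMARY KEY, author TEXT, xml TEXT)"
        )

    def insert(self, author: str, xml: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO documents (author, xml) VALUES (?, ?)", (author, xml)
        )
        self.conn.commit()
        return cur.lastrowid

    def by_author(self, author: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT xml FROM documents WHERE author = ? ORDER BY id", (author,)
        )
        return [row[0] for row in rows]


class MiddleLayer:
    """Middle tier: checks requests, then forwards them to the database."""

    def __init__(self, db: DatabaseTier) -> None:
        self.db = db

    def save(self, author: str, xml: str) -> int:
        ET.fromstring(xml)  # raises ET.ParseError if the XML is not well-formed
        return self.db.insert(author, xml)

    def list_documents(self, author: str) -> list[str]:
        return self.db.by_author(author)


# Top tier: the editor writes and the web portal reads; both call only the
# middle layer and never open a database connection themselves.
def editor_submit(server: MiddleLayer, author: str, xml: str) -> int:
    return server.save(author, xml)


def portal_list(server: MiddleLayer, author: str) -> list[str]:
    return server.list_documents(author)


server = MiddleLayer(DatabaseTier())
editor_submit(server, "researcher_a", "<note>Hypothetical lab note</note>")
print(portal_list(server, "researcher_a"))  # ['<note>Hypothetical lab note</note>']
```

Putting validation in the middle tier means a malformed document is rejected once, regardless of which client sent it.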

  20. iPad Demo

  21. Major Benefits • Monetary savings: less lost work; resource optimization • Time savings: faster search; faster communication and formatting; less lost work • Increase in the quality and quantity of research: useful perspectives; improved collaboration; improved project management; more information given to the Institute community; more information given to the scientific community (in the future); a tool to structure scientific data (in the near future)

  22. Drawbacks • Learning new software (very simple) • Changing habits (will go away over time, gradual adoption)

  23. Support for structured documents • WWW Consortium, industry analysts • General systems within the past year (Microsoft, Arbortext, Altova, etc.) • Specific systems in the military

  24. Evolution of information (Tim Berners-Lee)

  25. First Consulting Group, "XML and Pharmaceutical Industry" (2003): "In order to be profitable and competitive as they serve our global healthcare needs, drug companies require information systems to help them work efficiently to deliver a high-quality product. With that in mind, momentum is growing to leverage XML technology in the content management and publishing systems, being used by the pharmaceutical industry throughout the drug development lifecycle." • Interest from Aventis Pharma, Sopra Group, Genset

  26. Gilbane Report, "XML for Content" (2003): "So what's the biggest problem with XML content? Authoring it… The authoring tools are becoming more capable and people are starting to figure out that the ease of processing XML content can outweigh the pain of creating it, but there is still some way to go."

  27. Problems • Managing data context • Managing and analyzing data

  28. Summary 2 • Data context is important both for information management and for data interpretation • iPad can be used to structure data context using XML markup • Structuring data context is the precursor for better structuring of data.

  29. 3 Steps to "Paradise" • Agree on standard organizational categories: SB-UML, Gene Ontology, Bioprocess ontology, …, "Dynamic" ontologies

  30. Bioprocess ontology

  31. 3 Steps to "Paradise" • Agree on standard organizational categories: "Dynamic" ontologies, Gene Ontology, Bioprocess ontology, …, SB-UML • Sort information into the ontological categories: data mining algorithms, electronic forms, semantic markup, e.g. <protein>p53</protein><interaction>activates</interaction><gene>CD95</gene>
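The semantic-markup example above can be turned mechanically into a subject-relation-object statement. This sketch assumes the tag order is subject, relation, object; because the fragment has three top-level elements, it is not a well-formed XML document on its own, so the sketch wraps it in a hypothetical `<statement>` root before parsing.

```python
import xml.etree.ElementTree as ET

FRAGMENT = "<protein>p53</protein><interaction>activates</interaction><gene>CD95</gene>"


def markup_to_triple(fragment: str) -> tuple[tuple[str, str], str, tuple[str, str]]:
    """Turn subject/relation/object markup into ((tag, text), relation, (tag, text))."""
    # Wrap the fragment in a single (hypothetical) root element so it parses.
    root = ET.fromstring(f"<statement>{fragment}</statement>")
    # Unpacking fails loudly unless there are exactly three child elements.
    subject, relation, obj = list(root)
    return (subject.tag, subject.text), relation.text, (obj.tag, obj.text)


print(markup_to_triple(FRAGMENT))
# (('protein', 'p53'), 'activates', ('gene', 'CD95'))
```

Keeping the tag alongside each value preserves the ontological category (protein, gene) that the markup was added to capture.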

  32. Dynamic ontology • Categories: Entity, Property, Relation, BioStructure, Process, Data, Method, Molecule, MolecularComplex, Organelle, Organ, Tissue, Organism
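The slide lists the category names, but its tree layout did not survive in this transcript, so the parent links below are an assumption for illustration (Entity, Property and Relation at the top; BioStructure, Process, Data and Method under Entity; the structural levels under BioStructure). The `add_term` helper shows one possible reading of "dynamic": users attach new terms under existing categories. All function names are hypothetical.

```python
from typing import Optional

# ASSUMED hierarchy: child -> parent (None marks a top-level category).
PARENT: dict[str, Optional[str]] = {
    "Entity": None,
    "Property": None,
    "Relation": None,
    "BioStructure": "Entity",
    "Process": "Entity",
    "Data": "Entity",
    "Method": "Entity",
    "Molecule": "BioStructure",
    "MolecularComplex": "BioStructure",
    "Organelle": "BioStructure",
    "Organ": "BioStructure",
    "Tissue": "BioStructure",
    "Organism": "BioStructure",
}


def ancestors(term: str) -> list[str]:
    """Return the broader categories above term, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term: str, category: str) -> bool:
    return term == category or category in ancestors(term)


def add_term(term: str, parent: str) -> None:
    """Extend the ontology at run time by attaching term under parent."""
    if parent not in PARENT:
        raise KeyError(f"unknown parent category: {parent}")
    PARENT[term] = parent


add_term("SalivaryGland", "Organ")  # hypothetical user-added term
print(ancestors("SalivaryGland"))  # ['Organ', 'BioStructure', 'Entity']
```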

  33. Data markup • Example sentence: "X protein activates Y gene in A. gambiae salivary glands." • Annotations: Molecule (name: X, type: protein); Relation (name: activates, type: molecular interaction); Molecule (name: Y, type: gene); Entity (name: A. gambiae, alt. name: Anopheles gambiae, type: organism); Entity (name: salivary glands, type: organ)
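The annotated sentence above can be held as plain records and queried. This sketch assumes exactly the five annotations on the slide; the `Annotation` class and the helpers `lookup` and `to_statement` are hypothetical names, and the slide's "type" field is stored as `kind`. Searching by alternative name lets a query for "Anopheles gambiae" find text that only says "A. gambiae".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    category: str                   # ontology class: Molecule, Entity, or Relation
    name: str                       # text span as it appears in the sentence
    kind: str                       # the slide's "type" field (protein, organ, ...)
    alt_name: Optional[str] = None  # synonym, e.g. the full species name


SENTENCE = "X protein activates Y gene in A. gambiae salivary glands."

ANNOTATIONS = [
    Annotation("Molecule", "X", "protein"),
    Annotation("Relation", "activates", "molecular interaction"),
    Annotation("Molecule", "Y", "gene"),
    Annotation("Entity", "A. gambiae", "organism", alt_name="Anopheles gambiae"),
    Annotation("Entity", "salivary glands", "organ"),
]
assert all(a.name in SENTENCE for a in ANNOTATIONS)  # every label anchors to text


def lookup(annotations: list[Annotation], query: str) -> list[Annotation]:
    """Match the primary or the alternative name, ignoring case."""
    q = query.lower()
    return [
        a for a in annotations
        if a.name.lower() == q or (a.alt_name or "").lower() == q
    ]


def to_statement(annotations: list[Annotation]) -> tuple[str, str, str, list[str]]:
    """Assumes exactly two Molecule annotations (subject first) and one Relation."""
    molecules = [a.name for a in annotations if a.category == "Molecule"]
    relation = next(a.name for a in annotations if a.category == "Relation")
    context = [a.name for a in annotations if a.category == "Entity"]
    subject, obj = molecules
    return subject, relation, obj, context


print(to_statement(ANNOTATIONS))
# ('X', 'activates', 'Y', ['A. gambiae', 'salivary glands'])
print([a.name for a in lookup(ANNOTATIONS, "anopheles gambiae")])  # ['A. gambiae']
```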

  34. 3 Steps to "Paradise" • Agree on standard organizational categories: "Dynamic" ontologies, Gene Ontology, Bioprocess ontology, …, SB-UML • Sort information into the ontological categories: data mining algorithms, electronic forms, semantic markup • Develop search, visualization, and analysis tools: BLAST, bioprocess and molecular modeling, concept network, …

  35. Concept node
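The talk names a concept network as an analysis tool but does not describe it. One plausible sketch, under that assumption, stores extracted triples as an undirected graph and uses breadth-first search to list the concepts within a few links of a starting concept. The triples are illustrative: the first two paraphrase slides 31 and 33, and the last two edges are our own reading of "in A. gambiae salivary glands", not data from the talk.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("p53", "activates", "CD95"),
    ("X", "activates", "Y"),
    ("Y", "activated in", "salivary glands"),
    ("salivary glands", "part of", "A. gambiae"),
]


def build_network(triples: list[Triple]) -> dict[str, set[tuple[str, str]]]:
    """Undirected adjacency map: concept -> {(relation, neighbouring concept)}."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
        graph[obj].add((rel, subj))
    return graph


def within_hops(graph, start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: concepts reachable from start in at most max_hops edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[start]
    return dist


network = build_network(TRIPLES)
print(within_hops(network, "X", 2))  # {'Y': 1, 'salivary glands': 2}
```

Breadth-first order guarantees that each reported distance is the shortest number of links, which is what a "how closely related" query needs.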

  36. Better global picture to see where to go • Helpful info along the way • Organized research process • Better ways to share data • Broader impact of results • Modeling and simulation tools

  37. Summary 1 • Biological function is based on an infinity of interactions between basic elements • Biologists are drowning in the complexity of information • Need to understand biological problems and constraints before applying analytical approaches • Need to resolve the problem of information storage and retrieval

  38. Summary 2 • Data context is important both for information management and for data interpretation • iPad can be used to structure data context using XML markup • Structuring data context is the precursor for better structuring of data.

  39. Summary 3 • 2 steps for structuring data: ontology + methods for data entry • Simple "dynamic" ontologies can be used to derive standard "static" ontologies • An iPad-like system can be used to simplify the structuring of biological data • Data analysis, modeling, and simulation tools need to be data-driven, generic, and easy to use.
