
EScience - PowerPoint PPT Presentation



eScience. Jim Gray, Microsoft Research. Presented at the 21st Century Computing Conference, October 2006. eScience: What is it? Synthesis of information technology and science. Science methods are changing.

Presentation Transcript

eScience

Jim Gray

Microsoft Research

presented @ 21st Century Computing Conference

October 2006


eScience: What is it?

  • Synthesis of information technology and science.

  • Science methods are changing.

  • Science is being codified/objectified. How to represent scientific knowledge in computers?

  • Science faces a data deluge. How to manage and analyze information?

  • Scientific communication is changing: integrate online literature and data.


Science Paradigms

  • Thousand years ago: science was empirical

    describing natural phenomena

  • Last few hundred years: theoretical branch

    using models, generalizations

  • Last few decades: a computational branch

    simulating complex phenomena

  • Today: data exploration (eScience)

    unify theory, experiment, and simulation

    • Data captured by instruments or generated by simulator

    • Processed by software

    • Information/Knowledge stored in computer

    • Scientist analyzes data using data management and statistics


What X-info Needs from Us (CS) (not drawn to scale)

Diagram labels:

  • Science Data & Questions

  • Scientists

  • Data Mining Algorithms

  • Miners

  • Database: to store data, execute queries

  • Systems

  • Question & Answer and Visualization Tools


How to Engage With an Area

  • eScience is inter-disciplinary

  • We bring informatics expertise

  • Process:

    • Long-term and deep collaborations

    • Find someone who is desperate.

    • Start with requirements: 20 questions

    • Help build systems to:

      • Answer those questions much faster

      • Answer new questions.


Astronomy

  • Help build world-wide telescope

    • All astronomy data and literature online and cross indexed

    • Tools to analyze the data

  • Built SkyServer.SDSS.org

  • Built Analysis system

    • MyDB

    • CasJobs (batch job)

  • OpenSkyQuery: federation of ~20 observatories.

  • Results:

    • It works and is used every day

    • Spatial extensions in SQL 2005

    • A good example of Data Grid

    • Good examples of Web Services.
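
A typical question put to a sky catalog is a cone search: "which objects lie within a given angle of this position?" The sketch below is a minimal pure-Python illustration of that idea, not SkyServer's implementation. The catalog rows, the field names `ra` and `dec`, and both function names are assumptions made for the example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(objects, ra, dec, radius_deg):
    """Return the objects whose position lies within radius_deg of (ra, dec)."""
    return [o for o in objects
            if angular_separation_deg(o["ra"], o["dec"], ra, dec) <= radius_deg]

# Hypothetical catalog rows, invented for illustration only.
catalog = [
    {"id": "a", "ra": 180.0, "dec": 0.0},
    {"id": "b", "ra": 180.05, "dec": 0.0},
    {"id": "c", "ra": 10.0, "dec": 45.0},
]

nearby = cone_search(catalog, ra=180.0, dec=0.0, radius_deg=0.1)
```

A linear scan like this only suits tiny catalogs; a production system would push the predicate into the database and use a spatial index.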


Ecosystem Sensor Net: LifeUnderYourFeet.Org

  • Small sensor net monitoring soil

  • Sensors feed to a database

  • Helping build system to collect & organize data.

  • Working on data analysis tools

  • Prototype for other LIMS (Laboratory Information Management Systems)
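
"Sensors feed to a database" can be pictured with a small sketch like the one below. It assumes an in-memory SQLite store; the table layout, column names, and sample readings are all invented for illustration and are not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reading (
        sensor_id     TEXT NOT NULL,
        taken_at      TEXT NOT NULL,   -- ISO 8601 timestamp
        soil_temp_c   REAL,
        soil_moisture REAL,
        PRIMARY KEY (sensor_id, taken_at)
    )
    """
)

def store_readings(conn, rows):
    """Insert readings; a re-delivered reading (same sensor and time) is ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO reading VALUES (?, ?, ?, ?)", rows)

def mean_temp_by_sensor(conn):
    """Average soil temperature per sensor, as a dict."""
    cur = conn.execute(
        "SELECT sensor_id, AVG(soil_temp_c) FROM reading "
        "GROUP BY sensor_id ORDER BY sensor_id"
    )
    return dict(cur.fetchall())

# Hypothetical sample data, not real measurements.
sample = [
    ("s1", "2006-10-01T00:00", 12.0, 0.30),
    ("s1", "2006-10-01T01:00", 14.0, 0.32),
    ("s2", "2006-10-01T00:00", 10.0, 0.25),
]
store_readings(conn, sample)
store_readings(conn, sample)  # sensors often resend; the second delivery is a no-op
```

Keying on sensor and timestamp is one simple way to make ingestion idempotent, which matters when a sensor net retransmits after a dropped connection.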


RNA Structural Genomics

  • Goal: Predict secondary and tertiary structure from sequence. Deduce tree of life.

  • Technique: Analyze sequence variations sharing a common structure across tree of life

  • Representing structurally aligned sequences is a key challenge

  • Creating a database-driven alignment workbench accessing public and private sequence data


VHA Health Informatics

  • VHA: largest standardized electronic medical records system in US.

    • 7 million enrollees

    • 5 million patients

  • Design, populate and tune a ~20 TB Data Warehouse and Analytics environment

  • Evaluate population health and treatment outcomes

  • Support epidemiological studies

  • Example Milestones:

    • 1 billionth vital sign loaded in April ’06

    • 30 minutes to population-wide obesity analysis (next slide)

    • Discovered seasonality in blood pressure (NEJM, fall ’06)


HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population

Source: VHA Corporate Data Warehouse

Total Patients (one row per BMI category; the category labels did not survive in this transcript):

  • 23,876 (0.7%)

  • 701,089 (21.6%)

  • 1,177,093 (36.2%)

  • 1,347,098 (41.5%)

  • Total: 3,249,156 (100%)
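
The arithmetic behind the slide can be checked in a few lines. This sketch uses only the four counts shown above and the standard BMI definition (weight in kilograms divided by height in metres squared). The function and variable names are assumptions, and the per-category BMI cut-offs are left out because the transcript does not give them.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Patient counts per BMI category, in the order they appear on the slide.
counts = [23_876, 701_089, 1_177_093, 1_347_098]

total = sum(counts)
shares = [round(100 * c / total, 1) for c in counts]  # percent of all patients
```

The four counts add up to the total on the slide, and the rounded shares match the percentages shown there.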


Other Projects

  • Carbon Cycle Portal

  • Hydrology Portal

  • Oceanography Workbench


Common Themes

  • Each science is codifying & objectifying their data and knowledge

    • What is a galaxy?

    • What is a molecule?

  • So that they can

    • Ask questions of the data

    • Exchange data with one another

  • Result will be a Data Grid

    • Datasets published as “objects”

    • Service Oriented Architecture


All Scientific Data Online

  • Many disciplines overlap and use data from other sciences.

  • Internet can unify all literature and data

  • Go from literature to computation to data back to literature.

  • Information at your fingertips, for everyone, everywhere

  • Increase Scientific Information Velocity

  • Huge increase in Science Productivity


Unlocking Peer-Reviewed Literature

  • Agencies and Foundations mandating research be public domain.

    • NIH (30 B$/y, 40k PIs, …) (see http://www.taxpayeraccess.org/)

    • Wellcome Trust

    • Japan, China, Italy, South Africa, …

    • Public Library of Science, …

  • Other agencies will follow NIH


How Does the New Library Work?

  • Who pays for storage access (unfunded mandate)?

    • It’s cheap: 1 milli-dollar per access

  • But… curation is not cheap:

    • Author/Title/Subject/Citation/…

    • Dublin Core is great but…

    • NLM has a 6,000-line XSD for documents http://dtd.nlm.nih.gov/publishing

    • Need to capture document structure from author

      • Sections, figures, equations, citations,…

      • Automate curation

    • NCBI-PubMedCentral is doing this

      • Preparing for 1M articles/year

    • Automate it!
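
To make the curation step concrete, here is a minimal sketch of emitting a Dublin Core record for one document with Python's standard library. The element names (`title`, `creator`, `subject`, `date`, `identifier`) and the namespace URI come from the Dublin Core element set. The `record` wrapper element, the function name, and the identifier value are assumptions for illustration; this is not the NLM schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dublin_core_record(title, creators, subjects, date, identifier):
    """Build a small XML record using Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element

    def add(name, text):
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = text

    add("title", title)
    for creator in creators:
        add("creator", creator)
    for subject in subjects:
        add("subject", subject)
    add("date", date)
    add("identifier", identifier)
    return record

rec = dublin_core_record(
    title="eScience",
    creators=["Jim Gray"],
    subjects=["eScience"],
    date="2006-10",
    identifier="example:0001",  # made-up identifier
)
xml_text = ET.tostring(rec, encoding="unicode")
```

A flat record like this covers author, title, and subject, but not sections, figures, equations, or citations, which is why the slide calls for capturing document structure from the author.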


Portable PubMedCentral

  • “Information at your fingertips”

  • Helping build Portable PubMedCentral

  • Deployed US, China, England, Italy, South Africa, (Japan soon).

  • Each site can accept documents

  • Archives replicated

  • Federate thru web services

  • Working to integrate Word/Excel/… with PubMedCentral, e.g. WordML, XSD

  • To be clear: NCBI is doing 99% of the work.


Overlay Journals

  • Articles and Data in public archives

  • Journal title page in public archive.

  • All covered by Creative Commons License

    • permits: copy/distribute

    • requires: attribution

      http://creativecommons.org/

Diagram labels: articles, Data Sets, Data Archives


Overlay Journals

Diagram labels: title page, articles, Data Sets, Journal Management System, Data Archives


Overlay Journals

Diagram labels: title page, comments, articles, Data Sets, Journal Collaboration System, Journal Management System, Data Archives


Better Authoring Tools

  • Extend Office tools to

    • capture document metadata (NLM DTD)

    • represent documents in standard format

      • WordML (ECMA standard)

    • capture references

    • Make active documents (words and data).

  • Easier for authors

  • Easier for archives


Conference Management Tool

  • Currently a conference peer-review system (~300 conferences)

    • Form committee

    • Accept Manuscripts

    • Declare interest/recuse

    • Review

    • Decide

    • Form program

    • Notify

    • Revise


eJournal Management Tool

  • Connect to Archives

  • Manage archive document versions

  • Capture Workshop

    • presentations

    • proceedings

  • Capture classroom ConferenceXP

  • Moderated discussions of published articles

  • Connect literature and data archives

  • Add publishing steps

    • Form committee

    • Accept Manuscripts

    • Declare interest/recuse

    • Review

    • Decide

    • Form program

    • Notify

    • Revise

    • Publish

    • Discuss & Critique
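
The publishing steps listed above read naturally as an ordered workflow. The sketch below models them as a strictly linear sequence, which is a simplifying assumption: a real review system would allow loops (for example, revise and re-review) and per-manuscript state. All names here are hypothetical and do not describe the tool's actual design.

```python
from enum import Enum

class Step(Enum):
    """Publishing steps, in the order given on the slide."""
    FORM_COMMITTEE = "Form committee"
    ACCEPT_MANUSCRIPTS = "Accept manuscripts"
    DECLARE_INTEREST = "Declare interest/recuse"
    REVIEW = "Review"
    DECIDE = "Decide"
    FORM_PROGRAM = "Form program"
    NOTIFY = "Notify"
    REVISE = "Revise"
    PUBLISH = "Publish"
    DISCUSS = "Discuss & critique"

ORDER = list(Step)  # Enum iteration preserves definition order

def next_step(step):
    """Return the step that follows, or None after the last one."""
    i = ORDER.index(step)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None

def advance(current, requested):
    """Allow only a move to the immediately following step."""
    if requested is not next_step(current):
        raise ValueError(f"cannot go from {current.value!r} to {requested.value!r}")
    return requested
```

Publish and Discuss & critique are the two steps the eJournal tool adds on top of the conference workflow; everything before them is shared.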


Why Not a Wiki?

  • Peer-Review is different

    • It is very structured

    • It is moderated

    • There is a degree of confidentiality

  • Wiki is egalitarian

    • It’s a conversation

    • It’s completely transparent

  • Don’t get me wrong:

    • Wikis are great

    • SharePoints are great

    • But… Peer-Review is different.

    • And, incidentally: review of proposals, projects,… is more like peer-review.


eScience: What is it?

  • Synthesis of information technology and science.

  • Science methods are changing.

  • Science is being codified/objectified. How to represent scientific information and knowledge in computers?

  • Science faces a data deluge. How to manage and analyze information?

  • Scientific communication is changing: integrate online literature and data.

