Tony Hey, Corporate Vice President, Technical Computing, Microsoft Corporation

Presentation Transcript



Social Sciences

Life Sciences

e-Science and Cyberinfrastructure

Earth Sciences

Computer and Information Sciences

Tony Hey

Corporate Vice President

Technical Computing

Microsoft Corporation

New Materials, Technologies and Processes

Multidisciplinary Research



Licklider’s Vision

“Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job”

Larry Roberts – Principal Architect of the ARPANET



Physics and the Web

  • Tim Berners-Lee developed the Web at CERN as a tool for exchanging information between the partners in physics collaborations

  • The first Web Site in the USA was a link to the SLAC library catalogue

  • It was the international particle physics community who first embraced the Web

  • ‘Killer’ application for the Internet

  • Transformed modern world – academia, business and leisure



Beyond the Web?

  • Scientists developing collaboration technologies that go far beyond the capabilities of the Web

    • To use remote computing resources

    • To integrate, federate and analyse information from many disparate, distributed, data resources

    • To access and control remote experimental equipment

  • Capability to access, move, manipulate and mine data is the central requirement of these new collaborative science applications

    • Data held in file or database repositories

    • Data generated by accelerators or telescopes

    • Data gathered from mobile sensor networks



What is e-Science?

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’

John Taylor

Director General of Research Councils

UK, Office of Science and Technology



The e-Science Vision

  • e-Science is about multidisciplinary science and the technologies to support such distributed, collaborative scientific research

    • Many areas of science are in danger of being overwhelmed by a ‘data deluge’ from new high-throughput devices, sensor networks, satellite surveys …

    • Areas such as bioinformatics, genomics, drug design, engineering, healthcare … require collaboration between different domain experts

  • ‘e-Science’ is a shorthand for a set of technologies to support collaborative networked science



e-Science – Vision and Reality

Vision

  • Oceanographic sensors - Project Neptune

    • Joint US-Canadian proposal

Reality

  • Chemistry – The Comb-e-Chem Project

    • Annotation, Remote Facilities and e-Publishing



http://www.neptune.washington.edu/



Undersea Sensor Network

Connected & Controllable Over the Internet



Data Provenance


Persistent Distributed Storage

Visual Programming



Distributed Computation

Interoperability & Legacy Support via Web Services



Searching & Visualization

Live Documents

Reputation & Influence



Reproducible Research



Collaboration



Handwriting



Interactive Data

Dynamic Documents



The Comb-e-Chem Project

Automatic Annotation

Video Data Stream

HPC Simulation

Data Mining and Analysis

Structures Database

Diffractometer

Combinatorial Chemistry Wet Lab

National X-Ray Service

Middleware



National Crystallographic Service

Send sample material to NCS service

Collaborate in e-Lab experiment and obtain structure

Search materials database and predict properties using Grid computations

Download full data on materials of interest



A digital lab book replacement that chemists were able to use, and liked



Monitoring laboratory experiments using a broker delivered over GPRS on a PDA



Crystallographic e-Prints

Direct Access to Raw Data from scientific papers

Raw data sets can be very large - stored at UK National Datastore using SRB software



eBank Project

Virtual Learning Environment

Reprints

Peer-Reviewed Journal & Conference Papers

Technical Reports

Local Web

Preprints & Metadata

Institutional Archive

Publisher Holdings

Certified Experimental Results & Analyses

Data, Metadata & Ontologies

Undergraduate Students

Digital Library

Graduate Students

E-Scientists

E-Scientists

E-Scientists

Grid


E-Experimentation

Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning



Support for e-Science

  • Cyberinfrastructure and e-Infrastructure

    • In the US, Europe and Asia there is a common vision for the ‘cyberinfrastructure’ required to support the e-Science revolution

    • Set of Middleware Services supported on top of high bandwidth academic research networks

  • Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up ‘Virtual Organizations’ for their research – or business

    • Many companies emphasize computing cycle aspect of Grids

    • The ‘Microsoft Grid’ vision is more about data management than about compute clusters



Six Key Elements for a Global Cyberinfrastructure for e-Science

  • High bandwidth Research Networks

  • Internationally agreed AAA Infrastructure

  • Development Centers for Open Standard Grid Middleware

  • Technologies and standards for Data Provenance, Curation and Preservation

  • Open access to Data and Publications via Interoperable Repositories

  • Discovery Services and Collaborative Tools


The Future: Hybrid Networks

  • Standard packet routed production network for email, Web access, …

  • User-controlled ‘lambda’ connections for e-Science applications requiring high performance end-to-end Quality of Service

  • Similar strategies in North America, Europe and Asia

  • The GLIF consortium



Towards an International Grid Infrastructure

UK NGS

Leeds

Manchester

Starlight (Chicago)

Netherlight (Amsterdam)

Oxford

RAL

SDSC

NCSA

US TeraGrid

PSC

UCL

UKLight

SC05

Local laptops in Seattle and UK

All sites connected by production network

(not all shown)

Computation

Steering clients

Service Registry

Network PoP



Research Prototypes to Production Quality Middleware?

  • Research projects not funded to do testing, configuration and QA required to produce production quality middleware

  • Common rule of thumb is that 10 times more effort is needed to re-engineer ‘proof of concept’ research software to production quality (Brooks, ‘The Mythical Man-Month’)

  • Key issue for realizing a global e-Science Cyberinfrastructure is existence of robust, open standard, interoperable middleware

  • OMII-UK, OMII-China, NMI, NAREGI, …


The Web Services ‘Magic Bullet’

Company A (J2EE)

Web Services

Company B (ONE)

Company C (.Net)



Open Grid Services

  • Development of Web Services

  • Require small set of useful ‘Grid Services’ supported by IT industry

    • Execution Management, Data Access, Resource Management, Security, Information Management

  • Research community experimenting with higher level services: Workflow, Transactions, Data Mining, Knowledge Discovery …

  • Exploit Synergy: Use Commercial Development Environment and Tooling to build robust Grid Services



Digital Curation?

  • In 20 years we can guarantee that the operating system, the spreadsheet program and the hardware used to store data will no longer exist

    • Need research into ‘curation’ technologies such as workflow, provenance and preservation

    • Need to liaise closely with individual research communities, data archives and libraries

  • The UK has set up the ‘Digital Curation Centre’ in Edinburgh with Glasgow, UKOLN and CCLRC

  • Attempt to bring together skills of scientists, computer scientists and librarians



Digital Curation Centre

  • Actions needed to maintain and utilise digital data and research results over entire life-cycle

    • For current and future generations of users

  • Digital Preservation

    • Long-run technological/legal accessibility and usability

  • Data curation in science

    • Maintenance of body of trusted data to represent current state of knowledge

  • Research in tools and technologies

    • Integration, annotation, provenance, metadata, security …


Real-world Data

Persistent Distributed Data

Workflow, Data Mining & Algorithms

Interpretation & Insight

Computational Modeling



Technical Computing in Microsoft

  • Radical Computing

    • Research in potential breakthrough technologies

  • Advanced Computing for Science and Engineering

    • Application of new algorithms, tools and technologies to scientific and engineering problems

  • High Performance Computing

    • Application of high performance clusters and database technologies to industrial applications



New Science Paradigms

  • A thousand years ago: Experimental Science

    - description of natural phenomena

  • Last few hundred years: Theoretical Science

    - Newton’s Laws, Maxwell’s Equations …

  • Last few decades: Computational Science

    - simulation of complex phenomena

  • Today: e-Science or Data-centric Science

    - unify theory, experiment, and simulation

    - using data exploration and data mining

    • Data captured by instruments

    • Data generated by simulations

    • Processed by software

    • Scientist analyzes databases/files

(With thanks to Jim Gray)


TOOLS: Workflow, Collaboration, Visualization, Data Mining

DATA: Acquisition, Storage, Annotation, Provenance, Curation, Preservation

CONTENT: Scholarly Communication, Institutional Repositories

Advanced Computing for Science and Engineering

Bioinformatics

Energy Science

Earth Science

Engineering

. . .



Key Issues for e-Science

  • Workflows and Tools

    • The LEAD Project

    • The DiscoveryNet Project

  • The Data Chain

    • From Acquisition to Preservation

  • Scholarly Communication

    • Open Access to Data and Publications



The LEAD Project

Better predictions for Mesoscale weather



The LEAD Vision

DYNAMIC OBSERVATIONS

  • Product Generation,

  • Display,

  • Dissemination

Prediction/Detection

PCs to Teraflop Systems

  • Analysis/Assimilation

  • Quality Control

  • Retrieval of Unobserved Quantities

  • Creation of Gridded Fields

Models and Algorithms Driving Sensors

The CS challenge: Build a virtual “eScience” laboratory to support experimentation and education leading to this vision.

  • End Users

  • NWS

  • Private Companies

  • Students



Composing LEAD Services

Need to construct workflows that are:

  • Data Driven

    • The weather input stream defines the nature of the computation

  • Persistent and Agile

    • An agent mines a data stream and notices an “interesting” feature. This event may trigger a workflow scenario that has been waiting for months

  • Adaptive

    • The weather changes

    • Workflow may have to change on-the-fly

    • Resources
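The “persistent and agile” requirement above can be sketched as a dormant workflow that an agent wakes when it spots an interesting feature in a data stream. This is only an illustrative sketch, not LEAD's actual design: the names `DormantWorkflow`, `mine_stream` and `storm_watch`, the wind-speed threshold and the sample readings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class DormantWorkflow:
    """A workflow that waits until a trigger condition on the stream is met."""
    name: str
    is_interesting: Callable[[float], bool]  # hypothetical trigger condition
    runs: List[float] = field(default_factory=list)

    def launch(self, reading: float) -> None:
        # Stand-in for starting a real computation: record what triggered it.
        self.runs.append(reading)


def mine_stream(readings: Iterable[float], workflows: List[DormantWorkflow]) -> None:
    """Agent loop: inspect each reading and wake every workflow whose trigger fires."""
    for reading in readings:
        for workflow in workflows:
            if workflow.is_interesting(reading):
                workflow.launch(reading)


# Hypothetical trigger: wind speed above 30 units counts as "interesting".
storm_watch = DormantWorkflow("storm_forecast", lambda wind_speed: wind_speed > 30.0)
mine_stream([5.0, 12.5, 31.2, 8.0, 40.1], [storm_watch])
print(storm_watch.runs)  # [31.2, 40.1]
```

The workflow object can sit idle for as long as the stream stays uneventful; only the trigger predicate is evaluated per reading, which is what makes the waiting cheap.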



Example LEAD Workflow



and DiscoveryNet

  • Constructing a ubiquitous workflow: by scientists

    • Integrate information resources and applications cross-domain

    • Capture the best practice of your scientific research

  • Warehousing workflows: for scientists

    • Manage discovery processes within an organisation

    • Construct an enterprise process knowledge bank

  • Deployment workflow: to scientists

    • Turn workflows into reusable applications/services

    • Turn every scientist into a solution builder


An Integrative Analysis Example

Workflow diagram labels:

  • Relational data mining

  • Decision tree model of metabonomic profile

  • Text mining

  • Text mining visualization

  • Spectrum data mining

  • Chemical data model

  • Chemical sequence data model

  • Chemical structure visualization

  • Visualizing serial/spectrum data

  • Visualizing cluster statistics

  • Visualizing multidimensional data

  • Visualizing sequence data

  • Visualizing relational data clusters

  • Visualizing pathway data



The Data Deluge

  • In the next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history

  • Some normalizations:

    • The Bible = 5 Megabytes

    • Annual refereed papers = 1 Terabyte

    • Library of Congress = 20 Terabytes

    • Internet Archive (1996 – 2002) = 100 Terabytes

  • In many fields new high throughput devices, sensors and surveys will be producing Petabytes of scientific data
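To put the normalizations above on one scale, the short sketch below counts how many copies of each item would fit in a single petabyte. It assumes decimal units (1 Megabyte = 10^6 bytes, and so on), which the slide does not specify; the helper name `how_many_fit` is made up for the example.

```python
# Sizes quoted on the slide, in bytes. Decimal units are an assumption.
MEGABYTE = 10**6
TERABYTE = 10**12
PETABYTE = 10**15

SIZES = {
    "The Bible": 5 * MEGABYTE,
    "Annual refereed papers": 1 * TERABYTE,
    "Library of Congress": 20 * TERABYTE,
    "Internet Archive (1996-2002)": 100 * TERABYTE,
}


def how_many_fit(container_bytes: int, item_bytes: int) -> int:
    """Return how many whole items of item_bytes fit into container_bytes."""
    return container_bytes // item_bytes


for name, size in SIZES.items():
    print(f"1 Petabyte holds {how_many_fit(PETABYTE, size):,} x {name}")
```

On these figures one petabyte corresponds to 200,000,000 Bibles, 1,000 years of refereed papers, 50 Libraries of Congress, or 10 copies of the 1996 to 2002 Internet Archive.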


The Problem for the e-Scientist

Diagram: Experiments & Instruments, Other Archives, Literature and Simulations each supply facts; questions go in and answers come out.

  • Data ingest

  • Managing a petabyte

  • Common schema

  • How to organize it?

  • How to reorganize it?

  • How to coexist & cooperate with others?

  • Data Query and Visualization tools

  • Support/training

  • Performance

    • Execute queries in a minute

    • Batch (big) query scheduling



The e-Science Data Chain

  • Data Acquisition

  • Data Ingest

  • Metadata

  • Annotation

  • Provenance

  • Data Storage

  • Curation

  • Preservation



Berlin Declaration 2003

  • ‘To promote the Internet as a functional instrument for a global scientific knowledge base and for human reflection’

  • Defines open access contributions as including:

    • ‘original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material’



NSF ‘Atkins’ Report on Cyberinfrastructure

  • ‘the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers’

  • ‘archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information’



Publishing Data & Analysis Is Changing

Roles        Traditional   Emerging
Authors      Scientists    Collaborations
Publishers   Journals      Project web site
Curators     Libraries     Data+Doc Archives
Archives     Archives      Digital Archives
Consumers    Scientists    Scientists



Data Publishing: The Background

In some areas – notably biology – databases are replacing (paper) publications as a medium of communication

  • These databases are built and maintained with a great deal of human effort

  • They often do not contain source experimental data - sometimes just annotation/metadata

  • They borrow extensively from, and refer to, other databases

  • You are now judged by your databases as well as your (paper) publications

  • Upwards of 1000 public databases in genetics


Data Publishing: The Issues

  • Data integration

    • Tying together data from various sources

  • Annotation

    • Adding comments/observations to existing data

    • Becoming a new form of communication

  • Provenance

    • ‘Where did this data come from?’

  • Exporting/publishing in agreed formats

    • To other programs as well as people

  • Security

    • Specifying/enforcing read/write access to parts of your data
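Two of the issues above, annotation and provenance, can be illustrated with a small data structure: each record remembers which record it was derived from, so the question ‘Where did this data come from?’ becomes a walk back along that chain. This is a minimal sketch under assumed names (`DataRecord`, `Annotation`, the sample identifiers and sources); it is not taken from any of the projects mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """A comment or observation attached to existing data."""
    author: str
    comment: str


@dataclass
class DataRecord:
    identifier: str
    value: float
    source: str  # instrument or processing step that produced this record
    derived_from: Optional["DataRecord"] = None
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, author: str, comment: str) -> None:
        self.annotations.append(Annotation(author, comment))

    def provenance(self) -> List[str]:
        """Return the chain of sources, most recent first."""
        chain: List[str] = []
        record: Optional[DataRecord] = self
        while record is not None:
            chain.append(record.source)
            record = record.derived_from
        return chain


# Hypothetical example: a filtered value derived from a raw measurement.
raw = DataRecord("raw-001", 1.234, source="diffractometer")
cleaned = DataRecord("clean-001", 1.23, source="noise-filter", derived_from=raw)
cleaned.annotate("reviewer", "Rounded to two decimal places")
print(cleaned.provenance())  # ['noise-filter', 'diffractometer']
```

Keeping annotations separate from the value means comments can accumulate without altering the data they describe.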



Interoperable Repositories?

  • Paul Ginsparg’s arXiv at Cornell has demonstrated new model of scientific publishing

    • Electronic version of ‘preprints’ hosted on the Web

  • David Lipman of the NIH National Library of Medicine has developed PubMed Central as repository for NIH funded research papers

    • Microsoft funded development of ‘portable PMC’ now being deployed in UK and other countries

  • Stevan Harnad’s ‘self-archiving’ EPrints project in Southampton provides a basis for OAI-compliant ‘Institutional Repositories’

    • Many national initiatives around the world moving towards mandating deposition of ‘full text’ of publicly funded research papers in repositories
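Repositories described as OAI-compliant typically expose the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), in which a harvester sends plain HTTP requests carrying a `verb` parameter. The sketch below only builds such request URLs; the base URL is a made-up placeholder, not a real repository, and no network call is made.

```python
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

# Ask the repository to describe itself.
print(oai_request_url(BASE_URL, "Identify"))

# Harvest records as unqualified Dublin Core (the oai_dc metadata format).
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Because every compliant repository answers the same small set of verbs, one harvester can aggregate metadata from many institutional archives, which is what makes the repositories interoperable.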



Microsoft Strategy for e-Science

Microsoft intends to work with the scientific and library communities:

  • to define open standard and/or interoperable high-level services, work flows and tools

  • to assist the community in developing open scholarly communication and interoperable repositories



© 2005 Microsoft Corporation. All rights reserved.

This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.



Acknowledgements

With special thanks to Kelvin Droegemeier, Geoffrey Fox, Jeremy Frey, Dennis Gannon, Jim Gray, Yike Guo, Liz Lyon and Beth Plale

