Computational Science and the School of Informatics at Indiana University

IU/HBCU STEM Initiative

IUPUI, April 11, 2007

Geoffrey Fox

Computer Science, Informatics, Physics

Pervasive Technology Laboratories

Indiana University Bloomington IN 47401


http://www.infomall.org


What is Computational Science?

  • Informatics is the integration of the art, science, and the human dimensions of information technology to provide solutions to discipline-specific problems

  • Informatics is a response to the data/information/knowledge gaps (data deluge) caused by “billions and billions of bits”

    • Grids are the technology supporting this in distributed research

  • Computational science could mean the same thing, or it could focus on the large-scale simulation part

  • Multicore chips will revitalize simulation!


Bioinformatics Data Deluge: Challenge and Opportunity

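  • 1985: 1 experiment, 1 gene, 10 data points

  • 2000: 1 experiment, 10,000 genes, 10,000,000 data points

  • Opportunity: many more genes per experiment; Challenge: a million-fold more data per experiment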
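The figure gives 10 data points per experiment in 1985 and 10,000,000 in 2000. From those two numbers you can work out the growth rate directly. This back-of-envelope estimate assumes steady exponential growth between the two endpoints.

```latex
\[
\frac{10^{7}}{10^{1}} = 10^{6} \text{ over } 15 \text{ years}
\;\Rightarrow\;
r = 10^{6/15} = 10^{0.4} \approx 2.51 \text{ per year},
\qquad
T_{2} = \frac{15 \ln 2}{\ln 10^{6}} \approx \frac{10.40}{13.82} \approx 0.75 \text{ years}
\]
```

On these figures, the data produced per experiment doubled roughly every nine months.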


e-moreorlessanything and the Grid

  • ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ (John Taylor, inventor of the term and Director General of Research Councils UK, Office of Science and Technology)

  • e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research

  • Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.

    • The growing use of outsourcing is one example

  • The Grid provides the information technology e-infrastructure for e-moreorlessanything.

  • A deluge of data of unprecedented and inevitable size must be managed and understood.

  • People, computers, data and instruments must be linked.

  • On-demand assignment of experts, computers, networks and storage resources must be supported


Why Grids/Cyberinfrastructure Are Useful

  • Supports distributed science – data, people, computers

  • Exploits Internet technology (Web 2.0), adding management, security, supercomputers, etc.

  • It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with higher latency (milliseconds) between nodes

  • The parallel aspect is needed to get high performance on individual 3D simulations, data analysis, etc.; the problem must be decomposed across processors

  • The distributed aspect integrates already distinct components

  • Cyberinfrastructure is in general a distributed collection of parallel systems

  • Grids are made of services that are “just” programs or data sources packaged for distributed access

  • Web 2.0 can be used “instead of” Grids
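The parallel/distributed distinction above can be made concrete. Below is a minimal Python sketch of the simplest way to "decompose the problem"; it is not taken from any Grid toolkit. A 1D grid of cells is split into contiguous blocks, one per worker, so that each worker only exchanges boundary (halo) cells with its neighbors at every time step. The function names are hypothetical. The latencies in the usage lines are illustrative assumptions: about 5 microseconds inside a parallel machine and about 10 ms between distributed sites. They show why tightly coupled simulations belong on parallel systems.

```python
def block_decompose(n_cells: int, n_workers: int) -> list[tuple[int, int]]:
    """Split cells 0..n_cells-1 into contiguous [start, stop) blocks, one per worker.

    The first n_cells % n_workers workers get one extra cell, so block sizes
    differ by at most one (a simple load balance).
    """
    base, extra = divmod(n_cells, n_workers)
    blocks, start = [], 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def halo_neighbors(rank: int, n_workers: int) -> list[int]:
    """Ranks this worker exchanges boundary (halo) cells with each time step,
    assuming a non-periodic 1D domain."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_workers]


def latency_cost(n_steps: int, messages_per_step: int, latency_s: float) -> float:
    """Time spent purely on message latency, ignoring bandwidth and overlap."""
    return n_steps * messages_per_step * latency_s


if __name__ == "__main__":
    print(block_decompose(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    # Illustrative latencies only (assumed, not measured).
    for label, latency in [("parallel", 5e-6), ("distributed", 10e-3)]:
        print(label, latency_cost(1000, 2, latency), "seconds of latency")
```

An interior worker sends two messages per step. At the assumed latencies, 1,000 steps cost about a hundredth of a second inside a parallel machine but about 20 seconds across distributed sites.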


TeraGrid: Integrating NSF Cyberinfrastructure

Sites shown on the map: Buffalo, Wisc, UC/ANL, Cornell, Utah, Iowa, PU, NCAR, PSC, IU, NCSA, Caltech, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.

Today 100 teraflops; tomorrow a petaflop. Indiana: 20 teraflops today and doubling.


APEC Cooperation for Earthquake Simulation

  • ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction

    • iSERVO is the infrastructure to support the work of ACES

    • SERVOGrid is the (completed) US Grid that is a prototype of iSERVO

    • http://www.quakes.uq.edu.au/ACES/

  • Chartered under APEC, the Asia-Pacific Economic Cooperation of 21 economies


Grid of Grids: Research Grid and Education Grid

Figure (SERVOGrid): field trip data; databases; GIS Grid; discovery services; repositories and federated databases; streaming data; sensors; Sensor Grid; Database Grid; Compute Grid (computer farm); research simulations; data filter services; analysis and visualization portal; customization services (from research to education); Education Grid


SERVOGrid and Cyberinfrastructure

  • Grids are the technology, based on Web services, that implements cyberinfrastructure, i.e. supports e-Science, or science as a team sport

    • Internet-scale managed services that link computers, data repositories, sensors, instruments and people

  • SERVOGrid has a portal and services for

    • Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh-generating programs, …

    • Job management and monitoring web services for running the above codes

    • File management web services for moving files between various machines

    • Geographical Information System services

    • QuakeTables, an earthquake-specific database

    • Sensors as well as databases

    • Context (dynamic metadata) and UDDI system (long-term metadata) services

    • Services support streaming real-time data
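RDAHMM, listed among the SERVOGrid applications, applies hidden Markov models to earthquake-related time series such as GPS data. The sketch below shows only the decoding step of a generic two-state Gaussian HMM, using the Viterbi algorithm with fixed, hypothetical parameters. RDAHMM's own model fitting is not reproduced here. In this sketch, the two states stand for a "quiet" mode and an "active" mode of a displacement series.

```python
import math


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, start, trans, means, sds):
    """Most likely hidden-state sequence for a Gaussian-emission HMM.

    obs:   observed values (e.g. one GPS displacement component over time)
    start: start[s] = P(first state is s)
    trans: trans[r][s] = P(next state is s | current state is r)
    means, sds: emission parameters per state
    Works in log space to avoid underflow on long series.
    """
    n = len(start)
    score = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], sds[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + gaussian_logpdf(x, means[s], sds[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last step to the first
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical series (mm): quiet, a burst of displacement, quiet again.
    series = [0.1, -0.2, 0.0, 2.9, 3.2, 3.0, 0.1, 0.0]
    modes = viterbi(series, start=[0.9, 0.1],
                    trans=[[0.95, 0.05], [0.05, 0.95]],
                    means=[0.0, 3.0], sds=[0.5, 0.5])
    print(modes)  # [0, 0, 0, 1, 1, 1, 0, 0]
```

Sticky transition probabilities (0.95 to stay in a state) make the decoder prefer a few long segments over flickering between modes.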


LEAD Gateway Portal

NSF Large ITR and TeraGrid Gateway

- Adaptive response to mesoscale weather events

- Supports data exploration and Grid workflow


Grid Workflow Datamining in Earth Science

  • Work with Scripps Institution of Oceanography

  • Grid services controlled by a workflow process real-time data from ~70 GPS sensors in Southern California

Figure components: NASA GPS; streaming data support; transformations; data checking; Hidden Markov datamining (JPL); display (GIS); earthquake
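A workflow like this chains stages so that each GPS reading is checked, transformed and analyzed as it arrives. Below is a minimal sketch using Python generators. The station IDs, the 1 m plausibility bound and the 5 mm jump threshold are hypothetical. The simple jump detector only stands in for the Hidden Markov datamining stage and does not implement it.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Reading(NamedTuple):
    station: str           # hypothetical GPS station id
    t: float               # time in seconds
    east: Optional[float]  # east displacement; None marks a dropout


def check(stream: Iterable[Reading], bound_m: float = 1.0) -> Iterator[Reading]:
    """Data checking: drop dropouts and implausible values (bound is an assumption)."""
    for r in stream:
        if r.east is not None and abs(r.east) < bound_m:
            yield r


def to_mm(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Transformation: metres in, millimetres out."""
    for r in stream:
        yield r._replace(east=r.east * 1000.0)


def flag_jumps(stream: Iterable[Reading], threshold_mm: float = 5.0) -> Iterator[tuple[str, float, float]]:
    """Stand-in analysis stage: report jumps between consecutive readings of one station."""
    last: dict[str, float] = {}
    for r in stream:
        prev = last.get(r.station)
        last[r.station] = r.east
        if prev is not None and abs(r.east - prev) > threshold_mm:
            yield r.station, r.t, r.east - prev


def display(events: Iterable[tuple[str, float, float]]) -> None:
    """Stand-in display stage: a GIS client would plot these on a map."""
    for station, t, jump in events:
        print(f"{station} t={t:.0f}s jump={jump:+.1f} mm")


if __name__ == "__main__":
    feed = [
        Reading("P001", 0, 0.0010), Reading("P002", 0, 0.0030),
        Reading("P001", 1, None), Reading("P002", 1, 0.0031),
        Reading("P001", 2, 0.0012), Reading("P001", 3, 0.0102),
        Reading("P001", 4, 5.0),
    ]
    display(flag_jumps(to_mm(check(feed))))  # P001 t=3s jump=+9.0 mm
```

Because every stage is a generator, readings flow through one at a time. Nothing waits for the whole dataset, which is the property a real-time GPS feed needs.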


Some Organizations I Work With

  • MSI CI2: Minority-Serving Institutions (MSI) Cyberinfrastructure Institute, led by the Alliance for Equity in Higher Education. Working with the Alliance will have systemic impact on at least 335 Minority-Serving Institutions covered by:

    • AIHEC (American Indian Higher Education Consortium)

    • HACU (Hispanic Association of Colleges and Universities)

    • NAFEO (National Association for Equal Opportunity in Higher Education)

  • MSI-CIEC: Minority-Serving Institution Cyberinfrastructure (CI) Empowerment Coalition, led by UHD (University of Houston-Downtown), a major Hispanic-Serving Institution

  • I am a Senior Research Associate in the Center for Computational Science and Advanced Distributed Simulation at UHD and Visiting Scholar for Cyberinfrastructure Development at the Alliance for Equity in Higher Education


Basic Ideas

  • Cyberinfrastructure is critical to all involved in Research and Education

  • Cyberinfrastructure is intrinsically democratic, supporting broad participation

  • MSIs should lead the integration of MSIs with cyberinfrastructure

  • One should guide the projects with experts

  • One should aim at scalable (systemic) approaches

  • The goal is peer collaborations involving all institutions of higher education



Example: Setting up a Polar CI/Grid

  • An NSF CI-Team project with HBCU ECSU in North Carolina and the University of Kansas will design and set up a Polar Grid

    • CI-enable MSIs (ECSU, Haskell) and a community (polar science)

  • The North and South poles are melting, with potentially huge environmental impact

    • We have changed the 100,000-year glacier cycle into a ~50-year cycle; the field has increased dramatically in importance and interest

  • Polar Grid is a network of computers, sensors (on robots and satellites), data and people aimed at understanding the science of ice sheets and the impact of global warming

  • We are planning Polar Grid-relevant CI education infrastructure and initial projects with undergraduate students (ECSU) and graduate students (Kansas)

    • Polar weather stations as Grid resources

    • Use distance education to cover all CReSIS sites


CReSIS PolarGrid

  • Important CReSIS-specific Cyberinfrastructure components include

    • Managed data from sensors and satellites

    • Data analysis such as SAR processing, possibly with parallel algorithms

    • Electromagnetic simulations (currently commercial codes) to design instrument antennas

    • 3D simulations of ice sheets (glaciers) with non-uniform meshes

    • GIS (Geographical Information Systems)

  • Also needed are capabilities present in many Grids

    • Portal, i.e. Science Gateway

    • Submitting multiple sequential or parallel jobs

  • TeraGrid and others (the national cyberinfrastructure) are holding Cyberinfrastructure Days at various places around the country to popularize CI and identify how institutions can participate

    • ECSU's will be later this year
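CReSIS shares a common pattern with most Grids: submitting many independent sequential or parallel jobs and then monitoring them. The sketch below shows only the submit / poll / collect cycle, using Python's `subprocess` module to run local processes. On a real Grid, the submission step would go through a batch scheduler or Grid middleware, which this sketch does not model. The two jobs and the helper names are hypothetical stand-ins.

```python
import subprocess
import sys


def submit(commands: list[list[str]]) -> list[subprocess.Popen]:
    """Start every job immediately (non-blocking), capturing stdout as text."""
    return [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) for cmd in commands]


def status(jobs: list[subprocess.Popen]) -> list[str]:
    """Poll without blocking: 'running' or 'done(<exit code>)' for each job."""
    return ["running" if j.poll() is None else f"done({j.returncode})" for j in jobs]


def collect(jobs: list[subprocess.Popen]) -> list[tuple[str, int]]:
    """Wait for every job and return (stdout, exit code) pairs in submission order."""
    results = []
    for j in jobs:
        out, _ = j.communicate()
        results.append((out.strip(), j.returncode))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in jobs; on a Grid these would be batch submissions.
    jobs = submit([
        [sys.executable, "-c", "print(sum(range(10)))"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ])
    print(status(jobs))   # timing-dependent: 'running' or 'done(...)' per job
    print(collect(jobs))  # [('45', 0), ('', 3)]
```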


Indiana University Cheminformatics Center Summary

Indiana University is focusing on two major areas:

  • Creating a comprehensive, easily accessible infrastructure for chemoinformatics tools and data sources, linked with PubChem and made available as web services, and partnering with screening centers and other users to demonstrate how this infrastructure can be usefully applied

    • Infrastructure can include any tools, not just ours (commercial/open source, chemoinformatics, bioinformatics, and so on)

    • New, custom applications can be built quickly using existing services in a similar way to Google Maps and other “web 2.0” resources

  • Being a central hub of chemoinformatics education, including offering distance courses on chemoinformatics theory and techniques, practical workshops on using chemoinformatics resources, and freely available web-based educational resources

    • We currently offer a Ph.D., an M.S. and a graduate certificate (distance) in chemical informatics

    • Distance education program allows you to “pick and choose” courses to meet educational needs: certificate is awarded on completion of four courses
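The first bullet describes wrapping chemoinformatics tools as web services that can be recombined like Web 2.0 mashups. Below is a minimal sketch of that idea using only the Python standard library: a toy tool function exposed over HTTP as JSON. The function `count_atoms` is hypothetical and handles only simple formulas like `C2H6O`, with no parentheses or charges. The `/count_atoms` path and `formula` query parameter are also illustrative assumptions, not CICC's actual interface.

```python
import json
import re
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def count_atoms(formula: str) -> dict[str, int]:
    """Hypothetical 'tool': count atoms per element in a simple formula such as 'C6H6'."""
    counts: dict[str, int] = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts


class ToolHandler(BaseHTTPRequestHandler):
    """Exposes count_atoms at GET /count_atoms?formula=... and returns JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/count_atoms":
            self.send_error(404)
            return
        formula = parse_qs(url.query).get("formula", [""])[0]
        body = json.dumps(count_atoms(formula)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_service(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ToolHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_service()
    host, port = server.server_address
    with urlopen(f"http://{host}:{port}/count_atoms?formula=C2H6O") as resp:
        print(json.load(resp))  # {'C': 2, 'H': 6, 'O': 1}
    server.shutdown()
    server.server_close()
```

Any client that speaks HTTP and JSON can now call the tool. A portal or workflow could chain it with other services without knowing it is written in Python.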


Chemical Informatics and Cyberinfrastructure Collaboratory

Funded by the National Institutes of Health

www.chembiogrid.org

CICC Combines Grid Computing with Chemical Informatics

Large Scale Computing Challenges

Science and Cyberinfrastructure

CICC is an NIH-funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.

Chemical informatics is a non-traditional area of high performance computing, but many new, challenging problems may be investigated.

Figure: NIH PubMed database → OSCAR text analysis → cluster grouping → toxicity filtering → docking; initial 3D structure calculation

OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
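The paragraph above calls clustering, toxicity filtering and docking "pleasingly parallel" because each molecule can be processed independently. Below is a minimal Python sketch of that pattern. The toxicity rule, the docking score and the molecule records are all hypothetical placeholders; real docking codes are external programs. A thread pool stands in for the cluster or TeraGrid nodes that would run the independent jobs. For CPU-bound work in pure Python, a process pool or separate batch jobs would be the natural swap.

```python
from concurrent.futures import ThreadPoolExecutor


def passes_toxicity_filter(mol: dict) -> bool:
    """Hypothetical toxicity rule: keep molecules with predicted toxicity below 0.5."""
    return mol["toxicity"] < 0.5


def dock_score(mol: dict) -> float:
    """Hypothetical stand-in for an expensive docking run; lower scores are better."""
    return -mol["affinity"]


def screen(molecules: list[dict], top_k: int = 2, workers: int = 4) -> list[str]:
    """Filter, dock each survivor independently in parallel, return the best top_k names."""
    candidates = [m for m in molecules if passes_toxicity_filter(m)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(dock_score, candidates))  # results keep input order
    ranked = sorted(zip(scores, (m["name"] for m in candidates)))
    return [name for _, name in ranked[:top_k]]


if __name__ == "__main__":
    molecules = [
        {"name": "mol-A", "toxicity": 0.1, "affinity": 7.2},
        {"name": "mol-B", "toxicity": 0.8, "affinity": 9.5},  # removed by the filter
        {"name": "mol-C", "toxicity": 0.3, "affinity": 8.1},
        {"name": "mol-D", "toxicity": 0.2, "affinity": 5.0},
    ]
    print(screen(molecules))  # ['mol-C', 'mol-A']
```

No docking call depends on any other, so the work divides across as many workers as there are molecules.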

Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.

Figure: molecular mechanics calculations

Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step quantum chemistry calculations on all of PubChem. Results go back to public databases that are freely accessible to the scientific community.

  • CICC supports the NIH mission by combining state-of-the-art chemical informatics techniques with

    • World-class high performance computing

    • National-scale computing resources (TeraGrid)

    • Internet-standard web services

    • International activities for service orchestration

    • Open distributed computing infrastructure for scientists worldwide

Figure: NIH PubChem database; quantum mechanics calculations; IU's Varuna database; POV-Ray parallel rendering

Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories


CICC Web Service Infrastructure

Figure: OSCAR document analysis; InChI generation/search; computational chemistry (GAMESS, Jaguar, etc.); Varuna.net quantum chemistry; Grid services (service registry, job submission and management) on local clusters, IU Big Red, TeraGrid and Open Science Grid; portal services (RSS feeds, user profiles, collaboration as in Sakai)
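A service registry lets portals and workflows discover services by what they do rather than by a hard-coded address. Below is a minimal in-memory sketch. The `ServiceRecord` and `ServiceRegistry` names, the capability strings and the example.org endpoints are hypothetical. A production registry would add persistence, richer metadata and access control; UDDI, mentioned for SERVOGrid, is one standard for this.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    name: str
    endpoint: str  # hypothetical URL
    capabilities: set[str] = field(default_factory=set)


class ServiceRegistry:
    """In-memory registry: services announce what they do; clients discover by capability."""

    def __init__(self) -> None:
        self._services: dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        self._services[record.name] = record  # re-registering replaces the old entry

    def unregister(self, name: str) -> None:
        self._services.pop(name, None)

    def find(self, capability: str) -> list[ServiceRecord]:
        """All services advertising the capability, sorted by name for stable output."""
        matches = [r for r in self._services.values() if capability in r.capabilities]
        return sorted(matches, key=lambda r: r.name)


if __name__ == "__main__":
    reg = ServiceRegistry()
    reg.register(ServiceRecord("oscar", "https://example.org/oscar", {"document-analysis"}))
    reg.register(ServiceRecord("inchi", "https://example.org/inchi", {"inchi-generation", "structure-search"}))
    reg.register(ServiceRecord("gamess", "https://example.org/gamess", {"quantum-chemistry"}))
    print([r.endpoint for r in reg.find("quantum-chemistry")])  # ['https://example.org/gamess']
```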


Varuna Environment for Molecular Modeling (Baik, IU)

Figure components: chemical concepts; researcher; papers, etc.; experiments; ChemBioGrid; simulation service (FORTRAN code, scripts); DB service (queries, clustering, curation, etc.); ReactionDB; QM database; QM/MM database; PubChem, PDB, NCI, etc.; Condor; TeraGrid supercomputers (“flocks”)

