
Future Scientific Infrastructure

This keynote talk traces the evolution of infrastructure from local power generation to computing grids, discusses how grids can provide on-demand access to computing, data, and services, and highlights applications and benefits of distributed computing and visualization.


Presentation Transcript


  1. Future Scientific Infrastructure. Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, and Department of Computer Science, The University of Chicago. http://www.mcs.anl.gov/~foster Keynote Talk, QUESTnet 2002 Conference, Gold Coast, July 4, 2002

  2. Evolution of Infrastructure • 1890: Local power generation; AC transmission => power Grid => economies of scale & revolutionary new devices • 2002: Primarily local computing & storage; Internet & optical technologies => ???

  3. A Computing Grid • On-demand, ubiquitous access to computing, data, and services • New capabilities constructed dynamically and transparently from distributed services • “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country” (Len Kleinrock, 1969) • “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)

  4. Distributed Computing + Visualization • Job Submission: simulation code submitted to remote center for execution on 1000s of nodes • Remote Center: generates TB+ datasets from the FLASH simulation code • WAN Transfer: FLASH data transferred to ANL for visualization; GridFTP parallelism utilizes high bandwidth (capable of utilizing >Gb/s WAN links; see the sketch after this slide) • Chiba City: visualization code constructs and stores high-resolution visualization frames for display on many devices • LAN/WAN Transfer: user-friendly striped GridFTP application tiles the frames and stages tiles onto display nodes • ActiveMural Display: displays very high resolution large-screen dataset animations • FUTURE (1-5 yrs): 10s Gb/s LANs, WANs; end-to-end QoS; automated replica management; server-side data reduction & analysis; interactive portals
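The striped, parallel GridFTP transfer in the WAN Transfer step above can be driven from a small script. The following is a minimal sketch, assuming a host with the Globus Toolkit's globus-url-copy client installed and a valid GSI proxy already created; the endpoint URLs, file names, and tuning values are hypothetical.

```python
# Minimal sketch of a parallel GridFTP transfer. Assumes the Globus Toolkit
# client globus-url-copy is installed and a GSI proxy exists (e.g. from
# grid-proxy-init). Endpoint URLs and tuning values are hypothetical.
import subprocess

SRC = "gsiftp://remote-center.example.org/data/flash/run042.hdf5"  # hypothetical
DST = "gsiftp://dtn.anl.example.org/scratch/flash/run042.hdf5"     # hypothetical

def transfer(src: str, dst: str, streams: int = 8) -> None:
    """Copy one file using several parallel TCP streams to fill a fast WAN link."""
    cmd = [
        "globus-url-copy",
        "-p", str(streams),    # number of parallel data channels
        "-tcp-bs", "2097152",  # 2 MB TCP buffers; tune to the bandwidth-delay product
        src,
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    transfer(SRC, DST)
```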

  5. eScience Application: Sloan Digital Sky Survey Analysis

  6. Cluster-finding Data Pipeline [diagram: staged pipeline over catalog, field, tsObj, brg, core, and cluster data products]

  7. Chimera Application: Sloan Digital Sky Survey Analysis • Question: what is the size distribution of galaxy clusters? • Galaxy cluster size distribution produced with the Chimera Virtual Data System + iVDGL Data Grid (many CPUs)

  8. Grids at NASA: Aviation Safety • Wing Models: lift capabilities, drag capabilities, responsiveness • Stabilizer Models: deflection capabilities, responsiveness • Airframe Models • Crew Capabilities: accuracy, perception, stamina, reaction times, SOPs • Engine Models: thrust performance, reverse thrust performance, responsiveness, fuel consumption • Landing Gear Models: braking performance, steering capabilities, traction, dampening capabilities • Human Models

  9. Life Sciences: Telemicroscopy [pipeline: imaging instruments (data acquisition) → network → computational resources (processing, analysis) and large databases → advanced visualization]

  10. Business Opportunities • On-demand computing, storage, services • Significant savings due to reduced build-out, economies of scale, reduced admin costs • Greater flexibility => greater productivity • Entirely new applications and services • Based on high-speed resource integration • Solution to enterprise computing crisis • Render distributed infrastructures manageable

  11. Grid Evolution

  12. Grids and Industry: Early Examples • Butterfly.net: Grid for multi-player games • Entropia: distributed computing (BMS, Novartis, …)

  13. Grid Infrastructure • Resources: computing, storage, data • Connectivity: reduce tyranny of distance • Services: authentication, discovery, … • Technologies: build applications, services • Communities: operational procedures, …

  14. Example Grid Infrastructure Projects • I-WAY (1995): 17 U.S. sites for one week • GUSTO (1998): 80 sites worldwide, experimental • NASA Information Power Grid (since 1999): production Grid linking NASA laboratories • INFN Grid, EU DataGrid, iVDGL, … (2001+): Grids for data-intensive science • TeraGrid, DOE Science Grid (2002+): production Grids linking supercomputer centers • U.S. GRIDS Center: software packaging, deployment, support

  15. Topics in Grid Infrastructure • Regional, national, intl optical infrastructure • I-WIRE, StarLight, APAN • TeraGrid: Deep infrastructure • High-end support for U.S. community • iVDGL: Wide infrastructure • Building an (international) community • Open Grid Services Architecture • Future service & technology infrastructure

  16. Topics in Grid Infrastructure • Regional, national, intl optical infrastructure • I-WIRE, StarLight • TeraGrid: Deep infrastructure • High-end support for U.S. community • iVDGL: Wide infrastructure • Building an (international) community • Open Grid Services Architecture • Future service & technology infrastructure

  17. Targeted StarLight Optical Network Connections (www.startap.net) [map: StarLight hub in Chicago cross-connecting I-WIRE sites (NW Univ, UIC, Ill Inst of Tech, ANL, Univ of Chicago, NCSA/UIUC) with Abilene (Indianapolis NOC), St Louis GigaPoP, AMPATH (Atlanta), NTON, the DTF 40Gb network (Chicago, NYC, Atlanta, Los Angeles, San Diego/SDSC, Seattle, Portland, San Francisco, PSC, IU, NCSA, U Wisconsin), and international links to CERN, SURFnet, CA*net4 (Vancouver), and the Asia-Pacific]

  18. I-WIRE Fiber Topology • Fiber providers: Qwest, Level(3), McLeodUSA, 360Networks • 10 segments • 190 route miles; 816 fiber miles • Longest segment: 140 miles • 4 strands minimum to each site • [topology map: Starlight (NU-Chicago), Argonne, UC Gleacher Ctr (450 N. Cityfront), UIC, UIUC/NCSA, UChicago, IIT, State/City Complex (James R. Thompson Ctr, City Hall, State of IL Bldg), FNAL (est. 4Q2002), plus carrier facilities at Qwest (455 N. Cityfront), McLeodUSA (151/155 N. Michigan, Doral Plaza), and Level(3) (111 N. Canal); numbers on the map indicate fiber count (strands)]

  19. I-WIRE Transport • TeraGrid Linear: 3x OC192, 1x OC48; first light 6/02 • Starlight Linear: 4x OC192, 4x OC48 (8x GbE); operational • Metro Ring: 1x OC48 per site; first light 8/02 • Sites: Starlight (NU-Chicago), Argonne, UC Gleacher Ctr (450 N. Cityfront), Qwest (455 N. Cityfront), UIC, UIUC/NCSA, McLeodUSA (151/155 N. Michigan), Doral Plaza, State/City Complex (James R. Thompson Ctr, City Hall, State of IL Bldg), UChicago, IIT • Each of the three ONI DWDM systems has a capacity of up to 66 channels at up to 10 Gb/s per channel • Protection available in the Metro Ring on a per-site basis

  20. Illinois Distributed Optical Testbed [map: Northwestern Univ-Chicago “Starlight”, UI-Chicago, Illinois Inst. Tech, U of Chicago, Argonne Nat'l Lab (approx. 25 miles SW), UIUC/NCSA in Urbana (approx. 140 miles south), and DAS-2; highways I-290, I-294, I-55, and the Dan Ryan Expwy (I-90/94) shown for orientation]

  21. Topics in Grid Infrastructure • Regional, national, intl optical infrastructure • I-WIRE, StarLight • TeraGrid: Deep infrastructure • High-end support for U.S. community • iVDGL: Wide infrastructure • Building an (international) community • Open Grid Services Architecture • Future service & technology infrastructure

  22. TeraGrid: Deep Infrastructure www.teragrid.org

  23. TeraGrid Objectives • Create unprecedented capability • Integrated with extant PACI capabilities • Supporting a new class of scientific research • Deploy a balanced, distributed system • Not a “distributed computer” but rather … • a distributed “system” using Grid technologies • Computing and data management • Visualization and scientific application analysis • Define an open and extensible infrastructure • Enabling infrastructure for scientific research • Extensible beyond the original four sites • NCSA, SDSC, ANL, and Caltech

  24. TeraGrid Timelines [timeline, Jan '01 - Jan '03: proposal submitted to NSF; early access to McKinley at Intel; early McKinleys at TG sites for testing/benchmarking; initial apps on McKinley systems; TeraGrid clusters; TeraGrid prototypes, including the prototype at SC2001 (60 Itanium nodes, 10 Gb/s network) and the “TeraGrid Lite” systems and Grids testbed; Grid services on current systems (basic Grid svcs, Linux clusters, SDSC SP, NCSA O2K); core Grid services deployment; advanced Grid services testing; networking (10 Gigabit Enet testing, TeraGrid networking deployment); TeraGrid Operations Center prototype, day ops, production operations; applications; TeraGrid operational]

  25. Terascale Cluster Architecture • (a) Terascale architecture overview • (b) Example 320-node Clos network: Myrinet system interconnect; 128-port Clos switches with 64 hosts each; 64 inter-switch links from each switch to the spine switches in a Clos mesh (each line = 8 x 2 Gb/s links); spine also connects add'l clusters and external networks • (c) I/O - Storage: 64 TB RAID; FCS storage network • (d) Visualization: local display, networks for remote display, rendered image files • (e) Compute • 100 Mb/s switched Ethernet management network • GbE for external traffic (a quick port/bandwidth check follows this slide)
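The Clos-network numbers on this slide admit a quick sanity check. The sketch below works through the port and bandwidth accounting under the assumption, read off the diagram rather than from any vendor document, that each 128-port leaf switch splits its ports evenly between hosts and spine uplinks.

```python
# Back-of-the-envelope accounting for the 320-node Clos example above.
# Assumption (from the diagram, not a spec): each 128-port leaf switch uses
# 64 ports for hosts and 64 for inter-switch links to the spine.
LEAF_PORTS = 128
HOSTS_PER_LEAF = 64
UPLINKS_PER_LEAF = LEAF_PORTS - HOSTS_PER_LEAF  # 64 inter-switch links
LEAVES = 5                                      # 5 x 64 hosts = 320 nodes
LINK_GBPS = 2.0                                 # 2 Gb/s Myrinet links

total_hosts = LEAVES * HOSTS_PER_LEAF                # 320
host_bw_per_leaf = HOSTS_PER_LEAF * LINK_GBPS        # 128 Gb/s toward hosts
uplink_bw_per_leaf = UPLINKS_PER_LEAF * LINK_GBPS    # 128 Gb/s toward the spine

print(f"hosts: {total_hosts}")
print(f"per-leaf uplink bandwidth: {uplink_bw_per_leaf:.0f} Gb/s")
# Equal host-facing and spine-facing bandwidth means no oversubscription at the leaf.
print(f"leaf oversubscription: {host_bw_per_leaf / uplink_bw_per_leaf:.1f}:1")
```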

  26. Initial TeraGrid Design • Caltech: 384 McKinley processors (1.5 Teraflops, 96 nodes), 125 TB RAID storage • ANL: 384 McKinley processors (1.5 Teraflops, 96 nodes), 125 TB RAID storage • SDSC: 768 McKinley processors (3 Teraflops, 192 nodes), 250 TB RAID storage • NCSA: 2024 McKinley processors (8 Teraflops, 512 nodes), 250 TB RAID storage • Sites interconnected by a DWDM optical mesh

  27. NSF TeraGrid: 14 TFLOPS, 750 TB • NCSA (compute-intensive): 1024p IA-32, 320p IA-64, 1500p Origin, UniTree • SDSC (data-intensive): 1176p IBM SP Blue Horizon, Sun E10K, HPSS • ANL (visualization): 574p IA-32 Chiba City, HR display & VR facilities, HPSS • Caltech (data collection analysis): 256p HP X-Class, 128p HP V2500, 92p IA-32, HPSS • Myrinet interconnects within the cluster sites • WAN architecture options: Myrinet-to-GbE, Myrinet as a WAN; Layer 2 design; wavelength mesh; traditional IP backbone • WAN bandwidth options: Abilene (2.5 Gb/s, 10 Gb/s late 2002); state and regional fiber initiatives plus CANARIE CA*Net; leased OC48; dark fiber, dim fiber, wavelengths

  28. NSF TeraGrid: 14 TFLOPS, 750 TB [site diagram as on the previous slide: Caltech (data collection analysis), ANL (visualization), SDSC (data-intensive), NCSA (compute-intensive), with a 128p Origin also shown]

  29. Defining Standard Services • Finite set of TeraGrid services: Grid applications see standard services rather than particular implementations… • …but sites also provide additional services that can be discovered and exploited • [diagram: IA-64 Linux clusters exposing runtime, interactive development, interactive collection-analysis, volume-render, file-based data, and collection-based data services]

  30. Standards => Cyberinfrastructure • TeraGrid: focus on a finite set of service specifications applicable to TeraGrid resources • If done well, other IA-64 cluster sites would adopt TeraGrid service specifications, increasing users’ leverage in writing to the specification, and others would adopt the framework for developing similar services (for Alpha, IA-32, etc.) • Note: the specification should attempt to offer improvement over the general Globus runtime environment without bogging down attempting to do everything (for which a user is better off running interactively!) • [diagram: Grid applications over the TeraGrid certificate authority and peer certificate authorities, Grid information services, and compute, data/information, analysis, and visualization services (file-based data, collection-based data, relational dBase data, interactive development, interactive collection-analysis, runtime, visualization) spanning IA-64, IA-32, and Alpha Linux clusters]

  31. Strategy: Define Standard Services • Finite number of TeraGrid Services • Defined as specifications, protocols, APIs • Separate from implementation • Example: File-based Data Service • API/Protocol: Supports FTP and GridFTP, GSI authentication • SLA: All TeraGrid users have access to N TB storage, available 24/7 with M% availability, >= R Gb/s read, >= W Gb/s write, etc.
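To make the File-based Data Service example above concrete, here is a minimal client-side sketch. It leans on the slide's statement that the service supports plain FTP as well as GridFTP; the hostname and path are hypothetical, and a real TeraGrid client would use GridFTP with GSI authentication rather than an anonymous FTP login.

```python
# Client sketch against a hypothetical site's file-based data service, using
# plain FTP (the slide's API/protocol bullet also allows GridFTP + GSI, which
# a production client would use instead of anonymous FTP).
from ftplib import FTP

HOST = "data.teragrid-site.example.org"      # hypothetical service endpoint
PATH = "/allocations/myproject/results.dat"  # hypothetical file

def fetch(host: str, path: str, out: str) -> None:
    """Download one file from the file-based data service."""
    with FTP(host) as ftp:
        ftp.login()                          # anonymous login for the sketch
        with open(out, "wb") as f:
            ftp.retrbinary(f"RETR {path}", f.write)

if __name__ == "__main__":
    fetch(HOST, PATH, "results.dat")
```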

  32. General TeraGrid Services • Authentication • GSI: Requires TeraGrid CA policy and services • Resource Discovery and Monitoring • Define TeraGrid services/attributes to be published in Globus MDS-2 directory services • Require standard account information exchange to map use to allocation/individual • For many services, publish query interface • Scheduler: queue status • Compute, Visualization, etc. services: attribute details • Network Weather Service • Allocations/Accounting Database: for allocation status
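Since the MDS-2 directory services named above are LDAP-based, a resource-discovery query can be sketched with an ordinary LDAP client. The example below assumes the third-party python-ldap package; the GIIS hostname is hypothetical, and the port, base DN, object class, and attribute names are conventional MDS-2 values that individual deployments may configure differently.

```python
# Sketch of querying an MDS-2 (LDAP-based) index server for host information.
# Requires the third-party python-ldap package. The GIIS host is hypothetical;
# the port, base DN, object class, and attributes follow common MDS-2
# conventions and may differ at a given site.
import ldap

GIIS_URI = "ldap://giis.teragrid.example.org:2135"  # hypothetical index server
BASE_DN = "mds-vo-name=local, o=grid"               # conventional MDS-2 base DN

def list_hosts() -> None:
    conn = ldap.initialize(GIIS_URI)
    conn.simple_bind_s()                            # anonymous bind
    results = conn.search_s(
        BASE_DN,
        ldap.SCOPE_SUBTREE,
        "(objectclass=MdsHost)",                    # assumed host object class
        ["Mds-Host-hn", "Mds-Cpu-Total-Free"],      # assumed attribute names
    )
    for dn, attrs in results:
        print(dn, attrs)

if __name__ == "__main__":
    list_hosts()
```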

  33. General TeraGrid Services • Advanced Reservation • On-demand services • Staging data: coordination of storage+compute • Communication and Data Movement • All services assume any TeraGrid cluster node can talk to any TeraGrid cluster node • All resources support GridFTP • “Hosting environment” • Standard software environment • More sophisticated dynamic provisioning issues not yet addressed

  34. Topics in Grid Infrastructure • Regional, national, intl optical infrastructure • I-WIRE, StarLight • TeraGrid: Deep infrastructure • High-end support for U.S. community • iVDGL: Wide infrastructure • Building an (international) community • Open Grid Services Architecture • Future service & technology infrastructure

  35. iVDGL: A Global Grid Laboratory • “We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (from NSF proposal, 2001) • International Virtual-Data Grid Laboratory • A global Grid laboratory (US, Europe, Asia, South America, …) • A place to conduct Data Grid tests “at scale” • A mechanism to create common Grid infrastructure • A laboratory for other disciplines to perform Data Grid tests • A focus of outreach efforts to small institutions • U.S. part funded by NSF (2001-2006): $13.7M (NSF) + $2M (matching)

  36. Initial US-iVDGL Data Grid [map legend: Tier1 (FNAL), proto-Tier2, Tier3 university; sites shown: SKC, BU, Wisconsin, PSU, BNL, Fermilab, Hampton, Indiana, JHU, Caltech, UCSD, Florida, Brownsville; other sites to be added in 2002]

  37. iVDGL: International Virtual Data Grid Laboratory (www.ivdgl.org) • U.S. PIs: Avery, Foster, Gardner, Newman, Szalay • [world map legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links]

  38. iVDGL Architecture (from proposal)

  39. US iVDGL Interoperability • US-iVDGL-1 milestone (August 2002) • [diagram: iGOC coordinating ATLAS, SDSS/NVO, CMS, and LIGO Tier1 and Tier2 sites]

  40. Transatlantic Interoperability • iVDGL-2 milestone (November 2002) • [diagram: iGOC, DataTAG, outreach, and CS research linking ATLAS, SDSS/NVO, CMS, and LIGO sites, including ANL, BNL, FNAL, CERN, BU, CIT, JHU, INFN, HU, PSU, UCSD, UK PPARC, IU, UTB, UF, U of A, LBL, UC, UWM, UM, UCB, OU, UTA, ISI, NU, and UW]

  41. Topics in Grid Infrastructure • Regional, national, intl optical infrastructure • I-WIRE, StarLight • TeraGrid: Deep infrastructure • High-end support for U.S. community • iVDGL: Wide infrastructure • Building an (international) community • Open Grid Services Architecture • Future service & technology infrastructure

  42. “Standard” Software Infrastructure: Globus Toolkit™ • Small, standards-based set of protocols for distributed system management • Authentication, delegation; resource discovery; reliable invocation; etc. • Information-centric design • Data models; publication, discovery protocols • Open source implementation • Large international user community • Successful enabler of higher-level services and applications

  43. Example Grid Projects in eScience

  44. The Globus Toolkit in One Slide • Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; toolkit provides reference implementation (= Globus Toolkit services) • Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, … • [diagram: the user authenticates via GSI (Grid Security Infrastructure) and creates a proxy credential; a GRAM (Grid Resource Allocation & Management) request to the Gatekeeper (factory) creates user processes (#1, #2, each with a delegated proxy) and registers them with the Reporter (registry + discovery); MDS-2 (Monitoring/Discovery Service) provides soft-state registration and enquiry, with a GIIS (Grid Information Index Server) for discovery; other GSI-authenticated remote service requests reach services such as GridFTP; reliable remote invocation throughout]
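The authenticate-then-submit flow in the diagram above can be exercised end to end with the Globus Toolkit 2 command-line clients. The following is a rough sketch, assuming grid-proxy-init and globus-job-run are installed and a user certificate is configured; the gatekeeper contact string is hypothetical.

```python
# Rough sketch of the GSI + GRAM flow: create a proxy credential, then ask a
# remote gatekeeper to run a job. Assumes the Globus Toolkit 2 clients
# grid-proxy-init and globus-job-run are installed and a user certificate is
# configured; the gatekeeper contact string is hypothetical.
import subprocess

GATEKEEPER = "gatekeeper.example-site.org/jobmanager"  # hypothetical contact

def run_remote(executable: str, *args: str) -> str:
    # 1. Authenticate: create a short-lived proxy credential from the user cert
    #    (prompts for the certificate passphrase).
    subprocess.run(["grid-proxy-init", "-hours", "2"], check=True)
    # 2. Submit: ask the remote gatekeeper to run the executable via GRAM.
    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, executable, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_remote("/bin/hostname"))
```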

  45. Globus Toolkit: Evaluation (+) • Good technical solutions for key problems, e.g. • Authentication and authorization • Resource discovery and monitoring • Reliable remote service invocation • High-performance remote data access • This & good engineering is enabling progress • Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support • Growing community code base built on tools

  46. Globus Toolkit: Evaluation (-) • Protocol deficiencies, e.g. • Heterogeneous basis: HTTP, LDAP, FTP • No standard means of invocation, notification, error propagation, authorization, termination, … • Significant missing functionality, e.g. • Databases, sensors, instruments, workflow, … • Virtualization of end systems (hosting envs.) • Little work on total system properties, e.g. • Dependability, end-to-end QoS, … • Reasoning about system properties

  47. Globus Toolkit Structure • [diagram: a compute resource runs GRAM (with job managers), a data resource runs GridFTP, and another box hosts an “Other Service or Application”, each over MDS and GSI; cross-cutting needs listed include service naming, soft-state management, reliable invocation, and notification, with “???” marking a gap] • Lots of good mechanisms, but (with the exception of GSI) not that easily incorporated into other systems

  48. Grid Evolution:Open Grid Services Architecture • Refactor Globus protocol suite to enable common base and expose key capabilities • Service orientation to virtualize resources and unify resources/services/information • Embrace key Web services technologies for standard IDL, leverage commercial efforts • Result: standard interfaces & behaviors for distributed system management: the Grid service

  49. Open Grid Services Architecture:Transient Service Instances • “Web services” address discovery & invocation of persistent services • Interface to persistent state of entire enterprise • In Grids, must also support transient service instances, created/destroyed dynamically • Interfaces to the states of distributed activities • E.g. workflow, video conf., dist. data analysis • Significant implications for how services are managed, named, discovered, and used • In fact, much of OGSA (and Grid) is concerned with the management of service instances
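The factory/instance pattern behind transient service instances can be illustrated with a deliberately simplified sketch. This is not the OGSA or Globus Toolkit 3 API, just plain Python showing a persistent factory that creates named, soft-state, lifetime-limited instances which clients can rediscover and which expire unless kept alive.

```python
# Conceptual sketch only: a persistent factory creating transient, named,
# lifetime-managed service instances. This is NOT the OGSA or GT3 interface;
# it illustrates the create/discover/expire pattern described on the slide.
import time
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ServiceInstance:
    handle: str        # unique name clients use to rediscover the instance
    kind: str          # e.g. "workflow", "video-conference", "data-analysis"
    expires_at: float  # soft-state lifetime; must be renewed to stay alive

    def alive(self) -> bool:
        return time.time() < self.expires_at

class Factory:
    """Persistent service that creates and tracks transient instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, ServiceInstance] = {}

    def create(self, kind: str, lifetime_s: float = 300.0) -> ServiceInstance:
        inst = ServiceInstance(
            handle=f"{kind}-{uuid.uuid4()}",
            kind=kind,
            expires_at=time.time() + lifetime_s,
        )
        self._instances[inst.handle] = inst
        return inst

    def find(self, handle: str) -> Optional[ServiceInstance]:
        inst = self._instances.get(handle)
        return inst if inst is not None and inst.alive() else None

    def reap(self) -> None:
        """Destroy instances whose soft-state lifetime has expired."""
        self._instances = {h: i for h, i in self._instances.items() if i.alive()}

if __name__ == "__main__":
    factory = Factory()
    job = factory.create("distributed-data-analysis", lifetime_s=60.0)
    print("created transient instance", job.handle)
```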

  50. Open Grid Services Architecture • Defines fundamental (WSDL) interfaces and behaviors that define a Grid Service • Required + optional interfaces = WS “profile” • A unifying framework for interoperability & establishment of total system properties • Defines WSDL extensibility elements • E.g., serviceType (a group of portTypes) • Delivery via open source Globus Toolkit 3.0 • Leverage GT experience, code, community • And commercial implementations
