
Grid Computing Overview


Presentation Transcript


  1. Grid Computing Overview (Thanks to Mark Ellisman) • Coordinate Computing Resources, People, Instruments in Dynamic Geographically-Distributed Multi-Institutional Environment • Treat Computing Resources like Commodities • Compute cycles, data storage, instruments • Human communication environments • No Central Control; No Trust [Diagram: Advanced Visualization, Data Acquisition, Analysis, Computational Resources, Imaging Instruments, Large-Scale Databases]

  2. Factors Enabling the Grid • Internet is Infrastructure • Increased network bandwidth and advanced services • Advances in Storage Capacity • Terabyte costs less than $5,000 • Internet-Aware Instruments • Increased Availability of Compute Resources • Clusters, supercomputers, storage, visualization devices • Advances in Application Concepts • Computational science: simulation and modeling • Collaborative environments → large and varied teams • Grids Today • Moving towards production; Focus on middleware

  3. Computational Grids & Electric Power Grids • Similarities/Goals of CG and EPG • Ubiquitous • Consumer is comfortable with lack of knowledge of details • Differences Between CG and EPG • Wider spectrum of performance & services • Access governed by more complicated issues • Security • Performance • Socio-political factors

  4. Growth of Data and Load vs. Moore’s Law [Chart courtesy of Rick Stevens, 1990–2010: genome data (ESTs, Human Genome, Metabolic Pathways, Pharmacogenomics, Combinatorial Chemistry) and computational load growing faster than Moore’s Law]

  5. A Short History of the Grid • Grand Challenge Problems (1980s) • NSF and DOE initiatives • “Science is a team sport” • Initiate multi-resource projects involving computation, instruments, visualization, data • Evolution of Related Communities • Parallel computation • Address resource limitations • Networking • Gigabit testbed program • Investigate potential testbed network architectures • Explore usefulness for end-users [Figure: CASA Gigabit Testbed (1990s)]

  6. The Globus Project (Ian Foster and Carl Kesselman): The Grid as a Layered Set of Services [Layer diagram: Applications; High-level Services and Tools (GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++, Nimrod/G, globusrun); Core Services (Nexus, GRAM, Metacomputing Directory Service, Globus Security Interface, Heartbeat Monitor, Gloperf, GASS); Local Services (Condor, MPI, TCP, UDP, LSF, AIX, Irix, Easy, NQE, Solaris)] • Globus model focuses on providing key Grid services • Resource access and management • GridFTP • Information Service • Security services • Authentication • Authorization • Policy • Delegation • Network reservation, monitoring, control

  7. NSF Extensible TeraGrid Facility [Figure courtesy of Rob Pennington, NCSA: an extensible backplane network with LA and Chicago hubs and 30–40 Gb/s links connecting five sites] • Caltech (data collection and analysis): 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage • ANL (visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage • SDSC (data intensive): 4 TF IA-64, DB2 and Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4 • NCSA (compute intensive): 10 TF IA-64, 128 large-memory nodes, 230 TB disk storage, GPFS and data mining • PSC (compute intensive): 6 TF EV68, 71 TB storage; 0.3 TF EV7 shared memory, 150 TB storage server

  8. Critical Resources: WNY Computational & Data Grids • Computational & Data Resources (CCR) • 10TF Computing & 78TB Storage • Instruments (HWI, RPCI) • Microarray; Diffractometer; NMR • High-Throughput Crystallization Laboratory • Data Generation (HWI) • 7TB per year • Databases (UB-N, UB-S, BGH, CoE) • SnB; Multiple Sclerosis; Protein/Genomic

  9. Network Connections Medical/Dental BCOEB

  10. Network Connections (New) Medical/Dental BCOEB

  11. Advanced CCR Data Center (ACDC) Computational Grid Overview
  • Joplin: Compute Cluster, 300 dual-processor 2.4 GHz Intel Xeon, RedHat Linux 7.3, 38.7 TB scratch space
  • Nash: Compute Cluster, 75 dual-processor 1 GHz Pentium III, RedHat Linux 7.3, 1.8 TB scratch space
  • Mama: Compute Cluster, 9 dual-processor 1 GHz Pentium III, RedHat Linux 7.3, 315 GB scratch space
  • Young: Compute Cluster, 16 dual Sun Blades and 47 Sun Ultra5s, Solaris 8, 770 GB scratch space
  • Crosby: Compute Cluster, SGI Origin 3800, 64 x 400 MHz IP35 processors, IRIX 6.5.14m, 360 GB scratch space
  • ACDC: Grid Portal, 4-processor Dell 6650, 1.6 GHz Intel Xeon, RedHat Linux 9.0, 66 GB scratch space
  • Fogerty: Condor Flock Master, 1 dual-processor 250 MHz IP30, IRIX 6.5; flock expanding across RedHat, IRIX, Solaris, WINNT, etc.: CCR (19 IRIX, RedHat, & WINNT processors), Computer Science & Engineering (25 single-processor Sun Ultra5s), School of Dental Medicine (9 single-processor Dell P4 desktops), Hauptman-Woodward Institute (13 various SGI IRIX processors, T1 connection)
  • Note: network connections are 100 Mbps unless otherwise noted.

  12. ACDC Data Grid Overview
  • Same ACDC-Grid compute resources as in the Computational Grid Overview (Joplin, Nash, Mama, Young, Crosby, and the ACDC Grid Portal), each with local Data Grid storage (182 GB, 136 GB, 100 GB, 100 GB, 70 GB, and 56 GB)
  • Network Attached Storage: 480 GB
  • CSE Multi-Store: 2 TB
  • Storage Area Network: 75 TB
  • Note: network connections are 100 Mbps unless otherwise noted.

  13. WNY Grid Highlights • Heterogeneous Computational & Data Grid • Currently in Beta with Shake-and-Bake • WNY Release in March • Bottom-Up General Purpose Implementation • Ease-of-Use User Tools • Administrative Tools • Back-End Intelligence • Backfill Operations • Prediction and Analysis of Resources to Run Jobs (Compute Nodes + Requisite Data)

  14. Advanced CCR Data Center (ACDC) Computational Grid Overview [diagram repeated from Slide 11]

  15. Data Grid Motivation & Goal • Motivation: • Large data collections are emerging as important community resources. • Data Grids inherently complement Computational Grids, which manipulate data. • A data grid denotes a large network of distributed storage resources, such as archival systems, caches, and databases, which are linked logically to create a sense of global persistence. • Goal: • To design and implement transparent management of data distributed across heterogeneous resources, such that the data is accessible via a uniform web interface.

  16. Data Grid Summary [Diagram: Data Grid storage components] • 544 GB Storage • Located on 6 heterogeneous ACDC-Grid resources • 480 GB Storage • Located on 1 dual-processor Dell PowerVault server • 75,000 GB Storage (10/03) • Served by 4 16-processor HP GS1280 servers • 2,000 GB Storage • Served by Sun Ultra-60 servers • 78,024 GB Total Data Grid Storage available and accessible from the ACDC-Grid Portal

  17. Grid-Based SnB Objectives • Install Grid-Enabled Version of SnB • Job Submission and Monitoring over Internet • SnB Output Stored in Database • SnB Output Mined through Internet-Based Integrated Querying Tool • Serve as Template for Chem-Grid & Bio-Grid • Experience with Globus and Related Tools

  18. Grid Enabled SnB • Problem Statement • Use all available resources in the ACDC-Grid for determining a single molecular structure. • Grid Enabling Criteria • All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application. • All job results obtained from the ACDC-Grid resources are stored in a corresponding molecular structure database. • There are three modes of operation: • Continue submitting SnB application jobs until • the grid-enabled SnB application determines a solution has been found, or • “X” number of trials have been evaluated, or • indefinitely (grid job owner determines when a solution has been found).
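
The three stopping criteria above can be pictured as a simple control loop. Below is a minimal Python sketch, assuming hypothetical helpers submit_snb_trials() and solution_found() in place of the portal's real job-submission and structure-database machinery.

```python
# Minimal sketch of the three SnB stopping criteria (not the portal's actual code).

def submit_snb_trials(batch_size):
    """Hypothetical placeholder: queue a batch of SnB trials on ACDC-Grid resources."""
    pass

def solution_found():
    """Hypothetical placeholder: query the molecular structure database for a solution."""
    return False

def run_grid_snb(max_trials=None, batch_size=100):
    """Submit SnB trial batches until one of the stopping criteria is met."""
    trials_done = 0
    while True:
        submit_snb_trials(batch_size)
        trials_done += batch_size
        if solution_found():                                   # criterion 1: solution found
            return "solution found"
        if max_trials is not None and trials_done >= max_trials:
            return "trial limit reached"                       # criterion 2: "X" trials evaluated
        # criterion 3: with no trial limit, run indefinitely; the grid job owner
        # decides when a solution has been found and cancels the job.
```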

  19. Grid Services and Applications [Layer diagram, adapted from Ian Foster and Carl Kesselman] • Applications: Shake-and-Bake, Apache, Oracle, MySQL, ACDC-Grid Computational Resources, ACDC-Grid Data Resources • High-level Services and Tools: Globus Toolkit, NWS, MPI, MPI-IO, C, C++, Fortran, PHP, globusrun • Core Services: Metacomputing Directory Service, Globus Security Interface, GRAM, GASS • Local Services: Condor, Stork, MPI, PBS, Maui Scheduler, LSF, TCP, UDP, WINNT, RedHat Linux, Irix, Solaris

  20. Notes • Apache – web portal server • PHP – used by the Apache server for dynamic web portal pages • MDS – traditionally used with LDAP, but we use MDS with a MySQL grid portal database to keep information on available resources (we poll every 15 minutes) • GRAM – Globus Resource Allocation Manager – API for requesting computational jobs • GASS – Global Access to Secondary Storage – API for accessing files stored on various platforms • Stork – Condor module for transporting job files within a flock
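
As a rough illustration of the 15-minute polling described above, the Python sketch below shows the shape of such a loop; query_mds() and store_status() are hypothetical stand-ins for the actual MDS query and the MySQL write, not the portal's real code.

```python
# Sketch of the 15-minute resource-information polling loop (illustrative only;
# query_mds() and store_status() stand in for the real MDS query and MySQL write).
import time

POLL_INTERVAL = 15 * 60  # seconds between polls, per the note above

def query_mds(resource):
    """Hypothetical placeholder for an MDS lookup of one resource's status."""
    return {"resource": resource, "free_cpus": 0, "queue_length": 0}

def store_status(database, status):
    """Hypothetical placeholder for an INSERT/UPDATE into the grid portal database."""
    database.append(status)

def poll_resources(resources, database):
    """Refresh the portal's view of every grid resource every 15 minutes."""
    while True:
        for resource in resources:
            store_status(database, query_mds(resource))
        time.sleep(POLL_INTERVAL)
```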

  21. Grid Enabled SnB • Required Layered Grid Services • Grid-enabled Application Layer • Shake-and-Bake application • Apache web server • MySQL database • High-level Service Layer • Globus, NWS, PHP, Fortran, and C • Core Service Layer • Metacomputing Directory Service, Globus Security Interface, GRAM, GASS • Local Service Layer • Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux

  22. Required Grid Services: Grid Implementation as a Layered Set of Services [Layer diagram as in Slide 6] • Application Layer • Shake-and-Bake • Apache web server • MySQL database • High-level Services • Globus, PHP, Fortran, C • Core Services • Metacomputing Directory Service, Globus Security Interface, GRAM, GASS • Local Services • Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux

  23. Grid Enabled SnB Execution • User • defines Grid-enabled SnB job using Grid Portal or SnB • supplies location of data files from Data Grid • supplies SnB mode of operation • Grid Portal • assembles required SnB data and supporting files, execution scripts, database tables. • determines available ACDC-Grid resources. • ACDC-Grid job management includes: • automatic determination of appropriate execution times, number of trials, and number/location of processors, • logging/status of concurrently executing resource jobs, & • automatic incorporation of SnB trial results into the molecular structure database.
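
The steps above amount to a portal-side pipeline: assemble inputs, pick resources, schedule, submit, monitor, and store results. A minimal Python sketch follows, in which every helper is a hypothetical stand-in for the corresponding Grid Portal component.

```python
# Sketch of the portal-side SnB workflow: assemble, schedule, submit, monitor, store.
# Every helper below is a hypothetical stand-in for a real Grid Portal component.

def assemble_job(data_files, mode):
    return {"files": data_files, "mode": mode, "scripts": [], "db_tables": []}

def available_resources():
    return []   # placeholder: read the resource-status database populated by MDS polling

def schedule(job, resources):
    # placeholder: choose execution times, trial counts, and processor counts per resource
    return [(r, {"trials": 100, "processors": 8}) for r in resources]

def submit(job, resource, plan):
    return {"resource": resource, "plan": plan, "state": "queued"}

def run_portal_snb_job(data_files, mode):
    job = assemble_job(data_files, mode)                  # data, scripts, database tables
    plan = schedule(job, available_resources())
    handles = [submit(job, resource, p) for resource, p in plan]
    for handle in handles:
        # logging/status of each concurrently executing resource job would be recorded
        # here, with SnB trial results folded into the molecular structure database
        handle["state"] = "monitored"
    return handles
```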

  24. ACDC-Grid Portal

  25. ACDC-Grid Portal Login Grid Portal login screen

  26. Data Grid Capabilities Browser view of “mlgreen” user files stored in the Data Grid

  27. Data Grid Capabilities Browser view of “miller” group files published by user “rappleye”

  28. Data Grid Capabilities Browser view of “public” user files published by user “miller”

  29. Data Grid Capabilities

  30. Data Grid Capabilities

  31. Grid Portal Job Status • Grid-enabled jobs can be monitored dynamically using the Grid Portal web interface. • Charts are based on: • total CPU hours, or • total jobs, or • total runtime. • Usage data for: • running jobs, or • queued jobs. • Individual or all resources. • Grouped by: • group, or • user, or • queue.
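
Each chart above can be viewed as one parameterised aggregation over a job table in the portal's MySQL database. The sketch below assumes a hypothetical jobs table with columns such as username, groupname, queue, state, resource, cpu_hours, and runtime; the real portal schema is not shown on the slide.

```python
# Sketch of the aggregation behind the job-status charts. The `jobs` table and its
# columns (username, groupname, queue, state, resource, cpu_hours, runtime) are
# illustrative assumptions, not the portal's documented schema.

METRICS = {
    "cpu_hours": "SUM(cpu_hours)",
    "jobs": "COUNT(*)",
    "runtime": "SUM(runtime)",
}

def chart_query(metric="cpu_hours", state="running", group_by="username", resource=None):
    """Build the SQL for one chart: a metric, a job state, and a grouping column."""
    sql = f"SELECT {group_by}, {METRICS[metric]} AS total FROM jobs WHERE state = %s"
    params = [state]
    if resource is not None:          # a single resource; otherwise all resources
        sql += " AND resource = %s"
        params.append(resource)
    sql += f" GROUP BY {group_by}"
    return sql, params

# Example: total CPU hours of running jobs, grouped by user, across all resources
print(chart_query("cpu_hours", "running", "username"))
```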

  32. Grid Portal Job Status

  33. ACDC-Grid Portal Condor Flock CondorView integrated into ACDC-Grid Portal

  34. ACDC-Grid Portal User Management Administrator-based and user-based views

  35. ACDC-Grid Portal Resource Management Administrator grants a user access to ACDC-Grid resources, software, and web pages.

  36. ACDC-Grid Administration

  37. ACDC-Grid Administration

  38. Grid Enabled Data Mining • Problem Statement • Use all available resources in the ACDC-Grid for executing a data mining genetic algorithm optimization of SnB parameters for molecular structures having the same space group. • Grid Enabling Criteria • All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application. • All job results obtained from the ACDC-Grid resources are stored in the corresponding molecular structure databases.

  39. Grid Enabled Data Mining • There are two modes of operation and two sets of stopping criteria. • Data mining jobs can be submitted in: • a dedicated mode (time critical), where jobs are queued on ACDC-Grid resources, or • a backfill mode (non-time critical), where jobs are submitted to ACDC-Grid resources that have unused cycles available. • There are two sets of stopping criteria: • Continue submitting SnB data mining application jobs until • the grid-enabled SnB application determines optimal parameters have been found, or • indefinitely (grid job owner determines when optimal parameters have been found).
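
A minimal sketch of the two submission modes above, assuming hypothetical submission helpers and a simple idle-cycle test; the portal's actual backfill logic is not shown on the slide.

```python
# Sketch of the dedicated vs. backfill submission modes (illustrative only).

def submit_to_queue(job, resource):
    """Hypothetical placeholder: queue the job like any other ACDC-Grid job."""
    return ("dedicated", job, resource["name"])

def submit_backfill(job, resource):
    """Hypothetical placeholder: run the job in the resource's unused cycles."""
    return ("backfill", job, resource["name"])

def submit_data_mining_job(job, resources, time_critical):
    if time_critical:
        # Dedicated mode: jobs are queued on ACDC-Grid resources.
        return [submit_to_queue(job, r) for r in resources]
    # Backfill mode: use only resources currently reporting unused cycles.
    idle = [r for r in resources if r.get("idle_cpus", 0) > 0]
    return [submit_backfill(job, r) for r in idle]
```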

  40. Grid Enabled Data Mining [Diagram: the Grid Portal workflow job manager connects the data mining criteria, the ACDC-Grid Data Grid, the ACDC-Grid computational resources, and the molecular structure database]

  41. SnB Molecular Structure Database

  42. Grid Enabled Data Mining • Execution Scenario • User defines a Grid-enabled data mining SnB job using the Grid Portal web interface, supplying: • the molecular structure parameter sets to optimize, • data file metadata, • the Grid-enabled SnB mode of operation (dedicated or backfill), and • the Grid-enabled SnB stopping criteria. • The Grid Portal assembles the required SnB application data and supporting files, execution scripts, and database tables, and submits jobs for parameter optimization based on the current database statistics. • ACDC-Grid job management includes: • automatic determination of appropriate execution times, number of trials, and number of processors for each available resource, • logging and status of all concurrently executing resource jobs, • automatic incorporation of SnB trial results into the molecular structure database, and • post-processing of the updated database for subsequent job submissions.

  43. ACDC Data Grid Database Schema [Schema diagram of the ACDC-Grid Data Grid]

  44. Grid Portal Job Status [Screenshot: job status across ACDC-Grid computational resources]

  45. Data Grid Overview • Enable the transparent migration of data between various resources while preserving uniform access for the user. • Maintain metadata information about each file and its location in a global database table. • Currently using MySQL tables. • Periodically migrate files between machines to make better use of resources.
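
As an illustration of such a global metadata table, the sketch below shows one possible MySQL layout executed from Python; the column set (owner, user/group/public division, size, modification time, current resource and path) is an assumption drawn from the metadata these slides mention, not the actual ACDC schema.

```python
# Sketch of a global file-metadata table for the Data Grid. The column set is an
# assumption based on the metadata mentioned on these slides (filename, size,
# modification time, owner, user/group/public division, current location).

CREATE_FILE_TABLE = """
CREATE TABLE IF NOT EXISTS data_grid_files (
    file_id     INT AUTO_INCREMENT PRIMARY KEY,
    filename    VARCHAR(255) NOT NULL,
    owner       VARCHAR(64)  NOT NULL,
    division    ENUM('user', 'group', 'public') NOT NULL,
    size_bytes  BIGINT       NOT NULL,
    mtime       DATETIME     NOT NULL,
    resource    VARCHAR(64)  NOT NULL,  -- machine currently holding the file
    path        VARCHAR(512) NOT NULL,  -- location of the file on that machine
    last_access DATETIME     NULL       -- input to the migration/aging policy
)
"""

def create_file_table(cursor):
    """Run against a MySQL cursor (e.g. from MySQLdb or mysql.connector)."""
    cursor.execute(CREATE_FILE_TABLE)
```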

  46. Data Grid Functionality • Implement basic file management functions accessible via a platform-independent web interface. • Features include: • User-friendly menus/interface. • File upload/download to and from the Data Grid Portal. • Simple web-based file editor. • Efficient search utility. • Logical display of files for a given user in three divisions (user/group/public), with hierarchical and list-based views. • Sorting capability based on file metadata (filename, size, modification time, etc.).

  47. Data Grid Functionality • Support concurrent access to files in the data grid. • Implement basic locking and synchronization primitives for version control. • Integrate security into the data grid. • Implement basic authentication and authorization of users. • Decide and enforce policies for data access and publishing.
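
A basic lock of the kind mentioned above can be modelled as a uniquely keyed row in the metadata database. The sketch below is one way to do it; the file_locks table and its columns are hypothetical, not the ACDC Data Grid's actual locking scheme.

```python
# Sketch of a basic file-locking primitive built on a (hypothetical) file_locks
# table whose file_id column is a primary key: INSERT IGNORE succeeds only for
# the first user to claim the lock.

ACQUIRE_LOCK = "INSERT IGNORE INTO file_locks (file_id, username, acquired) VALUES (%s, %s, NOW())"
RELEASE_LOCK = "DELETE FROM file_locks WHERE file_id = %s AND username = %s"

def lock_file(cursor, file_id, username):
    """Return True if the lock was acquired, False if another user already holds it."""
    cursor.execute(ACQUIRE_LOCK, (file_id, username))
    return cursor.rowcount == 1

def unlock_file(cursor, file_id, username):
    """Release the lock, but only if this user holds it."""
    cursor.execute(RELEASE_LOCK, (file_id, username))
    return cursor.rowcount == 1
```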

  48. Data Grid File Migration • Migration Algorithm • File migration depends upon a number of factors: • User access time • Network capacity at time of migration • User profile • User disk quotas on various resources
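
One way to combine the four factors above into a single migration score is sketched below; the normalisation and weights are illustrative assumptions, not the ACDC-Grid's actual migration policy.

```python
# Sketch of one way to combine the four migration factors into a score.
# The normalisation and weights are illustrative assumptions only.

def migration_score(days_since_access, network_load, user_activity, quota_fraction_used):
    """
    Higher score -> better candidate for migration off the grid portal.
    network_load, user_activity, and quota_fraction_used are assumed in [0, 1].
    """
    staleness = min(days_since_access / 30.0, 1.0)  # user access time
    bandwidth = 1.0 - network_load                  # prefer migrating when the network is idle
    inactivity = 1.0 - user_activity                # user profile: inactive users first
    pressure = quota_fraction_used                  # users near their disk quota first
    return 0.4 * staleness + 0.2 * bandwidth + 0.2 * inactivity + 0.2 * pressure

# Example: a file untouched for three weeks, quiet network, inactive user, 90% of quota used
print(migration_score(21, network_load=0.1, user_activity=0.2, quota_fraction_used=0.9))
```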

  49. Data Grid File Migration • We need to mine log files in order to determine: • how much data to migrate in one migration cycle, • an appropriate migration cycle length, • each user's access pattern for files, and • the overall access pattern for particular files.

  50. Data Grid File Aging • Global File Aging vs. Local File Aging • User aging attribute • Indicative of a user's access across their own files. • Attribute of a user's profile. • At migration time, this attribute helps determine which users' files should be migrated off the grid portal onto a remote resource. • Function of (file age, global file aging, resource usage)
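
One possible reading of "function of (file age, global file aging, resource usage)" is sketched below; the slide does not give the formula, so this particular weighting is an illustrative assumption.

```python
# One possible reading of "function of (file age, global file aging, resource usage)".
# The slide does not give the formula, so this weighting is an illustrative assumption.

def effective_file_age(file_age_days, user_aging_factor, global_aging_factor, resource_usage):
    """
    Scale a file's raw age by the owner's aging attribute and the grid-wide aging
    factor, and push files out faster when the portal's disk usage is high.
    resource_usage is assumed to be a fraction in [0, 1].
    """
    return file_age_days * user_aging_factor * global_aging_factor * (1.0 + resource_usage)

def should_migrate(file_age_days, user_aging_factor, global_aging_factor,
                   resource_usage, threshold_days=14.0):
    """Migrate a file off the grid portal once its effective age crosses a threshold."""
    return effective_file_age(file_age_days, user_aging_factor,
                              global_aging_factor, resource_usage) >= threshold_days
```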
