Grid Computing and Middleware
Presentation Transcript
Grid Computing and Middleware Shawn Malhotra Monday, February 5th, 2007
Overview • Background and definition • Importance of middleware • Globus Toolkit • Sample Applications
What is Grid Computing? • Computing model that leverages the power of many networked resources • Not just CPUs • Storage devices, special equipment (e.g. telescopes) • Share resources across administrative domains • Requires security features • Different from traditional cluster computing • Programmer sees a single ‘virtual computer’ • Web ↔ Information as Grid ↔ Computing Power
Why is Grid Computing Important? • Helps solve computationally expensive problems • Flexible enough to handle many small problems • Share costly resources amongst institutions • Federally funded research labs / academic institutions • Make resources available to anybody • Cost barrier is lowered • ‘Pay as you go’ type service • Increases overall bandwidth
Motivation for Middleware • Need robust, efficient ways to pool resources • Previous ‘ad-hoc’ methods not sufficient • Need for standardization! • Distributed Computing System (DCS) • Developed at the University of California at Irvine • Early 1970s • Focus on CPU management • Poor security solution • Abandoned in the 1980s
Globus Toolkit • Broader scope, more complete solution • CPU Management • Storage Management • Monitoring Services • More details to come … • Most popular grid computing framework • Implements several standards • OGSA, WSRF, SOAP, WSDL
Globus Toolkit - Overview • Facilitates grid application development • Open, extensible, flexible, high abstraction
Job Submission • GRAM interface • Grid Resource Allocation and Management • Specify resource requirements and flow • Uniform way to submit remote jobs • Translate request for local resources • Offers a variety of features • Retrieve job status • Send job signals (kill, start, restart) • Uses Web services interface
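The idea of a uniform submission interface that translates each request for the local resource can be sketched in Python. This is a toy model only and does not reproduce the GRAM API: the class names `LocalAdapter` and `JobManager`, the state names, and the command templates (`site-a-run`, `site-b-launch`) are all hypothetical.

```python
import itertools


class LocalAdapter:
    """Translates a generic job request into one site's local command (hypothetical)."""

    def __init__(self, template):
        self.template = template

    def translate(self, request):
        # Fill the site-specific template with the generic request fields.
        return self.template.format(**request)


class JobManager:
    """Uniform front end: submit, query status, and send signals to jobs."""

    # Hypothetical signal-to-state mapping, for illustration only.
    TRANSITIONS = {"start": "RUNNING", "restart": "RUNNING", "kill": "KILLED"}

    def __init__(self, adapters):
        self.adapters = adapters
        self.jobs = {}
        self._ids = itertools.count(1)

    def submit(self, site, request):
        job_id = next(self._ids)
        command = self.adapters[site].translate(request)
        self.jobs[job_id] = {"site": site, "command": command, "state": "PENDING"}
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["state"]

    def signal(self, job_id, name):
        if name not in self.TRANSITIONS:
            raise ValueError(f"unknown signal: {name}")
        self.jobs[job_id]["state"] = self.TRANSITIONS[name]


manager = JobManager({
    "site_a": LocalAdapter("site-a-run --cpus {cpus} {executable}"),
    "site_b": LocalAdapter("site-b-launch {executable} -n {cpus}"),
})
job_id = manager.submit("site_a", {"executable": "simulate", "cpus": 4})
manager.signal(job_id, "start")
print(manager.status(job_id), "|", manager.jobs[job_id]["command"])
```

The caller writes one request format; only the adapter knows how a given site expects the job to be launched.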
Job Scheduling • What happens after the job is submitted? • Submitted to a scheduler • Queues jobs and decides where/when to run • Requirement matching, priority systems, etc. • Abstracts resources from user • Pool heterogeneous resources together • Can have multiple layers of scheduling • Local schedulers vs. Metaschedulers
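The queueing, requirement matching, and priority steps above can be sketched as a small Python example. It is a minimal illustration under stated assumptions, not the algorithm of any real grid scheduler: the names `Job`, `Resource`, and `schedule`, and the fields `cpus`, `memory_gb`, and `priority`, are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Only priority takes part in ordering; a lower number runs first.
    priority: int
    name: str = field(compare=False)
    cpus: int = field(compare=False)
    memory_gb: int = field(compare=False)


@dataclass
class Resource:
    name: str
    cpus: int
    memory_gb: int


def schedule(jobs, resources):
    """Place jobs in priority order on the first resource with enough capacity."""
    queue = list(jobs)
    heapq.heapify(queue)
    free = {r.name: [r.cpus, r.memory_gb] for r in resources}
    placements = {}
    while queue:
        job = heapq.heappop(queue)
        placements[job.name] = None  # stays None if nothing fits
        for name, (cpus, memory_gb) in free.items():
            if job.cpus <= cpus and job.memory_gb <= memory_gb:
                free[name][0] -= job.cpus
                free[name][1] -= job.memory_gb
                placements[job.name] = name
                break
    return placements


jobs = [
    Job(priority=2, name="render", cpus=4, memory_gb=8),
    Job(priority=1, name="simulate", cpus=8, memory_gb=16),
    Job(priority=3, name="huge", cpus=64, memory_gb=256),
]
resources = [Resource("cluster-a", 8, 16), Resource("desktop-b", 4, 8)]
print(schedule(jobs, resources))
```

A metascheduler could apply the same matching one level up, treating each local scheduler as a single `Resource`.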
Security • Access to resources must be controlled • Grid Security Infrastructure (GSI) • Provides basic security constructs • Certificate-based PKI system • Supports single sign-on over the grid • Supports delegation • Access control left to individual services • Infrastructure provides necessary info and control • Uses Web services interface
Other Provided Modules • Data management • Facilitates file transfer, access to data stores • Monitoring and discovery • APIs to get status, subscribe to content • Important since ‘grid’ is never down, only components • Collaboration tools • Facilitates person-to-person collaboration • Build web portals for chat, e-mail, etc.
Example Applications • What can you build with such a toolkit? • Applications range from the depths of the sea to the stars above! • LOOKING: deep sea research • Condor: batch computing infrastructure • BIRN: medical resource pooling • LEAD: meteorological data • NVO: virtual observatory
Condor • Workload management system • Queuing, scheduling, prioritization, monitoring • Pool desktops into batch system • Use when idle, auto-detect when busy again • ClassAd mechanism • Novel way to match resources with requests • Flocking • Seamless combination of multiple networks http://www.cs.wisc.edu/condor
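The matching of resources with requests can be illustrated with a small Python sketch of two-sided matchmaking: a job and a machine are paired only when each side's requirements accept the other, and the job then picks the candidate it ranks highest. This is not ClassAd syntax or Condor code; the dictionary keys, the lambdas, and the machine names are assumptions made for illustration.

```python
def matches(job_ad, machine_ad):
    """Two ads match only if each side's requirements accept the other."""
    return bool(job_ad["requirements"](machine_ad)
                and machine_ad["requirements"](job_ad))


def best_match(job_ad, machine_ads):
    """Return the matching machine the job ranks highest, or None."""
    candidates = [m for m in machine_ads if matches(job_ad, m)]
    if not candidates:
        return None
    return max(candidates, key=job_ad["rank"])


# Hypothetical job ad: needs Linux and 2 GB, prefers more memory.
job = {
    "owner": "alice",
    "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 2048,
    "rank": lambda m: m["memory_mb"],
}

# Hypothetical machine ads: each machine states whose jobs it will run.
machines = [
    {"name": "desk1", "os": "LINUX", "memory_mb": 4096,
     "requirements": lambda j: True},
    {"name": "desk2", "os": "LINUX", "memory_mb": 16384,
     "requirements": lambda j: j["owner"] == "bob"},
    {"name": "desk3", "os": "WINDOWS", "memory_mb": 8192,
     "requirements": lambda j: True},
]

chosen = best_match(job, machines)
print(chosen["name"] if chosen else "no match")
```

Because both sides carry requirements, a desktop owner can restrict who uses the machine without the job submitter needing to know about that policy.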
LOOKING • Make tools / data related to oceanography available to all researchers • ‘20,000 Terabits Beneath the Sea’ • Presented at iGrid2005 • Real-time high definition deep sea video • Monitor active underwater volcanoes http://lookingtosea.ucsd.edu/
BIRN • Resource pooling • Tools for research and diagnoses • Collaboration • Common user interface • Better hypothesis testing • Use a distributed patient population http://www.nbirn.net/
LEAD • Sharing meteorological resources • Algorithm Development and Mining (ADaM) • Works on observational data • Provides analysis tools • ARPS Data Assimilation System (ADAS) • Provides visualization tools • Earth Science Markup Language (ESML) • Uniform way of expressing data • Data Access Systems • Allow uniform access to distributed data https://portal.leadproject.org/gridsphere/gridsphere
NVO • Expose the vast amount of astronomical data for all to use • Telescopes will produce 7 petabytes per year by 2012 • Standardized way of expressing data • VOTable • Creation of tools to produce required data • ConeSearch • Make accessing data like using real tools http://www.us-vo.org/
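A cone search asks for every catalog source within a given angular radius of a sky position. The geometric test behind such a query can be sketched in Python as below. This is a local illustration only, not a client for any NVO service: the function names are hypothetical, the three catalog rows are made-up values, and all angles are assumed to be in decimal degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius):
    """Return the rows lying within `radius` degrees of (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius]


# Made-up catalog rows, for illustration only.
catalog = [
    {"id": "src-1", "ra": 10.0, "dec": 20.0},
    {"id": "src-2", "ra": 10.1, "dec": 20.05},
    {"id": "src-3", "ra": 50.0, "dec": -5.0},
]

hits = cone_search(catalog, ra=10.0, dec=20.0, radius=0.5)
print([row["id"] for row in hits])
```

In a real service the result would come back in a standard table format such as VOTable rather than as Python dictionaries.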
The WISDOM Project • Analyze potential anti-malaria drugs • Focus lab tests on promising compounds • Uses up to 5000 computers in 27 countries • Simulate drug interaction with malaria protein • Test 80,000 drugs per hour, 140 million in total • Shows the power of collaboration • Many computers borrowed from particle physics simulator in the UK – GridPP • Shared spare capacity http://grid.globalwatchonline.com/epicentric_portal/site/GRID/
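The throughput figures on the WISDOM slide can be put in perspective with a short calculation. The Python sketch below assumes, purely for illustration, that the quoted rate of 80,000 compounds per hour is sustained for the whole run and that all 5000 computers contribute equally; the slide states neither.

```python
def hours_to_screen(total_compounds, compounds_per_hour):
    """Hours needed to screen every compound at a constant rate."""
    return total_compounds / compounds_per_hour


TOTAL_COMPOUNDS = 140_000_000   # from the slide
RATE_PER_HOUR = 80_000          # from the slide
COMPUTERS = 5_000               # from the slide ("up to 5000")

hours = hours_to_screen(TOTAL_COMPOUNDS, RATE_PER_HOUR)
days = hours / 24
per_computer = RATE_PER_HOUR / COMPUTERS

print(f"{hours:.0f} hours, about {days:.0f} days")
print(f"{per_computer:.0f} compounds per computer per hour")
```

Under these assumptions the full screen takes 1750 hours, or roughly 73 days, with each computer handling about 16 compounds per hour.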
Grid Computing – The Future • Currently the domain of ‘Big Science’ • Make it more mainstream for ‘Little Science’ • Technology is not the barrier • Evolution of the standards • Continued enhancement of the toolkit • Better front-end design • Promote peer-to-peer collaboration • Security is still a challenge
Summary • Grid computing is a powerful collaborative computing model • Grid computing requires efficient, fully featured middleware to thrive • Grid computing enables research and development that is not possible in isolation
References • Globus site • http://www.globus.org/ • Wikipedia • http://en.wikipedia.org/wiki/Grid_computing • Grid Café • http://gridcafe.web.cern.ch/gridcafe/
The Need for Grid Solutions • Grids are essential to sustain Moore’s Law as physical limitations will eventually limit what individual computing stations can achieve • It will become less necessary as individual resources become more powerful since technology grows faster than the complexity of our research
The Corporate Barrier • True grid computing will never be embraced by corporations due to security issues and sensitivity of data. This will limit the scope and power of the technology • Much like Web 2.0 has caused a shift in corporate presence on the internet, a ‘Grid 2.0’ will eventually force corporations to embrace this technology
Grid Middleware • Middleware designed to manage a grid will eventually merge with software designed to handle multiple CPUs on one motherboard to form a common solution. • Grid computing is far too different from multi-CPU processing to ever offer a common solution.
Expanding User Base • Development of a good middleware solution that abstracts most details of the grid will bring grid computing to ‘Little Science’ and eventually individual users. • The complexity of grid computing and lack of demand will prevent grid computing from ever becoming part of the mainstream.