
R-GMA – DataGrid’s Monitoring System 1/7/2003

R-GMA – DataGrid’s Monitoring System 1/7/2003. Werner Nutt (Heriot-Watt University) <w.nutt@hw.ac.uk>. RGMA = Relational Grid Monitoring Architecture. Grid Monitoring and Information System developed within DataGrid (Work Package 3)


Presentation Transcript


  1. R-GMA – DataGrid’s Monitoring System 1/7/2003 Werner Nutt (Heriot-Watt University) <w.nutt@hw.ac.uk>

  2. RGMA = Relational Grid Monitoring Architecture • Grid Monitoring and Information System developed within DataGrid (Work Package 3) • Based on the “Grid Monitoring Architecture” of the Global Grid Forum • Code is open source and freely available • Homepage: type “wp3” into Google R-GMA -DataGrid's Monitoring System

  3. Contributors • Heriot-Watt, Edinburgh: Andrew Cooke, Alasdair Gray, Lisha Ma, Werner Nutt • IBM-UK: James Magowan, Manfred Oevers, Paul Taylor • Queen Mary, University of London: Roney Cordenonsi • CCLRC/PPARC: Rob Byrom, Laurence Field, Steve Hicks, Manish Soni, Antony Wilson, Jason Leake, Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Robin Middleton • SZTAKI, Hungary: Peter Kacsuk, Norbert Podhorszki • Trinity College Dublin: Brian Coghlan, Stuart Kenny, David O’Callaghan

  4. Overview • Grid monitoring: Requirements • The R-GMA approach: A virtual monitoring database • Components of R-GMA: • Schema • Producers and Consumers • Registry • Republishers • Query Planning

  5. Major Components of DataGrid [Diagram labels: User Interface, Job Submission, Resource Broker, Status Information, Monitoring System, Logging and Bookkeeping, Replica Catalogue, Storage Element, Computing Element, Computers, Data Transfer]

  6. WP7: R-GMA Collects Network Monitoring Data

  7. The Grid Monitoring Problem In a Grid we have • Computers • Storage elements • Network nodes and connections • Application programmes, … Monitoring: • What is the current state of the system? • How did the system behave in the past?

  8. Monitoring Data Come in Two Kinds A Grid monitoring system makes available two kinds of data • static data “pools”, e.g., databases on • network topology, nodes connected • applications available (versions, licences, ...) • “streams” of data, e.g., • sensor data (cpu load, network traffic, ...) Data streams may give rise to data pools if they are archived. Today: R-GMA is tailored towards streams, but not pools

  9. Examples of Monitoring Queries • “Show me the (average) cpu-load of computers at Heriot-Watt!” • “Between which nodes was yesterday the average transportation time for 1 MB packets higher than 0.… seconds?” • “For every computing element CE, how many computers of CE currently have a cpu-load of no more than 30%?”

  10. Grid Monitoring Requirements • Support for publishing data “pools” and “streams” • Support for locating data sources (automatic, if possible) • Queries with different temporal interpretations (continuous, latest state, history) • Scalability (there may be thousands of data sources) • Resilience to failure (data sources may become unavailable) • Flexibility (we don’t know which queries will be posed)

  11. Architecture Approach 1: A Monitoring Data Warehouse Idea: • store all data about the Grid status in a huge database • and query it Not realistic: • Loading takes time • Data occupy space • Connections to the warehouse may fail • Often monitoring data flow as data streams, and queries ask for data streams as output

  12. Approach 2: Monitoring with a “Multi-agent System” The Grid Monitoring Architecture (GMA) of the Global Grid Forum distinguishes between: • Consumers of information • Producers of information • Directory Service • Producers register their supply • Consumers register their demand The Directory Service mediates between producers and consumers [Diagram labels: Directory Service, find/register, Consumer, Monitoring Application, Producer, Sensor, Data Base]

  13. Questions about GMA: • Which kinds of producers and consumers are there? • In which language do producers register their supply and consumers their demand? • What is the meaning of a registration? • How does a consumer find suitable producers? And how does a producer find suitable consumers? • Producers have different capabilities to answer queries (e.g. selections, joins, …). Which of them should they register?

  14. R-GMA: A Virtual Monitoring Data Warehouse • Language of producers and consumers: relational queries (SQL) • Vocabulary: relations in a global schema • Consumer: poses queries over the global schema • Producer: • has a type (stream producer, database producer) • publishes relations R1,…,Rk • for every R, registers a simple view V on the global schema [Diagram labels: Consumer, DB Query, DB-Producer, Stream Producer, Sensor, Registry, Global Schema S, Views on S (V1, V2, ..., Vn)]

  15. Schema & Contributions

  16. Contributions are Views • SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL' • SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'

  17. Keys in the Global Schema Network throughput: tp(src, dest, method, pcktSize, timestamp, time) Intuitively, tp has the primary key (src, dest, method, pcktSize, timestamp). We need to know the primary keys • to understand the global schema • to answer latest snapshot queries Primary keys are declared, but not enforced! However, sometimes they hold globally if they hold locally!
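
The remark that keys are declared but not enforced can be illustrated with a small check. This is a minimal sketch, not R-GMA code: the function `key_holds` and the sample rows are hypothetical, and only the relation `tp` and its declared key come from the slide.

```python
from typing import Dict, Iterable, Sequence

# Declared primary key of tp, as stated on the slide.
TP_KEY = ("src", "dest", "method", "pcktSize", "timestamp")

def key_holds(rows: Iterable[Dict], key: Sequence[str] = TP_KEY) -> bool:
    """Return True if no two rows agree on all key attributes."""
    seen = set()
    for row in rows:
        k = tuple(row[c] for c in key)
        if k in seen:
            return False
        seen.add(k)
    return True

# Hypothetical data: p1 and p2 both report on the same link at the same time,
# while p3 publishes a different src.
p1 = [{"src": "hw", "dest": "ral", "method": "ping", "pcktSize": 64, "timestamp": 1, "time": 0.5}]
p2 = [{"src": "hw", "dest": "ral", "method": "ping", "pcktSize": 64, "timestamp": 1, "time": 0.7}]
p3 = [{"src": "ral", "dest": "hw", "method": "ping", "pcktSize": 64, "timestamp": 1, "time": 0.4}]

local_ok = key_holds(p1) and key_holds(p2) and key_holds(p3)  # each producer alone is consistent
overlap_ok = key_holds(p1 + p2)    # producers overlap on the key: the union violates it
disjoint_ok = key_holds(p1 + p3)   # producers differ on a key attribute (src): the key holds
```

In this sketch the key carries over from local to global exactly when the producers' contributions cannot collide on the key attributes, for example when their views select different values of `src`.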

  18. Metaphor: Roles and Agents R-GMA Clients: Grid components or Grid applications • Clients can play the roles of producers or consumers A client would need special capabilities for a role: • Clients are supported in their roles by agents Implementation: • APIs for client roles: “new StreamProducer(…)” • Agents are objects on a Web server

  19. Primary Producers Database producer • supports queries over a fixed set of tuples (static queries) • can be used to publish a database Stream producer • supports queries over a changing set of tuples (continuous queries) • supports “latest snapshot queries” • offers up-to-date values for each primary key in a db Today: DatabaseProducers and StreamProducers in R-GMA are different from the above!

  20. Communication Modes of Stream Producers Stream Producers may offer two communication modes for continuous queries: • lossless (… but tuples could become stale) • lossy (… but tuples are fresh) Today: R-GMA’s StreamProducers are resilient and support lossless communication [Diagram labels: Producer, ProducerServlet, Queue, ConsumerServlet, Queue, Consumer]
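
The trade-off between the two modes can be sketched with two buffers. This is an illustration under stated assumptions, not R-GMA's implementation: `make_buffer` and the capacity are hypothetical, the lossless mode is modelled as an unbounded queue, and the lossy mode as a bounded queue that discards the oldest tuple when full.

```python
from collections import deque

def make_buffer(mode: str, capacity: int = 3) -> deque:
    """Create a tuple buffer for the given communication mode."""
    if mode == "lossless":
        # Nothing is dropped, so a slow consumer may read stale tuples.
        return deque()
    if mode == "lossy":
        # Appending to a full bounded deque discards the oldest entry.
        return deque(maxlen=capacity)
    raise ValueError(f"unknown mode: {mode}")

lossless = make_buffer("lossless")
lossy = make_buffer("lossy", capacity=3)

# Six readings arrive while the consumer is not reading.
for reading in range(1, 7):
    lossless.append(reading)
    lossy.append(reading)
```

After the loop the lossless buffer still holds all six readings, the oldest of which may be out of date, while the lossy buffer holds only the three most recent ones.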

  21. Republishers Publish Query Answers • Archiver: shows the history of a stream • Stream Republisher: enables merging, thinning, and summarising of streams …

  22. Republishers in R-GMA Today Republishers are called “archivers” (although some of them don't archive anything) An archiver (= republisher) • is defined by a query • consumes only from “stream producers” • publishes the query result according to its type, using • a “stream producer”, or • a “latest snapshot producer”, or • a “database producer” (which keeps an archive) Republishers are used to answer complex queries!

  23. The Next Step: Hierarchies of Stream Republishers [Diagram labels: National Republisher (country = 'uk'); Local/site Republishers (site = 'ral', site = 'hw'); Stream Producers (ral, hw)]

  24. Republisher Hierarchies: The Issues • Republishers are defined by queries: hierarchies have to be maintained automatically • new stream producers must only be added to republishers at the “lowest level” • the hierarchy has to be replanned if a republisher fails • difficult: transition from one plan to the other without loss of tuples • How well can we describe the content of a stream? We may need descriptions that join • stream relations CPULoad(machineID, load, timestamp) • static relations locatedAt(machineID, site)

  25. What is the Meaning of a Query in R-GMA? Assumption: the views of (primary) producers are selections on a single relation, i.e., queries of the form SELECT * FROM cpu_load WHERE machine_id = 'AB123' AND loc = 'hw' (each producer contributes its parts of a relation) • The virtual database contains the union of the data of all the primary producers • Conceptually, a query is evaluated over the entire virtual db
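
The conceptual semantics can be sketched in a few lines: form the union of all producers' tuples, then evaluate the query over that union. This is a minimal sketch with hypothetical names (`virtual_db`, `evaluate`) and made-up rows; the `load` column is an assumption, since the slide only names `machine_id` and `loc`.

```python
from itertools import chain
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical contributions of two primary producers to cpu_load.
producers: Dict[str, List[Row]] = {
    "P1": [{"machine_id": "AB123", "loc": "hw", "load": 0.3}],
    "P2": [{"machine_id": "CD456", "loc": "ral", "load": 0.9},
           {"machine_id": "EF789", "loc": "ral", "load": 0.2}],
}

def virtual_db(contributions: Dict[str, List[Row]]) -> List[Row]:
    """The virtual relation is the union of all producers' tuples."""
    return list(chain.from_iterable(contributions.values()))

def evaluate(contributions: Dict[str, List[Row]],
             predicate: Callable[[Row], bool]) -> List[Row]:
    """Conceptually, a query is a selection over the whole virtual relation."""
    return [row for row in virtual_db(contributions) if predicate(row)]

# Roughly: SELECT * FROM cpu_load WHERE loc = 'ral'
at_ral = evaluate(producers, lambda row: row["loc"] == "ral")
```

A real system would not materialise the union; the sketch only fixes what a correct answer has to contain.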

  26. Stream Queries can have Various Temporal Interpretations Consider a query over the relation “Transport Time” tt(src, dest, pcktSize, method, timestamp, time) SELECT * FROM tt WHERE src = ral AND dest = bologna What is meant? Measurements • from now? (Continuous Query) • up until now? (History Query) • right now? (Latest Snapshot Query) Today: Queries can be “flagged” with their type
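
The three interpretations can be contrasted on a small finite sample of tt. This is a sketch under assumptions, not R-GMA's API: the function names are hypothetical, "now" is an explicit parameter, a tuple stamped exactly at "now" counts as history, and the latest snapshot is taken per (src, dest, pcktSize, method), i.e. the attributes other than timestamp and time.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

def history_query(rows: List[Row], pred: Callable[[Row], bool], now: int) -> List[Row]:
    """Matching measurements up until now."""
    return [r for r in rows if pred(r) and r["timestamp"] <= now]

def continuous_query(rows: List[Row], pred: Callable[[Row], bool], now: int) -> Iterator[Row]:
    """Matching measurements from now on, yielded as they arrive."""
    return (r for r in rows if pred(r) and r["timestamp"] > now)

def latest_snapshot_query(rows: List[Row], pred: Callable[[Row], bool], now: int,
                          key=("src", "dest", "pcktSize", "method")) -> List[Row]:
    """The most recent matching measurement per key, as of now."""
    latest: Dict[tuple, Row] = {}
    for r in history_query(rows, pred, now):
        k = tuple(r[c] for c in key)
        if k not in latest or r["timestamp"] > latest[k]["timestamp"]:
            latest[k] = r
    return list(latest.values())

def measurement(src: str, ts: int, time: float) -> Row:
    """Build a hypothetical tt tuple towards bologna."""
    return {"src": src, "dest": "bologna", "pcktSize": 1, "method": "ping",
            "timestamp": ts, "time": time}

rows = [measurement("ral", 1, 0.2), measurement("ral", 2, 0.3),
        measurement("hw", 2, 0.9), measurement("ral", 5, 0.4)]

def from_ral_to_bologna(r: Row) -> bool:
    return r["src"] == "ral" and r["dest"] == "bologna"

past = history_query(rows, from_ral_to_bologna, now=3)
current = latest_snapshot_query(rows, from_ral_to_bologna, now=3)
future = list(continuous_query(rows, from_ral_to_bologna, now=3))
```

The same selection condition thus yields three different answers, depending only on the temporal flag.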

  27. Advanced Queries: Mixing Temporal Query Types • “Which connections currently have a transportation time that is higher than last week's average?” (latest snapshot and history) • “Show me the cpu load of those machines where it is lower than yesterday's load average!” (continuous and history) We do not intend to support such queries in R-GMA!

  28. In R-GMA Query Answering Needs Mediation Suppose P1, P2 publish for tp (throughput) P1: … WHERE src = hw P2: … WHERE src = ral AND pcktSize > 20 A global consumer poses its query over global relations: SELECT * FROM tp WHERE pcktSize > 10 A mediator translates this into queries over local relations: SELECT * FROM P1.tp WHERE pcktSize > 10 UNION SELECT * FROM P2.tp Today: R-GMA’s mediator handles simple queries like the one above

  29. Global and Local Consumers • Global consumers pose queries over global relations, SELECT * FROM tp WHERE pcktSize > 10, which are translated into queries over local relations: SELECT * FROM P1.tp WHERE pcktSize > 10 UNION SELECT * FROM P2.tp • Local consumers pose queries over local relations directly: SELECT * FROM P1.tp WHERE method = ping Today: a consumer can be global or local, but local relations cannot be referred to explicitly

  30. How does the Mediator Find Suitable Publishers? P1, P2, P3 publish for tt (Transport Time) P1: … src = hw P2: … src = ral AND pcktSize > 20 P3: … src = ral AND method = ping Q: SELECT * FROM tt WHERE src = ral AND method = ping We see: P1 is not suitable for Q, but P2 and P3 are. Why? src = hw AND src = ral AND method = ping is never true; src = ral AND pcktSize > 20 AND … is sometimes true. Satisfiability Test! Today: implemented
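
The satisfiability test on this slide can be sketched for a restricted condition language. The following is an illustration, not R-GMA's implementation: conditions are assumed to be conjunctions of atoms `(attribute, operator, constant)` with operators `=`, `>` and `<`, numeric attributes are assumed to range over a dense domain, and all names (`satisfiable`, `VIEWS`, `Q`) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]
Atom = Tuple[str, str, Value]  # (attribute, operator, constant)

def satisfiable(atoms: List[Atom]) -> bool:
    """Check whether a conjunction of atoms can be true for some tuple."""
    eq: Dict[str, Value] = {}
    lower: Dict[str, Value] = {}   # strict lower bounds (attr > value)
    upper: Dict[str, Value] = {}   # strict upper bounds (attr < value)
    for attr, op, val in atoms:
        if op == "=":
            if attr in eq and eq[attr] != val:
                return False       # two different constants for one attribute
            eq[attr] = val
        elif op == ">":
            lower[attr] = max(lower.get(attr, val), val)
        elif op == "<":
            upper[attr] = min(upper.get(attr, val), val)
        else:
            raise ValueError(f"unsupported operator: {op}")
    for attr in set(lower) | set(upper):
        lo, hi = lower.get(attr), upper.get(attr)
        if attr in eq:
            v = eq[attr]
            if (lo is not None and not v > lo) or (hi is not None and not v < hi):
                return False       # the fixed value lies outside the bounds
        elif lo is not None and hi is not None and not lo < hi:
            return False           # empty interval
    return True

# Registered views and the query from the slide.
VIEWS: Dict[str, List[Atom]] = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
    "P3": [("src", "=", "ral"), ("method", "=", "ping")],
}
Q: List[Atom] = [("src", "=", "ral"), ("method", "=", "ping")]

# A producer is relevant if its view and the query can hold together.
relevant = [name for name, view in VIEWS.items() if satisfiable(view + Q)]
```

With these inputs `relevant` contains P2 and P3 but not P1, matching the slide. The sketch assumes that each attribute is compared only with constants of one type.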

  31. … So Which Publishers Should the Mediator Ask? P2: … src = ral AND pcktSize > 20 P3: … src = ral AND method = ping Q: SELECT * FROM tt WHERE src = ral AND method = ping All answers to Q returned by P2 are also returned by P3: whenever src = ral AND pcktSize > 20 AND src = ral AND method = ping is true, then src = ral AND method = ping AND src = ral AND method = ping is true. Hence, R-GMA only needs to ask P3. Entailment Test! Needed for Republisher Hierarchies! (not yet implemented)
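
The entailment test can be sketched for the same kind of conditions. This is a deliberately simple illustration, not R-GMA code (the slide states that the test is not yet implemented): conditions are conjunctions of atoms `(attribute, operator, constant)` with `=`, `>` and `<`, the premise is assumed to be satisfiable, and the function name `entails` is hypothetical. The check is sound for this restricted language but makes no claim to cover richer conditions.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]
Atom = Tuple[str, str, Value]  # (attribute, operator, constant)

def entails(premise: List[Atom], conclusion: List[Atom]) -> bool:
    """Return True if every atom of the conclusion follows from the premise."""
    eq: Dict[str, Value] = {}
    lower: Dict[str, Value] = {}   # strict lower bounds from the premise
    upper: Dict[str, Value] = {}   # strict upper bounds from the premise
    for attr, op, val in premise:
        if op == "=":
            eq[attr] = val
        elif op == ">":
            lower[attr] = max(lower.get(attr, val), val)
        elif op == "<":
            upper[attr] = min(upper.get(attr, val), val)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def follows(attr: str, op: str, val: Value) -> bool:
        if op == "=":
            return attr in eq and eq[attr] == val
        if op == ">":
            return (attr in eq and eq[attr] > val) or (attr in lower and lower[attr] >= val)
        if op == "<":
            return (attr in eq and eq[attr] < val) or (attr in upper and upper[attr] <= val)
        raise ValueError(f"unsupported operator: {op}")

    return all(follows(*atom) for atom in conclusion)

# Views and query from the slide.
P2: List[Atom] = [("src", "=", "ral"), ("pcktSize", ">", 20)]
P3: List[Atom] = [("src", "=", "ral"), ("method", "=", "ping")]
Q: List[Atom] = [("src", "=", "ral"), ("method", "=", "ping")]

p2_covered_by_p3 = entails(P2 + Q, P3 + Q)   # P2's answers to Q also come from P3
p3_covered_by_p2 = entails(P3 + Q, P2 + Q)   # the converse does not hold
```

Here `p2_covered_by_p3` is true and `p3_covered_by_p2` is false, so asking P3 alone suffices, provided P3's registered view describes everything P3 publishes.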

  32. … But What Did the Producers Promise? P registers view V. Does P promise • some of V? (sound description) • all of V? (sound and complete description) • The Entailment Test only makes sense when the registered views are sound and complete descriptions • Producers should register completeness flags

  33. … Why May a Producer not be Complete? • The language of views is more restricted than the language of queries. Hence: republishers may be unable to say exactly what they publish • Archivers may archive in lossy mode • Producers may lose tuples • A producer may not know everything about the real world • Open to debate

  34. Summary (1) • Monitoring data come in Pools and Streams • Global Schema: primary keys • Types of Stream Queries: continuous vs. history vs. latest snapshot • Producers: DB producers publish a database; stream producers offer lossless vs. lossy communication modes
