
R-GMA – Architecture and Query Mediation 24/4/2003

Werner Nutt (Heriot-Watt University)

<w.nutt@hw.ac.uk>

Contributors
  • Rob Byrom RAL
  • Andy Cooke Heriot-Watt
  • Roney Cordenonsi QMUL
  • Abdeslem Djaoui RAL
  • Laurence Field PPARC
  • Steve Fisher RAL
  • Alasdair Gray Heriot-Watt
  • Steve Hicks RAL
  • Jason Leake RAL
  • Lisha Ma Heriot-Watt
  • James Magowan IBM-UK
  • Werner Nutt Heriot-Watt
  • Norbert Podhorszki SZTAKI
  • Manish Soni PPARC
  • Paul Taylor IBM-UK
  • Antony Wilson PPARC

Grid Monitoring: Where are the Concepts?

There are two styles of talking about the Grid:

  • General metaphors (virtual organisations, services, …)
  • Low-level technicalities and jargon (LDAP, XML, SOAP, OGSA, OGSI, ...)

What is missing:

  • Clear definitions of the problems
  • Intuitive concepts for solving them

Both are needed for communication with users as well as developers

The Grid Monitoring Problem

In a Grid we have

  • Computers
  • Storage elements
  • Network nodes and connections
  • Application programmes, …

Monitoring:

  • What is the current state of the system?
  • How did the system behave in the past?

Monitoring Data Come in Two Kinds

A Grid monitoring system makes available two kinds of data

  • static data “pools”, e.g., databases on
    • network topology, nodes connected
    • applications available (versions, licences, ...)
  • “streams” of data, e.g.,
    • sensor data (cpu load, network traffic, ...)

Data streams may give rise to data pools if they are archived

Today: R-GMA is tailored towards streams, but not pools

Examples of Monitoring Queries
  • “Show me the (average) cpu-load of computers at Heriot-Watt!”
  • “Between which nodes was yesterday the average transportation time for 1 MB packets higher than 0.… seconds?”
  • “For every node N, how many computers connected to N currently have a cpu-load of no more than 30%?”

Stream Queries can have Various Temporal Interpretations

Consider a query over the relation “Transport Time”

tt(src, dest, pcktSize, method, timestamp, time)

SELECT * FROM tt
WHERE src = ral AND dest = bologna

What is meant? Measurements

  • from now? (Continuous Query)
  • up until now? (History Query)
  • right now? (Latest Snapshot Query)

Today: Queries can be “flagged” with their type
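
The three interpretations can be contrasted with a small Python sketch. This is an illustration only, not R-GMA code: the function names, the sample tuples, and the choice of grouping key for the latest snapshot (all tt attributes except timestamp and time) are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# Assumed grouping for "latest snapshot": every tt attribute except timestamp and time.
SNAPSHOT_KEY: Tuple[str, ...] = ("src", "dest", "pcktSize", "method")


def matches(row: Row) -> bool:
    """The WHERE clause of the example query."""
    return row["src"] == "ral" and row["dest"] == "bologna"


def history_query(seen: List[Row]) -> List[Row]:
    """Up until now: every matching tuple published so far."""
    return [row for row in seen if matches(row)]


def continuous_query(incoming: Iterable[Row]) -> Iterator[Row]:
    """From now: yield matching tuples as they arrive."""
    for row in incoming:
        if matches(row):
            yield row


def latest_snapshot_query(seen: List[Row]) -> List[Row]:
    """Right now: the most recent matching tuple for each key."""
    latest: Dict[tuple, Row] = {}
    for row in seen:
        if not matches(row):
            continue
        key = tuple(row[a] for a in SNAPSHOT_KEY)
        if key not in latest or row["timestamp"] >= latest[key]["timestamp"]:
            latest[key] = row
    return list(latest.values())


def tt(src: str, dest: str, timestamp: int, time: int) -> Row:
    """Build a hypothetical tt tuple (1 MB packets, method ping)."""
    return {"src": src, "dest": dest, "pcktSize": 1, "method": "ping",
            "timestamp": timestamp, "time": time}


seen = [tt("ral", "bologna", 1, 30), tt("ral", "bologna", 2, 25), tt("hw", "ral", 2, 10)]
incoming = [tt("ral", "bologna", 3, 28), tt("hw", "bologna", 3, 12)]

print(len(history_query(seen)))                               # 2
print([r["timestamp"] for r in latest_snapshot_query(seen)])  # [2]
print([r["timestamp"] for r in continuous_query(incoming)])   # [3]
```

The same WHERE clause gives three different answers depending only on which part of the stream it is applied to.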

Advanced Queries: Mixing Temporal Query Types
  • “Which connections have currently a transportation time that is higher than last week's average?” (latest snapshot and history)
  • “Show me the cpu load of those machines where it is lower than yesterday's load average!” (continuous and history)

We do not intend to support such queries by R-GMA!

Architecture Approach 1: A Monitoring Data Warehouse

Idea:

  • store all data about the Grid status into a huge database
  • and query it

Not realistic:

  • Loading takes time
  • Data occupy space
  • Connections to the warehouse may fail
  • Often monitoring data flow as data streams, and queries ask for data streams as output


Approach 2: Monitoring with a “Multi-agent System”

[Figure labels: Directory Service, find/register, Consumer, Monitoring Application, Producer, Sensor, Database]

The Grid Monitoring Architecture (GMA) of the Global Grid Forum distinguishes between:

  • Consumers of information
  • Producers of information
  • Directory Service
    • Producers register their supply
    • Consumers register their demand

Directory Service mediates between producers and consumers
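
A minimal Python sketch of the find/register pattern follows. It is not taken from any GMA implementation: the class and method names are hypothetical, and using a plain string "topic" to describe supply and demand is a placeholder assumption, since the registration language is not fixed here.

```python
from collections import defaultdict
from typing import DefaultDict, List


class DirectoryService:
    """Hypothetical directory: maps a topic to registered producers and consumers."""

    def __init__(self) -> None:
        self._supply: DefaultDict[str, List[str]] = defaultdict(list)
        self._demand: DefaultDict[str, List[str]] = defaultdict(list)

    def register_producer(self, producer: str, topic: str) -> None:
        """A producer announces what it can supply."""
        self._supply[topic].append(producer)

    def register_consumer(self, consumer: str, topic: str) -> None:
        """A consumer announces what it is looking for."""
        self._demand[topic].append(consumer)

    def find_producers(self, topic: str) -> List[str]:
        """Consumers use this to locate matching producers."""
        return list(self._supply.get(topic, []))

    def find_consumers(self, topic: str) -> List[str]:
        """Producers use this to locate matching consumers."""
        return list(self._demand.get(topic, []))


directory = DirectoryService()
directory.register_producer("sensor-hw-01", "cpu_load")
directory.register_consumer("monitoring-app", "cpu_load")

print(directory.find_producers("cpu_load"))   # ['sensor-hw-01']
print(directory.find_consumers("cpu_load"))   # ['monitoring-app']
print(directory.find_producers("tt"))         # []
```

The directory only brokers the match; in this sketch the data itself would still flow directly between producer and consumer.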

Questions about GMA:
  • Which kinds of producers and consumers are there?
  • In which language do producers register their supply and consumers their demand?
  • What is the meaning of a registration?
  • How does a consumer find suitable producers? And how does a producer find suitable consumers?
  • Producers have different capabilities to answer queries (e.g. selections, joins, …). Which of them should they register?


R-GMA: A Virtual Monitoring Data Warehouse

[Figure labels: DB, Query, DB-Producer, Stream Producer, Consumer, Views on S, Registry, V1 V2 ... Vn, V, Sensor, Global Schema S]
  • Language of producers and consumers: relational queries (SQL)
  • Vocabulary: Relations in a global schema
  • Consumer: poses queries over global schema
  • Producer:
    • has a type (stream producer, database producer)
    • publishes relations R1, …, Rk
    • for every R, registers a simple view V on the global schema

Primary Producers

Database producer

  • supports queries over fixed set of tuples (static queries)
  • can be used to publish a database

Stream producer

  • supports queries over changing set of tuples (continuous queries)
  • supports “latest snapshot queries”
    • offers up-to-date values for each primary key

Today: DatabaseProducers and StreamProducers in R-GMA are different from the above!


Communication Modes of Stream Producers

[Figure labels: ProducerServlet, ConsumerServlet, Producer, Consumer, Queue (twice)]

Stream Producers may offer two communication modes for continuous queries:

  • lossless (… but tuples could become stale)
  • lossy (… but tuples are fresh)

Today: R-GMA’s StreamProducers are resilient and support lossless communication
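
The trade-off between the two modes can be sketched with Python's standard `collections.deque`. This is an illustration under assumptions, not how R-GMA's servlets are built: the helper `make_queue` and the capacity of 2 are invented for the example. An unbounded queue keeps every tuple, so a slow consumer reads old values first; a bounded queue silently drops the oldest tuples, so whatever the consumer reads is recent.

```python
from collections import deque


def make_queue(lossy: bool, capacity: int = 2) -> deque:
    """Hypothetical helper: bounded deque for lossy mode, unbounded for lossless."""
    return deque(maxlen=capacity) if lossy else deque()


lossless = make_queue(lossy=False)
lossy = make_queue(lossy=True)

# The producer publishes five tuples before the consumer reads anything.
for reading in range(5):
    lossless.append(reading)
    lossy.append(reading)   # with maxlen set, the oldest entries are discarded

print(list(lossless))       # [0, 1, 2, 3, 4]  nothing lost, but 0 is stale by now
print(list(lossy))          # [3, 4]           fresh, but 0, 1 and 2 are gone
print(lossless.popleft())   # 0
print(lossy.popleft())      # 3
```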

Republishers Publish Query Answers

Archiver: shows the history of a stream.

Stream Republisher: enables

  • merging,
  • thinning,
  • summarising of streams …
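
The three stream operations can be sketched as Python generators. This is only an illustration: the function names, the (timestamp, value) tuple layout, and the fixed-size window used for summarising are assumptions, and each input stream is assumed to be already ordered by timestamp.

```python
import heapq
from itertools import islice
from statistics import mean
from typing import Iterable, Iterator, Tuple

Reading = Tuple[int, int]   # (timestamp, value), e.g. cpu load in percent


def merge_streams(*streams: Iterable[Reading]) -> Iterator[Reading]:
    """Merging: interleave several timestamp-ordered streams into one."""
    return heapq.merge(*streams, key=lambda reading: reading[0])


def thin(stream: Iterable[Reading], every: int) -> Iterator[Reading]:
    """Thinning: keep only every n-th reading."""
    return islice(stream, 0, None, every)


def summarise(stream: Iterable[Reading], window: int) -> Iterator[Tuple[int, float]]:
    """Summarising: emit the average of each block of `window` readings."""
    block = []
    for timestamp, value in stream:
        block.append(value)
        if len(block) == window:
            yield (timestamp, mean(block))
            block = []


a = [(1, 20), (3, 40)]
b = [(2, 60), (4, 80)]

print(list(merge_streams(a, b)))                # [(1, 20), (2, 60), (3, 40), (4, 80)]
print(list(thin(merge_streams(a, b), 2)))       # [(1, 20), (3, 40)]
print(list(summarise(merge_streams(a, b), 2)))  # [(2, 40), (4, 60)]
```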

Republishers in R-GMA Today

Republishers are called “archivers” (although some of them don't archive anything)

An archiver (= republisher)

  • is defined by a query
  • consumes only from “stream producers”
  • publishes the query result according to its type, using
    • a “stream producer”, or
    • a “latest snapshot producer”, or
    • a “database producer” (which keeps an archive)

Which View should a Republisher Register?

Problem:

Republishers may compute complex queries

… but complex views would confuse the “mediator”!

Ideas:

  • register a simplified view for a complex query
  • register a new table

What is the Meaning of a Query in R-GMA?

Assumption: the views of (primary) producers are selections on a single relation, i.e., queries of the form

SELECT *
FROM cpu_load
WHERE machine_id = 'AB123' AND loc = 'hw'

(each producer contributes its part of a relation)

  • The virtual database contains the union of the data of all the primary producers
  • Conceptually, a query is evaluated over the entire virtual db
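
As a rough sketch of this semantics (not of how R-GMA evaluates queries in practice), the virtual database can be modelled in Python as the union of each producer's tuples, with the query applied to that union. The producer names, the `load` attribute, and the sample values are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Each hypothetical producer holds only the cpu_load tuples described by its view.
producers: Dict[str, List[Row]] = {
    "P1": [{"machine_id": "AB123", "loc": "hw", "load": 40}],
    "P2": [{"machine_id": "CD456", "loc": "ral", "load": 90}],
}


def virtual_db(sources: Dict[str, List[Row]]) -> List[Row]:
    """The virtual relation: the union of what all primary producers publish."""
    return [row for rows in sources.values() for row in rows]


def evaluate(where: Callable[[Row], bool], sources: Dict[str, List[Row]]) -> List[Row]:
    """Conceptual meaning of a query: filter the whole virtual relation."""
    return [row for row in virtual_db(sources) if where(row)]


print(len(virtual_db(producers)))                                              # 2
print([r["machine_id"] for r in evaluate(lambda r: r["load"] > 50, producers)])  # ['CD456']
```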

In R-GMA Query Answering Needs Mediation

Suppose P1, P2 produce for tt (Transport Time)

P1: … WHERE src = hw

P2: … WHERE src = ral AND pcktSize > 20

A global consumer poses its query over global relations

SELECT * FROM tt WHERE pcktSize > 10

A mediator translates this into queries over local relations

SELECT * FROM P1.tt WHERE pcktSize > 10
UNION
SELECT * FROM P2.tt

Today: R-GMA’s mediator handles simple queries like the one above
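
The translation step can be sketched in Python under strong simplifying assumptions: conditions are conjunctions of atoms of the form attribute = constant or attribute > constant, and the mediator drops a query condition for a producer whenever that producer's view already guarantees it. The representation and the function names are hypothetical, not R-GMA's API.

```python
from typing import Dict, List, Tuple, Union

Atom = Tuple[str, str, Union[str, int]]   # (attribute, "=" or ">", constant)
Condition = List[Atom]                    # a conjunction of atoms


def implied(view: Condition, atom: Atom) -> bool:
    """True if every tuple satisfying the view also satisfies the atom."""
    attr, op, const = atom
    for v_attr, v_op, v_const in view:
        if v_attr != attr:
            continue
        if op == "=" and v_op == "=" and v_const == const:
            return True
        if op == ">" and v_op == "=" and v_const > const:
            return True
        if op == ">" and v_op == ">" and v_const >= const:
            return True
    return False


def rewrite(relation: str, views: Dict[str, Condition], query: Condition) -> str:
    """Translate a global query into a UNION of queries over local relations."""
    branches = []
    for producer, view in views.items():
        residual = [atom for atom in query if not implied(view, atom)]
        sql = f"SELECT * FROM {producer}.{relation}"
        if residual:
            sql += " WHERE " + " AND ".join(f"{a} {op} {c}" for a, op, c in residual)
        branches.append(sql)
    return "\nUNION\n".join(branches)


views = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
}
print(rewrite("tt", views, [("pcktSize", ">", 10)]))
```

For the two producers above this prints the same UNION as on the slide: P2's branch needs no WHERE clause because pcktSize > 20 already implies pcktSize > 10.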

Global and Local Consumers
  • Global consumers pose queries over global relations

SELECT * FROM tt WHERE pcktSize > 10

which are translated into queries over local relations

SELECT * FROM P1.tt WHERE pcktSize > 10
UNION
SELECT * FROM P2.tt

  • Local consumers pose queries over local relations directly

SELECT * FROM P1.tt WHERE method = ping

Today: a consumer can be global or local, but local relations cannot be referred to explicitly

How does the Mediator Find Suitable Producers?

P1, P2, P3 produce for tt (Transport Time)

P1: … src = hw

P2: … src = ral AND pcktSize > 20

P3: … src = ral AND method = ping

Q: SELECT * FROM tt WHERE src = ral AND method = ping

We see: P1 is not suitable for Q, but P2 and P3 are. Why?

src = hw AND src = ral AND method = ping is never true

src = ral AND pcktSize > 20 AND … is sometimes true

Satisfiability Test!

Today: implemented
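
A satisfiability test of this kind can be sketched in Python for a deliberately small condition language: conjunctions of attribute = constant and attribute > constant. This is an assumption-laden illustration, not R-GMA's implemented test; the tuple representation and function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Atom = Tuple[str, str, Union[str, int]]   # (attribute, "=" or ">", constant)
Condition = List[Atom]                    # a conjunction of atoms


def satisfiable(cond: Condition) -> bool:
    """Return True if some tuple could satisfy every atom of the conjunction."""
    equals: Dict[str, object] = {}   # attribute -> value it must equal
    lower: Dict[str, object] = {}    # attribute -> largest strict lower bound
    for attr, op, const in cond:
        if op == "=":
            if attr in equals and equals[attr] != const:
                return False         # e.g. src = hw AND src = ral
            equals[attr] = const
        elif op == ">":
            lower[attr] = max(lower.get(attr, const), const)
        else:
            raise ValueError(f"unsupported operator: {op}")
    # A required value must also lie above any lower bound on the same attribute.
    return all(equals[a] > lower[a] for a in equals if a in lower)


def suitable_producers(views: Dict[str, Condition], query: Condition) -> List[str]:
    """A producer is suitable if its view and the query can hold together."""
    return [name for name, view in views.items() if satisfiable(view + query)]


views = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
    "P3": [("src", "=", "ral"), ("method", "=", "ping")],
}
query = [("src", "=", "ral"), ("method", "=", "ping")]

print(suitable_producers(views, query))   # ['P2', 'P3']
```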

… So Which Producers Should the Mediator Ask?

P2: … src = ral AND pcktSize > 20

P3: … src = ral AND method = ping

Q: SELECT * FROM tt WHERE src = ral AND method = ping

All answers to Q returned by P2 are also returned by P3:

whenever src = ral AND pcktSize > 20 AND src = ral AND method = ping is true, then src = ral AND method = ping AND src = ral AND method = ping is true.

Hence, R-GMA only needs to ask P3

Entailment Test!

Today: not implemented
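
Since the test is not implemented, the following Python sketch is purely hypothetical. It handles only conjunctions of attribute = constant and attribute > constant, assumes the premise is satisfiable, and assumes that P3 really publishes everything its view describes. The check is: (view of P2 AND Q) entails (view of P3 AND Q).

```python
from typing import List, Tuple, Union

Atom = Tuple[str, str, Union[str, int]]   # (attribute, "=" or ">", constant)
Condition = List[Atom]                    # a conjunction of atoms


def implies_atom(premise: Condition, atom: Atom) -> bool:
    """True if a single atom follows from some atom of the premise."""
    attr, op, const = atom
    for p_attr, p_op, p_const in premise:
        if p_attr != attr:
            continue
        if op == "=" and p_op == "=" and p_const == const:
            return True
        if op == ">" and p_op == "=" and p_const > const:
            return True
        if op == ">" and p_op == ">" and p_const >= const:
            return True
    return False


def entails(premise: Condition, conclusion: Condition) -> bool:
    """A conjunction is entailed if each of its atoms is."""
    return all(implies_atom(premise, atom) for atom in conclusion)


def covered_by(view_a: Condition, view_b: Condition, query: Condition) -> bool:
    """True if every answer to the query within view_a also lies within view_b."""
    return entails(view_a + query, view_b + query)


p2 = [("src", "=", "ral"), ("pcktSize", ">", 20)]
p3 = [("src", "=", "ral"), ("method", "=", "ping")]
q = [("src", "=", "ral"), ("method", "=", "ping")]

print(covered_by(p2, p3, q))   # True:  P2 adds nothing beyond P3 for Q
print(covered_by(p3, p2, q))   # False: P3 may hold answers with pcktSize <= 20
```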

… But What Did the Producers Promise?

P registers view V

Does P promise

    • some of V? (sound description)
    • all of V? (sound and complete description)
  • The Entailment Test only makes sense when the registered views are sound and complete descriptions
  • Producers should register completeness flags

… Why May a Producer not be Complete?
  • The language of views is more restricted than the language of queries. Hence, republishers may be unable to say exactly what they publish
  • Archivers may archive in lossy mode
  • Producers may lose tuples
  • A producer may not know everything about the real world
  • Open to debate

Keys in the Global Schema

tt(src, dest, method, pcktSize, timestamp, time)

Intuitively, tt has the primary key (src, dest, method, pcktSize, timestamp).

We need to know the primary keys

  • to understand the global schema
  • to answer latest snapshot queries

But can we enforce them?

Sometimes, they hold globally if they hold locally!

Today: global tables have keys, which are used to keep a latest snapshot cache

Summary (1)

Types of Stream Queries

  • continuous vs. history vs. latest snapshot

Producers

  • primary producers vs. republishers
  • DB producers: publish a database
  • stream producers: lossless vs. lossy communication modes
  • republishers: materialised views vs. archivers vs. stream republishers

Summary (2)

Global Schema

  • primary keys

Consumers

  • global vs. local consumers

Mediator

  • translates global query into local queries
  • applies Satisfiability Test to find suitable producers

Query Planning

  • Entailment Test
  • sound vs. sound and complete producers
