OGSA-DQP



Presentation Transcript


  1. OGSA-DQP. Steven Lynden, University of Manchester.

  2. Introduction
     • OGSA-DQP is a service-based distributed query processor.
     • It evaluates queries over distributed data sources wrapped by OGSA-DAI.
     • It is built using OGSA-DAI extensibility points.
     • People involved:
         • University of Manchester: Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
         • University of Newcastle: Jim Smith, Arijit Mukherjee, Paul Watson
         • OGSA-DAI
     • Prototype release 3.0 is available from the OGSA-DAI website.
     Data access & integration with OGSA-DAI: GGF 17

  3. OGSA-DQP high-level overview
     • OGSA-DQP uses a middleware approach.
     • It can be seen as a mediator over OGSA-DAI wrappers.
     • Usability: use it as you would an OGSA-DAI data service.
     • DQP plans, schedules and executes distributed queries in parallel.
     • Calls to analysis (Web) services can be declared within queries and are invoked by DQP.
     [Figure: a query is submitted to OGSA-DQP and results are returned; OGSA-DQP sits above two OGSA-DAI services, each wrapping a DBMS and its data.]

  4. OGSA-DQP architecture
     [Figure: an OGSA-DAI data service with DQP activities installed receives perform documents and communicates with several Evaluators, each labelled QE.]
     • The OGSA-DAI data service with DQP activities installed is the “OGSA-DQP service”, also known as the Grid Distributed Query Service (GDQS) or “Coordinator”.
     • Each Evaluator is also known as a Grid Query Evaluation Service (GQES).

  5. OGSA-DQP architecture
     • DQP evaluator services:
         • Are plain Web services.
         • Implement the QueryEvaluation port type:
             • evaluate: the input is a query plan partition, which is subsequently executed.
             • receiveData: allows the evaluator to receive data from other evaluators.
     • OGSA-DAI extensions:
         • DQP resource: a resource that encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services, etc.). Implemented as a data resource accessor.
         • OQL query statement activity: enables the submission of a query in Object Query Language (OQL).
         • DQP factory activity: enables the creation and configuration of DQP resources.

  6. Example query
     • Given two DBMSs and one analysis tool (i.e., a Web service):
         • goTerm: a table in a Gene Ontology (GO) database running as a remote MySQL DB, exposed by an OGSA-DAI data service
         • protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
         • Blast: a sequence alignment scoring Web service
     • We want to obtain alignment scores for a sequence against proteins of a certain kind.
     • The user submits a single query referencing data stored at multiple sites.
     • The author of the query need not be aware of how or where the data is stored.
     • Queries are written in Object Query Language (OQL):
         select p.proteinId, Blast(p.sequence)
         from protein p, goTerm t
         where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
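To make the meaning of the example query concrete, the following is a minimal single-process sketch of what it computes. It does not use OGSA-DQP or OGSA-DAI: the table rows are made up, and `blast` is a hypothetical stand-in for the Blast Web service that simply returns the sequence length.

```python
# Toy stand-ins for the two remote tables; every row is invented for illustration.
protein = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIM"},
    {"proteinId": "P3", "sequence": "MKV"},
]
go_term = [
    {"termId": "GO:0005942", "proteinId": "P1"},
    {"termId": "GO:0005942", "proteinId": "P3"},
    {"termId": "GO:0000001", "proteinId": "P2"},
]


def blast(sequence: str) -> int:
    """Hypothetical placeholder for the Blast Web service (returns the length)."""
    return len(sequence)


def run_query(protein_rows, go_rows, term_id):
    """Nested-loop evaluation of:
    select p.proteinId, Blast(p.sequence) from protein p, goTerm t
    where t.termId = term_id and p.proteinId = t.proteinId
    """
    return [
        (p["proteinId"], blast(p["sequence"]))
        for p in protein_rows
        for t in go_rows
        if t["termId"] == term_id and p["proteinId"] == t["proteinId"]
    ]


results = run_query(protein, go_term, "GO:0005942")
```

In the distributed setting described on the slides, the two tables live at different sites and the function call is a remote service invocation; the query text itself stays the same.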

  7. Client interaction with OGSA-DQP
     • Two kinds of client/server interaction:
         (1) Configuration: the client sends a perform document requesting the service to create a DQP data service resource.
         (2) Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1).
     • The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries.
     • It differs from typical OGSA-DAI data service resources, e.g. a relational data service resource.

  8. DQP configuration
     Perform document (outline):
         <perform>
           <DQPFactory>
             Evaluator URLs
             OGSA-DAI data service resources
             Web service URLs
           </DQPFactory>
         </perform>
     [Figure: the perform document is sent to an OGSA-DAI data service, whose DQP factory activity issues GetRP calls to the OGSA-DAI data services and creates a DQP data service resource (DSR). Result: the resource ID of the created DSR.]
     The DQP DSR holds:
     • Global schema of imported DBs & analysis services
     • Set of evaluators that can be used
     • Physical DB metadata (used to optimise queries)

  9. DQP query evaluation
     Perform document (outline):
         <perform>
           <OQLQueryStatement>
             <expression> OQL query </expression>
           </OQLQueryStatement>
         </perform>
     [Figure: the perform document reaches the OQLQueryStatement activity, which uses the DQP DSR; Evaluators (QE) send perform requests to OGSA-DAI data services, call the analysis service, and transport data to one another.]
     Result: WebRowSet XML stream
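The shape of the query-submission perform document can be sketched with Python's standard XML library. The element names `perform`, `OQLQueryStatement` and `expression` come from the slide; everything else is an assumption. In particular, `build_oql_perform` is a hypothetical helper, and any namespaces or attributes that a real OGSA-DAI perform document requires are omitted.

```python
import xml.etree.ElementTree as ET


def build_oql_perform(oql_query: str) -> str:
    """Wrap an OQL query in the element structure shown on the slide.

    Hypothetical helper: namespaces and attributes are deliberately left out.
    """
    perform = ET.Element("perform")
    statement = ET.SubElement(perform, "OQLQueryStatement")
    expression = ET.SubElement(statement, "expression")
    expression.text = oql_query
    return ET.tostring(perform, encoding="unicode")


query = (
    "select p.proteinId, Blast(p.sequence) "
    "from protein p, goTerm t "
    "where t.termId = 'GO:0005942' and p.proteinId = t.proteinId"
)
document = build_oql_perform(query)
```

Building the document through an XML API rather than string concatenation means characters such as `<` or `&` in a query are escaped automatically.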

  10. OQL Query Statement activity detail
      [Figure: query → parser → logical optimiser → physical optimiser → partitioner → scheduler → evaluators → query results. The logical and physical optimisers form the single-node optimiser; the partitioner and scheduler form the multi-node optimiser.]

  11. Logical optimisation
      • Consider the query:
          select p.proteinId, Blast(p.sequence)
          from protein p, goTerm t
          where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
      • The plan is expressed in a logical algebra.
      • Multiple equivalent plans are generated.
      [Figure: logical plan]
          reduce
            op_call (Blast)
              join (proteinId)
                reduce
                  scan (protein)
                reduce
                  scan termId=GO:0005942 (goTerm)
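As an illustration of how one logical plan gives rise to several equivalent ones, the sketch below encodes the plan from this slide as a tree and enumerates variants using a single rewrite rule, join commutativity. The `Op` class and `equivalent_plans` function are hypothetical; the slides do not say which rewrite rules OGSA-DQP actually applies.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """One logical operator and its input operators (hypothetical encoding)."""
    name: str
    children: Tuple["Op", ...] = ()


def equivalent_plans(op: Op) -> List[Op]:
    """Enumerate plans reachable by swapping the inputs of join operators."""
    child_options = [equivalent_plans(child) for child in op.children]
    plans = []
    for combo in product(*child_options):
        plans.append(Op(op.name, combo))
        if op.name.startswith("join"):
            # Join commutativity: A join B is equivalent to B join A.
            plans.append(Op(op.name, combo[::-1]))
    return plans


plan = Op("reduce", (
    Op("op_call (Blast)", (
        Op("join (proteinId)", (
            Op("reduce", (Op("scan (protein)"),)),
            Op("reduce", (Op("scan termId=GO:0005942 (goTerm)"),)),
        )),
    )),
))
plans = equivalent_plans(plan)
```

With one join in the plan, this rule alone yields two candidates: the original and the one with the join inputs swapped.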

  12. Physical optimisation
      • The plan is expressed in a physical algebra.
      • The plan is chosen by cost-ranking the equivalent plans.
      [Figure: physical plan]
          reduce
            op_call (Blast)
              hash_join (proteinId)
                reduce
                  table_scan (protein)
                reduce
                  table_scan termId=GO:0005942 (goTerm)
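The physical plan replaces the logical join with hash_join. The following is a generic textbook hash join over in-memory rows, given only to show what that operator does; it is not OGSA-DQP's implementation, and the rows are made up.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Equi-join two row lists on `key` using an in-memory hash table."""
    # Build phase: index one input by the join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    # Probe phase: stream the other input and emit merged matches.
    for row in probe_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}


go_rows = [
    {"proteinId": "P1", "termId": "GO:0005942"},
    {"proteinId": "P3", "termId": "GO:0005942"},
]
protein_rows = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIM"},
    {"proteinId": "P3", "sequence": "MKV"},
]
joined = list(hash_join(go_rows, protein_rows, "proteinId"))
```

Building the table on the smaller input (here the already-filtered goTerm rows) keeps memory use low, which is the kind of choice a cost-based ranking can make.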

  13. Query partitioning
      • The plan is transformed into a parallel algebra (physical operators + data exchange).
      • Exchange operators are placed where data exchange must take place.
      [Figure: partitioned plan]
          reduce
            op_call (Blast)
              exchange
                hash_join (proteinId)
                  exchange
                    reduce
                      table_scan (protein)
                  exchange
                    reduce
                      table_scan termId=GO:0005942 (goTerm)
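One simple way to decide where exchange operators go is to insert one on every plan edge whose two ends run at different sites. The sketch below applies that rule; the rule itself, the `PlanNode` class and the site labels A to D are all assumptions for illustration, not details taken from OGSA-DQP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """A physical operator tagged with the (hypothetical) site that runs it."""
    name: str
    site: str
    children: List["PlanNode"] = field(default_factory=list)


def place_exchanges(node: PlanNode) -> PlanNode:
    """Return a copy of the plan with an exchange on every cross-site edge."""
    new_children = []
    for child in node.children:
        child = place_exchanges(child)
        if child.site != node.site:
            # Data must move between sites here, so wrap the producer.
            child = PlanNode("exchange", child.site, [child])
        new_children.append(child)
    return PlanNode(node.name, node.site, new_children)


def operator_names(node: PlanNode) -> List[str]:
    """Pre-order list of operator names, used to inspect the result."""
    names = [node.name]
    for child in node.children:
        names.extend(operator_names(child))
    return names


physical_plan = PlanNode("reduce", "D", [
    PlanNode("op_call (Blast)", "D", [
        PlanNode("hash_join (proteinId)", "C", [
            PlanNode("reduce", "A", [PlanNode("table_scan (protein)", "A")]),
            PlanNode("reduce", "B", [PlanNode("table_scan (goTerm)", "B")]),
        ]),
    ]),
])
parallel_plan = place_exchanges(physical_plan)
```

With these site labels the result has three exchanges, in the same positions as in the slide's figure: one above the hash join and one on each of its inputs.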

  14. Query scheduling
      • Allocate operators to evaluator nodes.
      [Figure: the partitioned plan annotated with evaluator numbers and the labels “partitioned parallelism” and “pipelined parallelism”]
          reduce, op_call (Blast)        (evaluators 4,5)
            exchange
              hash_join (proteinId)      (evaluator 2)
                exchange
                  reduce
                    table_scan (protein)
                exchange
                  reduce
                    table_scan termId=GO:0005942 (goTerm)
          The two scan partitions carry the evaluator labels 1 and 3.
      Query:
          select p.proteinId, Blast(p.sequence)
          from protein p, goTerm t
          where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
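Partitioned parallelism, as labelled in the figure, means that several evaluators run the same operator on disjoint slices of the data. The sketch below imitates that for the op_call (Blast) step using two local threads in place of evaluators 4 and 5. The round-robin split, the `blast` stub and the input rows are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def blast(sequence: str) -> int:
    """Hypothetical placeholder for the Blast Web service (returns the length)."""
    return len(sequence)


def partition(rows, n):
    """Round-robin split of rows into n disjoint partitions."""
    return [rows[i::n] for i in range(n)]


def evaluate_partition(rows):
    """The work one evaluator does: call Blast for each row in its partition."""
    return [(row["proteinId"], blast(row["sequence"])) for row in rows]


def run_partitioned(rows, n_evaluators=2):
    """Evaluate the partitions concurrently and merge the results."""
    parts = partition(rows, n_evaluators)
    with ThreadPoolExecutor(max_workers=n_evaluators) as pool:
        results = list(pool.map(evaluate_partition, parts))
    return sorted(pair for part in results for pair in part)


rows = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIM"},
    {"proteinId": "P3", "sequence": "MKV"},
    {"proteinId": "P4", "sequence": "AC"},
]
scores = run_partitioned(rows)
```

Pipelined parallelism, the other label in the figure, is the complementary idea: different operators of the plan run at the same time on different evaluators, each consuming rows as the one below produces them.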

  15. Conclusion
      • OGSA-DQP is a service-based distributed query processor that is:
          • Exposed as a service
          • Implemented as an orchestration of services
      • Benefits:
          • Queries are executed in parallel.
          • OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI.
          • Web services can be invoked during query execution, merging data access with data analysis.
