
Answering Arbitrary Conjunctive Queries over Incomplete Data Stream Histories

Answering Arbitrary Conjunctive Queries over Incomplete Data Stream Histories. Alasdair J.G. Gray 1 M. Howard Williams 1 Werner Nutt 2 1 School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK. 2 Faculty of Computer Science,


Presentation Transcript


  1. Answering Arbitrary Conjunctive Queries over Incomplete Data Stream Histories Alasdair J.G. Gray1 M. Howard Williams1 Werner Nutt2 1School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK. 2Faculty of Computer Science, Free University of Bozen-Bolzano, Italy. 5th December 2006

  2. Overview • Publishing distributed data streams • Incomplete stream histories • Answering conjunctive queries • Conclusions and Future Work A.J.G. Gray, M.H. Williams, and W. Nutt iiWAS2006

  3. Streams of Data • Main sources: sensors • Characteristics: unbounded, append only, frequency • Managed by: sensor networks, network/Grid monitoring, ubiquitous/pervasive computing environments [Figure label: Reading]

  4. R-GMA as a Data Publishing Service • Grid Monitoring and Information Service • Strategy: • Primary Producers (PP) and Secondary Producers (SP) register in the Registry using the global schema • Consumer issues queries over the agreed global schema • Mediator translates the global query into local queries over the sources [Diagram: two Consumers, a Secondary Producer, the Registry, and three Primary Producers]

  5. Data Stream Histories • Three types of queries • Primary Producers publish stream of data • Secondary Producer • Collects streams • Stores history in database • Only stores finite amount • Consumer queries stream history [Diagram: a Consumer, a Secondary Producer with a Store, and two Primary Producers]

  6. Problem of Incompleteness • Distribution: streams published by distributed sources • Network failures, lost data • Configuration errors • Finite memory: Secondary Producers (SPs) store a finite amount of history • Each SP has a Retention Period • Old tuples discarded • Different SPs may store similar data but with different history length, frequency, etc.
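The retention-period behaviour described on this slide can be sketched as follows. This is only an illustration, not R-GMA's implementation: the class name `HistoryStore`, its methods, and the use of integer timestamps arriving in non-decreasing order are all assumptions made for the example.

```python
from collections import deque


class HistoryStore:
    """Hypothetical Secondary Producer store with a fixed retention period."""

    def __init__(self, retention_period):
        self.retention_period = retention_period
        self._tuples = deque()  # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        # Assumes timestamps arrive in non-decreasing order.
        self._tuples.append((timestamp, payload))
        cutoff = timestamp - self.retention_period
        # Discard tuples that are older than the retention period.
        while self._tuples and self._tuples[0][0] < cutoff:
            self._tuples.popleft()

    def history(self):
        return list(self._tuples)


store = HistoryStore(retention_period=10)
for ts, payload in [(0, "a"), (5, "b"), (12, "c"), (20, "d")]:
    store.add(ts, payload)
print(store.history())
```

Two stores with different retention periods fed the same stream would end up holding different histories, which is one source of the incompleteness listed above.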

  7. Representing Missing Data • We assume that: • Producer can detect when there are tuples missing, e.g. • If PP produces at a fixed frequency • Sensor sequence number • Stream made up of channels • Channel: tuples agree on key values • For each channel, the missing tuples can be represented by a gap consisting of [start, end]
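A minimal sketch of the sequence-number case: tuples are grouped into channels by their key, and each run of missing sequence numbers becomes one [start, end] gap. The function name `find_gaps`, the `(key, seq)` tuple layout, and the use of sequence numbers rather than timestamps as gap bounds are assumptions for illustration only.

```python
from collections import defaultdict


def find_gaps(tuples):
    """Map each channel key to a list of inclusive (start, end) gaps
    in its sequence numbers."""
    seqs_by_channel = defaultdict(set)
    for key, seq in tuples:
        seqs_by_channel[key].add(seq)

    gaps = {}
    for key, seqs in seqs_by_channel.items():
        ordered = sorted(seqs)
        channel_gaps = []
        # A jump of more than 1 between neighbours means tuples are missing.
        for prev, cur in zip(ordered, ordered[1:]):
            if cur - prev > 1:
                channel_gaps.append((prev + 1, cur - 1))
        gaps[key] = channel_gaps
    return gaps


stream = [("ce1", 1), ("ce1", 2), ("ce1", 5), ("ce2", 1), ("ce2", 2), ("ce1", 6)]
print(find_gaps(stream))
```

Note that this sketch only finds gaps between received tuples; missing tuples before the first or after the last received one would need extra information, such as an expected frequency.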

  8. Query Answers • Query q can have 3 types of answer tuples: • Certain Positive Answer: tuple would be returned over complete data set • Certain Negative Answer: tuple would not be returned over complete data set • Possible Answer: tuple may be returned over complete data set

  9. Example: Grid Monitoring Query 1 • Machines with more than 5 running jobs in last 24 hours • q1(CEId) ← compEle(CEId, fCPUs, rJobs, ts) /\ rJobs > 5 /\ [hist = 24hrs]
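For a single-stream selection like q1, the three answer types can be illustrated with the sketch below. It is one plausible reading, not the authors' algorithm: it assumes the recorded gaps describe everything that is missing from a machine's channel, that gaps are inclusive timestamp intervals, and the names `classify`, `readings`, and `gaps` are hypothetical.

```python
def classify(readings, gaps, window_start, window_end, threshold=5):
    """Classify one machine for a q1-style query.

    readings: stored (timestamp, rJobs) pairs for the machine's channel.
    gaps: inclusive (start, end) timestamp intervals known to be missing.
    """
    in_window = [r for ts, r in readings if window_start <= ts <= window_end]
    # A stored tuple that satisfies the condition is also in the complete data.
    if any(r > threshold for r in in_window):
        return "certain positive"
    # Otherwise a missing tuple inside the window might have satisfied it.
    gap_overlaps = any(s <= window_end and e >= window_start for s, e in gaps)
    return "possible" if gap_overlaps else "certain negative"


print(classify([(10, 7)], [], 0, 24))
print(classify([(10, 3)], [(12, 14)], 0, 24))
print(classify([(10, 3)], [(30, 40)], 0, 24))
```

The last call shows why gap positions matter: a gap outside the history window cannot change the answer, so the machine is a certain negative.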

  10. Example: Grid Monitoring Query 2 • Machines with more than 5 running jobs in last 12 hours and are linked to a storage element • q2(CEId) ← compEle(CEId, fCPUs, rJobs, ts) /\ CESEBind(CEId, SEId) /\ rJobs > 5 /\ [history = 12hrs] • Query answer: (2) • Query can be answered completely despite the missing data.

  11. Example: Grid Monitoring Query 3 • Machines with more than 5 running jobs in last 24 hours and are linked to a storage element with a load greater than 75 • q3(CEId) ← compEle(CEId, fCPUs, rJobs, ts1) /\ storEle(SEId, cIO, ts2) /\ CESEBind(CEId, SEId) /\ rJobs > 5 /\ cIO > 75 /\ [history = 24hrs]

  12. Co-operative Answer • Gaps affected answer to q3 • Return information about the relevant gaps • Allows users to reason about the effects of the incompleteness • Very unlikely that there were any answers to q3
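One simple way to pick out the "relevant gaps" for a co-operative answer is to keep only the gaps that overlap the query's history window, clipped to that window. The sketch below assumes gaps are inclusive timestamp intervals stored per channel; `relevant_gaps` and its parameters are hypothetical names, and the authors' algorithm may select gaps differently.

```python
def relevant_gaps(gaps_by_channel, window_start, window_end):
    """Return, per channel, the gaps overlapping the query window,
    clipped to the window bounds."""
    report = {}
    for channel, gaps in gaps_by_channel.items():
        clipped = [
            (max(s, window_start), min(e, window_end))
            for s, e in gaps
            if s <= window_end and e >= window_start
        ]
        if clipped:
            report[channel] = clipped
    return report


gaps_by_channel = {"ce1": [(0, 5), (30, 40)], "se1": [(50, 60)]}
print(relevant_gaps(gaps_by_channel, 3, 35))
```

Returning the clipped intervals alongside the answer tuples lets a user judge how much of the window was unobserved, for example concluding that a short gap makes additional answers unlikely.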

  13. Conclusions • Data streams are often incomplete • Stored histories of the stream will be incomplete • Presented a model for representing incompleteness • Developed algorithms for: • Answering conjunctive queries • Providing meta-data about the answer

  14. Future Work • Investigate answering queries under different assumptions • Extend expressivity of queries to allow aggregate functions • Develop an implementation by extending R-GMA’s Mediator
