Précis: The Essence of a Query Answer


Presentation Transcript


  1. Précis: The Essence of a Query Answer Georgia Koutrika, Alkis Simitsis, Yannis Ioannidis University of Athens ICDE 2006: 69-78 Presented by: Pavithra Panneerselvam March 28, 2011

  2. “Woody Allen” “Woody Allen was born on December 1, 1935 in Brooklyn, New York, USA. As a director, Woody Allen’s work includes Match Point (2005), Melinda and Melinda (2004), Anything Else (2003). As an actor, Woody Allen’s work includes Hollywood Ending (2002), The Curse of the Jade Scorpion (2001).”

  3. Why Require Précis Answers to Queries? • Exploring the contents of a vast web database and finding useful information is a difficult process, and end users want to achieve their goals with minimal cognitive load. • Précis answers to queries help inexperienced users search a database easily, with no knowledge of a query language. • A précis answer may be incomplete in many ways, but it provides some information about the keyword being searched and offers directions for further searching.

  4. Challenges • Supporting précis queries over relational databases is not straightforward: • Pre-specified queries embedded in user-interface forms are not realistic. • A result consisting of a single relation that combines attributes from numerous relations is not useful for creating précis answers.

  5. Contributions • Précis query framework: • Result schema generation • Result data generation • System Architecture • Customized Query Processing Algorithms • Experiments

  6. Data Model A database schema is represented as a directed graph G(V,E). • Two types of nodes (V): • Relation nodes • Attribute nodes • Two types of edges (E): • Projection edges (∏) • Join edges (J) Example: • THEATRE(tid, name, phone, region) • PLAY(tid, mid, date) • GENRE(mid, genre) • MOVIE(mid, title, year, did) • CAST(mid, aid, role) • ACTOR(aid, aname, blocation, bdate) • DIRECTOR(did, dname, blocation, bdate)
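To make the data model concrete, here is a minimal sketch of the example schema encoded as a directed graph with relation nodes, attribute nodes, projection edges, and join edges. The edge weights shown are illustrative only; in the paper they come from a designer, a stored user profile, or the user at query time.

```python
# A minimal sketch of the database schema graph G(V, E) using plain
# dictionaries. Weights are illustrative, not the paper's actual values.

# Relation nodes and their attribute nodes.
SCHEMA = {
    "THEATRE": ["tid", "name", "phone", "region"],
    "PLAY": ["tid", "mid", "date"],
    "GENRE": ["mid", "genre"],
    "MOVIE": ["mid", "title", "year", "did"],
    "CAST": ["mid", "aid", "role"],
    "ACTOR": ["aid", "aname", "blocation", "bdate"],
    "DIRECTOR": ["did", "dname", "blocation", "bdate"],
}

# Edges: (source node, target node) -> (edge type, weight).
# "P" = projection edge (relation -> attribute), "J" = join edge (relation -> relation).
EDGES = {
    ("MOVIE", "MOVIE.title"): ("P", 1.0),        # heading attribute of MOVIE
    ("MOVIE", "MOVIE.year"): ("P", 0.9),
    ("DIRECTOR", "DIRECTOR.dname"): ("P", 1.0),
    ("DIRECTOR", "DIRECTOR.bdate"): ("P", 0.9),
    ("MOVIE", "DIRECTOR"): ("J", 0.8),           # join on MOVIE.did = DIRECTOR.did
    ("MOVIE", "CAST"): ("J", 0.7),
    ("CAST", "ACTOR"): ("J", 0.7),
    # ... remaining projection and join edges omitted
}
```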

  7. Database Graph

  8. Weights in Database Graph • Weights are assigned to each edge of the graph, indicating the significance of the bond between the nodes.

  9. Weights in Database Graph • Weights may be set by the user at query time using an appropriate user interface. • Example: a user can specify what details they want about the given query. • Sets of weights may be created by a designer targeting different groups of users. • Example: movie critics need more detailed information than casual movie viewers. • Multiple sets of weights corresponding to different user profiles may be stored in the system; using user-specific weights allows generating personalized answers. • Example: a user who always looks for theater locations will have a higher weight on location.

  10. Transfer of Weights over Paths • The weight of a path is a function of the weights of its constituent edges, and it should decrease as the length of the path increases.
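As a hedged illustration of weight transfer, the sketch below combines edge weights by multiplication. This is an assumption rather than the paper's exact formula, but with edge weights in [0, 1] it satisfies the property stated on the slide: the path weight decreases as the path grows longer.

```python
# One natural choice for combining edge weights over a path (an assumption,
# not necessarily the paper's exact formula): multiply them. With edge
# weights in [0, 1], the path weight decreases as edges are added.
from functools import reduce

def path_weight(edge_weights):
    """Combine the weights of a path's constituent edges by multiplication."""
    return reduce(lambda acc, w: acc * w, edge_weights, 1.0)

# Example: ACTOR -> CAST -> MOVIE with two join edges of weight 0.7 each.
print(path_weight([0.7, 0.7]))  # ~0.49, weaker than either edge alone
```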

  11. Query Model • Given a database D and a précis query Q = {k1, k2, ..., km}, the result of applying Q on D is a database D' that satisfies the following conditions: • The relations in D' are a subset of the relations in D. • The attributes of a relation Ri in D' are a subset of the attributes of Ri in D. • The tuples of a relation Ri in D' are a subset of the tuples of Ri in D (when projected on the selected attributes). • The result database D' is generated by join queries starting from the relations where a keyword of Q appears and then transitively expanding over the database schema. • The final set of relations, attributes, and tuples in D' is determined by constraints.
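The kind of join query implied by this transitive expansion can be pictured as follows for Q = {'Woody Allen'} matching DIRECTOR.dname. The SQL is only a sketch over the example schema; the slides do not show the statements the system actually generates.

```python
# Hypothetical join query: start at DIRECTOR (where the keyword occurs)
# and transitively expand to MOVIE over the foreign key did.
EXPAND_DIRECTOR_TO_MOVIE = """
SELECT d.dname, d.bdate, d.blocation, m.title, m.year
FROM   DIRECTOR d
JOIN   MOVIE m ON m.did = d.did
WHERE  d.dname = 'Woody Allen'
"""
```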

  12. Constraints • Degree constraints: bound the number of attributes and relations in the result database D'. • Maximum number of attributes in D' • Minimum weight of a projection path in the database schema graph • Cardinality constraints: bound the number of tuples in the result database D'. • Maximum number of tuples in D' • Maximum number of tuples per relation in D' • At least one constraint in each category must be provided to produce a proper result for query Q.
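A minimal sketch of how the two constraint categories could be represented in code; the class and field names are illustrative, not part of the paper.

```python
# Illustrative containers for the two constraint categories; names are
# assumptions, not the paper's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DegreeConstraint:
    max_attributes: Optional[int] = None      # max number of attributes in D'
    min_path_weight: Optional[float] = None   # min weight of a projection path (d)

@dataclass
class CardinalityConstraint:
    max_total_tuples: Optional[int] = None         # max number of tuples in D'
    max_tuples_per_relation: Optional[int] = None  # max tuples per relation (c)

# The running example from the next slide: d = 0.9, c = 3.
degree = DegreeConstraint(min_path_weight=0.9)
cardinality = CardinalityConstraint(max_tuples_per_relation=3)
```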

  13. System Architecture • Running example: Q = {'Woody Allen'} • Only projections with weight >= 0.9 should be considered (degree constraint), i.e., d = 0.9 • Up to three tuples should be retrieved per relation, i.e., c = 3 • Pipeline: Q = {k1, k2, ..., km} → Inverted Index → Result Schema Generator (uses d) → Result Database Generator (uses c) → Translator (uses template data)

  14. Inverted Index • Input: Q = {k1, k2, ..., km} • Output: for each ki, the set of occurrences {(Rj, Alj, Tidslj)}, i.e., the relations, attributes, and tuple IDs in which ki appears. • Example: • Input: Woody Allen • Output: Actor-Aname-A1234, Director-Dname-A1234 • The authors built their own inverted index that allows efficient retrieval of all occurrences of a token. • Running example: Q = {'Woody Allen'}; only projections with weight >= 0.9 are considered (degree constraint), i.e., d = 0.9; up to three tuples are retrieved per relation, i.e., c = 3.
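A minimal sketch of the inverted-index behaviour described on this slide: each token maps to the (relation, attribute, tuple id) locations where it occurs, and a multi-word keyword is resolved by intersecting the occurrence sets of its tokens. The tuple identifiers and the index layout are assumptions for illustration.

```python
# Each token maps to the (relation, attribute, tuple id) locations in which
# it occurs. Tuple identifiers here are illustrative.
from collections import defaultdict

inverted_index = defaultdict(list)

def index_value(relation, attribute, tuple_id, value):
    """Register every token of a stored value under its location."""
    for token in value.lower().split():
        inverted_index[token].append((relation, attribute, tuple_id))

index_value("ACTOR", "aname", "A1234", "Woody Allen")
index_value("DIRECTOR", "dname", "D5678", "Woody Allen")

def lookup(keyword):
    """Occurrences shared by every token of a (possibly multi-word) keyword."""
    token_hits = [set(inverted_index[t]) for t in keyword.lower().split()]
    return set.intersection(*token_hits) if token_hits else set()

# Returns both the ACTOR.aname and DIRECTOR.dname occurrences.
print(lookup("Woody Allen"))
```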

  15. Result Schema Generator • Responsible for finding which parts of the database may contain information related to the query. • Result Schema Generator Algorithm • Input: database schema graph G(V,E), degree criterion d • Output: result schema graph G'(V',E')
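The general idea behind the schema expansion can be sketched as a best-first search from the relations where the keyword occurs, keeping every node reachable by a path whose weight still satisfies the degree criterion d. The multiplicative path weight and the graph encoding are carried over from the earlier sketches and are assumptions, not the paper's exact algorithm.

```python
# Best-first expansion sketch: keep nodes whose strongest path from a start
# relation has weight >= d. Assumes multiplicative path weights.
import heapq

def result_schema(graph, start_relations, d):
    """graph: node -> list of (neighbor, edge_weight); returns node -> best path weight."""
    best = {r: 1.0 for r in start_relations}
    frontier = [(-1.0, r) for r in start_relations]   # max-heap via negated weights
    heapq.heapify(frontier)
    while frontier:
        neg_w, node = heapq.heappop(frontier)
        w = -neg_w
        if w < best.get(node, 0.0):
            continue                                  # stale heap entry
        for neighbor, edge_w in graph.get(node, []):
            pw = w * edge_w                           # weight transferred over the path
            if pw >= d and pw > best.get(neighbor, 0.0):
                best[neighbor] = pw
                heapq.heappush(frontier, (-pw, neighbor))
    return best
```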

  16. Result Schema Generator • Running example: Q = {'Woody Allen'} • Only projections with weight >= 0.9 are considered (degree constraint), i.e., d = 0.9 • Up to three tuples are retrieved per relation, i.e., c = 3

  17. Result Database Generator • Responsible for producing the result database corresponding to the generated schema. • Input: • Relation-attribute-tuple-ID triples (from the inverted index) • Result schema graph • Cardinality criterion c • Output: • Result database D'
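A rough sketch of the generator's contract, under the same assumptions as before: walk outward from the tuples that matched the keyword, fetch joining tuples along each edge of the result schema graph, and stop filling a relation once the cardinality criterion c is reached. The helper fetch_join stands in for the actual join queries against the database, which the slides do not show.

```python
# Illustrative result-data generation; names are assumptions, fetch_join
# stands in for the join queries run against the database.

def generate_result_database(result_schema_graph, seed_tuples, fetch_join, c):
    """
    result_schema_graph: relation -> list of relations it joins with in G'
    seed_tuples: relation -> tuple ids where the keyword occurred
    fetch_join: callable(src_rel, src_tuple_ids, dst_rel) -> dst tuple ids
    c: maximum number of tuples per relation in D' (cardinality criterion)
    """
    result = {rel: list(ids)[:c] for rel, ids in seed_tuples.items()}
    frontier = list(seed_tuples)
    expanded = set()
    while frontier:
        rel = frontier.pop()
        if rel in expanded:
            continue
        expanded.add(rel)
        for joined in result_schema_graph.get(rel, []):
            existing = result.setdefault(joined, [])
            for tid in fetch_join(rel, result[rel], joined):
                if len(existing) >= c:
                    break                      # cardinality constraint reached
                if tid not in existing:
                    existing.append(tid)
            if existing and joined not in expanded:
                frontier.append(joined)
    return result                              # relation -> tuple ids of D'
```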

  18. Result Database Generator Algorithm

  19. Result Database Generator Algorithm - Result • Note: the figure only shows the result graph with Woody Allen as a director.

  20. Result Database Translator • To describe the semantics of a relation in natural language, a relation is considered to have both a conceptual meaning and a physical meaning. • The conceptual meaning of a relation is obtained from the name of the relation. Example: the relation name “Movie”. • The physical meaning of a relation is obtained from one of its attributes, called the “heading attribute”. For the Movie relation, it is the attribute “Title”. (By default, this attribute has weight 1.)

  21. Result Database Translator – Template Labels • A template label (u, z) is assigned to each edge of the database schema graph. • The edge that connects an attribute with a relation carries a label that signifies the relationship between that attribute and the heading attribute; the same applies to join edges. • A template label comprises the following elements: • Label ID: Lid • Name of the starting node: l(u) • Name of the ending node: l(z) • Expressions. The result of applying a label has the form: expression1 + l(u) + expression2 + l(z) + expression3 • These labels are produced for all relations in the final query result.

  22. Result Database Translator - Example • The template clauses: • Director relation: • @DNAME + “ was born on ” + @BDATE + “ in ” + @BLOCATION • Movie relation • @TITLE + “ (” + @YEAR + “)” • Joins of Director and a Movie • “As a director” + @DNAME + “’s work includes” + MOVIE_LIST DEFINE MOVIE_LIST as [i<arityOf(@TITLE)] {@TITLE[$i$]+“ (”+@YEAR[$i$]+“),”} [i=arityOf(@TITLE)] {@TITLE[$i$]+“ (”+@YEAR[$i$]+“).”}
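To make the template mechanics concrete, here is a toy rendering of the clauses above. The tuple values come from the running example; the template syntax (including DEFINE MOVIE_LIST) is the paper's, and this Python rendition only approximates it.

```python
# Toy rendering of the director and movie-list templates over example data.
director = {"DNAME": "Woody Allen", "BDATE": "December 1, 1935",
            "BLOCATION": "Brooklyn, New York, USA"}
movies = [{"TITLE": "Match Point", "YEAR": 2005},
          {"TITLE": "Melinda and Melinda", "YEAR": 2004},
          {"TITLE": "Anything Else", "YEAR": 2003}]

def movie_list(movies):
    """Mimic DEFINE MOVIE_LIST: comma-separated titles with years, ending in a period."""
    return ", ".join(f"{m['TITLE']} ({m['YEAR']})" for m in movies) + "."

sentence = (f"{director['DNAME']} was born on {director['BDATE']} "
            f"in {director['BLOCATION']}. As a director, {director['DNAME']}'s "
            f"work includes {movie_list(movies)}")
print(sentence)
# Woody Allen was born on December 1, 1935 in Brooklyn, New York, USA.
# As a director, Woody Allen's work includes Match Point (2005),
# Melinda and Melinda (2004), Anything Else (2003).
```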

  23. Final Result of Precis Query “Woody Allen was born on December 1, 1935 in Brooklyn, New York, USA. As a director, Woody Allen’s work includes Match Point (2005), Melinda and Melinda (2004), Anything Else (2003). Match Point is Drama, Thriller. Melinda and Melinda is Comedy, Drama. Anything Else is Comedy, Romance.”

  24. Experiments: Evaluation of the Result Schema Generator • System: a prototype implemented on top of Oracle 9i R2. • Data: the Internet Movie Database, with information about over 340,000 films. • Setup: the degree d was set to the maximum number of attributes projected in the answer, and 20 randomly generated sets of weights for the edges of the database schema graph were used. • Inference: the execution time of the Result Schema Generator is very small even for large values of d and can be considered negligible.

  25. Experiments: Evaluation of the Result Database Generator • Cost models were compared with the measured algorithm results. • Inference: time increases linearly with the number of relations and the number of tuples per relation; the Result Database Generator is the most time-consuming part of the system.

  26. Conclusion In this paper, the authors: • Introduced the concept of précis queries. • Presented a framework that supports précis queries, together with the corresponding algorithms. • Presented a set of results evaluating each part of the implemented system and showing the potential of this work.

  27. Future Work • Investigate the possibility of having weights on data values as well. • Extend the translator's capabilities. • Provide a graphical tool intended for use by a domain expert. • Further optimize the whole process.

  28. References • Koutrika, G., Ioannidis, Y. (2005). Personalized Queries under a Generalized Preference Model. In ICDE’05. • Koutrika, G., Simitsis, A., Ioannidis, Y. (2005). Précis: The Essence of a Query Answer. Extended Version. url: http://www.dblab.ntua.gr/~asimi/publications/KoSI05.pdf • Agrawal, S., Chaudhuri, S., Das, G. (2002). DBXplorer: A System for Keyword-Based Search over Relational Databases. In ICDE’02, 5-16.

  29. Thank You! Questions?
