
Keyword Searching and Browsing in Databases using BANKS

Keyword Searching and Browsing in Databases using BANKS. Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan. 18th International Conference on Data Engineering (ICDE'02), 2002. Kushal Bansal.


Presentation Transcript


  1. Keyword Searching and Browsing in Databases using BANKS • Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan • 18th International Conference on Data Engineering (ICDE'02), 2002 • Kushal Bansal

  2. Outline • Introduction • Database and Query Model • Searching for the Best Answers • Interface and Templates of BANKS System • Experiment and Performance • Conclusion

  3. Introduction • Web search engines make use of unstructured queries • Users have to type in keywords and follow hyperlinks • Relational databases use structured query languages like SQL • Users need to know the schema of the database • Difficult for naïve users • For data stored in databases, plain keyword-based techniques are not very useful • Data is often split across tables due to normalization

  4. Introduction • BANKS (Browsing And Keyword Searching) • A system that provides a search-engine-style interface for searching and browsing relational databases. • Allows interaction with controls on the displayed results. • No query language or programming required.

  5. Outline • Introduction • Database and Query Model • Informal Model • Formal Model • Query and Answer Model • Searching for the Best Answers • Interface and Templates of BANKS System • Experiment and Performance • Conclusion

  6. Database and Query Model Informal Model 1. Each database is modeled as a directed graph. 2. Each tuple in the database is modeled as a node in the graph. 3. Every foreign key to primary key link is modeled as a directed edge.

  7. Database and Query Model Informal Model 4. An answer to a query is a subgraph connecting nodes matching the keywords. 5. The importance of a link depends upon the relations it connects and on its semantics
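The informal model above can be sketched as a small graph-building routine. This is a minimal illustration, not the BANKS implementation: the toy relations (Paper, Author, Writes), the tuple keys, and the function name build_graph are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy database: (relation, key) -> attribute values.
tuples = {
    ("Paper", "p1"): {"title": "Mining Surprising Patterns"},
    ("Author", "a1"): {"name": "Soumen"},
    ("Author", "a2"): {"name": "Sunita"},
    ("Writes", "w1"): {"author": "a1", "paper": "p1"},
    ("Writes", "w2"): {"author": "a2", "paper": "p1"},
}

# Hypothetical foreign keys: (referencing relation, column, referenced relation).
foreign_keys = [("Writes", "author", "Author"), ("Writes", "paper", "Paper")]


def build_graph(tuples, foreign_keys):
    """Return adjacency sets: one node per tuple, one directed edge per
    foreign-key reference (from the referencing tuple to the referenced one)."""
    edges = defaultdict(set)
    for (rel, key), attrs in tuples.items():
        for src_rel, column, dst_rel in foreign_keys:
            if rel == src_rel and column in attrs:
                target = (dst_rel, attrs[column])
                if target in tuples:  # ignore dangling references
                    edges[(rel, key)].add(target)
    return edges


graph = build_graph(tuples, foreign_keys)
```

In this toy graph each Writes tuple points at one Author and one Paper, so an answer connecting two authors would pass through the shared paper.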

  8. Database and Query Model • The Schema

  9. Database and Query Model • Fragment of the Database

  10. Database and Query Model Formal Database Model • Node Weight • Each node u in the graph is assigned a weight N(u) • Node weight is also known as the node prestige • N (u) = Indegree of the node • Node score N = Root node weight + Sum of leaf node weights
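As a rough sketch of the node weights on this slide, the following computes N(u) as the in-degree of each node and then the node score of a tree as the root weight plus the leaf weights, exactly as stated above. The graph, node names, and function names are hypothetical.

```python
def indegree_weights(edges):
    """N(u) = number of edges pointing at u (the node prestige on the slide)."""
    weights = {u: 0 for u in edges}
    for targets in edges.values():
        for v in targets:
            weights[v] = weights.get(v, 0) + 1
    return weights


def node_score(root, leaves, weights):
    """Node score of an answer tree: root weight plus the sum of leaf weights."""
    return weights[root] + sum(weights[leaf] for leaf in leaves)


# Hypothetical graph: two Writes tuples reference the same paper p1.
edges = {"w1": {"a1", "p1"}, "w2": {"a2", "p1"}, "a1": set(), "a2": set(), "p1": set()}
weights = indegree_weights(edges)
score = node_score("p1", ["a1", "a2"], weights)
```

Here p1 has in-degree 2 and each author has in-degree 1, so the tree rooted at p1 with both authors as leaves gets a node score of 4.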

  11. Database and Query Model • Formal Database Model • Edge Weights • The weight w(u,v) of the directed edge (u,v) is given by • w(u,v) = s(R(u),R(v)) if (u,v) exists but (v,u) does not • w(u,v) = IN(u) · s(R(v),R(u)) if (v,u) exists but (u,v) does not • w(u,v) = min[ s(R(u),R(v)), IN(u) · s(R(v),R(u)) ] if both exist
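The three cases of the edge-weight rule can be written as one small function. This is a sketch under assumptions: R(u) is taken to be the relation a tuple belongs to, s is a hypothetical lookup table of schema-level values, and IN(u) is read as the in-degree of u as written on the slide.

```python
def edge_weight(u, v, edges, rel, s, indegree):
    """Weight of the directed edge (u, v), following the three cases on the slide."""
    forward = v in edges.get(u, set())   # a database link u -> v exists
    backward = u in edges.get(v, set())  # a database link v -> u exists
    candidates = []
    if forward:
        candidates.append(s[(rel[u], rel[v])])
    if backward:
        candidates.append(indegree[u] * s[(rel[v], rel[u])])
    if not candidates:
        raise ValueError("no link between u and v in either direction")
    return min(candidates)  # with both links present this is the min[...] case


# Hypothetical data: two Writes tuples reference paper p1.
edges = {"w1": {"p1"}, "w2": {"p1"}}
rel = {"w1": "Writes", "w2": "Writes", "p1": "Paper"}
s = {("Writes", "Paper"): 1.0}
indegree = {"w1": 0, "w2": 0, "p1": 2}

forward_w = edge_weight("w1", "p1", edges, rel, s, indegree)
backward_w = edge_weight("p1", "w1", edges, rel, s, indegree)
```

The backward edge from p1 costs more than the forward edge because p1 is referenced by two tuples, which penalizes paths that fan out through heavily referenced nodes.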

  12. Database and Query Model • Formal Database Model • Edge Weights • Escore(e) of an edge = w(e) / w_min, where w_min is the minimum edge weight • Overall Escore = 1 / (1 + ∑ Escore(e)) • The overall Escore is in the range [0,1]
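A short worked example of the edge score, assuming w_min is the smallest weight among the edges passed in (the function name and the sample weights are made up for illustration):

```python
def edge_scores(weights):
    """Normalize each edge weight by the minimum, then fold into one score."""
    w_min = min(weights)
    per_edge = [w / w_min for w in weights]
    overall = 1.0 / (1.0 + sum(per_edge))
    return per_edge, overall


# Hypothetical answer tree with two edges of weight 1.0 and 2.0.
per_edge, overall = edge_scores([1.0, 2.0])
```

With these weights the per-edge scores are 1.0 and 2.0 and the overall score is 1 / (1 + 3) = 0.25. Trees with more or heavier edges score lower.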

  13. Database and Query Model Formal Database Model • Overall relevance score combines the node score N and the edge score E • Using weighting factor λ • Additive: (1 − λ) E + λ N • Multiplicative: E × N^λ
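The two combination rules can be compared directly. In this sketch the weighting factor is called lam, and the input scores are hypothetical values already scaled to [0,1]:

```python
def combine_additive(e, n, lam):
    """Additive combination: (1 - lam) * E + lam * N."""
    return (1.0 - lam) * e + lam * n


def combine_multiplicative(e, n, lam):
    """Multiplicative combination: E * N ** lam."""
    return e * n ** lam


# Hypothetical edge score E = 0.25 and node score N = 0.5.
additive = combine_additive(0.25, 0.5, 0.2)
multiplicative = combine_multiplicative(0.25, 0.5, 0.2)
```

In both forms a larger weighting factor gives the node score more influence; with a factor of 0 the result is the edge score alone.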

  14. Database and Query Model • Query and Answer Model • Query • A query consists of search terms t1, t2, …, tn • For each term ti we find the set of nodes Si that are relevant to ti; S = {S1, S2, …, Sn} • Answer Model • An answer to a query is a rooted directed tree connecting keyword nodes • Relevance score of an answer tree • Combines the relevance scores of its nodes and its edge weights
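Finding the node sets S1 … Sn can be sketched as a simple scan. The assumption here is that a node is "relevant" to a term when the term appears as a whole word in one of its attribute values; a real system would more likely use an index, and the data below is hypothetical.

```python
def keyword_node_sets(terms, tuples):
    """Return one set of matching nodes per search term (S1, S2, ..., Sn)."""
    node_sets = []
    for term in terms:
        needle = term.lower()
        matches = {
            node
            for node, attrs in tuples.items()
            if any(needle in str(value).lower().split() for value in attrs.values())
        }
        node_sets.append(matches)
    return node_sets


# Hypothetical tuples keyed by (relation, key).
tuples = {
    ("Author", "a1"): {"name": "Soumen Chakrabarti"},
    ("Author", "a2"): {"name": "Sunita Sarawagi"},
    ("Paper", "p1"): {"title": "Mining Surprising Patterns"},
}
node_sets = keyword_node_sets(["soumen", "sunita"], tuples)
```

Each set then supplies the leaf candidates for one keyword when answer trees are assembled.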

  15. Database and Query Model • Result of query “soumen and sunita”

  16. Outline • Introduction • Database and Query Model • Searching for the best answers • Backward expanding search algorithm • Interface and Templates of BANKS • Experiment and Performance • Conclusion

  17. Searching for the Best Answer • Backward expanding search algorithm • Assumes that the graph of the database fits in memory • Starts at the leaf nodes, each containing a query keyword • Runs a concurrent single-source shortest-path algorithm from each such node • Traverses the graph edges in the reverse direction • A vertex common to the backward paths from every keyword identifies the root of an answer tree • The tree formed is a connection tree, and its root is the information node.
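The idea on this slide can be approximated with a simplified sketch: run Dijkstra's algorithm from each keyword node over reversed edges, then treat every vertex reached from all keywords as a candidate root. Unlike the algorithm described above, this version runs each search to completion rather than interleaving them, and it ranks roots by total path cost only. The graph and weights are hypothetical.

```python
import heapq
from collections import defaultdict


def backward_distances(source, reverse_adj):
    """Dijkstra from a keyword node, following edges in the reverse direction."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for prev, w in reverse_adj.get(node, []):
            nd = d + w
            if nd < dist.get(prev, float("inf")):
                dist[prev] = nd
                heapq.heappush(heap, (nd, prev))
    return dist


def candidate_roots(keyword_nodes, edges):
    """Vertices that can reach every keyword node, sorted by total path cost."""
    reverse_adj = defaultdict(list)
    for (u, v), w in edges.items():
        reverse_adj[v].append((u, w))
    per_keyword = [backward_distances(k, reverse_adj) for k in keyword_nodes]
    common = set(per_keyword[0])
    for dist in per_keyword[1:]:
        common &= set(dist)
    return sorted((sum(d[r] for d in per_keyword), r) for r in common)


# Hypothetical weighted graph: forward edges cost 1.0, backward edges cost 2.0.
edges = {
    ("w1", "a1"): 1.0, ("w1", "p1"): 1.0,
    ("w2", "a2"): 1.0, ("w2", "p1"): 1.0,
    ("p1", "w1"): 2.0, ("p1", "w2"): 2.0,
}
roots = candidate_roots(["a1", "a2"], edges)
```

On this toy graph the candidate roots are w1, w2, and p1. Path cost alone does not favor p1, which is one reason the relevance score on the earlier slides also takes node weights into account.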

  18. Outline • Introduction • Database and Query Model • Searching for the best answers • Interface and Templates of BANKS • Experiment and Performance • Conclusion

  19. Interface • BANKS system provides • A rich interface to browse data stored in a relational database • Schema browsing and data browsing • Hyperlink to the referenced tuple • Columns can be projected away (dropped) • Selections can be imposed on any column • Tuples can be sorted by a specified column

  20. Templates • BANKS system provides several predefined templates • Cross-tabs • Group by • Folder views • Graphical interface for display as a bar, line, or pie chart

  21. Outline • Introduction • Database and Query Model • Searching for the best answers • Interface and Templates of BANKS • Experiment and Performance • Conclusion

  22. Experiment and Performance • For each parameter setting, computed the absolute difference between the rank of each ideal answer and the rank at which that answer was actually returned • The sum of the rank differences gives the raw error score • Setting λ = 0.2 with log scaling of edge weights did best, with an error score of 0.0
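The raw error score can be illustrated with a few lines of arithmetic. The answer identifiers and rank values below are invented for the example and are not results from the experiment.

```python
def raw_error_score(ideal_rank, observed_rank):
    """Sum of absolute rank differences between ideal and observed positions."""
    return sum(abs(observed_rank[a] - ideal_rank[a]) for a in ideal_rank)


# Hypothetical ranks for three answers under one parameter setting.
ideal_rank = {"A": 1, "B": 2, "C": 3}
observed_rank = {"A": 1, "B": 3, "C": 2}
error = raw_error_score(ideal_rank, observed_rank)
```

Here answers B and C are swapped, so the raw error is 0 + 1 + 1 = 2. A setting that returns every ideal answer at its ideal rank scores 0.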

  23. Error scores vs. parameter choices

  24. Outline • Introduction • Database and Query Model • Searching for the best answers • Interface and Templates of BANKS • Experiment and Performance • Conclusion

  25. Conclusion • BANKS system • Provides an integrated browsing and keyword querying system for relational databases • Allows users with no knowledge of database systems or schema to query and browse relational database with ease • Reduces the effort involved in publishing relational data on the web and makes it searchable.
