
EasyQuerier: A Keyword Interface in Web Database Integration System

Xian Li¹, Weiyi Meng², Xiaofeng Meng¹ (¹ WAMDM Lab, RUC; ² SUNY Binghamton)


Presentation Transcript


  1. EasyQuerier: A Keyword Interface in Web Database Integration System
     Xian Li¹, Weiyi Meng², Xiaofeng Meng¹
     ¹ WAMDM Lab, RUC; ² SUNY Binghamton

  2. Traditional Integrated Interface
     [Diagram labels: Domain list (Manually) → Q (Manually) → Integrated interface of Job]

  3. What does EasyQuerier look like
     [Diagram labels: EasyQuerier; Q (Manually); Automatically; Q; ……; Automatically; Q; Integrated interface of Job]

  4. New Features of EasyQuerier
     • Automatic domain mapping
       • Users do not need to select a domain from a long list
     • More flexible keyword queries
       • Different kinds of data types
         • Text, numeric, currency, date
       • More logical relations covered
         • “and”, “or”, “between…and”
       • Q1: New York or Washington, education, $2000-$3000
         • U1 = {New York, Washington}, logic: or
         • U2 = {education}
         • U3 = {$2000, $3000}, logic: range
     • Automatic query translation
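
The Q1 example above can be sketched in code. The slides do not say how EasyQuerier splits a query, so the following is only a minimal illustration under stated assumptions: units are separated by commas, a range is written either as `X-Y` or `between X and Y` with numeric or currency endpoints, and the names `QueryUnit` and `parse_query` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class QueryUnit:
    values: list[str]
    logic: str  # "or", "and", "range", or "single"


# Assumed range syntax: "$2000-$3000" or "between $2000 and $3000".
RANGE_RE = re.compile(
    r"(?:between\s+)?(\$?\d[\d.]*)\s*(?:-|and)\s*(\$?\d[\d.]*)",
    re.IGNORECASE,
)


def parse_query(query: str) -> list[QueryUnit]:
    """Split a keyword query into units (assumption: commas separate units)."""
    units: list[QueryUnit] = []
    for part in (p.strip() for p in query.split(",")):
        if not part:
            continue
        m = RANGE_RE.fullmatch(part)
        if m:
            units.append(QueryUnit([m.group(1), m.group(2)], "range"))
            continue
        for op in ("or", "and"):
            pieces = re.split(rf"\s+{op}\s+", part, flags=re.IGNORECASE)
            if len(pieces) > 1:
                units.append(QueryUnit([p.strip() for p in pieces], op))
                break
        else:
            # No logical connective found: a single-value unit.
            units.append(QueryUnit([part], "single"))
    return units


if __name__ == "__main__":
    for unit in parse_query("New York or Washington, education, $2000-$3000"):
        print(unit)
```

Because commas are used as unit separators here, amounts written with thousands separators (such as `$2,000`) would need a different tokenizer.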

  5. EasyQuerier: overview
     • Part 1: Domain mapping
       • Collect domain knowledge from the candidate domains
       • Similarity-based domain mapping strategy
     • Part 2: Query translation
       • Partial keyword-attribute mapping
       • Holistic keyword-attribute mapping

  6. Challenge 1: Domain Mapping
     • Problem statement
       • Map a user query to the correct domain automatically, without requiring domain information to be entered separately.
     • Our solution
       • Domain representation model
       • Term weight assignment
       • Query-domain similarity

  7. Domain mapping (1)
     • Domain representation model
       • D = <d_ID; CT; AT; VT>
       • d_ID: unique domain identifier.
       • CT = {ct_i | i = 1, 2, …} is a set of Conceptual Terms, which describe the overall domain concept.
       • AT = ∪_{A_i ∈ D} DAL(d_ID, A_i) is a set of Attribute Label Terms, consisting of the attribute labels of the products in this domain.
         • InteLabel, LocalLabel, OtherLabel
       • VT = ∪_{A_i ∈ D} DAV(d_ID, A_i) is a set of Value Terms associated with the products’ attributes in the domain.
         • Text attribute: InteValue, LocalValue, OtherValue
         • Non-text attribute: VT can be characterized by the predefined ranges available on the integrated interfaces.
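
As a rough illustration of the tuple D = <d_ID; CT; AT; VT>, the model could be held in a small data structure. This is a sketch under assumptions, not the authors' implementation: the class name `Domain`, the per-attribute dictionaries standing in for DAL and DAV, and the sample job-domain terms are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Domain:
    d_id: str                                   # unique domain identifier
    ct: set[str] = field(default_factory=set)   # conceptual terms
    # Assumed layout: attribute name -> its label terms (stands in for DAL).
    dal: dict[str, set[str]] = field(default_factory=dict)
    # Assumed layout: attribute name -> its value terms (stands in for DAV).
    dav: dict[str, set[str]] = field(default_factory=dict)

    @property
    def at(self) -> set[str]:
        """AT: union of the label terms over all attributes."""
        return set().union(*self.dal.values())

    @property
    def vt(self) -> set[str]:
        """VT: union of the value terms over all attributes."""
        return set().union(*self.dav.values())


# Hypothetical sample data for a job domain.
job = Domain(
    d_id="job",
    ct={"job", "career"},
    dal={"location": {"location", "city"}, "salary": {"salary", "pay"}},
    dav={"location": {"new york", "washington"}},
)

if __name__ == "__main__":
    print(sorted(job.at))
    print(sorted(job.vt))
```

Keeping the terms grouped by attribute, rather than storing only the flattened AT and VT sets, is a design choice made here so the same structure can later serve keyword-attribute matching.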

  8. Domain mapping (2)
     • Different terms differ in their ability to differentiate domains.
       • “price” is less powerful than “title” in differentiating the book domain from others.
     • Term weight assignment
       • Adopts the idea of CVV, which measures the skew of the distribution of terms across all document databases.
       • if_ij is the number of times t_j appears in either AT or VT of D_i; CVV_j is the CVV of t_j.
       • Weight(D_i, t_j) = CVV_j * if_ij
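
The weight formula Weight(D_i, t_j) = CVV_j * if_ij is simple enough to sketch directly. The slide does not show how CVV_j itself is computed, so this example takes the CVV values as a precomputed input; the function name `term_weights`, the treatment of AT and VT as lists of term occurrences, and the sample numbers are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable


def term_weights(
    at_occurrences: Iterable[str],
    vt_occurrences: Iterable[str],
    cvv: Dict[str, float],
) -> Dict[str, float]:
    """Weight(D_i, t_j) = CVV_j * if_ij for one domain D_i.

    if_ij is counted as the number of occurrences of t_j in AT or VT.
    Terms without a known CVV get weight 0.0 (an assumption).
    """
    freq = Counter(at_occurrences) + Counter(vt_occurrences)
    return {term: cvv.get(term, 0.0) * count for term, count in freq.items()}


if __name__ == "__main__":
    # Hypothetical occurrences and CVV values for a book domain.
    weights = term_weights(
        at_occurrences=["title", "price", "title"],
        vt_occurrences=["fiction"],
        cvv={"title": 0.2, "price": 0.01, "fiction": 0.3},
    )
    for term, weight in sorted(weights.items()):
        print(term, weight)
```

With these made-up numbers, "title" ends up with a much larger weight than "price", matching the slide's point that "price" discriminates poorly between domains.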

  9. Domain mapping (3)
     • Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …}
     • Q1 example
       • u_1 = {New York, Washington}, v_11 = New York, v_12 = Washington
     • Among the terms t_j in VT or AT, only the best-matching term is recorded.
     • [Query-domain similarity formula not preserved]

  10. Challenge 2: Query translation
      • Problem statement
        • Translate the query to the integrated interface
        • Just like filling in the integrated interface with a set of keywords
      • Computation model
        • Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A).
        • Def 4.2 (Degree of Matching (DM)). Each KAM has a matching degree.
        • Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy for filling in the query interface. A QTS comprises several KAMs.
        • Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable. The larger the DM of a KAM, the more reasonable the KAM is. Combining such KAMs yields the optimal QTS.

  11. Query translation (1)
      • Computation of DM
        • For Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …}, Sim(v_xi, A_j) is the maximum of all Sim(v_xi, t_j),
        • where t_j ranges over the terms in the VT of A_j, and Sim(v_xi, t_j) is computed as in domain mapping.
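
The "maximum over the value terms of an attribute" step can be sketched as follows. The term-level similarity Sim(v, t) is not given in this transcript, so the standard-library `difflib.SequenceMatcher` ratio is swapped in purely as a stand-in; `term_sim` and `attribute_sim` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable


def term_sim(value: str, term: str) -> float:
    """Stand-in for Sim(v, t): case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, value.lower(), term.lower()).ratio()


def attribute_sim(value: str, value_terms: Iterable[str]) -> float:
    """Sim(v, A): the maximum Sim(v, t) over the terms t in the VT of A.

    Returns 0.0 when the attribute has no value terms (an assumption).
    """
    return max((term_sim(value, t) for t in value_terms), default=0.0)


if __name__ == "__main__":
    # Hypothetical value terms for two attributes of a job domain.
    location_terms = ["new york", "washington", "boston"]
    category_terms = ["education", "engineering"]
    print(attribute_sim("New York", location_terms))
    print(attribute_sim("New York", category_terms))
```

Any other term-level similarity with the same signature could replace `term_sim` without changing `attribute_sim`.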

  12. Query translation (2)
      • Conviction
        • The conviction value of a QTS is a weighted sum of the DMs of its KAMs.
      • Why weight?
        • If an attribute appears in more local interfaces of a domain, it is more important in the domain.
        • A weight w(A_j) is assigned to each attribute A_j based on its interface frequency if_i.
        • For an attribute within the domain D: [weight formula not preserved]
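
Conviction as a weighted sum of DMs, and the choice of the QTS with the highest conviction, can be illustrated with a small brute-force search. This is a sketch under assumptions rather than the authors' algorithm: the attribute weights and DM values are supplied as plain dictionaries, each query unit is assumed to fill a distinct attribute, and `conviction` and `best_qts` are hypothetical names.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

KAM = Tuple[str, str]  # (query unit, attribute)


def conviction(qts: Sequence[KAM], dm: Dict[KAM, float], weight: Dict[str, float]) -> float:
    """Weighted sum of the DMs of the KAMs that make up a QTS."""
    return sum(weight[attr] * dm.get((unit, attr), 0.0) for unit, attr in qts)


def best_qts(
    units: Sequence[str],
    attributes: Sequence[str],
    dm: Dict[KAM, float],
    weight: Dict[str, float],
) -> List[KAM]:
    """Enumerate every assignment of units to distinct attributes and keep the best.

    Assumes len(units) <= len(attributes); exhaustive search is only
    practical for small interfaces.
    """
    candidates = (
        list(zip(units, attrs)) for attrs in permutations(attributes, len(units))
    )
    return max(candidates, key=lambda q: conviction(q, dm, weight))


if __name__ == "__main__":
    # Hypothetical weights and matching degrees.
    weight = {"location": 0.9, "category": 0.8, "salary": 0.5}
    dm = {
        ("u1", "location"): 0.9, ("u1", "category"): 0.1,
        ("u2", "location"): 0.2, ("u2", "category"): 0.8,
    }
    print(best_qts(["u1", "u2"], list(weight), dm, weight))
```

Missing (unit, attribute) pairs are treated as a DM of 0.0, which keeps the example short but is also an assumption.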

  13. Experiment
      • Settings
        • 9 domains, each covering 50 web databases
        • 10 students, 20 keyword queries for each domain
      • Measurement
        • Correct / acceptable / wrong
        • Overall / with domain / with attribute label / value only
      • Fig. 1: domain mapping accuracy
      • Fig. 2: query translation accuracy

  14. Conclusion
      • In this paper, we proposed EasyQuerier, a novel keyword-based interface system that lets ordinary users query structured data in various Web databases.
      • We developed solutions to two technical challenges:
        • mapping a keyword query to the appropriate domain
        • translating the keyword query into a query for the integrated search interface of that domain

  15. Thank you~
