
Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly

Zhen Zhang, Bin He, and Kevin C. Chang


Presentation Transcript


  1. Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Zhen Zhang, Bin He, and Kevin C. Chang

  2. The Context: MetaQuerier @ UIUC. Exploring and integrating the deep Web • Explorer (FIND sources): source discovery, source modeling, source indexing • Integrator (QUERY sources): source selection, schema integration, query mediation • [Figure: sources such as Amazon.com, Cars.com, Apartments.com, and 411localte.com feed a “db of dbs” behind a unified query interface]

  3. The Need: Querying alternative sources in the same domain • Sources are proliferating in the same domain • A 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web • Each query can often find many useful DBs • Different queries need different sources • How to query across dynamic sources?

  4. The Problem: Query translation on-the-fly • Challenge: No pre-configured source-specific translation knowledge • Requirements: • Within a domain: source generality • Across domains: domain portability

  5. Dynamic query translation – Essential tasks • Reconcile three levels of query heterogeneities • Attribute level: schema matching • Predicate level: predicate mapping • Query level: query rewriting

  6. Demo. Form Assistant to help navigate the deep Web.

  7. Translation objective: Closest among the valid • Input: source query Qs on source form S (example value: “Tom Clancy”) and target query form T • Two goals: syntactically valid, semantically close • Output: union query Qt* • Filter: σ title contain “red storm” and price < 35 and age > 12

  8. What is valid? Each source has a query model • Vocabulary: predicate templates { P1, P2, P3, P4, P5 } • Syntax: valid combinations of predicate templates { F1, F2, F3, F4, F5, F6, F7, F8 } • [Figure: the example form annotated with templates P1 to P5 and sample valid combinations F5 and F6]
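
The query model on this slide can be sketched as a vocabulary plus a set of allowed combinations. This is only an illustration: the template names reuse the slide's P1 to P5 labels, but the combinations in `VALID_FORMS` are invented, since the transcript does not list what F1 to F8 contain.

```python
from typing import FrozenSet, Set

# Vocabulary: names of the predicate templates a source form supports.
VOCABULARY: Set[str] = {"P1", "P2", "P3", "P4", "P5"}

# Syntax: combinations of templates that may be filled in together.
# These three combinations are hypothetical placeholders.
VALID_FORMS: Set[FrozenSet[str]] = {
    frozenset({"P1"}),
    frozenset({"P1", "P2"}),
    frozenset({"P3", "P4", "P5"}),
}


def is_valid(query_templates: Set[str]) -> bool:
    """A query is valid if it uses known templates in an allowed combination."""
    return (
        query_templates <= VOCABULARY
        and frozenset(query_templates) in VALID_FORMS
    )
```

Under these assumed forms, `is_valid({"P1", "P2"})` holds while `is_valid({"P2"})` does not, because P2 alone is not an allowed combination.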

  9. What is close? Define semantic closeness. • Minimal subsuming Cmin • No false negatives: miss no correct answer • Minimal false positives: contain the fewest extra answers • Clear semantics: database content independent • Modular translation: reduce translation complexity • [Figure: price ranges s = [0, 35], t1 = [0, 25], t2 = [25, 45], t3 = [45, 65]; which union of target ranges is Cmin?]

  10. What is close? Define semantic closeness. • Minimal subsuming Cmin • No false negatives: miss no answer • Minimal false positives: fewest extra answers • Clear semantics: DB content independent • Modular translation: reduce translation complexity • [Figure: s = [0, 35], t1 = [0, 25], t2 = [25, 45], t3 = [45, 65]; t1 ∨ t2 = [0, 45] is Cmin, while t1 ∨ t2 ∨ t3 = [0, 65] also subsumes s but is not minimal]
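
A minimal sketch of the minimal-subsuming idea, using the ranges from the slide. It enumerates every union of target ranges, keeps those that cover the source range, and picks the one with the smallest total length. Measuring "extra answers" by interval length is an assumption made here for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # closed range [low, high]


def union_length(intervals: List[Interval]) -> float:
    """Total length covered by a union of intervals."""
    total, end = 0.0, float("-inf")
    for low, high in sorted(intervals):
        if high > end:
            total += high - max(low, end)
            end = high
    return total


def subsumes(intervals: List[Interval], s: Interval) -> bool:
    """True if the union of intervals covers every point of s."""
    reach = s[0]
    for low, high in sorted(intervals):
        if low > reach:  # a gap before s is fully covered
            break
        reach = max(reach, high)
    return reach >= s[1]


def minimal_subsuming(
    s: Interval, targets: Dict[str, Interval]
) -> Optional[Tuple[str, ...]]:
    """Brute-force search for the subsuming union with the least coverage."""
    best, best_len = None, float("inf")
    names = sorted(targets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            ivs = [targets[n] for n in combo]
            if subsumes(ivs, s) and union_length(ivs) < best_len:
                best, best_len = combo, union_length(ivs)
    return best


TARGETS = {"t1": (0, 25), "t2": (25, 45), "t3": (45, 65)}
CMIN = minimal_subsuming((0, 35), TARGETS)
```

With the slide's ranges, `CMIN` is `("t1", "t2")`: the union [0, 45] covers s = [0, 35], and adding t3 would only add extra answers.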

  11. What mechanism? • [Figure: query translation from source query to target query, posed as “enumerate valid, search for closest”, is decomposed into Attribute Matching, Predicate Mapping, and Query Rewriter stages that yield Cmin]

  12. System architecture: Modular & lightweight • Modularized mechanism • Lightweight domain knowledge • [Figure: Form Extractors [ZhangHC-SIGMOD04] produce the source query Qs and the target query form QI • Attribute Matcher: syntax-based schema matching, with a domain-specific thesaurus [RahmBernstein-VLDBJ01] [HeChang-SIGMOD03] [WuYDM-SIGMOD04] • Predicate Mapper: type-based search-driven mapping, with domain-specific type handlers • Query Rewriter: constraint-based query rewriting [Halevy-VLDBJ01] • Output: target query Qt*]

  13. The core challenge: Predicate mapping • Objective: minimal subsuming • Tasks: choose operator, fill in values • Input: [form image] • Output: union of target predicates t*

  14. Is source-specific translation applicable? • price < $t → if $t < 25: [price:between:0,25] elseif $t < 45: … • adult = $t → passenger = $t …

  15. Enable source-generic predicate mapping? What is the scope of translation? What is the mechanism of translation?

  16. The right scope? Survey 150 sources for the Correspondence Matrix. • Correspondences occur within localities!

  17. The right scope? Correspondence locality → Type-based translation • Correspondences occur within localities • Translation by type handler • [Figure: Predicate Mapper. Source predicates and target template P enter a Type Recognizer, which dispatches to a Text Handler, Numeric Handler, Datetime Handler, or Domain-Specific Handler to produce target predicate t*]
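
The dispatch step in this figure might look roughly like the sketch below. Everything in it is an assumption for illustration: the recognizer is a crude parse test, the date format `%Y-%m-%d` is arbitrary, and the handlers are placeholders that only tag the predicate with the handler chosen rather than performing any real mapping.

```python
from datetime import datetime
from typing import Callable, Dict, Tuple

# (handler name, attribute, operator, value)
Mapped = Tuple[str, str, str, str]


def recognize_type(value: str) -> str:
    """Hypothetical type recognizer based on what the value parses as."""
    try:
        float(value)
        return "numeric"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "datetime"
    except ValueError:
        return "text"


def make_handler(name: str) -> Callable[[str, str, str], Mapped]:
    """Placeholder handler: a real one would search for the closest
    target predicate of its type."""
    def handler(attr: str, op: str, value: str) -> Mapped:
        return (name, attr, op, value)
    return handler


HANDLERS: Dict[str, Callable[[str, str, str], Mapped]] = {
    kind: make_handler(kind) for kind in ("numeric", "datetime", "text")
}


def map_predicate(attr: str, op: str, value: str) -> Mapped:
    """Route a source predicate to the handler for its value type."""
    return HANDLERS[recognize_type(value)](attr, op, value)
```

For instance, `map_predicate("price", "<", "35")` is routed to the numeric handler and `map_predicate("title", "contain", "red storm")` to the text handler. Adding a new type means registering one more handler rather than touching the existing ones.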

  18. The right mechanism: Is a pairwise rule-based mechanism suitable? • Rule: attr < $t → if $t < 25: [attr:between:0,25] elseif $t < 45: … • [Figure: templates 1 to n, plus a new template n+1] • Adding one template requires adding 2n rules! • It also requires knowledge of the old templates.
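
The 2n figure follows from a short count, assuming one translation rule per ordered pair of templates (the symbol R(n) for the rule count is introduced here only for illustration). The new template needs a rule to and from each of the n existing templates:

```latex
\begin{align*}
R(n) &= n(n-1) \\
R(n+1) - R(n) &= (n+1)n - n(n-1) = 2n
\end{align*}
```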

  19. More extendable mechanism? Search-driven. • Evaluate over a “database”: evaluators run the source predicate s and the target templates of the same type over the values of the type (a virtual database, from −infinity through 0 and 1 to +infinity) • Search for the closest among the evaluation results, e.g. s = [0, 35], t1 = [0, 25], t2 = [25, 45], t1 ∨ t2, …
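
As a rough sketch of the search-driven idea, predicates can be treated as black-box functions and compared only through the values they accept. The virtual database here is a finite sample of integers from 0 to 100, which is an assumption standing in for the unbounded value range on the slide; the candidate set and all names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Predicate = Callable[[float], bool]

# Finite stand-in for the "values of the type" (an assumption).
VIRTUAL_DB = range(0, 101)


def evaluate(pred: Predicate, values: Iterable[float]) -> FrozenSet[float]:
    """Materialize a predicate as the set of values it accepts."""
    return frozenset(v for v in values if pred(v))


def closest(
    source: Predicate,
    candidates: Dict[str, Predicate],
    values: Iterable[float],
) -> Optional[str]:
    """Among candidates whose result covers the source's result,
    return the one that accepts the fewest values."""
    values = list(values)
    wanted = evaluate(source, values)
    results = {name: evaluate(p, values) for name, p in candidates.items()}
    subsuming = {n: r for n, r in results.items() if wanted <= r}
    return min(subsuming, key=lambda n: len(subsuming[n]), default=None)


SOURCE: Predicate = lambda v: 0 <= v <= 35
CANDIDATES: Dict[str, Predicate] = {
    "t1": lambda v: 0 <= v <= 25,
    "t1 or t2": lambda v: 0 <= v <= 45,
    "t1 or t2 or t3": lambda v: 0 <= v <= 65,
}
BEST = closest(SOURCE, CANDIDATES, VIRTUAL_DB)
```

Here `BEST` is `"t1 or t2"`. Because the search only looks at evaluation results, a new template can be added as one more candidate without writing rules against the existing templates.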

  20. Greedy search to construct Cmin mapping • Find the mapping iteratively • In each iteration, greedily choose the target predicate that covers the most still-uncovered part of the source • Example: s = [0, 35], t1 = [0, 25], t2 = [25, 45], t3 = [45, 65]
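
The greedy loop can be sketched as a set-cover heuristic over materialized values. This is an illustrative version, not the authors' implementation: ranges are represented as sets of integers, ties are broken by name, and, as with greedy set cover in general, the result is not guaranteed to be the true minimum in every case.

```python
from typing import Dict, List, Set


def greedy_cover(source: Set[int], targets: Dict[str, Set[int]]) -> List[str]:
    """Repeatedly pick the target covering the most uncovered source values."""
    uncovered = set(source)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        name = max(sorted(targets), key=lambda n: len(targets[n] & uncovered))
        gain = targets[name] & uncovered
        if not gain:
            raise ValueError("source cannot be covered by the given targets")
        chosen.append(name)
        uncovered -= gain
    return chosen


S = set(range(0, 36))            # s  = [0, 35]
T = {
    "t1": set(range(0, 26)),     # t1 = [0, 25]
    "t2": set(range(25, 46)),    # t2 = [25, 45]
    "t3": set(range(45, 66)),    # t3 = [45, 65]
}
MAPPING = greedy_cover(S, T)
```

On the slide's example the first iteration picks t1 (26 uncovered values), the second picks t2 (the remaining 10), and t3 is never chosen, so `MAPPING` is `["t1", "t2"]`.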

  21. Experiments • Translating 120 queries in total • Between randomly paired sources from 8 domains • With domain thesaurus but no type handler • Accuracy measured as the ratio of correct conditions per query • Datasets: Basic (3 domains), New (5 domains) • [Figures: average accuracy; error distribution over Extraction, Matching, and Mapping (values 40%, 18%, and 42%; the label pairing is not preserved)]
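
One plausible reading of the accuracy metric is sketched below. The slide does not say what the denominator is, so dividing by the number of expected conditions is an assumption, as are the function names and the example conditions.

```python
from typing import List, Set


def query_accuracy(expected: Set[str], produced: Set[str]) -> float:
    """Fraction of the expected conditions that the translation got right
    (assumed definition; the slide gives no formula)."""
    if not expected:
        raise ValueError("expected conditions must not be empty")
    return len(expected & produced) / len(expected)


def average_accuracy(per_query: List[float]) -> float:
    """Mean of the per-query accuracies."""
    return sum(per_query) / len(per_query)


# Hypothetical example: three of four conditions translated correctly.
EXPECTED = {"author=Tom Clancy", "title~red storm", "price<45", "age>12"}
PRODUCED = {"author=Tom Clancy", "title~red storm", "price<45", "age>18"}
ACC = query_accuracy(EXPECTED, PRODUCED)
```

In this made-up case `ACC` is 0.75, and averaging it with a perfectly translated query gives 0.875.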

  22. Conclusion • System: form assistant for querying Web databases • Problem: dynamic query translation • Contributions: • Framework: light-weight domain-based architecture • Techniques: type-based search-driven predicate mapping

  23. Thank You! For more information: http://metaquerier.cs.uiuc.edu kcchang@cs.uiuc.edu

  24. Experiment: Accuracy distribution • [Figures: accuracy distribution for the New dataset; accuracy distribution for the Basic dataset]

  25. Text handler: Search space • Conceptually, the union of all target predicates • Practically, a closed-world assumption

  26. Text handler: Closeness estimation • Ideally, logical reasoning • Practically, evaluation-by-materialization • Materialize the query against a “complete” database
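
Evaluation-by-materialization for text predicates might be sketched as follows. The five-title list is a tiny hypothetical stand-in for the “complete” database, and the three operators (phrase, all words, any word) are assumed examples of what a target form could offer; none of them comes from the slides.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a "complete" database of titles.
DATABASE: List[str] = [
    "red storm rising",
    "storm of red",
    "red planet",
    "storm front",
    "blue sky",
]


def contains_phrase(text: str, phrase: str) -> bool:
    return phrase in text


def all_words(text: str, words: List[str]) -> bool:
    return all(w in text.split() for w in words)


def any_word(text: str, words: List[str]) -> bool:
    return any(w in text.split() for w in words)


def materialize(pred: Callable[[str], bool]) -> Set[str]:
    """Run a predicate against the database and collect the matches."""
    return {row for row in DATABASE if pred(row)}


# Source predicate: title contains the phrase "red storm".
SOURCE_ROWS = materialize(lambda t: contains_phrase(t, "red storm"))

# Target operators, filled with the source's own words.
CANDIDATE_ROWS: Dict[str, Set[str]] = {
    "all words": materialize(lambda t: all_words(t, ["red", "storm"])),
    "any word": materialize(lambda t: any_word(t, ["red", "storm"])),
}

# Closest candidate: covers the source rows with the fewest extra rows.
BEST_TEXT = min(
    (n for n, rows in CANDIDATE_ROWS.items() if SOURCE_ROWS <= rows),
    key=lambda n: len(CANDIDATE_ROWS[n]),
)
```

Against this sample, both operators cover the single phrase match, but “all words” returns two rows and “any word” four, so `BEST_TEXT` is `"all words"`.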
