
Effective XML Keyword Search with Relevance Oriented Ranking

Effective XML Keyword Search with Relevance Oriented Ranking. Presentation by Volker Rehberg. Paper by Zhifeng Bao, Tok Wang Ling, Bo Chen, Jiaheng Lu. Agenda: I) Motivation and Background II) Inferring Keyword Search Intention III) Relevance Oriented Ranking


Presentation Transcript


  1. Effective XML Keyword Search with Relevance Oriented Ranking
     Presentation by Volker Rehberg
     Paper by Zhifeng Bao, Tok Wang Ling, Bo Chen, Jiaheng Lu

  2. Agenda
     I) Motivation and Background
     II) Inferring Keyword Search Intention
     III) Relevance Oriented Ranking
     IV) Algorithms
     V) Experimental Evaluation
     VI) Conclusion

  3. Motivation and Background
     What is "Effective XML Keyword Search with Relevance Oriented Ranking" all about?
     • Keyword search
     Issue 1: identify the search for node
     Issue 2: identify the search via node
     Issue 3: rank each query result

  4. Motivation and Background
     Ambiguities in interpreting the search for node and the search via node:
     Ambiguity 1: A keyword can appear as an XML tag name and as a text value of some other nodes.

  5. Motivation and Background
     Ambiguities in interpreting the search for node and the search via node:
     Ambiguity 2: A keyword can appear as the text values of different types of XML nodes and carry different meanings.

  6. Motivation and Background
     Keyword query: Customer interest art
     SLCA (smallest lowest common ancestor) returns 5 results without any ranking.
     Only the customer with ID C4 is desired and should be top ranked.

  7. Motivation and Background
     Problems of SLCA:
     • does not consider the semantics of the query and the XML data
     • keyword ambiguity problem
     • no relevance oriented ranking
     → answers irrelevant to the user's search intention
     • answers not meaningful and informative enough

  8. Motivation and Background
     TF*IDF (Term Frequency * Inverse Document Frequency)
     • Rule 1: Inverse Document Frequency
     • Rule 2: Term Frequency
     • Rule 3: Normalization

  9. Motivation and Background
     [Formula image, annotated: query, flat document, keyword]
     Normalized document/term frequency, annotated with:
     • number of documents
     • occurrences of k in document d
     • documents containing k
     Weights of query q and document d: [formula image]
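
The formula on this slide did not survive in the transcript. As a rough illustration of the three rules (inverse document frequency, term frequency, normalization), here is a minimal sketch of one common cosine-style TF*IDF variant. The exact weighting used on the slide is unknown, so the smoothed `ln(1 + N/df)` and `1 + ln(tf)` forms and the name `tfidf_similarity` are assumptions.

```python
import math
from collections import Counter

def tfidf_similarity(query, doc, corpus):
    """Cosine-style TF*IDF similarity between a keyword query and one flat document.

    query and doc are lists of keywords; corpus is a list of such documents.
    The weighting below is one common variant, assumed for illustration.
    """
    n_docs = len(corpus)
    doc_tf = Counter(doc)

    def idf(k):
        # Rule 1: keywords contained in fewer documents weigh more.
        df = sum(1 for d in corpus if k in d)
        return math.log(1 + n_docs / df) if df else 0.0

    def tf(k):
        # Rule 2: more occurrences of k in the document weigh more (dampened).
        return 1 + math.log(doc_tf[k]) if doc_tf[k] else 0.0

    q_terms = set(query)
    # Rule 3: normalize by the overall weights of the query and the document.
    w_q = math.sqrt(sum(idf(k) ** 2 for k in q_terms))
    w_d = math.sqrt(sum(tf(k) ** 2 for k in doc_tf))
    if w_q == 0 or w_d == 0:
        return 0.0
    return sum(idf(k) * tf(k) for k in q_terms if k in doc_tf) / (w_q * w_d)
```

With a toy corpus, a document that mentions "art" twice scores higher for the query "art" than one that mentions it once next to an unrelated word, and a document without the keyword scores zero.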

  10. Inferring Keyword Search Intention
      Talking about "Art":
      • Intuition: element of an "interest" node, because many people are interested in art
      • → statistics of the underlying database

  11. Inferring Keyword Search Intention
      A node type T is the search for node if:
      1: T is intuitively related to every query keyword in q.
      2: T is informative enough to contain enough relevant information.
      3: T does not contain too much irrelevant information.
      [Formula image] annotated with:
      • number of T-typed nodes that contain k as either values or tag names in their subtrees
      • k: keyword in query q
      • reduction factor (range 0-1), normally chosen to be 0.8

  12. Inferring Keyword Search Intention
      Confidence of a node type T to be the desired search for node:
      [Formula image] annotated with:
      • number of T-typed nodes that contain k as either values or tag names in their subtrees
      • k: keyword in query q
      • reduction factor (range 0-1), normally chosen to be 0.8
      Confidence of a node type T to be the desired search via node:
      [Formula image]
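
The two confidence formulas are missing from the transcript. The sketch below assumes the forms ln(1 + product of frequencies) * r^depth for the search for confidence and ln(1 + sum of frequencies) for the search via confidence. Applying the reduction factor per level of depth, and the names `confidence_for` and `confidence_via`, are assumptions made for illustration only.

```python
import math

def confidence_for(freqs, depth, r=0.8):
    """Assumed confidence of node type T to be the search for node.

    freqs: for each query keyword k, the number of T-typed nodes that contain k
           as either values or tag names in their subtrees.
    depth: depth of T in the document tree (assumption: r is applied per level).
    r:     reduction factor in (0, 1], 0.8 by default as on the slide.
    """
    # The product is zero as soon as one keyword never occurs under T,
    # which models "T is related to every query keyword".
    return math.log(1 + math.prod(freqs)) * r ** depth

def confidence_via(freqs):
    """Assumed confidence of node type T to be the search via node.

    A sum instead of a product: matching some of the keywords is enough.
    """
    return math.log(1 + sum(freqs))
```

Under these assumptions a node type that misses one query keyword gets a search for confidence of zero, while its search via confidence stays positive.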

  13. Inferring Keyword Search Intention
      Keyword query: Customer name rock interest art
      • "art" should be in interest and "rock" should be searched for in name
      • → the order of keywords in the query is important

  14. Inferring Keyword Search Intention
      Value Typed Distance (Dist): Max(Distq(q, v, kt, k), Dists(q, v, kt, k))
      In-Query Distance (IQD, Distq): position distance between kt and k in q, if kt appears before k in the query
      Structural Distance (Dists): depth distance between v and the nearest kt-typed ancestor node of v
      v: node
      k: keyword that matches in v
      kt: keyword that matches the type of an ancestor node of v
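
A minimal sketch of the two distances and their maximum, using the query from the previous slide. The slide does not say what happens when kt does not appear before k or when v has no kt-typed ancestor, so returning infinity in those cases is an assumption, as are the function names and the representation of v by the list of its ancestor tag names.

```python
import math

def in_query_distance(query, kt, k):
    """Position distance between kt and k in the query, if kt appears before k."""
    if kt not in query or k not in query:
        return math.inf
    gap = query.index(k) - query.index(kt)
    return gap if gap > 0 else math.inf  # assumption: undefined cases are infinite

def structural_distance(ancestor_tags, kt):
    """Depth distance between value node v and its nearest kt-typed ancestor.

    ancestor_tags lists the tag names on the path from the root down to the
    parent of v.
    """
    for levels_up, tag in enumerate(reversed(ancestor_tags), start=1):
        if tag == kt:
            return levels_up
    return math.inf  # assumption: no kt-typed ancestor

def value_typed_distance(query, ancestor_tags, kt, k):
    """Maximum of the in-query distance and the structural distance."""
    return max(in_query_distance(query, kt, k),
               structural_distance(ancestor_tags, kt))

query = ["customer", "name", "rock", "interest", "art"]
ancestors_of_v = ["customers", "customer", "interests", "interest"]
print(value_typed_distance(query, ancestors_of_v, "interest", "art"))
```

Here "interest" directly precedes "art" in the query and is the parent of the value node, so both distances are small. For kt = "name" the value node has no name-typed ancestor, so the distance is infinite under the stated assumption.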

  15. Inferring Keyword Search Intention
      Keyword query: Customer name rock interest art

  16. Relevance Oriented Ranking
      Ranking Principles
      Searching for a customer via the street node with keyword query: Art Street
      Principle 1: only search via nodes affect relevance

  17. Relevance Oriented Ranking
      Ranking Principles
      Searching for customers interested in art using query: "art"
      Principle 1: only search via nodes affect relevance
      Principle 2: the search via node should contain the keyword

  18. Relevance Oriented Ranking
      Ranking Principles
      Keyword query: Customer name rock interest art
      Principle 1: only search via nodes affect relevance
      Principle 2: the search via node should contain the keyword
      Principle 3: the order of keywords in the query is important

  19. Relevance Oriented Ranking
      Capture XML's hierarchical structure to compute XML TF*IDF similarity
      [Formula image: similarity value between query q and node a]
      (a) a is a value node (base case): similarities between the leaf node and the query
      (b) a is an internal node (recursive case): recursive similarities between the internal node n and the query

  20. Relevance Oriented Ranking
      Capture XML's hierarchical structure to compute XML TF*IDF similarity
      [Formula image: similarity value between query q and node a]
      (a) a is a value node (base case)
      (b) a is an internal node (recursive case)
      Similar to classic TF*IDF (annotated: query, flat document, keyword)

  21. Relevance Oriented Ranking
      Capture XML's hierarchical structure to compute XML TF*IDF similarity
      [Formula image: similarity value between query q and node a]
      (a) a is a value node (base case)
      (b) a is an internal node (recursive case), annotated with:
      • c: child node of a
      • confidence of Tc to be a search via node
      • similarity between c and q (recursively)
      • overall weight of a for the given query q
      Intuition: a is relevant if its children have a high confidence to be a search via node and are relevant to q
      Intuition: more relevant children increase the relevance of the node type
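
The recursive case can be sketched as follows. Since the formula itself is missing from the transcript, this assumes that an internal node sums, over its children, the child's similarity times the confidence of the child's type to be a search via node, and divides by the node's overall weight. The `Node` class, its field names and the precomputed `leaf_similarity` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_type: str                 # e.g. the prefix path or tag name
    leaf_similarity: float = 0.0   # base case: precomputed for value nodes
    weight: float = 1.0            # overall weight of the node for the query
    children: list = field(default_factory=list)

def xml_similarity(node, via_confidence):
    """Assumed XML TF*IDF similarity between the query and a node.

    via_confidence maps a node type to its confidence to be a search via node.
    """
    if not node.children:
        # (a) value node: similarity between the leaf and the query
        return node.leaf_similarity
    # (b) internal node: children count more if their type is a likely
    #     search via node and if they are themselves relevant to the query
    total = sum(via_confidence.get(c.node_type, 0.0) * xml_similarity(c, via_confidence)
                for c in node.children)
    return total / node.weight if node.weight else 0.0

customer = Node("customer", weight=2.0, children=[
    Node("name", leaf_similarity=0.0),
    Node("interest", leaf_similarity=0.8),
])
print(xml_similarity(customer, {"name": 0.5, "interest": 2.0}))
```

In this toy example only the interest child contributes, which matches the intuition that a node is relevant when its likely search via children are relevant to the query.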

  22. Algorithms
      Parsing the input XML document
      For each node n visited:
      (1) Assign a Dewey ID to n
      (2) Store the prefix path prefixPath of n in a hash table
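
A minimal sketch of this parsing step using Python's standard `xml.etree.ElementTree`. Starting the numbering at "1" for the root and keying the hash table by Dewey ID are assumptions, as is the function name `assign_dewey_ids`.

```python
import xml.etree.ElementTree as ET

def assign_dewey_ids(xml_text):
    """Visit every element, assign a Dewey ID and record its prefix path."""
    prefix_paths = {}  # hash table: Dewey ID -> prefix path (the node type)

    def visit(elem, dewey, parent_path):
        path = f"{parent_path}/{elem.tag}"
        prefix_paths[dewey] = path
        # The i-th child of a node with ID d gets the ID d.i
        for i, child in enumerate(elem, start=1):
            visit(child, f"{dewey}.{i}", path)

    visit(ET.fromstring(xml_text), "1", "")
    return prefix_paths

doc = ("<customers><customer><name>Art Smith</name>"
       "<interest>art</interest></customer></customers>")
for dewey, path in assign_dewey_ids(doc).items():
    print(dewey, path)
```

A Dewey ID encodes the path from the root, so the ID of an ancestor is always a prefix of the IDs of its descendants.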

  23. Algorithms
      Build 2 indices:
      1. Keyword inverted list:
         (1) Dup: DeweyID and XML TF*IDF (fa,k)
         (2) DupType: Dup + node type (prefix path)
         (3) DupTypeNorm: DupType + normalization factor Wa
         "Node" tuple: <DeweyID, prefixPath, fa,k, Wa>
      2. Frequency table:
         - stores the frequency of k in node type T
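
The two indices could be built in a single pass, as in the following sketch. It is not the authors' implementation: the whitespace tokenization, the Euclidean norm used for the normalization factor Wa, the Dewey numbering from "1", and the name `build_indices` are all assumptions.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def build_indices(xml_text):
    # keyword -> list of "node" tuples (DeweyID, prefixPath, f_ak, W_a)
    inverted = defaultdict(list)
    # (node type T, keyword k) -> number of T-typed nodes containing k
    frequency = Counter()

    def visit(elem, dewey, parent_path):
        path = f"{parent_path}/{elem.tag}"
        seen = {elem.tag.lower()}  # tag names also count as keyword matches
        terms = Counter((elem.text or "").lower().split())
        if terms:
            # Assumed normalization factor of the value node
            w_a = math.sqrt(sum((1 + math.log(f)) ** 2 for f in terms.values()))
            for k, f in terms.items():
                inverted[k].append((dewey, path, f, w_a))
            seen.update(terms)
        for i, child in enumerate(elem, start=1):
            seen |= visit(child, f"{dewey}.{i}", path)
        # Count this node once for every keyword found in its subtree
        for k in seen:
            frequency[(path, k)] += 1
        return seen

    visit(ET.fromstring(xml_text), "1", "")
    return inverted, frequency
```

The inverted list answers "which value nodes contain keyword k", while the frequency table provides the per-node-type statistics used when inferring the search intention.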

  24. Algorithms
      The Algorithm:
      1. Input: keywords of the query, inverted list, frequency table
      2. Identify the search intention and the search for node type
      3. Rank by computing the XML TF*IDF similarity between n and the given query
      4. Return the ranked list

  25. Experimental Evaluation
      XReal vs. SLCA vs. XSeek
      Aims of testing:
      • Search effectiveness
      • Ranking effectiveness
      Datasets:
      • real datasets (Washington XML Data Repository, DBLP)
      • synthetic datasets (XMark benchmark)

  26. Experimental Evaluation

  27. Experimental Evaluation

  28. Conclusion
      • Identify search intention and rank results with statistics
      • Confidence level to be search for/via node with XML TF*IDF
      • XML TF*IDF similarity ranking scheme
      • The approach tries to solve the ambiguity problem
      • Prototype XReal
