Static and Dynamic Scoring by Web Page Grouping

Static and Dynamic Scoring by Web Page Grouping. Hitoshi NAKAKUBO, Takashi SATO. Introduction: A huge amount of information exists on the WWW, and extracting the information that Internet users need is difficult. Web search engines extract information by simple full-text search.

Presentation Transcript


  1. Static and Dynamic Scoring by Web Page Grouping. Hitoshi NAKAKUBO, Takashi SATO

  2. Introduction • A huge amount of information exists on the WWW. • Extracting the information that Internet users need is difficult. • Web search engines • They extract information by simple full-text search. • However, simple full-text search has limited accuracy.

  3. Related Work • Link Structure Analysis • PageRank Algorithm • Treats a link as a recommendation of the linked web page. • HITS Algorithm • Defines two scores: Authority and Hub. • Can extract web communities of pages with similar information.
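The two link-analysis algorithms named on this slide can be sketched as follows. This is a minimal textbook-style illustration, not the authors' implementation: the damping factor 0.85, the fixed iteration count, uniform redistribution of rank from dangling pages, and L2 normalization in HITS are all assumptions chosen for simplicity.

```python
def _all_pages(links):
    """Every page that appears as a source or as a link target."""
    return set(links) | {q for targets in links.values() for q in targets}


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps a page to the pages it links to."""
    pages = _all_pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each outgoing link passes on an equal share (the "recommendation").
                for q in targets:
                    new[q] += damping * rank[p] / len(targets)
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def hits(links, iterations=50):
    """Iterative HITS; returns (authority, hub) dicts, each L2-normalized."""
    pages = _all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link to p.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub: sum of the authority scores of pages that p links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub
```

For example, on the toy graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C receives the most links and gets both the highest PageRank and the highest authority score, while A, which links to the two best authorities, becomes the strongest hub.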

  4. Problems with Related Work • PageRank Algorithm • Is a link really a recommendation? • Some web sites refuse links to any page other than a specific one. • HITS Algorithm • This algorithm has a known problem: it cannot always extract appropriate web communities.

  5. Our Approach to the Problems • PageRank Algorithm • Problem • The score is decided from adjacency relations only. • Our approach • We resolve the adjacency relations of the link structure recursively. • Enhancement of the link structure

  6. Our Approach to the Problems • HITS Algorithm • Problem • Web pages unrelated to the retrieval query are taken into account. • Our approach • We consider the similarity between the algorithm's application area and the retrieval query. • Relation to the retrieval query

  7. Proposal • Web Page Grouping • Group web pages with similar information. • Static and Dynamic Scoring • Link structure analysis using Web Page Grouping. • Ranking • The final rank is decided by score annexation (combining the scores).

  8. Web Page Grouping • Purpose • Enhance the adjacency relations of the link structure • Concept • Web pages with similar information are grouped. • Similar information: same author / same content • Two methods: directory structure / link structure

  9. Directory Structure Method • The directory structure of a web site is treated as a tree. • Leaves on the same branch form a group. (Figure: the grouping algorithm applied to a directory tree under the document root, with nodes A to E, producing web page groups.)
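A minimal sketch of the directory structure method, assuming that "leaves on the same branch" means pages whose URL paths share the same parent directory on the same host. The function name `group_by_directory` and the (host, directory) grouping key are hypothetical choices for illustration; the slides do not specify how the tree is built from URLs.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_directory(urls):
    """Group URLs whose paths end at the same directory (same branch of the tree).

    Assumption: the grouping key is (host, parent directory). A URL ending in "/"
    belongs to the directory it names, like that directory's index page.
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        directory = posixpath.dirname(parts.path or "/")
        if not directory.endswith("/"):
            directory += "/"
        groups[(parts.netloc, directory)].append(url)
    return dict(groups)
```

Grouping by parent directory is the simplest reading of the slide; a real implementation might also cap group size, since slide 15 reports that group sizes were heavily skewed.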

  10. Static Scoring • Purpose • Determine the importance of web pages • Mitigate the problem of the PageRank algorithm • Concept • Target documents: all web pages • Target link structure: after Grouping
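Static scoring runs link analysis on the link structure after Grouping. One plausible way to obtain that structure, sketched here under stated assumptions, is to contract each group into a single node: links inside a group are dropped and parallel links between two groups are merged. Both rules, and the name `contract_links`, are hypothetical; the slides do not say how group-level links are formed or how group scores map back to pages.

```python
from collections import defaultdict


def contract_links(links, group_of):
    """Collapse a page-level link graph into a group-level link graph.

    links:    page -> list of pages it links to
    group_of: page -> group id; pages not listed stay as singleton groups
    Assumption: links inside a group are dropped and parallel links between
    two groups are merged into one.
    """
    group_links = defaultdict(set)
    for src, targets in links.items():
        g_src = group_of.get(src, src)
        out = group_links[g_src]  # ensure the source group exists as a node
        for dst in targets:
            g_dst = group_of.get(dst, dst)
            if g_dst != g_src:
                out.add(g_dst)
    return {g: sorted(t) for g, t in group_links.items()}
```

On the toy input `links = {"a1": ["a2", "b1"], "a2": ["b1"], "b1": ["c"], "c": ["a1"]}` with `group_of = {"a1": "A", "a2": "A", "b1": "B"}`, four pages with five links become three groups with three links, which matches the direction reported for static scoring in slide 16 (fewer nodes and fewer links). The contracted graph can then be fed to any PageRank routine.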

  11. Dynamic Scoring • Purpose • Determine the query-dependent importance of web pages • Mitigate the problem of the HITS algorithm • Concept • Target documents: full-text search result set • Target link structure: before Grouping (#1) / after Grouping (#2)
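For Dynamic Score #1 (link structure before Grouping), the target documents are the full-text search result set. A minimal sketch of that restriction, assuming "target documents" means the induced subgraph on the result pages, is shown below; `result_subgraph` is a hypothetical name. Unlike Kleinberg's original HITS, which expands the root set with pages linking to or from it, this sketch keeps only the result set, which is one reading of the slide. How the #2 variant builds its grouped structure is not specified in the slides and is not modeled here.

```python
def result_subgraph(links, result_set):
    """Restrict a page link graph to the pages returned by the full-text search.

    Only links whose source and target are both in the result set are kept,
    so pages unrelated to the query cannot influence the HITS scores.
    """
    results = set(result_set)
    return {p: [q for q in links.get(p, []) if q in results] for p in results}
```

The returned dictionary has the same shape as the input, so it can be passed directly to a HITS routine such as the one sketched under slide 3.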

  12. Ranking • Purpose • Decide the final score and rank. • Concept • Each score is normalized. • A power root is applied to each score. • Each score is multiplied by a weighting factor, and the results are summed. • The weighting factors are determined experimentally.
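The three ranking steps (normalize, apply a power root, weight and sum) can be sketched as below. The slides do not say which normalization or which root is used, so min-max normalization to [0, 1] and a square root (`root=2.0`) are assumptions; `normalize` and `combine` are hypothetical names.

```python
def normalize(scores):
    """Min-max normalize a dict of scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores tie: the score carries no ranking information.
        return {p: 0.0 for p in scores}
    return {p: (s - lo) / (hi - lo) for p, s in scores.items()}


def combine(score_sets, weights, root=2.0):
    """Weighted sum of normalized, root-compressed scores.

    score_sets: list of dicts (page -> raw score), one per scoring method
    weights:    one weighting factor per scoring method
    Pages missing from a score set contribute 0 for that method.
    """
    pages = set().union(*score_sets)
    total = {p: 0.0 for p in pages}
    for scores, w in zip(score_sets, weights):
        for p, s in normalize(scores).items():
            total[p] += w * s ** (1.0 / root)
    return total
```

The root compresses large scores relative to small ones, so a single method with a very peaked distribution (such as PageRank) does not dominate the sum.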

  13. Experiment • Purpose • Verify the effectiveness of the proposed technique • Experiment items • Grouping evaluation • Score evaluation • Verification of the best weighting factor values

  14. Environment • Full-text search system • Variable-length n-gram based index • Retrieval target • Test collection “NW100G-01” (NTCIR-4 Web) • Retrieval queries • 77 queries (NTCIR-4 Web) • Evaluation measure • Weighted Reciprocal Rank (WRR)
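To illustrate the kind of measure Weighted Reciprocal Rank is, here is a simplified reciprocal-rank variant in which each relevance level carries a weight and the best weighted reciprocal rank within a cutoff is kept. This is an assumed form for illustration only; the official NTCIR-4 Web WRR definition may differ, and the relevance labels, weights, and cutoff of 10 used here are hypothetical.

```python
def weighted_reciprocal_rank(ranking, relevance, weights, cutoff=10):
    """Simplified, assumed WRR variant for one query.

    ranking:   list of document ids, best first
    relevance: doc id -> relevance level (documents absent are non-relevant)
    weights:   relevance level -> weight
    Returns max over relevant docs in the top `cutoff` of weight / rank.
    """
    best = 0.0
    for rank, doc in enumerate(ranking[:cutoff], start=1):
        level = relevance.get(doc)
        if level is not None:
            best = max(best, weights.get(level, 0.0) / rank)
    return best
```

A system-level figure would then be the mean of this value over all 77 queries.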

  15. Grouping Evaluation: Number of Web Pages in Each Group • The number of web pages per group is heavily skewed. • This affects the number of links in each group. • The Grouping technique needs to be re-examined.

  16. Grouping Evaluation: Comparison of Grouping Results • Static: number of nodes decreases, number of links decreases • Dynamic: number of nodes decreases, number of links increases • The results are as expected.

  17. Score Evaluation: Comparison of Scoring Results • Grouping levels out the scores. • This reduces the ability to extract relevant documents.

  18. Score Evaluation: Weighted Reciprocal Rank • Static Score alone with Grouping applied: relevant documents cannot be extracted.

  19. Score Evaluation: Weighted Reciprocal Rank • Dynamic Score: which variant performs better changes with the rank.

  20. Static Score Evaluation: Relevant Document Extraction Ratio (chart legend: None / Not Apply & Apply / Not Apply / Apply) • Grouping … Not Apply: 61% / Apply: 13%

  21. Dynamic Score Evaluation: Relevant Document Extraction Ratio (chart legend: None / Not Apply & Apply / Not Apply / Apply) • Grouping … Not Apply: 32% / Apply: 31%

  22. Score Evaluation: Features of Each Score

  23. Examination of Score Annexation • Each score used alone has little influence on the rank. • Scores with and without Grouping have opposite characteristics. • Scores are therefore annexed on the basis of a specific score. • Scores with and without Grouping are both annexed.

  24. Annexation Score Calculation Expression • Annex Score(p) = Wr · Search Score(p) + Static Score(p) + Dynamic Score(p) • Static Score(p) = Ws1 · Static Score w/o Grouping(p) + Ws2 · Static Score w/ Grouping(p) • Dynamic Score(p) = Wd1 · Dynamic Score #1(p) + Wd2 · Dynamic Score #2(p)

  25. Weighting Factor Best Value Verification • Results listed as (Wr, Ws1, Ws2, Wd1, Wd2) [Rank] • Wr ∈ {1, 2}, Wx ∈ {0, 1, 2}, x ∈ {s1, s2, d1, d2} • …
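The weight ranges on this slide define a small grid: 2 choices for Wr and 3 choices for each of the four Wx give 2 × 3^4 = 162 weight tuples. The sketch below enumerates that grid, computes the annexation score from slide 24 for each document, and keeps the tuple whose ranking scores best. The component key names and the `evaluate` callback (for example, a mean WRR over the query set) are hypothetical; the slides do not describe the search procedure itself.

```python
from itertools import product

W_R = (1, 2)       # allowed values of Wr
W_X = (0, 1, 2)    # allowed values of Ws1, Ws2, Wd1, Wd2


def annex(comp, w):
    """Annex Score(p) from slide 24 for one document's component scores."""
    wr, ws1, ws2, wd1, wd2 = w
    return (wr * comp["search"]
            + ws1 * comp["static_wo"] + ws2 * comp["static_w"]
            + wd1 * comp["dyn1"] + wd2 * comp["dyn2"])


def weight_grid():
    """All (Wr, Ws1, Ws2, Wd1, Wd2) tuples allowed by the slide."""
    return [(wr, *wx) for wr in W_R for wx in product(W_X, repeat=4)]


def best_weights(components, evaluate):
    """components: doc -> dict of component scores; evaluate: ranking -> metric.

    Returns (best metric value, weight tuple); ties keep the first tuple found.
    """
    best = None
    for w in weight_grid():
        ranking = sorted(components, key=lambda d: annex(components[d], w), reverse=True)
        value = evaluate(ranking)
        if best is None or value > best[0]:
            best = (value, w)
    return best
```

An exhaustive grid is cheap here (162 rankings per query), which is presumably why small integer weight ranges were used.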

  26. Weighting Factor Best Value Verification: Annexation Score without Dynamic Scores • …

  27. Weighting Factor Best Value Verification: Annexation Score Including Dynamic Score #1 or #2 • …

  28. Weighting Factor Best Value Verification: Annexation Score Including Both Dynamic Scores • …

  29. Weighted Reciprocal Rank vs. “Full-text Search + PageRank”: +6% / +180%

  30. Discussion of Web Page Grouping • Method • The number of web pages per group is skewed. • Effect • Static: number of nodes decreases, number of links decreases • Dynamic: number of nodes decreases, number of links increases • Effectiveness is confirmed. • The grouping technique needs to be re-examined.

  31. Discussion of Static Scoring • Influence of Grouping • Change in the documents that receive scores • It scores different documents from existing techniques. • Leveling of scores • Its influence on the ranking decreases. • Documents that existing techniques cannot extract can be extracted. • The ranking cannot be changed greatly.

  32. Discussion of Dynamic Scoring • Accuracy is very poor. • Many non-relevant documents are extracted. • Influenced by the accuracy of Grouping • Features of each score • #1: extracts the same documents as existing techniques, so its influence is small • #2: extracts documents different from existing techniques, so its influence is large • Further experiments are needed.

  33. Discussion of Ranking • Evaluation result • The best-evaluated weighting factors do not annex the Dynamic Scores. • Influence of Grouping accuracy • Accuracy improves by about 6% compared with the existing technique. • Score annexation expression • The best expression cannot be determined from this experiment. • The accuracy improvement from the proposed technique is confirmed.

  34. Conclusion • We proposed a ranking technique based on Grouping. • Confirmed the effectiveness of each proposed technique • Confirmed the accuracy improvement from the proposed technique • Future Work • Re-examination of Grouping • Investigation of the web page compositions for which each technique works effectively

  35. Thank you for your attention.

  36. Score Evaluation: Comparison of Scoring Results • The tendency of the score distribution changes depending on whether Grouping is applied. • Maximum: Not Apply > Apply • Minimum: Not Apply < Apply