
Challenges in Web Information Retrieval (2004)



Presentation Transcript


  1. Challenges in Web Information Retrieval (2004)

  2. 1. Information Retrieval on the Web • Identify the quality of pages • PageRank and HITS • Variants of them • Anchor text • An indication of the context of the web page
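As an illustration of the link-based quality scoring named on this slide, here is a minimal sketch of PageRank by power iteration. The toy graph, the damping value of 0.85, the iteration count, and the handling of pages without outlinks are assumptions chosen for the example; the slides do not specify them, and this is not presented as the exact formulation used by any search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages
        # (one common convention; an assumption of this sketch).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # Each page passes an equal share of its damped rank to every outlink.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the most-linked page "c" ends up with the highest score and the unlinked page "d" with the lowest, which is the intuition behind using link structure as a quality signal.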

  3. 1. Information Retrieval on the Web • Adversarial Classification: Dealing with Spam on the Web • Text spam and link spam • Adversarial classification • Evaluating Search Results • TREC • Using clickthrough data • Principled automated means for large-scale evaluation of ranking results

  4. 2. Using the Web to Create “Kernels” of Meaning • Relatedness of fragments of text • A real-valued kernel function K(x, y) • Utilize external resources, such as search engines • Query expansion: QE(x) and QE(y) • Compute the cosine between QE(x) and QE(y) • Open research issues • Effective algorithms for certain tasks • Identifying poor expansions
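The kernel idea on this slide can be sketched as follows: expand each text fragment through a search engine, then take the cosine between the two expansions. Since no real search engine is available here, `CANNED_RESULTS` is a hypothetical stand-in with invented result text, and the plain term-frequency weighting is an assumption; the slides do not say how QE(x) is weighted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a search engine: canned result text per query.
CANNED_RESULTS = {
    "AI": [
        "artificial intelligence research and machine learning",
        "artificial intelligence overview",
    ],
    "artificial intelligence": [
        "artificial intelligence and machine learning research",
        "history of artificial intelligence",
    ],
    "pizza": ["pizza dough recipe with tomato and cheese"],
}


def query_expansion(text, search=CANNED_RESULTS.get):
    """QE(text): unit-length term-frequency vector of the retrieved results."""
    counts = Counter()
    for doc in search(text) or []:
        counts.update(doc.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def kernel(x, y):
    """K(x, y): cosine between QE(x) and QE(y), both already unit length."""
    qx, qy = query_expansion(x), query_expansion(y)
    return sum(weight * qy.get(term, 0.0) for term, weight in qx.items())
```

The fragments "AI" and "artificial intelligence" share no words, so a direct cosine on the fragments themselves would be zero, yet `kernel("AI", "artificial intelligence")` is high because their expansions overlap. A query with no results gets an empty expansion and a kernel value of zero, which is one simple way a poor expansion can show up.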

  5. 3. Retrieval of UseNet Articles • Newsgroups and documents • Compute the inherent quality of an author • Netscan project • Ranking methods

  6. 4. Retrieval of Images and Sounds • Retrieve images and sounds • Content detection • Content similarity assessment • Using surrounding textual information • Other • Near-duplicates • Ranking • Video retrieval

  7. 5. Harnessing Vast Quantities of Data • Spell correction • Probabilistic context-sensitive model for spelling correction • “Mehran Sahami” “Tehran” “Mehran Salhami” • Query classification into the Open Directory Project • Short and ODP • A variety of different approaches • Enough training data makes up for weaker modeling techniques
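To make "context-sensitive" concrete, here is a toy sketch of a corrector that scores candidate corrections for a two-word query by how often the pair occurs together, rather than word by word. It reads the slide's example strings as a case where context matters; that reading is an interpretation. All counts, the `error_penalty` value, and the edit-distance limit are invented for illustration and are not the authors' model.

```python
from itertools import product

# Toy counts invented for illustration; a real system would estimate them from large logs.
UNIGRAMS = {"tehran": 500, "mehran": 20, "sahami": 15, "salami": 300}
BIGRAMS = {("mehran", "sahami"): 12, ("tehran", "salami"): 1}


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(word, max_dist=1):
    """Vocabulary words within max_dist edits of word, with their distances."""
    scored = ((v, edit_distance(word, v)) for v in UNIGRAMS)
    return [(v, d) for v, d in scored if d <= max_dist]


def correct(query, error_penalty=0.01):
    """Pick the word pair with the best bigram count, discounted per edit."""
    w1, w2 = query.lower().split()
    best, best_score = (w1, w2), 0.0
    for (c1, d1), (c2, d2) in product(candidates(w1), candidates(w2)):
        score = BIGRAMS.get((c1, c2), 0) * error_penalty ** (d1 + d2)
        if score > best_score:
            best, best_score = (c1, c2), score
    return " ".join(best)
```

With these toy counts, `correct("Mehran Salhami")` yields "mehran sahami", even though a word-by-word corrector driven only by the unigram counts would prefer the more frequent "tehran" and "salami".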

  8. Appendix: Challenges in Search Engines (2002) • Spam • Content quality • Quality evaluation • Web conventions • Duplicate hosts • Vaguely structured data
