Web indexing

Web indexing. ICE0534 – Web-based Software Development, July 21, 2005, Seonah Lee. Contents: News related to Web Indexing; Web Indexing?; Web Indexing: Styles; Web Indexing: Tools; Web Indexing in Search Engine; Web Indexing in Google; Summary; References; Question.

Presentation Transcript


  1. Web indexing. ICE0534 – Web-based Software Development, July 21, 2005, Seonah Lee

  2. Contents • News related to Web Indexing • Web Indexing? • Web Indexing: Styles • Web Indexing: Tools • Web Indexing in Search Engine • Web Indexing in Google • Summary • References • Question

  3. News: “Google tests tool to aid Web indexing,” by Dawn Kawamoto, CNET News.com, Monday, June 06, 2005, 12:00 AM

  4. Web Indexing? • Creating indexes for • individual web sites • intranets • collections of HTML documents • collections of web sites • Purpose • helping users find information through a variety of keywords and bringing related information together

  5. Web Indexing? • Indexes • systematically arranged items • entry points to go directly to desired information within a larger document or set of documents • Indexing • an analytic process of determining which concepts are worth indexing, what entry labels to use, and how to arrange the entries.

  6. Web Indexing: Styles (1/2) • Back-of-the-Book Style Web Indexing • Adding “A-Z indexes” to websites or an intranet • Some web indexes take the form of a list of hierarchical categories arranged in alphabetical order
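
A back-of-the-book style A-Z index can be sketched as grouping entry labels under their first letter. This is a minimal illustration, not HTML Indexer or any other tool's method; `build_az_index`, the sample labels, and the URLs are all hypothetical, and it assumes every label is non-empty.

```python
from collections import defaultdict


def build_az_index(entries: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Group (entry label -> URL) pairs under their initial letter, A to Z."""
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for label, url in entries.items():
        first = label[0].upper()
        key = first if first.isalpha() else "#"  # labels starting with a non-letter go under "#"
        index[key].append((label, url))
    # Sort the letter headings, then sort entries case-insensitively within each heading.
    return {k: sorted(v, key=lambda e: e[0].lower()) for k, v in sorted(index.items())}


# Hypothetical entries pointing at anchors inside a hypothetical site.
entries = {
    "Search engines": "/tools.html#search",
    "metadata": "/styles.html#meta",
    "Back-of-the-book index": "/styles.html#book",
    "Sitemaps": "/tools.html#sitemaps",
}
for letter, items in build_az_index(entries).items():
    print(letter)
    for label, url in items:
        print(f"  {label} -> {url}")
```

Each URL fragment acts as the "entry point" the index definition describes: a link that goes directly to the desired spot within a larger page.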

  7. Web Indexing: Styles (2/2) • Metadata and Web Indexing • assigning keywords or phrases to web pages or web sites in a meta tag field (e.g., <meta name="keywords" content="...">) • so that the page or site can be retrieved by a search engine customized to search that keywords field
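
A search engine "customized to search the keywords field" needs a map from each declared keyword to the pages that declare it. The sketch below builds that map with Python's standard `html.parser`; it assumes keywords are comma-separated in a `<meta name="keywords">` tag, and the class and function names and sample pages are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaKeywordsParser(HTMLParser):
    """Collects terms from <meta name="keywords" content="term1, term2, ...">."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # HTMLParser already lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(k.strip().lower() for k in content.split(",") if k.strip())


def keyword_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each declared keyword to the set of page URLs whose meta tag lists it."""
    index: dict[str, set[str]] = {}
    for url, html in pages.items():
        parser = MetaKeywordsParser()
        parser.feed(html)
        for keyword in parser.keywords:
            index.setdefault(keyword, set()).add(url)
    return index


pages = {
    "/a.html": '<head><meta name="keywords" content="web indexing, PageRank"></head>',
    "/b.html": '<head><META NAME="Keywords" CONTENT="pagerank, HITS"/></head>',
}
index = keyword_index(pages)
print(sorted(index["pagerank"]))
```

Lowercasing the keywords is a design choice here so that "PageRank" and "pagerank" retrieve the same pages.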

  8. Web Indexing: Tools

  9. Web Indexing: The Most Famous Tool • HTML Indexer, by Brown Inc. • http://www.html-indexer.com/index.html

  10. Web Indexing in Search Engine • Phases of a web search engine's work • Document gathering • Document indexing • Searching in response to a query • Visualization of search results • [Figure: search engine pipeline with stages labeled The Web, Gathering, Parse, Indexing, Query, Rank or Match, and Visualization]
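
The phases above can be sketched end to end with an inverted index. This is a toy under stated assumptions: gathering is replaced by a hypothetical in-memory dictionary of pages, parsing strips tags with a naive regex, matching requires every query term, and ranking simply sums term frequencies. Real engines differ at every step.

```python
import re
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


def parse(html: str) -> list[str]:
    """Parse: drop HTML tags, lowercase, split into terms."""
    return WORD_RE.findall(TAG_RE.sub(" ", html).lower())


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Indexing: inverted index mapping term -> Counter({doc_id: term frequency})."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, html in docs.items():
        for term in parse(html):
            index[term][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Query, match, rank: docs containing every term, ranked by summed term frequency."""
    terms = parse(query)
    if not terms:
        return []
    matching = set.intersection(*(set(index.get(t, {})) for t in terms))
    scores = {d: sum(index[t][d] for t in terms) for d in matching}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Gathering: hypothetical pages stand in for documents fetched from the Web.
docs = {
    "p1": "<h1>Web indexing</h1><p>Indexing helps users find pages.</p>",
    "p2": "<p>PageRank ranks web pages by links.</p>",
    "p3": "<p>Web search engines index web pages.</p>",
}
index = build_index(docs)
for doc_id, score in search(index, "web pages"):  # Visualization: print a ranked result list
    print(doc_id, score)
```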

  11. Web Indexing in Search Engine • Almost every web search engine uses a slightly different indexing technique • Parsing discards some HTML markup • Some engines give different weights to terms in different HTML fields • Some index only part of a document rather than its full text • Some make full use of “metadata” • Very few make use of the information provided by links: HITS and PageRank (Google)
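
Giving "different weight to terms in different HTML fields" can be illustrated by scoring a term higher when it appears in the `<title>` than in the `<body>`. The weights below are hypothetical (search engines do not publish theirs), and the regex-based field extraction is a simplification, not a real HTML parser.

```python
import re
from collections import Counter

# Hypothetical per-field weights: a title hit counts three times as much as a body hit.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def field_text(html: str, tag: str) -> str:
    """Concatenate the raw contents of every <tag>...</tag> element (naive regex)."""
    pattern = re.compile(rf"<{tag}\b[^>]*>(.*?)</{tag}>", re.IGNORECASE | re.DOTALL)
    return " ".join(pattern.findall(html))


def weighted_terms(html: str) -> Counter:
    """Score each term by summing the weight of every field occurrence."""
    scores: Counter = Counter()
    for tag, weight in FIELD_WEIGHTS.items():
        text = re.sub(r"<[^>]+>", " ", field_text(html, tag)).lower()
        for term in re.findall(r"[a-z0-9]+", text):
            scores[term] += weight
    return scores


page = (
    "<html><head><title>PageRank</title></head>"
    "<body><p>PageRank and HITS use links.</p></body></html>"
)
scores = weighted_terms(page)
print(scores["pagerank"], scores["hits"])
```

Here "pagerank" scores 3.0 from the title plus 1.0 from the body, while "hits" appears only in the body.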

  12. Web Indexing in Google • PageRank • Google assigns a number, called the PageRank, to every web page it knows about • Assumption: a page is important if other important web pages link to it • Each page is a node • Each directed edge is a link from one page to another • [Figure: example link graph with nodes labeled Main Page, Google, This Page, and Yahoo]
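
The graph model (pages as nodes, links as directed edges) can be built by extracting `<a href>` targets from each page. A minimal sketch using the standard `html.parser` and `urllib.parse.urljoin`; the `example.com` pages are hypothetical and only borrow the figure's node names, and dropping self-links is a design choice of this sketch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, href))


def link_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Nodes are page URLs; an edge u -> v means page u links to page v."""
    graph: dict[str, set[str]] = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = parser.links - {url}  # ignore self-links
    return graph


pages = {
    "http://example.com/main": '<a href="http://www.google.com/">Google</a> <a href="/this">This page</a>',
    "http://example.com/this": '<a href="http://www.yahoo.com/">Yahoo</a> <a href="main">Main page</a>',
}
for src, targets in link_graph(pages).items():
    print(src, "->", sorted(targets))
```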

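  13. Web Indexing in Google • PageRank: Example • Assumption: an average page has a PageRank of 1, so the ranks of the three pages sum to 3 • Each page divides its PageRank equally among the pages it links to; the equations imply that page 1 links to pages 2 and 3, page 2 links to page 3, and page 3 links to page 1
$$R_1 = R_3, \qquad R_2 = \frac{R_1}{2}, \qquad R_3 = \frac{R_1}{2} + R_2$$
$$\Rightarrow\; R_1 = 2R_2, \qquad R_3 = R_1$$
$$R_1 + R_2 + R_3 = 3 \;\Rightarrow\; \tfrac{5}{2}R_1 = 3 \;\Rightarrow\; R_1 = 1.2,\quad R_2 = 0.6,\quad R_3 = 1.2$$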
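
The same ranks can be reached numerically by power iteration: start every page at the average rank of 1 and repeatedly let each page pass its rank, split equally, to the pages it links to. This sketch follows the slide's simplified model; it omits the damping factor used in the published PageRank formulation and assumes every page has at least one out-link. The function name is hypothetical.

```python
def simple_pagerank(links: dict[int, list[int]], total: float,
                    tol: float = 1e-12, max_iter: int = 1000) -> dict[int, float]:
    """Power iteration for the simplified PageRank (no damping, no dangling pages)."""
    pages = sorted(links)
    # Start uniform; because every page passes on all of its rank, the sum stays `total`.
    rank = {p: total / len(pages) for p in pages}
    for _ in range(max_iter):
        new_rank = {p: 0.0 for p in pages}
        for src, outs in links.items():
            share = rank[src] / len(outs)  # split rank equally among out-links
            for dst in outs:
                new_rank[dst] += share
        delta = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Link structure implied by the example's equations: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
links = {1: [2, 3], 2: [3], 3: [1]}
ranks = simple_pagerank(links, total=3.0)  # average PageRank of 1 over 3 pages
for page, value in ranks.items():
    print(f"R{page} = {value:.1f}")  # prints R1 = 1.2, R2 = 0.6, R3 = 1.2
```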

  14. Web Indexing in Google • HITS (Hyperlink-Induced Topic Search), a link-analysis algorithm proposed by Jon Kleinberg • Divides pages relating to a topic into two groups • Authorities: pages with good content about the topic • Hubs: pages that link to many authority pages on the topic (e.g., directories) • Iteratively calculates hub and authority scores for each page in the topic's neighborhood and ranks results accordingly • A document that many pages point to is a good authority • A document that points to many authorities is a good hub; pointing to many good authorities makes it an even better hub
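
The mutual reinforcement between hubs and authorities can be sketched as alternating updates: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with both normalized each round. This sketch assumes the topic's neighborhood graph has already been collected; the function name and sample pages are hypothetical.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores after a fixed number of HITS iterations."""
    pages = set(links) | {dst for outs in links.values() for dst in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Scale to unit Euclidean length so the scores do not grow without bound.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link to this page.
        new_auth = {p: 0.0 for p in pages}
        for src, outs in links.items():
            for dst in outs:
                new_auth[dst] += hub[src]
        auth = normalize(new_auth)
        # Hub: sum of the authority scores of pages this page links to.
        hub = normalize({p: sum(auth[dst] for dst in links.get(p, [])) for p in pages})
    return hub, auth


links = {
    "directory1": ["a", "b", "c"],
    "directory2": ["a", "b"],
    "blog": ["a"],
}
hub, auth = hits(links)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

Page "a" is linked from all three hubs, so it becomes the top authority; "directory1" links to the most authorities, so it becomes the top hub.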

  15. Summary • Web Indexing • Web Indexing Styles • Back-of-the-Book Style Web Indexing • Metadata and Web Indexing • Link-Based Ranking Techniques • HITS • PageRank (Google)

  16. References • News • http://news.com.com/2100-1032_3-5730744.html • Definition • http://www.marisol.com/websiteindexing.html • http://taxonomist.tripod.com/indexing/paperless.html • http://en.wikipedia.org/wiki/Web_indexing • Tools • www.stcsig.org/idx/articles/webindexing.pdf • Theory • http://amath.colorado.edu/outreach/demos/hshi/2001Sum/pagerank.html • http://www.cis.strath.ac.uk/~fabioc/04-mia/lects/11.pdf

  17. Question?
