Websphinx & Webgraph

Inf 141

Information Retrieval

Winter 2008

Assignment 3

  • See course webpage for specifications
  • Due Friday Feb 8th
  • Work in groups of 2-3 people
    • Email with subject: Inf 141 Team Registration
    • Train your group
      • Each member of your group must be able to run your architecture on their own for Assignment 4.
  • Quiz next Wednesday

Websphinx

  • www.cs.cmu.edu/~rcm/websphinx/
  • To write your own crawler, extend the Crawler class and override shouldVisit() and visit().
    • visit(): Each retrieved page is passed to this method for user-defined processing.
    • shouldVisit(Link l): Callback that tests whether a link should be traversed.
      • The default implementation returns true for all links.
      • Override it to restrict or redirect the crawl.
    • http://www.cs.cmu.edu/~rcm/websphinx/doc/index.html
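
The override pattern described above can be sketched without the library. This is a minimal illustration only: SketchCrawler and SameHostCrawler are hypothetical stand-ins, not WebSPHINX classes, and they take plain URL strings where the real callbacks take Page and Link objects. The in-memory "web" map is also an assumption so the example runs offline.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlerSketchDemo {
    public static void main(String[] args) {
        // A tiny fake web: URL -> outgoing links (made-up data for the sketch).
        Map<String, List<String>> web = Map.of(
            "http://a.example/", List.of("http://a.example/x", "http://b.example/"),
            "http://a.example/x", List.of("http://a.example/"),
            "http://b.example/", List.of("http://b.example/y"));
        SameHostCrawler crawler = new SameHostCrawler(web, "a.example");
        crawler.crawl("http://a.example/");
        System.out.println(crawler.visited);
    }
}

// Hypothetical stand-in for a crawler base class; NOT the WebSPHINX API.
abstract class SketchCrawler {
    private final Map<String, List<String>> web;

    SketchCrawler(Map<String, List<String>> web) {
        this.web = web;
    }

    // Default behavior mirrors the slide: every link is accepted.
    protected boolean shouldVisit(String url) {
        return true;
    }

    // User-defined processing for each page that is reached.
    protected abstract void visit(String url);

    // Breadth-first traversal that consults the two callbacks.
    final void crawl(String seed) {
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            visit(url);
            for (String next : web.getOrDefault(url, List.of())) {
                // Follow a link only if the callback accepts it and it is new.
                if (shouldVisit(next) && seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
    }
}

// Overrides both callbacks: stay on one host and record what was visited.
class SameHostCrawler extends SketchCrawler {
    final List<String> visited = new ArrayList<>();
    private final String host;

    SameHostCrawler(Map<String, List<String>> web, String host) {
        super(web);
        this.host = host;
    }

    @Override
    protected boolean shouldVisit(String url) {
        return host.equals(URI.create(url).getHost());
    }

    @Override
    protected void visit(String url) {
        visited.add(url);
    }
}
```

In this sketch the seed is always visited; only links discovered afterwards go through shouldVisit(). Check the WebSPHINX documentation for how the real Crawler treats its roots.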

Websphinx

  • Create an array consisting of your seed set of links
    • Look at the Link class
      • Represents a link to a web page
      • You can make a link from a string URL
      • You can make a link from a start tag and end tag
    • Look at the Page class
      • Mainly supports automatically parsed HTML pages
      • Parsing produces a list of tags, a list of words, an HTML parse tree, and a list of links
      • You can also construct pages yourself
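
To make the seed-set and parsing ideas concrete, here is a rough sketch using only the Java standard library. It is not the WebSPHINX Link or Page API: SeedAndPageSketch and its methods are hypothetical names, java.net.URI stands in for Link, and the regex-based extraction is a deliberately crude substitute for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeedAndPageSketch {
    // Build an array of seed links from string URLs (the "seed set").
    static URI[] seeds(String... urls) {
        URI[] out = new URI[urls.length];
        for (int i = 0; i < urls.length; i++) {
            // URI.create throws IllegalArgumentException on malformed input.
            out[i] = URI.create(urls[i]);
        }
        return out;
    }

    // Crude matcher for href attributes inside <a ...> start tags.
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Extract outgoing links, resolving relative URLs against the page URL.
    static List<URI> links(URI base, String html) {
        List<URI> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(base.resolve(m.group(1)));
        }
        return out;
    }

    // Extract words by dropping tags and splitting on whitespace.
    static List<String> words(String html) {
        String text = html.replaceAll("<[^>]*>", " ").trim();
        return text.isEmpty() ? List.of() : List.of(text.split("\\s+"));
    }

    public static void main(String[] args) {
        URI[] seedSet = seeds("http://example.com/dir/a.html");
        String html = "<p>Hello <a href=\"/b.html\">web</a> <a href=\"c.html\">graph</a></p>";
        System.out.println(links(seedSet[0], html));
        System.out.println(words(html));
    }
}
```

A library parser such as the one behind the Page class handles malformed markup, other tag types, and character encodings, none of which this sketch attempts.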

Webgraph

  • Webgraph is a framework to study the web graph
  • Use the ArrayListMutableGraph class
    • A mutable graph class based on IntArrayList
    • The constructor ArrayListMutableGraph(ImmutableGraph g) creates a new mutable graph by copying a given immutable graph
      • See also the ImmutableGraph class
  • http://webgraph.dsi.unimi.it/docs/
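
The idea of a mutable graph backed by per-node integer lists can be sketched in plain Java. SketchMutableGraph below is a hypothetical stand-in, not the WebGraph ArrayListMutableGraph class: it uses ArrayList<Integer> where WebGraph uses IntArrayList, and the URL-to-node-id map and all method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WebGraphSketchDemo {
    public static void main(String[] args) {
        SketchMutableGraph g = new SketchMutableGraph();
        g.addArc("http://a.example/", "http://a.example/x");
        g.addArc("http://a.example/", "http://b.example/");
        // Copy, then mutate the copy; the original is left unchanged.
        SketchMutableGraph copy = new SketchMutableGraph(g);
        copy.addArc("http://b.example/", "http://a.example/");
        System.out.println(g.numNodes() + " nodes, " + g.numArcs() + " arcs");
        System.out.println(copy.numNodes() + " nodes, " + copy.numArcs() + " arcs");
    }
}

// Hypothetical stand-in for a mutable web graph; NOT the WebGraph API.
class SketchMutableGraph {
    // successors.get(x) holds the node ids that node x links to.
    private final List<List<Integer>> successors = new ArrayList<>();
    // Maps each URL to its integer node id (assigned in insertion order).
    private final Map<String, Integer> ids = new HashMap<>();

    SketchMutableGraph() {
    }

    // Copy constructor: deep-copies the adjacency lists of another graph.
    SketchMutableGraph(SketchMutableGraph other) {
        for (List<Integer> out : other.successors) {
            successors.add(new ArrayList<>(out));
        }
        ids.putAll(other.ids);
    }

    // Return the node id for a URL, creating a new node if needed.
    int nodeFor(String url) {
        Integer id = ids.get(url);
        if (id == null) {
            id = successors.size();
            ids.put(url, id);
            successors.add(new ArrayList<>());
        }
        return id;
    }

    // Add an arc between two pages; returns false if it already existed.
    boolean addArc(String from, String to) {
        int x = nodeFor(from);
        int y = nodeFor(to);
        List<Integer> out = successors.get(x);
        if (out.contains(y)) {
            return false;
        }
        out.add(y);
        return true;
    }

    int numNodes() {
        return successors.size();
    }

    long numArcs() {
        long total = 0;
        for (List<Integer> out : successors) {
            total += out.size();
        }
        return total;
    }

    // Read-only view of the successors of a node.
    List<Integer> successors(int node) {
        return List.copyOf(successors.get(node));
    }
}
```

A crawler's visit callback could call something like addArc once per link on each page; consult the WebGraph documentation for the actual node and arc methods of ArrayListMutableGraph.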