craigslist++

Presentation Transcript


  1. craigslist++ • Sean Anastasi • Joseph Chen • Tatiana Gershanovich • Andreas Sekine cse454 craigslist++

  2. our goal • to enhance craigslist’s interface • show related items also being sold at craigslist • show related items from other third-party sites

  3. how we do it • main components • crawler (heritrix) • clusterer (carrot2) • relevance sorting • user interface (greasemonkey) • other stuff

  4. crawler • specific crawling needs • volatile data • questionable legalities • heritrix • only crawling one domain • problematic setup • our setup • 2 crawlers for new posts, 1 cleaner
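The slide pairs "volatile data" with a setup of two crawlers for new posts and one cleaner. The slides do not say how the cleaner works, so the following is only a minimal sketch of one plausible division of labour: crawlers add posts to a store, and a cleaner pass removes posts that are no longer available. `PostIndex`, `clean`, and the `is_live` callback are hypothetical names used for illustration; none of this is Heritrix's API or the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PostIndex:
    """Hypothetical in-memory store of crawled posts, keyed by URL."""
    posts: Dict[str, str] = field(default_factory=dict)

    def add(self, url: str, title: str) -> None:
        # Called by the crawlers that discover new posts.
        self.posts[url] = title

    def clean(self, is_live: Callable[[str], bool]) -> List[str]:
        # Called by the cleaner: drop every post whose URL is no longer live.
        # `is_live` is an assumed callback (for example, an HTTP status check).
        expired = [url for url in self.posts if not is_live(url)]
        for url in expired:
            del self.posts[url]
        return expired


index = PostIndex()
index.add("http://example.org/post/1", "road bike")
index.add("http://example.org/post/2", "mountain bike")

# Pretend post 1 has been deleted by its author.
removed = index.clean(lambda url: not url.endswith("/1"))
print(removed)
```

Keeping the liveness check as a callback means the same cleaner logic can be exercised without any network access, as in the usage lines above.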

  5. clusterer • carrot2 • what to cluster (title, body, or title + body)? • need for reclustering and combination • WordNet • combination of synonym clusters
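The "combination of synonym clusters" step can be illustrated with a small sketch. It assumes clusters arrive as a mapping from label to post ids, and it uses a tiny hand-written synonym table as a stand-in for a WordNet lookup. The table, the function name `merge_synonym_clusters`, and the sample data are all assumptions made for illustration, not the project's code or Carrot2's API.

```python
from typing import Dict, List, Set

# Hand-written stand-in for a WordNet synonym lookup (assumption, not real WordNet data).
SYNONYMS: Dict[str, Set[str]] = {
    "couch": {"sofa"},
    "sofa": {"couch"},
    "bike": {"bicycle"},
    "bicycle": {"bike"},
}


def merge_synonym_clusters(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Fold each cluster into an earlier cluster whose label lists it as a synonym."""
    merged: Dict[str, List[str]] = {}
    for label, posts in clusters.items():
        target = label
        for seen in merged:
            if label in SYNONYMS.get(seen, set()):
                target = seen  # reuse the label that was encountered first
                break
        # Build new lists so the caller's input is left unmodified.
        merged.setdefault(target, []).extend(posts)
    return merged


clusters = {
    "couch": ["post-1", "post-2"],
    "bike": ["post-3"],
    "sofa": ["post-4"],
    "bicycle": ["post-5"],
}
result = merge_synonym_clusters(clusters)
print(result)
```

This sketch only merges labels that are direct synonyms of an already-seen label; chains of synonyms would need a union-find or similar structure.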

  6. relevance sorting

  7. relevance sorting (cont.)

  8. user interface • greasemonkey • show related posts (grouped by clusters) • show which items have data • jquery • folding item lists • mouseover details/images

  9. other • amazon product advertising api • yahoo term extraction • botnet

  10. demo • greasemonkey plugin • https://addons.mozilla.org/en-US/firefox/addon/748 • craigslist++ script • http://cubist.cs.washington.edu/~lidor7/craigslistpp.user.js • craigslist • http://seattle.craigslist.org/
