
Extracting Local Understandings from User-Generated Reviews on City Guide Websites

Andrea Moed

IS256 Applied Natural Language Processing

Professor Marti Hearst

December 6, 2006

Overview
  • Motivations
  • Corpus
  • Processing
  • Nickname discovery
  • Ongoing experiments
    • Attraction extraction
    • Review classification
  • Future work

Andrea Moed | IS256 ANLP

Motivations
  • Local knowledge of well-known places… for locals
    • “Nobody goes there anymore, it’s too crowded”
    • Major draws (views, dishes, people…)
    • Best times/seasons/modes of transport?
    • Places to combine in one excursion
  • “A good place for X” vs. a Great Good Place*
    • *Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999

Corpus
  • Yelp San Francisco
    • Social site organized around cities, launched 2004
    • Thousands of SF places, reviews and reviewers
    • Largely local interest (Mass Media, Pets)
    • Some areas useful for visitors (Night Life, Shopping)
    • Writerly culture high structural and stylistic variation in the text
  • Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor
    • Destinations
    • Frequently reviewed places: 20+ reviews

Processing
  • Used Dappit to build page scrapers
  • Generated XML; parsed in Python
    • Place objects consisting of location info + reviews
    • Corpus collects place objects from various categories
  • Challenges of screen scraping
    • Tradeoff between collecting more places and collecting the places with the most reviews (optimizing requires an exhaustive search)
    • TripAdvisor proved too difficult
  • Analysis with Python and NLTK Lite
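
The parsing step above might look roughly like the following sketch. The XML layout (`<places>`, `<place>`, `<name>`, `<address>`, `<review>`) and the names `Place`, `parse_places`, and `frequently_reviewed` are assumptions made for illustration; the slides do not show the actual scraper output or class definitions. The 20-review cutoff comes from the Corpus slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Place:
    """One reviewed place: location info plus its review texts."""
    name: str
    category: str
    address: str
    reviews: list = field(default_factory=list)


def parse_places(xml_text):
    """Build Place objects from scraper output (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    places = []
    for node in root.iter("place"):
        place = Place(
            name=node.findtext("name", default="").strip(),
            category=node.get("category", ""),
            address=node.findtext("address", default="").strip(),
        )
        # Each <review> element is assumed to hold the text of one review.
        place.reviews = [(r.text or "").strip() for r in node.iter("review")]
        places.append(place)
    return places


def frequently_reviewed(places, min_reviews=20):
    """Keep only places with at least `min_reviews` reviews."""
    return [p for p in places if len(p.reviews) >= min_reviews]


# Made-up sample data, only to show the expected shape of the input.
SAMPLE = """
<places>
  <place category="Local Flavor">
    <name>Example Park</name>
    <address>123 Example St</address>
    <review>Great views on a clear day.</review>
    <review>Too crowded on weekends.</review>
  </place>
</places>
"""

corpus = parse_places(SAMPLE)
```

Keeping reviews attached to their place object makes the later per-place analyses (name frequency, TF/IDF) a simple loop over the corpus.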

Place Nickname Discovery
  • Goal: Discover alternate search terms to surface more diverse local results in web search
  • Method: Regular expression matching

Place Nickname Discovery
  • Steps
    • Counted frequency of Yelp-given place name in reviews of that place
    • Tokenized name on whitespace
    • Rule-based generation of candidate nicknames: acronym, subsets of tokens
    • Compared frequencies of given name and each nickname
    • Potentially useful nicknames are those that occur at least half as often as the given name
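
The steps above can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, matching is assumed to be case-insensitive and whole-word, and mentions of the full given name are assumed to be stripped before counting candidates so that a subset such as the first word is not credited for every full-name mention. The slides do not say how these details were handled.

```python
import re
from itertools import combinations


def phrase_pattern(phrase):
    """Whole-word, case-insensitive regex for a literal phrase."""
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)


def candidate_nicknames(name):
    """Rule-based candidates: an acronym plus ordered proper subsets of tokens."""
    tokens = name.split()  # tokenize the given name on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())
    for size in range(1, len(tokens)):
        for subset in combinations(tokens, size):
            candidates.add(" ".join(subset))
    return candidates


def useful_nicknames(name, reviews, ratio=0.5):
    """Return candidates occurring at least `ratio` times as often as the name."""
    name_re = phrase_pattern(name)
    given = sum(len(name_re.findall(text)) for text in reviews)
    # Assumption: remove full-name mentions before counting candidates.
    stripped = [name_re.sub(" ", text) for text in reviews]
    result = {}
    for cand in candidate_nicknames(name):
        cand_re = phrase_pattern(cand)
        count = sum(len(cand_re.findall(text)) for text in stripped)
        if count > 0 and count >= ratio * given:
            result[cand] = count
    return result


# Made-up reviews for illustration only.
reviews = [
    "Golden Gate Park is huge.",
    "I love the Park on Sundays.",
    "The park has bison!",
    "GGP forever.",
]
nicknames = useful_nicknames("Golden Gate Park", reviews)
```

In this toy input both "GGP" and "Park" pass the threshold, but "park" is also an ordinary noun, so a match like this may reflect a common word rather than a real nickname.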

Place Nickname Discovery
  • Results
    • From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each
    • 23 of 61 places appeared to have frequently used nicknames
    • BUT in 9 cases this was due to common words in names
    • In the remaining cases, the first word of the name was the most commonly used nickname
    • Hypothesis: Long tail of less predictable nicknames

Ongoing Work
  • Attraction extraction
    • TF/IDF calculation to find the concepts most widely associated with a place
    • Further text analysis to collect understandings of key concepts
      • Specificity
      • Sentiment
      • Temporality
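
As a rough sketch of the TF/IDF step, each place can be treated as one document made of its concatenated reviews. The tokenizer, the particular weighting (relative term frequency times the log of inverse document frequency), and the function names below are assumptions chosen for illustration; the slides do not specify which variant was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """docs maps place name -> all review text for that place.

    Returns place name -> {term: score}.
    """
    term_counts = {place: Counter(tokenize(text)) for place, text in docs.items()}
    # Document frequency: number of places whose reviews contain the term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    scores = {}
    for place, counts in term_counts.items():
        total = sum(counts.values()) or 1
        scores[place] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, place, k=3):
    """Highest-scoring terms for a place, ties broken alphabetically."""
    ranked = sorted(scores[place].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up review text for two hypothetical places.
docs = {
    "Place A": "burrito burrito line",
    "Place B": "view view line",
}
scores = tf_idf(docs)
```

Terms that appear in the reviews of every place (here, "line") score zero, so the ranking favors words that distinguish one place from the rest of the corpus.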

Ongoing Work
  • Classification of reviews: recommendation vs. narrative
    • Recommendations help people “use” a city
    • Narrative is associated with memorable and unique locations
  • Features for classification
    • Verb tense distribution
    • Paragraph breaks
    • Opinion words at beginning and end (recommendation)
    • Memory and relationship words (narrative)
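
The features listed above could be extracted along these lines. Everything here is a hypothetical sketch: the word lists are tiny placeholders, the past-tense count is a crude suffix heuristic standing in for a real part-of-speech tagger, and the `edge` window size is arbitrary. None of these choices come from the slides.

```python
import re

# Placeholder lexicons; a real system would need much larger lists.
OPINION_WORDS = {"great", "best", "worst", "love", "awesome", "terrible",
                 "recommend", "recommended", "avoid"}
MEMORY_WORDS = {"remember", "remembered", "childhood", "ago", "grandmother",
                "mother", "father", "friend", "friends", "family"}
PAST_IRREGULAR = {"was", "were", "went", "took", "had", "came", "ate", "saw"}


def review_features(text, edge=10):
    """Compute simple features for recommendation-vs-narrative classification."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    # Tokens near the beginning and end of the review.
    if len(tokens) > 2 * edge:
        edges = tokens[:edge] + tokens[-edge:]
    else:
        edges = tokens
    # Crude stand-in for verb tense: "-ed" suffix or a known irregular form.
    past = sum(1 for t in tokens if t.endswith("ed") or t in PAST_IRREGULAR)
    return {
        "paragraph_breaks": max(len(paragraphs) - 1, 0),
        "past_tense_ratio": past / (len(tokens) or 1),
        "opinion_at_edges": sum(1 for t in edges if t in OPINION_WORDS),
        "memory_words": sum(1 for t in tokens if t in MEMORY_WORDS),
    }


# Made-up example reviews.
recommendation = "Great tacos. Go early and order the carnitas. Highly recommended."
narrative = ("I remember when my grandmother took me here.\n\n"
             "We walked for hours and talked about her childhood.")

rec_features = review_features(recommendation)
nar_features = review_features(narrative)
```

The suffix heuristic miscounts words such as "recommended", which is one reason a tagger would be preferable for the verb tense feature once a training set exists.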

Future Work
  • Relating understandings of location features to external data (geocoding, weather)
  • Visualization of extracted concepts
  • Development of a training set for classification
