
Seattle Event Finder Justin Meyer Jessica Leung Jennifer Hanson


Presentation Transcript


  1. Seattle Event Finder Justin Meyer Jessica Leung Jennifer Hanson Sophia Gansky Chuan Yu

  2. Goal • Create a specialized search engine for events in the Seattle area. • Enable search by • Location • Date and Time • Price • Category • Enable full-text search

  3. The Internet Nutch Crawler Event Extraction Event Indexing Geocoding Event Database Web Service Event Classification Web Application

  4. Reality • Site-specific extractors were used rather than a machine learning approach • The classifier is tuned for events from specifically chosen websites rather than the full web (overfitted, but in a good way) • Email notifications were not implemented • The web interface is less dynamic than planned

  5. Demo http://amlia.cs.washington.edu:1987/eventfinder/search/

  6. What We Found Surprising • CRF++ is limited in extracting attributes • The CRF extracts multiple values for one attribute from an event description • When extracting only from a description paragraph, in most cases some attributes are not contained in it • Use site-specific structures • Attribute values are extracted from the corresponding HTML tags • No more ambiguity • All attributes can be extracted from each event page • JavaScript's negative impact • Some sites do not fully load after the first HTTP request • URLs generated by JavaScript functions could not be retrieved
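
The site-specific approach on this slide can be illustrated with a minimal sketch. It assumes a hypothetical site that marks each event attribute with its own CSS class; the class names in FIELD_CLASSES and the helper extract_event are invented for illustration and are not taken from the project, which would need one such mapping per crawled site.

```python
from html.parser import HTMLParser

# Hypothetical mapping from one site's CSS classes to event attributes.
FIELD_CLASSES = {
    "event-title": "title",
    "event-date": "date",
    "event-venue": "location",
    "event-price": "price",
}

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class EventPageExtractor(HTMLParser):
    """Collects the text inside elements whose class maps to an event field."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._field = None  # attribute currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1  # nested element inside the captured one
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELD_CLASSES:
                self._field = FIELD_CLASSES[cls]
                self._depth = 1
                self.event[self._field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the captured element just closed
            self.event[self._field] = self.event[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.event[self._field] += data


def extract_event(html):
    """Return a dict of event attributes found in one event page."""
    parser = EventPageExtractor()
    parser.feed(html)
    parser.close()
    return parser.event
```

Because each attribute is read from its own tag, there is exactly one value per attribute, which is the "no more ambiguity" point above. The trade-off is that the mapping has to be maintained by hand for every site.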

  7. Classifier Experiments • Variables / Results • Number of Categories: 8 • Training Data Source: Crawler Data • Scaled vs. Non-scaled Training Data: Scaled • Single Words vs. Word Pairs as attributes: Single Words Only • Scaled LIBSVM attribute values? Yes • Lower Bound: 0 • Upper Bound: 1 • Weights • URL Words: 50 • Title Words: 15 • Location Words: 10 • Tests (32 Recorded Tests) • Ablation: Turning On/Off features • Tuning: Adjusting Variable Values
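
The feature setup on this slide (single-word attributes, per-source weights, values scaled to the range 0 to 1) might look roughly like the following sketch. How the project actually combined the weights with word counts is not stated, so adding the weight once per word occurrence is an assumption, as are the default weight of 1 for description words and all function names. LIBSVM's own svm-scale tool performs this kind of min-max scaling with its -l and -u options; the version below only mimics the idea in plain Python.

```python
from collections import Counter

# Weights listed on the slide; the way they are applied here is an assumption.
WEIGHTS = {"url": 50, "title": 15, "location": 10}
DEFAULT_WEIGHT = 1  # assumed weight for words from any other source


def weighted_counts(fields):
    """Map source name -> text into word -> weighted count (single words only)."""
    counts = Counter()
    for source, text in fields.items():
        weight = WEIGHTS.get(source, DEFAULT_WEIGHT)
        for word in text.lower().split():
            counts[word] += weight
    return counts


def build_vocab(docs):
    """Assign each word a feature index; LIBSVM indices start at 1."""
    words = sorted({word for doc in docs for word in doc})
    return {word: index + 1 for index, word in enumerate(words)}


def scale_unit(docs, vocab):
    """Min-max scale every feature to [0, 1] across all documents."""
    lo = {w: min(doc.get(w, 0) for doc in docs) for w in vocab}
    hi = {w: max(doc.get(w, 0) for doc in docs) for w in vocab}
    rows = []
    for doc in docs:
        row = {}
        for word, index in vocab.items():
            span = hi[word] - lo[word]
            if span == 0:
                continue  # constant feature, carries no information
            value = (doc.get(word, 0) - lo[word]) / span
            if value != 0:
                row[index] = value  # sparse format: zeros are omitted
        rows.append(row)
    return rows


def to_libsvm_line(label, row):
    """Format one example as '<label> <index>:<value> ...' with sorted indices."""
    feats = " ".join(f"{i}:{v:g}" for i, v in sorted(row.items()))
    return f"{label} {feats}".rstrip()
```

With weights this large, a word in the URL dominates the same word in the description, which matches the slide's emphasis on URL words; an ablation test in this setup would amount to setting one of the weights to 0 and retraining.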

  8. Usability Experiment • Survey • Participants went to the website and completed 3 tasks • Task completion and overall feedback • Only three submissions due to server load • Results • All found site navigable • Participants used list view, map view and individual event pages • Some results not relevant

  9. What We Learned • Information Retrieval / Extraction / Indexing: • Gained more experience with regex, Java servlet technology, and working with open-source projects. • Classifier: • There are many methods of classification and related variables. Deciding on a classification method and setting variables can be as much an art as a science. • Front End: • Frameworks (Django) are very powerful but take a long time to learn. Watch your resources!

  10. Breakdown of Work • Justin - Classifier and Database • Jessica and Jenn - Front end • Sophia - Extraction, Nutch, Lucene, Database • Chuan - Extraction

  11. Questions?
