I256: Applied Natural Language Processing


Presentation Transcript


  1. I256: Applied Natural Language Processing Marti Hearst October 18, 2006

  2. Community-based Summarizer • Results on training data with cross-validation?

  3. Community-based Summarizer • Results on test data:

  4. Problems with Community Code • Not reading the instructions: • Hardcoding directory paths • Hardcoding filenames of test files • Here is an easy way to do it generally: import os; files = os.listdir("dirname") • So the code should take two parameters: • Directory name containing the documents • Filename in which to write the output

  5. Problems with Community Code • Not reading the instructions: • Hardcoding directory paths within the code • Hardcoding filenames of test files • Here is an easy way to do it generally: import os; files = os.listdir("dirname") • So the code should take two parameters: • Directory name containing the documents • Filename in which to write the output
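
The two-parameter interface described on this slide could be sketched roughly as follows. This is a minimal illustration, not the course's reference code: the names `list_documents`, `summarize`, and `main` are hypothetical, `summarize` is only a placeholder that keeps the first line of each document, and the one-line-per-document output format is an assumption.

```python
import os


def list_documents(dirname):
    """Return the full paths of the regular files directly inside dirname, sorted by name."""
    paths = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    return paths


def summarize(text):
    # Hypothetical placeholder, NOT the assignment's summarizer:
    # it simply keeps the first line of the document.
    lines = text.splitlines()
    return lines[0] if lines else ""


def main(argv):
    """argv holds exactly two items: the document directory and the output filename."""
    if len(argv) != 2:
        raise SystemExit("usage: summarizer.py DIRNAME OUTPUT_FILENAME")
    dirname, output_filename = argv

    # List the inputs before creating the output file, so that no path
    # is ever hardcoded and the listing reflects the directory as given.
    documents = list_documents(dirname)

    with open(output_filename, "w", encoding="utf-8") as out:
        for path in documents:
            with open(path, encoding="utf-8", errors="replace") as f:
                summary = summarize(f.read())
            # Assumed output format: filename, tab, summary.
            out.write(f"{os.path.basename(path)}\t{summary}\n")
```

To use this from the command line, one would call `main(sys.argv[1:])` under the usual `if __name__ == "__main__":` guard, so that a grader can point the same code at any directory and output file without editing it.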

  6. Problems with Community Code • What I did wrong: • Had said in class that the files should be self-contained but didn’t put that into the assignment description. • Should have said explicitly that you should take as input a directory name and an output filename. • Should have made an easy way to indicate if external files were needed, and what they were. • Should have added another task: analyze the contribution of each individual feature.

  7. Final Projects • I’d like proposals in two weeks (Nov 1) • Gives me a week to give you feedback • We’ll spend about 5 weeks on the projects • I want to give you one or two more homeworks • Class presentations the week of Dec 5, but projects due the following week • You can work in teams of 2 (maybe 3, depends on the project)

  8. Final Project Ideas • Blog analysis • Categorize blog topics (maybe including link analysis) • Segment blogs into pieces based on topics • Do blog author analysis • Summarize blog reaction to some event, e.g., what did people think of “An Inconvenient Truth” • There is a contest on this: • http://www.icwsm.org/ • Do analysis as input for an interesting viz: • http://benfry.com/linking/

  9. Final Project Ideas • Analyze the accuracy of best-paper awards* • Often given out for conferences • How prescient are these awards?

  10. Final Project Ideas • Create a Negativity/Emotion/Flame Recognizer • There is some related work, but this is somewhat under-explored

  11. Final Project Ideas • Improve an Automatic Faceted Hierarchy Creation Tool* • Students used this two years ago for making a hierarchy for photo text • Sample output on two collections: • http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco • http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco

  12. Final Project Ideas • Analyze profiles for online dating* • Use characteristics from social psychology to score them • Use other metrics as well.

  13. Final Project Ideas • Work on a timeline comparison project • One idea: use output of the new Google news archive • Create input for a visualizer built by students last semester: • http://www2.sims.berkeley.edu/courses/is247/f05/projects/timelinecompare/
