
Machine Learning in Practice Lecture 12




Presentation Transcript


  1. Machine Learning in Practice Lecture 12 • Carolyn Penstein Rosé • Language Technologies Institute / Human-Computer Interaction Institute

  2. Plan for the Day • Announcements • Assignment 5 handed out – due next Thursday • Note: Readings for next two lectures on Blackboard in Readings folder • See syllabus for specifics • Feedback on Quiz 4 • Homework 4 Issues • Midterm assigned Thursday, Oct 21!!! • More about Text • Term Weights • Start Linguistic Tools

  3. Assignment 5

  4. Assignment 5

  5. Assignment 5 (* 2 examples shown, but there are many more)

  6. TA Office Hours • Possibly moving to Wednesdays at 3 • Note that there will be a special TA session before the midterm for you to ask questions

  7. How are we doing on pace and level of detail?

  8. Feedback on Quiz 4

  9. Feedback on Quiz 4 • Nice job overall!!! • I could tell you read carefully!  • Note that part-of-speech refers to grammatical categories like noun, verb, etc. • Named entity extractors locate noun phrases that refer to people, organizations, countries, etc. • Some people skipped the why and how parts of questions • Some people over-estimated the contribution of POS tagging

  10. Homework 4 Issues

  11. Error Analysis

  12. Error Analysis

  13. Error Analysis

  14. Error Analysis

  15. Error Analysis • If I sort by different features, I can see whether rows of a particular color end up in a specific region • If I want to know which features to do this with, I can start with the most predictive features • Another option would be to use machine learning to predict which cell an instance would end up in within the confusion matrix
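
A minimal sketch of that last idea, using scikit-learn and synthetic data in place of the course's Weka setup (both are assumptions here): label every instance with the confusion-matrix cell it falls into under cross-validation, then train a shallow decision tree to predict the cell and read off which features drive the errors.

    # Hypothetical sketch: predict an instance's confusion-matrix cell.
    # scikit-learn and make_classification stand in for the course data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    pred = cross_val_predict(GaussianNB(), X, y, cv=10)   # 10-fold predictions
    cell = [f"{t}->{p}" for t, p in zip(y, pred)]         # e.g. "1->0" is an error cell

    meta = DecisionTreeClassifier(max_depth=3).fit(X, cell)
    print(meta.feature_importances_)   # high-importance features separate the cells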

  16. Other Suggestions • Use RemoveMisclassified • An unsupervised instance filter • Separates correctly classified instances from incorrectly classified instances • Works in a similar way to the RemoveFolds filter • You only need to use it twice rather than 20 times for 10-fold cross validation • But it doesn’t give you as much information
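
Outside Weka, the same separation can be sketched in a few lines; here scikit-learn's cross_val_predict plays the role of RemoveMisclassified (the data and names are illustrative, not the course setup).

    # Separate correctly and incorrectly classified instances, in the
    # spirit of Weka's RemoveMisclassified filter (illustrative sketch).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=200, random_state=0)
    pred = cross_val_predict(GaussianNB(), X, y, cv=10)

    correct = X[pred == y]   # kept instances
    errors = X[pred != y]    # instances to inspect in an error analysis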

  17. Computing Confidence Intervals • 90% confidence interval corresponds to z=1.65 • 5% chance that a data point will occur to the right of the rightmost edge of the interval • f = percentage of successes • N = number of trials • Interval endpoints: f ± z * sqrt(f(1-f)/N) • f=75%, N=1000, c=90% -> [0.727,0.773]
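
Plugging the slide's numbers into that formula reproduces the interval (a quick check, assuming the usual normal approximation):

    # 90% confidence interval for f = 0.75 over N = 1000 trials (z = 1.65)
    import math

    f, N, z = 0.75, 1000, 1.65
    half = z * math.sqrt(f * (1 - f) / N)
    print(round(f - half, 3), round(f + half, 3))   # 0.727 0.773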

  18. Term Weights

  19. Document Retrieval / Inverted Index • [Slide diagram: an inverted index mapping stemmed tokens, each with a document frequency and total frequency, to postings of (Doc#, Freq, Word Positions)] • Easy to find all documents that have terms in common with your query • Stemming allows you to retrieve morphological variants (run, runs, running, runner) • Word positions allow you to specify that you want two terms to appear within N words of each other
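
A toy version of such an index, with word positions but without stemming, might look like this (the document contents are made up):

    # Inverted index sketch: token -> postings of (doc id, word position).
    from collections import defaultdict

    docs = {0: "the runner runs", 1: "run to the bank"}
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.split()):
            index[token].append((doc_id, pos))

    print(index["the"])   # [(0, 0), (1, 2)] -- docs and positions for "the"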

  20. Evaluating Document Retrieval • Use standard measures: precision, recall, f-measure • Retrieving all documents that share words with your query will both over- and under-generate • If you have the whole web to select from, then under-generating is much less of a concern than over-generating • Does this apply to your task?
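
For reference, the three measures on a made-up retrieval run:

    # Precision, recall, and F-measure over hypothetical document id sets.
    retrieved = {1, 2, 3, 4}
    relevant = {2, 3, 5}

    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)                     # 0.5
    recall = len(hits) / len(relevant)                         # ~0.67
    f_measure = 2 * precision * recall / (precision + recall)  # ~0.57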

  21. Common Vocabulary is Not Enough • You’ll get documents that mention other senses of the term you mean • River bank versus financial institution • Word sense disambiguation is an active area of computational linguistics! • You won’t get documents that discuss other related terms

  22. Common Vocabulary is Not Enough • You’ll get documents that mention a term but are not about that term • Partly get around this by sorting by relevance • Term weights approximate a measure of relevance • Cosine similarity between Query vector and Document vector computes relevance – then sort documents by relevance

  23. Computing Term Weights • A common vector representation for text is to have one attribute per word in your vocabulary • Notice that Weka gives you other options

  24. Why is it important to think about term weights? • If term frequency or salience matters for your task, you might lose too much information if you just consider whether a term ever occurred or not • On the other hand, if term frequency doesn’t matter for your task, a simpler representation will most likely work better • Term weights are important for information retrieval because in large documents, just knowing a term occurs at least once does not tell you whether that document is “about” that term

  25. Basics of Computing Term Weights • Assume occurrence of each term is independent so that attributes are orthogonal • Obviously this isn’t true! But it’s a useful simplifying assumption • Term weight functions have two basic components • Term frequency: How many times did that term occur in the current document • Document frequency: How many times did that term occur across documents (or how many documents did that term occur in)

  26. Basics of Computing Term Weights • Inverse document frequency: a measure of the rarity of a term • idft = log(N/nt), where t is the term, N is the number of documents, and nt is the number of documents where that term occurred at least once • Note that inverse document frequency is 0 in the case that a term occurs in all documents • It is largest (log N) for a term that occurs in only one document, so rarer terms get higher weights

  27. TF.IDF – Term Frequency × Inverse Document Frequency • A scheme that combines term frequency with inverse document frequency • Wt,d = tft,d × idft • Weka also gives you the option of normalizing for document length • Since terms are more likely to occur in longer documents just by chance • You can then compute the cosine similarity between the vector representation for a text and that of a query
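
Written out directly rather than through Weka's StringToWordVector options, the whole pipeline (tf.idf weights plus query-document cosine similarity) fits in a short sketch; the toy documents here are made up.

    # tf.idf term weights and cosine-similarity ranking (illustrative).
    import math
    from collections import Counter

    docs = ["the bank of the river", "the bank approved the loan", "the river runs high"]
    N = len(docs)
    tokenized = [d.split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))   # document frequency
    idf = {t: math.log(N / df[t]) for t in df}               # 0 for "the" (in all docs)

    def tfidf(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    query = tfidf("river bank".split())
    ranking = sorted(range(N), key=lambda i: -cosine(query, tfidf(tokenized[i])))
    print(ranking)   # document ids sorted by relevance to the query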

  28. Computing Term Weights • Notice how to set options for different types of term weights

  29. Trying Different Term Weights • Predicting Class1, 72 instances • Note that the number of levels for each separate feature is identical for Word Count, Term Frequency, and TF.IDF.

  30. Trying Different Term Weights • What is different is the relative weight of the different features. • Whether this matters depends on the learning method

  31. Linguistic Tools

  32. Basic Anatomy: Layers of Linguistic Analysis • Phonology: The sound structure of language • Basic sounds, syllables, rhythm, intonation • Morphology: The building blocks of words • Inflection: tense, number, gender • Derivation: building words from other words, transforming part of speech • Syntax: Structural and functional relationships between spans of text within a sentence • Phrase and clause structure • Semantics: Literal meaning, propositional content • Pragmatics: Non-literal meaning, language use, language as action, social aspects of language (tone, politeness) • Discourse Analysis: Language in practice, relationships between sentences, interaction structures, discourse markers, anaphora and ellipsis

  33. Sentence Segmentation • Breaking a text into sentences is a first step for processing • Why is this not trivial? • In speech there is no punctuation • In text, punctuation may be missing • Punctuation may be ambiguous (e.g., periods in abbreviations) • Alternative approaches • Rule based/regular expressions • Statistical models
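
As an example of the rule-based approach (and of why it breaks), a single regular expression that splits after sentence-final punctuation followed by a capitalized word:

    # Rule-based sentence splitter sketch; note the bad split after "Dr."
    import re

    text = "Dr. Smith arrived. He was late! Was the bus slow?"
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    print(sentences)   # ['Dr.', 'Smith arrived.', 'He was late!', 'Was the bus slow?']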

  34. Tokenization • Segment a data stream into meaningful units • Each unit is called a token • Simple rule: a token is any sequence of characters separated by white space • This leaves punctuation attached to words • But stripping out punctuation would break up large numbers like 5,235,064 • And what about multi-word units like “school bus”?
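
The trade-off is easy to see on a small made-up sentence:

    # Whitespace tokenization keeps punctuation attached; stripping all
    # punctuation instead breaks apart large numbers like 5,235,064.
    import re

    text = "The school bus cost $5,235,064."
    print(text.split())              # ['The', 'school', 'bus', 'cost', '$5,235,064.']
    print(re.findall(r"\w+", text))  # ['The', 'school', 'bus', 'cost', '5', '235', '064']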

  35. Automatic Segmentation

  36. Automatic Segmentation • Run a sliding window 3 symbols wide across the text • Some features from outside the window are also used for prediction • Each position is classified as a boundary or not • The candidate boundary falls between the 1st and 2nd positions of the window

  37. Automatic Segmentation

  38. Automatic Segmentation

  39. Automatic Segmentation

  40. Automatic Segmentation • Features used for prediction: • The 3 symbols in the window • Punctuation • Whether there have been at least two capitalized words since the last boundary • Whether there have been at least three non-capitalized words since the last boundary • Whether we have seen fewer than half the number of symbols as the average segment length since the last boundary • Whether we have seen fewer than half the average number of symbols between punctuation marks since the last punctuation mark
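
A sketch of how such window features could be extracted, with illustrative feature names rather than the original system's (the learner, as on the next slide, would be a decision tree):

    # Sliding-window feature extraction for boundary classification.
    # Each position i is a candidate boundary between symbols i-1 and i.
    def window_features(symbols, i):
        return {
            "prev": symbols[i - 1],
            "curr": symbols[i],
            "next": symbols[i + 1] if i + 1 < len(symbols) else "</s>",
            "prev_is_punct": symbols[i - 1] in {".", "!", "?"},
        }

    symbols = "the bus stopped . Then we left".split()
    candidates = [(i, window_features(symbols, i)) for i in range(1, len(symbols))]
    # Each candidate gets a boundary / non-boundary label for training.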

  41. Automatic Segmentation • Model trained with a decision tree learning algorithm • Accuracy: 96% • Agreement: .44 Kappa • Precision: .59 • Recall: .37 • We assign 66% as many boundaries as the gold standard

  42. Stemmers and Taggers • Stemmers are simple morphological analyzers • Strip a word down to the root • Run, runner, running, runs: all the same root • Next week we will use the Porter stemmer, which just chops endings off • Taggers assign syntactic categories to tokens • Words assigned potential POS tags in the lexicon • Context also plays a role
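
For concreteness, here is what a Porter stemmer and a POS tagger produce in NLTK (one common toolkit; the course tools may differ), assuming the relevant NLTK models are installed:

    # Porter stemming and POS tagging via NLTK (assumes the punkt and
    # perceptron tagger models have been downloaded).
    from nltk import pos_tag, word_tokenize
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["run", "runner", "running", "runs"]])
    # ['run', 'runner', 'run', 'run'] -- even Porter misses some variants

    print(pos_tag(word_tokenize("The runner runs fast.")))
    # e.g. [('The', 'DT'), ('runner', 'NN'), ('runs', 'VBZ'), ('fast', 'RB'), ('.', '.')]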

  43. Wrap-Up • Feature space design affects classification accuracy • We examined two main ways to manipulate the feature space representation of text • One way is through alternative types of term weights • The other is using linguistic tools to identify features of texts beyond just the words that make them up • Part-of-speech taggers can be customized with different tag sets • Next time we’ll talk about the tag set you will use in the assignment • We will also talk about parsers • You can use parsers to create features for classification
