
Unsupervised and Weakly-Supervised Probabilistic Modeling of Text


Presentation Transcript


  1. Unsupervised and Weakly-Supervised Probabilistic Modeling of Text (Ivan Titov)

  2. Outline • Introduction to the Topic • Seminar Plan • Requirements and Grading

  3. What do we want to do with text? • One of the ultimate goals of natural language processing is to teach a computer to understand text • Text understanding in an open domain is a very complex problem that cannot possibly be solved with a set of hand-crafted rules • Instead, essentially all modern approaches to natural language processing use statistical techniques

  4. Examples of Ambiguities • Word sense: "… Nissan car and truck plant is located in …" vs. "… divide life into plant and animal kingdom …" • Part of speech: "This can will rust" is read as (Article This) (Noun can) (Modal will) (Verb rust) • Pronoun reference: "The dog bit the kid. He was taken to a (veterinarian | hospital)." • Entity reference: "Tiger was in Washington for the PGA tour" (a golfer, not an animal)

  5. NLP Tasks • "Full" language understanding is beyond the state of the art and cannot be approached as a single task; instead: • Practical applications: relation extraction, question answering, text summarization, translation, … • Prediction of linguistic representations: syntactic parsing, shallow semantic parsing (semantic role labeling), discourse parsing, …

  6. Supervised Statistical Methods • Annotate texts with (structured) labels and learn a model from this data

  7. Supervised Statistical Methods • More formally: • X is the text, Y the label (e.g., a syntactic structure) • Construct a parameterized model P(Y | X, W) • Estimate W on a collection {(X_i, Y_i)}_{i=1…N} • Maximum likelihood estimation: $W^* = \arg\max_W \prod_{i=1}^{N} P(Y_i \mid X_i, W)$ • Predict a label for a new example X: $Y^* = \arg\max_Y P(Y \mid X, W^*)$
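
To make the estimation recipe concrete, here is a minimal sketch in Python: a toy binary logistic-regression classifier P(Y | X, W) fitted by maximizing the log-likelihood with gradient ascent. The data, learning rate, and dimensionality are invented for illustration; real NLP models predict structured labels, but the MLE principle is the same.

```python
import numpy as np

# Toy supervised MLE: logistic regression P(Y | X, W) trained by
# maximizing the log-likelihood of labeled pairs (X_i, Y_i).

def log_likelihood(W, X, Y):
    p = 1.0 / (1.0 + np.exp(-X @ W))          # P(Y = 1 | X, W)
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

def fit(X, Y, lr=0.1, steps=500):
    W = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ W))
        W += lr * X.T @ (Y - p)               # gradient of the log-likelihood
    return W

def predict(W, x):
    # argmax_Y P(Y | X, W*) for the binary case
    return int(1.0 / (1.0 + np.exp(-x @ W)) > 0.5)

# Tiny synthetic dataset {(X_i, Y_i)} (made up for the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
W = fit(X, Y)
print(predict(W, X[0]), Y[0])
```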

  8. Supervised Statistical Models • Most tasks in NLP are complex, so large amounts of annotated data are needed • E.g., the standard Penn Treebank Wall Street Journal dataset contains around 40,000 sentences (about 2 million words) • Annotation is not just YES or NO, but usually complex graphs • Domain variability: models are brittle when applied out of domain • A question answering model trained on biological data will work poorly on news data • Many languages • Do we need data for every language, every domain, every task? Not feasible for many tasks and very expensive for others

  9. Unsupervised and Weakly-Supervised Models • There is a virtually unlimited amount of unlabeled text (e.g., on the Web) • Unsupervised models • Do not use any kind of labeled data • Model jointly P(H, X | W), where H represents the latent structure of interest for the task in question (latent semantic topics, syntactic relations, etc.) • Estimation on an unlabeled dataset {X_i}_{i=1…N} • Maximum likelihood estimation, summing over the variable we do not observe: $W^* = \arg\max_W \prod_{i=1}^{N} \sum_H P(H, X_i \mid W)$
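
A toy illustration of what "sum over the variable you do not observe" looks like in practice: a two-topic mixture-of-unigrams model fitted with EM (covered in the first class). The corpus, vocabulary size, and topic count below are all invented for the sketch.

```python
import numpy as np

# Unsupervised EM for a mixture of unigrams: each document X has a
# latent topic H, and we maximize prod_i sum_H P(H) P(X_i | H, W)
# without ever observing H.

docs = [[0, 0, 1], [0, 1, 0], [2, 3, 2], [3, 2, 3]]   # word ids per document
V, K = 4, 2                                            # vocabulary size, topics

rng = np.random.default_rng(0)
pi = np.full(K, 1.0 / K)                 # P(H)
phi = rng.dirichlet(np.ones(V), size=K)  # P(word | H)

for _ in range(50):
    # E-step: posterior P(H | X_i), i.e., the "sum over H" made explicit
    post = np.zeros((len(docs), K))
    for i, d in enumerate(docs):
        joint = pi * np.array([phi[k][d].prod() for k in range(K)])
        post[i] = joint / joint.sum()
    # M-step: re-estimate the parameters from expected counts
    pi = post.sum(axis=0) / len(docs)
    phi = np.full((K, V), 1e-9)
    for i, d in enumerate(docs):
        for w in d:
            phi[:, w] += post[i]
    phi /= phi.sum(axis=1, keepdims=True)

print(post.round(2))   # documents cluster into the two latent topics
```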

  10. Example: Unsupervised Topic Segmentation • Segmenting a hotel review into aspects: • Location: [The hotel is located on Maguire street, one block from the river. Public transport in London is straightforward, the tube station is about an 8 minute walk or you can get a bus for £1.50.] • View: [We had a stunning view (from the floor to ceiling window) of the Tower and the Thames.] • Rooms: [One thing we really enjoyed about this place – our huge bath tub with jacuzzi, this is so different from usually small European hotels. Rooms are nicely decorated and very light.] • Useful for: • Summarization (summarize multiple reviews along key aspects) • Sentiment prediction (predict star ratings for each aspect) • Visualization • …

  11. Semi-Supervised Learning • Small amount of labeled data {(X_i, Y_i)}_{i=1…N_L} • Large amount of unlabeled data {X_j}_{j=1…N_U} • Define a joint model P(X, Y | W) • The model is estimated on both datasets • Maximum likelihood estimation, summing over the unobserved label on the unlabeled dataset: $W^* = \arg\max_W \prod_{i=1}^{N_L} P(X_i, Y_i \mid W) \cdot \prod_{j=1}^{N_U} \sum_Y P(X_j, Y \mid W)$
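
As a sketch of how the two products in this objective combine, the snippet below runs EM-style updates for a toy Naive Bayes model: labeled documents contribute hard counts, unlabeled ones contribute expected counts under the current posterior. All data and dimensions are invented for illustration.

```python
import numpy as np

# Semi-supervised MLE sketch: labeled pairs contribute P(X_i, Y_i | W)
# directly, unlabeled documents contribute sum_Y P(X_j, Y | W).

V, K = 4, 2
labeled = [([0, 0, 1], 0), ([2, 3, 2], 1)]   # (word ids, observed label)
unlabeled = [[0, 1, 0], [3, 2, 3]]           # word ids only

pi = np.full(K, 1.0 / K)        # P(Y)
phi = np.full((K, V), 1.0 / V)  # P(word | Y)

for _ in range(20):
    # E-step on the unlabeled set only: posterior over the missing Y
    post_u = []
    for d in unlabeled:
        joint = pi * np.array([phi[k][d].prod() for k in range(K)])
        post_u.append(joint / joint.sum())
    # M-step pools hard counts (labeled) and expected counts (unlabeled)
    counts_pi = np.full(K, 1e-9)
    counts_phi = np.full((K, V), 1e-9)
    for d, y in labeled:
        counts_pi[y] += 1
        for w in d:
            counts_phi[y, w] += 1
    for d, p in zip(unlabeled, post_u):
        counts_pi += p
        for w in d:
            counts_phi[:, w] += p
    pi = counts_pi / counts_pi.sum()
    phi = counts_phi / counts_phi.sum(axis=1, keepdims=True)

print(post_u)   # unlabeled documents pulled toward the labeled classes
```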

  12. Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional information • User-generated annotation • Can we learn how to summarize, segment, and understand text using this information?

  13. Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional annotation • Temporal relations between documents • Can we learn to translate, or port a semantic model from one language to another?

  14. Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional annotation • User-generated annotation • Temporal relations between documents • Links between documents • Clusters of similar documents • … • How useful is this information? • Can we project annotated resources from language to language? • Can we improve unsupervised / supervised models? • A hot topic in recent NLP research

  15. Why probabilistic models? • In this class we will focus on (Bayesian) probabilistic models • Why? • They provide a concise way to state model and approximation assumptions • They are like LEGO blocks: we can combine different models as building blocks to construct a new model for a task • Prior knowledge can be integrated in a simple and consistent way • Missing data is easily accounted for: just sum over the corresponding variable • We saw an example in semi-supervised learning
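
The "just sum over the corresponding variable" point is essentially a one-liner; here is a minimal numeric sketch with made-up probabilities:

```python
# Marginalizing out a missing variable Y: P(X) = sum_Y P(X | Y) P(Y).
# The numbers are invented purely for illustration.
p_y = {0: 0.6, 1: 0.4}            # P(Y)
p_x_given_y = {0: 0.10, 1: 0.70}  # P(X = x | Y) for one fixed observation x
p_x = sum(p_y[y] * p_x_given_y[y] for y in p_y)
print(p_x)                        # 0.6 * 0.10 + 0.4 * 0.70 = 0.34
```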

  16. Goals of the seminar • Understand the methodology: • Classes of models considered in NLP • Approximation techniques for learning and inference • (Exact inference is intractable for most of the problems considered) • Learn about interesting applications of these methods in NLP • See that we can sometimes substitute expensive annotation with a surrogate signal and obtain good results

  17. Plan • Next class (April 23): • Introduction: • Topic models (PLSA, LDA) • Basic learning / inference techniques: EM and Gibbs sampling • Decide on the paper to present • Based on the survey and the number of registered students, I will adjust my list and put it online on Wednesday • Starting from April 30: paper presentations by you
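
For readers who want a preview of the Gibbs sampling topic, below is a compact collapsed Gibbs sampler for LDA. It is a didactic sketch, not an efficient implementation; the corpus, topic count, and hyperparameters are all invented.

```python
import numpy as np

# Collapsed Gibbs sampling for LDA on a toy corpus: resample each
# token's topic assignment z from its conditional given all others.

docs = [[0, 1, 1, 0], [1, 0, 2, 0], [3, 4, 4, 3], [4, 3, 3, 4]]
V, K, alpha, beta = 5, 2, 0.1, 0.01
rng = np.random.default_rng(0)

# Random initial topic assignment for every token, plus count tables
z = [[rng.integers(K) for _ in d] for d in docs]
ndk = np.zeros((len(docs), K))   # topic counts per document
nkw = np.zeros((K, V))           # word counts per topic
nk = np.zeros(K)                 # total tokens per topic
for i, d in enumerate(docs):
    for j, w in enumerate(d):
        k = z[i][j]
        ndk[i, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):             # Gibbs sweeps
    for i, d in enumerate(docs):
        for j, w in enumerate(d):
            k = z[i][j]          # remove the token's current assignment
            ndk[i, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # P(z = k | rest) ∝ (n_dk + alpha) (n_kw + beta) / (n_k + V beta)
            p = (ndk[i] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[i][j] = k          # add it back under the sampled topic
            ndk[i, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Smoothed topic-word distributions recovered from the counts
print((nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True))
```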

  18. Topics • Modelling semantic topics of document collections: • Topic segmentation models (including modelling the order of topics) • Topic hierarchies • Integrating syntax • Modelling syntax and topics • Shallow models of semantics • Grounded language acquisition • Joint modelling of multiple languages • Modelling multiple modalities: • Gestures and discourse • Learning feature representations from text

  19. Requirements • Present a paper to the class • Presentation length will depend on the number of students • Write critical "reviews" of 3 selected papers (1.5 - 2 pages each) • A term paper (12 - 15 pages) for those taking the 7-point version • Make sure you are registered for the right "version" in HISPOS! • Read the papers and participate in the discussion

  20. Grades • Class participation grade: 60% • Your talk and the discussion after it • Your participation in the discussion of other talks • 3 reviews (5% each) • Term paper grade: 40% • Only if you are taking the 7-point version; otherwise you do not need one

  21. Presentation • Present the paper in an accessible way • Take a critical view of the paper: discuss shortcomings, possible future work, etc. • To give a good presentation, in most cases you may need to read one or two additional papers (e.g., those referenced in the paper) • Links to tutorials on how to give a good presentation will be available on the class web page • Send me your slides 4 days before the talk, by 6 pm • If we keep the class on Friday, this means the deadline is Monday by 6 pm • I will give my feedback within 2 days of receiving the slides • (The first 2 presenters can send me their slides 2 days before if they prefer)

  22. Term paper • Goal: • Describe the paper you presented in class • Present your own ideas, analysis, and comparisons (more on this later) • It should be written in the style of a research paper; the only difference is that most of the work you present is not your own • Length: 12 - 15 pages • Grading criteria: • Clarity • Paper organization • Technical correctness • New ideas that are meaningful and interesting • Submit as PDF to my email

  23. Critical review • A short critical (!) essay reviewing one of the papers presented in class • One or two paragraphs presenting the essence of the paper • The remaining parts should highlight both the positive sides of the paper (what you like) and its shortcomings • The review should be submitted before the paper's presentation in class • (The exception is additional reviews submitted for seminars you skipped; more on this later) • No copy-paste from the paper • Length: 1.5 - 2 pages

  24. Your ideas / analysis • Comparison of the methods used in the paper with other material presented in the class or any other related work • Ideas for improving the approach • …

  25. Attendance policy • You can skip ONE class without any explanation • Otherwise, you will need to write an additional critical review (of the paper that was presented while you were absent)

  26. Office Hours • I would be happy to see you for discussion after the talk, 16:00 - 17:00 on Fridays (may change if the seminar timing changes): • Office 3.22, C 7.4 • Otherwise, send me an email and I will find the time

  27. Other stuff • Timing of the class • Survey (Doodle poll?) • Select a paper to present and papers to review by the next class (we will use Google Docs)
