Unsupervised and Weakly-Supervised Probabilistic Modeling of Text Ivan Titov
Outline • Introduction to the Topic • Seminar Plan • Requirements and Grading
What do we want to do with text? • One of the ultimate goals of natural language processing is to teach computers to understand text • Text understanding in an open domain is far too complex a problem to solve with a set of hand-crafted rules • Instead, essentially all modern approaches to natural language processing use statistical techniques
Examples of Ambiguities • … Nissan car and truck plant is located in … vs. … divide life into plant and animal kingdom … (word sense) • … (Article This) (Noun can) (Modal will) (Verb rust) … (part of speech) • The dog bit the kid. He was taken to a (veterinarian | hospital). (pronoun reference) • Tiger was in Washington for the PGA tour (named entity)
NLP Tasks • “Full” language understanding is beyond the state of the art and cannot be approached as a single task; instead: • Practical Applications: • Relation extraction, question answering, text summarization, translation, … • Prediction of Linguistic Representations: • Syntactic parsing, shallow semantic parsing (semantic role labeling), discourse parsing, …
Supervised Statistical Methods • Annotate texts with (structured) labels and learn a model from this data
Supervised Statistical Methods • More formally: • X – text, Y – label (e.g., syntactic structure) • Construct a parameterized model P(Y | X, W) • Estimate W on a collection {(Xi, Yi)}i=1…N • Maximum likelihood estimation: W* = argmax_W Σi log P(Yi | Xi, W) • Predict a label for a new example X: Y* = argmax_Y P(Y | X, W*) • (a minimal code sketch of both steps follows below)
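To make the two steps concrete, here is a minimal sketch, assuming a toy word-level tagging task where P(Y | X, W) is just a per-word categorical distribution estimated by relative frequencies; all names and the toy data are hypothetical illustrations, not part of the seminar materials.

```python
# A minimal sketch of supervised maximum-likelihood estimation: W is the
# table of per-word tag frequencies, and prediction is argmax_y P(y | x, W).
from collections import Counter, defaultdict

def mle_estimate(pairs):
    """Estimate W by maximum likelihood: relative frequencies of tags per word."""
    counts = defaultdict(Counter)
    for x, y in pairs:  # x: word, y: tag
        counts[x][y] += 1
    return {x: {y: c / sum(tags.values()) for y, c in tags.items()}
            for x, tags in counts.items()}

def predict(w, x):
    """Predict the label of a new example: y* = argmax_y P(y | x, W)."""
    return max(w[x], key=w[x].get)

train = [("can", "MODAL"), ("can", "NOUN"), ("can", "MODAL"), ("rust", "VERB")]
w = mle_estimate(train)
print(predict(w, "can"))  # -> "MODAL" (2/3 of the probability mass)
```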
Supervised Statistical Models • Most tasks in NLP are complex, and therefore large amounts of data are needed • E.g., the standard Penn Treebank Wall Street Journal dataset contains around 40,000 sentences (2 million words) • Annotation is not just YES or NO, but usually complex graphs • Domain variability: models are brittle when applied out-of-domain • A question answering model learned on biological data will perform poorly on news data • Many languages • Do we need data for every language, every domain, every task? Not feasible for many tasks and very expensive for others
Unsupervised and Weakly-Supervised Models • Virtually unlimited amounts of unlabeled text (e.g., on the Web) • Unsupervised Models • Do not use any kind of labeled data • Model jointly P(H, X | W), where H represents the structure of interest for the task in question (latent semantic topics, syntactic relations, etc.) • Estimation on an unlabeled dataset {Xi}i=1…N • Maximum likelihood estimation: W* = argmax_W Σi log ΣH P(H, Xi | W) – sum over the variable you do not observe • (see the EM sketch below)
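As an illustration of this objective, here is a minimal EM sketch, assuming a mixture-of-unigrams model in which each document's latent topic plays the role of H. The toy corpus, topic count, and all parameter names are hypothetical choices made for the example.

```python
# Minimal EM for a mixture of unigram "topics": H is a document's latent
# topic, X its bag of words, W = (pi, phi) the model parameters.
import numpy as np

rng = np.random.default_rng(0)
docs = np.array([[5, 1, 0], [4, 2, 1], [0, 1, 6], [1, 0, 5]])  # word counts
K, V = 2, docs.shape[1]

pi = np.full(K, 1.0 / K)                 # P(H)
phi = rng.dirichlet(np.ones(V), size=K)  # P(word | H)

for _ in range(50):
    # E-step: posterior P(H | X_i, W) - "sum over the unobserved variable"
    log_p = np.log(pi) + docs @ np.log(phi).T          # shape: (docs, K)
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate W from the expected counts
    pi = post.mean(axis=0)
    phi = post.T @ docs
    phi /= phi.sum(axis=1, keepdims=True)

print(post.round(2))  # documents cluster into the two latent topics
```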
Example: Unsupervised Topic Segmentation • Location: [The hotel is located on Maguire street, one block from the river. Public transport in London is straightforward, the tube station is about an 8 minute walk, or you can get a bus for £1.50.] • View: [We had a stunning view (from the floor-to-ceiling window) of the Tower and the Thames.] • Rooms: [One thing we really enjoyed about this place – our huge bath tub with jacuzzi; this is so different from the usually small European hotels. Rooms are nicely decorated and very light.] … • Useful for: • Summarization (summarize multiple reviews along key aspects) • Sentiment prediction (predict star ratings for each aspect) • Visualization • …
Semi-Supervised Learning • Small amount of labeled data {(Xi, Yi)} • Large amount of unlabeled data {Xj} • Define a joint model P(X, Y | W) • The model is estimated on both datasets • Maximum likelihood estimation: W* = argmax_W [ Σi log P(Xi, Yi | W) + Σj log ΣY P(Xj, Y | W) ] – sum over the unobserved variable on the unlabeled dataset • (a sketch of this objective follows below)
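Here is a minimal sketch of how the two terms of this objective are computed, assuming a toy two-class Naive Bayes model with categorical features; the model choice and all numbers below are hypothetical, chosen only to make the labeled and unlabeled terms concrete.

```python
# The semi-supervised log-likelihood: log P(X, Y | W) on labeled examples,
# log sum_Y P(X, Y | W) on unlabeled ones (marginalizing the missing label).
import numpy as np

def log_joint(x, y, prior, cond):
    """log P(x, y | W) for Naive Bayes: log P(y) + sum_f log P(x_f | y)."""
    return np.log(prior[y]) + sum(np.log(cond[y][f]) for f in x)

def objective(labeled, unlabeled, prior, cond):
    ll = sum(log_joint(x, y, prior, cond) for x, y in labeled)
    for x in unlabeled:  # sum over the unobserved label, in log space
        ll += np.logaddexp.reduce([log_joint(x, y, prior, cond)
                                   for y in range(len(prior))])
    return ll

prior = np.array([0.5, 0.5])                # P(class)
cond = np.array([[0.8, 0.2], [0.3, 0.7]])   # P(feature value | class)
print(objective([([0, 0], 0)], [[1, 1]], prior, cond))
```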
Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional information • User-generated annotation • Can we learn how to summarize, segment, and understand text using this information?
Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional annotation • Temporal relations between documents • Can we learn to translate, or port a semantic model from one language to another?
Weakly-Supervised Learning (Web) • Texts are not just isolated sequences of sentences • We always have additional annotation • User-generated annotation • Temporal relations between documents • Links between documents • Clusters of similar documents • … • How useful is it? • Can we project annotated resources from one language to another? • Can we improve unsupervised / supervised models? • A hot topic in recent NLP research
Why probabilistic models? • In this class we will focus on (Bayesian) probabilistic models • Why? • They provide a concise way to define model and approximation assumptions • They are like LEGO blocks – we can combine different models as building blocks to construct a new model for a task • Prior knowledge can be integrated in a simple and consistent way • Missing data can easily be accounted for (just sum over the corresponding variables) • We saw an example in semi-supervised learning
Goals of the seminar • Understand the methodology: • Classes of models considered in NLP • Approximation techniques for learning and inference • (Exact inference will not be tractable for most of the problems considered) • Learn about interesting applications of these methods in NLP • See that we can sometimes substitute expensive annotation with a surrogate signal and obtain good results
Plan • Next class (April 23): • Introduction: • Topic models (PLSA, LDA) • Basic learning / inference techniques: EM and Gibbs sampling (a Gibbs sketch follows below) • Decide on the paper to present • On the basis of the survey and the number of registered students, I will adjust my list; it will be online on Wednesday • Starting from April 30: paper presentations by you
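As a preview of one of these inference techniques, here is a minimal collapsed Gibbs sampler for LDA. The toy corpus, hyperparameter values, and variable names are arbitrary illustrative choices, not part of the seminar materials.

```python
# Collapsed Gibbs sampling for LDA: resample each token's topic from its
# conditional given all other assignments, keeping count statistics updated.
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 1, 2], [2, 3, 3, 0]]      # word ids per document
K, V, alpha, beta = 2, 4, 0.1, 0.01      # topics, vocab size, priors

z = [[rng.integers(K) for _ in d] for d in docs]   # topic of each token
ndk = np.zeros((len(docs), K)); nkv = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]; ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                  # remove the token's current assignment
            ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
            # P(z = k | rest) proportional to (n_dk + a) (n_kw + b) / (n_k + V b)
            p = (ndk[d] + alpha) * (nkv[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k; ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

theta = (nkv + beta) / (nkv + beta).sum(axis=1, keepdims=True)
print(theta.round(2))  # smoothed estimate of the topic-word distributions
```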
Topics • Modelling semantic topics of data collections: • Topic segmentation models (including modelling the order of topics) • Topic hierarchies • Integrating syntax • Modelling syntax and topics • Shallow models of semantics • Grounded language acquisition • Joint modelling of multiple languages • Modelling multiple modalities: • Gestures and discourse • Learning feature representations from text
Requirements • Present a paper to the class • Presentation length will depend on the number of students • Write critical “reviews” of 3 selected papers (1.5 – 2 pages each) • A term paper (12 – 15 pages) for those getting 7 credit points • Make sure you are registered for the right “version” in HISPOS! • Read the papers and participate in the discussion
Grades • Class participation grade: 60 % • Your talk and the discussion after it • Your participation in the discussion of other talks • 3 reviews (5 % each) • Term paper grade: 40 % • Only if you are getting 7 credit points; otherwise you do not need one
Presentation • Present a paper in an accessible way • Take a critical view of the paper: discuss shortcomings, possible future work, etc. • To give a good presentation, in most cases you may need to read one or two additional papers (e.g., those referenced in the paper) • Links to tutorials on how to give a good presentation will be available on the class web-page • Send me your slides 4 days before the talk, by 6 pm • If we keep the class on Friday, this means the deadline is Monday, 6 pm • I will give my feedback within 2 days of receiving them • (The first 2 presenters can send me their slides 2 days before, if they prefer)
Term paper • Goal • Describe the paper you presented in class • Your ideas, analysis, comparison (more on this later) • It should be written in the style of a research paper; the only difference is that most of the work you present is not your own • Length: 12 – 15 pages • Grading criteria • Clarity • Paper organization • Technical correctness • New ideas that are meaningful and interesting • Submitted as PDF to my email
Critical review • A short critical (!) essay reviewing one of the papers presented in class • One or two paragraphs presenting the essence of the paper • The remaining parts should highlight both the strengths of the paper (what you liked) and its shortcomings • The review must be submitted before the paper's presentation in class • (The exception is the additional reviews submitted for seminars you skipped; more about this later) • No copy-paste from the paper • Length: 1.5 – 2 pages
Your ideas / analysis • Comparison of the methods used in the paper with other material presented in the class or any other related work • Any ideas for improving the approach • …
Attendance policy • You can skip ONE class without any explanation • Otherwise, you will need to write an additional critical review (for the paper which was presented while you were absent)
Office Hours • I would be happy to see you and discuss things after the talk, from 16:00 – 17:00 on Fridays (this may change if the seminar timing changes): • Office 3.22, C 7.4 • Otherwise, send me an email and I will find a time
Other stuff • Timing of the class • Survey (Doodle poll?) • Select a paper to present and papers to review by the next class (we will use Google Docs)