
Introduction to CL

Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers, for practical applications or for linguistic research. Among practical applications, sometimes the computer only needs to classify or transform the text.

marika


  1. Introduction to CL
     Session 1: 7/08/2011

  2. What is computational linguistics?
     • Processing natural language text by computers
       • for practical applications
       • ... or linguistic research
     • Among practical applications
       • Sometimes the computer only needs to classify or transform the text
       • ... but sometimes it needs to “understand”
         • Ex: Watson: winner of ‘Jeopardy’
     • CL vs. NLP (natural language processing)

  3. NLP applications
     • Automatic speech recognition (ASR): speech → text
     • Machine translation (MT): L1 → L2
     • Information retrieval (IR): query + documents → a subset of the documents
     • Information extraction (IE): document → “database”

  4. NLP applications (cont)
     • Question answering (QA): question + documents → answer
     • Summarization: documents → summary
     • Natural language generation (NLG): representation → text

  5. Other applications
     • Call center
     • Spam filter
     • Spell checker
     • Sentiment analysis: product reviews
     • Bio-NLP: processing clinical data
     • …

  6. Basic NLP tasks: Shallow processing
     • Tokenization:
       • He visited New York in 2003.
     • Morphological analysis:
       • visited → visit + -ed
     • Part-of-speech tagging:
       • He/Pron visited/V New/?? York/N in/Prep 2003/CD
     • Named-entity tagging:
       • He visited [LOCATION New York] in [YEAR 2003]
     • Chunking:
       • [NP He] [V visited] [NP New York] in [NP 2003]
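
The first two steps on this slide can be sketched in a few lines of Python. This is only a toy illustration: the regular expression and the suffix rule below are assumptions made for the example, not the method used in the course, and they ignore real difficulties such as treating "New York" as one unit or handling irregular verbs.

```python
import re

def tokenize(text):
    """Split text into word/number tokens and single punctuation marks (toy rule)."""
    return re.findall(r"\w+|[^\w\s]", text)

def strip_past_suffix(word):
    """Toy morphological analysis: split off a regular past-tense '-ed' suffix."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2], "-ed"
    return word, None

tokens = tokenize("He visited New York in 2003.")
print(tokens)
print(strip_past_suffix("visited"))
```

A real system would replace both functions with trained or lexicon-backed components; the sketch only shows the input and output shape of each task.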

  7. Basic NLP tasks: Deep processing
     • Parsing
       • (S (NP (PRON he)) (VP (V visited) ….)
     • Semantic analysis
       • Semantic tagging: [AGENT He] visited [DEST New York] ….
       • Meaning: visit (he, New-York)
     • Discourse
       • Co-reference: “He” refers to “John”
       • Discourse structure
     • Dialogue
     • Generation

  8. Ambiguity
     • Phonological ambiguity: (ASR)
       • “too”, “two”, “to”
       • “ice cream” vs. “I scream”
       • “ta” in Mandarin: he, she, or it
     • Morphological ambiguity: (morphological analysis)
       • unlockable: [[un-lock]-able] vs. [un-[lock-able]]
     • Syntactic ambiguity: (parsing)
       • John saw a man with a telescope.
       • Time flies like an arrow.
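
The syntactic ambiguity in "John saw a man with a telescope" can be made concrete by counting parse trees. The sketch below uses a CKY-style chart over a tiny hand-written grammar; the grammar rules, the lexicon, and the function name `count_parses` are all assumptions invented for this example.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "John": ["NP"], "saw": ["V"], "a": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}

def count_parses(words, root="S"):
    """Return the number of distinct parse trees for `words` rooted in `root`."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of trees
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i, i + 1, label] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]

print(count_parses("John saw a man with a telescope".split()))
```

Under this grammar the sentence receives two trees, one for each attachment site of "with a telescope", while "John saw a man" receives one.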

  9. Ambiguity (cont)
     • Lexical ambiguity: (WSD)
       • Ex: “bank”, “saw”, “run”
     • Semantic ambiguity: (semantic representation)
       • Ex: every boy loves his mother
       • Ex: John and Mary bought a house
     • Discourse ambiguity:
       • Susan called Mary. She was sick. (coreference resolution)
       • It is pretty hot here. (intention resolution)
     • Machine translation:
       • “brother”, “cousin”, “uncle”, etc.

  10. Ambiguity resolution
      • Rule-based or knowledge-based:
        • Parsing:
          • I saw a man with a hat
          • I saw a man with a telescope (in my hand)
        • WSD:
          • “bank”
        • MT:
          • “brother”, “cousin”, “uncle”
      • Statistical approach:
        • Requires training data
        • Builds a statistical model
        • Knowledge and rules can be incorporated into the model as features etc.
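
As a rough sketch of the statistical approach, the following Naive Bayes classifier disambiguates "bank" from its context words. The four training sentences and the sense labels `finance` and `river` are made up for illustration; a real system would train on a sense-annotated corpus.

```python
import math
from collections import Counter, defaultdict

# Invented toy training data: (sense of "bank", context sentence).
TRAIN = [
    ("finance", "deposit money in the bank account"),
    ("finance", "the bank approved the loan"),
    ("river", "fishing on the river bank"),
    ("river", "the bank of the river flooded"),
]

def train(examples):
    """Collect sense counts and per-sense word counts from labeled examples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for word in context.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab

def classify(context, model):
    """Pick the sense with the highest add-one smoothed log probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context.split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TRAIN)
print(classify("she opened a bank account", model))
print(classify("the river bank was muddy", model))
```

The context words play the role of features here; rule-based knowledge could be added in the same way, as extra features fed to the model.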

  11. Major approaches to NLP
      • Rule-based approach
      • Statistical approach
        • Supervised learning
        • Semi-supervised learning
        • Unsupervised learning

  12. Supervised learning algorithms
      • Hidden Markov Model (HMM)
      • Decision tree
      • Decision list
      • Naïve Bayes
      • Transformation-based Learning (TBL)
      • Maximum Entropy (MaxEnt)
      • Support Vector Machine (SVM)
      • Conditional Random Field (CRF)
      • …
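
To give a flavor of the first algorithm in this list, here is a minimal sketch of a supervised HMM part-of-speech tagger: transition and emission counts are estimated from tagged sentences, and Viterbi decoding finds the best tag sequence. The three tagged sentences, the add-one smoothing, and all function names are assumptions for the example, not material from the slides.

```python
import math
from collections import Counter, defaultdict

# Invented toy training corpus of (word, tag) sentences.
TAGGED = [
    [("he", "Pron"), ("visited", "V"), ("Boston", "N")],
    [("she", "Pron"), ("saw", "V"), ("Paris", "N")],
    [("he", "Pron"), ("saw", "V"), ("Boston", "N")],
]

def train_hmm(sentences):
    """Count tag transitions and word emissions from tagged sentences."""
    trans = defaultdict(Counter)  # previous tag -> next-tag counts
    emit = defaultdict(Counter)   # tag -> word counts
    vocab = set()
    for sentence in sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    trans, emit, tags, vocab = model
    n_words = len(vocab) + 1  # one extra slot for unknown words
    best = {
        t: (log_prob(trans["<s>"], t, len(tags))
            + log_prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + log_prob(trans[p], t, len(tags)), best[p][1])
                for p in tags
            )
            new_best[t] = (score + log_prob(emit[t], word, n_words), path + [t])
        best = new_best
    return max(best.values())[1]

model = train_hmm(TAGGED)
print(viterbi(["she", "visited", "Boston"], model))
```

The test sentence never occurs in the training data as a whole, so the tagger has to combine the learned transition pattern with the per-word emission counts.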

  13. Data
      • Raw text:
        • Monolingual: English/Chinese/Arabic Gigawords
        • Parallel data: UN data, EuroParl
      • Treebank:
        • Syntactic treebanks: a set of parse trees
        • Proposition Bank:
        • Discourse Treebank
      • Dictionaries
        • WordNet
        • FrameNet
        • …

  14. [Diagram: Applications; tasks (Task1, Task2, …, Task_i); ML methods (ML1, ML2, …, ML_m); data sets (D1, D2, …, D_n)]

  15. The role of linguistic knowledge in NLP
      • An NLP system is language-independent.
        • Good or bad?
        • Good: it can be ported to many languages without any changes.
        • Bad: it cannot take advantage of properties of certain languages.
      • How to incorporate (linguistic) knowledge in statistical systems?
        • the design of models
        • as features
        • as filters
        • …
      → Building a treebank is an effective way.
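
One way to read "as features" is that linguistic observations become inputs to a statistical model. The sketch below shows a hypothetical feature extractor for tagging; the feature names and the choice of cues (suffix, capitalization, digits, previous word) are assumptions picked for illustration.

```python
def word_features(tokens, i):
    """Describe token i with a few linguistically motivated features."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "suffix2": word[-2:].lower(),        # e.g. "ed" hints at a past-tense verb
        "is_capitalized": word[0].isupper(), # hints at a proper noun in English
        "is_digit": word.isdigit(),          # hints at a number or year
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
    }

tokens = ["He", "visited", "New", "York", "in", "2003"]
print(word_features(tokens, 1))
```

Cues like capitalization are language-specific, which is the trade-off the slide raises: such features help for English but do not carry over unchanged to languages without case distinctions.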
