
The Semantics and Pragmatics of Natural Language Daniela GÎFU


Presentation Transcript


  1. “ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI FACULTY OF COMPUTER SCIENCE The Semantics and Pragmatics of Natural Language Daniela GÎFU http://profs.info.uaic.ro/~daniela.gifu/

  2. Course 2 & 3 SPNL OVERVIEW

  3. https://profs.info.uaic.ro/~daniela.gifu/ Who am I?

  4. “Alexandru Ioan Cuza” University of Iași THE HALL OF THE LOST STEPS

  5. Faculty of Computer Science BE AMONG THE FIRST…..

  6. What is this course about? • Meaning and Natural Language Processing (NLP) • Computational Semantics • Computational Pragmatics

  7. Familiarization with relevant Terminology • Semantics • Pragmatics • Natural language • Computational Linguistics • Natural Language Processing • …

  8. Language Sapir-Whorf Hypothesis

  9. Simulation of human (natural) intelligence by machines. A discipline that spans theory and practice to understand computer systems and networks at a deep level. Interdisciplinary field ~ scientific study of language from a computational perspective

  10. Computational Linguistics (CL) vs. Natural Language Processing (NLP)

  11. The research domain CL = gives the theoretical background (computational theories of language), linguistic models. NLP = applied CL, including: - natural language technology (NLT) - human language technology (HLT)

  12. Natural language technology Spoken language - speech processing (from speech to text to syntax and semantics to speech) Written language – my area of interest Language in correlation with other modalities (multimodality) - speech - intonation - image

  13. Written language technologies Document segmentation and interpretation – cleaning (elimination of dots, enhancing contrast, etc.) – separation of text from image, curved lines... – recognizing printed, semi-uncial characters, etc. • Optical Character Recognition (OCR) ~100% accuracy in scanning printed Latin script based material Challenge in OCR Students?

  14. OCR Handwriting – Why? = presents some unique particularities = many varieties of cursive writing see: http://www.cvisiontech.com/library/ocr/accurate-ocr/ocr-handwriting-sp-914996830.html

  15. OCR Handwriting very challenging = the interpretation of physician handwriting (Rasmussen, L.V. et al., 2012; Broda, B. & Piasecki, M., 2007) = analysis of old handwritten documents (useful for linguists, musicians, historians, etc.) Document Image Analysis

  16. Written language technologies • Analysis and understanding of written language – sub-syntactic processing: • lexical units • sentence splitting • clause borders • part of speech and morphological information • lemmas • entity names • groups (nominal, verbal, prepositional, etc.) and lexical attractions (collocations)

  17. Written language technologies • Language analysis and understanding – semantic and discourse processing: • semantic disambiguation → word senses • semantic role labeling • rhetorical structure of discourse and dialogue • anaphora resolution • text summarization

  18. Mathematical Linguistics = the study of mathematical structures and methods that are of importance to linguistics. → Phonetics, → Phonology, → Morphology, → Syntax, and → Semantics, → and… Sociolinguistics → Language Acquisition. Mathematical Linguistics before Computational Linguistics…. ML ⇔ CL?

  19. NLP = the art of solving problems that need to analyze (or generate) natural language text. Find the metrics for a good solution to the engineering problem… Google Translate – Don’t blame!!!! Romanian = Luceafărul de dimineață English = The morning gentleman (bad answer) = Morning star (good answer) Why???? explains how human translators do their job... Let’s try!

  20. NLP – a subdomain of Artificial Intelligence & Linguistics • Thematic Areas • Linguistics - mathematical linguistics - computational linguistics • Formal Language • Linguistic and Language Processing • The grammatical structure of utterances: the sentence, constituents, phrase, classifications and structural rules, syntactic processing ... • Parser • Semantics & Pragmatics

  21. NLP = an area of Artificial Intelligence (AI) devoted to creating computers that use NL as input and/or output. AI-hard problem = machine reading comprehension = produces language as output on the basis of data input

  22. CL = developing computational methods/models of human linguistic behavior. • INFORMATION RETRIEVAL • INFORMATION EXTRACTION • MACHINE TRANSLATION • QUESTION – ANSWERING • SUMMARIZATION • MACHINE READABLE DICTIONARIES • SPELLING & GRAMMAR CHECKERS • …

  23. CL – Applications • A discipline concerned with understanding written and spoken language from a computational perspective. • detecting synonymy (Grigonytė et al., 2010); • developing WordNet (including Romanian - Gala et Mititelu, 2013), (Iftene and Balahur, 2007)...; • WSD (Yang, H. et al., 2010), (Lefever et Hoste, 2010), (Tufiș, 2002)...; • semantic annotation (Garcia et al., 2012)...; • reconstructing a diachronic morphology (Cristea et al., 2007/2012); • diachronic text classification (Mihalcea and Năstase, 2012; Popescu and Strapparava, 2015), etc.; • epoch detection (Gifu, 2015/2016/2017)...; Tools developed by students…

  24. Linguistic & Language Processing • 1. Linguistics • Science of language. Includes: • Sounds (phonology) • Word formation (morphology) • Sentence structure (syntax) • Meaning (semantics) and understanding (pragmatics)… • 2. Levels of linguistic analysis • Higher level → Speech Recognition (SR) • Lower levels → Natural Language Processing (NLP)

  25. Levels of Linguistic Analysis (Speech Recognition at the signal end, NLP at the meaning end) • Acoustic signal • Phonetics – production and perception of speech • Phonemes • Phonology – sound patterns of language • Letters, strings • Lexicon – dictionary of words in a language • Morphemes • Morphology – word formation and structure • Words • Syntax – sentence structure • Phrases & sentences • Semantics – intended meaning • Meaning out of context • Pragmatics – understanding from external info • Meaning in context

  26. NLP Pipeline Course purpose

  27. MAIN CONCEPTS • 1. Natural Language • used by human beings for communication... • sign, system, symbols, rule-set (or grammar) • 2. Semantics • literal meaning determined from a word, phrase, sentence. • 3. Pragmatics • contextual meaning {situation, speaker, etc.}

  28. Natural or ordinary language • A system of speech symbols → (form criterion) • Types: • a) speech (spoken language) • - produced by articulate sounds. • b) signing (written language) - the representation of a spoken or gestural language. • The most important means of human communication → (function criterion)

  29. Natural Language… • Multiplicity of languages

  30. Formal language I* • 1. Symbol • a character, an abstract entity that has no meaning by itself • Ex: letters, digits and special characters • 2. Alphabet • finite set of symbols • often denoted by Σ • Ex: • B = {0, 1} says B is an alphabet of two symbols, 0 and 1 • C = {a, b, c} – C is an alphabet of 3 symbols, a, b and c * More about formal language: http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf

  31. Formal language II • 3. String or word • a finite sequence of symbols from an alphabet • Ex: • 01110 and 111 are strings from the alphabet B above • aaabccc and b are strings from the alphabet C above • 4. Sentence • a string of words. • Ex: I saw the gentleman with the hat. • String = a b c d e b f
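The definitions of alphabet and string can be checked mechanically. The sketch below is only an illustration: the helper names `is_string_over` and `strings_of_length` are assumptions introduced here, not part of the course material, and Python strings stand in for the abstract sequences of symbols.

```python
from itertools import product

B = {"0", "1"}          # alphabet B from the slides
C = {"a", "b", "c"}     # alphabet C from the slides

def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet (the empty string qualifies)."""
    return all(symbol in alphabet for symbol in s)

def strings_of_length(alphabet, n):
    """All strings of exactly n symbols over the alphabet, in sorted order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=n)]

print(is_string_over("01110", B))    # a string over B
print(is_string_over("aaabccc", C))  # a string over C
print(is_string_over("012", B))      # "2" is not a symbol of B
print(strings_of_length(B, 2))
```

An alphabet with k symbols has k^n strings of length n, which is why even small alphabets generate very large languages.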

  32. Formal language III Define possible relations of parts of a string to each other? A. [I] saw the gentleman [with the binocular] = [a] b c d [e b f] B. I saw [the gentleman with the binocular] = a b [c d e b f ] We can represent structures with trees… Ex: I saw the gentleman with the binocular. I saw the gentleman with the binocular.
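The two bracketings can be written down as trees. The sketch below uses nested tuples, with the first element of each tuple as a node label; the labels S, VP, NP and PP and the helper `leaves` are assumptions made for this illustration, since the slide only shows the brackets.

```python
# Reading A: the prepositional phrase attaches to the verb phrase
# (the seeing is done with the binocular).
tree_a = ("S", "I",
          ("VP", "saw",
           ("NP", "the", "gentleman"),
           ("PP", "with", ("NP", "the", "binocular"))))

# Reading B: the prepositional phrase attaches to the noun phrase
# (the gentleman has the binocular).
tree_b = ("S", "I",
          ("VP", "saw",
           ("NP", "the", "gentleman",
            ("PP", "with", ("NP", "the", "binocular")))))

def leaves(tree):
    """Return the words of a tree from left to right; tree[0] is the node label."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree_a)))
print(leaves(tree_a) == leaves(tree_b))  # same string of words
print(tree_a == tree_b)                  # different structures
```

Both trees yield the same sequence of words, so the ambiguity lives in the structure and not in the string.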

  33. Formal language IV • 5. Language • a set of strings of symbols from an alphabet. • 6. Natural Language or ordinary language • open-ended = built on three different knowledge components: the sound of words - phonology; the meaning of words - semantics; the grammatical rules according to which words are put together - syntax. • 7. Formal language • a set L of sequences/strings over some finite alphabet Σ • described using formal grammars (a set of rules for strings, specified to it). • many applications (e.g. Prognosis wearable system)

  34. Formal language V Context-Free Grammars (CFG) - a finite set of grammar rules https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm = a quadruple (N, T, P, S), where: N = a finite set of non-terminal symbols (characters or variables). Note! Each n ∈ N = a type of phrase/clause in the sentence. T = a finite set of terminals (the alphabet defined by the grammar), disjoint from N: N ∩ T = ∅. P = a finite set of (rewrite) rules or productions of the grammar, each of the form A → α, with A ∈ N and α ∈ (N ∪ T)*. Note! The left-hand side of a production rule does not have any right context or left context. * = Kleene star operation = a unary operation on sets of strings or sets of symbols or characters → applied to a set V it is written V* (also used in regular expressions). Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}, where ε is the empty string; ∅* = {ε} (the language consisting only of the empty string). S = start symbol (S ∈ N), used to represent the whole sentence.
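A small sketch may make the quadruple and the Kleene star concrete. Everything below is an assumption made for illustration: the toy grammar, its symbol names and the helpers `kleene_star` and `generate` do not come from the slides. The grammar is deliberately non-recursive, so the language it generates is finite and can be listed in full.

```python
from itertools import product

def kleene_star(symbols, max_len):
    """Strings of the Kleene star of `symbols` up to max_len, shortest first.
    The empty Python string "" plays the role of ε."""
    ordered = sorted(symbols)
    result = []
    for n in range(max_len + 1):
        result.extend("".join(p) for p in product(ordered, repeat=n))
    return result

# Toy grammar (N, T, P, S); hypothetical, for illustration only.
N = {"S", "NP", "VP", "Det", "Noun", "Verb"}
T = {"the", "a", "woman", "man", "loves"}
P = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Det": [["the"], ["a"]],
    "Noun": [["woman"], ["man"]],
    "Verb": [["loves"]],
}
S = "S"

def generate(symbol):
    """Every terminal string derivable from `symbol`, as lists of words.
    Terminates only because this grammar has no recursive rule."""
    if symbol in T:
        return [[symbol]]
    sentences = []
    for rhs in P[symbol]:
        parts = [generate(s) for s in rhs]   # expansions of each right-hand-side symbol
        for combo in product(*parts):        # one choice per symbol
            sentences.append([w for part in combo for w in part])
    return sentences

language = [" ".join(words) for words in generate(S)]
print(kleene_star({"a", "b", "c"}, 1))
print(len(language), language[0])
```

Each key of `P` is a single non-terminal, which is exactly the context-free restriction: a rule rewrites its left-hand side regardless of what stands to the left or right of it.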

  35. Do you know other Grammars?

  36. Variations of Chomsky’s hierarchy, 1956 https://commons.wikimedia.org/wiki/File:Chomsky-hierarchy.svg

  37. Traffic Light – Visual Syntax, Semantics and Pragmatics S E M I O T I C S see Woo, C. W. H. (2010). Visual Syntax, Semantics and Pragmatics: Structure, Meaning & Context.(PPT). Retrieved from Universities Brunei Darussalam.

  38. Main Concepts - Examples • 1. Syntax = the proper ordering of words • Grammars, parsers, parse trees, etc. • 2. Semantics • Semantic classes, ontologies, formal semantics, etc. • 3. Pragmatics • Pronouns, reference resolution, discourse models, etc.

  39. Computational Semantics NLP vs. CL What can semantics do for NLP? What can computation do for theoretical models of NL semantics?

  40. Computational Semantics Automating Language Comprehension 1. Automate the process of associating NL expressions with semantic representations (SRs, known as logical forms) 2. Automate the process of interpreting those SRs and drawing inferences from them.

  41. Computational Semantics Challenges 1. Unlimited number of NL expressions! * The semantic representation (SR) of each phrase = a function of the SRs of its syntactic parts. 2. Tension between expressibility, inferential power & complexity. * No perfect solution (see Tarski). People always tailor logic to the application. Note: Focus on FOL (first-order logic) = formulas of predicate logic (https://www.cl.cam.ac.uk/teaching/1011/L107/semantics.pdf)
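The first challenge is usually met by compositionality: the representation of a phrase is computed from the representations of its parts. The sketch below is a minimal illustration under stated assumptions. The function names, the example sentence "John loves Mary" and the use of plain strings as logical forms are all choices made here, not taken from the course.

```python
def loves(obj):
    """Meaning of a transitive verb: takes the object first, then the subject."""
    return lambda subj: f"loves({subj}, {obj})"

def compose_vp(verb, obj):
    """SR of a verb phrase = the verb's function applied to the object's SR."""
    return verb(obj)

def compose_s(subj, vp):
    """SR of a sentence = the verb phrase's function applied to the subject's SR."""
    return vp(subj)

vp = compose_vp(loves, "mary")       # SR of "loves Mary", still waiting for a subject
sentence_sr = compose_s("john", vp)  # SR of "John loves Mary"
print(sentence_sr)
```

Because each composition step is a function application, a finite lexicon and a finite set of composition rules cover an unlimited number of sentences.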

  42. Big challenge – Ambiguity! Main Concepts - II A semantic scope ambiguity…. Every woman loves a man. ∀x(woman(x)→ ∃y(man(y)∧loves(x,y))) ∃y(man(y)∧ ∀x(woman(x)→loves(x,y))) … and its interaction with anaphora = NP (pronoun, definite NP, proper name) Every student worked on a project. It was about computational semantics. Every politician made a speech. It was about terrorism. Students?
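The two readings of "Every woman loves a man" can be told apart by evaluating them in a small model. In the sketch below the individuals and the `loves` relation are invented for illustration; the two functions follow the two formulas above, with the restriction to women and men expressed by iterating over those sets.

```python
women = {"ann", "eva"}
men = {"bob", "dan"}
loves = {("ann", "bob"), ("eva", "dan")}   # hypothetical model: each woman loves a different man

def reading_forall_exists(women, men, loves):
    """∀x(woman(x) → ∃y(man(y) ∧ loves(x, y))): each woman loves some man."""
    return all(any((x, y) in loves for y in men) for x in women)

def reading_exists_forall(women, men, loves):
    """∃y(man(y) ∧ ∀x(woman(x) → loves(x, y))): one man is loved by every woman."""
    return any(all((x, y) in loves for x in women) for y in men)

print(reading_forall_exists(women, men, loves))
print(reading_exists_forall(women, men, loves))
```

In this model the first reading holds and the second does not, so the two formulas are not equivalent. Any model that satisfies the second reading also satisfies the first, provided there is at least one man.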

  43. Main Concepts - II Other challenges – Combinatorics Constructing Logical Forms (LF) directly from the NL’s syntax means that the quantifier scope ambiguity must correspond to a syntactic ambiguity. Every woman loves a man. → 2 unintuitive parses • 6 quantifiers • Unsophisticated interaction with pragmatics • Generate all possible LFs • Filter out inadmissible ones

  44. Main Concepts - II An alternative – Underspecified Semantics Use syntax to accumulate a set of constraints on the structure of the logical form. • A partial description of trees such as these… [Figure: two logical-form trees for “Every woman loves a man”, one with ∀x (woman) above ∃y (man), the other with ∃y (man) above ∀x (woman), both ending in love(x, y)]

  45. The Underspecified Logical Form Main Concepts - II • l1 : ∀x(woman(x) → l4) • l2 : ∃y(man(y) ∧ l5) • l3 : love(x, y) • This description is satisfied by 2 trees: • l4 = l2 and l5 = l3 • l4 = l3 and l5 = l1
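The step from an underspecified description to its fully specified readings can be sketched as a search over "pluggings" of holes by labelled fragments. The reading of the diagram assumed here is that l1 is the universal fragment with hole l4, l2 is the existential fragment with hole l5, and l3 is love(x, y); the string templates and helper names are assumptions made for illustration, and real underspecification formalisms add further constraints.

```python
from itertools import permutations

# Labelled fragments; "{l4}" and "{l5}" mark the holes.
fragments = {
    "l1": "∀x(woman(x) → {l4})",
    "l2": "∃y(man(y) ∧ {l5})",
    "l3": "love(x, y)",
}
hole_owner = {"l4": "l1", "l5": "l2"}   # which fragment contains each hole

def reachable(label, plugging):
    """Labels in the tree rooted at `label` under the given plugging."""
    seen = {label}
    for hole, owner in hole_owner.items():
        if owner == label:
            seen |= reachable(plugging[hole], plugging)
    return seen

def build(label, plugging):
    """Fill the holes of a fragment recursively and return the formula."""
    text = fragments[label]
    for hole, owner in hole_owner.items():
        if owner == label:
            text = text.replace("{" + hole + "}", build(plugging[hole], plugging))
    return text

def solutions():
    """Formulas for every plugging that forms a single tree using all fragments."""
    holes = sorted(hole_owner)
    results = []
    for chosen in permutations(fragments, len(holes)):
        plugging = dict(zip(holes, chosen))
        roots = [label for label in fragments if label not in chosen]
        root = roots[0]   # exactly one fragment is left unplugged
        if reachable(root, plugging) == set(fragments):
            results.append(build(root, plugging))
    return results

for formula in solutions():
    print(formula)
```

Of the six candidate pluggings, four are rejected because a fragment would end up inside its own hole or two fragments would contain each other; the two that remain are the two scope readings.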

  46. More Challenges: Semantic Dependencies between an NL Phrase and its Context Main Concepts - II Pronouns Robert owns a house. It is orange. wrong: ∃x(house(x) ∧ own(r, x)) ∧ orange(y) complex construction: ∃x(house(x) ∧ own(r, x) ∧ orange(x)) Time Eva entered the room. She lit a lantern. It was red dark. Presuppositions David's son is bald. If baldness is hereditary, then David's son is bald. If David has a son, then David's son is bald.

  47. Main Concepts - II Dynamic Semantics e.g. Discourse Representation Theory (DRT) ~ “the discourse context” • The meaning of an expression/sentence depends on its context. • An expression changes that input context into a different output one: • Existentials change the context by adding new entities to it for interpreting subsequent expressions. • The result of interpretation is a new context.
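The idea that an expression maps an input context to an output context can be sketched with plain dictionaries. This is a loose illustration and not an implementation of DRT: the context layout, the helper names, the `non_person` property and the "most recent compatible referent" rule for pronouns are all simplifying assumptions introduced here.

```python
def new_context():
    """Empty discourse context: no referents, no conditions."""
    return {"referents": [], "conditions": []}

def introduce(context, referent, *properties):
    """Existential update: the output context contains a new discourse referent."""
    return {
        "referents": context["referents"] + [referent],
        "conditions": context["conditions"] + [(p, referent) for p in properties],
    }

def assert_condition(context, *condition):
    """Add a condition such as ("own", "r", "x") without adding referents."""
    return {
        "referents": list(context["referents"]),
        "conditions": context["conditions"] + [tuple(condition)],
    }

def resolve_pronoun(context, required_property):
    """Pick the most recently introduced referent with the required property.
    Returns None when the context offers no antecedent."""
    for referent in reversed(context["referents"]):
        if (required_property, referent) in context["conditions"]:
            return referent
    return None

# "Robert owns a house. It is orange."
c0 = new_context()
c1 = introduce(c0, "r", "person")               # Robert
c2 = introduce(c1, "x", "house", "non_person")  # "a house" adds a referent
c3 = assert_condition(c2, "own", "r", "x")
it = resolve_pronoun(c3, "non_person")          # "It" looks back into the context
c4 = assert_condition(c3, "orange", it)
print(it, c4["conditions"])
```

Each update returns a new context and leaves the input untouched, which mirrors the slide's point that the result of interpretation is a new context. The pronoun in the second sentence is resolved against the context produced by the first sentence, so it ends up bound to the house.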

  48. Main Concepts - II DRT* – A successful theory Pronouns (anaphora) A man walks. He talks. Few farmers own a donkey. It's fed twice a day. Tense ~ grammatical form Clarke stood up. John greeted him. Max entered the room. It was pitch dark. Presuppositions (if… then…) If baldness is hereditary, then David's son is bald. If David has a son, then David's son is bald. Propositional attitudes (belief, desire, imagination = mental states) Robert believes that Dana likes him. * More about DRT: https://plato.stanford.edu/entries/discourse-representation-theory/#RepAttCom

  49. Main Concepts - II Problems???? ... Need Pragmatics! Counterexamples I. John can open Obama's safe. He knows the combination. II. David fell. Max pushed him. If Max scuba dives, he'll bring his son. vs. If Max scuba dives, he'll bring his regulator. Note: Need to resolve semantic underspecification to pragmatically preferred values.

  50. Computational Pragmatics The semantics / pragmatics interface Main Concepts - II Pragmatics = the study of what people meant, but didn’t explicitly say. • Linguistic form underdetermines content; • Pragmatics: commonsense reasoning about the context provides more specific content: • Lexical content • World knowledge • Conventions of language use • Beliefs and intentions of dialogue participants • The process of constructing the intended LF involves defaults. Interaction between context and interpretation must be automated.
