
Natural Language Processing




  1. Natural Language Processing A language is defined as a set of strings, without reference to any world being described or task to be performed. By studying a language, knowledge about the world is acquired. Acquisition can be in the form of: written text, speech/voice, images/patterns, etc. A natural language is a native language such as Hindi, English, French or Urdu. For an NLP machine, the requirement is how to: “Generate, Understand and Translate”. By: Anuj Khanna (Asst. Prof.) www.uptunotes.com

  2. State of the Art • NLP includes both understanding and generation. • It is a subfield of AI and Linguistics that deals with the problems of automated generation and understanding of language. • It covers conversion of computer database information into normal-sounding human language, and conversion of samples of human language into more formal representations that are easier for computer programs to manipulate. • NLU is an AI-complete problem, and the definition of “understanding” is itself a major problem in NLP systems. To understand something is to transform one representation into another.

  3. The entire NLP problem can be sub-divided as: (i) Processing of written text, using lexical, syntactic and semantic knowledge of the language as well as real-world information. (ii) Processing of spoken language, using all the information above plus additional knowledge of phonology and ambiguity resolution. The idea is to control a machine by talking to it in our native language in an interactive manner. This first requires finding the underlying task and goal. Natural language is ambiguous, which leads to difficulty in processing at various levels of the knowledge domain. To date, human linguistic communication has been mainly in speech form rather than written text.

  4. NLP methodology and the concerned problem domains have attracted researchers and educationists from different areas and disciplines of knowledge, such as: • Classical and Computational Linguistics • Computer Science & Engineering • Psycholinguistics • Statistics Open-domain question answering, multi-document summarization and information interaction are required in a wide variety of languages. Current problems are: • Ambiguity at the written as well as the speech level. • Discourse analysis. • Generation of various degrees of complexity in an intelligent system. • Knowledge acquisition methods to incorporate data into WordNet, lexicon methodology, and KB systems for multilingual text classification and hyperlinking.

  5. Why is the NLU task difficult? • Natural language constructs are made up of an infinite number of sentences, so there is much ambiguity in natural language constructs. Levels of Ambiguity 1. Syntactic ambiguity: Syntax relates to the structure of language, i.e. how the words are put together. There can be more than one correct interpretation of the same sentence. • E.g.: “I hit the man with the hammer”. Was the hammer the weapon used, or was it in the hand of the victim? • E.g.: “back” can be an adverb (go back), an adjective (back door), a noun (the back of the room) or a verb (back up your files).
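The PP-attachment ambiguity in “I hit the man with the hammer” can be made concrete by writing both readings as parse trees. A minimal sketch in Python, with trees as nested tuples of the form (label, children…); the tree shapes are illustrative, not output from any parser:

```python
# Two parse trees for "I hit the man with the hammer". The PP "with the
# hammer" attaches either to the verb phrase (hammer = instrument) or to
# the noun phrase (the man has the hammer).

# Reading 1: instrument. The PP modifies the VP.
instrument = ("S",
              ("NP", "I"),
              ("VP",
               ("V", "hit"),
               ("NP", "the man"),
               ("PP", "with the hammer")))

# Reading 2: possession. The PP modifies the NP.
possession = ("S",
              ("NP", "I"),
              ("VP",
               ("V", "hit"),
               ("NP",
                ("NP", "the man"),
                ("PP", "with the hammer"))))

def leaves(tree):
    """Collect the words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

# Both trees cover exactly the same words: the ambiguity lies purely
# in the structure, which is why it is called syntactic ambiguity.
same_words = leaves(instrument) == leaves(possession)
```

Both readings yield the identical word sequence, so no amount of looking at the words alone can resolve the ambiguity; it must be resolved structurally or from context.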

  6. 2. Lexical ambiguity: Ambiguity in lexemes, i.e. words having more than one meaning. E.g.: “I went to the bank.” Is the bank a financial organization or a river bank? 3. Referential ambiguity: Concerned with what the sentence refers to; it may refer to more than one thing. E.g.: “Ram killed Ravana because he liked Sita.” Who liked Sita, Ram or Ravana? 4. Semantic ambiguity: Ambiguity in the meaning associated with a single sentence. • E.g.: “He saw her duck.” Did he see her dip down, or see a web-footed bird? • Semantic ambiguity can also occur with no lexical/syntactic ambiguity. E.g.: the phrase “cat person” can mean someone who likes felines, or the lead of the movie “Attack of the Cat People”.
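One classic way to resolve lexical ambiguity like the “bank” example is gloss overlap, in the spirit of the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. A toy sketch, where the sense inventory and glosses are invented for illustration:

```python
# A toy word-sense disambiguator for the lexically ambiguous word "bank".
# Sense glosses here are made up; a real system would draw them from a
# lexical resource such as WordNet.

SENSES = {
    "bank": {
        "finance": "an institution that accepts deposits and lends money",
        "river":   "the sloping land beside a river or stream",
    }
}

def disambiguate(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence words."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# "I sat on the bank of the river" shares words with the river gloss,
# while "that bank accepts deposits ..." matches the finance gloss.
```

This is deliberately crude (no stemming, no stop-word removal), but it shows the core idea: context words vote for a sense.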

  7. 5. Pragmatic ambiguity: Ambiguity at the level of interpretation within context, i.e. the same word/phrase may be interpreted differently in two distinct contexts/situations. E.g.: “I went to the doctor yesterday.” Here “yesterday” depends on the context, i.e. when the sentence was spoken. Examples: (i) I waited for a long time at the bank. (ii) There is a drought because it hasn’t rained for a long time. (iii) Dinosaurs have been extinct for a long time. In these three sentences the phrase “a long time” refers to different time intervals depending on context.

  8. Levels of Knowledge used in NLU • Phonological knowledge: A phoneme is the smallest unit of sound and relates to the sound of a word. This may lead to phonetic ambiguity in speech recognition systems, due to the different accents of people from different regions. • Syntactic knowledge: How words are arranged together to form a coherent, grammatically correct sentence. • Semantic knowledge: Relates to the meaning of words/phrases and how they combine to form a meaningful sentence. • Morphological knowledge: Word construction from morphemes. • Pragmatic knowledge: Relates to the use of sentences in different contexts and how context affects the meaning of a sentence. • World knowledge: General knowledge, and the language of the user, needed to carry out a conversation.

  9. Computational Model of Language Processing • Noam Chomsky developed a theory of language processing and designed the Chomsky classification of grammars. Stages of analysis: • Syntactic analysis • Semantic analysis • Pragmatic analysis • Morphological analysis • Discourse integration • A discourse is any string of language, usually one that is more than one sentence long, e.g. textbooks, novels, web pages, weather reports, etc. • The meaning of a sentence may depend on preceding as well as upcoming words and phrases.

  10. E.g.: “Ram wanted it.” • In this sentence, “it” depends on the prior discourse, e.g. a CAR which Ram wants to purchase. • Whereas in “He purchased the car”, a following sentence, the pronoun “he” is resolved to “Ram” from the previous sentence. Note: • This type of interpretation involves a PRONOUN/DEFINITE NOUN PHRASE which refers to a world object/entity/agent. • Choosing the best referent is a process of disambiguation that depends on combining syntactic, semantic and pragmatic information. • Pronouns must agree in gender and number with their antecedents: “he” can refer to Bobby, not Arisha; “they” can refer to a group, not a single person.

  11. An Example Sentence • Arisha dropped the cup on the plate. It broke. • The above poses a problem: it is not clear whether the cup or the plate is the referent of “it” (ambiguity at the referential level). Now consider a larger context: Arisha was fond of the blue cup. The cup was presented to her by her mother. Unfortunately, one day while washing utensils, Arisha dropped the cup on the plate. It broke. Here the cup is the focus of attention and hence is the referent (ambiguity resolved).

  12. Syntactic Processing & Formal Grammars Parsing/syntax analysis has two components: (i) a declarative representation, called a grammar, of syntactic facts about the language; (ii) a procedure, called a parser, that compares the grammar against input sentences to produce a parsed structure. Formal language: an infinite set of strings, where each string is a concatenation of terminal symbols, also called words. E.g.: Java, first-order predicate logic, C, C++. These languages have strict mathematical definitions, unlike natural languages such as Hindi or English. Formal grammar: G = (V, T, S, P), where • V is the set of variables or non-terminals (usually written in upper case). • T is the finite set of terminals, lexemes or tokens (lower case). • S is the start symbol of the grammar. • P is the set of productions, i.e. rules of the form α → β.

  13. Key Points for a Natural Language Grammar (e.g. English) • Most grammar-rule formalisms are based on the idea of phrase structure, i.e. strings are composed of substrings called phrases. Examples: Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP), Adverb Phrase (ADVP). Here NP, VP, PP and ADVP are all non-terminals/variables of a formal grammar for an English sentence. • Other non-terminals can be Noun (N), Verb (V), Preposition (P), Articles (ART), Determiners (DET, like a, an, the). ART and DET can be used interchangeably. • Terminals/lexemes/tokens are words like: a, an, the, Ram, Joseph, run, upon, into, put, good, long, very, fast, and so on.

  14. Example: “Joseph ate the chicken” Grammar rules of G: • S → NP VP • NP → ART N | N • PP → PREP NP • VP → V | V NP | V NP PP | V PP • N → Ram | Joseph | tree | tea | road | chicken • V → ate | walk | drink | sit • AUXV → is | am | are | was | were • PREP → with | under | into | on • ART → a | an | the V = { S, NP, VP, PP, PREP, ART, N, V, AUXV } is the set of non-terminals; T = { Joseph, ate, the, chicken, … } contains the terminals; S is the start symbol of grammar G.

  15. Parsing Techniques: Top-down & Bottom-up Parsing Top-down parsing: S → NP VP → N VP → Joseph VP → Joseph V NP → Joseph ate NP → Joseph ate ART N → Joseph ate the N → Joseph ate the chicken Bottom-up parsing: Joseph ate the chicken → N ate the chicken → N V the chicken → N V ART chicken → N V ART N → N V NP → NP V NP → NP VP → S
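The top-down derivation above can be sketched as a backtracking recursive-descent recognizer over the toy grammar of the example: try each production for the leftmost non-terminal in turn, and backtrack when the input does not match. This is a minimal illustration, not an efficient parser:

```python
# The toy grammar for "Joseph ate the chicken", with terminals encoded
# as single-word productions of the lexical categories N, V, PREP, ART.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["ART", "N"], ["N"]],
    "VP":   [["V", "NP", "PP"], ["V", "NP"], ["V", "PP"], ["V"]],
    "PP":   [["PREP", "NP"]],
    "N":    [["Ram"], ["Joseph"], ["tree"], ["tea"], ["road"], ["chicken"]],
    "V":    [["ate"], ["walk"], ["drink"], ["sit"]],
    "PREP": [["with"], ["under"], ["into"], ["on"]],
    "ART":  [["a"], ["an"], ["the"]],
}

def parse(symbols, tokens):
    """True if the symbol sequence derives exactly the token sequence."""
    if not symbols:
        return not tokens                      # success iff all input consumed
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:                    # terminal: must match next token
        return bool(tokens) and tokens[0] == head and parse(rest, tokens[1:])
    return any(parse(body + rest, tokens)      # non-terminal: try each rule,
               for body in GRAMMAR[head])      # backtracking on failure
```

For example, `parse(["S"], "Joseph ate the chicken".split())` succeeds via exactly the top-down derivation shown above, while an ungrammatical string such as “ate the chicken” fails. Note the grammar has no left recursion, which is what makes naive top-down expansion safe here.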

  16. The Parser and the Lexicon Input string → Parser → Output representation structure, with the parser consulting a LEXICON. • To find the meaning of a word, the parser accesses the lexicon. • While selecting a word from the input stream, the parser locates the word in the lexicon. • It extracts the possible meanings, attributes, syntax and semantics of that word.

  17. “The lexicon is a dictionary of words (morphemes, tokens, lexemes, phonemes) containing syntactic, semantic and pragmatic knowledge.” The organization and entries of lexicons vary from one implementation to another. They are usually made up of variable-length data structures such as lists or dynamic arrays, arranged in alphabetical order. Depending on the usage frequency of words (e.g. a, an, the, to, by, of, from), lists can be initialized with these words to minimize the search time for locating lexemes. Access to words can be facilitated by: • Indexing • Binary search • Hashing
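The hashing option above is what a Python dict gives for free: each lexicon entry is reached in O(1) expected time by hashing the word. A sketch with a few invented entries (real lexicons store far richer information):

```python
# A lexicon as a hash table: each entry records the possible parts of
# speech and senses of a word, so a parser can look a token up directly.
LEXICON = {
    "the":  {"pos": ["ART"],    "senses": ["definite article"]},
    "bank": {"pos": ["N", "V"], "senses": ["financial institution",
                                           "side of a river",
                                           "to tilt (an aircraft)"]},
    "ate":  {"pos": ["V"],      "senses": ["past tense of eat"]},
}

def lookup(token):
    """Return the lexicon entry for a token, or None if out of vocabulary."""
    return LEXICON.get(token.lower())
```

The entry for “bank” carries all three senses; it is the parser's (or a disambiguator's) job to choose among them, as discussed under lexical ambiguity. High-frequency function words like “the” are exactly the entries one would pre-load.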

  18. Knowledge-Based System Approaches in NLP 1. SHRDLU • A system developed by Winograd at MIT in the early 1970s. • Controls a robot in a restricted “blocks world” domain containing a number of blocks of various shapes, sizes, colors and textures. • The robot can manipulate the blocks world as per instructions given in natural language. Example instructions: 1. Find a block which is taller than the one you are holding and place it in the box. (Referential ambiguity: what does “it” refer to?) 2. How many blocks are on top of the green block? (Semantic ambiguity) 3. Put the red pyramid on the block in the box. (Syntactic ambiguity: either the block is in the box, or the red pyramid goes in the box)

  19. 2. Information Matching & Extraction • Knowledge-based extraction/machine-learning methods are deployed for rapid prototyping and data acquisition. • A set of events, objects and their attributes builds a world model. • The system supports inheritance and transforms the world model into a discourse model specific to a particular text. 3. Machine Translation • Began in the 1950s, with early efforts at translating Russian text into English (following Warren Weaver’s memorandum). • IBM also worked on this, and introduced the statistical approach to language and parameter estimation in machine translation through mathematical models, e.g. the Hidden Markov Model (HMM), Boolean keyword models, and probabilistic models based on Bayesian classification.

  20. Machine Translation Approaches • Rule-based translation: direct machine translation, knowledge-based translation, interlingua-based machine translation, transfer-based machine translation. • Corpus-based translation.

  21. Direct Machine Translation • Carries out word-by-word translation with the help of a bilingual dictionary, usually followed by some syntactic rearrangement. • A monolithic approach is followed, i.e. all the details of one language pair are considered. • Little analysis of the source text is required; there is no parsing. Pipeline: Source text → Morphological analysis → Lexical transfer using bilingual dictionary → Local reordering → Target language text
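The lexical-transfer and local-reordering steps of the pipeline can be sketched in a few lines. The toy English-to-French dictionary below is illustrative only and deliberately ignores gender and agreement; the reordering rule encodes just one local fact (French adjectives usually follow the noun):

```python
# A sketch of direct machine translation: word-by-word dictionary lookup
# followed by one local reordering rule. No parsing is involved.

DICT = {"the": "la", "red": "rouge", "car": "voiture"}   # toy bilingual dict
ADJECTIVES = {"rouge"}                                    # toy adjective list

def translate(sentence):
    # lexical transfer: replace each word, passing unknown words through
    words = [DICT.get(w, w) for w in sentence.lower().split()]
    # local reordering: move an adjective after the following noun
    for i in range(len(words) - 1):
        if words[i] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```

`translate("the red car")` yields “la voiture rouge”: the dictionary alone would give the ungrammatical word order “la rouge voiture”, and the reordering pass fixes it. The brittleness of this approach (no agreement, no syntax) is exactly why the monolithic direct method gave way to the rule-based and corpus-based approaches above.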

  22. Corpus-Based Machine Translation (CBMT) • Also called data-driven translation. • Overcomes the problem of knowledge acquisition in rule-based machine translation (RBMT). • Uses a bilingual parallel corpus to obtain knowledge for each new incoming translation. • Fully automated, with less human intervention than in RBMT. Statistical Machine Translation (SMT) • Uses a bilingual corpus to learn translation models. • Uses a monolingual corpus to learn the grammar of the target language. • SMT models are trained on a sentence-aligned translation corpus based on: 1) n-gram modelling and 2) the probability distribution of a target language pair in a very large corpus.
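The n-gram modelling mentioned above can be illustrated with a bigram language model estimated by simple counting from a tiny monolingual “corpus” (three invented sentences with start/end markers):

```python
# A bigram language model: P(word | previous word) estimated by
# maximum likelihood from counts over a toy corpus.
from collections import Counter

corpus = ["<s> the cat sat </s>",
          "<s> the cat ran </s>",
          "<s> the dog sat </s>"]

bigrams, unigrams = Counter(), Counter()
for line in corpus:
    toks = line.split()
    unigrams.update(toks[:-1])            # history counts (</s> never a history)
    bigrams.update(zip(toks, toks[1:]))   # adjacent word pairs

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

# "the" is followed by "cat" twice and "dog" once in the corpus,
# so P(cat | the) = 2/3 and P(dog | the) = 1/3.
```

In a real SMT system such a model (with smoothing, and trained on millions of sentences) scores how fluent a candidate target sentence is; the maximum-likelihood estimate here assigns probability zero to unseen pairs, which smoothing would correct.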

  23. Maximize Probabilities from the Models • Translation model: learned from a bilingual corpus; gives P(S|T). • Language model: learned from a monolingual corpus; gives P(T). • The translation result is the T that maximizes P(S|T) · P(T), where T is the target language sentence, S is the source language sentence, P(S|T) is the translation probability and P(T) is the target language probability.
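The decision rule in the diagram, argmax over T of P(S|T) · P(T), can be sketched directly. The probability tables below are invented for illustration; real systems sum over alignments and search a vast candidate space rather than enumerating a dictionary:

```python
# The noisy-channel decision rule of SMT: pick the target sentence T
# that maximizes P(S|T) * P(T) over a small set of candidates.

def decode(source, translation_model, language_model):
    """Return the candidate target maximizing P(S|T) * P(T)."""
    return max(language_model,
               key=lambda t: translation_model.get((source, t), 0.0)
                             * language_model[t])

tm = {("maison bleue", "blue house"): 0.7,    # toy P(S | T)
      ("maison bleue", "house blue"): 0.8}
lm = {"blue house": 0.02,                     # toy P(T)
      "house blue": 0.001}

best = decode("maison bleue", tm, lm)
# Although the word-order-preserving "house blue" has the higher
# translation probability (0.8 * 0.001 = 0.0008), the language model
# prefers "blue house" (0.7 * 0.02 = 0.014), so fluency wins.
```

This shows the division of labor: the translation model keeps the meaning faithful, while the language model keeps the output fluent.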

  24. Advantages of SMT 1. No knowledge of linguistics is required, saving the cost and time of knowledge acquisition from domain experts. 2. Expertise transfer is minimized. 3. Fast and less costly compared to direct machine translation.

  25. Intelligent Computing Model for English-to-Sanskrit (ES) Machine Translation Pipeline (first stages): Input → tokenizer → POS target module → adverb conversion table module → GNP detection module

  26. Pipeline (continued, from the GNP module): Tense & sentence detection module → Sanskrit rule detection → ANN-based system → Roop/Dhaatu detection → Noun & object detection → Dhaatu form generation → Word form generation

  27. Pipeline (final stages): Adverb conversion (from the word-form and dhaatu-form modules) → Output Sanskrit sentence, formed by concatenation of kartaa, adjective, karma, adverb and verb. • The GNP module detects the gender, number and person of the noun in the English sentence. • The noun & object detection module gives the Sanskrit nouns equivalent to the English nouns. • The Roop/Dhaatu module gives the Sanskrit verbs equivalent to the English verbs. • The ANN is a feed-forward network that performs encoding of the user data vector (UDV), I/O generation of the UDV, and finally decoding of the UDV.

  28. Computer Vision What is Computer Vision? • “Computing properties of the 3D world from one or more digital images.” • Shapiro and Stockman: To make useful decisions about real physical objects and scenes based on sensed images. • Ballard and Brown: The construction of explicit, meaningful descriptions of physical objects from images. • Forsyth and Ponce: Extracting descriptions of the world from pictures or sequences of pictures.

  29. What is in this image? 1. A hand holding a man? 2. A hand holding a mirrored sphere? 3. An Escher drawing? Interpretations are ambiguous; the forward problem (graphics) is well-posed.

  30. What do you see? (Image sequence: changing viewpoint, moving light source, deforming shape.)

  31. What was happening? (Image sequence: changing viewpoint, moving light source, deforming shape.)

  32. Why study Computer Vision? • Images and movies are everywhere. • A fast-growing collection of useful applications: building representations of the 3D world from pictures, automated surveillance (who’s doing what), movie post-processing, face recognition. • Various deep and attractive scientific mysteries, e.g. how does object recognition work? • A beautiful marriage of math, biology, physics and engineering. • Greater understanding of human vision.

  33. Some Objectives • Segmentation: breaking images and video into meaningful pieces. • Reconstructing the 3D world: from multiple views, from shading, from structural models. • Recognition: what are the objects in a scene? What is happening in a video? • Control: obstacle avoidance; robots, machines, etc.

  34. Applications: Touching Your Life • Football • Movies • Surveillance • HCI: hand gestures, American Sign Language • Face recognition & biometrics • Road monitoring • Industrial inspection • Robotic control • Autonomous driving • Space: planetary exploration, docking • Medicine: pathology, surgery, diagnosis • Microscopy • Military • Remote sensing

  35. Image Interpretation: Cues • Variation in appearance across multiple views: stereo, motion • Shading & highlights • Shadows • Contours • Texture • Blur • Geometric constraints • Prior knowledge

  36. Illumination Variability • “The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity.”

  37. Early Vision in One Image • Representing small patches of image, for three reasons: • To establish correspondence between (say) points in different images, we need to describe the neighborhoods of the points. • Sharp changes are important in practice; they are known as “edges”. • Texture can be represented by statistics of the different kinds of small patch present in it, e.g. tigers have lots of bars and few spots, while leopards are the other way around.
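The “sharp changes are edges” idea reduces, in its simplest form, to differencing neighboring pixels: the response is large exactly where intensity jumps. A pure-Python sketch on a tiny hand-made grayscale patch (real systems use smoothed filters such as Sobel on full images):

```python
# A horizontal finite-difference "edge detector" applied to a tiny
# grayscale patch containing a vertical edge between dark (0) and
# bright (9) columns.

patch = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

def horizontal_gradient(img):
    """Absolute difference between each pair of horizontally adjacent pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in img]

grad = horizontal_gradient(patch)
# Each row of grad is [0, 9, 0]: the response is zero inside the flat
# regions and peaks at the dark-to-bright boundary, i.e. at the edge.
```

Thresholding such a gradient map is the crudest possible edge detector; smoothing first (to suppress noise) and combining horizontal and vertical responses gives the classical operators.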

  38. Segmentation • Which image components “belong together”? • Belong together = lie on the same object. • Cues: similar color, similar texture, not separated by a contour, forming a suggestive shape when assembled.
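The “similar color, belong together” cue can be sketched as a flood fill: starting from a seed pixel, group every 4-connected pixel whose value is within a tolerance of the seed's. A toy pure-Python version on a tiny grayscale grid:

```python
# Segmentation by the similar-colour cue: flood fill groups 4-connected
# pixels whose intensity is within tol of the seed pixel's intensity.

def flood_fill(img, seed, tol=1):
    """Return the set of (row, col) pixels in the seed's segment."""
    h, w = len(img), len(img[0])
    target = img[seed[0]][seed[1]]
    region, stack = set(), [seed]
    while stack:
        y, x = stack.pop()
        if (y, x) in region or not (0 <= y < h and 0 <= x < w):
            continue
        if abs(img[y][x] - target) > tol:       # too dissimilar: not this segment
            continue
        region.add((y, x))
        stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return region

# A tiny "image": a bright patch (5s) on a dark background (0s).
img = [[5, 5, 0],
       [5, 0, 0],
       [0, 0, 0]]

bright = flood_fill(img, (0, 0))   # the three connected 5-valued pixels
```

Connectivity matters as much as similarity here: two equally bright patches separated by dark pixels would come out as two segments, which is exactly the “not separated by a contour” cue.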

  39. Boundary Detection: Local cues

  40. Boundary Detection: Finding the Corpus Callosum
