
Seminar: Efficient Natural Language Processing




Presentation Transcript


  1. Seminar: Efficient Natural Language Processing. Overview of Topics

  2. Overview • The following is a short (!) overview of each topic to give you a rough idea • Exact definition of the problem: your task • Some topics leave room for suggestions from your side • In general: we are interested in solutions that provide reasonable quality AND efficiency (speed)

  3. 1. POS Tagging • POS = part-of-speech (“word category”) • POS tagging: assign each word in a sentence its word category • Example: [figure not included in the transcript]
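
As a rough illustration of "assign each word its word category", the sketch below tags tokens by dictionary lookup. The lexicon, the Penn Treebank-style tag names, and the function name `tag` are all assumptions made for this example; a real tagger would use context and a trained model.

```python
# Hypothetical toy lexicon: word -> most likely tag (Penn Treebank-style tags).
LEXICON = {
    "the": "DT", "a": "DT", "dog": "NN", "cat": "NN",
    "chased": "VBD", "saw": "VBD", ".": ".",
}

def tag(tokens, default="NN"):
    """Assign each token its lexicon tag; unknown words get `default`."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

tagged = tag("The dog chased a cat .".split())
```

A lookup like this is fast but ignores ambiguity (for example, "saw" as a noun), which is where the quality/speed trade-off of the seminar comes in.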

  4. 2. Text Chunking • Task of chunking a text into phrases that contain “syntactically related words” • Syntactically related words: noun phrases, verb phrases, … • Example: • “(SBAR While) (PP in) (NP Prague) (NP he) (VP met) (NP Albert Einstein) (PP for) (NP the first time) (O .)”
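
Chunkers are often treated as per-word taggers. The sketch below converts the bracketed example into word-level B-/I-/O tags, the style used in the CoNLL-2000 chunking data. The function name `chunks_to_iob` and the assumption that each chunk is written as `(LABEL words)` are choices made for this example.

```python
import re

def chunks_to_iob(bracketed):
    """Turn '(NP the first time) (O .)' into [(word, IOB-tag), ...]."""
    rows = []
    for label, text in re.findall(r"\((\S+)\s+([^()]+)\)", bracketed):
        for i, word in enumerate(text.split()):
            if label == "O":
                rows.append((word, "O"))  # outside any chunk
            else:
                # First word of a chunk gets B-, the rest get I-.
                rows.append((word, ("B-" if i == 0 else "I-") + label))
    return rows

example = ("(SBAR While) (PP in) (NP Prague) (NP he) (VP met) "
           "(NP Albert Einstein) (PP for) (NP the first time) (O .)")
iob = chunks_to_iob(example)
```

With this encoding, chunking reduces to assigning one tag per word, so the same sequence-labeling machinery as for POS tagging can be reused.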

  5. 3. Clause Identification • Task of “dividing text into clauses” • Clauses usually contain subject and predicate • Clauses form a hierarchical structure (a clause can contain another clause) • Example: • “(Coach them in (handling complaints) (so that (they can resolve problems immediately)).)”
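
To make the hierarchical structure concrete, the following sketch reads the parenthesised example into nested Python lists and measures how deeply clauses are nested. The helper names `parse_clauses` and `depth` are made up for this illustration; it only reads existing clause markup and does not identify clauses itself.

```python
def parse_clauses(text):
    """Parse parenthesised clause markup into nested lists of tokens."""
    stack = [[]]
    for tok in text.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            stack.append([])          # open a new (sub-)clause
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced parentheses")
            done = stack.pop()        # close it and attach it to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    return stack[0]

def depth(node):
    """Nesting depth of a clause; a clause without sub-clauses has depth 1."""
    subs = [depth(child) for child in node if isinstance(child, list)]
    return 1 + max(subs, default=0)

markup = ("(Coach them in (handling complaints) "
          "(so that (they can resolve problems immediately)).)")
tree = parse_clauses(markup)[0]
```

Here `tree[3]` is the embedded clause "handling complaints", and `depth(tree)` reports three levels of nesting for the example sentence.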

  6. 4. Entity Recognition • Identify entities (persons, locations, organizations, …) in a text, based on a list of known entities • Example: • “[ENT1 Steve Jobs] was the CEO of [ENT2 Apple], situated in [ENT3 California].”
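
One simple way to recognize entities "based on a list of known entities" is a greedy longest-match lookup over the token sequence. The sketch below assumes a tiny hand-made entity list (`GAZETTEER`) and entity type names chosen for this example; it is not the method prescribed by the seminar.

```python
# Hypothetical entity list: known name (as a token tuple) -> entity type.
GAZETTEER = {
    ("Steve", "Jobs"): "PERSON",
    ("Apple",): "ORGANIZATION",
    ("California",): "LOCATION",
}
MAX_LEN = max(len(name) for name in GAZETTEER)

def find_entities(tokens):
    """Greedy longest-match lookup; returns (start, end, type) token spans."""
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so "Steve Jobs" beats "Steve".
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            etype = GAZETTEER.get(tuple(tokens[i:i + n]))
            if etype:
                spans.append((i, i + n, etype))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return spans

tokens = "Steve Jobs was the CEO of Apple , situated in California .".split()
entities = find_entities(tokens)
```

Hash lookups keep this fast, but a plain list match cannot tell "Apple" the company from "apple" the fruit; resolving such cases is part of the actual task.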

  7. 5. Semantic Role Labeling • “Recognize arguments of verbs in a sentence, and label them with their semantic role” • Example: • “[A0 Chaplin] added [A1 Eric Campbell, Henry Bergman, and Albert Austin] [A2 to his stock company].” • Arguments of “added”: • Subject (A0): “Chaplin” • Object (A1): “Eric Campbell, Henry Bergman, and Albert Austin” • Indirect object (A2): “to his stock company”
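
The labeled example can be read back into a role-to-argument mapping with a few lines of code. This sketch assumes the bracket notation `[A0 text]` shown on the slide and uses a made-up helper name, `extract_arguments`; it only decodes existing labels and performs no role labeling.

```python
import re

def extract_arguments(labeled):
    """Map each role label (A0, A1, ...) to its bracketed text."""
    return dict(re.findall(r"\[(A\d+)\s+([^\]]+)\]", labeled))

sentence = ("[A0 Chaplin] added [A1 Eric Campbell, Henry Bergman, and "
            "Albert Austin] [A2 to his stock company].")
roles = extract_arguments(sentence)
```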

  8. 6. Constituent Parsing • Recursively identify all grammatical parts/constituents of a sentence • Output the syntax tree (constituent tree) • Example: [figure not included in the transcript]

  9. 7. Dependency Parsing • For each word of a sentence, identify its “head”: the word in the sentence it depends on • Output is a (di)graph • Example: [figure not included in the transcript]
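
A dependency analysis can be stored as one head index per word, which directly yields the directed graph mentioned above. In this sketch the sentence, its hand-assigned heads, and the helper names `to_edges` and `is_tree` are assumptions for illustration, not parser output.

```python
tokens = ["He", "met", "Albert", "Einstein"]
# heads[i] is the 1-based index of token i's head; 0 marks the root.
# Hand-assigned for illustration, not produced by a parser.
heads = [2, 0, 4, 2]

def to_edges(tokens, heads):
    """Directed edges (head word, dependent word); the root hangs off 'ROOT'."""
    words = ["ROOT"] + tokens
    return [(words[h], words[d]) for d, h in enumerate(heads, start=1)]

def is_tree(heads):
    """True if there is exactly one root and every token reaches it (no cycles)."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # walked into a cycle
            seen.add(node)
            node = heads[node - 1]
    return True

edges = to_edges(tokens, heads)
```

The head-array form needs only one integer per word, which makes it a compact output format when parsing speed matters.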

  10. 8. Link Grammar Parsing • Link Grammar: a special kind of grammar expressing relationships between the words of a sentence • Example: • “MV connects verbs (and adjectives) to modifying phrases like adverbs” -> “added” modified by “to his stock company” • [Link diagram; layout lost in extraction. Link labels shown: Xp, MVp, Jp, Os, D*u, Wd, Ss, G, AN] • Parsed sentence: LEFT-WALL Chaplin added.v Eric Campbell and Albert Austin to his stock.n company.n .

  11. 9. LTAG Parsing • LTAG = Lexicalized Tree-Adjoining Grammar • Tree-adjoining grammar: a grammar that consists of trees • Parsing: tree operations • Result similar to a constituent parse

  12. 10. Text Simplification • Simplify complex sentences by applying lexical and syntactic operations • Main motivation: make text readable for humans with reading problems (aphasics) • Other motivation: shorter sentences are easier/faster to parse • Original: “Mr. Anthony, who runs an employment agency, decries program trading, but he isn’t sure it should be strictly regulated.” • Simplified: “Mr. Anthony runs an employment agency. Mr. Anthony decries program trading. But he isn’t sure it should be strictly regulated.”

  13. 11. Machine Learning • Important machine learning techniques for NLP • especially Support Vector Machines • Map instances into vector space • Find hyperplane separating classes • and perceptrons • Train a neural network to classify examples
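
The idea of mapping instances into a vector space and finding a separating hyperplane can be sketched with the classic perceptron update rule. Note that this is the perceptron only: it finds some separating hyperplane, whereas an SVM looks for the one with maximum margin. The toy data and the function names `train_perceptron` and `predict` are assumptions for this example.

```python
def train_perceptron(samples, epochs=10):
    """samples: list of (feature_vector, label) pairs with label in {-1, +1}."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: shift the hyperplane towards x
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy, linearly separable data (hypothetical 2-d feature vectors).
data = [([2.0, 1.0], 1), ([3.0, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w, b = train_perceptron(data)
```

In an NLP setting the feature vectors would typically encode properties of a word and its context, and prediction stays a single dot product, which is why linear models are attractive when speed matters.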

  14. 12. Question Answering • Answer a question given in natural language • What is the capital of Mongolia? DEMO

  15. 13. Entity Retrieval • Answer a factoid question with a list of entities • What cities in Germany have more than 100,000 inhabitants? DEMO

  16. 14. Apache UIMA • UIMA = Unstructured Information Management Architecture • A framework for NLP applications providing well-defined interfaces • Manages the data flow between components • Used by “Watson”, the IBM computer playing in the Jeopardy! competition

  17. 15. NLP Applications • Any application relying heavily on NLP, like Contentus • Contentus: a research project sponsored by the German government • Goal: • Digitize and analyze information (text, pictures, audio, video, … archived in libraries) • Provide an efficient way of thematic research in discovered information

  18. List of Topics • POS tagging • Text chunking • Clause identification • Entity recognition • Semantic role labeling • Constituent parsing • Dependency parsing • Link grammar parsing • LTAG parsing • Text simplification • Machine learning • Question answering • Entity retrieval • Apache UIMA • NLP applications

  19. References • Barbella, D., Benzaid, S., Christensen, J. M., Jackson, B., Qin, X. V., and Musicant, D. R. 2009. Understanding support vector machine classifications via a recommender system-like approach. In Int. Conf. on Data Mining. 305–311. • Burges, C. 1999. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. • Carreras, X. 2005. Learning and inference in phrase recognition: A filtering-ranking architecture using perceptron. Ph.D. thesis, Universitat Politècnica de Catalunya. • Carreras, X. and Màrquez, L. 2004. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004. Boston, MA, USA, 89–97. • Chandrasekar, R., Doran, C., and Srinivas, B. 1996. Motivations and methods for text simplification. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2. COLING ’96. Association for Computational Linguistics, Stroudsburg, PA, USA, 1041–1044. • Sang, E. F. T. K. and Déjean, H. 2001. Introduction to the CoNLL-2001 shared task: Clause identification. Computing Research Repository cs.CL/0107. • Siddharthan, A. 2003. Syntactic simplification and text cohesion. Ph.D. thesis. • Tjong Kim Sang, E. F. and Buchholz, S. 2000. Introduction to the CoNLL-2000 shared task: Chunking. In Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning - Volume 7. CoNLL ’00. Association for Computational Linguistics, Stroudsburg, PA, USA, 127–132.
