
The CHAOS Project: Theory and Practice

Fabio Massimo Zanzotto, Department of Computer Science, Systems and Production, University of Roma “Tor Vergata”


Presentation Transcript


  1. The CHAOS Project: Theory and Practice • Fabio Massimo Zanzotto • Department of Computer Science, Systems and Production • University of Roma “Tor Vergata”

  2. People • INVESTIGATORS • Roberto Basili • Fabio Massimo Zanzotto • Maria Teresa Pazienza • FORMER CONTRIBUTORS • Daniele Pighin • Daniele Previtali • Alessandro Bahgat • Marco Pennacchiotti • Massimo Di Nanni • Michele Vindigni • Luigi Mazzucchelli • Paola Velardi • Paolo Zirilli • Alessandro Cucchiarelli • Alessandro Marziali • Fabrizio Grisoli • Gianluca De Rossi

  3. Outline • Theory: Customizable parsing architectures • XDG: eXtended Dependency Graph • Task oriented parsing design • Practice: System Implementation and Use • A component-based approach • An object-oriented platform • Linguistic data • Processing modules • How to use the parser in an application • Demo!!!

  4. Theory Customizable parsing architectures

  5. Motivation • The Chaos Project unofficially began in ’96 • … on the long tradition of ARIOSTO (Basili, Pazienza, Velardi) @ the University of Rome “Tor Vergata” (RTV) • Aim • building robust parsers for Italian and for English • that use verb sub-categorization (syntactic) lexicons induced from corpora • that can be used in applications • Constraints • use the long tradition @ RTV • “Social” background • Microtheories for microphenomena • Language analysis can be reduced to a cascade of modules (e.g., FSA) • Application-oriented language analysis (e.g., IE) • Robust (formerly, shallow) parsing approaches

  6. Motivation • Example sentence, split into chunks: [Mr. Gaubert] [contributed] [real estate] [valued] [at $ 25 million] [to the assets] [of Independent American] • Verb sub-categorization frames: contribute-NP-PP(to), value-NP-PP(at) • Figure labels: Inf(S1), Inf(S2)

  7. Motivation (found on vinyl supports) • Different NLP applications have different performance constraints in terms of: • Accuracy • Throughput • Customizable parsing architectures are reusable in different application scenarios if the architectural design supports performance control

  8. Customizable parsing architectures (found on vinyl supports) Modularization • clarifies the interdependency between different kinds of syntactic information (grammatical/lexicalized) • allows control of • throughput via eliciting modules • quality via a clear relation between modules (prerequisites/contributions)

  9. Modular approach • Syntactic parser: $SP(S, K) = I$, abbreviated $SP(S) = I$ • Syntactic parsing module: $P_i(S_i, K_i) = S_{i+1}$, abbreviated $P_i(S_i) = S_{i+1}$ • Modular syntactic parser: $SP = P_n \circ \dots \circ P_2 \circ P_1$
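The composition on this slide can be illustrated with a minimal Python sketch. It is not CHAOS code: the toy modules, the list-of-strings representation and the name `compose` are all assumptions made for illustration. Each module maps one sentence representation to the next, and the parser is the modules applied in order, P1 first.

```python
from functools import reduce
from typing import Callable, List

# A parsing module maps one sentence representation to the next.
# Assumption: a representation is simply a list of annotation-layer names.
Representation = List[str]
Module = Callable[[Representation], Representation]


def compose(modules: List[Module]) -> Module:
    """Build a parser from [P1, P2, ..., Pn]; P1 is applied first."""
    def parser(sentence: Representation) -> Representation:
        return reduce(lambda rep, module: module(rep), modules, sentence)
    return parser


# Hypothetical toy modules: each one only appends the layer it contributes.
def tokenizer(rep: Representation) -> Representation:
    return rep + ["tokens"]


def chunker(rep: Representation) -> Representation:
    return rep + ["chunks"]


def verb_analyzer(rep: Representation) -> Representation:
    return rep + ["verb dependencies"]


full_parser = compose([tokenizer, chunker, verb_analyzer])
# Fewer modules means less analysis per sentence (and presumably less work).
light_parser = compose([tokenizer, chunker])
```

Calling `full_parser([])` returns the three layers in order, while `light_parser` stops after chunking. Swapping or omitting entries in the list is one way such a design could trade analysis depth against processing cost.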

  10. Modular approach • To push a modular approach we need: • a suitable annotation scheme • a classification of the processing modules

  11. A suitable annotation scheme • Requirements: • Modularization • a stable representation of partially analyzed structures • Lexicalization • a clear representation of the (semantic) head of a given structure able to activate the lexicalized rule

  12. XDG: Extended Dependency Graph • XDG combines constituency- and dependency-based formalisms: $XDG_{\Gamma\Delta} = (C, D)$, with $C = \{(c, t, h) \mid c \subseteq S,\ t \in \Gamma,\ h \in c\}$ and $D = \{(c_1, c_2, t) \mid c_1, c_2 \in C,\ t \in \Delta\}$ • Nice property: it allows persistent ambiguity to be stored (for interpretations projected by the same nodes)

  13. XDG: Extended Dependency Graph • C are constituents • syntactic head • potential semantic governor • D are dependencies among constituents
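As a rough illustration of the structure described in the two XDG slides, the Python sketch below models constituents as typed word sets with a head and dependencies as typed links between constituents. The class and field names, the governor/dependent reading of the two linked constituents, and the labels "V_Obj" and "NP_PP" are assumptions, not the CHAOS object model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Constituent:
    words: FrozenSet[int]  # positions of the sentence words it covers
    type: str              # constituent type, e.g. "NPK"
    head: int              # position of the head word

    def __post_init__(self) -> None:
        # The head has to be one of the words of the constituent.
        if self.head not in self.words:
            raise ValueError("head must belong to the constituent")


@dataclass(frozen=True)
class Dependency:
    governor: Constituent   # assumed role of the first constituent
    dependent: Constituent  # assumed role of the second constituent
    type: str               # dependency type (hypothetical labels below)


@dataclass
class XDG:
    constituents: Set[Constituent] = field(default_factory=set)
    dependencies: Set[Dependency] = field(default_factory=set)

    def add_dependency(self, dep: Dependency) -> None:
        # Dependencies may only link constituents of this graph.
        if not {dep.governor, dep.dependent} <= self.constituents:
            raise ValueError("both constituents must be in the graph")
        self.dependencies.add(dep)


# "contributed real estate to the assets", word positions 0..5
verb = Constituent(frozenset({0}), "VPK", 0)
obj = Constituent(frozenset({1, 2}), "NPK", 2)
pp = Constituent(frozenset({3, 4, 5}), "PPK", 5)
graph = XDG({verb, obj, pp})
graph.add_dependency(Dependency(verb, obj, "V_Obj"))
# Persistent ambiguity: both attachments of the PP are kept side by side.
graph.add_dependency(Dependency(verb, pp, "V_PP"))
graph.add_dependency(Dependency(obj, pp, "NP_PP"))
```

Because the dependencies are stored as a set of independent links, two competing attachments of the same constituent can coexist until a later module chooses between them.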

  14. Classification of parsing modules • $P_i(XDG_i, K_i) = P_i(XDG_i) = XDG_{i+1}$ • The classification is performed according to: • the type of information K used • how they manipulate the sentence representation

  15. Task oriented parsing design • Given: • The NLP application requirements R • The test-bed T • A pool of parsing modules PM • The design activity is: • The search for a combination of the parsing modules PM that fits R on T

  16. NLP application requirements • Target phenomena: e.g., VP_PP, NP_PP, etc. • Metrics: • Recall R per sentence • Precision P per sentence • F-measure per sentence
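The per-sentence metrics can be computed as in the following Python sketch. It assumes that parser output and gold annotation are both given as sets of links, and that the F-measure is the balanced harmonic mean of precision and recall; the slides state neither, and the tuple encoding and labels are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical encoding of one link: (governor, dependent, type).
Link = Tuple[str, str, str]


def sentence_scores(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for a single sentence."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


gold = {
    ("contributed", "real estate", "V_Obj"),
    ("contributed", "to the assets", "V_PP"),
}
predicted = {("contributed", "real estate", "V_Obj")}
p, r, f = sentence_scores(predicted, gold)  # one correct link out of two gold links
```

Restricting both sets to the target phenomena of the application (for instance only VP_PP links) before scoring would give the phenomenon-specific figures the slide refers to.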

  17. CHAOS: Levels of Analysis • Example sentence: Strategies to use with questions you cannot answer • Part-of-speech tags: NNS TO VB IN NNS PRP MD VB • Chunks: NPK VPK PPK NPK VPK • Clauses • Dependencies

  18. Verb dependencies and Clause Boundaries • Example sentence, split into chunks: [Mr. Gaubert] [contributed] [real estate] [valued] [at $ 25 million] [to the assets] [of Independent American] • Verb sub-categorization frames: contribute-NP-PP(to), value-NP-PP(at) • Figure labels: Inf(S1), Inf(S2)

  21. Verb dependencies and Clause Boundaries • The algorithm: • Initial hypotheses: • Minimal boundaries of the clauses in the sentence • Derived hierarchy • Until all verbs have been analyzed: • Take the rightmost verb v not yet analyzed • Take the lexicalized rules R(v) for the verb v • Find the dependencies of v • Augment the clause boundaries
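The loop can be sketched in Python on the example sentence from the earlier slides. This is a heavily simplified illustration, not the CHAOS algorithm: it looks only to the right of each verb, treats a frame as a flat list of slots, and uses names (`Chunk`, `FRAMES`, `analyze`) invented here. It does show why verbs are taken right to left: once the clause of "valued" is closed, "contributed" can skip it as a unit and still reach its PP(to).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Slot = Tuple[str, Optional[str]]  # (chunk kind, preposition or None)


@dataclass
class Chunk:
    text: str
    kind: str                    # "NP", "PP" or "VP"
    prep: Optional[str] = None   # preposition, for PP chunks
    lemma: Optional[str] = None  # verb lemma, for VP chunks


# Toy lexicalized rules R(v), read off the frames in the example slide.
FRAMES: Dict[str, List[Slot]] = {
    "contribute": [("NP", None), ("PP", "to")],
    "value": [("NP", None), ("PP", "at")],
}

SENTENCE = [
    Chunk("Mr. Gaubert", "NP"),
    Chunk("contributed", "VP", lemma="contribute"),
    Chunk("real estate", "NP"),
    Chunk("valued", "VP", lemma="value"),
    Chunk("at $ 25 million", "PP", prep="at"),
    Chunk("to the assets", "PP", prep="to"),
    Chunk("of Independent American", "PP", prep="of"),
]


def analyze(chunks: List[Chunk], frames: Dict[str, List[Slot]]):
    # Initial hypothesis: every clause covers only its own verb chunk.
    clause_end = {i: i for i, c in enumerate(chunks) if c.kind == "VP"}
    dependencies: List[Tuple[int, int]] = []
    for v in sorted(clause_end, reverse=True):  # rightmost verb first
        open_slots = list(frames.get(chunks[v].lemma, []))
        i = v + 1
        while i < len(chunks) and open_slots:
            if i in clause_end:
                # A verb to the right is already analyzed: skip its clause.
                i = clause_end[i] + 1
                continue
            slot = (chunks[i].kind, chunks[i].prep)
            if slot in open_slots:
                open_slots.remove(slot)
                dependencies.append((v, i))
                clause_end[v] = i  # augment the clause boundary
            i += 1
    return dependencies, clause_end


deps, ends = analyze(SENTENCE, FRAMES)
```

On this input the sketch links "valued" to "at $ 25 million", and "contributed" to "real estate" and "to the assets"; the clause of "contributed" then runs up to chunk 5 and contains the clause of "valued" (chunks 3 to 4). Arguments to the left of a verb, such as subjects, are deliberately ignored here.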

  22. Practice System Implementation and Use

  23. A Computational Framework • Object-oriented backbone • Objects for the different data • Objects for the different sub-processes • Linguistic sub-processors as libraries • Coexisting languages: Java, C++, C, Prolog

  24. System implementation • A component-based approach • An object-oriented platform • Linguistic data • Textual entities: Text, Paragraphs • XDG • Linguistic processors

  25. A Component-based Approach Advantages: • Computational efficiency • Rapid prototyping • Integration of different technologies • Easy reuse

  26. Linguistic processors

  27. Linguistic processors • Tokenizer, Complex Tokenizer • Dictionary lookup modules • Yellow page look-up • Morphology analyzer • Named Entity Recognition • Part-of-speech tagging • Chunker • Verb shallow analyzer • Shallow analyzer

  28. Linguistic modules • Each process is encapsulated in an object • initialize() • Load lexicons and rules (general or domain specific) • finalize() • Release the rules and lexicons of the process • run() • Enrich the input with the contributions of the process
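The three-method life cycle can be pictured with a small Python sketch. The method names come from the slide; everything else (the base class, the dictionary-shaped input, the `DictionaryLookup` example) is an assumption and does not reproduce the CHAOS classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class LinguisticModule(ABC):
    """Life cycle named on the slide: initialize(), run(), finalize()."""

    @abstractmethod
    def initialize(self) -> None:
        """Load lexicons and rules."""

    @abstractmethod
    def run(self, analysis: Dict[str, List[str]]) -> Dict[str, List[str]]:
        """Enrich the input with the contribution of this module."""

    @abstractmethod
    def finalize(self) -> None:
        """Release lexicons and rules."""


class DictionaryLookup(LinguisticModule):
    # Hypothetical module: maps word forms to lemmas.
    def __init__(self, entries: Dict[str, str]) -> None:
        self._source = entries
        self._lexicon: Optional[Dict[str, str]] = None

    def initialize(self) -> None:
        self._lexicon = dict(self._source)

    def run(self, analysis: Dict[str, List[str]]) -> Dict[str, List[str]]:
        if self._lexicon is None:
            raise RuntimeError("call initialize() before run()")
        # Unknown forms are passed through unchanged.
        analysis["lemmas"] = [self._lexicon.get(t, t) for t in analysis["tokens"]]
        return analysis

    def finalize(self) -> None:
        self._lexicon = None


lookup = DictionaryLookup({"compra": "comprare"})
lookup.initialize()
result = lookup.run({"tokens": ["compra", "pane"]})
lookup.finalize()
```

Keeping resource loading in `initialize()` means the cost is paid once per process rather than once per sentence, which is presumably why the slide separates it from `run()`.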

  29. Linguistic processors Microtheories for microphenomena • Each processor implements its own theory: • It has its language for describing rules • It is written in its own programming language

  30. Processor: Yellow page look-up, Morphology analyzer • Dictionary
      compra     comprare  d(a)  v.tran.sempl  2.sing.imper.pres    ~:u:~
      compra     comprare  d(a)  v.tran.sempl  3.sing.ind.pres      ~:u:~
      comprai    comprare  d(a)  v.tran.sempl  1.sing.ind.pass_rem  ~:u:~
      comprammo  comprare  d(a)  v.tran.sempl  1.plur.ind.pass_rem  ~:u:~
      compran    comprare  d(a)  v.tran.sempl  3.plur.ind.pres      ~:u:~
      comprando  comprare  d(a)  v.tran.sempl  geru.pres            ~:u:~
      comprano   comprare  d(a)  v.tran.sempl  3.plur.ind.pres      ~:u:~
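Entries of this shape could be read with a few lines of Python. The sketch assumes six whitespace-separated fields per entry and guesses at field roles: the first two look like word form and lemma, the fourth and fifth like dotted category and morphological features. The slides do not explain `d(a)` or `~:u:~`, so both are kept verbatim under the neutral names `flag` and `extra`.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    form: str                  # inflected word form (assumed)
    lemma: str                 # dictionary form (assumed)
    flag: str                  # e.g. "d(a)"; meaning not stated, kept as-is
    category: Tuple[str, ...]  # e.g. ("v", "tran", "sempl")
    features: Tuple[str, ...]  # e.g. ("1", "sing", "ind", "pass_rem")
    extra: str                 # e.g. "~:u:~"; meaning not stated, kept as-is


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into its six whitespace-separated fields."""
    form, lemma, flag, category, features, extra = line.split()
    return Entry(
        form,
        lemma,
        flag,
        tuple(category.split(".")),
        tuple(features.split(".")),
        extra,
    )


entry = parse_entry("comprai comprare d(a) v.tran.sempl 1.sing.ind.pass_rem ~:u:~")
gerund = parse_entry("comprando comprare d(a) v.tran.sempl geru.pres ~:u:~")
```

A line with a different number of fields raises `ValueError` at the unpacking step, which makes malformed entries visible instead of silently misaligned.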

  31. Processor: Chunker • Rules
      …
      constituent_class([_cst1, _cst2, _cst3], 'VerFin', _mor, 1, 3) :-
          verb_finite(_cst1),
          verb_to_have(_cst1),
          verb_past_particle(_cst2),
          verb_to_be(_cst2),
          verb_past_particle(_cst3),
          common_morfology(_cst1, _mor).
      …

  32. Processor: Verb Shallow Analyser • Sub-categorization lexicon
      …
      pattern(comprare,[ [(oggetto,Post),(per,Post)], [(oggetto,Post),(da,Post),(per,Post)], [(oggetto,Post),(a,Post),(per,Post)],[(oggetto,Post)]]).
      pattern(comprendere,[[(oggetto,Post)],[],[(oggetto,Post)]]).
      pattern(comprimere,[[(oggetto,Post)],[(oggetto,Post)]]).
      pattern(compromettere,[[(con,Post)],[(oggetto,Post)]]).
      pattern(comunicare,[[], [(con,Post)], [(a,Post)], [(oggetto,Post),(a,Post)],[(oggetto,Post)]]).
      …

  33. Implemented Italian Shallow Grammar • Constituent Categories • Part-of-Speech Tags • Chunk Types • Dependency Categories • Dependency Categories over Chunk Types

  34. A survival user guide • Stand-alone version: • chaosparser -h • Client-server version: • chaosserver -h • chaosclient -h • XDG editor and actual GUI: • choasgui

  35. Using CHAOS in applications • In Java applications:
      ConfigurationHandler.initialize();
      ConfigurationHandler.parseKBPropFile("LANGUAGE", "KB");
      Parser ms = new Parser();
      ms.initialize();
    • In non-Java applications: • Using one of the possible output forms: • XDG in XML • XDG in Prolog • XDG in QLF (in Prolog)

  36. Perspective • Building a statistical Italian parser • Increasing the Italian annotated corpora • Reusing existing corpora • TUT • SITAL • VIT

  37. Tools • XDG editor • DEMO!!!! • Syntactic annotation transformer

  38. People • INVESTIGATORS • Roberto Basili • Fabio Massimo Zanzotto • Maria Teresa Pazienza • FORMER CONTRIBUTORS • Daniele Pighin • Daniele Previtali • Alessandro Bahgat • Marco Pennacchiotti • Massimo Di Nanni • Michele Vindigni • Luigi Mazzucchelli • Paola Velardi • Paolo Zirilli • Alessandro Cucchiarelli • Alessandro Marziali • Fabrizio Grisoli • Gianluca De Rossi
