GATE technical workshop: introduction gate.ac.uk/ nlp.shef.ac.uk/ Hamish Cunningham Sheffield, March 17/18, 2004



  1. GATE technical workshop: introduction. http://gate.ac.uk/ http://nlp.shef.ac.uk/ Hamish Cunningham, Sheffield, March 17/18, 2004

  2. Agenda
     Thursday (G30)
     • 10.30: API, CREOLE lifecycle, Java for JAPE [1] (vt)
     • 12.00: break
     • 12.15: tests, writing, running; API etc. [2] (hc, vt)
     • 1.30: lunch
     • 2.30: corpora, evaluation tools (dm, kb)
     • 3.00: machine learning (vt)
     • 4.00: break
     • 4.15: ontologies (kb)
     • 5.15: wrapup
     • 5.30: close
     Wednesday (G22)
     • 10.15: arrival, setup
     • 10.30: introductions, summary of background / skills
     • 10.40: mission, conventions, internal pages, GATE intro (hc)
     • 11.30: tools: CVS, JBuilder, tkdiff, building GATE (vt)
     • 12.00: break
     • 12.15: intro to the GUI (dm)
     • 1.30: lunch
     • 2.30: ANNIE, JAPE (dm)
     • 4.00: break
     • 4.15: summary of projects (hc)
     • 5.30: close

  3. Blah
     • mission
     • conventions
     • mailing lists
     • roles and responsibilities

  4. GATE (the Volkswagen Beetle of Language Processing) is:
     • Eight years old (!), with thousands of users at hundreds of sites
     • An architecture: a macro-level organisational picture for LE software systems
     • A framework: for programmers, GATE is an object-oriented class library that implements the architecture
     • A development environment: for language engineers, computational linguists et al., a graphical development environment
     • Some free components... and wrappers for other people's components
     • Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.
     • Free software (LGPL). Download at http://gate.ac.uk/download/

  5. GATE team projects
     Past:
     • Conceptual indexing: MUMIS: automatic semantic indices for sports video
     • MUSE: cross-genre entity finder
     • HSL: health-and-safety IE
     • Old Bailey: collaboration with HRI on 17th century court reports
     • Multiflora: plant taxonomy text analysis for biodiversity research e-science
     • EMILLE: S. Asian language corpus
     • ACE/TIDES: Arabic, Chinese NE
     • JHU summer w/s on semtagging
     Present:
     • Advanced Knowledge Technologies: €12m UK five-site collaborative project
     • ETCSL: Sumerian digital library
     • MiAKT: medical informatics / AKT
     • SEKT: Semantic Knowledge Tech
     • PrestoSpace: AV preservation
     • KnowledgeWeb; h-TechSight
     A bit of a nuisance (our users): thousands of users at hundreds of sites. A representative sample:
     • the American National Corpus project
     • the Perseus Digital Library project, Tufts University, US
     • Longman Pearson publishing, UK
     • Merck KGaA, Germany
     • Canon Europe, UK
     • Knight Ridder, US
     • BBN (leading HLT research lab), US
     • SMEs: Melandra, SG-MediaStyle, ...
     • Imperial College, London, the University of Manchester, UMIST, the University of Karlsruhe, Vassar College, the University of Southern California and a large number of other UK, US and EU universities
     • UK and EU projects inc. MyGrid, CLEF, dotkom, AMITIES, CubReporter, Poesia...

  6. Architectural principles
     • Non-prescriptive, theory neutral (strength and weakness)
     • Re-use, interoperation, not reimplementation (e.g. diverse XML support, integration of Protégé, Jena, Weka...)
     • (Almost) everything is a component, and component sets are user-extendable
     • (Almost) all operations are available both from API and GUI

  7. CREOLE: a Collection of REusable Objects for Language Engineering
     • GATE components: modified Java Beans with XML configuration
     • The minimal component = 10 lines of Java, 10 lines of XML, 1 URL
     • Why bother? Allows the system to load arbitrary language processing components
     • All the world's a Java Bean...
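
The bean idea on slide 7 can be illustrated with a small, self-contained Java sketch. It is not GATE's CREOLE API: the `Component` interface, the `UpperCaser` class and the `load` helper are hypothetical names invented for this example, and the parameter map stands in for the values a real component would read from its XML configuration. It shows only the general mechanism that makes arbitrary components loadable: a no-argument constructor, setter-based properties, and instantiation by class name.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class BeanLoaderSketch {

    /** Hypothetical fixed interface that every loadable component implements. */
    public interface Component {
        String process(String text);
    }

    /** A toy component: a bean with a no-arg constructor and one settable property. */
    public static class UpperCaser implements Component {
        private String suffix = "";

        public void setSuffix(String suffix) { this.suffix = suffix; }
        public String getSuffix() { return suffix; }

        @Override
        public String process(String text) {
            return text.toUpperCase(Locale.ROOT) + suffix;
        }
    }

    /**
     * Instantiates a component from its class name and applies String
     * properties through the matching setters (name -> setName).
     * The map plays the role of values read from a configuration file.
     */
    public static Component load(String className, Map<String, String> params)
            throws Exception {
        Class<?> cls = Class.forName(className);
        Object bean = cls.getDeclaredConstructor().newInstance();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0))
                    + name.substring(1);
            Method m = cls.getMethod(setter, String.class);
            m.invoke(bean, e.getValue());
        }
        return (Component) bean;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("suffix", "!");
        // The loader knows only the class name, not the concrete type.
        Component c = load(UpperCaser.class.getName(), params);
        System.out.println(c.process("gate"));
    }
}
```

Because the loader depends only on a class name and the bean conventions, a new component can be added without changing the loading code, which is the point of the "Why bother?" bullet.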

  8. [Architecture diagram: layers built on the GATE APIs]
     • Application Layer: OBIE, ANNIE, …
     • IDE GUI Layer (VRs): ADiff, DocVR, OntolVR, ...
     • Processing Layer (PRs): TRs, NE, POS, Co-ref, TEs, …
     • Corpus Layer (LRs): Corpus, Document, DocumentContent, AnnotationSet, Annotation, FeatureMap
     • DocumentFormat Layer (LRs): XMLDocumentFormat, HTMLDocumentFormat, PDFDocumentFormat, …; input: RTF docs, XML docs, HTML docs, email, PDF docs
     • DataStore Layer: XML, Oracle, PostgreSql, .ser
     • Language Resource Layer (LRs): Ontology, Protégé Ontology, WordNet, Gazetteers, ...
     NOTES
     • everything is a replaceable bean
     • all communication via fixed APIs
     • low coupling, high modularity, high extensibility
     NOTES (2)
     • eg: Protégé LR & VR both wrapped in Res. (bean) API
     • ontology repositories and inference should be the same: KAON + Sesame + Orenge + ?
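
The notes on slide 8 (replaceable beans, communication via fixed APIs, low coupling) can be sketched in plain Java. The classes below borrow the names Document and Annotation from the diagram but are hypothetical stand-ins written for this example, not GATE's own classes. `ProcessingResource`, `WhitespaceTokeniser` and `OrthographyTagger` are likewise invented. The sketch assumes that processing components exchange results only through annotations on a shared document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LayerSketch {

    /** Hypothetical stand-in for an annotation: a typed span plus a feature map. */
    public static final class Annotation {
        public final int start;
        public final int end;
        public final String type;
        public final Map<String, Object> features = new HashMap<>();

        public Annotation(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Hypothetical stand-in for a document: text content plus its annotations. */
    public static final class Document {
        public final String content;
        public final List<Annotation> annotations = new ArrayList<>();

        public Document(String content) {
            this.content = content;
        }
    }

    /** The fixed API: every processing component sees only a Document. */
    public interface ProcessingResource {
        void execute(Document doc);
    }

    /** Adds one "Token" annotation per whitespace-delimited run of characters. */
    public static final class WhitespaceTokeniser implements ProcessingResource {
        @Override
        public void execute(Document doc) {
            String text = doc.content;
            int i = 0;
            while (i < text.length()) {
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
                int start = i;
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
                if (i > start) doc.annotations.add(new Annotation(start, i, "Token"));
            }
        }
    }

    /** Reads existing Token annotations and records initial capitalisation as a feature. */
    public static final class OrthographyTagger implements ProcessingResource {
        @Override
        public void execute(Document doc) {
            for (Annotation a : doc.annotations) {
                if (!a.type.equals("Token")) continue;
                boolean upper = Character.isUpperCase(doc.content.charAt(a.start));
                a.features.put("orth", upper ? "upperInitial" : "other");
            }
        }
    }

    /** Runs the components in order; each can be swapped for another implementation. */
    public static Document run(String text, List<ProcessingResource> pipeline) {
        Document doc = new Document(text);
        for (ProcessingResource pr : pipeline) pr.execute(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("GATE runs in Sheffield",
                Arrays.asList(new WhitespaceTokeniser(), new OrthographyTagger()));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " [" + a.start + "," + a.end + ") "
                    + doc.content.substring(a.start, a.end) + " " + a.features);
        }
    }
}
```

The tagger never refers to the tokeniser class, only to Token annotations, so a different tokeniser could be substituted without touching the tagger. That is one way to read "low coupling, high modularity".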

  9. Happy Birthday Valy!