
Place Expressions: Use of Gazetteer DB in Annotation

Beth Sundheim, SPAWAR Systems Center, San Diego (beth.sundheim@navy.mil)


Presentation Transcript


  1. Place Expressions: Use of Gazetteer DB in Annotation. Beth Sundheim, SPAWAR Systems Center, San Diego (beth.sundheim@navy.mil). DRAFT – not for public release

  2. What is being annotated?
     • Two pertinent efforts:
       • AQUAINT study (completed): a text mention of a place name is annotated with the unique ID of the gazetteer entry corresponding to the mention’s intended sense
       • ACE END task (in planning stages): an ACE entity of type LOC or GPE is annotated with the corresponding gazetteer ID from the external END DB
         • If no corresponding entry exists, the entity is annotated with the ID of a new DB entry that captures the entity’s information about the place
     Notes:
     • The AQUAINT study used manual annotation. The gazetteer is the Integrated Gazetteer Data Base (IGDB), which merges 4 source gazetteers
     • ACE END (Entity Normalization and Disambiguation) involves 2 of the 3 ACE place types (it omits FAC) and a non-overlapping subset of IGDB
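The END linking rule above (link a LOC or GPE entity to an existing entry, otherwise create a new entry and link to that) can be sketched as follows. This is a minimal illustration, not the END tool or the IGDB schema: the class GazetteerDB, its methods lookup and add_entry, the function annotate_entity, and the "NEW-n" ID scheme are all hypothetical, and exact name matching stands in for real context-based sense disambiguation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazetteerDB:
    """Hypothetical in-memory stand-in for the END DB (not the IGDB schema)."""
    entries: dict = field(default_factory=dict)  # entry ID -> {"type", "names"}
    next_id: int = 1

    def lookup(self, name: str, place_type: str) -> Optional[str]:
        # Exact name match stands in for real context-based disambiguation
        for entry_id, info in self.entries.items():
            if info["type"] == place_type and name in info["names"]:
                return entry_id
        return None

    def add_entry(self, name: str, place_type: str) -> str:
        # Create a new entry that captures the entity's info on the place
        entry_id = f"NEW-{self.next_id}"
        self.next_id += 1
        self.entries[entry_id] = {"type": place_type, "names": {name}}
        return entry_id


def annotate_entity(db: GazetteerDB, name: str, ace_type: str) -> Optional[str]:
    """Return the gazetteer ID for an ACE entity, or None if END omits it."""
    if ace_type not in ("LOC", "GPE"):  # END covers LOC and GPE, not FAC
        return None
    entry_id = db.lookup(name, ace_type)
    if entry_id is None:  # no corresponding entry: add one and use its ID
        entry_id = db.add_entry(name, ace_type)
    return entry_id


db = GazetteerDB(entries={"12345": {"type": "GPE", "names": {"Russia", "Russian Federation"}}})
print(annotate_entity(db, "Russia", "GPE"))          # 12345
print(annotate_entity(db, "Pushkin Square", "FAC"))  # None
```

Because a newly created entry is stored in the same DB, a later mention of the same unlisted place resolves to that new ID rather than spawning a duplicate.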

  3. Example
     Text: The Russian Interior Minister announced today that over two and a half tons of explosives have been seized in various parts of Russia since the explosion in a square in downtown Moscow last Tuesday. A Muscovite bomb expert said that the explosion, in an underpass beneath Pushkin Square in that city, was caused by a 1.3 kg TNT time bomb. … It was officially announced that the explosion in central Moscow last Tuesday resulted in 120 casualties.
     Output: Six ACE “place” entities. Of these, 2 GPEs are included in the END task because they have a named mention in the document (Pushkin Square is also named, but it is a FAC, which END omits):
     • Russia entity seed DB attribute -> IGDB place entry #12345 (primary name = Russian Federation)
     • Moscow entity seed DB attribute -> IGDB place entry #67890 (primary name = Moscow)
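The selection step in this example (keep only LOC/GPE entities that have a named mention, then attach the seed DB link) can be sketched as below. It is a minimal illustration under stated assumptions: PlaceEntity and in_end_scope are hypothetical names, only the three named entities from the example are listed (the slide does not enumerate the other three), and the two IGDB entry numbers are the ones given in the example output.

```python
from dataclasses import dataclass


@dataclass
class PlaceEntity:
    label: str              # illustrative label for the ACE entity
    ace_type: str           # "GPE", "LOC", or "FAC"
    has_named_mention: bool


def in_end_scope(entity: PlaceEntity) -> bool:
    # END covers LOC and GPE entities that have a named mention in the doc
    return entity.ace_type in ("LOC", "GPE") and entity.has_named_mention


# Three of the six place entities in the example (the rest are not listed)
entities = [
    PlaceEntity("Russia", "GPE", True),
    PlaceEntity("Moscow", "GPE", True),
    PlaceEntity("Pushkin Square", "FAC", True),
]

# Seed DB links from the example output
seed_links = {"Russia": "IGDB place entry #12345", "Moscow": "IGDB place entry #67890"}

for entity in entities:
    if in_end_scope(entity):
        print(entity.label, "->", seed_links[entity.label])
```

Run as is, this prints the two links for Russia and Moscow and skips Pushkin Square, mirroring the example output.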

  4. ACE END Status (no stats yet!)
     • Task parameters decided:
       • Language: English
       • Domain/Genre: news (for pilot annotation, at least)
       • Corpora: ACE (LDC-provided)
     • Seed DB construction is nearing completion
     • An initial annotation tool (Callisto-based) is being prepared
     • Exploratory pilot annotation is planned
     • No funding has yet been identified to support production annotation

  5. Annotation Stats from AQUAINT
     • Ground truth data in the form of 18,900 annotated names in topically and geographically diverse corpora
     • Inter-tagger agreement (ITA) between 2 annotators on a portion of the data:
       • 95.3% F-measure agreement on the “link-or-no-link” decision
       • 87%-99% agreement on “which-link”, depending on the gazetteer
     • No stats on annotation speed (they would be misleading anyway, since the annotators did more than just annotate the linkage)
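The slide does not say exactly how the two agreement figures were computed. A minimal sketch, assuming one common formulation: for "link-or-no-link", treat the sets of mentions each annotator linked as A and B, so that F = 2|A ∩ B| / (|A| + |B|), which is symmetric in the two annotators; for "which-link", take the fraction of mentions linked by both annotators that received the same gazetteer ID (computed per gazetteer to get a per-gazetteer figure). The function names and the toy mention/entry IDs ("m1", "G1", ...) are hypothetical and are not data from the study.

```python
from typing import Optional

# mention ID -> gazetteer entry ID, or None for a "no link" decision
Annotation = dict[str, Optional[str]]


def link_f_measure(ann1: Annotation, ann2: Annotation) -> float:
    """F-measure agreement on the link-or-no-link decision."""
    linked1 = {m for m, gid in ann1.items() if gid is not None}
    linked2 = {m for m, gid in ann2.items() if gid is not None}
    if not linked1 and not linked2:
        return 1.0  # neither annotator linked anything: trivially in agreement
    both = len(linked1 & linked2)
    return 2 * both / (len(linked1) + len(linked2))


def which_link_agreement(ann1: Annotation, ann2: Annotation) -> float:
    """Fraction of mentions linked by both annotators that got the same ID."""
    both = [m for m in ann1 if ann1[m] is not None and ann2.get(m) is not None]
    if not both:
        return 1.0
    same = sum(1 for m in both if ann1[m] == ann2[m])
    return same / len(both)


# Toy data: hypothetical mentions and gazetteer IDs
a1 = {"m1": "G1", "m2": "G2", "m3": None, "m4": "G4"}
a2 = {"m1": "G1", "m2": "G9", "m3": "G3", "m4": None}
print(round(link_f_measure(a1, a2), 3))  # 0.667
print(which_link_agreement(a1, a2))      # 0.5
```

Separating the two measures keeps a disagreement about whether a name is linkable at all from being counted again as a disagreement about which entry it links to.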

  6. Gazetteer DB Annotation Uses
     • Cross-doc IE and QA are major drivers (the gazetteer DB provides attributes that help determine coreference and containment relations across a corpus); it should also be useful for multi-document summarization
     • Any real-life application that requires geospatial grounding of textual place entities
     • Advanced user interaction in QA
     • Note: not necessarily just for English documents/questions
     • Note: similar points could be made regarding the use of DBs of organization/person/artifact/etc. entities!
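How gazetteer links can support cross-document coreference and containment can be sketched as follows, assuming each linked mention carries a gazetteer entry ID and each entry records a parent entry. The three-entry GAZETTEER fragment, its "G1"-style IDs, the parent attribute, and the functions corefer and contained_in are hypothetical illustrations, not the IGDB schema.

```python
from typing import Optional

# Hypothetical gazetteer fragment: entry ID -> (primary name, parent entry ID).
# IDs and the parent attribute are illustrative, not the IGDB schema.
GAZETTEER: dict[str, tuple[str, Optional[str]]] = {
    "G1": ("Russian Federation", None),
    "G2": ("Moscow", "G1"),
    "G3": ("Pushkin Square", "G2"),
}


def corefer(id_a: str, id_b: str) -> bool:
    """Mentions in different documents linked to the same entry co-refer."""
    return id_a == id_b


def contained_in(inner: str, outer: str) -> bool:
    """True if `outer` is an ancestor of `inner` along the parent chain."""
    current = GAZETTEER[inner][1]
    while current is not None:
        if current == outer:
            return True
        current = GAZETTEER[current][1]
    return False


print(corefer("G2", "G2"))       # True
print(contained_in("G3", "G1"))  # True
print(contained_in("G1", "G2"))  # False
```

With this representation, a mention of "Moscow" in one document and "the Russian capital" in another co-refer once both are linked to the same entry, and a question about events "in Russia" can be answered from documents that only mention Pushkin Square.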
