Improving ACE Performance

Improving ACE Performance. Edward Loper, Seth Kulick. The ACE Task: detect and classify entities (Person, Geo-Political Entity, Organization, Facility, Location) and detect relations between them. Example sentence: "John met his son at the beach."


Presentation Transcript


  1. Improving ACE Performance
     Edward Loper, Seth Kulick

  2. The ACE Task
     Example: "John met his son at the beach."
     [Diagram: "John" = Person (NE); "his son" = Person (Nom); "the beach" = Location (NE); relations marked: Soc, At]
     • Detect and classify entities
       • Person, Geo-Political Entity, Organization, Facility, Location
       • Entity Types:
         • Named Entities: Cisco, George Washington
         • Nominals: a large crowd, a quaint library
         • Pronoun: he, it
     • Detect relations between entities
       • At, Role, Near, Social, Part
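One way to picture the output of the ACE task is as a small set of typed records. The sketch below encodes the slide's example sentence; the class and field names are hypothetical, and the arguments assigned to the At relation are an assumption, since the transcript only shows that Soc and At relations are marked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    entity_type: str   # e.g. "Person", "Location"
    mention_type: str  # "NE" (named entity), "Nom" (nominal), or "Pro" (pronoun)


@dataclass(frozen=True)
class Relation:
    relation_type: str  # e.g. "Soc", "At"
    arg1: Mention
    arg2: Mention


# Mentions labelled as in the slide's example sentence.
john = Mention("John", "Person", "NE")
son = Mention("his son", "Person", "Nom")
beach = Mention("the beach", "Location", "NE")

relations = [
    Relation("Soc", john, son),
    # Assumption: the slide does not say which mention is the first argument of At.
    Relation("At", son, beach),
]

for rel in relations:
    print(f"{rel.relation_type}({rel.arg1.text}, {rel.arg2.text})")
```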

  3. The U. Penn ACE System
     • A rapidly developed IE system
       • Built using TIDES-PennTools
     • Pipelined Architecture
       • Easy to construct from existing components
       • Easy to plug in new components
     • Statistical Components
       • Require less hand-tuning
       • Easy to improve with new training data

  4. [Pipeline diagram] Input File → Tokenizing/Preprocessing → NE Tagging → Parsing → Nominal Tagging → Relation Extraction → Coreference → Output File
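A pipelined architecture of this kind can be sketched as a list of functions, each taking a document and returning an enriched copy. This is a minimal illustration, not the TIDES-PennTools API: the stage functions, the dictionary keys, and the capitalization heuristic standing in for a statistical NE tagger are all assumptions.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Toy tokenizer: split on whitespace only.
    return {**doc, "tokens": doc["text"].split()}


def tag_named_entities(doc: Document) -> Document:
    # Toy stand-in for a statistical NE tagger: keep capitalized tokens.
    entities = [tok for tok in doc["tokens"] if tok[:1].isupper()]
    return {**doc, "entities": entities}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage sees only the single output of the stage before it.
    for stage in stages:
        doc = stage(doc)
    return doc


stages: List[Stage] = [tokenize, tag_named_entities]
result = run_pipeline({"text": "John met his son at the beach."}, stages)
print(result["entities"])
```

Because every stage shares one signature, replacing a component amounts to swapping one entry in `stages`, which is the "easy to plug in new components" property the deck describes.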

  5. Improving the ACE System
     • Improve Pipeline Components
       • Add new features to existing models
     • Replace Pipeline Components
       • New machine learning techniques
     • Generate New Training Data
       • Active learning (WordFreak)
     • Improve the Architecture
       • Wide Pipeline architecture

  6. Improving Components
     • Use more informative features
     • Use features based on richer annotation
       • PropBank roles
         • Use PropBank roles as features to improve relation detection.
       • SuperTAGs
         • Use supertags instead of part-of-speech tags to improve the detection and classification of named entities and nominals.
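As a rough illustration of what richer features could look like to a statistical model, the sketch below builds sparse feature dictionaries. The function names, feature-string format, and the supertag value are hypothetical placeholders; the slides do not describe the system's actual feature templates.

```python
from typing import Dict, Optional


def mention_features(word: str, pos_tag: str,
                     supertag: Optional[str] = None) -> Dict[str, float]:
    """Features for NE/nominal tagging: the supertag replaces the POS tag when given."""
    feats = {f"word={word.lower()}": 1.0}
    if supertag is not None:
        feats[f"supertag={supertag}"] = 1.0
    else:
        feats[f"pos={pos_tag}"] = 1.0
    return feats


def relation_features(role1: str, role2: str) -> Dict[str, float]:
    """Features for relation detection from the PropBank roles of the two arguments."""
    return {
        f"role1={role1}": 1.0,
        f"role2={role2}": 1.0,
        f"roles={role1}+{role2}": 1.0,
    }


# "A_NXN" is an illustrative supertag name, not taken from the slides.
with_supertag = mention_features("John", "NNP", supertag="A_NXN")
pos_only = mention_features("John", "NNP")
# In "John met his son at the beach", "his son" could be ARG1 and "at the beach" ARGM-LOC.
pair = relation_features("ARG1", "ARGM-LOC")
print(sorted(with_supertag), sorted(pos_only), sorted(pair))
```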

  7. Improving the Architecture
     • Disadvantages of a simple pipelined architecture:
       • Interaction between stages is limited
       • If one stage produces incorrect output, later stages can’t recover.
     • Wide Pipeline architecture - Each component generates multiple weighted outputs.
       • Increased interaction between stages
       • Later stages can re-rank the earlier outputs.
     • We have built a prototype wide pipeline system
       • NE Classification only
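The difference between a simple and a wide pipeline can be sketched with a stage that returns several weighted hypotheses and a later stage that re-ranks them. Everything here is a toy assumption: the weights, the context rule, and the function names are invented for illustration and do not describe the prototype.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]  # (entity type, weight)


def ne_classifier(mention: str) -> List[Hypothesis]:
    # Toy early stage: returns several weighted labels, best first (made-up weights).
    table = {
        "Washington": [("Person", 0.5), ("GPE", 0.4), ("Organization", 0.1)],
    }
    return table.get(mention, [("Person", 1.0)])


def rerank(hypotheses: List[Hypothesis], previous_word: str) -> List[Hypothesis]:
    # Toy later stage: a preceding "in" makes a geo-political reading more likely.
    boost = {"GPE": 2.0} if previous_word == "in" else {}
    rescored = [(label, weight * boost.get(label, 1.0)) for label, weight in hypotheses]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Simple pipeline: only the top hypothesis survives, so a later stage cannot recover.
narrow = ne_classifier("Washington")[0][0]

# Wide pipeline: all weighted hypotheses are passed on and re-ranked with more context.
wide = rerank(ne_classifier("Washington"), previous_word="in")[0][0]

print(narrow, wide)
```

With these toy numbers the simple pipeline commits to Person, while the wide pipeline's re-ranker promotes GPE (0.4 × 2.0 = 0.8 against 0.5).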

  8. Replacing Components
     • Using improved ML algorithms, can we get better results with less training data?
     • Ryan McDonald implemented a NE tagger using Conditional Random Fields (CRF).
       • Outperforms our system’s Maxent NE tagger.
     • Experiment: Integrating the CRF tagger
       • Replace the Maxent NE tagger with a CRF tagger.
       • Exclude BBN training data (about 1/3 of the data)
       • Evaluate the changes in overall system performance

  9. Integrating CRF: Results
     [Bar charts: Entity Scores and Relation Scores, each comparing Maxent, Maxent + BBN, and CRF]
     • The CRF tagger significantly improves NE detection, giving a higher entity score.
     • Better NE detection allows the system to find more relations, giving a higher relation score.

  10. Conclusions
     • The architecture of the ACE System allows for:
       • Rapid improvement
       • Concurrent development
     • We are working to improve the system…
       • By improving the existing components.
       • By adding more sophisticated components.
       • By improving our training data with active learning.
       • By improving the basic system architecture.
