
Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Entity Mention Detection using a Combination of Redundancy-Driven Classifiers. Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli ({bernaola, manspera, zanoli}@fbk.eu), Fondazione Bruno Kessler – Irst, Trento, Italy. The present work is supported by the LiveMemories Project.


Presentation Transcript


  1. Entity Mention Detection using a Combination of Redundancy-Driven Classifiers • Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli ({bernaola, manspera, zanoli}@fbk.eu) • Fondazione Bruno Kessler – Irst, Trento, Italy • The present work is supported by the LiveMemories Project • May 2010

  2. Outline • Entity Mention Detection: an extension of the NER task. • The system to be presented: • Mention levels: NAM, NOM, PRO • Entity types: GPE, LOC, ORG, PER • Drawing from 2 systems (ACE 2008, EVALITA 2009) • 2 new features to recognize mentions • Applied in LiveMemories and the Italian Wikipedia • Available as a web service, to be integrated into TextPro

  3. Mentions: Named Entities Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela. 4 mentions of type NAM (proper name): 2 PER, 1 ORG, 1 GPE

  4. Mentions: Nominals Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela. 3 nominal mentions (NOM): 3 PER

  5. Mentions: Pronominals Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela. 2 pronoun mentions (PRO): 2 PER

  6. Nested Mentions Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. One-level mentions: "Hugo Chavez", "Venezuelan". Two-level mention: "Venezuelan President". Three-level mention: "Venezuelan President Hugo Chavez".
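The nesting levels on this slide can be reproduced with a small sketch. It assumes (the slides do not define this formally) that a mention's level is one more than the deepest mention strictly contained in it, and it represents mentions as hypothetical half-open token spans over "Venezuelan President Hugo Chavez".

```python
def nesting_levels(spans):
    """Map each (start, end) token span to its nesting level.

    Assumption for illustration: level = 1 + the highest level among
    mentions strictly contained in the span (1 if it contains none).
    """
    def contains(outer, inner):
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    levels = {}
    # Shorter spans first, so contained mentions are scored before their containers.
    for span in sorted(spans, key=lambda s: s[1] - s[0]):
        inner = [levels[s] for s in levels if contains(span, s)]
        levels[span] = 1 + max(inner, default=0)
    return levels


# Token indices: Venezuelan=0, President=1, Hugo=2, Chavez=3
mentions = {
    (0, 1): "Venezuelan",
    (2, 4): "Hugo Chavez",
    (0, 2): "Venezuelan President",
    (0, 4): "Venezuelan President Hugo Chavez",
}
levels = nesting_levels(mentions.keys())
for span, text in mentions.items():
    print(levels[span], text)
```

Under that assumption the sketch assigns level 1 to "Venezuelan" and "Hugo Chavez", level 2 to "Venezuelan President", and level 3 to the full phrase, matching the slide.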

  7. Entities Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela. 6 different mentions refer to 1 entity of type PER

  8. The idea … • Exploiting a large corpus to improve the detection of mentions: • Patterns • Data redundancy “… Italia …” “… Rossi …” “… Benetton …”

  9. Pattern Extraction Candidates [after annotating the large corpus] • word_{n-5} word_{n-4} word_{n-3} word_{n-2} word_{n-1} word_n word_{n+1} word_{n+2} word_{n+3} word_{n+4} word_{n+5} MENTION • TF-IDF (Term Frequency – Inverse Document Frequency), applied to patterns: • Pattern Frequency: the more frequently a pattern occurs with mentions of a specific category, the more important it is for that category. • Inverse Category Frequency: the more categories a pattern occurs with, the less it contributes to characterizing the semantics of any category it co-occurs with.
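As a rough illustration of the weighting described on this slide, the sketch below scores each (pattern, category) pair as pattern frequency times the log of the inverse category frequency. The exact formula is not given in the slides, so this TF-IDF analogue is an assumption, and the Italian context patterns are made-up toy data.

```python
import math
from collections import Counter, defaultdict


def pattern_scores(observations):
    """Score (pattern, category) pairs with a TF-IDF-style weight.

    observations: list of (pattern, category) pairs, one per mention
    found next to that pattern in the annotated corpus.
    Assumed formula: score = PF * log(N_categories / CF), where
    PF = times the pattern occurs with the category and
    CF = number of distinct categories the pattern occurs with.
    """
    pf = Counter(observations)
    categories_of = defaultdict(set)
    for pattern, category in pf:
        categories_of[pattern].add(category)
    n_categories = len({category for _, category in pf})
    return {
        (pattern, category): count * math.log(n_categories / len(categories_of[pattern]))
        for (pattern, category), count in pf.items()
    }


# Hypothetical observations (not from the presentation).
toy = (
    [("il presidente", "PER")] * 3
    + [("la città di", "GPE")] * 2
    + [("di", "PER"), ("di", "GPE"), ("di", "ORG")]
)
scores = pattern_scores(toy)
for key, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(key, round(value, 3))
```

In this toy data the pattern "di" co-occurs with every category, so its inverse category frequency term is log(1) = 0 and it contributes nothing, while "il presidente" is both frequent and specific to PER.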

  10. Data Redundancy • “... La giunta Coni sostiene la candidatura di Torino per le Olimpiadi giovanili 2010. ...” [“The CONI board supports Torino's candidacy for the 2010 Youth Olympics.”] Is “Torino” a GPE or an ORG (soccer team)? • Prob(“Torino”/type=“GPE”)? • Use a classifier to recognize all mentions in a large corpus in order to obtain the probability distribution for all mentions across all possible types. • Mention = “Torino”: B-GPE_NAM: 11823; B-ORG_NAM: 2950; B-LOC_NAM: 33; B-PER_NAM: 5 [Chart: distribution of “Torino” over the types PER, ORG, GPE, LOC]
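The counts on this slide can be turned into a type distribution by simple normalization. The sketch below assumes a plain relative-frequency estimate; the slides do not say whether the actual system smooths or otherwise transforms these counts before using them as a feature.

```python
# Tag counts for the mention "Torino", as reported on the slide.
torino_counts = {"GPE": 11823, "ORG": 2950, "LOC": 33, "PER": 5}


def type_distribution(counts):
    """Relative frequency of each entity type for one mention string."""
    total = sum(counts.values())
    return {etype: count / total for etype, count in counts.items()}


dist = type_distribution(torino_counts)
for etype, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{etype}: {prob:.3f}")
```

With these counts roughly 80% of the probability mass falls on GPE and about 20% on ORG, which is the kind of prior a classifier could use to resolve the ambiguity in the example sentence.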

  11. System Architecture • Identifies the syntactic head of a mention and its mention level. For the extent of a mention, we use the MaltParser for Italian (Lavelli et al. 2009). • Recognizes the type of a mention.

  12. System Architecture 1.

  13. System Architecture 2.

  14. System Architecture 3.

  15. System Architecture 4.

  16. System Architecture 5.

  17. System Architecture 6.

  18. Evaluation and Feature Analysis • EVALITA 2009 EMD Task: value = 65.7% • Feature Analysis:

  19. Applications … • LiveMemories Project: identifying mentions in 2 Italian corpora: • Articles from the local newspaper “L’Adige” • Blogs posted by students living in the university residence of “San Bartolomeo”

  20. Applications … • Semantic Wikipedia for Italian (SWiiT), http://textpro.fbk.eu/resources/SWiiT.html, annotated at 5 levels: • Basic NLP processing • Entity Mentions • Entity Subtypes (work in progress) • Entity Co-reference (work in progress) • Dependency parsing (work in progress)

  21. System available as … • A web service: http://textpro.fbk.eu/typhoon.html • Using Axis (open-source, XML-based web service framework) • Allows the user to submit a document and have it annotated with entity mentions using the IOB format • Part of TextPro: http://textpro.fbk.eu (work in progress)
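To show what consuming IOB-annotated output might look like, here is a minimal decoding sketch. The slides only show tags of the form B-GPE_NAM; the matching I- continuation tags, the "O" tag, and the token/tag list layout are assumptions based on the usual IOB convention, not a description of the web service's actual response format.

```python
def decode_iob(tokens, tags):
    """Group IOB-tagged tokens into (text, entity_type, mention_level) tuples.

    Assumes tags like "B-PER_NAM" / "I-PER_NAM" / "O". A token whose
    I- tag does not continue the open mention closes that mention and
    is skipped.
    """
    mentions = []
    current = None  # (list of tokens, label) for the open mention
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                mentions.append(current)
            current = ([token], tag[2:])
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(token)
        else:
            if current:
                mentions.append(current)
            current = None
    if current:
        mentions.append(current)
    # Split each label such as "PER_NAM" into type and level.
    return [(" ".join(words), *label.split("_", 1)) for words, label in mentions]


tokens = ["Hugo", "Chavez", "visited", "Torino"]
tags = ["B-PER_NAM", "I-PER_NAM", "O", "B-GPE_NAM"]
decoded = decode_iob(tokens, tags)
print(decoded)
```

Note that a flat IOB sequence of this kind encodes one layer of mentions; how the service represents nested mentions is not stated in the slides.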

  22. Conclusions and future work • Pronominal mentions are difficult to recognize; coreference is needed. • Data redundancy improves overall FB1 by around 5%, and by around 20% for nominal names that refer to geopolitical entities. • The results for patterns were not as expected, probably because the patterns selected for each class were not the appropriate ones. • As future work, we would like to find out how to select the right patterns for each class.

  23. References • Bartalesi Lenzi, V., Sprugnoli, R. (2009). EVALITA 2009: Description and Results of the Local Entity Detection and Recognition (LEDR) Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy. • Bernaola Biggio, S.M., Zanoli, R., Giuliano, C., Uryupina, O., Versley, Y., Poesio, M. (2009). Local Entity Detection and Recognition Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy. • Bernaola Biggio, S.M., Speranza, M., Zanoli, R. (2010). Entity Mention Detection Using a Combination of Redundancy-Driven Classifiers. In Proceedings of LREC 2010, 7th Conference on Language Resources and Evaluation, Valletta, Malta. • Lavelli, A., Hall, J., Nilsson, J., Nivre, J. (2009). MaltParser at the EVALITA 2009 Dependency Parsing Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy. • Magnini, B., Cappelli, A., Pianta, E., Speranza, M., Bartalesi Lenzi, V., Sprugnoli, R., Romano, L., Girardi, C., Negri, M. (2006). Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI 2006, Florence, Italy. • Speranza, M. (2009). The Named Entity Recognition Task at EVALITA 2009. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
