1. Semantics and Information Extraction
Douglas E. Appelt
Artificial Intelligence Center
SRI International
2. 7/11/03 Semantics and Info. Extraction What is Semantics?
Theory of the relationship between formal aspects of language and objects and facts in the world.
3. Traditional Approach in NLP (and linguistics)
Define a well-behaved logical language
Intensional logic
Dynamic predicate logic
Discourse Representation Structures
Define a semantics for the logical language (using model theory)
Devise rules for translating natural language structures into the logical language that preserve truth conditions.
Apply principles of compositionality to build larger structures from smaller ones.
4. Successes and Failures
Successes
Database query applications (e.g. ATIS systems)
Dialog systems with a narrow domain of application (e.g. TRAINS)
Failures
Extracting information from large corpora
Real syntax too complex
Coverage too weak for large corpora
5. Semantics and Information Extraction
General requirements of a semantic theory for information extraction
ACE as a specific approach to semantics for information extraction
Examine specific issues
Basic ontology
Coreference
Generic/Specific
Metonymy
Relations and Events
6. Information Extraction: A Pragmatic Approach
Let application requirements drive semantic analysis
Identify the types of entities that are relevant to a particular task
Identify the range of facts that one is interested in for those entities
Ignore everything else
7. MUC and Scenario Templates
Define a set of interesting entities
Persons, organizations, locations
Define a complex scenario involving interesting events and relations over entities
Example: management succession: persons, companies, positions, reasons for succession
This collection of entities and relations is called a scenario template.
8. Problems with Scenario Templates
Encouraged development of highly domain-specific ontologies, rule systems, heuristics, etc.
Most of the effort expended on building a scenario template system was not directly applicable to a different scenario template.
9. Addressing the Problem
Address a large number of smaller, more focused scenario templates (Event-99)
Develop a more systematic ground-up approach to semantics by focusing on elementary entities, relations, and events (ACE)
10. The ACE Program
Automated Content Extraction
Develop core information extraction technology by focusing on extracting specific semantic entities and relations over a very wide range of texts.
Corpora: Newswire and broadcast transcripts, but with a broad range of topics and genres.
Third person reports
Interviews
Editorials
Topics: foreign relations, significant events, human interest, sports, weather
Discourage highly domain- and genre-dependent solutions
11. Components of a Semantic Model
Entities - Individuals in the world that are mentioned in a text
Simple entities: singular objects
Collective entities: sets of objects of the same type where the set is explicitly mentioned in the text
Attributes - Timeless unary properties of entities (e.g. Name)
Temporal points and intervals
Relations - Properties that hold of two entities over a time interval
Events - A particular kind of relation among entities implying a change in relation state at the end of the time interval.
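One way to picture the components listed above is as a handful of record types. The following is only a minimal sketch: the class names, field names, and type labels such as "PER" and "GPE" are illustrative assumptions, not the ACE data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A time interval; None on either side means 'not specified'."""
    start: Optional[str] = None
    end: Optional[str] = None


@dataclass
class Entity:
    entity_id: str
    entity_type: str                      # hypothetical label, e.g. "PER" or "GPE"
    collective: bool = False              # True for an explicitly mentioned set
    attributes: Dict[str, str] = field(default_factory=dict)  # timeless unary properties


@dataclass
class Relation:
    relation_type: str
    args: Tuple[Entity, Entity]           # a relation holds of two entities
    interval: Interval = field(default_factory=Interval)


@dataclass
class Event:
    event_type: str
    participants: Tuple[Entity, ...]
    interval: Interval = field(default_factory=Interval)  # state changes at interval.end


# Illustrative instances only.
blair = Entity("e1", "PER", attributes={"name": "Tony Blair"})
britain = Entity("e2", "GPE", attributes={"name": "Britain"})
role = Relation("ROLE", (blair, britain))
print(role.args[0].attributes["name"], role.relation_type, role.args[1].attributes["name"])
```

Keeping the interval on the relation rather than on the entity mirrors the slide's distinction between timeless attributes and relations that hold over time.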
12. Semantic Analysis: Relating Language to the Model
Linguistic Mention
A particular linguistic phrase
Denotes a particular entity, relation, or event
A noun phrase, name, or possessive pronoun
A verb, nominalization, compound nominal, or other linguistic construct relating other linguistic mentions
Linguistic Entity
Equivalence class of mentions with same meaning
Coreferring noun phrases
Relations and events derived from different mentions, but conveying the same meaning
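The idea of a linguistic entity as an equivalence class of mentions can be sketched with a union-find structure. This is an illustration under assumptions: the class `MentionClusters` and the example mentions are hypothetical, and deciding which mentions corefer (the hard part) is taken as given.

```python
from collections import defaultdict


class MentionClusters:
    """Union-find over mention strings; each resulting set is one linguistic entity."""

    def __init__(self):
        self.parent = {}

    def find(self, mention):
        # Register unseen mentions as their own singleton class.
        self.parent.setdefault(mention, mention)
        while self.parent[mention] != mention:
            # Path halving keeps the trees shallow.
            self.parent[mention] = self.parent[self.parent[mention]]
            mention = self.parent[mention]
        return mention

    def corefer(self, a, b):
        # Merge the classes containing a and b.
        self.parent[self.find(a)] = self.find(b)

    def entities(self):
        groups = defaultdict(set)
        for mention in list(self.parent):
            groups[self.find(mention)].add(mention)
        return list(groups.values())


clusters = MentionClusters()
clusters.corefer("Tony Blair", "the prime minister")
clusters.corefer("the prime minister", "he")
clusters.find("Iraq")  # a mention with no coreferring partner
result = sorted(sorted(group) for group in clusters.entities())
print(result)
```

Because coreference is treated as an equivalence relation, linking "Tony Blair" to "the prime minister" and "the prime minister" to "he" places all three in one class without a direct link between the first and the last.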
13. Language and World Model
14. NLP Tasks in an Extraction System
15. The Basic Semantic Tasks of an IE System
Recognition of linguistic entities
Classification of linguistic entities into semantic types
Identification of coreference equivalence classes of linguistic entities
Identifying the actual individuals that are mentioned in an article
Associating linguistic entities with predefined individuals (e.g. a database, or knowledge base)
Forming equivalence classes of linguistic entities from different documents.
16. Choosing an Ontology for IE Semantics
Ordinary native speakers should be able to annotate text with minimal training.
People should have well-developed intuitions about type classification
Is a museum an organization or facility? (A FOG?)
People should have well-developed intuitions about entity coreference
Peace in the Middle East
Entities should be extensional, not abstract, generic, counterfactual, or fictional
17. The ACE Ontology and Annotation Standards
Documents available online
http://www.ldc.upenn.edu/Projects/ACE/
Entity standards
Relations standards
Proposed event standards still under development
18. The ACE Ontology
Persons
A natural kind, and hence self-evident
Organizations
Should have some persistent existence that transcends a mere set of individuals
Locations
Geographic places with no associated governments
Facilities
Objects from the domain of civil engineering
Geopolitical Entities
Geographic places with associated governments
19. Why GPEs?
An ontological problem: certain entities have attributes of physical objects in some contexts, organizations in some contexts, and collections of people in others
Sometimes it is difficult or impossible to determine which aspect is intended
It appears that in some contexts, the same phrase plays different roles in different clauses
20. Aspects of GPEs
Physical
San Francisco has a mild climate
Organization
The United States is seeking a solution to the North Korean problem.
Population
France makes a lot of good wine.
21. Metonymy
Metonymy occurs when a speaker uses a mention to refer, in a systematic way, to an entity with a different name or type than the one mentioned.
Metonymy is a property of mentions.
A literal mention uses the name or type of the entity it refers to.
A metonymic mention departs from that in some way.
A single entity can have both literal and metonymic mentions.
22. Examples
Name metonymy
Beijing announced a new policy toward North Korea.
Baltimore hit a home run in the ninth inning
SRI was severely damaged in the 1989 earthquake
Type metonymy
John works for the restaurant on the corner
23. Problem Cases: neither the literal nor the metonymic mention is a type of interest
24. Role Ambiguity: Why isn't it just metonymy?
Iraq attacked Kuwait
Was the attack on the physical territory?
Was the attack on the government?
Was the attack on the people of Kuwait?
The answer is yes.
25. Multiple Roles
Iraq disputed its border with Kuwait
Governments dispute things
Physical real estate has borders
26. Role Classification and Sparse Data Problem
Role determination through predicate-argument constraints
China announced a new policy regarding North Korea.
ACE Corpus: About 20K words in training corpus
GPE-PER: 84 configurations
GPE-LOC: 432 configurations
GPE-ORG: 504 configurations
GPE-GPE: 789 configurations
Only 131 configurations have more than 2 instances in the corpus (about 7%)
Many of those involve weakly constrained predicates (have, be, of, etc.)
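The "about 7%" figure can be checked with a few lines of arithmetic. The sketch below assumes the percentage is taken against the sum of the four configuration counts listed on the slide; the slide does not state the denominator explicitly.

```python
# Configuration counts as given on the slide.
configs = {"GPE-PER": 84, "GPE-LOC": 432, "GPE-ORG": 504, "GPE-GPE": 789}

total = sum(configs.values())   # assumed denominator: all listed configurations
frequent = 131                  # configurations with more than 2 instances
share = frequent / total

print(f"{total} configurations, {100 * share:.1f}% seen more than twice")
```

Under that assumption the total is 1809 configurations, and 131 of them is roughly 7.2%, which agrees with the slide's rounded figure.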
27. Generic vs Specific
The assumed application is building a database using extracted information
Databases typically represent concrete entities
Specificity is a critical attribute of linguistic entities.
Specificity is a property of the entity, not the mention:
John is looking for a Java programmer.
He must have three years of experience.
Problem: assessment of specificity is a nuanced distinction subject to substantial inter-annotator disagreement
28. Types of Linguistic Mentions
Name mentions
The mention uses a proper name to refer to the entity
Nominal mentions
The mention is a noun phrase whose head is a common noun
Pronominal mentions
The mention is a headless noun phrase, or a noun phrase whose head is a pronoun, or a possessive pronoun
29. Entity and Mention Example
30. Relations
Relations hold between two entities over a time interval.
Relations may be timeless, or the temporal interval may be unspecified.
Relations have inertia, i.e., they don't change unless a relevant event happens.
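Inertia can be illustrated by replaying a log of events and letting a relation keep its last known state until another relevant event occurs. This is a minimal sketch: the function `relation_holds`, the event-log layout, and the years used are assumptions made for the example.

```python
def relation_holds(events, relation, time):
    """Return the state of `relation` at `time`.

    `events` is a list of (time, relation, new_state) tuples. The state
    persists (inertia) until a later event on the same relation changes it.
    """
    state = False  # assume the relation does not hold before any event
    for event_time, event_relation, new_state in sorted(events, key=lambda e: e[0]):
        if event_time > time:
            break
        if event_relation == relation:
            state = new_state
    return state


# Illustrative log: one event starts the relation, a later one ends it.
presidency = ("ROLE", "Clinton", "United States")
log = [
    (2001, presidency, False),
    (1993, presidency, True),
]

print(relation_holds(log, presidency, 1997))
print(relation_holds(log, presidency, 2002))
```

Nothing is recorded for the years between the two events, yet a query for any of them returns the state set by the earlier event, which is the sense of inertia described above.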
31. Explicit and Implicit Relations
Many relations are true in the world. Reasonable knowledge bases used by extraction systems will include many of these relations. Semantic analysis requires focusing on certain ones that are directly motivated by the text.
Example:
Baltimore is in Maryland, which is in the United States.
Baltimore, MD
The text mentions Baltimore and the United States. Is there a relation between Baltimore and the United States?
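The Baltimore example can be made concrete by separating the relations stated directly from those a knowledge base can derive by chaining. The sketch below is an assumption-laden illustration: `contained_in` is a hypothetical helper, and whether a derived relation should be annotated is exactly the question the slide raises.

```python
# Containment facts of the kind a knowledge base might hold (illustrative).
stated = {("Baltimore", "Maryland"), ("Maryland", "United States")}


def contained_in(place, relations):
    """Return every container reachable from `place` by chaining containment."""
    found = set()
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for inner, outer in relations:
            if inner == current and outer not in found:
                found.add(outer)
                frontier.append(outer)
    return found


derived = contained_in("Baltimore", stated)
print(sorted(derived))
print(("Baltimore", "United States") in stated)
```

The pair (Baltimore, United States) is reachable by chaining but is not among the stated facts, which is why an extraction system needs a criterion for when the text itself motivates the relation.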
32. Another Example
Prime Minister Tony Blair attempted to convince the British Parliament of the necessity of intervening in Iraq.
Is there a role relation specifying Tony Blair as prime minister of Britain?
A test: a relation is implicit in the text if the text provides convincing evidence that the relation actually holds.
33. Explicit Relations
Explicit relations are expressed by certain surface linguistic forms
Copular predication - Clinton was the president.
Prepositional Phrase - The CEO of Microsoft
Prenominal modification - The American envoy
Possessive - Microsoft's chief scientist
SVO relations - Clinton arrived in Tel Aviv
Nominalizations - Annan's visit to Baghdad
Apposition - Tony Blair, Britain's prime minister
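Two of the surface forms above are regular enough to show with toy patterns. This is a rough sketch only: the regular expressions and the function `match_form` are assumptions for illustration, they cover just the possessive and prepositional-phrase forms, and a real system would work over parsed structure rather than raw strings.

```python
import re

# Toy patterns: arg1 is the role-bearing phrase, arg2 the capitalized name.
PATTERNS = [
    ("possessive", re.compile(r"^(?P<arg2>[A-Z]\w*)'s (?P<arg1>.+)$")),
    ("prepositional phrase", re.compile(r"^(?:[Tt]he )?(?P<arg1>.+?) of (?P<arg2>[A-Z]\w*)$")),
]


def match_form(phrase):
    """Return (form, arg1, arg2) for the first matching pattern, else None."""
    for form, pattern in PATTERNS:
        match = pattern.match(phrase)
        if match:
            return form, match.group("arg1"), match.group("arg2")
    return None


for phrase in ["Microsoft's chief scientist", "The CEO of Microsoft", "Clinton arrived in Tel Aviv"]:
    print(phrase, "->", match_form(phrase))
```

The third phrase is an SVO relation, which these string patterns deliberately do not attempt; it returns None.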
34. Types of ACE Relations
ROLE - relates a person to an organization or a geopolitical entity
Subtypes: member, owner, affiliate, client, citizen
PART - generalized containment
Subtypes: subsidiary, physical part-of, set membership
AT - permanent and transient locations
Subtypes: located, based-in, residence
SOC - social relations among persons
Subtypes: parent, sibling, spouse, grandparent, associate
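The relation inventory above can be written as a small lookup table with a validity check. This is a sketch under assumptions: the entity-type labels "PER", "ORG", and "GPE" and the function `is_valid` are illustrative, and argument constraints are encoded only for ROLE and SOC because the slide states them only for those two types.

```python
# Relation types and subtypes as listed on the slide.
RELATION_SUBTYPES = {
    "ROLE": {"member", "owner", "affiliate", "client", "citizen"},
    "PART": {"subsidiary", "physical part-of", "set membership"},
    "AT": {"located", "based-in", "residence"},
    "SOC": {"parent", "sibling", "spouse", "grandparent", "associate"},
}

# Argument-type constraints stated on the slide (labels are assumed names).
ARGUMENT_TYPES = {
    "ROLE": ({"PER"}, {"ORG", "GPE"}),  # a person to an organization or GPE
    "SOC": ({"PER"}, {"PER"}),          # social relations among persons
}


def is_valid(relation_type, subtype, arg1_type, arg2_type):
    """Check the subtype and, where the slide states one, the argument constraint."""
    if subtype not in RELATION_SUBTYPES.get(relation_type, set()):
        return False
    if relation_type in ARGUMENT_TYPES:
        first, second = ARGUMENT_TYPES[relation_type]
        return arg1_type in first and arg2_type in second
    return True


print(is_valid("ROLE", "citizen", "PER", "GPE"))
print(is_valid("SOC", "spouse", "PER", "ORG"))
```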
35. Event Types (preliminary)
Movement
Travel, visit, move, arrive, depart
Transfer
Give, take, steal, buy, sell
Creation/Discovery
Birth, make, discover, learn, invent
Destruction
Die, destroy, wound, kill, damage
36. Problem: Collective and Distributive Reference
37. Solution: Relations
38. Summary
The motivation for a semantic theory is practical, driven by database-filling needs
Pick a limited ontology of core concepts, and build out, motivated by application needs
Address a broad spectrum of semantic problems, but from a limited ontology that simplifies data annotation issues.