
Indra

Indra: Emergent Ontologies from Text for Feeding Data to Simulations. Deborah Duong, Augustine Consulting, TRAC-Monterey.


Presentation Transcript


  1. Indra: Emergent Ontologies from Text for Feeding Data to Simulations • Deborah Duong • Augustine Consulting • TRAC-Monterey

  2. Indra • Uses mutual information to choose the parse, assign word sense, and form ontologies based on context • Iterative feedback finds a global consensus on meaning, for accurate role discovery • Flexible emergent ontologies form, combining data-driven and hypothesis-driven approaches • Feedback facilitates data fusion with other modalities • Provides a way to feed higher-level information back to lower-level extraction, introducing feedback to data fusion
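As a rough illustration of the mutual information the slide refers to, the sketch below computes pointwise mutual information (PMI) from co-occurrence counts. The slides do not say which estimator Indra uses, so the function name `pmi_table` and the toy (link, entity) pairs are assumptions made only for this example.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Pointwise mutual information, in bits, for each observed (x, y) pair.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from raw counts.
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    return {
        (x, y): math.log2(count * total / (x_counts[x] * y_counts[y]))
        for (x, y), count in pair_counts.items()
    }

# Hypothetical (link, entity) observations, not data from Indra.
pairs = [
    ("eat", "fork"), ("eat", "fork"), ("eat", "salad"),
    ("grow", "plant"), ("grow", "plant"), ("eat", "plant"),
]
table = pmi_table(pairs)
for pair, score in sorted(table.items()):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than chance get a positive score and pairs that co-occur less often get a negative one, which is the kind of signal a context-based choice of parse or word sense could rank on.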

  3. Language is Context Dependent • Language is deeply context dependent, but natural language programs run as “pipelines” that complete each stage before the next one starts • Indra uses a feedback loop to let the parse, word sense assignment, and ontological assignments inform each other • The result is a flexible, data-driven ontology that can be aligned with other models

  4. Making “Sense” of Text • “Word sense” of entities and their actions • Inter-document coreference resolution: many ways of naming a person; different persons may have the same name • Link normalization: many ways of referring to a behavior; different behaviors referred to with the same words

  5. General Roles and Role Relationships • Indra extracts general Roles and Role relationships from text • These Roles and Role relationships are arranged in ontological groupings • Iterative feedback allows different parts of the ontology to influence each other • Iterative feedback makes the system deeply adaptive, so outside data can have widespread influence

  6. Global Consensus on Sense • Grouping of entities and links increases the information with each iteration • With each iteration, the unsupervised scatter-gather finds the “sense” of named entities, determining which individuals they are based on their roles • As information corrects the senses of links and entities, and neighbors correct their neighbors, a global consensus on sense forms • As links and entities are grouped, an emergent ontology is formed
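The idea that neighbors correct their neighbors until a consensus forms can be illustrated with a much simpler stand-in technique: synchronous majority-vote label propagation over a small graph. This is not Indra's unsupervised scatter-gather, which the slides do not specify in detail; the graph, the sense labels, and the function name `propagate_senses` are hypothetical.

```python
from collections import Counter

def propagate_senses(neighbors, senses, max_iters=20):
    """Repeatedly reassign each node the majority sense among itself and its neighbors.

    Ties are broken alphabetically so the result is deterministic.
    Stops when an iteration changes nothing or after max_iters rounds.
    """
    senses = dict(senses)
    for _ in range(max_iters):
        updated = {}
        for node, current in senses.items():
            votes = Counter(senses[n] for n in neighbors.get(node, ()))
            votes[current] += 1  # a node also votes for its own current sense
            top = max(votes.values())
            updated[node] = min(s for s, v in votes.items() if v == top)
        if updated == senses:
            break
        senses = updated
    return senses

# Three hypothetical mentions of "plant" that share contexts with each other.
neighbors = {"m1": ["m2", "m3"], "m2": ["m1", "m3"], "m3": ["m1", "m2"]}
initial = {"m1": "factory", "m2": "factory", "m3": "autotroph"}
consensus = propagate_senses(neighbors, initial)
print(consensus)
```

Here the single outlying mention is pulled toward the sense its neighbors agree on, and the loop stops once no assignment changes.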

  7. Iterative Feedback Introduced in Stages • Stage 1: Upper-lower feedback (implemented): larger clusters and smaller clusters influence each other • Stage 2: Side-to-side feedback (implemented): node clusters and link clusters influence each other • Stage 3: More upper-lower feedback: ontology and parse influence each other • Stage 4: Feedback with external systems: seed hypotheses from analysts and inference engines have wide influence

  8. Stage 1: Upper-Lower Feedback • Roles are clustered according to link contexts, and Role relations are clustered according to entity contexts • Two separate ontologies form • Clusters at higher levels split clusters at lower levels • Essential for word sense (and “entity sense”) • For example, clusters for factories and autotrophs split the word “plant” • Clustering algorithms are either agglomerative or divisive; “unsupervised scatter-gather” is both • Clusters split and merge until convergence
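The “plant” example can be sketched as follows: each occurrence of the word is assigned to whichever higher-level cluster shares the most context words with it, which splits the occurrences into two senses. The cluster contents, the overlap measure, and the function name `split_by_sense` are assumptions for illustration, not the measure Indra actually uses.

```python
def split_by_sense(occurrences, sense_clusters):
    """Group occurrences of one word by the higher-level cluster they overlap most.

    occurrences: list of sets of context words, one set per occurrence.
    sense_clusters: dict mapping a cluster name to its set of typical contexts.
    """
    groups = {}
    for contexts in occurrences:
        best = max(sense_clusters, key=lambda name: len(sense_clusters[name] & contexts))
        groups.setdefault(best, []).append(contexts)
    return groups

# Hypothetical higher-level clusters and occurrences of the word "plant".
sense_clusters = {
    "factory": {"build", "operate", "close", "manufacture"},
    "autotroph": {"grow", "water", "wilt", "photosynthesize"},
}
plant_occurrences = [{"operate", "close"}, {"water", "grow"}, {"build"}]
senses = split_by_sense(plant_occurrences, sense_clusters)
print({name: len(group) for name, group in senses.items()})
```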

  9. Stage 2: Side-to-Side Feedback • Stage 1 clusters entities based on links and links based on entities • Stage 2 clusters entities based on link *clusters* and links based on entity *clusters* • The separate Role and Role relationship ontologies of stage 1 become intertwined • Needed for data smoothing and more consensus
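One way to see why describing entities by link *clusters* smooths the data: two entities that never share an exact link can still share a link cluster. The sketch below compares Jaccard similarity before and after replacing each link with its cluster label. The entities, links, cluster name `INGEST`, and helper names are hypothetical, and Jaccard similarity is a stand-in for whatever measure Indra uses.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def lift_to_clusters(links, link_cluster):
    """Replace each link by its cluster label; unclustered links stay as they are."""
    return {link_cluster.get(link, link) for link in links}

# Hypothetical stage-1 output: links seen with each entity, and a link clustering.
entity_links = {"fork": {"eat"}, "knife": {"consume"}}
link_cluster = {"eat": "INGEST", "consume": "INGEST"}

raw_similarity = jaccard(entity_links["fork"], entity_links["knife"])
smoothed_similarity = jaccard(
    lift_to_clusters(entity_links["fork"], link_cluster),
    lift_to_clusters(entity_links["knife"], link_cluster),
)
print(raw_similarity, smoothed_similarity)
```

With raw links the two entities look unrelated; once links are described by their clusters, the two become comparable.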

  10. Stage 3: More Upper-Lower Feedback • Choose the parse based on the ontology (the parse already influences the ontology in the feedforward direction) • Choose the parse based on how common it is for similar words to be attached in that way • Example: “Jane ate the salad with a fork”: “with” modifies “ate”, because tools such as “forks” and “knives” are typically found to be used to “eat” or “consume” • Example: “Jane ate the salad with croutons”: “with” modifies “salad”, because things that are “eaten” or “consumed” are typically foods such as “croutons” or “tomatoes” • Later, instead of using a rule-based parser, use mutual information to parse (Yuret), making Indra purely statistical • Can be used with any language
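The two attachment examples can be sketched as a lookup over ontology clusters: the prepositional phrase attaches to whichever candidate head has the stronger association with the cluster of the phrase's object. The cluster names, the association scores (which could be mutual information estimates), and the function name `choose_attachment` are all hypothetical.

```python
def choose_attachment(verb, noun, pp_object, cluster_of, assoc):
    """Return the word that the prepositional phrase attaches to.

    Each word is mapped to its ontology cluster, then the (head cluster,
    object cluster) association scores are compared. Ties go to the verb.
    """
    object_cluster = cluster_of[pp_object]
    verb_score = assoc.get((cluster_of[verb], object_cluster), 0.0)
    noun_score = assoc.get((cluster_of[noun], object_cluster), 0.0)
    return verb if verb_score >= noun_score else noun

# Hypothetical ontology clusters and "with"-relation association scores.
cluster_of = {
    "ate": "INGEST", "salad": "FOOD",
    "fork": "TOOL", "knife": "TOOL",
    "croutons": "FOOD", "tomatoes": "FOOD",
}
assoc = {
    ("INGEST", "TOOL"): 2.0, ("FOOD", "TOOL"): 0.1,
    ("INGEST", "FOOD"): 0.3, ("FOOD", "FOOD"): 1.5,
}

fork_head = choose_attachment("ate", "salad", "fork", cluster_of, assoc)
croutons_head = choose_attachment("ate", "salad", "croutons", cluster_of, assoc)
print(fork_head, croutons_head)
```

Because the scores are stored per cluster rather than per word, “knife” or “tomatoes” would be resolved the same way without ever having been seen in this construction.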

  11. Stage 4: Feedback with External Systems • The purpose of feedback is deep adaptivity, so that external data can influence the ontology and be easily fused • Hypothesis-driven AND data-driven ontologies • If an analyst groups concepts: collocated paths are found; these help develop the analyst’s concept; more consonant concepts and paths are found • RELATIVELY FEW points of correspondence are needed

  12. Example Cluster
     • p:35805,n:34540.fes // Moroccan city
     • p:35805,n:37114.tenerife // Spanish city
     • p:35805,n:37344.zaragoza // Spanish city, with football club
     • p:35805,n:37548.boavista // Portuguese island, with football club
     • p:35805,n:38590.maritimo // Portuguese sports club known for football team
     • p:43243,n:39997.
     • p:39997,n:29474.saccoh
     • p:39997,n:29612.spaho // small town in Bosnia
     • p:39997,n:33375.spartak // Moscow football club
     • p:39997,n:34467.environmentastrit
     • p:39997,n:34721.haxhi // Albanian football player
     • p:43243,n:40629.tenerife
     • p:43243,n:41043.boavista
     • p:43243,n:42049.maritimo
     • p:46477,n:44423.bilbao // Basque city
     • p:46477,n:44563.centreleft // football position
     • p:49912,n:48979.oviedo // Spanish city

  13. Example Cluster
     • p:49224,n:50682.tenerife
     • p:56352,n:53348.
     • p:53348,n:46799.
     • p:46799,n:40301.rayo // football club in Madrid
     • p:46799,n:41027.bilbao
     • p:53348,n:47751.bilbao
     • p:56352,n:53354.
     • p:53354,n:47225.shelling
     • p:56352,n:53766.shelling
     • p:56352,n:53814.spartak
     • p:56352,n:54104.youridjourkaeff
     • p:56352,n:54108.zaragoza
     • p:56352,n:54460.colo // Chilean football club
     • p:56352,n:55076.kickoff // football term
     • p:65663,n:62554.
     • p:62554,n:60508.youridjourkaeff
     • p:83660,n:85323.youridjourkaeff
     • p:86579,n:84114.
     • p:84114,n:81134.
     • p:81134,n:75091.
     • p:75091,n:73692.deportivo // Spanish football club
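The cluster listings above can be read back into a tree. The sketch below assumes that `p:` is the id of a parent cluster, `n:` is the id of the node itself, the text after the dot is the term (empty for an internal cluster), and anything after `//` is an annotation. The slides do not document the format, so this reading and the helper names are assumptions.

```python
import re
from collections import defaultdict

# Assumed entry layout: p:<parent id>,n:<node id>.<optional term>
ENTRY = re.compile(r"p:(\d+),n:(\d+)\.(\w*)")

def parse_entries(lines):
    """Return (children, labels): parent id -> child ids, node id -> term."""
    children = defaultdict(list)
    labels = {}
    for line in lines:
        match = ENTRY.search(line)
        if not match:
            continue  # skip anything that is not a cluster entry
        parent, node, label = match.groups()
        children[parent].append(node)
        if label:
            labels[node] = label
    return children, labels

def leaf_labels(node, children, labels):
    """Collect the terms found under a node, depth first."""
    if node in labels:
        return [labels[node]]
    found = []
    for child in children.get(node, []):
        found.extend(leaf_labels(child, children, labels))
    return found

# A few entries copied from the second example cluster.
lines = [
    "p:56352,n:53348.",
    "p:53348,n:46799.",
    "p:46799,n:40301.rayo // football club in Madrid",
    "p:46799,n:41027.bilbao",
    "p:53348,n:47751.bilbao",
]
children, labels = parse_entries(lines)
terms = leaf_labels("53348", children, labels)
print(terms)
```

Under this reading, the repeated term `bilbao` appears as two different nodes at different depths of the same subtree.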

  14. Ontologies Problematic • Indra will approximate the most likely (highest mutual information) ontology • BUT analysts want their own ontologies • Different experts look at the same data • Data is stored as primitive entities and paths • Indra makes a semantic model on the fly, tailored to the ontology of whoever is looking at it • Ontologies can also be tailored toward those of particular simulation models

  15. Hypothesis Driven AND Data Driven • Indra can flexibly take in analyst input • Indra can align its ontology to another with very few points of correspondence • Indra can fill in the gaps • Feedback gives Indra an advantage over other systems that generate ontologies: global consensus, and the ability to adapt to any amount of user input
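Aligning with very few points of correspondence and then filling in the gaps can be sketched as follows: a handful of analyst-labeled terms name the data-driven clusters they fall into, and the remaining members of each cluster inherit that name. The clusters, the seed labels, and the function names are hypothetical, and majority voting is a stand-in for Indra's feedback-based alignment.

```python
from collections import Counter

def align_clusters(clusters, seeds):
    """Map each cluster id to the analyst concept most common among its seed terms."""
    alignment = {}
    for cluster_id, members in clusters.items():
        votes = Counter(seeds[term] for term in members if term in seeds)
        if votes:
            alignment[cluster_id] = votes.most_common(1)[0][0]
    return alignment

def fill_gaps(clusters, alignment):
    """Give every term in an aligned cluster the concept of that cluster."""
    return {
        term: alignment[cluster_id]
        for cluster_id, members in clusters.items()
        if cluster_id in alignment
        for term in members
    }

# Hypothetical data-driven clusters and two analyst-provided correspondences.
clusters = {
    "c1": {"tenerife", "zaragoza", "bilbao", "oviedo"},
    "c2": {"spartak", "deportivo", "rayo"},
}
seeds = {"bilbao": "City", "rayo": "FootballClub"}

alignment = align_clusters(clusters, seeds)
concepts = fill_gaps(clusters, alignment)
print(alignment)
print(concepts["oviedo"], concepts["spartak"])
```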
