1 / 17

Approximate subgraph matching for detection of topic variations

Approximate subgraph matching for detection of topic variations. Mitja Trampu š Dunja Mladenić AI Lab, Jožef Stefan Institute , Slovenia. Mining Diversity. Web content varies in many aspects , e.g. Topical Social (author, target audience, people written about)

dyanne
Download Presentation

Approximate subgraph matching for detection of topic variations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Approximate subgraph matching for detection of topic variations MitjaTrampuš Dunja Mladenić AI Lab, Jožef Stefan Institute, Slovenia

  2. Mining Diversity • Web content varies in many aspects, e.g. • Topical • Social (author, target audience, people written about) • Geographical (publisher, places written about) • Sentiment (positive/negative) • Writing style (structure, vocabulary) • Coverage bias • This work: (micro-)topical diversity • Macroscopic = largely solved • Microscopic = challenge

  3. Task: Given a collection of texts on a topic, • identify a common template • align texts to the template

  4. Template representation • Syntactic • info1: X people were killed / killed X people / resulted in X casualties • info2: blew up Y / destroyed Y / attacked in a Y • Semantic • kill(bomber, people); count(people, X) • destroy(bomber, Y) X count Y bomber people destroy kill

  5. car drive 12 count church suicidebomber believers destroy kill exit vest 2 run wear count policestation bomber policeofficer blow up slaughter Prerequisite: Semantic Graph treatment attack execute 100 receive withstand count hospital terrorist patients demolish kill

  6. Mining Templates • Template := subgraph with frequent specializations • Specializations implied by background taxonomy (WordNet) • Threshold frequency manually defined 2 12 teardown count count policestation church suicidebomber bomber believers policeofficer blow up slaughter kill X count building bomber people destroy kill

  7. Semantic Graph Construction • Data: Google News crawl • HTML cleanup • Named entity tagging • Pronoun resolution (he/she/him/her) • Named entity consolidation (Barack Obama vs President Obama) • Parsing, triple/fact/assertion extraction(for now: subj-verb-obj only) • Ontology/taxonomy alignment • Merging triples into a graph

  8. Approximate subgraph matching number number 2 12 number count count teardown count count destroy slaughter blow up count location location church policestation location suicidebomber person bomber person person policeofficer believers people people people destroy destroy kill kill kill kill FREQUENT SUBTREE MINING GENERALIZE

  9. Approximate subgraph matching 12 2 number number teardown count count count count policestation building church location suicidebomber bomber person bomber policeofficer people people believers destroy blow up destroy slaughter kill kill kill SPECIALIZE

  10. Preliminary results • 5 test domains; for each: • ~10 graphs, ~10000 nodes • 10-60 seconds • At min. support 30% • 20 maximal patterns, 9 manually judged as interesting

  11. Conclusion • Future work: • Mapping text -> semantics • Other ontologies? • Interestingness measure for assertions and patterns • Evaluation (precision, recall; multiple domains) • Alternative approaches to generalizing subgraphs • Template extraction is achievable, but not easy • Human filtering of results hard to avoid • Current approach reasonably fast

  12. Q? Thank you.

  13. Can we extract all relations?Kind of … Thousands of small quakes resumed 18 months ago and continue to rattle Mammoth Lakes, June Lake and other Mono County resort towns. The temblors, most measuring 1 to 3 on the Richter scale, started beneath Mammoth Mountain. Subject – Verb – Object Triplets Semantic Graph

  14. Templates - why • Interpret content • news archives: structure/annotate old texts, enable semantic search • wikipedia: suggestions for infobox entries • Generate content • wikipedia: a starting point for new articles / a checklist of information to be included • No normative definition of “good template”

  15. Evaluation • Qualitative • Usage-specific • Not useful for tuning algorithms • Quantitative • Precision • Recall

More Related