Trains of Thought: Generating Information Maps


    Presentation Transcript
    1. Trains of Thought: Generating Information Maps • Dafna Shahaf, Carlos Guestrin and Eric Horvitz

    2. “The abundance of books is a distraction.” Lucius Annaeus Seneca, 4 BC – 65 AD

    3. So, you want to understand a complex topic… Now what?

    4. Search Engines are Great • But do not show how it all fits together

    5. Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

    6. Real Stories are not Linear

    7. Metro Map • A set of lines • Each line follows a coherent narrative thread • Structure + multiple aspects • [Figure: example map with word labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]

    8. Map Definition • A map M is a pair (G, P) where • G = (V, E) is a directed graph • P is a set of paths in G (metro lines) • Each e ∈ E must belong to at least one metro line • [Figure: example map with word labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
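
The definition on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' code: the class name `MetroMap`, its fields, and the article ids `d1`..`d4` are hypothetical. It checks the two conditions stated above, that every metro line is a path in G and that every edge lies on at least one line.

```python
from dataclasses import dataclass

Edge = tuple[str, str]


@dataclass(frozen=True)
class MetroMap:
    """A map M = (G, P): a directed graph plus a set of paths (metro lines)."""
    vertices: frozenset[str]
    edges: frozenset[Edge]
    lines: tuple[tuple[str, ...], ...]  # each line is a vertex sequence

    def line_edges(self) -> set[Edge]:
        """Edges traversed by at least one metro line."""
        return {(a, b) for line in self.lines for a, b in zip(line, line[1:])}

    def is_valid(self) -> bool:
        """True if every line is a path in G and every edge is on some line."""
        used = self.line_edges()
        on_graph = all(v in self.vertices for line in self.lines for v in line)
        return on_graph and used <= self.edges and self.edges <= used


# Hypothetical example: two lines that share the edge d1 -> d2.
valid_map = MetroMap(
    vertices=frozenset({"d1", "d2", "d3", "d4"}),
    edges=frozenset({("d1", "d2"), ("d2", "d3"), ("d2", "d4")}),
    lines=(("d1", "d2", "d3"), ("d1", "d2", "d4")),
)

# Same lines, plus an edge d3 -> d4 that no line uses.
invalid_map = MetroMap(
    vertices=valid_map.vertices,
    edges=valid_map.edges | {("d3", "d4")},
    lines=valid_map.lines,
)
```

Here `valid_map.is_valid()` holds, while `invalid_map.is_valid()` fails because the edge `d3 -> d4` belongs to no metro line.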

    9. Game Plan

    10. Properties of a Good Map ??? Coherence

    11. Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • Coherence is not a property of local interactions • Incoherent: each pair shares different words • [Figure: chain of articles 1 to 5 with word labels: Greece, Debt default, Europe, Republican, Italy, Protest]

    12. Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • Coherent: a small number of words captures the story • [Figure: a more coherent chain of articles 1 to 5 with word labels: Greece, Debt default, Austerity, Republican, Italy, Protest]
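
The contrast between the two chains can be illustrated with a toy score. This is a deliberately simplified stand-in, not the formulation of [KDD’10]: it assumes each article is a set of words, picks at most `k` words by brute force, and scores the chain by its weakest transition, counting only the picked words that appear on both sides. The function name and the example chains are hypothetical.

```python
from itertools import combinations


def toy_coherence(chain, k):
    """Best weakest-link score over all choices of at most k words.

    chain: list of word sets, one per article, in order (length >= 2).
    """
    vocab = sorted(set().union(*chain))
    best = 0
    # The score never decreases when words are added, so it is enough
    # to try subsets of the maximum allowed size.
    for words in combinations(vocab, min(k, len(vocab))):
        chosen = set(words)
        weakest = min(len(chosen & a & b) for a, b in zip(chain, chain[1:]))
        best = max(best, weakest)
    return best


# Each consecutive pair shares a word, but a different one every time.
incoherent = [
    {"greece", "debt"},
    {"debt", "europe"},
    {"europe", "italy"},
    {"italy", "protest"},
]

# A single word ("greece") runs through the whole chain.
coherent = [
    {"greece", "debt"},
    {"greece", "debt", "austerity"},
    {"greece", "austerity"},
    {"greece", "austerity", "protest"},
]

score_incoherent = toy_coherence(incoherent, k=1)
score_coherent = toy_coherence(coherent, k=1)
```

With a budget of one word, the first chain scores 0 even though every local transition looks fine, while the second scores 1, which matches the point that coherence is a global property of the chain.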

    13. Properties of a Good Map Is it enough? Coherence

    14. Max-coherence Map (Query: Clinton) • Clinton visits Belfast • Clinton set for Dublin • High hopes for Clinton visit • Religion Leaders Divided on Clinton Moral Issue • Clinton, Religious Leaders Share Thoughts • Church Leaders Praise Clinton's 'Spirituality' • Clinton Should Resign, 2 Religious Leaders Say

    15. Properties of a Good Map 1. Coherence 2. Coverage • Should cover diverse topics important to the user

    16. Coverage • Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09] • Select a small set of diverse articles that covers the most important stories • [Figure: example dated January 17, 2009]

    17. Coverage: The Idea • Documents cover concepts • [Figure labels: Corpus, Coverage]
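
A minimal sketch of "documents cover concepts" follows. The transcript does not show the coverage formula, so everything here is an assumption: each document covers a concept to a degree in [0, 1], a set of documents covers a concept unless every document misses it, and concepts carry importance weights. The names `set_coverage` and `greedy_select`, and the toy corpus, are hypothetical.

```python
from math import prod


def set_coverage(docs, weights):
    """Weighted coverage of a list of documents.

    docs: list of dicts mapping concept -> coverage degree in [0, 1].
    weights: dict mapping concept -> importance weight.
    """
    total = 0.0
    for concept, weight in weights.items():
        # Probability-style combination: the concept stays uncovered
        # only if every document fails to cover it.
        missed = prod(1.0 - d.get(concept, 0.0) for d in docs)
        total += weight * (1.0 - missed)
    return total


def greedy_select(corpus, weights, k):
    """Pick up to k document names, each time adding the best next one."""
    chosen = []
    remaining = list(corpus)
    for _ in range(min(k, len(remaining))):
        current = [corpus[name] for name in chosen]
        best = max(
            remaining,
            key=lambda name: set_coverage(current + [corpus[name]], weights),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy corpus: "a" and "b" are near-duplicates, "c" adds a new concept.
corpus = {
    "a": {"greece": 1.0},
    "b": {"greece": 0.9},
    "c": {"germany": 0.5},
}
weights = {"greece": 1.0, "germany": 1.0}

selected = greedy_select(corpus, weights, k=2)
selected_score = set_coverage([corpus[name] for name in selected], weights)
```

Because a second article about the same concept adds almost nothing, the greedy pick after "a" is the more diverse "c" rather than the near-duplicate "b".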

    18. High-coverage, Coherent Map • Greek Civil Servants Strike over Austerity Measures • Greeks Take to the Streets, but Lacking Earlier Zeal • Greece Paralyzed by New Strike • Infighting Adds to Merkel’s Woes • UK Backs Germany’s Effort • It’s Germany that Matters • Germany says the IMF should Rescue Greece • IMF more Likely to Lead Efforts • IMF is Urged to Move Forward

    19. Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity

    20. Definition: Connectivity • Experimented with formulations • Users do not care about connection type • Encourage connections between pairs of lines
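
One way to read "encourage connections between pairs of lines" is to count the pairs of lines that meet. The sketch below assumes that two lines are connected when they share at least one article and ignores how they connect, in line with the remark that users do not care about connection type. The exact measure used in the talk is not given in the transcript, and the function name and example lines are hypothetical.

```python
from itertools import combinations


def connectivity(lines):
    """Number of unordered pairs of lines sharing at least one article."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


# Hypothetical map: the first two lines cross at d2, the third is isolated.
example_lines = [
    ("d1", "d2", "d3"),
    ("d4", "d2", "d5"),
    ("d6", "d7"),
]
example_connectivity = connectivity(example_lines)
```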

    21. Tying it all Together: Map Objective • Coherence: either coherent or not (constraint) • Coverage: must have • Connectivity: nice to have • Consider all coherent maps with maximum possible coverage. Find the most connected one.

    22. Game Plan

    23. Approach Overview • Input: documents D • 1. Coherence graph G • 2. Coverage function f • 3. Increase connectivity

    24. Coherence Graph: Main Idea • Vertices correspond to short coherent chains • Directed edges between chains which can be conjoined and remain coherent • [Figure: numbered articles grouped into short chains, joined into a longer chain]

    25. Finding Vertices • Vertices are short, coherent chains • Can use [KDD’10] • Expensive • Solving many LPs • Take advantage of simplicity of short stories • No topic drift • Sampling-based (fast) algorithm

    26. Finding Edges • Problem: combining several strong chains may result in a much weaker chain • Discontinuity: change of focus

    27. m-Coherence • A chain is m-coherent if each sub-chain (d_i, …, d_{i+m}) is coherent • Control discontinuity points: • m: size of user's ‘history window’ • m = length(chain): standard coherence • m = 1: optimize transitions without context

    28. Observation • If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent

    29. Using the Observation • If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent • Useful for divide and conquer: • Add edge if m-1 overlap • [Figure: short overlapping chains conjoined into the chain 1 2 3 5]
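
The "add edge if m-1 overlap" rule can be sketched as follows. The sketch assumes that chains are tuples of article ids, that the overlap is counted in articles, and that the last m-1 articles of one chain must equal the first m-1 articles of the next. The helper names are hypothetical and the coherence of the short chains themselves is taken as given.

```python
def overlaps(u, v, m):
    """True if the last m-1 articles of u are the first m-1 articles of v."""
    k = m - 1
    if k <= 0 or len(u) < k or len(v) < k or u == v:
        return False
    return u[-k:] == v[:k]


def build_edges(chains, m):
    """Directed edges (u, v) between short chains that can be conjoined."""
    return [(u, v) for u in chains for v in chains if overlaps(u, v, m)]


def conjoin(u, v, m):
    """Merge two overlapping chains, keeping the shared articles once."""
    return u + v[m - 1:]


# Three short chains of article ids, with m = 3 (overlap of 2 articles).
short_chains = [(1, 2, 3), (2, 3, 5), (2, 3, 4)]
chain_edges = build_edges(short_chains, m=3)
joined = conjoin((1, 2, 3), (2, 3, 5), m=3)
```

In this example `joined` is `(1, 2, 3, 5)`. Longer coherent chains then correspond to paths in the resulting graph, which is what makes the divide-and-conquer step work.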

    30. Approach Overview • Input: documents D • 1. Coherence graph G • 2. Coverage function f • 3. Increase connectivity

    31. Finding High-Coverage Chains • Paths correspond to coherent chains • Problem: find a path of length K maximizing coverage of underlying articles • Example: is Cover(1 2 3 4) > Cover(1 2 3 5)? • [Figure: coherence graph over short chains of articles 1 to 5]
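
For a tiny graph, the problem statement can be made concrete with exhaustive search. This is a brute-force baseline for illustration only, not the method used in the talk. It assumes that K counts vertices on the path, that each vertex of the coherence graph stands for a short chain of articles, and that coverage is simply the number of distinct underlying articles. All names and the example graph are hypothetical.

```python
def best_path(graph, articles, k):
    """Exhaustively find a simple path of k vertices with maximum coverage.

    graph: dict mapping vertex -> list of successor vertices.
    articles: dict mapping vertex -> set of underlying article ids.
    """
    def cover(path):
        return len(set().union(*(articles[v] for v in path)))

    best = None

    def extend(path):
        nonlocal best
        if len(path) == k:
            if best is None or cover(path) > cover(best):
                best = list(path)
            return
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:
                extend(path + [nxt])

    for start in graph:
        extend([start])
    return best


# Hypothetical coherence graph: vertex names spell out their articles.
toy_graph = {
    "c123": ["c234", "c235"],
    "c234": ["c345"],
    "c235": [],
    "c345": [],
}
toy_articles = {
    "c123": {1, 2, 3},
    "c234": {2, 3, 4},
    "c235": {2, 3, 5},
    "c345": {3, 4, 5},
}
toy_best = best_path(toy_graph, toy_articles, k=3)
```

Exhaustive search grows exponentially with the path length, which is why an approximation algorithm is needed on a real corpus.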

    32. Reformulation • Paths correspond to coherent chains • Problem: find a path of length K maximizing coverage of underlying articles • Submodular orienteering [Chekuri and Pal, 2005] • Submodular: coverage is a function of the nodes visited • Quasipolynomial-time recursive greedy • O(log OPT) approximation
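
The recursive greedy idea can be sketched roughly as below. This is a simplified illustration in the spirit of [Chekuri and Pal, 2005], not their exact algorithm or the authors' implementation: it assumes unit edge costs, a budget counted in edges, a fixed recursion depth chosen by the caller, and a user-supplied set function `f` over visited vertices. The approximation guarantee quoted on the slide is not claimed for this sketch. All names and the toy instance are hypothetical.

```python
from collections import deque


def shortest_path(graph, s, t):
    """Breadth-first search; returns a vertex list from s to t, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def recursive_greedy(graph, f, s, t, budget, covered=frozenset(), depth=2):
    """Path from s to t using at most `budget` edges, chosen greedily.

    Guess a middle vertex and a budget split, solve the first half,
    then solve the second half given what the first half already covers.
    """
    path = shortest_path(graph, s, t)
    if path is None or len(path) - 1 > budget:
        return None
    if depth == 0:
        return path

    def gain(p):
        return f(covered | set(p)) - f(covered)

    best = path
    for mid in graph:
        for b1 in range(budget + 1):
            first = recursive_greedy(graph, f, s, mid, b1, covered, depth - 1)
            if first is None:
                continue
            second = recursive_greedy(
                graph, f, mid, t, budget - b1, covered | set(first), depth - 1
            )
            if second is None:
                continue
            candidate = first + second[1:]
            if gain(candidate) > gain(best):
                best = candidate
    return best


# Toy instance: going through "c" covers more articles than through "b".
rg_graph = {"a": ["b", "c", "d"], "b": ["d"], "c": ["d"], "d": []}
rg_articles = {"a": {1}, "b": {2}, "c": {3, 4}, "d": {5}}


def rg_cover(nodes):
    """Number of distinct articles under a set of vertices."""
    return len(set().union(*(rg_articles[n] for n in nodes)))


rg_path = recursive_greedy(rg_graph, rg_cover, "a", "d", budget=2)
```

On this instance the direct edge a -> d is the shortest path, but the search returns the detour through "c" because it adds the most uncovered articles within the budget.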

    33. Approach Overview: Recap • Input: documents D • 1. Coherence graph G: encodes all m-coherent chains as graph paths • 2. Coverage function f: submodular orienteering [Chekuri & Pal, 2005], quasipolynomial-time recursive greedy, O(log OPT) approximation • 3. Increase connectivity

    34. Example Map: Greece Debt

    35. Game Plan

    36. Evaluation • User study • Document selection: capturing important content? • Micro-knowledge: question-answering • Macro-knowledge: high-level summaries • Effect of structure • New York Times (2008-2010) • 18K+ articles • Chile, Haiti, Greece

    37. Document Selection • Experts compose a list of important events • Subtopic recall (% of events in the map) • [Plot: subtopic recall vs. number of lines]

    38. Micro-Knowledge (Question Answering) • Mechanical Turk • Competitors: • Google News • Event threading (TDT) [Nallapati et al., 04] • Structureless maps • Results: minor gains • Map structure helps • Example (Question 2): How many miners were trapped?

    39. Macro-Knowledge (High-Level Summaries) • Summarize complex story in a paragraph • Maps vs. Google News • ~15 paragraphs per task • MTurk to evaluate paragraphs: • Which paragraph provided a more complete and coherent picture of the story? • Justification: Paragraph A is more… • ~300 evaluations per task

    40. Macro-Knowledge: Results • Greece: 72% prefer maps • Haiti: 59% prefer maps • Map users mostly summarized one story line • [Chart: justifications, Google News vs. Maps] • Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

    41. Conclusions • Formulated metrics characterizing good maps • Efficient methods with theoretical guarantees • User studies highlight the promise of the method • Website on the way! • Personalization Thank you!

    42. Finding Coherent Chains • Goal: represent all coherent chains • Problem: intractable • Divide and conquer: • Find short coherent chains • Concatenate to form longer coherent chains

    43. Website