Connecting the Dots Between News Articles

Dafna Shahaf and Carlos Guestrin

Presentation Transcript


  1. Connecting the Dots Between News Articles. Dafna Shahaf and Carlos Guestrin

  2. Information overload is everywhere

  3. Well, we have Google…

  4. Search Limitations • Input • Output • Interaction • New query

  5. Our Approach • Input: phrase complex information needs • Output: structured, annotated output • Interaction: richer forms of interaction • New query

  6. Connecting the Dots: News Domain 3.19.2008

  7. Input: Pick two articles (start, goal) • Output: Bridge the gap with a smooth chain of articles • Start: Housing Bubble • Goal: Bailout

  8. Input: Pick two articles (start, goal) • Output: Bridge the gap with a smooth chain of articles • Start: Housing Bubble • Keeping Borrowers Afloat • A Mortgage Crisis Begins to Spiral, ... • Investors Grow Wary of Bank's Reliance on Debt • Markets Can't Wait for Congress to Act • Bailout Plan Wins Approval • Goal: Bailout

  9. Game Plan What is a good chain? Formalize objective Score a chain Find a good chain

  10. What is a Good Chain? • What’s wrong with shortest-path? • Build a graph: node for every article; edges based on similarity; chronological order (DAG) • Run BFS from s to t
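
The shortest-path baseline on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the Jaccard similarity measure, the `threshold` parameter, and the toy articles are all assumptions made for the example.

```python
from collections import deque


def build_dag(articles, threshold):
    """Build a chronological similarity DAG.

    articles: list of (date, set_of_words), sorted by date.
    An edge i -> j exists iff j is strictly later than i and the Jaccard
    similarity of their word sets is at least `threshold` (assumed measure).
    """
    edges = {i: [] for i in range(len(articles))}
    for i, (date_i, words_i) in enumerate(articles):
        for j in range(i + 1, len(articles)):
            date_j, words_j = articles[j]
            union = words_i | words_j
            sim = len(words_i & words_j) / len(union) if union else 0.0
            if date_j > date_i and sim >= threshold:
                edges[i].append(j)
    return edges


def shortest_chain(edges, start, goal):
    """Breadth-first search; returns the list of node indices or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in edges[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy data (made up): each step shares words, but the topic drifts.
articles = [
    (1, {"lewinsky", "clinton", "intern"}),
    (2, {"clinton", "judge", "microsoft"}),
    (3, {"microsoft", "market", "judge"}),
    (4, {"market", "gore", "vote"}),
]
print(shortest_chain(build_dag(articles, 0.2), 0, 3))  # [0, 1, 2, 3]
```

Every hop in the toy chain is locally similar, yet the first and last articles share no words, which is the failure mode the next slides illustrate.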

  11. Shortest-path (Lewinsky to Florida recount) • A1: Talks Over Ex-Intern's Testimony On Clinton Appear to Bog Down • A2: Judge Sides with the Government in Microsoft Antitrust Trial • A3: Who will be the Next Microsoft? • trading at a market capitalization… • A4: Palestinians Planning to Offer Bonds on Euro. Markets • A5: Clinton Watches as Palestinians Vote to Rescind 1964 Provision • A6: Contesting the Vote: The Overview; Gore asks Public For Patience

  12. Shortest-path • A1: Talks Over Ex-Intern's Testimony On Clinton Appear to Bog Down • A2: Judge Sides with the Government in Microsoft Antitrust Trial • A3: Who will be the Next Microsoft? • trading at a market capitalization… • A4: Palestinians Planning to Offer Bonds on Euro. Markets • A5: Clinton Watches as Palestinians Vote to Rescind 1964 Provision • A6: Contesting the Vote: The Overview; Gore asks Public For Patience

  13. Shortest-path • A1: Talks Over Ex-Intern's Testimony On Clinton Appear to Bog Down • A2: Judge Sides with the Government in Microsoft Antitrust Trial • A3: Who will be the Next Microsoft? • trading at a market capitalization… • A4: Palestinians Planning to Offer Bonds on Euro. Markets • A5: Clinton Watches as Palestinians Vote to Rescind 1964 Provision • A6: Contesting the Vote: The Overview; Gore asks Public For Patience • Stream of consciousness? - Each transition is strong - No global theme

  14. More-Coherent Chain (Lewinsky to Florida recount) • B1: Talks Over Ex-Intern's Testimony On Clinton Appear to Bog Down • B2: Clinton Admits Lewinsky Liaison to Jury • B3: G.O.P. Vote Counter in House Predicts Impeachment of Clinton • B4: Clinton Impeached; He Faces a Senate Trial • B5: Clinton’s Acquittal; Senators Talk About Their Votes • B6: Aides Say Clinton Is Angered As Gore Tries to Break Away • B7: As Election Draws Near, the Race Turns Mean • B8: Contesting the Vote: The Overview; Gore asks Public For Patience

  15. More-Coherent Chain • B1: Talks Over Ex-Intern's Testimony On Clinton Appear to Bog Down • B2: Clinton Admits Lewinsky Liaison to Jury • B3: G.O.P. Vote Counter in House Predicts Impeachment of Clinton • B4: Clinton Impeached; He Faces a Senate Trial • B5: Clinton’s Acquittal; Senators Talk About Their Votes • B6: Aides Say Clinton Is Angered As Gore Tries to Break Away • B7: As Election Draws Near, the Race Turns Mean • B8: Contesting the Vote: The Overview; Gore asks Public For Patience • What makes it coherent?

  16. Word Patterns For Shortest Path Chain • Topic changes every transition (jittery)

  17. Word Patterns For Coherent Chain • Topic consistent over transitions • Use this intuition to estimate coherence of chains

  18. What is a Good Chain? • Every transition is strong • Global theme • No jitteriness (back-and-forth) • Short (5-6 articles?)

  19. What is a Good Chain? • Every transition is strong • Global theme • No jitteriness (back-and-forth) • Short (5-6 articles?)

  20. Strong transitions between consecutive documents [Figure: documents d1, d2, d3, d4 against words w1: Lewinsky, w2: Clinton, w3: Oath, w4: Intern, w5: Microsoft; transition scores 4, 3, 1; chain score min(4,3,1)=1]
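
The min(4,3,1)=1 score on this slide reads as a weakest-link measure: a chain is only as strong as its weakest transition. A minimal sketch, assuming each transition is scored by the number of words the two documents share. The word sets below are made up so that the transition scores come out as 4, 3, 1; the slide does not give the actual document contents.

```python
def transition_strength(doc_a, doc_b):
    """Number of words shared by two consecutive documents (assumed score)."""
    return len(doc_a & doc_b)


def chain_coherence(chain):
    """Weakest-link score: the minimum strength over consecutive pairs."""
    return min(transition_strength(a, b) for a, b in zip(chain, chain[1:]))


# Hypothetical word sets chosen to reproduce the slide's 4, 3, 1.
d1 = {"lewinsky", "clinton", "oath", "intern"}
d2 = {"lewinsky", "clinton", "oath", "intern"}
d3 = {"lewinsky", "clinton", "oath", "microsoft"}
d4 = {"microsoft"}

chain = [d1, d2, d3, d4]
strengths = [transition_strength(a, b) for a, b in zip(chain, chain[1:])]
print(strengths, chain_coherence(chain))  # [4, 3, 1] 1
```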

  21. Strong transitions between consecutive documents • ??? • Intuitively, high iff • di and di+1 are very related • w plays an important role in the relationship • Too coarse • Word importance in transition • Missing words [Figure: min(4,3,1)=1]

  22. Influence • Most methods assume edges • Influence propagates through the edges • No edges in our dataset • Intuitively, high iff • di and di+1 are very related • w plays an important role in the relationship [Figure: min(4,3,1)=1]

  23. Computing Influence(di, dj | w) [Figure: graph of documents (Clinton Admits Lewinsky, Contest the Vote, Judge Sides with the Govmnt, The Next Microsoft) and words (Clinton, Judge, Microsoft, Gore), with di, dj, and w marked]

  24. Computing Influence(di, dj | w) • 1. Run random walks • - Random restarts from di • - ε controls expected length [Figure: same document-word graph]

  25. Computing Influence(di, dj | w) [Figure: same document-word graph]

  26. Computing Influence(di, dj | w) • Calculate stationary distribution of dj • - Intuitively, high if documents are related • How important is w? • Check how many walks went through w [Figure: same document-word graph]

  27. Computing Influence(di, dj | w) • dj no longer reachable: all influence is due to w • 2. Influence(di, dj | w) = Stationary distribution(dj) with w - Stationary distribution(dj) without w [Figure: same document-word graph with w removed]
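
The two steps on these slides (random walks with restarts from di, then the difference in dj's stationary mass with and without w) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: edges are unweighted, "without w" is modelled by turning w into a sink that walks cannot leave, and the document-word links in `DOC_WORDS`, as well as the choice of di, dj, and w, are guesses at the figure.

```python
def build_graph(doc_words):
    """Undirected document-word graph as an adjacency list."""
    neighbors = {}
    for doc, words in doc_words.items():
        neighbors.setdefault(doc, [])
        for word in words:
            neighbors[doc].append(word)
            neighbors.setdefault(word, []).append(doc)
    return neighbors


def stationary(neighbors, start, eps=0.25, sink=None, iters=200):
    """Fixed point of a random walk that restarts at `start` w.p. eps.

    If `sink` is given, mass that reaches it is not propagated further.
    """
    pi = {v: 0.0 for v in neighbors}
    pi[start] = 1.0
    for _ in range(iters):
        new = {v: 0.0 for v in neighbors}
        new[start] += eps  # restart mass
        for u, mass in pi.items():
            if u == sink or not neighbors[u]:
                continue  # walks stop at the sink
            share = (1.0 - eps) * mass / len(neighbors[u])
            for v in neighbors[u]:
                new[v] += share
        pi = new
    return pi


def influence(neighbors, d_i, d_j, w, eps=0.25):
    """Stationary mass of d_j with w, minus the mass with w as a sink."""
    with_w = stationary(neighbors, d_i, eps)[d_j]
    without_w = stationary(neighbors, d_i, eps, sink=w)[d_j]
    return with_w - without_w


# Assumed links between the documents and words named in the figure.
DOC_WORDS = {
    "Clinton Admits Lewinsky": ["Clinton"],
    "Contest the Vote": ["Clinton", "Gore"],
    "Judge Sides with the Govmnt": ["Judge", "Microsoft"],
    "The Next Microsoft": ["Microsoft"],
}
graph = build_graph(DOC_WORDS)
print(influence(graph, "Clinton Admits Lewinsky", "Contest the Vote", "Clinton"))
print(influence(graph, "Clinton Admits Lewinsky", "Contest the Vote", "Microsoft"))
```

In this toy graph "Clinton" is the only route between the two documents, so blocking it makes dj unreachable and all of dj's mass counts as influence of that word, while "Microsoft" contributes nothing.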

  28. Influence: Reality Check • di: OJ Simpson trial article • dj: DNA evidence in OJ trial • dj: Super Bowl 49ers

  29. Coherence formulation • No edges. Computed using random walks

  30. What is a Good Chain? • Every transition is strong • Global theme • No jitteriness (back-and-forth) • Short (5-6 articles?)

  31. Global Theme, No Jitter • Jittery chain can score well! • But need a lot of words… • Good chains can often be represented by a small number of segments

  32. Global Theme, No Jitter • Choose 3 segments to be scored on [Figure: two chains, one with Score = 0 and one with a good score]

  33. Coherence: New Objective • Maximize over legal activations: • Limit total number of active words • Limit number of words per transition • Each word to be activated at most once

  34. Game Plan What is a good chain? Formalize objective Score a chain Find a good chain

  35. Scoring a Chain • Problem is NP-Complete • Softer notion of activation: [0,1] • Natural formalization as a linear program (LP)

  36. LP: Objective Pre-computed

  37. LP: Smoothness • A word is active if either • Active before • Just initialized • Each word is initialized at most once

  38. LP: Activation • Limit #words • Limit #words per transition
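
Slides 35 to 38 describe the LP only in words. One way to write it down is sketched below, assuming the weakest-link (min) score from the earlier slides is what gets maximized. All symbol names ($a_{w,i}$ for the activation of word $w$ in transition $i$, $\mathrm{init}_{w,i}$, $k_{\mathrm{total}}$, $k_{\mathrm{trans}}$, $\mathrm{minedge}$) are assumptions introduced here, with $a_{w,0} = 0$.

```latex
\begin{aligned}
\max \quad & \mathrm{minedge} \\
\text{s.t.} \quad
& \mathrm{minedge} \le \sum_{w} a_{w,i}\,\mathrm{Influence}(d_i, d_{i+1} \mid w)
  && \forall i \quad \text{(objective; influence pre-computed)} \\
& a_{w,i} \le a_{w,i-1} + \mathrm{init}_{w,i}
  && \forall w, i \quad \text{(active before, or just initialized)} \\
& \sum_{i} \mathrm{init}_{w,i} \le 1
  && \forall w \quad \text{(initialized at most once)} \\
& \sum_{w} \sum_{i} \mathrm{init}_{w,i} \le k_{\mathrm{total}}
  && \text{(limit \#words)} \\
& \sum_{w} a_{w,i} \le k_{\mathrm{trans}}
  && \forall i \quad \text{(limit \#words per transition)} \\
& a_{w,i},\ \mathrm{init}_{w,i} \in [0, 1]
  && \forall w, i \quad \text{(soft activation)}
\end{aligned}
```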

  39. Example • Scoring a chain: September 11th to Daniel Pearl [Figure panels: activation levels; activation levels weighted by influence (rescaled)]

  40. Game Plan What is a good chain? Formalize objective Score a chain Find a good chain

  41. Finding a good chain • Can’t brute-force • n^d possible chains: >> 10^20 after pruning • Joint LP: optimize activation and chain • New variables: • Is document di a part of the chain? • Does document dj come after di in the chain? • New constraints: • Chain structure • Length = K [Figure: nodes s, 2, 3, t with edge variables next(s,2), next(s,3), next(s,t)]

  42. Rounding • Unlike previous LP, we need to round • Extract a chain • Approximation guarantees • Chain length K in expectation • Objective: O(sqrt(ln(n/e))) with probability 1 - e [Figure: nodes s, 2, 3, t with fractional values 0.1, 0.3, 0.6, 0.7, 0.9]
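
Extracting a chain from a fractional LP solution can be sketched as randomized rounding: starting at s, repeatedly sample the next document with probability proportional to the fractional next(i, j) values. This is one plausible reading of the slide, not a confirmed description of the authors' procedure; the function name `round_chain` and the fractional values in `next_frac` are made up for illustration.

```python
import random


def round_chain(next_frac, start, goal, rng):
    """Sample a chain from fractional successor values.

    next_frac[i][j] is the (assumed) LP value of "j comes right after i".
    The graph must be a DAG in which every path leads to `goal`.
    """
    chain = [start]
    node = start
    while node != goal:
        successors = next_frac[node]
        candidates = list(successors)
        weights = [successors[c] for c in candidates]
        node = rng.choices(candidates, weights=weights, k=1)[0]
        chain.append(node)
    return chain


# Illustrative fractional solution over the nodes s, 2, 3, t.
next_frac = {
    "s": {"2": 0.6, "3": 0.3, "t": 0.1},
    "2": {"3": 0.7, "t": 0.3},
    "3": {"t": 0.9},
}
rng = random.Random(0)  # fixed seed so repeated runs agree
print(round_chain(next_frac, "s", "t", rng))
```

Because each sample follows only edges with positive fractional value, every extracted chain respects the chronological structure; the chain length varies from sample to sample, consistent with a guarantee that holds only in expectation.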

  43. Scaling Up • LP has variables • Polynomial, but D is large • Restricting number of documents • Sparsifying the graph • Random walks

  44. Game Plan What is a good chain? Formalize objective How good is it? Score a chain Find a good chain

  45. Evaluation: Competitors • Shortest path • Google Timeline • Enter a query • Pick k equally-spaced articles • Event threading (TDT) [Nallapati et al ‘04] • Generate cluster graph • Representative articles from clusters
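
The "pick k equally-spaced articles" step of the timeline baseline can be sketched as below. Spacing by position in a date-sorted result list (rather than by calendar time) is an assumption, and `equally_spaced` is a hypothetical helper name.

```python
def equally_spaced(articles, k):
    """Pick k items at evenly spaced positions, always including both ends."""
    if k <= 0 or not articles:
        return []
    if k == 1:
        return [articles[0]]
    if k >= len(articles):
        return list(articles)
    last = len(articles) - 1
    return [articles[round(i * last / (k - 1))] for i in range(k)]


results = list(range(10))  # stand-in for date-sorted search results
print(equally_spaced(results, 4))  # [0, 3, 6, 9]
```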

  46. Example Chain (1): Google News Timeline • Start: Simpson trial • Simpson Strategy: There Were Several Killers • O.J. Simpson's book deal controversy • CNN OJ Simpson Trial News: April Transcripts • Tandoori murder case a rival for OJ Simpson case • End: Simpson verdict

  47. Example Chain (2): Connect-the-Dots • Start: Simpson trial • Issue of Racism Erupts in Simpson Trial • Ex-Detective's Tapes Fan Racial Tensions in LA • Many Black Officers Say Bias Is Rampant in LA Police Force • With Tale of Racism and Error, Lawyers Seek Acquittal • End: Simpson verdict

  48. Evaluation #1: Familiarity • 18 users • 5 news stories • Show two articles • Ask: Do you know a coherent story linking these articles? • Before and after reading the chain

  49. Effectiveness (improvement in familiarity) [Figure: chart of average fraction of gap closed (higher is better); base familiarity values: 2.1, 3.1, 3.2, 3.4, 1.9]
