
CiteSight : Contextual Citation Recommendation with Differential Search

CiteSight: Contextual Citation Recommendation with Differential Search. Avishay Livne (1), Vivek Gokuladas (2), Jaime Teevan (3), Susan Dumais (3), Eytan Adar (1). (1) University of Michigan, (2) Qualcomm, (3) Microsoft. #SIGIR18 #JaimesBackyard.


Presentation Transcript


  1. CiteSight: Contextual Citation Recommendation with Differential Search • Avishay Livne (1), Vivek Gokuladas (2), Jaime Teevan (3), Susan Dumais (3), Eytan Adar (1) • (1) University of Michigan, (2) Qualcomm, (3) Microsoft

  2. #SIGIR18 #JaimesBackyard

  3. CiteSight: Contextual Citation Recommendation with Differential Search • Avishay Livne (1), Vivek Gokuladas (2), Jaime Teevan (3), Susan Dumais (3), Eytan Adar (1) • (1) University of Michigan, (2) Qualcomm, (3) Microsoft

  4. Search Engines Focus on Speed

  5. Why Do We Cite? • Paying homage to pioneers • Giving credit for related work • Identifying methodology • Providing background • Correcting one’s work • Correcting the work of others • Substantiating claims • … [Garfield, 1965]

  6. How Do We Cite? • Many resources • Search engines • Bibliographic tools • Colleagues • Work practice • Papers we know • Papers we should know

  7. Why × How = 2 Specs • Spec 1 • I know what I want, give it to me now • Citation context: • “… calculating the differences between blocks of text [” • Spec 2 • I don’t know or can’t remember what I want • [cite] • Complex, dynamic search space = slow • Inherent trade-off • Can we build a system to support both?

  8. The CiteSight User Interface

  9. Split World Into Two • Stuff I want fast = stuff I know about • Stuff I don’t know about • [Figure label: Microsoft Academic]

  10. Strategy • Small, personalized index • Updated dynamically • What you’ve cited before • What you’ve cited now • What other people have cited • Venue, co-citation, etc. • Run a big index for everything else
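The strategy on slide 10 can be illustrated with a minimal sketch of how a small personalized index might be seeded. Everything here is an assumption for illustration: the function name `build_cache`, the `cocited_with` mapping, and the `max_size` cap do not come from the slides, and venue-based expansion is omitted.

```python
def build_cache(cited_before, cited_now, cocited_with, max_size=1000):
    """Return an ordered list of paper ids for a small local index.

    Hypothetical sketch: papers cited in the current draft come first,
    then papers the author cited previously, then papers that other
    authors co-cite with any of those seeds.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    seeds = list(dict.fromkeys(list(cited_now) + list(cited_before)))
    cache = list(seeds)
    seen = set(seeds)
    for paper_id in seeds:
        for neighbour in cocited_with.get(paper_id, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                cache.append(neighbour)
    return cache[:max_size]
```

Calling `build_cache` again whenever a citation is added to the draft is one simple way to keep such a cache "updated dynamically"; the real system may update incrementally instead.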

  11. Ranking • Query: Citation context • “… calculating the differences between blocks of text [” • Dynamic recommendations • Immediately: Search the cache • In the background: Search the full index • Rank retrieved papers: • Gradient boosted regression tree • Features: network + text • Popularity, author similarity, textual similarity,…
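The "immediately / in the background" pattern on slide 11 can be sketched as a generator that yields cache results first and merged results once the slower search returns. This is a sketch under stated assumptions, not the CiteSight implementation: the names `overlap_score`, `search`, and `differential_search` are hypothetical, both indices are plain dictionaries of paper id to text, and a simple token-overlap score stands in for the gradient boosted regression tree ranker.

```python
from concurrent.futures import ThreadPoolExecutor


def overlap_score(query, text):
    """Fraction of query tokens that also appear in the text (stand-in ranker)."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


def search(index, query, k=3):
    """Return up to k (paper_id, score) pairs with a non-zero score, best first."""
    scored = [(pid, overlap_score(query, text)) for pid, text in index.items()]
    scored = [(pid, score) for pid, score in scored if score > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


def differential_search(cache_index, full_index, query, k=3):
    """Yield ("cache", ids) right away, then ("full", ids) after the slow search."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the slow full-index search in the background.
        pending = pool.submit(search, full_index, query, k)
        fast = search(cache_index, query, k)
        yield "cache", [pid for pid, _ in fast]
        slow = pending.result()
    # Merge both result sets and re-rank them with the same score.
    best = dict(slow)
    best.update(fast)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    yield "full", [pid for pid, _ in ranked[:k]]
```

A caller would display the first yielded list immediately and replace it when the second arrives. In a real system the score would come from a trained ranker over network and text features rather than token overlap.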

  12. Citation context is really good at picking out “winners” • People talk about a paper the same way as you! • Not the same way the author talks about their work • [Figure: citation contexts (“XYZ is similar to ABC […]”, “Bob et al. introduced ABC in […]”, “We utilize ABC to… […]”) contrasted with paper text]

  13. That’s nice… • [Figure: citations (S. Redner, 1998)]

  14. Context Coupling • A and B related • Co-cited: When B is mentioned, A is • “Borrow” contexts from A to B • Borrowed context used as a feature in ranking papers • [Figure: A = popular paper, B = less-popular paper]
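The borrowing step on slide 14 might look roughly like the sketch below. It is illustrative only: the name `couple_contexts`, the `min_cocited` threshold, and the rule "borrow from whichever paper has more contexts" are assumptions, since the slide does not say how A and B are chosen or how often they must be co-cited.

```python
def couple_contexts(contexts, cocitations, min_cocited=2):
    """Return each paper's own citation contexts plus borrowed ones.

    contexts:     paper id -> list of citation-context strings
    cocitations:  (paper_a, paper_b) -> number of times the pair is co-cited
    Assumption: the paper with more contexts plays the role of the popular
    paper A, and the other paper (B) borrows A's contexts.
    """
    coupled = {pid: list(ctx) for pid, ctx in contexts.items()}
    for (a, b), count in cocitations.items():
        if count < min_cocited:
            continue
        a_ctx = contexts.get(a, [])
        b_ctx = contexts.get(b, [])
        if len(a_ctx) == len(b_ctx):
            continue  # no clear popular/less-popular direction
        src, dst = (a, b) if len(a_ctx) > len(b_ctx) else (b, a)
        coupled.setdefault(dst, []).extend(contexts.get(src, []))
    return coupled
```

The borrowed contexts would then be matched against the query like a paper's own contexts, giving a less-cited paper text evidence it would otherwise lack.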

  15. CiteSight Evaluation • Can CiteSight predict existing citations? • 1000 randomly selected CS papers (2011) • Criteria: 20-40 citations • 5-fold cross validation • Metric: NDCG • Gain of 1 when the guess is the correct citation • Partial gain, related to # of co-citations, for close guesses • User feedback from 5 CS grad students
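For readers unfamiliar with the metric on slide 15, here is a small sketch of NDCG@k using the common log2 discount. The slides do not give the exact formula or how partial gains are derived from co-citation counts, so the function names and the example gain values are assumptions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, all_gains, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: gains of the recommended papers, in ranked order
                  (1.0 for the correct citation, a value below 1.0 for a
                  close guess, 0.0 otherwise).
    all_gains:    gains of every candidate, used to build the ideal ranking.
    """
    ideal = sorted(all_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, a ranking that puts the correct citation second, after an irrelevant paper, scores 1 / log2(3), roughly 0.63, while putting it first scores 1.0.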

  16. Results • Large improvement • Context coupling • All features

  17. Results • Large improvement • Context coupling • All features • Citation-related features > text • More info = better • Authors • Citations, to a point

  18. Cache v. Corpus • Relevance • Cache accounts for 46% of NDCG@10 of the corpus • 10% cache is better • Speed • Cache: 6 ms • Instantaneous! • Corpus: 450 ms

  19. Summary • Differential need for speed • CiteSight – differential search • Two different use cases = two indices • Local index updated dynamically, contextually • Global index with full content • Context coupling improves relevance • Local index improves speed • Able to provide instantaneous results • Often relevant because contextually updated

  20. Questions?
