Exploiting Implicit Feedback to Identify Usage Patterns on the Desktop

Bachelor Thesis, Leibniz University of Hanover. Michał Kopycki

Bestseller: How to write SPYWARE for “research purposes” and get paid for it

Personalization Research Issues (from Eelco’s presentation) • Data Acquisition • Knowledge Inference • User Model • Adaptation Decision Making • Adaptation Mechanism

Outline • Motivation • Logging Framework • User study • Conclusion and future work

User Context [Figure: related systems and work. Systems: Letizia ’95, LifeStreams ’96, Haystack ’97, JIRIT ’00, Stuff I’ve Seen ’03, Beagle++ ’05, Movielens, Amazon, LastFM, StumbleUpon, Libra, Del.icio.us. Cited work: [BM02], [CDH+08], [CGNP05], [CN06], [CSC+07], [Her06], [RM00], [TDH05], [WJR02]]

User Context ... in our context • Interaction with resource as context • Resource as context [Figure labels: Sequence of access, TFxIDF, Sender, Reading time, Genre, Reference, Time windows, Bookmarking, GPS location, Web address, Printing document]
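The slide names "Sequence of access" and "Time windows" only as context features and does not say how they are computed. As a minimal sketch, assuming a time-ordered access log and a hypothetical 5-minute gap threshold (both are illustrative assumptions, not values from the thesis), resources accessed close together in time could be grouped like this:

```python
from datetime import datetime, timedelta

def group_by_time_window(events, max_gap=timedelta(minutes=5)):
    """Split an access log into groups of resources used close together.

    events: list of (timestamp, resource) tuples.
    A new group starts whenever the gap to the previous access exceeds
    max_gap (a hypothetical threshold chosen for illustration).
    """
    groups = []
    previous_time = None
    for timestamp, resource in sorted(events, key=lambda e: e[0]):
        if previous_time is None or timestamp - previous_time > max_gap:
            groups.append([])  # long pause: open a new time window
        groups[-1].append(resource)
        previous_time = timestamp
    return groups

# Hypothetical access log: two bursts of activity separated by a long pause.
t0 = datetime(2008, 1, 1, 9, 0)
log = [
    (t0, "report.doc"),
    (t0 + timedelta(minutes=1), "http://www.l3s.de"),
    (t0 + timedelta(minutes=3), "mail:1234"),
    (t0 + timedelta(minutes=40), "slides.ppt"),
    (t0 + timedelta(minutes=42), "notes.txt"),
]
groups = group_by_time_window(log)
```

With this sample log, the first three resources fall into one window and the last two into another, so each resource gets the others in its window as candidate context.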

What is user context good for? • Relationships between resources • Elicitation of user interests • Activity-based computing (Sergey Chernov, Task Detection for Activity-Based Desktop Search, L3S Research Seminar)

Thesis goals • User context recognition support: “…exploiting usage analysis information about sequences of accesses to local resources…” (L3S 2006) • Public Desktop dataset alternative: “…The absence of shared information makes it difficult to focus research problems, and to compare research results…” (Newman 1997) “…an appropriate common test collection that is accepted by the community is required…” (Voorhees 2001) “…Building a Desktop IR testbed seems to be more challenging…” (L3S 2007) “…Desktop datasets within different research groups using a single methodology and a common set of tools…” (L3S 2008)

Outline • Motivation • Logging Framework • User study • Conclusion and future work

Requirements • Automatic • Cross-application • Implicit Feedback • Privacy preserving • Extensible [Figure: the Logging Framework collects Relevant / Not relevant feedback on resources A, B, and C from Web, Email, IM, and File System; a new best Web browser plug-in or a new best Email client plug-in can be added]
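The cross-application and extensibility requirements can be illustrated with a small sketch. It assumes a hypothetical central collector that every application plug-in registers with and pushes events to in one common format. The class, method, and event names are invented for illustration and are not the thesis's actual interfaces (the real components are written in C/C++, C#, and JavaScript).

```python
import json
import time

class LoggingFramework:
    """Hypothetical central collector; application plug-ins push events into it."""

    def __init__(self):
        self.sources = set()
        self.events = []

    def register(self, source):
        # Extensible: a new application plug-in only has to announce its name.
        self.sources.add(source)

    def notify(self, source, event_type, resource, timestamp=None):
        # Cross-application: every plug-in reports through the same call.
        if source not in self.sources:
            raise ValueError(f"unregistered source: {source}")
        self.events.append({
            "time": time.time() if timestamp is None else timestamp,
            "source": source,
            "type": event_type,
            "resource": resource,
        })

    def dump(self):
        # One JSON object per line, the same layout for every source.
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)

framework = LoggingFramework()
for name in ("web", "email", "im", "filesystem"):
    framework.register(name)

# Hypothetical notifications from two different applications.
framework.notify("web", "page_visit", "http://www.l3s.de", timestamp=1.0)
framework.notify("email", "message_read", "msg-42", timestamp=2.0)
print(framework.dump())
```

Supporting a new browser or email client then amounts to one more `register` call and a plug-in that calls `notify`, with no change to the collector.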

Our approach [Figure: Resources, Applications]

Component view [Figure: User Activity Logger. Labels: C/C++, File system drivers, Desktop, Internet Explorer, Window Events, Windows undocumented API, Outlook Express, Window hooks, File System, XUL, VSTO, Thunderbird, Firefox, Outlook 2003, Outlook 2007, .NET, C#, JavaScript]

Logging Framework

Supported notifications

Nepomuk adaptation [Figure: Logging Framework, User Observation Hub]

User study • 21 participants • Average of 170 active logging days • 2,828,706 events • Average of 2,815 distinct emails per user • Average of 9,337 distinct URLs per user • Average of 902 events per user per day • Average of 5 hours of active interaction per user per day

Dataset activity coverage

Data collection • Methodology [Figure labels: www, l3s, de, google] • Encryption schemas [Figure not preserved]
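The encryption schemas themselves did not survive in this transcript. The surviving labels (www, l3s, de, google) suggest that URLs were split into parts before being encoded. As a minimal sketch of that general idea, and not of the thesis's scheme, the example below uses a keyed hash (HMAC-SHA256, a swapped-in technique) per host label. The function name, the key, and the 12-character truncation are all assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

def pseudonymize_host(url, key):
    """Replace each dot-separated host label with a short keyed hash.

    Equal labels map to equal tokens, so shared domains stay comparable
    in the logs without the readable host name being stored.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    return [
        hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        for label in labels
    ]

key = b"per-user secret"  # hypothetical key kept on the participant's machine
a = pseudonymize_host("http://www.l3s.de/research", key)
b = pseudonymize_host("http://www.google.de/search?q=l3s", key)

# "www" and "de" yield the same tokens in both URLs; "l3s" and "google" differ.
same_prefix = a[0] == b[0]
same_suffix = a[2] == b[2]
different_middle = a[1] != b[1]
```

Hashing per label rather than per URL is one possible design choice: it keeps structure such as "same top-level domain" available for analysis, at the cost of leaking more than a whole-URL hash would.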

A glimpse into user behavior [Figure: Instant reader, Moderate reader]

Conclusion • Logging Framework • http://pas.kbs.uni-hannover.de/ • http://sourceforge.net/projects/activity-logger • User study • Desktop Dataset • Nepomuk integration • PIM’08 Workshop paper

Future work • Logging Framework: • centralized architecture • ontology-based RDF output format • support for new applications and notifications • Vista support • Exploratory analysis of the Desktop dataset • Email interaction • Web search interaction • Application interaction

References
• [BM02] Peter Brusilovsky and Mark T. Maybury. From adaptive hypermedia to the adaptive web. Communications of the ACM, volume 45, pages 30–33, 2002.
• [CDH+08] Sergey Chernov, Gianluca Demartini, Eelco Herder, Michał Kopycki, and Wolfgang Nejdl. Evaluating Personal Information Management using an activity logs enriched Desktop dataset. In PIM ’08: Proceedings of the Workshop on Personal Information Management, 2008. To appear.
• [CSC+07] Sergey Chernov, Pavel Serdyukov, Paul-Alexandru Chirita, Gianluca Demartini, and Wolfgang Nejdl. Building a desktop search test-bed. In ECIR ’07: Proceedings of 29th European Conference on IR Research, Advances in Information Retrieval, pages 686–690. Springer, 2007.
• [Her06] E. Herder. Forward, Back and Home Again - Analyzing User Behavior on the Web. PhD thesis, University of Twente, Enschede, 2006.
• [RM00] B. J. Rhodes and P. Maes. Just-in-time information retrieval agents. IBM Systems Journal, volume 39, pages 685–704, 2000.
• [TDH05] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 449–456. ACM, 2005.
• [WJR02] R.W. White, J.M. Jose, and I. Ruthven. Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report. In TREC ’02: Proceedings of the Tenth Text Retrieval Conference, 2002.
• [CN06] Paul-Alexandru Chirita and Wolfgang Nejdl. Analyzing User Behavior to Rank Desktop Items. In String Processing and Information Retrieval, 13th International Conference, SPIRE 2006, Proceedings, pages 86–97, 2006.
• [CGNP05] Paul-Alexandru Chirita, Stefania Costache, Wolfgang Nejdl, and Raluca Paiu. Beagle++: Semantically Enhanced Searching and Ranking on the Desktop. In The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, Proceedings, pages 348–362, 2006.
• [WTN00] Steve Whittaker, Loren Terveen, and Bonnie A. Nardi. Let’s stop pushing the envelope and start addressing it: a Reference Task Agenda for HCI. Human Computer Interaction, volume 15, pages 75–106, 2000.
• [McG95] Joseph E. McGrath. Methodology matters: doing research in the behavioral and social sciences. Human-computer interaction: toward the year 2000, pages 152–169, 1995.
• [CLWB01] Mark Claypool, Phong Le, Makoto Wased, and David Brown. Implicit interest indicators. In IUI ’01: Proceedings of the 6th international conference on Intelligent user interfaces, pages 33–40. ACM, 2001.
• [TAAK04] Jaime Teevan, Christine Alvarado, Mark S. Ackerman, and David R. Karger. The perfect search engine is not enough: a study of orienteering behavior in directed search. In CHI ’04: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 415–422. ACM, 2004.
• [WRJ02] Ryen W. White, Ian Ruthven, and Joemon M. Jose. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes. In SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 57–64. ACM, 2002.
• [Voo02] Ellen M. Voorhees. The philosophy of information retrieval evaluation. In CLEF ’01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, pages 355–370, London, 2002.

Many thanks to: • Sergey and Eelco • Study participants • YOU!!

Related work [Figure: systems arranged by Single domain (Web, Email) versus Cross domain, and by Implicit Feedback versus Explicit Feedback. Labels: Dragontalk, [TAAK04], Connections, [WRJ02], Haystack, LifeStreams, MyLifeBits, Stuff I’ve Seen, Beagle++]

Collected data

A glimpse into user behavior • File access over folder hierarchy

A glimpse into user behavior • Web page visit length

Alternative to the public Desktop dataset [Figure: Dataset 1, Dataset 2, and Dataset 3 each combine a Desktop dump with the Logging Framework. Labels: Comparable, Soft-repeatable, Common structure, Common output]

Seems hard, but… “It is possible” [BLA06], [APRILFOOL08], [HAHA07] DEADLINE
