Exploiting Implicit Feedback to Identify Usage Patterns on the Desktop • Bachelor Thesis, Leibniz University of Hanover • Michał Kopycki
Bestseller: How to write SPYWARE for “research purposes” and get paid for it
Personalization Research Issues (from Eelco’s presentation) • Data Acquisition • Knowledge Inference • User Model • Adaptation Decision Making • Adaptation Mechanism
Outline • Motivation • Logging Framework • User study • Conclusion and future work
User Context (diagram) • Systems: Letizia ’95, LifeStreams ’96, Haystack ’97, JIRIT ’00, Stuff I’ve Seen ’03, Beagle++ ’05, Movielens, Amazon, LastFM, StumbleUpon, Libra, Del.icio.us • References: [CGNP05], [CN06], [CSC+07], [Her06], [CDH+08], [RM00], [TDH05], [BM02], [WJR02]
User Context ... in our context • Resource as context • Interaction with resource as context • Examples shown: Sequence of access, TFxIDF, Sender, Reading time, Genre, Reference, Time windows, Bookmarking, GPS location, Web address, Printing document
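As an illustration of how “sequence of access” and “time windows” can serve as interaction context, the following minimal Python sketch links resources that were accessed within the same time window. The event tuples, the `WINDOW_SECONDS` value and the function name `co_accessed_pairs` are assumptions made for this example; the slides do not describe the actual method.

```python
from collections import Counter
from itertools import combinations

WINDOW_SECONDS = 300  # assumed window length; the slides do not specify one


def co_accessed_pairs(events, window=WINDOW_SECONDS):
    """Count how often two resources are accessed in the same fixed time window.

    `events` is an iterable of (timestamp_in_seconds, resource_id) tuples.
    """
    buckets = {}
    for timestamp, resource in events:
        # Assign each event to a fixed, non-overlapping window.
        buckets.setdefault(int(timestamp // window), set()).add(resource)

    pairs = Counter()
    for resources in buckets.values():
        # Every unordered pair of distinct resources in a window is one co-access.
        for a, b in combinations(sorted(resources), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical events: a document, an email and a web page.
events = [
    (0, "report.doc"),
    (40, "mail:1234"),
    (120, "http://example.org/"),
    (900, "report.doc"),
    (950, "mail:1234"),
]
pairs = co_accessed_pairs(events)
```

Pairs with a high count are candidates for a relationship between resources; fixed windows are the simplest choice, and a sliding window would avoid splitting accesses that straddle a window boundary.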
What is user context good for? • Relationships between resources • Elicitation of user interests • Activity-based computing (Sergey Chernov, Task Detection for Activity-Based Desktop Search, L3S Research Seminar)
Thesis goals • User context recognition support • Public Desktop dataset alternative • “…exploiting usage analysis information about sequences of accesses to local resources…” (L3S 2006) • “…The absence of shared information makes it difficult to focus research problems, and to compare research results…” (Newman 1997) • “…an appropriate common test collection that is accepted by the community is required…” (Voorhees 2001) • “…Building a Desktop IR testbed seems to be more challenging…” (L3S 2007) • “…Desktop datasets within different research groups using a single methodology and a common set of tools…” (L3S 2008)
Outline • Motivation • Logging Framework • User study • Conclusion and future work
Requirements • Automatic • Cross-application • Implicit Feedback • Privacy preserving • Extensible (diagram: a Logging Framework fed by Web, Email, IM and File System; resources A, B and C each marked Relevant / Not relevant; extension through a “new best” Web browser plug-in or Email client plug-in)
Our approach (diagram: Resources, Applications)
Component view (diagram) • User Activity Logger • Technologies: C/C++, .NET, C#, JavaScript, XUL, VSTO, file system drivers, window hooks, Windows undocumented API • Observed applications and sources: Desktop, File System, Window Events, Internet Explorer, Outlook Express, Outlook 2003, Outlook 2007, Thunderbird, Firefox
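The component view suggests application-specific plug-ins reporting to one central logger. The sketch below shows that idea in Python purely for illustration; the actual framework is built with C/C++, C# and JavaScript, and the names `Event`, `ActivityLogger`, `register` and the notification strings are hypothetical, not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str        # e.g. "firefox", "thunderbird", "filesystem"
    notification: str  # e.g. "page_visited", "email_opened"
    resource: str      # identifier of the resource involved


@dataclass
class ActivityLogger:
    """Hypothetical central logger that application plug-ins report to."""
    log: List[Event] = field(default_factory=list)
    sources: Dict[str, Callable[[str, str], None]] = field(default_factory=dict)

    def register(self, source: str) -> Callable[[str, str], None]:
        """Register a plug-in and return the callback it uses to report events."""
        def notify(notification: str, resource: str) -> None:
            self.log.append(Event(source, notification, resource))
        self.sources[source] = notify
        return notify


logger = ActivityLogger()
browser = logger.register("firefox")
mail = logger.register("thunderbird")
browser("page_visited", "http://www.l3s.de/")
mail("email_opened", "message-42")
```

Supporting a new application then means writing one more plug-in that calls `register`, which is one way to read the “extensible” and “cross-application” requirements.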
Logging Framework
Supported notifications
Nepomuk adaptation (diagram: Logging Framework, User Observation Hub)
Outline • Motivation • Logging Framework • User study • Conclusion and future work
User study • 21 participants • Average of 170 active logging days • 2,828,706 events • Average of 2,815 distinct emails per user • Average of 9,337 distinct URLs per user • Average of 902 events per user per day • Average of 5 hours of active interaction per user per day
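The slide does not say how the per-day averages were computed. As a hedged illustration, the sketch below contrasts two common definitions: a pooled rate (all events divided by all user-days) and the mean of each user's own rate. Using only the slide's totals, the pooled rate comes to roughly 792 events per user per day, so the reported 902 may be a mean of per-user rates; that reading is an assumption. The toy data is invented and not from the study.

```python
from statistics import mean

# Figures taken from the slide.
TOTAL_EVENTS = 2_828_706
PARTICIPANTS = 21
AVG_ACTIVE_DAYS = 170

# Pooled rate: all events divided by all (approximate) user-days.
pooled_rate = TOTAL_EVENTS / (PARTICIPANTS * AVG_ACTIVE_DAYS)


def mean_of_user_rates(per_user):
    """Average the events-per-day rate of each user.

    `per_user` is a list of (event_count, active_days) tuples.
    """
    return mean(count / days for count, days in per_user)


# Invented toy data (NOT from the study): one short but intense participant.
toy = [(1000, 10), (1000, 100)]
toy_pooled = sum(count for count, _ in toy) / sum(days for _, days in toy)
toy_mean_of_rates = mean_of_user_rates(toy)
```

In the toy data the two definitions diverge sharply (about 18 versus 55 events per day), which is why a dataset description benefits from stating which one is reported.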
Dataset activity coverage
Data collection • Methodology • Encryption schemas (example tokens shown on the slide: www, l3s, de, google)
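The slide only hints at the encryption schemas by showing separate tokens. One privacy-preserving approach consistent with that hint is to replace each token of a URL with a keyed hash, so that identical hosts still match across events while the plain text is hidden. The sketch below uses HMAC-SHA256 as a stand-in; the scheme actually used in the study is not described here, and `SECRET_KEY`, `pseudonymize_token` and `pseudonymize_url` are hypothetical names.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"per-user-secret"  # hypothetical key; would stay on the user's machine


def pseudonymize_token(token: str, key: bytes = SECRET_KEY) -> str:
    """Replace a token by a short keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, token.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def pseudonymize_url(url: str) -> str:
    """Hash every ASCII alphanumeric token of a URL but keep its delimiters."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: pseudonymize_token(m.group(0)), url)


a = pseudonymize_url("http://www.l3s.de/research")
b = pseudonymize_url("http://www.l3s.de/people")
```

Because tokens are hashed independently, the host part of `a` and `b` is identical while the paths differ, which keeps structural analysis possible. Truncating the digest shortens the log at the cost of a small collision risk.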
A glimpse into user behavior • Instant reader • Moderate reader
Outline • Motivation • Logging Framework • User study • Conclusion and future work
Conclusion • Logging Framework • http://pas.kbs.uni-hannover.de/ • http://sourceforge.net/projects/activity-logger • User study • Desktop Dataset • Nepomuk integration • PIM’08 Workshop paper
Future work • Logging Framework: • centralized architecture • ontology-based RDF output format • support for new applications and notifications • Vista support • Exploratory analysis of the Desktop dataset • Email interaction • Web search interaction • Application interaction
References • [BM02] Peter Brusilovsky and Mark T. Maybury. From adaptive hypermedia to the adaptive web. Communications of the ACM, volume 45, pages 30–33, 2002. • [CDH+08] Sergey Chernov, Gianluca Demartini, Eelco Herder, Michał Kopycki, and Wolfgang Nejdl. Evaluating Personal Information Management using an activity logs enriched Desktop dataset. In PIM ’08: Proceedings of the Workshop on Personal Information Management, 2008 (to appear). • [CSC+07] Sergey Chernov, Pavel Serdyukov, Paul-Alexandru Chirita, Gianluca Demartini, and Wolfgang Nejdl. Building a desktop search test-bed. In ECIR ’07: Proceedings of 29th European Conference on IR Research, Advances in Information Retrieval, pages 686–690. Springer, 2007. • [Her06] E. Herder. Forward, Back and Home Again - Analyzing User Behavior on the Web. PhD thesis, University of Twente, Enschede, 2006. • [RM00] B. J. Rhodes and P. Maes. Just-in-time information retrieval agents. IBM Systems Journal, volume 39, pages 685–704, 2000. • [TDH05] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 449–456. ACM, 2005. • [WJR02] R. W. White, J. M. Jose, and I. Ruthven. Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report. In TREC ’02: Proceedings of the Tenth Text Retrieval Conference, 2002. • [CN06] Paul-Alexandru Chirita and Wolfgang Nejdl. Analyzing User Behavior to Rank Desktop Items. In String Processing and Information Retrieval, 13th International Conference, SPIRE 2006, Proceedings, pages 86–97, 2006. • [CGNP05] Paul-Alexandru Chirita, Stefania Costache, Wolfgang Nejdl, and Raluca Paiu. Beagle++: Semantically Enhanced Searching and Ranking on the Desktop. In The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, Proceedings, pages 348–362, 2006.
• [WTN00] Steve Whittaker, Loren Terveen, and Bonnie A. Nardi. Let’s stop pushing the envelope and start addressing it: a Reference Task Agenda for HCI. Human Computer Interaction, volume 15, pages 75–106, 2000. • [McG95] Joseph E. McGrath. Methodology matters: doing research in the behavioral and social sciences. Human-computer interaction: toward the year 2000, pages 152–169, 1995. • [CLWB01] Mark Claypool, Phong Le, Makoto Wased, and David Brown. Implicit interest indicators. In IUI ’01: Proceedings of the 6th international conference on Intelligent user interfaces, pages 33–40. ACM, 2001. • [TAAK04] Jaime Teevan, Christine Alvarado, Mark S. Ackerman, and David R. Karger. The perfect search engine is not enough: a study of orienteering behavior in directed search. In CHI ’04: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 415–422. ACM, 2004. • [WRJ02] Ryen W. White, Ian Ruthven, and Joemon M. Jose. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes. In SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 57–64. ACM, 2002. • [Voo02] Ellen M. Voorhees. The philosophy of information retrieval evaluation. In CLEF ’01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, pages 355–370, London, 2002.
Many thanks to: • Sergey and Eelco • Study participants • YOU!!
Related work (diagram) • Axes: Single domain (Web, Email) vs. Cross domain; Implicit Feedback vs. Explicit Feedback • Systems and references placed on the diagram: Dragontalk, [TAAK04], Connections, [WRJ02], Haystack, LifeStreams, MyLifeBits, Stuff I’ve Seen, Beagle++
Collected data
A glimpse into user behavior • File access over folder hierarchy
A glimpse into user behavior • Web page visit length
Alternative to the public Desktop dataset • Dataset 1, Dataset 2, Dataset 3: each built from a Desktop dump plus the Logging Framework • Comparable • Soft-repeatable • Common structure • Common output
Seems hard, but… “It is possible” [BLA06], [APRILFOOL08], [HAHA07] • DEADLINE