Large-Scale Collection of Application Usage Data to Inform Software Development

Large-Scale Collection of Application Usage Data to Inform Software Development. David M. Hilbert Information and Computer Science University of California, Irvine Irvine, California 92697-3425 dhilbert@ics.uci.edu http://www.ics.uci.edu/~dhilbert/. Overview. Background and Motivation

Presentation Transcript


  1. Large-Scale Collection of Application Usage Data to Inform Software Development
     David M. Hilbert
     Information and Computer Science
     University of California, Irvine
     Irvine, California 92697-3425
     dhilbert@ics.uci.edu
     http://www.ics.uci.edu/~dhilbert/

  2. Overview
     • Background and Motivation
     • Dissertation and Evaluation
     • Insights and Hypotheses
     • Progress and Schedule
     • Dissertation Outline
     • Future Research

  3. Background and Motivation
     • Expectations influence designs; designs embody expectations
     • Mismatches between expectations and how applications are actually used can lead to breakdowns
     • Identification and resolution of mismatches can help improve fit between design and use
     • Behavior of applications, users, and usage environments is complex and unpredictable enough that observation is required
     • Research area: theories, methods, techniques to enable large-scale incorporation of application usage data in development

  4. Impact of the Internet
     • On the positive side
        • cheap, rapid, large-scale distribution of software for evaluation
        • simple transport mechanism for usage information and feedback
        • use and development becoming increasingly concurrent
        • should make incorporating usage information easier
     • On the negative side
        • reduces opportunities for traditional user testing
        • increases variety and distribution of users and usage situations
        • lack of scalable techniques and methods for incorporating usage information on a large scale

  5. Current Approaches
     • Current approaches suffer from significant limitations
        • usability testing => scale (size, scope, location, duration)
        • beta testing => data quality (incentives, knowledge, detail)
     • The user feedback paradox
        • users not having problems => provide feedback, negative reactions
        • users having problems => withhold feedback, positive reactions
     • The impact assessment problem
        • impact on user population of suspected or reported problems and potential changes

  6. Research Goals
     • Address issues of scale
        • enable larger scale evaluations (size, scope, location, duration) than currently possible with existing usability testing techniques
     • Address issues of data quality
        • enable higher quality data to be collected than currently possible with beta testers alone or existing automated techniques
     • Provide a complementary source of information
        • help address the feedback paradox and impact assessment problem in making design and effort allocation decisions

  7. Research Direction
     • Explore the use of automated software monitoring techniques
        • capture information about user interactions on a large scale
        • compare actual use against developers’ expectations
        • help automate mismatch identification and resolution process
        • make incorporating information about users more palatable to developers

  8. Dissertation
     • Technical issues
        • Abstraction Problem (data quality)
        • Selection Problem (data quality/scale)
        • Context Problem (data quality)
        • Reduction Problem (scale)
        • Evolution Problem (scale)
     • Hypothesis
        • all these problems can be addressed by embedding the right kinds of data collection mechanisms within an appropriate data collection architecture

  9. Dissertation (cont’d)
     • Theoretical/methodological issues
        • aside from “technical issues”, it isn’t clear what data to collect and why, and how to incorporate results in development
        • since data collection and analysis can be expensive, guidance can increase the chances that the cost/benefit ratio will be favorable
     • Hypothesis
        • a theory and method based on usage expectations can be elaborated to provide motivation and guidance for incorporating data collection and analysis in development

  10. Contributions
     • Identification of key issues limiting scalability and data quality inherent in current techniques
     • Solutions to the abstraction, selection, context, reduction, and evolution problems within a single data collection architecture
     • A reference architecture to provide design guidance regarding key components and relationships
     • Theory to motivate the significance of usage expectations in development and importance of collecting usage information
     • Methodological guidance regarding collection, analysis, interpretation, and incorporation of results in development

  11. Evaluation
     • Prototype
        • demonstrate solutions to the abstraction, selection, context, reduction, and evolution problems within a single data collection architecture
     • Informal empirical evaluation
        • assess usability and utility of approach based on feedback from independent developers who integrated the prototype in a research demonstration scenario
     • Participant observation of an industrial project
        • foundation for an analytical evaluation of the techniques, reference architecture, theory, and method

  12. The Abstraction Problem
     • Observation
        • questions about usage typically occur in terms of concepts at higher levels of abstraction than represented in data provided by application components
        • questions of usage can occur at multiple levels of abstraction
     • Hypothesis
        • simple “data abstraction” mechanisms (based on grammatical techniques) can be constructed to allow low-level data to be related to higher-level concepts such as UI and application features as well as users’ tasks and goals
        • this can impact the results of human and automated analyses
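
As a rough illustration of the abstraction hypothesis, the following Python sketch assumes that low-level events arrive as flat strings and that abstraction rules are grammar-like productions, each rewriting a fixed sequence of lower-level events into a single higher-level event (a UI feature or a user task). The event names, `RULES`, and `abstract` are hypothetical and are not taken from EDEM; real grammatical techniques would also need to handle alternatives, interleaved events, and partial matches.

```python
from typing import List, Tuple

# Hypothetical grammar-like productions: a sequence of lower-level events
# (left-hand side) is rewritten into one higher-level event (right-hand side).
# Every left-hand side has at least two symbols, which guarantees termination.
RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("MenuOpen:File", "MenuSelect:Save"), "Feature:Save"),
    (("KeyDown:Ctrl", "KeyDown:S"), "Feature:Save"),
    (("Feature:Save", "Dialog:SaveAs", "Button:OK"), "Task:SaveNewDocument"),
]


def abstract(events: List[str]) -> List[str]:
    """Apply the rules repeatedly until none matches, lifting UI events
    first to features and then to tasks."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            n = len(lhs)
            out: List[str] = []
            i = 0
            while i < len(events):
                if tuple(events[i:i + n]) == lhs:
                    out.append(rhs)  # replace the matched run with its abstraction
                    i += n
                    changed = True
                else:
                    out.append(events[i])
                    i += 1
            events = out
    return events


print(abstract(["MenuOpen:File", "MenuSelect:Save", "Dialog:SaveAs", "Button:OK"]))
# ['Task:SaveNewDocument']
```

Because every rule consumes at least two symbols and emits one, each rewrite shortens the sequence, so repeated application always terminates; the same low-level data can then be examined at the feature level or the task level.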

  13. The Selection Problem
     • Observation
        • the amount of data necessary to answer usage questions will typically be a relatively small subset of the much larger set of data that might be recorded at any given time
        • collecting too much data can make it difficult to separate events and patterns of interest from the “noise”
     • Hypothesis
        • simple “data selection” mechanisms (based on events, event sequences, values, and value vectors) can be constructed to allow important data to be captured, and unimportant data filtered, prior to reporting
        • this can impact the results of human and automated analyses, not to mention scalability
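
A minimal Python sketch of event- and value-based selection follows, assuming events are simple records with a kind, a source component, and an optional value. `Event`, `by_kind`, `by_value`, and `select` are hypothetical names; selection on event sequences or value vectors, also listed on the slide, would additionally require keeping state between events and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Event:
    kind: str                    # e.g. "ButtonPress", "MouseMove" (hypothetical)
    component: str               # UI component that produced the event
    value: Optional[str] = None  # current value, for value-bearing components


Selector = Callable[[Event], bool]


def by_kind(*kinds: str) -> Selector:
    """Select events of the given kinds."""
    return lambda e: e.kind in kinds


def by_value(component: str, predicate: Callable[[str], bool]) -> Selector:
    """Select events from one component whose value satisfies a predicate."""
    return lambda e: (e.component == component
                      and e.value is not None
                      and predicate(e.value))


def select(events: Iterable[Event], selectors: List[Selector]) -> List[Event]:
    """Keep an event if any selector accepts it; everything else is dropped
    before reporting."""
    return [e for e in events if any(s(e) for s in selectors)]


stream = [
    Event("MouseMove", "canvas"),
    Event("ValueChanged", "zipField", "92697"),
    Event("ValueChanged", "zipField", ""),
    Event("ButtonPress", "okButton"),
]
# Report button presses, plus the moments when the ZIP field is cleared.
kept = select(stream, [by_kind("ButtonPress"), by_value("zipField", lambda v: v == "")])
print([(e.kind, e.component, e.value) for e in kept])
```

Keeping selectors as small composable predicates means the "noise" (here, mouse movement and ordinary field edits) never leaves the user's machine.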

  14. The Context Problem
     • Observation
        • information required to interpret the significance of events may not be available in the events produced by application components
        • contextual information may be spread across multiple events or missing altogether, but is frequently available “for the asking” from the application, artifacts, or user
     • Hypothesis
        • simple “context-capture” mechanisms (that provide access to application, artifact, and user state information) can be exploited to allow context to be used in interpreting the significance of events
        • this can also help in capturing important information not available in events
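
The context-capture idea can be sketched in Python as a registry of state "providers" that are queried only when an event of interest occurs, so context missing from the event itself is obtained "for the asking." `ContextCapture`, the provider names, and the dictionary-based event format are assumptions made for illustration; in a real application the providers would read application, artifact, or user state rather than a local dictionary.

```python
from typing import Any, Callable, Dict, List


class ContextCapture:
    """Attach on-demand state snapshots ("context") to events."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, provider: Callable[[], Any]) -> None:
        self._providers[name] = provider

    def annotate(self, event: Dict[str, Any], wanted: List[str]) -> Dict[str, Any]:
        # Query only the requested context, at the moment the event occurs;
        # unknown names are skipped rather than raising.
        context = {name: self._providers[name]() for name in wanted if name in self._providers}
        return {**event, "context": context}


# Stand-in for live application/artifact state (hypothetical).
app_state = {"document_modified": True, "selection_length": 0}

capture = ContextCapture()
capture.register("document_modified", lambda: app_state["document_modified"])
capture.register("selection_length", lambda: app_state["selection_length"])

# A "Close" menu selection matters more if there were unsaved changes.
record = capture.annotate({"kind": "MenuSelect", "item": "Close"}, ["document_modified"])
print(record)
```

Because providers are called lazily, state is read only for the events an analyst cares about, rather than being copied into every event.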

  15. The Reduction Problem
     • Observation
        • much of the analysis that will ultimately be performed to answer usage questions can actually be performed during data collection, resulting in greatly reduced data reporting and post-hoc analysis needs
        • when analysis is left as the last step, it is often not performed
     • Hypothesis
        • simple “data reduction” mechanisms (e.g., for performing counts and other simple analyses during collection) can be constructed to reduce the amount of data that must ultimately be reported and analyzed
        • this can impact scalability and the likelihood that data will be analyzed
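
A minimal Python sketch of reduction during collection, assuming the events of interest have already been abstracted to feature names: the collector keeps running counts and reports a small summary instead of the raw event stream. `UsageReducer` and its methods are hypothetical names, and counting is only one of the "simple analyses" the slide mentions.

```python
from collections import Counter
from typing import Dict, List, Tuple


class UsageReducer:
    """Summarize feature usage as it happens, so only aggregates are reported."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total_events = 0

    def observe(self, feature: str) -> None:
        self.counts[feature] += 1
        self.total_events += 1

    def report(self, top_n: int = 3) -> Dict[str, object]:
        # The number of reported features is bounded by top_n,
        # not by how many events were observed.
        top: List[Tuple[str, int]] = self.counts.most_common(top_n)
        return {
            "total_events": self.total_events,
            "distinct_features": len(self.counts),
            "top_features": top,
        }


reducer = UsageReducer()
for feature in ["Save", "Save", "Print", "Save", "Undo", "Undo"]:
    reducer.observe(feature)
print(reducer.report(top_n=2))
# {'total_events': 6, 'distinct_features': 3, 'top_features': [('Save', 3), ('Undo', 2)]}
```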

  16. The Evolution Problem
     • Observation
        • data collection needs will typically evolve over time (perhaps due to results of earlier data collection) more rapidly than the application
        • unnecessary coupling of data collection and application code can increase cost and even cripple evolution of data collection
     • Hypothesis
        • “evolvable” data collection mechanisms (based on encapsulating abstraction, selection, context-capture, and reduction decisions) can be constructed to allow data collection to evolve over time without impacting application deployment or use
        • this can impact the practicality of performing data collection
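
One way to read the evolution hypothesis is that selection and reduction decisions live in a declarative specification interpreted by a generic collector, so changing what is collected means shipping a new spec rather than new application code. The Python sketch below assumes a JSON spec format with hypothetical `select` and `count_by` fields; the spec text is inlined here, but in practice it might be fetched from a server when the application starts.

```python
import json
from collections import Counter
from typing import Dict

# Two hypothetical versions of a data collection spec; only the spec changes.
SPEC_V1 = '{"version": 1, "select": ["ButtonPress"], "count_by": "component"}'
SPEC_V2 = '{"version": 2, "select": ["ButtonPress", "MenuSelect"], "count_by": "kind"}'


class SpecDrivenCollector:
    """Generic collector whose behavior comes entirely from a declarative spec."""

    def __init__(self, spec_text: str) -> None:
        self.load(spec_text)

    def load(self, spec_text: str) -> None:
        # Replacing the spec resets the collector; application code is untouched.
        spec = json.loads(spec_text)
        self.version = spec["version"]
        self.select = set(spec["select"])
        self.count_by = spec["count_by"]
        self.counts: Counter = Counter()

    def on_event(self, event: Dict[str, str]) -> None:
        if event["kind"] in self.select:
            self.counts[event[self.count_by]] += 1


events = [
    {"kind": "ButtonPress", "component": "okButton"},
    {"kind": "MenuSelect", "component": "fileMenu"},
    {"kind": "ButtonPress", "component": "okButton"},
]

collector = SpecDrivenCollector(SPEC_V1)
for e in events:
    collector.on_event(e)
v1_counts = dict(collector.counts)   # {'okButton': 2}

collector.load(SPEC_V2)              # data collection evolves; the app does not
for e in events:
    collector.on_event(e)
v2_counts = dict(collector.counts)   # {'ButtonPress': 2, 'MenuSelect': 1}
```

The design choice illustrated is encapsulation: the application only emits events, and every decision about what to keep and how to summarize it sits in the replaceable spec.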

  17. Approach • Expectation-Driven Event Monitoring (EDEM)

  18. EDEM Architecture
     [Figure: EDEM Architecture. Labels: Development Computer; User Computer; Java Virtual Machine; Application UI Components; EDEM Active Agents; Top Level Window & UI Events; Property Queries; Property Values; HTTP Server; EDEM Server; Agent Specs; Collected Data. Flows: agent specs saved w/ URL; agent specs loaded via URL; agent reports sent via e-mail.]

  19. Reference Architecture
     [Figure: stages Data Capture (abstraction, selection, context, reduction), Data Packaging, Data Transport, Data Prep, and Data Analysis; a mapping relates the system model of the UI & app (components, events, properties, methods) to the analyst model of the UI & app (features, dialogs, controls, user-supplied values, user tasks).]
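
The five stages named in the reference architecture can be illustrated as a chain of small Python functions. This is a hypothetical sketch, not a description of EDEM or of the Microsoft tools: data capture is reduced to counting feature events, packaging is JSON serialization, an in-memory list stands in for transport (e-mail or HTTP, for example), prep discards malformed reports, and analysis aggregates counts across sessions. The mapping between system and analyst models is not modeled; captured events are assumed to carry analyst-level feature names already.

```python
import json
from collections import Counter
from typing import Dict, List


def capture(raw_events: List[Dict[str, str]]) -> Dict[str, int]:
    # Data capture: select events that name a feature and reduce them to counts.
    return dict(Counter(e["feature"] for e in raw_events if "feature" in e))


def package(summary: Dict[str, int], session_id: str) -> str:
    # Data packaging: serialize with enough metadata to interpret it later.
    return json.dumps({"session": session_id, "summary": summary}, sort_keys=True)


def transport(payload: str, outbox: List[str]) -> None:
    # Data transport: an in-memory list stands in for e-mail or HTTP delivery.
    outbox.append(payload)


def prep(outbox: List[str]) -> List[Dict]:
    # Data prep: parse reports and drop any that are malformed.
    reports = []
    for payload in outbox:
        try:
            report = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if isinstance(report, dict) and "summary" in report:
            reports.append(report)
    return reports


def analyze(reports: List[Dict]) -> Counter:
    # Data analysis: aggregate feature usage across all sessions.
    totals: Counter = Counter()
    for report in reports:
        totals.update(report["summary"])
    return totals


outbox: List[str] = []
session1 = [{"feature": "Save"}, {"feature": "Print"}, {"feature": "Save"}, {"kind": "MouseMove"}]
session2 = [{"feature": "Save"}]
transport(package(capture(session1), "s1"), outbox)
transport(package(capture(session2), "s2"), outbox)
outbox.append("corrupted report")  # simulates a damaged delivery
print(analyze(prep(outbox)))
```

Keeping each stage behind its own function boundary means, for instance, that the transport can change without touching capture or analysis.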

  20. Reference Architecture (Word IV)
     [Figure: the reference architecture of slide 19 as realized in Word IV, annotated “Instrumentation intertwined w/ app”.]

  21. Reference Architecture (Office IV)
     [Figure: the reference architecture of slide 19 as realized in Office IV, annotated “Event monitoring infrastructure” and “TestWizard Database of Office UI”.]

  22. Reference Architecture (EDEM)
     [Figure: the reference architecture of slide 19 as realized in EDEM, annotated “Event monitoring infrastructure” and “‘Pluggable’ Data Abstraction, Selection, Context-Capture, and Reduction Expectation Agents”.]
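
To make the idea of "pluggable" expectation agents concrete, here is a hypothetical Python sketch: an event monitoring infrastructure dispatches every event to registered agents, and one agent encodes a developer expectation (a name is entered before the OK button is pressed) and records each event that violates it. `EventMonitor`, `ExpectationAgent`, and the event identifiers are invented for illustration and are not EDEM's agent specification language.

```python
from typing import Dict, List, Protocol


class Agent(Protocol):
    def on_event(self, event: Dict[str, str]) -> None: ...


class EventMonitor:
    """Event monitoring infrastructure: agents plug in without changes to application code."""

    def __init__(self) -> None:
        self.agents: List[Agent] = []

    def register(self, agent: Agent) -> None:
        self.agents.append(agent)

    def dispatch(self, event: Dict[str, str]) -> None:
        for agent in self.agents:
            agent.on_event(event)


class ExpectationAgent:
    """Records a mismatch whenever `trigger` occurs without a preceding `prerequisite`."""

    def __init__(self, name: str, prerequisite: str, trigger: str) -> None:
        self.name = name
        self.prerequisite = prerequisite
        self.trigger = trigger
        self.prerequisite_seen = False
        self.mismatches: List[Dict[str, str]] = []

    def on_event(self, event: Dict[str, str]) -> None:
        if event["id"] == self.prerequisite:
            self.prerequisite_seen = True
        elif event["id"] == self.trigger:
            if not self.prerequisite_seen:
                self.mismatches.append(event)
            self.prerequisite_seen = False  # each OK press needs its own preceding edit


monitor = EventMonitor()
agent = ExpectationAgent("name-before-ok", prerequisite="NameField:edited", trigger="OkButton:pressed")
monitor.register(agent)
for event_id in ["NameField:edited", "OkButton:pressed", "OkButton:pressed"]:
    monitor.dispatch({"id": event_id})
print(f"{agent.name}: {len(agent.mismatches)} mismatch(es)")
```

Whether an expectation resets after each trigger is a design choice; this sketch assumes every OK press must be preceded by its own edit, so the second press is reported as a mismatch.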

  23. Dissertation Progress
     Product                  Status       Comments
     Prototype                Needs Work   Prototype requires porting and other extensions
     Theory and Method        Needs Work   Theory and method require further elaboration
     Reference Architecture   Near Done    Design guidance requires further elaboration
     Survey                   Done         N/A
     Informal evaluation      Done         N/A
     Participant observation  Near Done    Further analysis of observations required

  24. Dissemination Progress
     Description   Venue        Status     Prototype  Theory/Method  Reference Arch.  Techniques  Survey
     Conf. Demo    ICS97        Accepted   X
     Conf. Demo    IUI98        Accepted   X X
     Conf. Paper   ICSE98       Accepted   X X X
     Conf. Paper   Agents98     Accepted   X X X
     Work. Paper   CSCW98       Accepted   X X
     Journ. Paper  IEEE TSE     In Review  X X X X
     Journ. Paper  ACM Surveys  In Review  X

  25. Schedule for Work Remaining
     Product                  Schedule     Comments
     Prototype extension      Dec-Jan ’99  port; update event model; explicit support for 5 techniques
     Theoretical elaboration  Jan-Feb ’99  elaborate theory/method based on “participant observation”
     Document results         Feb ’99      should already be well into writing
     Final defense            May ’99      schedule ahead of time w/ Grudin
     Buffer period            May-Jul ’99  wrap up any loose ends

  26. Dissertation Outline
     • Introduction (General Introduction)
        • Expectations in Software Development (highlight theory)
        • Impact of the Internet (problems and opportunities)
        • Problems with Current Practice (usability and beta testing)
        • Proposed Solution (foreshadow insights, approach, contributions)
     • Extracting Usage Data from User Interaction Events (State of the Art)
        • Synch and Search
        • Abstraction, Filtering, and Recoding
        • Counts and Summary Statistics
        • Sequence Detection
        • Sequence Comparison
        • Sequence Characterization
        • Visualization
        • Integrated Support

  27. Dissertation Outline (cont’d)
     • Key Problems and Insights (Problem Statement)
        • The Abstraction Problem (meaningfulness)
        • The Selection Problem (meaningfulness)
        • The Context Problem (meaningfulness)
        • The Reduction Problem (scalability/practicality)
        • The Evolution Problem (scalability/practicality)
        • Interdependencies and Interactions
        • Need for Theoretical and Methodological Guidance

  28. Dissertation Outline (cont’d)
     • Expectation-Driven Event Monitoring (Solution Statement)
        • Theory and Method (based on research and Microsoft experience)
           • Expectations in development
           • Identifying expectations
           • Integrating data collection in the development process
           • Analyzing data and interpreting results
           • A sample usage data collection process
        • Techniques for Addressing Current Limitations (description of prototype)
           • Data Abstraction
           • Data Selection
           • Context Capture
           • Data Reduction
           • Evolution
        • Reference Architecture (based on prototype and Microsoft experience)
           • Architectural components and relationships
           • Supporting large-scale data collection

  29. Dissertation Outline (cont’d)
     • Experience and Evaluation (Evaluation of Solution)
        • The GTN scenario
           • Study Goals
           • Description
           • Results
        • Participant observation of an industrial project
           • Study Goals
           • Description
           • Results
              • Collection, analysis, and reporting goals
              • Challenges and limitations (addressed by this research)
              • Lessons learned (informing this research)

  30. Dissertation Outline (cont’d)
     • Conclusions
        • Conclusions
        • Summary of Contributions
        • Future Research
     • References
     • Appendices

  31. Future Research
     • Large-scale evaluation of research in practice
        • nature of usage information
        • issues in interpretation and incorporation of results
        • evolution and maintenance issues
     • Other possible extensions
        • exploit relationships between expectations and other requirements-related artifacts, e.g. use cases, cognitive walkthroughs, task analysis
        • explore issues of adaptability and reuse of infrastructure and default analyses
        • analysis of changes in usage over time
        • analysis of usage involving multiple cooperating users

  32. Other Possible Applications
     • Support for adaptive UI/application behavior based on long-term information about user (or users’) actions
     • Support for “smarter” delivery of help/suggestions/assistance based on long-term information about user (or users’) actions
     • Support for monitoring of other component-based software systems
        • low-level data must be related to higher level concepts of interest
        • available information exceeds that which can practically be collected
        • data collection needs evolve over time more quickly than application

  33. Research Process
     [Figure: Research Process. Nodes: Theory/Method, Prototype, Reference Architecture, Survey, GTN Scenario, Microsoft Experience; links labeled Motivation, Insight, and Evaluation.]
