

  1. Measuring Human Satisfaction in Data Networks Matthew Andrews Jin Cao Jim McGowan Bell Labs April 27, 2006

  2. Outline • What we are trying to do • Create “Mean Opinion Score” for data applications • What we’ve done • Results of preliminary experiments • How can we improve our model • Guidance would be much appreciated

  3. Motivation • DQoS (Data Quality of Service tool) • Visualizations of wireless data performance • Used to measure quality of the wireless data link: average throughputs, latency, round-trip times, instantaneous throughputs, SNR, TCP timeouts

  4. Objective -> Subjective • Common questions • What is good performance? • What do data users actually need? • Objective measurements useful but not enough • Need to convert objective measurements into subjective score • In voice world, have notion of “Mean Opinion Score” • Converts echo, distortion, latency into “opinion score” • Can we create a “Data Mean Opinion Score” for data applications?

  5. Data Mean Opinion Score [Diagram: objective performance → user perception] • Can we map objective performance into user perception? • Is there some minimum threshold for acceptable performance? • Is there some maximum threshold beyond which better performance isn’t noticed?

  6. Strategy • Raw packet traces → Network measurement tool (e.g., DQoS tool, tcpdump) → Objective statistics (throughputs, latencies, timeouts, etc.) → Subjective Performance Evaluation module (Data MOS function) → Data MOS score
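The slides don’t show code for this pipeline; the sketch below is only a minimal illustration of the flow, assuming per-page summary records have already been extracted from the raw traces. The field names (bytes, duration_s, latency_ms, timeouts) are hypothetical, and data_mos is a placeholder for the fitted function discussed in the later slides.

```python
# Minimal sketch of the strategy pipeline: raw traces -> objective
# statistics -> Data MOS function -> score.  Field names are hypothetical.

def objective_stats(page_records):
    """Summarize per-page records into the objective statistics that
    would be fed to the Data MOS function."""
    total_bytes = sum(r["bytes"] for r in page_records)
    total_time = sum(r["duration_s"] for r in page_records)
    return {
        "throughput_kbps": 8.0 * total_bytes / total_time / 1000.0,
        "mean_latency_ms": sum(r["latency_ms"] for r in page_records) / len(page_records),
        "timeouts": sum(r["timeouts"] for r in page_records),
    }

def data_mos(stats):
    """Placeholder for the fitted Data MOS function (see the results slides)."""
    raise NotImplementedError

# Example with one made-up page record
records = [{"bytes": 150_000, "duration_s": 3.2, "latency_ms": 120, "timeouts": 0}]
print(objective_stats(records))
```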

  7. Methodology for Subjective Performance Evaluation Module • Use (wireless) link emulation software • Enables us to run real applications over a variety of link conditions • Measure satisfaction with human subjects • Subjects use typical data applications under a given link condition • Subjects are given several typical real-world tasks to obtain ecologically valid scores • Subjects are asked to score their experiences using a variety of measures • Create Data MOS function • Use Principal Components Analysis (PCA) to disentangle the effects of different tasks and reduce the number of objective/subjective variables • Generate a Data MOS function that maps objective and subjective measures into a single score. [Diagram: client ↔ link emulation ↔ server; objective and subjective measures feed the Data MOS function]
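The analysis code is not part of the talk; as a rough sketch of the PCA step, assuming one row per (subject, task) containing the six questionnaire ratings plus log2(link bandwidth), something like the following could be used. The data here are random just to make the sketch runnable, and the factor rotation implied by the later PCA slide is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per (subject, task): Q1..Q6 ratings plus log2(link bandwidth).
# Synthetic data stand in for the real questionnaire responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(83 * 9, 7))        # (subject, task) rows x 7 variables
X = StandardScaler().fit_transform(X)   # standardize before PCA

pca = PCA(n_components=2)               # two factors ("Delivery" and "Design")
factor_scores = pca.fit_transform(X)

print(pca.explained_variance_ratio_)    # variance explained by each factor
print(pca.components_)                  # loadings of the 7 variables on each factor
```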

  8. User applications • This talk • Web browsing (canonical data application) • Other applications we’ve tested • FTP • Exchange email • Instant messaging

  9. Important questions . . . • Does the data MOS function exist? • How much variation is there from person to person? • Which objective measurements are most influential on the data MOS? • Bandwidth (mean & variance)? Latency? Jitter? What else? • When comparing different commercial networks: How do we take into account network congestion? How do we take into account the geographical location of measured users?

  10. Important questions... • Network effects vs. website effects? • How much is user perception determined by network effects? • How much is user perception determined by website design? • Frustration due to an intricate page with many small objects • Frustration due to a poorly designed site that is hard to navigate → Investigating very low-latency sites (Google), highly designed and well-branded sites (Barnes & Noble), poorly designed sites (NJ Transit), and hard-to-find information on well-designed sites (HowStuffWorks). • Influence of user goals • A user who is told to simply download a webpage may have different opinions from users who need to complete a more complex task. • The marketing concept of “flow” is known to affect the perceived passage of time and delays → Users are not simply rating delay, but delay is allowed to affect their ratings of quality.

  11. Experimental Design

  12. User tasks • Simple page downloads • Users quickly get to the information, then may browse or read slowly (ranges from 2 to 4 pages). • Steps toward the goal are clear; the user is simply “waiting” between clicks. • Go to www.google.com. Search for “Bell Labs” • Go to www.cnn.com. Click on the “Politics” section. • Go to www.espn.com. Find the current position of the New York Yankees. (Click on MLB. Click on “standings”.) Measures: • opinions: overall quality, directed questions, etc. • competence: ability to reach the page, correctness for question #3.

  13. User tasks • Goal driven tasks • Users don’t necessarily know how to get to information directly (ranges from <5 pages to sometimes >> 10 pages). • Many steps toward goal, not every step makes progress toward goal, sometimes users can’t even find the information (although information is always available). • Go to www.bn.com. Find the price of the book “Friday” by Robert A. Heinlein. Easy • Go to www.njtransit.com. Find the timetable for the Morris and Essex train line. What time is the first outbound train from Penn Station New York? Difficult • Go to www.howstuffworks.com/laser.htm. What kind of laser can cut through steel? Difficult • Rate four Rutgers professors at www.ratemyprofessor.com Long • Find six world records at www.guinessworldrecords.com Long Measures: opinions: overall quality, directed questions, etc. competence: ability to reach page, correctness

  14. Questionnaire • For each task we ask: • Question 1: What is your opinion of the overall quality of this web surfing experience? • Question 2: How easy was it for you to complete the task? • Question 3: Was it easy to find information on the website? • Question 4: Was the site visually appealing? • Question 5: Did the website seem sluggish or responsive? • Question 6: How quickly did the website load? • Also measure • Network conditions • Did subject complete task • Did subject answer correctly
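As a concrete illustration of what gets recorded per task, one trial record might look like the sketch below; the field names and the 1–5 rating scale are my assumptions, not stated in the slides.

```python
from dataclasses import dataclass

@dataclass
class TrialRecord:
    """One (subject, task) observation; field names and scales are illustrative."""
    subject_id: int
    task: str                  # e.g. "espn_standings", "njtransit_timetable"
    link_bandwidth_kbps: int   # emulated link bandwidth for this task
    prop_delay_ms: int         # emulated propagation delay
    q1_overall: int            # Q1: overall quality (assumed 1-5 scale)
    q2_task_ease: int          # Q2: ease of completing the task
    q3_find_info: int          # Q3: ease of finding information on the site
    q4_visual_appeal: int      # Q4: visual appeal of the site
    q5_responsiveness: int     # Q5: sluggish vs. responsive
    q6_load_speed: int         # Q6: how quickly the site loaded
    completed: bool            # did the subject complete the task?
    correct: bool              # was the answer correct?

record = TrialRecord(1, "espn_standings", 200, 0, 4, 4, 5, 3, 4, 4, True, True)
```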

  15. Network configuration • Bandwidth • Link bandwidths varied between 20 kbps and 1 Mbps • Bandwidth was held constant for each task • Assignment of bandwidth to task was done randomly for each subject (see the sketch below) • Delay • Propagation delay varied between 0 ms and 300 ms • However, queuing delay is still present!
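A minimal sketch of that per-subject randomization, assuming a discrete set of bandwidth and delay levels (the specific levels and task identifiers below are illustrative, not the exact experimental design):

```python
import random

TASKS = ["google", "cnn", "espn", "bn", "njtransit", "laser", "professors", "records"]
BANDWIDTHS_KBPS = [20, 50, 100, 200, 400, 1000]   # illustrative levels, 20 kbps - 1 Mbps
PROP_DELAYS_MS = [0, 300]

def assign_conditions(subject_seed):
    """Randomly assign a (bandwidth, delay) condition to each task for one subject;
    the condition is then held constant for the whole task."""
    rng = random.Random(subject_seed)
    return {task: (rng.choice(BANDWIDTHS_KBPS), rng.choice(PROP_DELAYS_MS))
            for task in TASKS}

print(assign_conditions(subject_seed=1))
```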

  16. Results

  17. Recap • Nine web browsing tasks • Google search • CNN download • ESPN download • Search for book price on Barnes&Noble • Look up train time on NJ transit • Find out about a laser on howstuffworks.com • Rate Rutgers professors on ratemyprofessor.com • Look up world records on Guinness world record site • Questionnaire • Q1: overall opinion • Q2 & Q3: ease of use • Q4: visual appeal • Q5: website responsiveness • Q6: download speed

  18. Results • Ran 83 subjects • Main results • At low bandwidth, opinion is linear in the log of bandwidth • At high bandwidth, the opinion score “saturates” • No difference observed between 0 ms and 300 ms propagation delay • Three notions of bandwidth • Link bandwidth – speed of the link • “Browser bandwidth” – how fast the browser takes in data • “Bandwidth opinion” – answers to Q5 and Q6 on the questionnaire [Plots: log(browser bandwidth) and bandwidth opinion vs. log(link bandwidth)]
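A simple functional form consistent with these observations is a log-linear term clipped at a ceiling; the coefficients below are placeholders for illustration, not fitted values from the study.

```python
import math

def opinion_model(bandwidth_kbps, a=0.5, b=0.7, score_max=5.0):
    """Opinion grows linearly in log2(bandwidth) and saturates at score_max.
    a, b, and score_max are illustrative placeholders, not fitted values."""
    return min(score_max, a + b * math.log2(bandwidth_kbps))

for bw in (20, 50, 100, 200, 400, 1000):
    print(bw, round(opinion_model(bw), 2))   # saturates at the higher bandwidths
```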

  19. Results [Plots: download speed, responsiveness, and overall opinion vs. link bandwidth (ticks at 200 and 400 kbps)]

  20. Results [Per-site plots (Guinness World Records, Google, CNN, ESPN, B&N, NJ Transit, laser, professors): download speed, responsiveness, and overall opinion]

  21. Results

  22. PCA • Two main factors explain most of the variance in subject ratings. • Not surprising, since we focus the study (and therefore the subjects) on two factors: “Delivery” and “Design”. • Overall quality is a roughly balanced combination of both of these factors. • Only a portion of log2(Bandwidth) is explained . . . “Link Bandwidth” and “Bandwidth Opinion” are different. • Chain: link bandwidth (what we control) → delivered bandwidth → perceived bandwidth → bandwidth opinion (what we measure) [Biplot: unrotated factor 1 vs. the rotated “Design” and “Delivery” factors, with “Overall” between them]

  23. Saturation • Why does Google saturate? • Partly due to browser saturation • Partly due to human saturation [Plots: opinion of download speed and overall opinion vs. real download speed]

  24. How do bandwidth, delay, and compression affect objective performance? • www.cnn.com • Response time (time for text to appear on screen) • Download time (time for the page to complete after text appears) • However, maybe users aren’t responding directly to response time and download time [Panels: response time and download time vs. bandwidth, no propagation delay, with and without compression]
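As a back-of-the-envelope model of how these two quantities scale with bandwidth, round-trip delay, and compression (the page sizes, the fixed three-RTT setup cost, and the compression ratio are assumptions for illustration, not measurements from the talk):

```python
def page_timings(bandwidth_kbps, rtt_ms, page_kb=300, first_kb=30, compression=1.0):
    """Rough model: response time ~ a few RTTs plus the first chunk of HTML;
    download time ~ the remaining page bytes over the link.
    page_kb, first_kb, and the fixed 3-RTT setup cost are illustrative."""
    bw_kbytes_per_s = bandwidth_kbps / 8.0
    rtt_s = rtt_ms / 1000.0
    response = 3 * rtt_s + (first_kb / compression) / bw_kbytes_per_s
    download = ((page_kb - first_kb) / compression) / bw_kbytes_per_s
    return response, download

# Example: 200 kbps link, 100 ms round-trip delay, 2:1 compression
print(page_timings(200, 100, compression=2.0))
```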

  25. How do bandwidth, delay, and compression affect objective performance? • www.cnn.com • Response time (time for text to appear on screen) • Download time (time for the page to complete after text appears) [Panel: response time and download time vs. propagation delay at 200 kbps bandwidth; the curves are almost flat] • Propagation delay is dominated by queueing delay!
