

  1. Measuring Human Satisfaction in Data Networks Matthew Andrews Jin Cao Jim McGowan Bell Labs April 27, 2006

  2. Outline • What we are trying to do • Create “Mean Opinion Score” for data applications • What we’ve done • Results of preliminary experiments • How can we improve our model • Guidance would be much appreciated

  3. Motivation • DQoS (Data Quality of Service tool) • Visualizations of wireless data performance • Used to measure quality of the wireless data link: average throughputs, latency, round-trip times, instantaneous throughputs, SNR, TCP timeouts

  4. Objective -> Subjective • Common questions • What is good performance? • What do data users actually need? • Objective measurements useful but not enough • Need to convert objective measurements into subjective score • In voice world, have notion of “Mean Opinion Score” • Converts echo, distortion, latency into “opinion score” • Can we create a “Data Mean Opinion Score” for data applications?

  5. Data Mean Opinion Score [Diagram: objective performance → user perception] • Can we map objective performance into user perception? • Is there some minimum threshold for acceptable performance? • Is there some maximum threshold beyond which better performance isn’t noticed?

  6. Strategy • Raw packet traces → Network measurement tool (e.g., DQoS tool, tcpdump) → Objective statistics (throughputs, latencies, timeouts, etc.) → Subjective Performance Evaluation module (Data MOS function) → Data MOS score
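The slides don’t show code for this pipeline; the sketch below is only a minimal illustration of the flow, assuming per-page summary records have already been extracted from the raw traces. The field names (bytes, duration_s, latency_ms, timeouts) are hypothetical, and data_mos is a placeholder for the fitted function discussed in the later slides.

```python
# Minimal sketch of the strategy pipeline: raw traces -> objective
# statistics -> Data MOS function -> score.  Field names are hypothetical.

def objective_stats(page_records):
    """Summarize per-page records into the objective statistics that
    would be fed to the Data MOS function."""
    total_bytes = sum(r["bytes"] for r in page_records)
    total_time = sum(r["duration_s"] for r in page_records)
    return {
        "throughput_kbps": 8.0 * total_bytes / total_time / 1000.0,
        "mean_latency_ms": sum(r["latency_ms"] for r in page_records) / len(page_records),
        "timeouts": sum(r["timeouts"] for r in page_records),
    }

def data_mos(stats):
    """Placeholder for the fitted Data MOS function (see the results slides)."""
    raise NotImplementedError

# Example with one made-up page record
records = [{"bytes": 150_000, "duration_s": 3.2, "latency_ms": 120, "timeouts": 0}]
print(objective_stats(records))
```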

  7. Methodology for Subjective Performance Evaluation Module • Use (wireless) link emulation software • Enables us to run real applications over a variety of link conditions • Measure satisfaction with human subjects • Subjects use typical data applications under a given link condition • Subjects are given several typical real-world tasks to obtain ecologically valid scores • Subjects are asked to score their experiences using a variety of measures • Create Data MOS function • Use Principal Components Analysis (PCA) to disentangle the effects of different tasks and reduce the number of objective/subjective variables • Generate a Data MOS function that maps objective and subjective measures into a single score. [Diagram: client ↔ link emulation ↔ server; objective and subjective measures feed the Data MOS function]
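The analysis code is not part of the talk; as a rough sketch of the PCA step, assuming one row per (subject, task) containing the six questionnaire ratings plus log2(link bandwidth), something like the following could be used. The data here are random just to make the sketch runnable, and the factor rotation implied by the later PCA slide is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per (subject, task): Q1..Q6 ratings plus log2(link bandwidth).
# Synthetic data stand in for the real questionnaire responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(83 * 9, 7))        # (subject, task) rows x 7 variables
X = StandardScaler().fit_transform(X)   # standardize before PCA

pca = PCA(n_components=2)               # two factors ("Delivery" and "Design")
factor_scores = pca.fit_transform(X)

print(pca.explained_variance_ratio_)    # variance explained by each factor
print(pca.components_)                  # loadings of the 7 variables on each factor
```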

  8. User applications • This talk • Web browsing (canonical data application) • Other applications we’ve tested • FTP • Exchange email • Instant messaging

  9. Important questions . . . • Does the data MOS function exist? • How much variation is there from person to person? • Which objective measurements are most influential on the data MOS? • Bandwidth (mean & variance)? Latency? Jitter? What else? • When comparing different commercial networks: How do we take into account network congestion? How do we take into account the geographical location of measured users?

  10. Important questions... • Network effects vs. website effects? • How much is user perception determined by network effects? • How much is user perception determined by website design? • Frustration due to an intricate page with many small objects • Frustration due to a poorly designed site that is hard to navigate → Investigating very low-latency sites (Google), highly designed and well-branded sites (Barnes & Noble), poorly designed sites (NJ Transit), and hard-to-find information on well-designed sites (HowStuffWorks). • Influence of user goals • A user who is told to simply download a webpage may have different opinions from users who need to complete a more complex task. • The marketing concept of “flow” is known to affect the perceived passage of time and delays → Users are not simply rating delay, but delay is allowed to affect their ratings of quality.

  11. Experimental Design

  12. User tasks • Simple page downloads • Users quickly get to the information, then may browse or read slowly (ranges from 2 to 4 pages). • Steps toward the goal are clear; the user is simply “waiting” between clicks. • Go to www.google.com. Search for “Bell Labs” • Go to www.cnn.com. Click on the “Politics” section. • Go to www.espn.com. Find the current position of the New York Yankees. (Click on MLB. Click on “standings”.) Measures: • opinions: overall quality, directed questions, etc. • competence: ability to reach the page, correctness for question #3.

  13. User tasks • Goal driven tasks • Users don’t necessarily know how to get to information directly (ranges from <5 pages to sometimes >> 10 pages). • Many steps toward goal, not every step makes progress toward goal, sometimes users can’t even find the information (although information is always available). • Go to www.bn.com. Find the price of the book “Friday” by Robert A. Heinlein. Easy • Go to www.njtransit.com. Find the timetable for the Morris and Essex train line. What time is the first outbound train from Penn Station New York? Difficult • Go to www.howstuffworks.com/laser.htm. What kind of laser can cut through steel? Difficult • Rate four Rutgers professors at www.ratemyprofessor.com Long • Find six world records at www.guinessworldrecords.com Long Measures: opinions: overall quality, directed questions, etc. competence: ability to reach page, correctness

  14. Questionnaire • For each task we ask: • Question 1: What is your opinion of the overall quality of this web surfing experience? • Question 2: How easy was it for you to complete the task? • Question 3: Was it easy to find information on the website? • Question 4: Was the site visually appealing? • Question 5: Did the website seem sluggish or responsive? • Question 6: How quickly did the website load? • Also measure • Network conditions • Did subject complete task • Did subject answer correctly
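As a concrete illustration of what gets recorded per task, one trial record might look like the sketch below; the field names and the 1–5 rating scale are my assumptions, not stated in the slides.

```python
from dataclasses import dataclass

@dataclass
class TrialRecord:
    """One (subject, task) observation; field names and scales are illustrative."""
    subject_id: int
    task: str                  # e.g. "espn_standings", "njtransit_timetable"
    link_bandwidth_kbps: int   # emulated link bandwidth for this task
    prop_delay_ms: int         # emulated propagation delay
    q1_overall: int            # Q1: overall quality (assumed 1-5 scale)
    q2_task_ease: int          # Q2: ease of completing the task
    q3_find_info: int          # Q3: ease of finding information on the site
    q4_visual_appeal: int      # Q4: visual appeal of the site
    q5_responsiveness: int     # Q5: sluggish vs. responsive
    q6_load_speed: int         # Q6: how quickly the site loaded
    completed: bool            # did the subject complete the task?
    correct: bool              # was the answer correct?

record = TrialRecord(1, "espn_standings", 200, 0, 4, 4, 5, 3, 4, 4, True, True)
```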

  15. Network configuration • Bandwidth • Link bandwidths varied between 20 kbps and 1 Mbps • Bandwidth was held constant for each task • Assignment of bandwidth to task was done randomly for each subject (see the sketch below) • Delay • Propagation delay varied between 0 ms and 300 ms • However, queuing delay is still present!
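A minimal sketch of that per-subject randomization, assuming a discrete set of bandwidth and delay levels (the specific levels and task identifiers below are illustrative, not the exact experimental design):

```python
import random

TASKS = ["google", "cnn", "espn", "bn", "njtransit", "laser", "professors", "records"]
BANDWIDTHS_KBPS = [20, 50, 100, 200, 400, 1000]   # illustrative levels, 20 kbps - 1 Mbps
PROP_DELAYS_MS = [0, 300]

def assign_conditions(subject_seed):
    """Randomly assign a (bandwidth, delay) condition to each task for one subject;
    the condition is then held constant for the whole task."""
    rng = random.Random(subject_seed)
    return {task: (rng.choice(BANDWIDTHS_KBPS), rng.choice(PROP_DELAYS_MS))
            for task in TASKS}

print(assign_conditions(subject_seed=1))
```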

  16. Results

  17. Recap • Nine web browsing tasks • Google search • CNN download • ESPN download • Search for book price on Barnes&Noble • Look up train time on NJ transit • Find out about a laser on howstuffworks.com • Rate Rutgers professors on ratemyprofessor.com • Look up world records on Guinness world record site • Questionnaire • Q1: overall opinion • Q2 & Q3: ease of use • Q4: visual appeal • Q5: website responsiveness • Q6: download speed

  18. Results • Ran 83 subjects • Main results • At low bandwidth, opinion is linear in the log of bandwidth • At high bandwidth, the opinion score “saturates” • No difference observed between 0 ms and 300 ms propagation delay • Three notions of bandwidth • Link bandwidth – speed of the link • “Browser bandwidth” – how fast the browser takes in data • “Bandwidth opinion” – answers to Q5 and Q6 on the questionnaire [Plots: log(browser bandwidth) and bandwidth opinion vs. log(link bandwidth)]
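A simple functional form consistent with these observations is a log-linear term clipped at a ceiling; the coefficients below are placeholders for illustration, not fitted values from the study.

```python
import math

def opinion_model(bandwidth_kbps, a=0.5, b=0.7, score_max=5.0):
    """Opinion grows linearly in log2(bandwidth) and saturates at score_max.
    a, b, and score_max are illustrative placeholders, not fitted values."""
    return min(score_max, a + b * math.log2(bandwidth_kbps))

for bw in (20, 50, 100, 200, 400, 1000):
    print(bw, round(opinion_model(bw), 2))   # saturates at the higher bandwidths
```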

  19. Results [Plots: download speed, responsiveness, and overall opinion vs. link bandwidth (ticks at 200 and 400 kbps)]

  20. Results [Per-site plots (Guinness World Records, Google, CNN, ESPN, B&N, NJ Transit, laser, professors): download speed, responsiveness, and overall opinion]

  21. Results

  22. PCA • Two main factors explain most of the variance in subject ratings. • Not surprising, since we focus the study (and therefore the subjects) on two factors: “Delivery” and “Design”. • Overall quality is a roughly balanced combination of both of these factors. • Only a portion of log2(Bandwidth) is explained . . . “Link Bandwidth” and “Bandwidth Opinion” are different. • Chain: link bandwidth (what we control) → delivered bandwidth → perceived bandwidth → bandwidth opinion (what we measure) [Biplot: unrotated factor 1 vs. the rotated “Design” and “Delivery” factors, with “Overall” between them]

  23. Saturation • Why does Google saturate? • Partly due to browser saturation • Partly due to human saturation [Plots: opinion of download speed and overall opinion vs. real download speed]

  24. How do bandwidth, delay, and compression affect objective performance? • www.cnn.com • Response time (time for text to appear on screen) • Download time (time for the page to complete after text appears) • However, maybe users aren’t responding directly to response time and download time [Panels: response time and download time vs. bandwidth, no propagation delay, with and without compression]
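As a back-of-the-envelope model of how these two quantities scale with bandwidth, round-trip delay, and compression (the page sizes, the fixed three-RTT setup cost, and the compression ratio are assumptions for illustration, not measurements from the talk):

```python
def page_timings(bandwidth_kbps, rtt_ms, page_kb=300, first_kb=30, compression=1.0):
    """Rough model: response time ~ a few RTTs plus the first chunk of HTML;
    download time ~ the remaining page bytes over the link.
    page_kb, first_kb, and the fixed 3-RTT setup cost are illustrative."""
    bw_kbytes_per_s = bandwidth_kbps / 8.0
    rtt_s = rtt_ms / 1000.0
    response = 3 * rtt_s + (first_kb / compression) / bw_kbytes_per_s
    download = ((page_kb - first_kb) / compression) / bw_kbytes_per_s
    return response, download

# Example: 200 kbps link, 100 ms round-trip delay, 2:1 compression
print(page_timings(200, 100, compression=2.0))
```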

  25. How do bandwidth, delay, and compression affect objective performance? • www.cnn.com • Response time (time for text to appear on screen) • Download time (time for the page to complete after text appears) [Panel: response time and download time vs. propagation delay at 200 kbps bandwidth; the curves are almost flat] • Propagation delay is dominated by queueing delay!
