Identifying performance bottlenecks in cdns through tcp level monitoring
This presentation is the property of its rightful owner.
Sponsored Links
1 / 15

Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring PowerPoint PPT Presentation


  • 50 Views
  • Uploaded on
  • Presentation posted in: General

Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring. Peng Sun Minlan Yu, Michael J. Freedman, Jennifer Rexford Princeton University August 19, 2011. Performance Bottlenecks. CDN Servers. Internet. Clients. Client Insufficient receive buffer. Internet

Download Presentation

Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Identifying performance bottlenecks in cdns through tcp level monitoring

Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

Peng Sun

Minlan Yu, Michael J. Freedman, Jennifer Rexford

Princeton University

August 19, 2011


Performance bottlenecks

Performance Bottlenecks

CDN Servers

Internet

Clients

Client

Insufficient receive buffer

Internet

Network congestion

APP

Write too slowly

Server OS

Insufficient send buffer or Small initial congestion window


Reaction to each bottleneck

Reaction to Each Bottleneck

CDN Servers

Internet

Clients

Client is bottleneck:

Notify client to change

Internet is bottleneck:

Circumvent the congested part of network

APP is bottleneck:

Debug application

Server OS is bottleneck:

Tune buffer size, or upgrade server


Previous techniques not enough

Previous Techniques Not Enough

Application logs:

No details of network activities

Packet sniffing:

Expensive to capture

Transport-layer stats:

Directly reveal perf. bottlenecks

Active probing:

Extra load on network


How tcp stats reveal bottlenecks

How TCP Stats Reveal Bottlenecks

Insufficient data in send buffer

CDN Server Applications

Receive window too small

Packet loss

Send buffer full or Initial congestion window too small

Server

Network Stack

Clients

Network Path

CDN Servers

Internet

Clients


Measurement framework

Measurement Framework

  • Collect TCP statistics

    • Web100 kernel patch

    • Extract useful TCP stats for analyzing perf.

  • Analysis tool

    • Bottleneck classifier for individual connections

    • Cross-connection correlation at AS level

      • Map conn. to AS based on RouteView

      • Correlate bottlenecks to drive CDN decisions


How bottleneck classifier works

How Bottleneck Classifier Works

Small initial Cwin

Slow start limits perf.

Network Stack is bottleneck

BytesInSndBuf = Rwin

Rwin limits sending

Client is bottleneck

Cwin drops greatly

and Packet loss

Network path is bottleneck


Coralcdn experiment

CoralCDN Experiment

  • CoralCDN serves 1 million clients per day

  • Experiment Environment

    • Deployment: A Clemson PlanetLab node

    • Polling interval: 50 ms

    • Traces to Show: Feb 19th – 25th 2011

    • Total # of Conn.: 209K

    • After removingCache-Miss Conn.: 137K (Total 2008 ASes)

  • Log Space overhead

    • < 200MB per Coral server per day


What are major bottleneck for individual clients

What are Major Bottleneck for Individual Clients?

  • We calculate the fraction of time that the connection is under each bottleneck in lifetime

Reasons:

Slow CPU or scarce disk resources of the PlanetLab node

Our suggestion:

Filter them out of decision making

Reasons:

Congestion window rises too slowly for short conn.

(>80% of the connections last <1 second)

Our suggestion:

Use larger initial congestion window

Reasons:

Spotty network (discussed in next slide)

Reasons:

Receive buffer too small (Most of them are <30KB)

Our suggestion:

Use more powerful PlanetLab machines


As level correlation

AS-Level Correlation

  • CDNs make decision at the AS level

    • e.g., change server selection for 1.1.1.0/24

  • Explore at the AS level:

    • Filter out non-network bottlenecks

    • Whether network problems exist

    • Whether the problem is consistent


Filtering out non network bottlenecks

Filtering Out Non-Network Bottlenecks

  • CDNs change server selection if clients have low throughput

  • Non-network factors can limit throughput

  • 236 out of 505 low-throughput ASes limited by non-network bottlenecks

  • Filtering is helpful:

    • Don’t worry about things CDNs cannot control

    • Produce more accurate estimates of perf.


Network problem at as level

Network Problem at AS Level

  • CDN make decision at AS level

  • Whether conn. in the same AS have common network problem

  • For 7.1% of the ASes, half of conn. have >10% packet loss rate

  • Network problems are significant at the AS level


Consistent packet loss of as

Consistent Packet Loss of AS

  • CDNs care about predictive value of measurement

  • Analyze the variance of average packet loss rates

    • Each epoch (1 min) has nonzero average loss rate

    • Loss rate is consistent across epochs (standard deviation < mean)


Conclusion future work

Conclusion & Future Work

  • Use TCP-level stats to detect performance bottlenecks

  • Identify major bottlenecks for a production CDN

  • Discuss how to improve CDN’s operation with our tool

  • Future Works

    • Automatic and real-time analysis combined into CDN operation

    • Detect the problematic AS on the path

    • Combine TCP-level stats with application logs to debug online services


Thanks questions

Thanks!Questions?


  • Login