A Flexible and Efficient API for a Customizable Proxy Cache



Vivek S. Pai, Alan L. Cox,

Vijay S. Pai, and Willy Zwaenepoel

iMimic Networking, Inc.

http://www.imimic.com

Motivation
  • More features moving into proxy caches
    • The ubiquitous layer 7 device
    • Filtering, reporting, CDN support, transformation
    • Lots of this being done one-off, ad hoc
    • Can’t know everything at deployment
  • Some approaches for generalization
    • ICAP/OPES, proprietary mechanisms
    • But design considerations shifting
  • Goal: new approach for modern environments
Contributions
  • Designed event-friendly proxy API
  • Implemented on iMimic DataReactor cache
  • Imposes negligible performance overhead
  • Demo modules
    • High performance
    • Low interference
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions
Proxy Cache Concepts

[Figure: clients, proxy cache, and origin servers, connected by LAN and WAN links]

Why Program a Proxy?
  • It’s at the right point in network
    • Sees all client-side and server-side HTTP traffic
    • Can react to both LAN and WAN conditions
  • Already examines layer 7
  • Groundwork in place for value-adds
    • Content filtering, access control, etc.
Enabling Technologies
  • Moore’s Law
    • CPU speeds outstripping all other components
    • Lots of cycles to burn…
  • Proxy software
    • Increasing efficiency in managing connections, disk storage, etc.
  • Commodity OS/hardware improvements
    • No longer need specialized systems to run efficient proxy caches
Commodity System Improvements
  • 1997: Appliances 4x faster than software running on a 2-processor UltraSparc
    • [Source: Danzig, “NetCache Architecture and Deployment”]
  • 1st NLANR cacheoff (April ’99): gap only 2.5x
    • 600 req/sec (Peregrine) vs. 1500 (InfoLibria)
  • 2nd cacheoff (Jan ’00): gap only 1.7x
    • 1450 req/sec (iMimic) vs. 2400 (Compaq)
  • 3rd cacheoff (Oct ’00): gap only 15%
    • 2083 req/sec (Microsoft) vs. 2400 (Compaq)
  • 4th cacheoff (Dec ’01): commodity system best
    • Performance record: 2700 req/sec (Cintel/iMimic)
How free is the CPU?
  • Stratacache Dart-10, with Nokia phone
  • 120 req/sec (7 Mbps) with 300 MHz CPU
    • CPU mostly idle; performance disk-limited
Previous Customization Approaches
  • Write your own proxy or modify Squid
    • Huge code, changes likely to conflict with updates
  • ICAP: TCP-based offload
    • Proxy redirects requests/responses to a separate server for modification
  • Filter-style processes
    • Plugins where proxy designers anticipated a need (e.g., content filtering)
  • Kernel modules
    • Difficult programming model, but needed for kernel-integrated proxies
Reasons for a New Approach
  • Scalability needed to > 10,000 flows
    • Filter processes may not scale
  • Limitations of ICAP-style offloading
    • Offloading small requests adds latency
    • Need for separate ICAP server with own CPU
  • Programmers want flexibility
    • Program in C using standard OS and libraries
    • Avoid problems from later code conflicts
Design of the Proxy API
  • Event-aware
    • Modules notified as requests/responses arrive
    • Maps well to implementation of modern proxies
  • HTTP-Complete
    • Capture all key interactions in HTTP request-response protocol for full flexibility
  • Support various programming models
    • Events, threads, processes
    • Communication via function call or socket
HTTP Data Flows

[Figure: data flows among client, proxy cache, server, and storage system, labeled Requests, Cache Misses, New Content, Cache Hits, Cached Content, and Responses]

HTTP Data Flows and the API

[Figure: the same client, proxy cache, server, and storage system diagram, with five “modify” points marked on the data flows]

HTTP Request-Response Structure
  • Header block: special first line followed by more detail about request/response
    • Request: requested URL, then request header lines 1 ... N, then <blank terminating line>
    • Response: response status code, then response header lines 1 ... N, then <blank terminating line>
  • Body data
    • Request: optional request “body,” used in POST requests for forms, etc.
    • Response: actual response “body,” containing HTML file, image binary data, etc.
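
The split between header block and body data can be sketched in C as a search for the blank terminating line. This is a minimal illustration, not code from the talk: `header_block_len` is a hypothetical helper, and it assumes CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the API described in these slides).
 * Returns the length of the header block, including the blank terminating
 * line, or 0 if the terminator has not arrived yet.
 * Assumes CRLF line endings. */
size_t header_block_len(const char *msg, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(msg + i, "\r\n\r\n", 4) == 0)
            return i + 4;   /* body data, if any, starts at this offset */
    }
    return 0;
}
```

Anything past the returned offset is body data; a return of 0 means the caller should wait for more bytes before treating the header as complete.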

Design of API Notifications
    typedef struct DR_FuncPtrs {
        DR_InitFunc        *dfp_init;        // on module load
        DR_ReconfigureFunc *dfp_reconfig;    // on config change
        DR_FiniFunc        *dfp_fini;        // on module unload
        DR_ReqHeaderFunc   *dfp_reqHeader;   // when req hdr done
        DR_ReqBodyFunc     *dfp_reqBody;     // on each piece of req body
        DR_ReqOutFunc      *dfp_reqOut;      // before req to remote srv
        DR_DNSResolvFunc   *dfp_dnsResolv;   // when DNS resolution needed
        DR_RespHeaderFunc  *dfp_respHeader;  // when resp hdr done
        DR_RespBodyFunc    *dfp_respBody;    // on each piece of resp body
        DR_RespReturnFunc  *dfp_respReturn;  // when resp returned to clt
        DR_TransferLogFunc *dfp_logging;     // log entry after req done
        DR_OpaqueFreeFunc  *dfp_opaqueFree;  // when each resp completes
        DR_TimerFunc       *dfp_timer;       // periodic maintenance
        int                 dfp_timerFreq;   // timer period (sec)
    } DR_FuncPtrs;
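
The slide gives only the field and typedef names, not the callback prototypes. The sketch below therefore uses a cut-down table with invented `Sketch_` names and simplified signatures to show the general pattern: a module fills in the notifications it wants, and the host calls through the table. Leaving unused slots NULL is an assumption, not documented DataReactor behavior.

```c
#include <stddef.h>

/* ASSUMPTION: hypothetical, simplified callback signatures. The real
 * DR_* prototypes are not shown in the slides. */
typedef int  Sketch_InitFunc(void);
typedef int  Sketch_ReqHeaderFunc(const char *url);
typedef void Sketch_TimerFunc(void);

typedef struct Sketch_FuncPtrs {
    Sketch_InitFunc      *sfp_init;       /* on module load */
    Sketch_ReqHeaderFunc *sfp_reqHeader;  /* when request header is complete */
    Sketch_TimerFunc     *sfp_timer;      /* periodic maintenance */
    int                   sfp_timerFreq;  /* timer period (sec) */
} Sketch_FuncPtrs;

static int requests_seen;

static int my_init(void)
{
    requests_seen = 0;
    return 0;
}

static int my_req_header(const char *url)
{
    (void)url;          /* a real module would inspect or rewrite the request */
    requests_seen++;
    return 0;
}

/* The module registers only the events it cares about; the timer slot
 * stays NULL (assumed convention for "not interested"). */
const Sketch_FuncPtrs my_module = {
    .sfp_init      = my_init,
    .sfp_reqHeader = my_req_header,
};

/* Host side: deliver the notification only if the module registered for it. */
int notify_req_header(const Sketch_FuncPtrs *m, const char *url)
{
    return m->sfp_reqHeader ? m->sfp_reqHeader(url) : 0;
}

int sketch_requests_seen(void)
{
    return requests_seen;
}
```

A table of function pointers keeps the per-event cost to one indirect call, which fits the slides' claim that the API maps well onto event-driven proxies.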
API Functions
  • Content Adaptation
  • Content Management
  • Customized Administration
  • Utility Functions
Content Adaptation
  • Functions to allow modules to inspect and modify requests and replies through cache

[Figure: client, proxy cache, server, and storage system diagram with five “modify” points, repeated from “HTTP Data Flows and the API”]

Content Adaptation (cont’d)
  • Example uses
    • Integration into a CDN based on URL rewriting
    • Transcoding for mobile devices
  • Special features of cache integration
    • Store modified content
    • Return multiple versions using HTTP Vary header
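
As an illustration of the URL-rewriting use case, the sketch below swaps an origin hostname for a CDN hostname. `rewrite_url` and the hostnames in the usage note are invented for this example; the slides do not show how a module would hand the rewritten URL back to the proxy. Matching is case-sensitive and ignores explicit ports, which is a simplification.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "http://<origin_host>/path" to
 * "http://<cdn_host>/path".
 * Returns 1 if rewritten, 0 if the URL did not match (out receives an
 * unchanged copy), and -1 if out is too small. */
int rewrite_url(const char *url, const char *origin_host,
                const char *cdn_host, char *out, size_t outlen)
{
    const char *scheme = "http://";
    size_t slen = strlen(scheme);
    size_t hlen = strlen(origin_host);
    int n;

    /* The host must be followed by '/' or end of string, so that
     * "www.example.com.evil.org" does not match "www.example.com". */
    if (strncmp(url, scheme, slen) == 0 &&
        strncmp(url + slen, origin_host, hlen) == 0 &&
        (url[slen + hlen] == '/' || url[slen + hlen] == '\0')) {
        n = snprintf(out, outlen, "%s%s%s", scheme, cdn_host,
                     url + slen + hlen);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 1;
    }

    n = snprintf(out, outlen, "%s", url);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, calling it with origin `www.example.com` and CDN host `cdn.example.net` turns `http://www.example.com/img/a.png` into `http://cdn.example.net/img/a.png`.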
Content Management
  • Fine-grained control over cacheability
    • Content-freshness modification/eviction
    • Content preloading
    • Content querying
  • Example uses
    • News CDN needs new home page on major event
    • Premium services
Customized Administration
  • Notifications on logging
  • Example uses
    • Aggregation at network operation centers
    • Detection of high error rates indicates bad links
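
One way a logging-notification module might flag a bad link is to count error responses per window. The sketch below is an assumption-laden illustration: the `ErrorWindow` type, the choice of 5xx status codes as "errors," and the thresholds are all invented here, not taken from the talk.

```c
/* Hypothetical per-window counters, updated once per logged transfer. */
typedef struct ErrorWindow {
    unsigned total;   /* transfers logged in the current window */
    unsigned errors;  /* of those, responses with a 5xx status (assumed
                         definition of "error" for this sketch) */
} ErrorWindow;

/* Called from a logging notification with the transfer's status code. */
void window_record(ErrorWindow *w, int status)
{
    w->total++;
    if (status >= 500 && status <= 599)
        w->errors++;
}

/* Returns 1 when at least min_samples transfers were logged and the error
 * fraction is strictly above threshold_pct percent, else 0. */
int window_is_unhealthy(const ErrorWindow *w, unsigned min_samples,
                        unsigned threshold_pct)
{
    if (w->total < min_samples)
        return 0;
    return w->errors * 100u > w->total * threshold_pct;
}

/* Called at the end of each window, e.g. from a periodic timer. */
void window_reset(ErrorWindow *w)
{
    w->total = 0;
    w->errors = 0;
}
```

The minimum-sample check avoids raising an alarm on a nearly idle link where one failed request would otherwise count as a 100% error rate.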
Utility Functions
  • Interfaces to underlying OS event-notification
    • Module may register or clear interest on FD events
    • API will automatically call back module
    • Independent of underlying OS mechanisms (e.g., poll, select, /dev/poll, kevent)
  • Configuration options processing
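
The register/clear/callback pattern can be sketched over plain `poll()`. All `sketch_` names are invented for illustration and are not the DataReactor interface; a real implementation would hide whichever mechanism the host OS offers behind the same three calls.

```c
#include <poll.h>
#include <stddef.h>

#define SKETCH_MAX_FDS 64

typedef void Sketch_FdCallback(int fd, void *arg);

static struct pollfd      sk_fds[SKETCH_MAX_FDS];
static Sketch_FdCallback *sk_cbs[SKETCH_MAX_FDS];
static void              *sk_args[SKETCH_MAX_FDS];
static nfds_t             sk_count;

/* Register interest in readability on fd. Returns 0, or -1 if the table is full. */
int sketch_register_read(int fd, Sketch_FdCallback *cb, void *arg)
{
    if (sk_count >= SKETCH_MAX_FDS)
        return -1;
    sk_fds[sk_count].fd = fd;
    sk_fds[sk_count].events = POLLIN;
    sk_fds[sk_count].revents = 0;
    sk_cbs[sk_count] = cb;
    sk_args[sk_count] = arg;
    sk_count++;
    return 0;
}

/* Clear interest in fd. Returns 0 if it was registered, -1 otherwise. */
int sketch_clear(int fd)
{
    for (nfds_t i = 0; i < sk_count; i++) {
        if (sk_fds[i].fd == fd) {
            sk_count--;                      /* move the last entry into the hole */
            sk_fds[i] = sk_fds[sk_count];
            sk_cbs[i] = sk_cbs[sk_count];
            sk_args[i] = sk_args[sk_count];
            return 0;
        }
    }
    return -1;
}

/* One pass of the event loop: wait up to timeout_ms, then call back every
 * module whose descriptor reported an event. Returns the number of
 * callbacks made, or -1 if poll() failed. */
int sketch_dispatch(int timeout_ms)
{
    int called = 0;

    if (poll(sk_fds, sk_count, timeout_ms) < 0)
        return -1;
    /* Walk backwards and reset revents before each call, so a callback may
     * register or clear descriptors without causing a double callback. */
    for (nfds_t i = sk_count; i-- > 0; ) {
        if (i >= sk_count || sk_fds[i].revents == 0)
            continue;
        sk_fds[i].revents = 0;
        sk_cbs[i](sk_fds[i].fd, sk_args[i]);
        called++;
    }
    return called;
}

/* Example module callback: counts how many times it was invoked. */
void sketch_count_cb(int fd, void *arg)
{
    (void)fd;
    ++*(int *)arg;
}
```

A module would register a helper's pipe or socket once with `sketch_register_read`, and the host would call `sketch_dispatch` from its main loop.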
Implementation in DataReactor
  • Commercial proxy server
    • Portable across hardware (x86, Alpha, Sparc) and OSes (FreeBSD, Linux, Solaris)
    • Fast (exposes overheads)
    • Independently measured at Proxy Cache-Offs (alone or via OEMs)
  • Support requires < 1000 lines of code
  • Implementation < 6 person-months
Sample Modules
  • Ad Remover
    • Matches ad patterns in Hostname, URI
  • Dynamic Compressor
    • Uses zlib to compress, store, & serve object
  • Image Transcoder
    • Color stripping via NetPBM & ijpeg helpers
  • Text Injector
    • Finds <head> tag, asks helper what to insert
  • Content Manager
    • Local telnet, then query, fetch, inject, evict objects
  • ICAP client
    • Implements ICAP 1.0 draft to use external server
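
The Ad Remover's matching step could look roughly like the sketch below, which checks the hostname and URI against shell-style patterns using POSIX `fnmatch()`. The pattern lists and the `is_ad_request` name are invented for illustration; the slides do not say which pattern syntax the real module uses.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical block lists; the patterns are illustrative only. */
static const char *const ad_host_patterns[] = { "ads.*", "*.adserver.example" };
static const char *const ad_uri_patterns[]  = { "/banners/*", "*/ad_*.gif" };

/* Returns 1 if s matches any of the n shell-style patterns, else 0. */
static int matches_any(const char *s, const char *const *pats, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(pats[i], s, 0) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if either the hostname or the URI looks like an ad request. */
int is_ad_request(const char *hostname, const char *uri)
{
    return matches_any(hostname, ad_host_patterns,
                       sizeof ad_host_patterns / sizeof ad_host_patterns[0]) ||
           matches_any(uri, ad_uri_patterns,
                       sizeof ad_uri_patterns / sizeof ad_uri_patterns[0]);
}
```

Because the decision needs only the hostname and URI, a check like this can run as soon as the request header is complete, before any request goes to the origin server.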
Measurement
  • Polygraph and PolyMix-3, Measurement Factory
    • De facto standard for proxy testing
  • Scales with load
    • Number of clients
    • Number of servers
    • Data set size
    • Working set size
  • Very long test time
    • Fill phase (~14 hours)
    • Test phase (~10 hours)
PolyGraph Test Phases

[Figure: timeline showing the fill phase, 1st load phase, and 2nd load phase; x-axis: time (hours), 0 to 30]

PolyGraph Hit Rates

[Figure: plot of cacheable, offered, and actual hit rates]

Our Test Environment
  • Proxy - 1.4GHz Athlon, 2GB memory
  • 5 SCSI disks, GigE, FreeBSD
  • Harness
    • 10 Polygraph client/server machines
    • Target load: 1450 reqs/sec
    • 16000 simultaneous connections
  • Pmix-3: Modified PolyMix-3
    • Single fill phase for all tests
    • Load phase time cut in half
    • Slight increase in hit rate
Summary
  • CPUs getting more idle
  • Commodity OS suitable choices
  • High-concurrency servers needed
  • Customizable, efficient event-friendly API
  • Implemented with low overhead
  • Sample results, deployments promising
Ongoing Work
  • CoDeeN – a CDN system on PlanetLab
    • Uses a customized version of DataReactor
    • Being built at Princeton
    • Prototype: 1 week reading + 1 week reading
    • Currently: ~42 nodes (one per site)
  • Lessons
    • API easy enough for busy grad students
    • Logging infrastructure would be nice
    • Want to mask non-HTTP failures

Questions?

[email protected]

iMimic Networking, Inc.

http://www.imimic.com/
