CoDeeN, developed by KyoungSoo Park, Vivek Pai, and Larry Peterson at Princeton University, is a Content Distribution Network (CDN) that speeds up web page loads by directing users to nearby servers that are not overloaded. With 150 nodes (about 120 live) serving 6.5 million requests per day, CoDeeN now faces the problem of distributing large files that average 700MB each, far beyond the roughly 10KB average object size that CDNs are designed for. This presentation covers CoDeeN’s operational strategy, its approach to serving large files, and CoDeploy, a beta set of tools for node and experiment installs and software updates.
CoDeeN, Large Files, & CoDeploy • KyoungSoo Park, Vivek Pai, Larry Peterson • Princeton University
What Is CoDeeN? • Content Distribution Networks • Web pages load faster if • You’re contacting a nearby server • That server isn’t overloaded • The page is already in memory • You use long-lived TCP connections right
CoDeeN By The Numbers • In operation ~10 months • 150 nodes (~120 live) • 6.5 million reqs/day • 5 million “good” reqs/day • about 300GB/day (estimate) • 7K-20K unique IPs per 24 hours • Over 600,000 unique IPs served
Our “Strategy” • Stay operational • Build some credibility • Exploit that + activity to branch out • Involves doing sales pitches • Tap into new consumers • In particular, nonprofits, non-commercial
How Big? • 200 TeraBytes of data total • Interviews: about 3.5GB each • Files: average of 700MB each
Problem: “Nobody” Handles 700MB • CDNs designed for avg size 10KB • 1MB = 100 files • 700MB = 70,000 files • Commercial disks ~ 100GB • Our storage ~ 3GB
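The slide's arithmetic can be checked with a short sketch. It assumes decimal units (1MB = 1,000KB, 1GB = 1,000MB), which the slide's round figures imply; the function names are illustrative and not part of CoDeeN.

```python
AVG_OBJECT_KB = 10  # average object size CDNs are designed for (from the slide)


def equivalent_small_files(size_mb: int, avg_kb: int = AVG_OBJECT_KB) -> int:
    """Number of average-sized objects that take the same space as one file."""
    return size_mb * 1000 // avg_kb


def whole_files_that_fit(storage_gb: int, file_mb: int) -> int:
    """Number of complete files of file_mb that fit in storage_gb."""
    return storage_gb * 1000 // file_mb


print(equivalent_small_files(1))     # 100
print(equivalent_small_files(700))   # 70000
print(whole_files_that_fit(3, 700))  # 4
```

Under these assumptions, about 3GB of storage holds only four whole 700MB files, which is why caching such files whole is impractical.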
New Problems • Why not replicate less? • You’re farther away • Why not merge requests? [Figure labels: slow client, client readahead]
Our Approach [Figure: Client, Agent, six CDN nodes, and Server, with the file shown whole and as pieces file0-1, file1-2, file2-3, file3-4, and file4-5]
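The idea of an agent fetching one large file as several pieces through different CDN nodes can be sketched as follows. This is a minimal illustration, not the authors' agent: the piece size, the node names, and the round-robin assignment are all assumptions, since the slides do not say how pieces are mapped to nodes.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # inclusive (start, end) offsets


def split_into_ranges(length: int, piece_size: int) -> List[ByteRange]:
    """Return inclusive byte ranges that cover a file of `length` bytes."""
    if length < 0 or piece_size <= 0:
        raise ValueError("length must be >= 0 and piece_size must be > 0")
    return [
        (start, min(start + piece_size, length) - 1)
        for start in range(0, length, piece_size)
    ]


def assign_pieces(
    ranges: List[ByteRange], nodes: List[str]
) -> List[Tuple[str, ByteRange]]:
    """Pair each range with a node, round-robin (an assumed policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    return [(nodes[i % len(nodes)], r) for i, r in enumerate(ranges)]


# Hypothetical node names, used only for illustration.
plan = assign_pieces(split_into_ranges(10, 4), ["cdn-a", "cdn-b"])
print(plan)  # [('cdn-a', (0, 3)), ('cdn-b', (4, 7)), ('cdn-a', (8, 9))]
```

Each piece is small enough to be handled like an ordinary object, and the agent concatenates the pieces in offset order before passing the file to the client.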
Low-Level HTTP Stuff • Request: GET name/ranges, Header: blah, Header: blah • Response: HTTP/1.0 206 Partial, Range: start-end/length, Header: blah • Request: GET name, Range: bytes ranges, Header: blah • Response: HTTP/1.0 200 OK, Content-length: piece length, New-header: obj length [Figure labels: egress, ingress]
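The two rewrites on this slide can be sketched in a few lines: a request whose URL carries the range becomes a plain byte-range request, and the 206 reply becomes a 200 reply for just that piece, with the full object length kept in an extra header. The `name/start-end` URL layout and the `X-Object-Length` header name are assumptions made for illustration; the slide only says "name/ranges" and "New-header". The sketch uses the standard `Range: bytes=start-end` and `Content-Range: bytes start-end/length` header forms.

```python
import re
from typing import Dict, Tuple


def to_origin_request(piece_url: str) -> Tuple[str, Dict[str, str]]:
    """Turn 'name/start-end' into ('name', {'Range': 'bytes=start-end'})."""
    m = re.fullmatch(r"(?P<name>.+)/(?P<start>\d+)-(?P<end>\d+)", piece_url)
    if m is None:
        raise ValueError(f"no range suffix in {piece_url!r}")
    return m["name"], {"Range": f"bytes={m['start']}-{m['end']}"}


def to_piece_response(
    status: int, headers: Dict[str, str]
) -> Tuple[int, Dict[str, str]]:
    """Turn a 206 reply into a 200 reply that describes only the piece."""
    if status != 206:
        raise ValueError("expected a 206 Partial Content reply")
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", headers["Content-Range"])
    if m is None:
        raise ValueError("unparseable Content-Range header")
    start, end, total = map(int, m.groups())
    return 200, {
        "Content-Length": str(end - start + 1),  # piece length
        "X-Object-Length": str(total),           # hypothetical header name
    }


print(to_origin_request("/big/file.iso/0-1023"))
# ('/big/file.iso', {'Range': 'bytes=0-1023'})
print(to_piece_response(206, {"Content-Range": "bytes 0-1023/734003200"}))
# (200, {'Content-Length': '1024', 'X-Object-Length': '734003200'})
```

Because each piece has its own URL and arrives as a complete 200 response, it can pass through existing HTTP caching and redirection unchanged.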
Benefits • Transparent to client (no software) • Server only needs byte-range support • Every real server has it • Will generate more log entries • Can use/augment HTTP infrastructure • Caching, redirection, etc • Adding security controls • Low incremental overhead • Agent is about 300 semicolons • CDN mods about 20 semicolons
Dual-Use Technology • Other one-to-many problems • Node/experiment installs • Software updates • Push model instead of pull • Solution? • Build “master” script • Push to nodes • Nodes pull as needed
CoDeploy • Now in beta • Small set of tools at source • No (new) installation at target • Needed tools at CoDeeN-hosting nodes • Fun components • Peer-review system of CoDeeN nodes • Nearest CoDeeN finder • Parallel ssh, scp
What To Expect Next • Will redeploy auto-rewriting service • Akamai-like URL mangling • Was in testing before December upgrade • Tie rewriter into “hosting” service • Make it simpler for provider to use CoDeeN
More Info http://codeen.cs.princeton.edu/codeploy KyoungSoo Park kyoungso@cs.princeton.edu Vivek Pai vivek@cs.princeton.edu