
CoDeeN, Large Files, & CoDeploy

KyoungSoo Park, Vivek Pai, Larry Peterson. Princeton University.


Presentation Transcript


  1. CoDeeN, Large Files, & CoDeploy • KyoungSoo Park, Vivek Pai, Larry Peterson • Princeton University

  2. What Is CoDeeN? • Content Distribution Networks • Web pages load faster if • You’re contacting a nearby server • That server isn’t overloaded • The page is already in memory • You use long-lived TCP connections right

  3. CoDeeN By The Numbers • In operation ~10 months • 150 nodes (~120 live) • 6.5 million reqs/day • 5 million “good” reqs/day • about 300GB/day (estimate) • 7K-20K unique IPs per 24 hours • Over 600,000 unique IPs served

  4. Our “Strategy” • Stay operational • Build some credibility • Exploit that + activity to branch out • Involves doing sales pitches • Tap into new consumers • In particular, nonprofits, non-commercial

  5. What Most CDNs (want to) Serve

  6. But What About Big Files?

  7. How Big? • 200 TeraBytes of data total • Interviews: about 3.5GB each • Files: average of 700MB each

  8. Problem: “Nobody” Handles 700MB • CDNs designed for avg size 10KB • 1MB = 100 files • 700MB = 70,000 files • Commercial disks ~ 100GB • Our storage ~ 3GB
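The arithmetic on slide 8 can be checked with a short sketch. It assumes decimal units (1 MB = 1000 KB), which is what the slide's round numbers imply; the function names are illustrative and not part of CoDeeN.

```python
# Decimal units are an assumption; the slide's round numbers suggest them.
KB = 1000
MB = 1000 * KB
GB = 1000 * MB

AVG_OBJECT = 10 * KB  # average object size CDNs are designed for (slide 8)


def equivalent_objects(size_bytes, avg=AVG_OBJECT):
    """Number of average-sized objects that take the same space as one file."""
    return size_bytes // avg


def files_that_fit(storage_bytes, file_bytes):
    """Number of whole files of a given size that fit in a cache."""
    return storage_bytes // file_bytes


if __name__ == "__main__":
    print(equivalent_objects(1 * MB))        # objects displaced by a 1MB file
    print(equivalent_objects(700 * MB))      # objects displaced by a 700MB file
    print(files_that_fit(3 * GB, 700 * MB))  # whole 700MB files in ~3GB of storage
```

With roughly 3GB of storage per node, only a handful of 700MB files fit at once, and each one displaces tens of thousands of ordinary web objects.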

  9. New Problems • Why not replicate less? • You’re farther away • Why not merge requests? • [Figure labels: slow client, client readahead]

  10. Our Approach • [Figure: Client, Agent, six CDN nodes, Server; labels: file, file0-1, file1-2, file2-3, file3-4, file4-5]
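The piece labels on slide 10 (file0-1, file1-2, and so on) suggest that the agent splits one large file into consecutive byte ranges and fetches them through different CDN nodes. The following is a minimal sketch of that idea only. The slides do not say how pieces are mapped to nodes, so the round-robin assignment, the function names, and the node names are all assumptions.

```python
def split_into_ranges(total_length, piece_size):
    """Return inclusive (start, end) byte offsets covering the whole file."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [
        (start, min(start + piece_size, total_length) - 1)
        for start in range(0, total_length, piece_size)
    ]


def assign_round_robin(ranges, nodes):
    """Pair each byte range with a node, cycling through the node list.

    Round-robin is a placeholder policy chosen for illustration.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return [(nodes[i % len(nodes)], r) for i, r in enumerate(ranges)]


if __name__ == "__main__":
    pieces = split_into_ranges(total_length=10, piece_size=4)
    for node, (start, end) in assign_round_robin(pieces, ["cdn-a", "cdn-b"]):
        print(node, start, end)
```

Because each piece is an independent request, pieces can be cached separately and a slow node delays only its own pieces.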

  11. Low-Level HTTP Stuff • GET name/ranges (Header: blah, Header: blah) • HTTP/1.0 206 Partial (Range: start-end/length, Header: blah) • GET name (Range: bytes ranges, Header: blah) • HTTP/1.0 200 OK (Content-length: piece length, New-header: obj length) • [Figure labels: egress, ingress]
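Slide 11 shows two forms of the same request: one with the range folded into the URL (GET name/ranges) and one with a Range header (GET name, Range: bytes ranges), along with a 206 reply and a 200 reply that carries the piece length plus a new header for the full object length. The sketch below illustrates one way such a translation could work. The URL layout `name/start-end` and the header name `X-Object-Length` are hypothetical stand-ins for the slide's "ranges" and "New-header". In standard HTTP the 206 reply reports its span in a `Content-Range: bytes start-end/length` header, which is what the second function parses.

```python
def to_origin_request(piece_url):
    """Turn a piece URL ('name/start-end', a hypothetical layout) into
    the plain object name plus a standard Range request header."""
    name, _, byte_range = piece_url.rpartition("/")
    start, end = (int(x) for x in byte_range.split("-"))
    return name, {"Range": f"bytes={start}-{end}"}


def to_piece_response(content_range):
    """Turn a 206 reply's Content-Range value into the status and headers
    of a self-contained 200 reply for the piece.

    'X-Object-Length' is a made-up name for the slide's 'New-header'.
    """
    unit, _, rest = content_range.partition(" ")
    if unit != "bytes":
        raise ValueError("only byte ranges are supported")
    span, _, total = rest.partition("/")
    start, end = (int(x) for x in span.split("-"))
    headers = {
        "Content-Length": str(end - start + 1),  # length of this piece
        "X-Object-Length": total,                # length of the whole object
    }
    return 200, headers


if __name__ == "__main__":
    print(to_origin_request("/videos/talk.mpg/0-1048575"))
    print(to_piece_response("bytes 0-1048575/734003200"))
```

Presenting each piece as an ordinary 200 reply under its own URL lets unmodified caches store it like any other object, which fits the slide 12 claim that the existing HTTP infrastructure can be reused.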

  12. Benefits • Transparent to client (no software) • Server only needs byte-range support • Every real server has it • Will generate more log entries • Can use/augment HTTP infrastructure • Caching, redirection, etc • Adding security controls • Low incremental overhead • Agent is about 300 semicolons • CDN mods about 20 semicolons

  13. Dual-Use Technology • Other one-to-many problems • Node/experiment installs • Software updates • Push model instead of pull • Solution? • Build “master” script • Push to nodes • Nodes pull as needed

  14. CoDeploy • Now in beta • Small set of tools at source • No (new) installation at target • Needed tools at CoDeeN-hosting nodes • Fun components • Peer-review system of CoDeeN nodes • Nearest CoDeeN finder • Parallel ssh, scp

  15. What To Expect Next • Will redeploy auto-rewriting service • Akamai-like URL mangling • Was in testing before December upgrade • Tie rewriter into “hosting” service • Make it simpler for provider to use CoDeeN

  16. More Info • http://codeen.cs.princeton.edu/codeploy • KyoungSoo Park: kyoungso@cs.princeton.edu • Vivek Pai: vivek@cs.princeton.edu
