
Fast Data at Massive Scale

Presentation Transcript


  1. Fast Data at Massive Scale: Lessons Learned at Facebook (Bobby Johnson)

  2. Me
     • Director of Engineering
     • Scaling and Performance
     • Site Security
     • Site Reliability
     • Distributed Systems
     • Development tools
     • Customer Service Tools
     • Took Facebook from 7M users to 120M.

  3. Architecture
     • Load Balancer (assigns a web server)
     • Other services: Search, Feed, etc. (ignore for now)
     • Web Server (PHP assembles data)
     • Memcache (fast)
     • Database (slow, persistent)

  4. 1/2 the time is in PHP • 1/4 is in memcache • 1/8 is in database

  5. One year ago, almost half the time was memcache

  6. Network Incast
     [Diagram: a PHP client sends many small get requests through a switch to four memcache servers.]

  7. Network Incast
     [Diagram: the four memcache servers send many big data packets back through the switch to the PHP client.]

  8. Clustering
     [Diagram: a PHP client fetches 10 objects from one memcache server.]
     • 1 round trip for 10 objects

  9. Clustering
     [Diagram: a PHP client fetches 5 objects from each of two memcache servers.]
     • 2 round trips total
     • 1 round trip per server
     • longest request is 5

  10. Clustering
     [Diagram: a PHP client fetches 3, 4, and 3 objects from three memcache servers.]
     • 3 round trips total
     • 1 round trip per server
     • longest request is 4

  11. Clustering
     • If objects are small, round trips dominate, so you want objects clustered
     • If objects are large, transfer time dominates, so you want objects distributed
     • In a web application you will almost always be dealing with small objects
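
The tradeoff on slides 8 to 11 can be sketched with a toy cost model. Everything here is an assumption for illustration, not a measurement from the talk: the function name, the fixed cost charged per round trip, and the bandwidth figure. The model charges one fixed overhead for every server contacted and assumes the servers transfer in parallel, so transfer time is set by the longest request.

```php
<?php
// Toy model (an assumption, not from the talk): every server contacted costs a
// fixed per-round-trip overhead, and servers transfer their objects in
// parallel, so transfer time is set by the largest per-server batch.
function estimate_fetch_time_us(
    array $objectsPerServer,
    int $objectBytes,
    float $roundTripUs,
    float $bytesPerUs
): float {
    $roundTrips = count($objectsPerServer);     // 1 round trip per server
    $longestRequest = max($objectsPerServer);   // e.g. 4 for [3, 4, 3]
    return $roundTrips * $roundTripUs
         + ($longestRequest * $objectBytes) / $bytesPerUs;
}

// Hypothetical numbers: 200 us per round trip, 100 bytes per us of bandwidth.
$layouts = [
    '1 server'  => [10],
    '2 servers' => [5, 5],
    '3 servers' => [3, 4, 3],
];
foreach ([100, 100000] as $objectBytes) {
    foreach ($layouts as $name => $layout) {
        $t = estimate_fetch_time_us($layout, $objectBytes, 200.0, 100.0);
        printf("%6d-byte objects, %s: %.0f us\n", $objectBytes, $name, $t);
    }
}
```

Under these made-up numbers, 100-byte objects are cheapest on one server (210 us against 405 us and 604 us), while 100,000-byte objects are cheapest when spread over three servers (4600 us against 5400 us and 10200 us).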

  12. Caching
     • Basic tools are parallelism and clustering
     • Clustering is a latency/throughput tradeoff
     • Application code must be aware
     • Networking is a burst problem
     • Dropped packets kill you
     • TCP quick ack
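
One way application code can be "aware" of clustering is to choose keys so that related objects land on the same server, then issue one batched get per server. The sketch below assumes a hypothetical key scheme of the form `<clusterId>:<name>` and a simple hash-modulo placement. Neither is described in the talk, and this is not the API of any real memcache client.

```php
<?php
// Hypothetical key scheme: "<clusterId>:<name>". Hashing only the cluster id
// sends every key of one cluster to the same server.
function server_for_key(string $key, int $serverCount): int
{
    $clusterId = explode(':', $key, 2)[0];
    // crc32() is non-negative on 64-bit PHP builds, which this sketch assumes.
    return crc32($clusterId) % $serverCount;
}

// Group keys so that each server receives a single batched get.
function group_keys_by_server(array $keys, int $serverCount): array
{
    $batches = [];
    foreach ($keys as $key) {
        $batches[server_for_key($key, $serverCount)][] = $key;
    }
    return $batches;
}

$keys = ['user42:name', 'user42:pic', 'user42:status', 'user7:name'];
foreach (group_keys_by_server($keys, 4) as $server => $batch) {
    echo "server $server: " . implode(', ', $batch) . "\n";
}
```

All three `user42` keys end up in the same batch, so fetching them costs one round trip no matter how many servers exist.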

  13. PHP CPU

  14. Application Improvements

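  15. know what your libraries do

          $results = get_search_results( $needle );
          // iterate by reference so the 'pending' flag is stored in $results
          foreach ( $results as &$result ) {
              // is_pending_friend() runs once per search result
              if ( is_pending_friend( $result['id'] ) ) {
                  // we'll change the links based on this
                  $result['pending'] = true;
              }
          }
          unset( $result ); // drop the reference left behind by foreach

  16. know what your libraries do

          function is_pending_friend( $id ) {
              // this is short-lived, so don't cache
              expensive_db_query( $id …)
          }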
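
Slides 15 and 16 together mean one uncached database query per search result. One possible fix, assuming a batched lookup can be written, is to ask for all ids at once. The talk does not show a fix, so `get_pending_friend_ids()` is a hypothetical helper, and the stand-in `expensive_db_query()`, the `QueryLog` counter, and the data below exist only to make the query count visible (PHP 7.4 or later).

```php
<?php
// Stand-ins for the slide's functions. The counter and the data are made up
// purely to make the number of database calls visible.
final class QueryLog
{
    public static int $count = 0;
}

function expensive_db_query(array $ids): array
{
    QueryLog::$count++;                 // one database round trip per call
    $pending = [2, 5];                  // made-up ids with a pending request
    return array_values(array_intersect($ids, $pending));
}

function is_pending_friend(int $id): bool
{
    return in_array($id, expensive_db_query([$id]), true);
}

// Hypothetical batched helper: one query covers the whole result set.
function get_pending_friend_ids(array $ids): array
{
    return expensive_db_query($ids);
}

$results = [['id' => 1], ['id' => 2], ['id' => 3], ['id' => 5]];

// Per-result calls, as on slide 15: one query per search result.
foreach ($results as $result) {
    is_pending_friend($result['id']);
}
$perResultQueries = QueryLog::$count;

// Batched: a single query, then cheap in-memory lookups.
QueryLog::$count = 0;
$pendingIds = array_flip(get_pending_friend_ids(array_column($results, 'id')));
foreach ($results as &$result) {
    $result['pending'] = isset($pendingIds[$result['id']]);
}
unset($result);
$batchedQueries = QueryLog::$count;

echo "per-result: $perResultQueries queries, batched: $batchedQueries query\n";
```

With four search results the per-result loop issues four queries and the batched version issues one.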

  17. Databases
     • Tend to be slower than lighter-weight alternatives, so avoid using them
     • If you do use them, partition them right from the start
     • If a query is _really_ slow, like a few seconds or a few minutes, you probably have a bug where you’re scanning a table
     • The db should have a command (in MySQL, for example, EXPLAIN) to tell you what index it’s using for a query, and how many rows it’s examining
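
"Partition them right from the start" can be sketched as routing every user to a shard by id. The layout here is an assumption for illustration only: the number of logical shards, the modulo placement, and the placeholder host names are not from the talk. Using many more logical shards than hosts is one common design choice, because a shard can later be moved to a new host without rehashing users.

```php
<?php
// Hypothetical layout: user data is split across logical shards, and ranges
// of logical shards are mapped to database hosts. Host names are placeholders.
const LOGICAL_SHARDS = 1024;

// Assumes non-negative user ids.
function shard_for_user(int $userId): int
{
    return $userId % LOGICAL_SHARDS;
}

function host_for_shard(int $shard, array $hosts): string
{
    // Contiguous ranges of logical shards live on one host (rounded up so
    // that the last shard still maps to a valid host index).
    $shardsPerHost = intdiv(LOGICAL_SHARDS + count($hosts) - 1, count($hosts));
    return $hosts[intdiv($shard, $shardsPerHost)];
}

$hosts = ['db0.example', 'db1.example', 'db2.example', 'db3.example'];
foreach ([7, 1030, 999999] as $userId) {
    $shard = shard_for_user($userId);
    echo "user $userId -> shard $shard on " . host_for_shard($shard, $hosts) . "\n";
}
```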

  18. General Lessons
     • Your best tool is parallelism
     • Look at your data
     • Build tools to look at your data
     • Don’t make assumptions about what components are doing
     • Algorithmic and system improvements are almost always better than micro-optimization
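
A breakdown like the one on slide 4 needs a tool that records where request time goes. The following is a minimal sketch of such a tool, not the instrumentation used at Facebook. The class name, the component labels, and the `usleep()` stand-ins for cache and database calls are all assumptions (PHP 7.4 or later).

```php
<?php
// Minimal per-component timer; the class and component names are illustrative.
final class TimeBreakdown
{
    private array $totals = [];

    // Run $work and charge its wall-clock time to $component.
    public function measure(string $component, callable $work)
    {
        $start = hrtime(true);                    // nanoseconds, PHP 7.3+
        try {
            return $work();
        } finally {
            $elapsed = hrtime(true) - $start;
            $this->totals[$component] = ($this->totals[$component] ?? 0) + $elapsed;
        }
    }

    // Fraction of the measured time spent in each component.
    public function fractions(): array
    {
        $sum = array_sum($this->totals);
        if ($sum === 0) {
            return [];
        }
        return array_map(fn($ns) => $ns / $sum, $this->totals);
    }
}

$timer = new TimeBreakdown();
$timer->measure('php', fn() => array_sum(range(1, 100000)));
$timer->measure('memcache', fn() => usleep(2000));   // stand-in for a cache call
$timer->measure('database', fn() => usleep(1000));   // stand-in for a query
foreach ($timer->fractions() as $component => $fraction) {
    printf("%-9s %5.1f%%\n", $component, 100 * $fraction);
}
```

Wrapping each cache and database call site in `measure()` gives a per-request breakdown that can be logged and compared over time.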
