Building Scalable Web Architectures


Presentation Transcript


  1. Building Scalable Web Architectures Aaron Bannert aaron@apache.org / aaron@codemass.com

  2. Goal To build a reliable, scalable, cheap, flexible, extendable internet application.

  3. The Age of LAMP What does a LAMP architecture give us?

  4. Scalability • Grows in small steps • Stays up when it counts • Can grow with your traffic • Room for the future

  5. Reliability • High Quality of Service • Minimal Downtime • Stability • Redundancy • Resilience

  6. Low Cost • Little or no software licensing costs • Minimal hardware requirements • Abundance of talent • Reduced maintenance costs

  7. Flexible • Modular Components • Public APIs • Open Architecture • Vendor Neutral • Many options at all levels

  8. Extendable • Free/Open Source Licensing • Right to Use • Right to Inspect • Right to Improve • Plugins • Some Free • Some Commercial • Can always customize

  9. Free as in Beer? Price Speed Quality Pick any two.

  10. LAMP-like Architectures

  11. The Big Picture

  12. External Caching Tier

  13. Web Serving Tier

  14. Application Server Tier

  15. Internal Cache Tier

  16. Database Tier

  17. Misc. Services (DNS, Mail, etc…)

  18. The Glue • Routers • Switches • Firewalls • Load Balancers

  19. Software Choices Building LAMP Software

  20. External Caching Tier

  21. External Caching Tier • What is this? • Squid • Apache’s mod_proxy • Commercial HTTP Accelerator

  22. External Caching Tier • What does it do? • Caches outbound HTTP objects • Images, CSS, XML, HTML, etc… • Flushes Connections • Useful for modem users, frees up web tier • Denial of Service Defense

  23. External Caching Tier • Hardware Requirements • Lots of Memory • Moderate to little CPU • Fast Network • Moderate Disk Capacity • Room for cache, logs, etc… (disks are cheap) • One slow disk is OK • Two Cheapies > One Expensive

  24. External Caching Tier • Other Questions • What to cache? • How much to cache? • Where to cache (internal vs. external)?
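
One way to approach the "what to cache?" question is to have the origin server label each response so the external cache knows how long it may keep it. The sketch below is a minimal illustration, not part of the original talk: the `CACHE_SECONDS` table, its lifetimes, and the `cache_control` helper are hypothetical and would need tuning for a real site.

```python
# Hypothetical policy table: content type -> seconds a shared cache may keep it.
# The lifetimes are illustrative assumptions, not recommendations.
CACHE_SECONDS = {
    "image/png": 86400,
    "image/jpeg": 86400,
    "text/css": 3600,
    "text/html": 60,
}


def cache_control(content_type, personalized=False):
    """Return a Cache-Control header value for an outbound response."""
    if personalized:
        # Per-user pages must not be stored by a shared external cache.
        return "private, no-store"
    max_age = CACHE_SECONDS.get(content_type)
    if max_age is None:
        # Unknown types: the cache must revalidate with the origin before reuse.
        return "no-cache"
    return "public, max-age=%d" % max_age
```

Long lifetimes for images and CSS keep those requests off the web tier entirely, while short or zero lifetimes for HTML limit how stale a page can get.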

  25. Web Serving Tier

  26. Web Serving Tier • What is this? • Apache • thttpd • Tux Web Server • IIS • Netscape

  27. Web Serving Tier • What does it do? • HTTP, HTTPS • Serves Static Content from disk • Generates Dynamic Content • CGI/PHP/Python/mod_perl/etc… • Dispatches requests to the App Server Tier • Tomcat, Weblogic, Websphere, JRun, etc…

  28. Web Serving Tier • Hardware Requirements • Lots and lots of Memory • Memory is main bottleneck in web serving • Memory determines max number of users • Fast Network • CPU depends on usage • Dynamic content needs CPU • Static file serving requires very little CPU • Cheap slow disk, enough to hold your content
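
The claim that memory determines the maximum number of users can be made concrete with a back-of-the-envelope calculation. This sketch assumes a one-process-per-connection server model; the function name and all figures are hypothetical.

```python
def max_concurrent_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate how many server processes fit in RAM without swapping.

    total_ram_mb:   physical memory in the machine
    reserved_mb:    memory set aside for the OS, file cache, etc. (assumed)
    per_process_mb: resident memory of one server process (assumed)
    """
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_process_mb <= 0:
        return 0
    return usable // per_process_mb


# Assumed figures: 4 GB of RAM, 512 MB reserved, 20 MB per process.
print(max_concurrent_clients(4096, 512, 20))  # 179
```

Halving the per-process footprint roughly doubles the number of simultaneous users a machine can hold, which is why memory tends to be the first thing to measure on this tier.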

  29. Web Serving Tier: Zero-copy • Performance Hint • Dedicated static content servers • Modern web servers are very good at serving static content such as • HTML • CSS • Images • Zip/GZ/Tar files

  30. Web Serving Tier • Performance Hint • Stateless Sessions • Each connection is a fresh start • Server remembers nothing • Benefits? • Allows Better Caching • Scales Horizontally
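
One common way to keep the server stateless is to hand the session data back to the client in a signed token, so that any web server can verify it without remembering anything. The sketch below is an illustration under assumptions, not the method from the talk: `SECRET_KEY`, `encode_session`, and `decode_session` are hypothetical names, and the token is signed but not encrypted.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret shared by every web server in the tier.
SECRET_KEY = b"change-me"


def encode_session(data):
    """Serialize session data and append an HMAC-SHA256 signature."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + sig


def decode_session(token):
    """Return the session data, or None if the token was altered."""
    payload, sep, sig = token.encode("utf-8").rpartition(b".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig, expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because every server can validate the token on its own, a load balancer is free to send each request to any machine, which is what lets the tier scale horizontally.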

  31. Web Serving Tier • Choices • How much dynamic content? • When to offload dynamic processing? • When to offload database operations? • When to add more web servers?

  32. Application Server Tier

  33. Application Server Tier • What does it do? • Dynamic Page Processing • JSP • Servlets • Standalone mod_perl/PHP/Python engines • Internal Services • E.g. Search, Shopping Cart, Credit Card Processing

  34. Application Server Tier • How does it work? • Web Tier generates the request using • HTTP (aka “REST”, sort of) • RPC/CORBA • Java RMI • XML-RPC/SOAP • (or something homebrewed) • App Server processes request and responds

  35. Application Server Tier • Caveats • Decoupling of services is GOOD • Manage Complexity using well-defined APIs • Don’t decouple for scaling, change your algorithms! • Remote Calling overhead can be expensive • Marshaling of data • Sockets, net latency, throughput constraints… • XML, SOAP, XML-RPC, yuck (don’t scale well) • Better to use Java’s RMI, good old RPC or even CORBA
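
To give a rough feel for the marshaling cost mentioned above, the sketch below encodes the same small record twice: once as an XML-RPC call using Python's standard `xmlrpc.client`, and once as a fixed binary layout using `struct`. The record, the `cart.add` method name, and the binary layout are hypothetical, and payload size is only one component of remote-call overhead, so treat this as an illustration rather than a benchmark.

```python
import struct
import xmlrpc.client

# Hypothetical record: (item id, unit price, quantity).
order = (42, 19.99, 3)

# XML-RPC marshaling: every value is wrapped in text markup.
xml_payload = xmlrpc.client.dumps(order, methodname="cart.add").encode("utf-8")

# Binary marshaling: network byte order, 4-byte int, 8-byte double, 4-byte int.
binary_payload = struct.pack("!idi", *order)

print(len(xml_payload), "bytes as XML-RPC")
print(len(binary_payload), "bytes as packed binary")
```

The XML form is several times larger and must be parsed as text on the receiving side, whereas the binary form is 16 bytes but requires both ends to agree on the layout in advance.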

  36. Application Server Tier • More Caveats • Remote Calling can introduce new failure scenarios • Classic Distributed Problems • How to detect remote failures? • How long to wait until deciding it’s failed? • How to react to remote failures? • What do we do when all app servers have failed?
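
The questions above (how to detect a failure, how long to wait, how to react) can be sketched as a small failover loop. This is one possible policy under stated assumptions, not the approach prescribed in the talk: `call_with_failover`, `AllServersFailed`, and the caller-supplied `call` function are hypothetical, and the two-second timeout is arbitrary.

```python
class AllServersFailed(Exception):
    """Raised when every app server in the list failed to answer."""

    def __init__(self, errors):
        super().__init__("all %d app servers failed" % len(errors))
        self.errors = errors


def call_with_failover(servers, call, timeout_s=2.0):
    """Try each app server in order and return the first successful response.

    `call(server, timeout_s)` is supplied by the caller and is expected to
    raise OSError (which includes timeouts and refused connections) on failure.
    """
    errors = []
    for server in servers:
        try:
            return call(server, timeout_s)
        except OSError as exc:
            # Detection: a network error or timeout marks this server as failed.
            errors.append((server, exc))
    # Reaction when nothing answered: surface it so the web tier can degrade
    # gracefully (for example, serve a cached or static error page).
    raise AllServersFailed(errors)
```

The timeout bounds how long a user waits on a dead server, and the final exception forces the caller to decide explicitly what happens when the whole tier is down.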

  37. Application Server Tier • Hardware Requirements • Lots and Lots and Lots of Memory • App Servers are very memory hungry • Java was hungry to begin with • Consider going to 64-bit for larger memory-space • Disk depends on application, typically minimal needed • FAST CPU required, and lots of them • (This will be an expensive machine.)

  38. Database Tier

  39. Database Tier • Available DB Products • Free/Open Source DBs • PostgreSQL • GNU DBM • Ingres • SQLite • MySQL • mSQL • Berkeley DB • Commercial • Oracle • MS SQL • IBM DB2 • Sybase • SleepyCat

  40. Database Tier • What does it do? • Data Storage and Retrieval • Data Aggregation and Computation • Sorting • Filtering • ACID properties • (Atomic, Consistent, Isolated, Durable)

  41. Database Tier • Choices • How much logic to place inside the DB? • Use Connection Pooling? • Data Partitioning? • Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
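
Data partitioning as described above needs a rule that maps each record to a slice. A minimal sketch of one such rule, hash-based partitioning, follows; the `shard_for` function and the key format are hypothetical, and other schemes (range-based, directory-based) are equally valid.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to a database slice number in [0, num_shards).

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process and would send the same key to different slices.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical usage: choose one of four slices for a user record.
print(shard_for("user:1001", 4))
```

The trade-off of this simple modulo scheme is that changing `num_shards` remaps most keys, so adding a slice later means moving data.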

  42. Database Tier • Hardware Requirements • Entirely dependent upon application. • Likely to be your most expensive machine(s). • Tons of Memory • Spindles galore • RAID is useful (in software or hardware) • Reliability usually trumps Speed • RAID levels 0, 5, 1+0, and 5+0 are useful • CPU also important • Dual power supplies • Dual Network

  43. Internal Cache Tier

  44. Internal Cache Tier • What is this? • Object Cache • What Applications? • Memcache • Local Lookup Tables • BDB, GDBM, SQL-based • Application-local Caching (eg. LRU tables) • Homebrew Caching (disk or memory)
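
As an illustration of the application-local LRU table mentioned above, here is a minimal sketch built on Python's `collections.OrderedDict`. The `LRUCache` class is hypothetical and deliberately small: it is not thread-safe and has no expiry.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry at the front, i.e. the least recently used one.
            self._items.popitem(last=False)
```

A cache like this lives inside each application process, so it is the fastest option but is not shared between machines; a networked object cache trades some speed for a shared view.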

  45. Internal Cache Tier • What does it do? • Caches objects closer to the Application or Web Tiers • Tuned for your application • Very Fast Access • Scales Horizontally

  46. Internal Cache Tier • Hardware Requirements • Lots of Memory • Note that 32bit processes are typically limited to 2GB of RAM • Little or no disk • Moderate to low CPU • Fast Network

  47. Misc. Services (DNS, Mail, etc…)

  48. Misc. Services (DNS, Mail, etc…) • Why mention these? • Every LAMP system has them • Crucial but often overlooked • Source of hidden problems

  49. Misc. Services: DNS • Important Points • Always have an offsite NS slave • Always have an onsite NS slave • Minimize network latency • Don’t use NAT, load balancers, etc…

  50. Misc. Services: Time Synchronization • Synchronize the clocks on your systems! • Hints: • Use NTPDATE at boot time to set clock • Use NTPD to stay in synch • Don’t ever change the clock on a running system!
