
Building Scalable Web Architectures

Building Scalable Web Architectures. Aaron Bannert aaron@apache.org / aaron@codemass.com. http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt. Goal: How do we build a massive web system out of commodity parts and open source? Agenda: LAMP Overview


Presentation Transcript


  1. Building Scalable Web Architectures Aaron Bannert aaron@apache.org / aaron@codemass.com http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt

  2. Goal • How do we build a massive web system out of commodity parts and open source?

  3. Agenda • LAMP Overview • LAMP Features • Performance • Surviving your first Slashdotting • Growing above a single box • Avoiding Bottlenecks

  4. LAMP Overview Architecture

  5. L A M P Linux Apache MySQL PHP (Perl?)

  6. The Big Picture

  7. External Caching Tier

  8. External Caching Tier • What is this? • Squid • Apache’s mod_proxy • Commercial HTTP Accelerator

  9. External Caching Tier • What does it do? • Caches outbound HTTP objects • Images, CSS, XML, HTML, etc… • Flushes Connections • Useful for modem users, frees up web tier • Denial of Service Defense

  10. External Caching Tier • Hardware Requirements • Lots of Memory • Fast Network • Moderate to little CPU • Moderate Disk Capacity • Room for cache, logs, etc… (disks are cheap) • One slow disk is OK • Two Cheapies > One Expensive

  11. External Caching Tier • Other Questions • What to cache? • How much to cache? • Where to cache (internal vs. external)?

  12. Web Serving Tier

  13. Web Serving Tier • What is this? • Apache • thttpd • Tux Web Server • IIS • Netscape

  14. Web Serving Tier • What does it do? • HTTP, HTTPS • Serves Static Content from disk • Generates Dynamic Content • CGI/PHP/Python/mod_perl/etc… • Dispatches requests to the App Server Tier • Tomcat, Weblogic, Websphere, JRun, etc…

  15. Web Serving Tier • Hardware Requirements • Lots and lots of Memory • Memory is main bottleneck in web serving • Memory determines max number of users • Fast Network • CPU depends on usage • Dynamic content needs CPU • Static file serving requires very little CPU • Cheap slow disk, enough to hold your content
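The claim that memory determines the maximum number of users can be illustrated with simple arithmetic. The sketch below assumes a process-per-connection server; the function name and all the figures (4 GB of RAM, 512 MB reserved for the OS, 20 MB per server process) are hypothetical and only show the shape of the calculation.

```python
def max_concurrent_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: float) -> int:
    """Estimate how many server processes fit in RAM before the box starts swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb  # memory left after the OS and other daemons
    return max(0, int(usable_mb // per_process_mb))


# Hypothetical figures, not measurements from the talk.
estimate = max_concurrent_clients(total_ram_mb=4096, reserved_mb=512, per_process_mb=20)
print(estimate)  # 179
```

Heavier dynamic-content processes raise the per-process figure and lower the ceiling, which is one reason to offload dynamic processing to another tier.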

  16. Web Serving Tier • Choices • How much dynamic content? • When to offload dynamic processing? • When to offload database operations? • When to add more web servers?

  17. Application Server Tier

  18. Application Server Tier • What does it do? • Dynamic Page Processing • JSP • Servlets • Standalone mod_perl/PHP/Python engines • Internal Services • Eg. Search, Shopping Cart, Credit Card Processing

  19. Application Server Tier • How does it work? • Web Tier generates the request using • HTTP (aka “REST”, sort of) • RPC/CORBA • Java RMI • XML-RPC/SOAP • (or something homebrewed) • App Server processes request and responds

  20. Application Server Tier • Caveats • Decoupling of services is GOOD • Manage Complexity using well-defined APIs • Don’t decouple for scaling; change your algorithms! • Remote Calling overhead can be expensive • Marshaling of data • Sockets, net latency, throughput constraints… • XML, SOAP, XML-RPC, yuck (don’t scale well) • Better to use Java’s RMI, good old RPC or even CORBA
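One part of the marshaling cost, message size, can be made concrete. The sketch below marshals the same three values twice: once with Python's `struct` module as a stand-in for a compact binary RPC encoding (it is not RMI or CORBA), and once as an XML-RPC call with `xmlrpc.client.dumps`. The payload and the method name `cart.add` are hypothetical.

```python
import struct
import xmlrpc.client

# Hypothetical payload: an item id, a quantity, and a unit price.
item_id, quantity, price = 1234, 2, 19.99

# Compact binary layout: two unsigned 32-bit ints and one 64-bit float,
# network byte order, no padding.
binary = struct.pack("!IId", item_id, quantity, price)

# The same values marshaled as an XML-RPC method call.
xml = xmlrpc.client.dumps((item_id, quantity, price), methodname="cart.add").encode("utf-8")

print(len(binary), len(xml))
```

Size is only part of the picture: the XML form also has to be parsed on the receiving side, which costs CPU on every call.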

  21. Application Server Tier • More Caveats • Remote Calling introduces new failure scenarios • Classic Distributed Problems • How to detect remote failures? • How long to wait until deciding it’s failed? • How to react to remote failures? • What do we do when all app servers have failed?
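One possible set of answers to these questions, sketched under simple assumptions: treat any socket error or timeout as a failure, wait a fixed number of seconds, move on to the next app server, and return `None` when every server has failed so the caller can serve a fallback page. The function name, the 2-second default, and the injectable `connect` parameter are all illustrative choices, not something from the talk.

```python
import socket


def call_with_failover(servers, request: bytes, timeout_s: float = 2.0,
                       connect=socket.create_connection):
    """Try each (host, port) in turn; return the first response, or None if all fail."""
    for host, port in servers:
        try:
            # The timeout applies to the connect and to later send/recv calls.
            with connect((host, port), timeout=timeout_s) as conn:
                conn.sendall(request)
                return conn.recv(4096)
        except OSError:
            # Covers refused connections, unreachable hosts, and timeouts.
            continue
    return None  # every app server failed; the caller decides how to degrade
```

A fixed timeout is the crudest policy. A real deployment would likely also remember which servers failed recently so that it does not pay the timeout on every request.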

  22. Application Server Tier • Hardware Requirements • Lots and Lots and Lots of Memory • App Servers are very memory hungry • Java was hungry to begin with • Consider going to 64-bit for larger memory-space • Disk depends on application, typically minimal needed • FAST CPU required, and lots of them • (This will be an expensive machine.)

  23. Database Tier

  24. Database Tier • Available DB Products • Free/Open Source DBs • MySQL • PostgreSQL • GNU DBM • Ingres • SQLite • mSQL • Berkeley DB • Commercial • Oracle • MS SQL • IBM DB2 • Sybase • SleepyCat

  25. Database Tier • What does it do? • Data Storage and Retrieval • Data Aggregation and Computation • Sorting • Filtering • ACID properties • (Atomic, Consistent, Isolated, Durable)

  26. Database Tier • Choices • How much logic to place inside the DB? • Use Connection Pooling? • Data Partitioning? • Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
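Data partitioning as described above can be sketched as a function that maps a record key to one of several database slices. This is a minimal hash-based scheme, assuming keys are strings; the slice names `db0` to `db3` and the function names are hypothetical.

```python
import zlib

# Hypothetical logical slices; each would map to its own database or server.
SLICES = ["db0", "db1", "db2", "db3"]


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a key to a slice index using a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def database_for_user(user_id: str) -> str:
    """Pick the slice that holds all rows for the given user."""
    return SLICES[slice_for_key(user_id, len(SLICES))]


print(database_for_user("alice"))
```

The trade-off with plain modulo hashing is that changing the number of slices remaps most keys, so the slice count is worth choosing with growth in mind.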

  27. Database Tier • Hardware Requirements • Entirely dependent upon application. • Likely to be your most expensive machine(s). • Tons of Memory • Spindles galore • RAID is useful (in software or hardware) • Reliability usually trumps Speed • RAID levels 0, 5, 1+0, and 5+0 are useful • CPU also important • Dual power supplies • Dual Network

  28. Internal Cache Tier

  29. Internal Cache Tier • What is this? • Object Cache • What Applications? • Memcache • Local Lookup Tables • BDB, GDBM, SQL-based • Application-local Caching (eg. LRU tables) • Homebrew Caching (disk or memory)
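Application-local caching with an LRU table can be sketched in a few lines. The class below is a minimal illustration built on `collections.OrderedDict`, not a description of any particular product; the class name and capacity are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a" so that "b" becomes the oldest entry
cache.put("c", 3)  # evicts "b"
print(cache.get("b"))  # None
```

A cache like this lives inside one process, so each web or app server process keeps its own copy.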

  30. Internal Cache Tier • What does it do? • Caches objects closer to the Application or Web Tiers • Tuned for your application • Very Fast Access • Scales Horizontally

  31. Internal Cache Tier • Hardware Requirements • Lots of Memory • Note that 32-bit processes are typically limited to 2GB of RAM • Little or no disk • Moderate to low CPU • Fast Network

  32. Misc. Services (DNS, Mail, etc…)

  33. Misc. Services (DNS, Mail, etc…) • Why mention these? • Every LAMP system has them • Crucial but often overlooked • Source of hidden problems

  34. Misc. Services: DNS • Important Points • Always have an offsite NS slave • Always have an onsite NS slave • Minimize network latency • Don’t use NAT, load balancers, etc…

  35. Misc. Services: Time Synchronization • Synchronize the clocks on your systems! • Hints: • Use ntpdate at boot time to set the clock • Use ntpd to stay in sync • Don’t ever change the clock on a running system!

  36. Misc. Services: Monitoring • System Health Monitoring • Nagios • Big Brother • Orcalator • Ganglia • Fault Notification

  37. The Glue • Routers • Switches • Firewalls • Load Balancers

  38. Routers and Switches • Expensive • Complex • Crucial Piece of the System • Hints • Use GigE if you can • Jumbo Frames are GOOD • VLANs to manage complexity • LACP (802.3ad) for failover/redundancy

  39. Load Balancers • What services to balance? • HTTP Caches and Servers, App Servers, DB Slaves • What NOT to balance? • DNS • LDAP • NIS • Memcache • Spread • Anything with its own built-in balancing

  40. Message Busses • What is out there? • Spread • JMS • MQSeries • Tibco Rendezvous • What does it do? • Various forms of distributed message delivery. • Guaranteed Delivery, Broadcasting, etc… • Useful for heterogeneous distributed systems

  41. What about the OS? Operating System Selection

  42. Lots of OS choices • Linux • FreeBSD • NetBSD • OpenBSD • OpenSolaris • Commercial Unix

  43. What’s Important? • Maintainability • Upgrade Path • Security Updates • Bug Fixes • Usability • Do your engineers like it? • Cost • Hardware Requirements • (you don’t need a commercial Unix anymore)

  44. Features to look for • Multi-processor Support • 64-bit Capable • Mature Thread Support • Vibrant User Community • Support for your devices

  45. The Age of LAMP What does LAMP provide?

  46. Scalability • Grows in small steps • Stays up when it counts • Can grow with your traffic • Room for the future

  47. Reliability • High Quality of Service • Minimal Downtime • Stability • Redundancy • Resilience

  48. Low Cost • Little or no software licensing costs • Minimal hardware requirements • Abundance of talent • Reduced maintenance costs

  49. Flexible • Modular Components • Public APIs • Open Architecture • Vendor Neutral • Many options at all levels

  50. Extendable • Free/Open Source Licensing • Right to Use • Right to Inspect • Right to Improve • Plugins • Some Free • Some Commercial • Can always customize
