Optimising Web Server Utilisation and Bandwidth Usage

Optimising Web Server Utilisation and Bandwidth Usage Alex Bagehot – Salmon Ltd

Introduction • Who am I • Who is Salmon

About Salmon • Leading Systems Integrator & IBM Premier Business Partner. • 16 years' experience of translating leading edge technologies into viable solutions. • Specialists in delivering solutions using IBM’s key software products (WebSphere) as well as BEA, Sun & Oracle. • Experts in the management & delivery of fixed price, managed risk contracts. • Part of The Novell Group.

Sample Customers

Agenda • Introduction • Serving Web Content • Compression • Static content, Clustering and Entity tags • Dynamic content • Tools and monitoring • The Edge • Summary

Introduction What are web servers? • Machines that send content to users • Negotiate security and compression • Forward dynamic requests to other servers [Diagram labels: User, Internet, Web Server, Application Server]

Introduction Why Optimise web servers? • User experience depends on response times • Capacity of the site • Cost of bandwidth • Is there a problem? • Planning and monitoring

Introduction What are we aiming to optimise? • Page response time • Network time • Performance of different cache types • Total page and image time • Network bandwidth utilisation • Capacity in terms of how much content you can serve • Expressed in Mbps (megabits per second) • Capacity depends on the hosting hardware • 100Mbps Ethernet • Gigabit Ethernet • Avoid using over 70% of your capacity • CPU utilisation • Use of re-write rules

Introduction Repeatable and iterative performance methodology • Analyse and review requirements • Develop solutions • Monitor key metrics [Diagram labels: Analyse, Implement, Monitor]

Agenda • Introduction • Serving Web Content • Tools • Compression • Static Content, Clustering and Entity tags • Dynamic Content • Tools and Monitoring • The Edge • Summary

Serving Web Content How is content served to Users? • Browsers communicate via the internet with web servers • HTTP (HyperText Transfer Protocol) is used • Data is sent and received in plain text • It is human readable [Diagram: request "GET /home.html"; response "OK" with the text of home.html]

Serving Web Content • Example Web HTTP request
    GET / HTTP/1.1
    Host: www.salmon.com
• Response Snippet
    HTTP/1.1 200 OK
    Date: Tue, 28 Feb 2006 10:52:20 GMT
    Connection: close
    Content-Type: text/html; charset=UTF-8
    <html><head> <title>Salmon - Salmon.com Homepage</title> …

Serving Web Content What is HTTP? • Defined by the RFC 2616 standard • An application-level protocol for distributed, collaborative, hypermedia information systems • A request / response protocol in plain text • Includes several features • Message • Methods • Status codes • Content negotiation • Caching • Security

Serving Web Content How HTTP works: • User agent sends a request including • Request method • URI (Uniform Resource Identifier) • Protocol version • Host header • Message (other headers, client information, body)
    GET / HTTP/1.1
    Host: www.salmon.com
    …

Serving Web Content How HTTP works: • The server responds with • Status (success or error) • Message including • Server information • Meta data • Content
    HTTP/1.1 200 OK
    Server: MyServer
    Date: Tue, 28 Feb 2006 10:52:20 GMT
    Connection: close
    Content-Type: text/html; charset=UTF-8
    <html><head> <title>Salmon Homepage</title> …
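The response structure described on this slide (status line, headers, blank line, content) can be illustrated with a small parser. This is a minimal sketch only: the function name `parse_response` is hypothetical, it assumes CRLF line endings and a complete response held in one string, and it ignores repeated or folded headers.

```python
def parse_response(raw: str):
    """Split a raw HTTP/1.x response into status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")      # a blank line ends the header block
    status_line, *header_lines = head.split("\r\n")
    _version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")        # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Sample response taken from the slide text.
RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Server: MyServer\r\n"
    "Date: Tue, 28 Feb 2006 10:52:20 GMT\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<html><head> <title>Salmon Homepage</title>"
)

status, reason, headers, body = parse_response(RAW)
print(status, reason, headers["content-type"])
```

Header names are lower-cased because HTTP header names are case-insensitive; the values are left untouched.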

Serving Web Content Headers used for: • Compression • Caching • Content negotiation
    HTTP/1.1 200 OK
    Server: MyServer
    Date: Tue, 28 Feb 2006 10:52:20 GMT
    Connection: close
    Content-Type: text/html; charset=UTF-8
    <html><head> <title>Salmon Homepage</title> …

Serving Web Content Summary • Users and Web servers communicate via a plain text protocol called HTTP • HTTP header information (or meta data) is used for many other features including • Caching • Compression • Such features • Improve performance in terms of reduced response times and bandwidth utilisation • Reduce hardware requirements • Bring savings both immediately and down the line

Agenda • Introduction • Serving Web Content • Compression • Static Content, Clustering and Entity tags • Dynamic Content • Tools and Monitoring • The Edge • Summary

Compression Web Pages • Transferred in plain text via HTTP • Written in verbose languages, e.g. HTML, CSS, etc. • Can be large, with repeated text that is ideal for compression Images • Transferred in binary format • Normally these formats are compressed already • GIF, JPEG IBM HTTP Server • Compression must be configured

Compression - Images Images • Normally account for most of the bandwidth utilisation • Reduce image sizes first! • Then ensure that you are using a format that is compressed • GIF: better for simple generated images with few colours • JPEG: better for photo type images • Image compression is the responsibility of web designers and content managers • No specific web server configuration necessary

Compression - pages Web Pages and other ASCII content • Typical saving is 80% of the original size • Are there any specific requirements that may prevent compression? • Supporting specific browsers (IE on Mac) • Compression is negotiated using HTTP headers:
    Request:
    GET /home.html HTTP/1.1
    Accept-Encoding: gzip,deflate
    Response:
    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Encoding: gzip
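The negotiation on this slide can be sketched as follows. This is an illustration under stated assumptions, not how any particular web server implements it: the helper names `accepts_gzip` and `encode_body` are hypothetical, the sample page is invented, and quality values such as `gzip;q=0` are ignored for brevity.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value lists gzip (q-values ignored)."""
    codings = [item.split(";")[0].strip().lower() for item in accept_encoding.split(",")]
    return "gzip" in codings


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body only when the client has said it can accept gzip."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# Hypothetical, highly repetitive page used purely for illustration.
PAGE = b"<p>Blah blah blah</p>\n" * 200

compressed, extra_headers = encode_body(PAGE, "gzip,deflate")
saving = 1 - len(compressed) / len(PAGE)
print(extra_headers, f"saving: {saving:.0%}")
```

Real pages are less repetitive than this sample, so the saving printed here should not be read as a typical figure.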

Compression Summary • If compression is not already in use, enabling it provides large benefits for little effort • Content managers and web designers must • Minimise image use/size where possible • Use appropriate compressed formats • Text files, e.g. web pages, JavaScript, etc. • Can be compressed on the web server when the request is served • Web server needs configuration • Not all responses will be compressed: this depends on how many clients indicate that they can accept compressed content, typically two thirds • Typical saving is 80% • If images are already in a compressed format the overall saving will only be 80% of the bandwidth used by pages • Small additional CPU cost
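The figures in this summary can be combined into a rough estimate of the overall saving. The sketch below simply multiplies the shares together; the function name `overall_saving` and the 30% page share are hypothetical values chosen for illustration, while the two-thirds client share and 80% per-page saving come from the slide.

```python
def overall_saving(page_share: float,
                   gzip_client_share: float = 2 / 3,
                   per_page_saving: float = 0.80) -> float:
    """Estimated fraction of total bandwidth saved by compressing text content.

    page_share        -- fraction of total bandwidth used by pages and other text
    gzip_client_share -- fraction of requests from clients that accept compression
    per_page_saving   -- size reduction achieved on each compressed response
    """
    return page_share * gzip_client_share * per_page_saving


# If all bandwidth were text, roughly half would be saved.
text_only = overall_saving(page_share=1.0)

# Hypothetical site where pages are 30% of bandwidth and images the rest.
mixed_site = overall_saving(page_share=0.3)

print(f"text only: {text_only:.1%}, mixed site: {mixed_site:.1%}")
```

The estimate assumes images gain nothing from HTTP compression because they are already in a compressed format.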

Static content, Clustering and Entity tags Overview of HTTP/1.1 Caching • Attempt to avoid the round-trip of sending a request, else avoid sending a full response if possible • Content can be cached at various locations • Expiration Model • Validation Model [Diagram labels: User (Browser cache), Internet, Proxy server (Internet cache), Web server (The ‘Origin’)]

Static content, Clustering and Entity tags Scenario • Browser requests an image that is not in the browser cache • Server sends the image with caching meta data • Caching headers: • Last-Modified (validation) • ETag (validation) • Expires (expiration)
    Request:
    GET /logo.gif HTTP/1.1
    Response:
    HTTP/1.1 200 OK
    Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT
    ETag: "228-a198ca00-45ncn"
    Expires: Tue, 28 Feb 2006 17:00:00 GMT
[Diagram annotation on logo.gif: Thu, 08 Sep 2005 13:05:45 GMT, ETag= "228-a198ca00-45ncn"]

Static content, Clustering and Entity tags Scenario continued… • Browser requests the image again (after 17:00 GMT on 28th Feb 2006, when the cached copy has expired) • Server determines if the image is still valid
    Request:
    GET /logo.gif HTTP/1.1
    If-None-Match: "228-a198ca00-45ncn"
    If-Modified-Since: Wed, 18 Jan 2006 13:00:24 GMT
    Response:
    HTTP/1.1 304 Not Modified
    Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT
    ETag: "228-a198ca00-45ncn"
[Diagram annotation on logo.gif: Thu, 08 Sep 2005 13:05:45 GMT, ETag= "228-a198ca00-45ncn"]
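The validation step in this scenario can be sketched as a function that decides between 200 and 304. This is a simplified illustration, not a full implementation of the HTTP rules: the function name `conditional_status` is hypothetical, only strong comparison of entity tags is done, `*` is not handled, and If-None-Match is checked in preference to If-Modified-Since when both are present.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry a comma-separated list of entity tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        unchanged = (parsedate_to_datetime(last_modified)
                     <= parsedate_to_datetime(if_modified_since))
        return 304 if unchanged else 200

    return 200  # unconditional request: send the full content


# Values taken from the slide.
ETAG = '"228-a198ca00-45ncn"'
LAST_MODIFIED = "Wed, 18 Jan 2006 13:00:24 GMT"

print(conditional_status({"If-None-Match": ETAG}, ETAG, LAST_MODIFIED))
```

A 304 response carries headers only, which is why it uses far less bandwidth than re-sending the image.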

Static content, Clustering and Entity tags Caching and Expiration • The server provides a TTL for a resource • Apache uses the mod_expires module • Example expiry setting:
    ExpiresActive on
    ExpiresByType text/html "access plus 1 hour"
    ExpiresByType image/gif "access plus 30 minutes"
• Static content must be analysed for TTLs • Can be defined by content type (above) • Or by location

Static content, Clustering and Entity tags Most Web sites require more than 1 web server • As will be seen, this can cause wasted bandwidth • Waste occurs when static content is served from a web server cluster and is cached by users Scenario: [Diagram labels: user, Internet cache, Load balancer, Web servers (three servers)]

Static content, Clustering and Entity tags • Entity tags (ETag) are used in the HTTP validation model • How are ETags calculated? • Apache has a default: • ETag = inode - mtime - size • Encoded values are concatenated together • The ETag value for the same resource can therefore differ between web nodes

Static content, Clustering and Entity tags • ETags need to match across web servers, else: • User requests image • Served from server1 • Cached in the browser for 10 minutes • User makes another request 10 minutes later • Conditional request made to server2 • ETag differs • Image served again (response code 200)
    Logo.gif ETag value per web server:
    server1  "ef6-479-8f6"
    server2  "da4-4e6-8f6"
    server3  "2ee-a9f-8f6"
[Diagram labels: user, Internet cache, Load balancer, Web servers; annotation: Logo.gif ETag= "ef6-479-8f6"]

Static content, Clustering and Entity tags • Entity tags (ETag) should be the same for a resource across all web nodes • Re-define ETag in Apache (the FileETag directive) • ETag = mtime - size • Ensure file timestamps are equal on every node • touch -t [[CC]YY]MMDDhhmm[.SS] • ‘Not modified’ (304) response is then sent instead of the ‘OK’ (200) full content • Not modified response uses less bandwidth
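The effect of dropping the inode from the entity tag can be shown with a small sketch. The function `make_etag` is hypothetical and only mimics the idea of concatenating hex-encoded fields in the order given on the slides; it is not Apache's exact encoding, and the inode, mtime and size values are invented.

```python
def make_etag(inode: int, mtime: int, size: int, include_inode: bool = True) -> str:
    """Build an entity tag from hex-encoded file attributes joined with '-'."""
    fields = [mtime, size]
    if include_inode:
        fields.insert(0, inode)
    return '"' + "-".join(format(value, "x") for value in fields) + '"'


# The same file deployed to two web nodes: identical timestamp and size,
# but each filesystem has allocated a different inode (hypothetical values).
MTIME, SIZE = 1137589224, 552
server1_default = make_etag(0xEF6, MTIME, SIZE)
server2_default = make_etag(0xDA4, MTIME, SIZE)

server1_fixed = make_etag(0xEF6, MTIME, SIZE, include_inode=False)
server2_fixed = make_etag(0xDA4, MTIME, SIZE, include_inode=False)

print(server1_default, server2_default)  # differ: conditional requests fall back to 200
print(server1_fixed, server2_fixed)      # match: conditional requests can return 304
```

The match only holds if the file timestamps are identical on every node, which is why the deployment step matters as much as the ETag definition.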

Static content, Clustering and Entity tags Re-define ETag in Apache (in stages) • ETag = mtime - size • Default is: inode - mtime - size [Chart: Bandwidth against time, annotated "Change ETag", "Invalidated on all caches" and "Content Re-cached"]

Static content, Clustering and Entity tags Summary • In a clustered environment care must be taken with static content and the way it is deployed • ETag must be modified • Files must be stored on cloned web servers with the same timestamp on each • When this is done • Cache hit ratio is improved • Potentially 10% of bandwidth is saved; the saving is not instant but improves over days

Dynamic Content • Cache static and dynamic content close to the user • Content can be cached in several locations: • Client cache • Content delivery network, proxy caches • External cache, reverse proxy • WebSphere Dynacache [Diagram labels: User (Browser cache), Internet, Proxy server (Internet cache), Reverse Proxy, Origin Web server, App server (Dynacache)]

Dynamic Content • Traditional HTTP caching: Cache-Control header • Candidates for caching include: • Frequently accessed pages • Pages that do not contain volatile data • Pages that contain content applicable to many users • Pages that do not contain sensitive information • Cache-Control response headers • max-age: defines the length of time the content remains in the cache, overrides ‘Expires’ • no-store: pages cannot be cached at all • private: pages can only be cached in ‘private’ caches (browser) • WebSphere dynacache provides fragment caching, cache ids and ESI for greater control

Dynamic Content • Traditional HTTP caching: cache-control header • Apache sets a max-age cache-control header with the ExpiresActive directive for static content:
    httpd.conf:
    <LocationMatch /staticcontent/.*>
        ExpiresActive on
        ExpiresDefault "access plus 90 minutes"
    </LocationMatch>
    Response headers:
    Date: Tue, 28 Feb 2006 10:04:29 GMT
    Cache-Control: max-age=5400
    Expires: Tue, 28 Feb 2006 11:34:29 GMT
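The relationship between the 90 minute setting and the two response headers on this slide can be checked with a short calculation. This sketch only reproduces the arithmetic; the function name `expiry_headers` is hypothetical and it is not a description of how mod_expires is implemented.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time: datetime, ttl: timedelta) -> dict:
    """Derive Cache-Control and Expires values from an 'access plus <ttl>' rule."""
    return {
        "Cache-Control": f"max-age={int(ttl.total_seconds())}",
        # usegmt=True requires a UTC datetime and renders the zone as "GMT".
        "Expires": format_datetime(access_time + ttl, usegmt=True),
    }


# Date header from the slide: Tue, 28 Feb 2006 10:04:29 GMT
served_at = datetime(2006, 2, 28, 10, 4, 29, tzinfo=timezone.utc)
headers = expiry_headers(served_at, timedelta(minutes=90))
print(headers)
```

Ninety minutes is 5400 seconds, which gives the max-age value, and adding it to the Date header gives the Expires time shown in the response.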

Dynamic Content • Traditional HTTP caching: cache-control header • By default dynamic pages should be served with no-cache/no-store:
    Request:
    GET /webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=10001&catalogId=10151 HTTP/1.1
    Host: www.halfords.com
    Response headers:
    Expires: Thu, 01 Dec 1994 16:00:00 GMT
    Cache-Control: no-cache="set-cookie,set-cookie2"

Dynamic Content Internet proxy servers • Caching controlled using HTTP headers • Expires • Cache-Control (s-maxage) • If these are missing, the proxy may use heuristics (rule of thumb, 20% of age) • Impossible to purge this kind of cache as it is not in our control! • must-revalidate feature [Diagram labels: User (Browser cache), Internet, Proxy server (Internet cache, governed by cache-control), Origin Web server, App server (Dynacache)]
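The "20% of age" rule of thumb can be illustrated with a short calculation. This is a sketch under assumptions: it takes "age" to mean the time between the resource's Last-Modified value and the response Date, the function name `heuristic_freshness` is hypothetical, and real proxies differ in the percentage and upper limits they apply.

```python
from datetime import datetime, timedelta, timezone


def heuristic_freshness(response_date: datetime,
                        last_modified: datetime,
                        percent: int = 20) -> timedelta:
    """Estimate how long a cache may treat a response as fresh when no
    explicit expiry information was sent."""
    age = response_date - last_modified
    if age < timedelta(0):
        return timedelta(0)          # clock skew: treat as immediately stale
    return age * percent // 100      # integer arithmetic keeps the result exact


# Hypothetical resource last changed 10 days before it was served.
modified = datetime(2006, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
served = datetime(2006, 2, 28, 12, 0, 0, tzinfo=timezone.utc)
print(heuristic_freshness(served, modified))  # 2 days, 0:00:00
```

Content that has not changed for a long time can therefore stay in a public proxy for a long time, which is one reason to send explicit expiry headers.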

Dynamic Content • Reverse proxy (IBM Caching proxy) • Extra layer in front of WebSphere dynacache to improve performance • Disk cache provides more capacity than (memory) webserver plugin • Integration with dynacache • Re-write rules may cause problems [Diagram labels: User (Browser cache), Internet, Proxy server (Internet cache), Reverse Proxy, Origin Web server, App server (Dynacache)]

Dynamic Content • WAS5/6 Web server plugin external cache • Used to cache jsp page results on the Web server • Invalidation can be controlled from the app server cache • Cached content stored in memory only • Memory utilisation important • Separate copies of the cache per Apache process… ThreadsPerChild [Diagram labels: User (Browser cache), Internet, Proxy server (Internet cache), Reverse Proxy, Origin Web server, App server (Dynacache)]

Dynamic Content Summary • Dynamic content can be cached • On the web server to reduce app server load • In proxy or browser caches • Care must be taken to identify which pages are to be cached • Personalisation • Lifetime in cache • Whole pages or fragments • Take care when caching on public proxy servers • private, no-store, must-revalidate

Tools and monitoring • Is there a problem? • Browser proxy • Where are the bottlenecks? • What to monitor? Know thy enemy • Hardware information • Memory • CPU, processes • Disk (logs) • Network • Web Logs • Process using awk and other Unix tools • Hits, total time, % of hits over 1 second, bytes served • HTTP Response codes • A long reporting interval obscures problems: 10 minutes, 1 hour, 1 day • Cache statistics • Cache monitor application • PMI cache statistics

Tools and monitoring • Is there a problem? • Browser proxies • Fiddler • Page Detailer

Tools and monitoring • What to monitor? • Hardware information • Memory • vmstat, svmon • lsps -a • CPU, processes • vmstat • Disk (logs and static content) • iostat • Network utilisation can be estimated • entstat, example:
    entstat ${interface} | grep Bytes | awk '{print "tx= "$2" rcv= "$4}' >> ${outfile}
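Byte counters sampled at a fixed interval, such as those logged by the command above, can be turned into a utilisation estimate. The sketch below assumes the counters are cumulative and that two consecutive samples are available; the function names and the sample values are hypothetical.

```python
def throughput_mbps(bytes_before: int, bytes_after: int, interval_seconds: float) -> float:
    """Average throughput in megabits per second between two counter samples."""
    bits = (bytes_after - bytes_before) * 8
    return bits / interval_seconds / 1_000_000


def link_utilisation(mbps: float, link_mbps: float) -> float:
    """Fraction of the link capacity in use."""
    return mbps / link_mbps


# Hypothetical transmit counter sampled 60 seconds apart on 100Mbps Ethernet.
tx_mbps = throughput_mbps(1_000_000_000, 1_525_000_000, 60)
utilisation = link_utilisation(tx_mbps, link_mbps=100)

print(f"tx = {tx_mbps:.1f} Mbps, utilisation = {utilisation:.0%}")
if utilisation >= 0.70:
    print("at or above the 70% guideline")
```

Shorter sampling intervals show peaks that a long average would hide, at the cost of more log data.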

Tools and monitoring • What to monitor? • Web Logs • Common Log format: host, user identity, user name, timestamp, request, response, size • Example awk for home.htm hits and bytes:
    index($7, "home.htm") {homehits = homehits + 1; homebytes = homebytes + $10}
    END {print "homehits= " homehits " homebytes= " homebytes}

Tools and monitoring • What to monitor? • Web Logs • Common Log format: host, user identity, user name, timestamp, request, response, size • Example awk for HTTP Response codes (share of 304 among 200 and 304 responses; the request path is field $7 in Common Log format):
    index($7, "home.htm") && $9 == 200 {h200 = h200 + 1}
    index($7, "home.htm") && $9 == 304 {h304 = h304 + 1}
    END {if (h200 + h304 > 0) print "home 304 / any response ratio = " h304 / (h304 + h200)}
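The same log summaries can be sketched in Python for cases where awk is not convenient. This assumes whitespace-separated Common Log format lines, in which the request path is the 7th field, the status the 9th and the size the 10th; the function name `summarise` and the sample log lines are hypothetical.

```python
def summarise(lines, needle="home.htm"):
    """Count hits, bytes and the 304 share for requests whose path contains needle."""
    hits = total_bytes = h200 = h304 = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or needle not in fields[6]:
            continue
        hits += 1
        if fields[9].isdigit():          # size is "-" when no body was sent
            total_bytes += int(fields[9])
        if fields[8] == "200":
            h200 += 1
        elif fields[8] == "304":
            h304 += 1
    validated = h200 + h304
    ratio_304 = h304 / validated if validated else None
    return {"hits": hits, "bytes": total_bytes, "ratio_304": ratio_304}


# Hypothetical sample log lines.
SAMPLE = [
    '192.0.2.1 - - [28/Feb/2006:10:52:20 +0000] "GET /home.htm HTTP/1.1" 200 2326',
    '192.0.2.2 - - [28/Feb/2006:10:52:25 +0000] "GET /home.htm HTTP/1.1" 304 -',
    '192.0.2.1 - - [28/Feb/2006:10:52:26 +0000] "GET /logo.gif HTTP/1.1" 200 552',
]

summary = summarise(SAMPLE)
print(summary)
```

A rising 304 share for static content suggests that validation is working and less content is being re-sent in full.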

Tools and monitoring • Summary • Is there a problem? • Where are the bottlenecks? • What to monitor? • Hardware information • Memory, CPU, processes, disk, network • Web Logs • Process using awk and other Unix tools • WAS Cache monitor application • PMI cache statistics
