Optimising web server utilisation and bandwidth usage

Alex Bagehot, Salmon Ltd.


Introduction

  • Who am I

  • Who is Salmon



About Salmon

  • Leading Systems Integrator & IBM Premier Business Partner.

  • 16 years' experience of translating leading-edge technologies into viable solutions.

  • Specialists in delivering solutions using IBM’s key software products (WebSphere), as well as BEA, Sun & Oracle.

  • Experts in the management & delivery of fixed price, managed risk contracts.

  • Part of The Novell Group.




Agenda

  • Introduction

  • Serving Web Content

  • Compression

  • Static content, Clustering and Entity tags

  • Dynamic content

  • Tools and monitoring

  • The Edge

  • Summary



Introduction

What are web servers?

  • Machines that send content to users

  • Negotiate security and compression

  • Forward dynamic requests to other servers (e.g. application servers)

[Diagram: User → Internet → Web server → Application server]



Introduction

Why Optimise web servers?

  • User experience depends on response times

  • Capacity of the site

  • Cost of bandwidth

  • Is there a problem?

    • planning and monitoring



Introduction

What are we aiming to optimise?

  • Page response time

    • Network time

    • Performance of different cache types

    • Total page and image time

  • Network bandwidth utilisation

    • Capacity in terms of how much content you can serve

    • Expressed in Mbps (megabits per second)

    • Capacity depends on the hosting hardware

      • 100Mbps ethernet

      • Gigabit ethernet

    • Avoid using over 70% of your capacity

  • CPU utilisation

    • Use of rewrite rules
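The 70% guideline can be turned into a rough capacity figure. This is a minimal sketch under stated assumptions: every page view (HTML plus images) is assumed to transfer a fixed average number of bytes, protocol overhead is ignored, and the function name `max_pages_per_second` and the 100 KB page size are hypothetical, chosen only for illustration.

```python
def max_pages_per_second(link_mbps: float, avg_page_bytes: int,
                         max_utilisation: float = 0.7) -> float:
    """Estimate how many pages per second a link can serve while staying
    under the given utilisation ceiling (70% by default).

    Assumes a fixed average page size and ignores TCP/HTTP overhead.
    """
    usable_bits_per_second = link_mbps * 1_000_000 * max_utilisation
    bits_per_page = avg_page_bytes * 8
    return usable_bits_per_second / bits_per_page


if __name__ == "__main__":
    # Hypothetical 100 KB average page on the two link types listed above.
    for link in (100, 1000):  # 100Mbps and Gigabit ethernet
        print(f"{link} Mbps: {max_pages_per_second(link, 100_000):.1f} pages/s")
```

Under these assumptions, a 100Mbps link tops out at about 87 pages per second and a Gigabit link at about 875. Halving the page size, for example through compression, doubles the figure.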


    Introduction

    Repeatable and iterative performance methodology

    [Diagram: a repeating cycle of Analyse → Implement → Monitor]

    • Analyse and review requirements

    • Develop solutions

    • Monitor key metrics





    Serving Web Content

    How is content served to Users?

    • Browsers communicate via the internet with web servers

    • HTTP (HyperText Transfer Protocol) is used

    • Requests and response headers are sent as plain text

      • They are human readable

    [Diagram: the browser sends "GET /home.html"; the web server replies "OK" followed by the text of home.html]



    Serving Web Content

    • Example HTTP request:

    GET / HTTP/1.1

    Host: www.salmon.com

    • Response snippet:

    HTTP/1.1 200 OK

    Date: Tue, 28 Feb 2006 10:52:20 GMT

    Connection: close

    Content-Type: text/html; charset=UTF-8

    <html><head>

    <title>Salmon - Salmon.com Homepage</title>
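HTTP/1.1 messages are plain text, so ordinary string handling is enough to build a request and read a response. This minimal sketch is for illustration only. It assumes CRLF line endings, a text body and no repeated headers, and the function names are hypothetical. Real clients should use a library such as Python's `http.client`.

```python
def build_request(path: str, host: str) -> str:
    """Build a minimal HTTP/1.1 GET request (request line + Host header)."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_response(raw: str):
    """Split a raw HTTP/1.1 response into status code, reason, headers and body.

    Header names are lower-cased because HTTP header names are case-insensitive.
    """
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    raw = ("HTTP/1.1 200 OK\r\n"
           "Date: Tue, 28 Feb 2006 10:52:20 GMT\r\n"
           "Connection: close\r\n"
           "Content-Type: text/html; charset=UTF-8\r\n"
           "\r\n"
           "<html><head><title>Salmon - Salmon.com Homepage</title>")
    print(repr(build_request("/", "www.salmon.com")))
    code, reason, headers, body = parse_response(raw)
    print(code, reason, headers["content-type"])
```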



    Serving Web Content

    What is HTTP?

    • Defined by the RFC 2616 standard (since superseded by RFCs 7230 to 7235, and later by RFCs 9110 to 9112)

      • An application-level protocol for distributed, collaborative, hypermedia information systems

      • A request / response protocol in plain text

      • Includes several features

        • Message

        • Methods

        • Status codes

        • Content negotiation

        • Caching

        • Security



    Serving Web Content

    How it works:

    • User agent sends request including

      • Request method

      • URI (Uniform Resource Identifier)

      • Protocol version

      • Host header

      • Message (other headers, client information, body)

    GET / HTTP/1.1

    Host: www.salmon.com



    Serving Web Content

    How HTTP works:

    • The server responds with

      • Status (success or error)

      • Message including

        • Server information

        • Metadata (headers)

        • Content

    HTTP/1.1 200 OK

    Server: MyServer

    Date: Tue, 28 Feb 2006 10:52:20 GMT

    Connection: close

    Content-Type: text/html; charset=UTF-8

    <html><head>

    <title>Salmon Homepage</title>



    Serving Web Content

    Headers used for:

    • Compression

    • Caching

    • Content negotiation




    Serving Web Content

    Summary

    • Users and Web servers communicate via a plain text protocol called HTTP

    • HTTP header information (or metadata) is used for many other features, including

      • Caching

      • Compression

      • Such features

        • improve performance by reducing response times and bandwidth utilisation

        • reduce hardware requirements

        • give savings both immediately and further down the line





    Compression

    Web Pages

    • Transferred in plain text via HTTP

    • Written in verbose languages, e.g. HTML, CSS

    • Often large, with repeated text that is ideal for compression

    Images

    • Transferred in binary format

    • Normally these formats are already compressed

      • GIF, JPEG

    IBM HTTP Server

    • Compression must be configured
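Python's standard `gzip` module shows the difference between text and already-compressed content. In this sketch, a block of repetitive HTML-like markup stands in for a web page and random bytes stand in for a JPEG or GIF. Both samples are made up. The markup is far more repetitive than a real page, so it compresses better than the typical 80% quoted later.

```python
import gzip
import os


def gzip_saving(data: bytes) -> float:
    """Return the fraction of bytes saved by gzip (negative if the output grows)."""
    return 1 - len(gzip.compress(data)) / len(data)


# Hypothetical page: verbose markup with lots of repeated text.
page = ('<tr><td class="product">Blah blah blah</td></tr>\n' * 500).encode()

# Random bytes behave like an already-compressed image: no redundancy is left.
image_like = os.urandom(len(page))

if __name__ == "__main__":
    print(f"page:  {gzip_saving(page):.0%} saved")
    print(f"image: {gzip_saving(image_like):.0%} saved")
```

The markup shrinks by well over 80%, while the random data saves nothing and actually grows slightly. This is why images gain nothing from HTTP compression.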



    Compression - Images

    Images

    • Normally account for most of the bandwidth utilisation

    • Reduce image sizes first!

    • Then ensure that you are using a format that is compressed

      • GIF – better for simple generated images, few colours

      • JPEG – better for photo type images

    • Image compression is the responsibility of web designers and content managers

    • No specific web server configuration necessary


    Compression - pages

    Web pages and other ASCII content

    • Typical saving is 80% of the original size

    • Are there any specific requirements that may prevent compression?

      • Supporting specific browsers (IE on Mac)

    • Compression is negotiated using HTTP headers:

    Request:

    GET /home.html HTTP/1.1

    Accept-Encoding: gzip,deflate

    Response:

    HTTP/1.1 200 OK

    Content-Type: text/html

    Content-Encoding: gzip
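On the server side, the negotiation amounts to checking `Accept-Encoding` before compressing. The sketch below is hypothetical and simplified. It only recognises the `gzip` token with an optional `q` value and ignores `*` and `deflate`. It also adds a `Vary: Accept-Encoding` header so that shared caches keep compressed and uncompressed copies apart. In Apache-based servers such as IBM HTTP Server, this is normally handled by configuration rather than code (for example with mod_deflate in Apache 2.x).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """True if the Accept-Encoding value lists gzip with a non-zero q value.

    Simplified: '*' and other codings are deliberately ignored.
    """
    for item in accept_encoding.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def build_response(body: bytes, content_type: str, accept_encoding: str):
    """Return (headers, body), gzip-compressing the body when the client allows it."""
    headers = {"Content-Type": content_type, "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

For example, `build_response(page, "text/html", "gzip,deflate")` returns a gzip body with `Content-Encoding: gzip`. A client that sends only `identity` gets the page unchanged.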



    Compression

    Summary

    • If compression is not already used, enabling it gives large benefits for small effort

    • Content managers and web designers must

      • Minimise image use/size where possible

      • Use appropriate compressed formats

    • Text files, e.g. web pages, JavaScript, etc.

      • Can be compressed on the web server when the request is served

      • Web server needs configuration

      • Not all responses will be compressed: this depends on how many clients indicate that they can accept compressed content (typically two thirds)

    • Typical saving is 80%

      • If images are already in a compressed format, the 80% saving applies only to the bandwidth used by pages, not to the total

    • Small additional CPU cost
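The last points combine into a simple estimate of the overall bandwidth saving. In the sketch below, the share of bandwidth that is text (40%) is a hypothetical figure. The default of two thirds of clients accepting compression and the 80% typical saving are the figures quoted in the summary. The function name is also hypothetical.

```python
def overall_saving(text_share: float,
                   accepting_clients: float = 2 / 3,
                   text_saving: float = 0.8) -> float:
    """Fraction of total bandwidth saved by compressing text responses.

    text_share        -- fraction of bandwidth used by text (pages, scripts, ...)
    accepting_clients -- fraction of requests from clients accepting compression
    text_saving       -- fraction of each text response removed by compression
    Images are assumed to be already compressed, so they contribute no saving.
    """
    return text_share * accepting_clients * text_saving


if __name__ == "__main__":
    # Hypothetical site where 40% of the bandwidth is text.
    print(f"{overall_saving(0.4):.1%}")
```

So even with an 80% saving on pages, the total saving for this hypothetical site is about 21%.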




    Static content, Clustering and Entity tags

    Overview of HTTP/1.1 Caching

    [Diagram: user (browser cache) → Internet → proxy server (Internet cache) → web server (the ‘Origin’)]

    • Aim: avoid sending a request at all where possible, and otherwise avoid sending a full response

    • Content can be cached at various locations

    • Expiration Model (avoids the request)

    • Validation Model (avoids the full response)


    Static content, Clustering and Entity tags

    Scenario

    • Browser requests an image that is not in the browser cache

    • Server sends the image with caching metadata

      • Caching headers:

        • Last-Modified (validation)

        • ETag (validation)

        • Expires (expiration)

    GET /logo.gif HTTP/1.1

    HTTP/1.1 200 OK

    Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT

    ETag: "228-a198ca00-45ncn"

    Expires: Tue, 28 Feb 2006 17:00:00 GMT

    [Diagram: logo.gif is stored in the browser cache with ETag "228-a198ca00-45ncn"]


    Static content, Clustering and Entity tags

    Scenario continued…

    • Browser requests the image again (after 17:00 GMT on 28 Feb 2006, when the cached copy has expired)

    GET /logo.gif HTTP/1.1

    If-None-Match: "228-a198ca00-45ncn"

    If-Modified-Since: Wed, 18 Jan 2006 13:00:24 GMT

    HTTP/1.1 304 Not Modified

    Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT

    ETag: "228-a198ca00-45ncn"

    • Server determines whether the image is still valid: it has not changed, so a 304 Not Modified response is sent without the image
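The server's decision in this scenario can be modelled as a function that compares the browser's validators with the current ones. This is a simplified illustration, not Apache's implementation. It uses exact (strong) ETag comparison and follows the later HTTP/1.1 specification (RFC 7232), which gives `If-None-Match` precedence over `If-Modified-Since`. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    Simplified: exact ETag comparison only (no weak W/ handling).
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: send the full response
        if parsedate_to_datetime(last_modified) <= since:
            return 304  # not modified since the client's copy
    return 200
```

With the headers from this slide, the function returns 304. A request carrying a different ETag, or no validators at all, gets the full 200 response.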



    Static content, Clustering and Entity tags

    Caching and Expiration

    • The server provides a TTL (time to live) for a resource

    • Apache uses the mod_expires module

    • Example expiry settings:

    ExpiresActive on

    ExpiresByType text/html "access plus 1 hour"

    ExpiresByType image/gif "access plus 30 minutes"

    • Static content must be analysed for TTLs

      • Can be defined by content type (as above)

      • Or by location
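When analysing TTLs, it helps to turn the `access plus ...` strings used by mod_expires into seconds. The sketch below is a hypothetical helper, not mod_expires itself. It handles only second, minute, hour, day and week units (singular or plural) and the `access`, `now` and `modification` base times. The `ttl_for` lookup mimics ExpiresByType with an optional default.

```python
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600,
                "day": 86400, "week": 604800}


def parse_expires_interval(spec: str) -> tuple:
    """Parse e.g. 'access plus 1 hour' into ('access', 3600).

    Several '<number> <unit>' pairs may follow 'plus', as in
    'access plus 1 day 12 hours'.
    """
    words = spec.split()
    if len(words) < 4 or words[1].lower() != "plus":
        raise ValueError(f"unsupported expiry spec: {spec!r}")
    base = words[0].lower()
    if base == "now":  # 'now' is a synonym for 'access'
        base = "access"
    if base not in ("access", "modification"):
        raise ValueError(f"unknown base time: {words[0]!r}")
    pairs = words[2:]
    if len(pairs) % 2:
        raise ValueError(f"number without a unit in {spec!r}")
    total = 0
    for number, unit in zip(pairs[0::2], pairs[1::2]):
        unit = unit.lower().rstrip("s")  # 'minutes' -> 'minute'
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unsupported unit: {unit!r}")
        total += int(number) * UNIT_SECONDS[unit]
    return base, total


def ttl_for(content_type: str, by_type: dict, default=None):
    """Return the TTL in seconds for a content type, like ExpiresByType/ExpiresDefault."""
    spec = by_type.get(content_type, default)
    return None if spec is None else parse_expires_interval(spec)[1]


if __name__ == "__main__":
    rules = {"text/html": "access plus 1 hour",
             "image/gif": "access plus 30 minutes"}
    for ctype in ("text/html", "image/gif", "text/css"):
        print(ctype, ttl_for(ctype, rules))
```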



    Static content, Clustering and Entity tags

    Most Web sites require more than 1 web server

    • As will be seen, this can cause wasted bandwidth

    • Waste occurs when static content is served from a web server cluster and is cached by users

      Scenario:

    [Diagram: user (browser cache) → Internet → load balancer → cluster of three web servers]



    Static content, Clustering and Entity tags

    • Entity tags (ETag) are used in the HTTP validation model

    • How are ETags calculated?

      • Apache has a default:

        • ETag = INode - MTime - Size

        • The encoded values are concatenated, separated by hyphens

    • As a result, the ETag value for the same resource does not match across web nodes

    Static content, Clustering and Entity tags

    [Diagram: user (browser cache holding logo.gif with ETag "ef6-479-8f6") → Internet → load balancer → web servers. The ETag of logo.gif is "ef6-479-8f6" on server1, "da4-4e6-8f6" on server2 and "2ee-a9f-8f6" on server3.]

    • ETags need to match across web servers, else:

    • User requests image

      • Served from server1

      • Cached in the browser for 10 minutes

    • User makes another request 10 minutes later

      • Conditional request made to server2

      • ETag differs

      • Image served again in full (response code 200)



    Static content, Clustering and Entity tags

    • Entity tags (ETag) should be the same for a resource across all web nodes

    • Re-define the ETag in Apache (FileETag MTime Size)

      • ETag = MTime - Size

    • Ensure file timestamps are equal on every web server

      • touch -t [[CC]YY]MMDDhhmm[.SS]

    • ‘Not modified’ (304) response sent instead of ‘OK’ (200) full content

      • Not modified response uses less bandwidth
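Building ETags from the slide's fields shows why both steps are needed. The helpers below are hypothetical. They follow the INode - MTime - Size order and hex encoding shown on the slides, but the exact format varies between Apache versions. The inode and timestamp values are invented to reproduce the three ETags shown for logo.gif.

```python
def make_etag(*fields: int) -> str:
    """Hex-encode integer fields and join them with '-' inside double quotes."""
    return '"' + "-".join(format(field, "x") for field in fields) + '"'


def default_etag(inode: int, mtime: int, size: int) -> str:
    """Default-style ETag: INode - MTime - Size (differs per server)."""
    return make_etag(inode, mtime, size)


def cluster_safe_etag(mtime: int, size: int) -> str:
    """Redefined ETag: MTime - Size (identical if timestamps are identical)."""
    return make_etag(mtime, size)


if __name__ == "__main__":
    size = 0x8F6  # the same logo.gif on every server
    # Hypothetical (inode, mtime) pairs before timestamps were aligned.
    servers = {"server1": (0xEF6, 0x479), "server2": (0xDA4, 0x4E6),
               "server3": (0x2EE, 0xA9F)}
    for name, (inode, mtime) in servers.items():
        print(name, default_etag(inode, mtime, size))
    aligned = 0x479  # every copy touched to the same mtime
    # Aligning timestamps alone is not enough: inodes still differ.
    print({default_etag(inode, aligned, size) for inode, _ in servers.values()})
    # Dropping the inode as well gives one ETag across the cluster.
    print({cluster_safe_etag(aligned, size) for _ in servers.values()})
```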


    Static content, Clustering and Entity tags

    Re-define ETag in Apache (in stages)

    • ETag = MTime - Size

    • Default is: INode MTime Size

    [Chart: bandwidth against time. Changing the ETag invalidates the content on all caches, so bandwidth rises until the content is re-cached.]



    Static content, Clustering and Entity tags

    Summary

    • In a clustered environment care must be taken with static content and the way it is deployed

    • The ETag definition must be modified (remove the inode)

    • Files must be stored on cloned web servers with the same timestamp on each

    • When this is done

      • Cache hit ratio improves

      • Potentially 10% of bandwidth is saved; the saving is not instant but builds up over days




    Dynamic Content

    • Cache static and dynamic content close to the user

    • Content can be cached in several locations:

      • Client cache

      • Content delivery network, proxy caches

      • External cache, reverse proxy

      • WebSphere Dynacache

    [Diagram: user (browser cache) → Internet → proxy server (Internet cache) → reverse proxy → origin web server → app server (Dynacache)]



    Dynamic Content

    • Traditional HTTP caching: cache-control header

      • Candidates for caching include:

        • Frequently accessed pages

        • Pages that do not contain volatile data

        • Pages that contain content applicable to many users

        • Pages that do not contain sensitive information

    • Cache-Control response headers

      • max-age: how long (in seconds) the content stays fresh in the cache; overrides Expires

      • no-store: pages must not be stored in any cache

      • private: pages can only be cached in private caches (e.g. the browser), not in shared proxies

    • WebSphere Dynacache provides fragment caching, cache IDs and ESI (Edge Side Includes) for greater control
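The following sketch shows how a cache might apply these directives. It is a simplified, hypothetical model rather than any particular cache's logic. It splits `Cache-Control` on commas outside quotes, lets `s-maxage` (shared caches only) and then `max-age` override `Expires`, and treats `no-store` (and, for shared caches, `private`) as forbidding storage.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}.

    Commas inside quoted arguments (e.g. no-cache="set-cookie,set-cookie2")
    do not split directives.
    """
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    directives = {}
    for part in parts:
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def may_store(headers: dict, shared_cache: bool) -> bool:
    """False if no-store is present, or if private is present in a shared cache."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc:
        return False
    return not (shared_cache and "private" in cc)


def freshness_lifetime(headers: dict, shared_cache: bool = False):
    """Seconds the response stays fresh, or None if no explicit lifetime is given."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and cc.get("s-maxage"):
        return int(cc["s-maxage"])
    if cc.get("max-age"):
        return int(cc["max-age"])  # max-age overrides Expires
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return max(0, int(delta.total_seconds()))
    return None
```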



    Dynamic Content

    • Traditional HTTP caching: cache-control header

      • With mod_expires enabled (ExpiresActive on), Apache sets both a Cache-Control max-age header and an Expires header for static content:

    <LocationMatch /staticcontent/.*>

    ExpiresActive on

    ExpiresDefault "access plus 90 minutes"

    </LocationMatch>

    httpd.conf

    Date: Tue, 28 Feb 2006 10:04:29 GMT

    Cache-Control: max-age=5400

    Expires: Tue, 28 Feb 2006 11:34:29 GMT

    Response headers
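The response headers above follow directly from the access time and the 90-minute TTL (90 × 60 = 5400 seconds). The `expiry_headers` helper below is hypothetical. It uses only Python's standard library to reproduce those three headers from an access time and a TTL.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time: datetime, ttl_seconds: int) -> dict:
    """Build Date, Cache-Control and Expires headers for an 'access plus' TTL.

    access_time must be timezone-aware UTC, as required for GMT formatting.
    """
    expires = access_time + timedelta(seconds=ttl_seconds)
    return {
        "Date": format_datetime(access_time, usegmt=True),
        "Cache-Control": f"max-age={ttl_seconds}",
        "Expires": format_datetime(expires, usegmt=True),
    }


if __name__ == "__main__":
    accessed = datetime(2006, 2, 28, 10, 4, 29, tzinfo=timezone.utc)
    for name, value in expiry_headers(accessed, 90 * 60).items():
        print(f"{name}: {value}")
```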



    Dynamic Content

    • Traditional HTTP caching: cache-control header

      • By default, dynamic pages should be served with no-cache/no-store headers, for example:

    Request:

    GET /webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=10001&catalogId=10151 HTTP/1.1

    Host: www.halfords.com

    Response headers:

    Expires: Thu, 01 Dec 1994 16:00:00 GMT

    Cache-Control: no-cache="set-cookie,set-cookie2"


    Dynamic Content

    [Diagram: user (browser cache) → Internet → proxy server (Internet cache, controlled by Cache-Control headers) → Internet → origin web server → app server (Dynacache)]

    Internet proxy servers

    • Caching controlled using HTTP headers

      • Expires

      • Cache-Control (s-maxage)

      • If these are missing, caches may use heuristics (rule of thumb: 20% of the time since the content was last modified)

    • This kind of cache cannot be purged, as it is not under our control!

      • The must-revalidate directive helps to limit the serving of stale content
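When neither header is present, a cache can estimate freshness as a fraction of the time since the content was last modified. The function name is hypothetical. The 20% default follows the rule of thumb above (RFC 2616 suggests 10% as a typical fraction), and real proxy caches apply their own limits on top.

```python
from datetime import datetime, timedelta


def heuristic_lifetime(date: datetime, last_modified: datetime,
                       fraction: float = 0.2) -> timedelta:
    """Estimate freshness as a fraction of the time since Last-Modified.

    Returns zero if Last-Modified is not in the past relative to Date.
    """
    age_of_content = date - last_modified
    if age_of_content <= timedelta(0):
        return timedelta(0)
    return age_of_content * fraction


if __name__ == "__main__":
    # Hypothetical: a page last modified 10 days before the response was sent.
    sent = datetime(2006, 2, 28, 10, 0, 0)
    modified = datetime(2006, 2, 18, 10, 0, 0)
    print(heuristic_lifetime(sent, modified))  # 2 days at 20%
```

This is why content with no explicit expiry can stay in public caches for days, with no way for the site owner to purge it.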


    Dynamic Content

    • Reverse proxy (IBM Caching Proxy)

      • Extra layer in front of WebSphere Dynacache to improve performance

      • Disk cache provides more capacity than the (memory-based) web server plugin cache

      • Integration with Dynacache

      • Rewrite rules may cause problems

    [Diagram: user (browser cache) → Internet → proxy server (Internet cache) → reverse proxy → origin web server → app server (Dynacache)]


    Dynamic Content

    [Diagram: user (browser cache) → Internet → proxy server (Internet cache) → reverse proxy → origin web server → app server (Dynacache)]

    • WAS 5/6 web server plugin external cache

      • Used to cache JSP page results on the web server

      • Invalidation can be controlled from the app server cache

      • Cached content stored in memory only

        • Memory utilisation is important

        • Each Apache process holds a separate copy of the cache, so the number of processes (which depends on ThreadsPerChild) matters
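Each Apache process holds its own copy of the plugin cache, so memory use scales with the number of processes. The sketch below assumes a threaded (worker-style) configuration in which the process count is roughly MaxClients divided by ThreadsPerChild. The function name, directive values and cache size are all hypothetical.

```python
import math


def plugin_cache_memory_mb(max_clients: int, threads_per_child: int,
                           cache_mb_per_process: float):
    """Return (processes, total MB) for per-process copies of the plugin cache."""
    processes = math.ceil(max_clients / threads_per_child)
    return processes, processes * cache_mb_per_process


if __name__ == "__main__":
    # Hypothetical: MaxClients 600, 50 MB cache per process.
    for tpc in (25, 100):
        procs, total = plugin_cache_memory_mb(600, tpc, 50)
        print(f"ThreadsPerChild {tpc}: {procs} processes, {total} MB of cache")
```

Fewer, larger processes (a higher ThreadsPerChild) mean fewer copies of the cache, at the cost of each process serving more threads.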



    Dynamic Content

    Summary

    • Dynamic content can be cached

      • On the web server to reduce app server load

      • In proxy or browser caches

    • Care must be taken to identify which pages should be cached

      • Personalisation

      • Lifetime in cache

      • Whole pages or fragments

    • Take care when content may be cached on public proxy servers

      • private, no-store, must-revalidate





    Tools and monitoring

    • Is there a problem?

    • Browser proxy

    • Where are the bottlenecks?

    • What to monitor? Know thy enemy

      • Hardware information

        • Memory

        • CPU, processes

        • Disk (logs)

        • Network

      • Web Logs

        • Process using awk and other Unix tools

        • Hits, total time, % of hits over 1 second, bytes served

        • HTTP response codes

        • A long reporting interval can obscure problems: compare 10 minutes, 1 hour and 1 day

      • Cache statistics

        • Cache monitor application

        • PMI cache statistics
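Bucketing request timestamps shows how the choice of interval hides problems. In this hypothetical sketch, a burst of requests stands out clearly in 10-minute buckets but is averaged away in the hourly figure. The timestamps are invented; in practice they would be parsed from the web logs. The function name is also hypothetical.

```python
from collections import Counter
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def hits_per_interval(timestamps, interval_seconds: int) -> Counter:
    """Count hits per fixed-length interval, keyed by interval start (epoch seconds)."""
    counts = Counter()
    for ts in timestamps:
        seconds = int((ts - EPOCH).total_seconds())
        counts[seconds - seconds % interval_seconds] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical hour: one hit per minute, plus a burst of 300 hits at 10:20.
    hits = [datetime(2006, 2, 28, 10, m) for m in range(60)]
    hits += [datetime(2006, 2, 28, 10, 20, s % 60) for s in range(300)]
    ten_min = hits_per_interval(hits, 600)
    hourly = hits_per_interval(hits, 3600)
    print("peak per 10 minutes:", max(ten_min.values()))
    print("hourly total:", sum(hourly.values()),
          "=> average per 10 minutes:", sum(hourly.values()) / 6)
```

The peak 10-minute bucket holds 310 hits, while the hourly total implies an average of only 60 per 10 minutes.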



    Tools and monitoring

    • Is there a problem?

    • Browser proxies

      • Fiddler

      • Page Detailer



    Tools and monitoring

    • What to monitor?

      • Hardware information

        • Memory

          • vmstat, svmon

          • lsps -a

        • CPU, processes

          • vmstat

        • Disk (logs and static content)

          • iostat

        • Network utilisation can be estimated

          • entstat, example:

            entstat ${interface}|grep Bytes|awk '{print "tx= "$2" rcv= "$4}' >> ${outfile}
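Two byte-counter readings taken a known number of seconds apart give a utilisation figure by simple arithmetic. The sketch assumes the counters are cumulative, like the transmit and receive byte totals printed by the entstat example above. The function names and sample values are hypothetical.

```python
def mbps(bytes_start: int, bytes_end: int, interval_seconds: float) -> float:
    """Average throughput in megabits per second between two counter readings."""
    return (bytes_end - bytes_start) * 8 / interval_seconds / 1_000_000


def utilisation(throughput_mbps: float, link_mbps: float) -> float:
    """Fraction of the link capacity in use."""
    return throughput_mbps / link_mbps


if __name__ == "__main__":
    # Hypothetical transmit counter read twice, 10 seconds apart.
    tx = mbps(1_200_000_000, 1_275_000_000, 10)
    used = utilisation(tx, 100)  # 100Mbps ethernet
    print(f"{tx:.1f} Mbps, {used:.0%} of capacity",
          "(over 70%!)" if used > 0.7 else "")
```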



    Tools and monitoring

    • What to monitor?

      • Web Logs

        • Common Log Format: host, identity, user name, timestamp, request, status code, size

      • Example awk for home.htm hits and bytes:

        index($7, "home.htm") {homehits++; homebytes += $10}

        END {print "homehits= " homehits " homebytes= " homebytes}
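Producing the same hit and byte counts in Python makes the analysis easier to extend. This sketch assumes the standard Common Log Format and uses a hypothetical `home_hits_and_bytes` helper. Like the awk version, it matches the page name anywhere in the requested URL. A size of `-` (no body, as with 304 responses) counts as zero bytes.

```python
import re

# host ident user [timestamp] "method url protocol" status size
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+)')


def home_hits_and_bytes(lines, page: str = "home.htm"):
    """Return (hits, bytes) for requests whose URL contains `page`."""
    hits = total_bytes = 0
    for line in lines:
        match = CLF.match(line)
        if not match or page not in match.group(6):
            continue  # malformed line or a different URL
        hits += 1
        size = match.group(9)
        total_bytes += 0 if size == "-" else int(size)
    return hits, total_bytes


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        '10.0.0.1 - - [28/Feb/2006:10:52:20 +0000] "GET /home.htm HTTP/1.1" 200 5120',
        '10.0.0.2 - - [28/Feb/2006:10:52:21 +0000] "GET /home.htm HTTP/1.1" 304 -',
        '10.0.0.3 - - [28/Feb/2006:10:52:22 +0000] "GET /logo.gif HTTP/1.1" 200 2294',
    ]
    print("homehits= %d homebytes= %d" % home_hits_and_bytes(sample))
```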



    Tools and monitoring

    • What to monitor?

      • Web Logs

        • Common Log Format: host, identity, user name, timestamp, request, status code, size

      • Example awk for HTTP response codes (share of 304 responses among 200 and 304):

        index($7, "home.htm") && $9 == 200 {h200++}

        index($7, "home.htm") && $9 == 304 {h304++}

        END {if (h200 + h304 > 0) print "home 304 / (200 + 304) ratio = " h304 / (h200 + h304)}
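The following is a Python equivalent of an awk script that counts 200 and 304 responses for a page. It splits each line on whitespace so that the field numbers line up with awk's: $7 is the requested URL and $9 the status code. The helper name is hypothetical, and lines that are too short are skipped. A higher ratio means more requests are answered with a short 304 instead of the full page.

```python
def ratio_304(lines, page: str = "home.htm"):
    """Share of 200/304 responses for `page` that were 304 Not Modified.

    Fields are split on whitespace, so index 6 is awk's $7 (URL) and
    index 8 is awk's $9 (status code). Returns None if there were no such hits.
    """
    h200 = h304 = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or page not in fields[6]:
            continue
        if fields[8] == "200":
            h200 += 1
        elif fields[8] == "304":
            h304 += 1
    total = h200 + h304
    return h304 / total if total else None
```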



    Tools and monitoring

    • Summary

      • Is there a problem?

      • Where are the bottlenecks?

        • What to monitor?

      • Hardware information

        • Memory, CPU, processes, disk, network

      • Web Logs

        • Process using awk and other Unix tools

      • WAS Cache monitor application

        • PMI cache statistics




    The Edge

    • Content delivery network (CDN), e.g. Akamai

      • Many services including content provision

        • Cache based on content metadata

        • Provide cache invalidation features

      • Serve content geographically close to the user

      • Integration with WebSphere using ESI

    [Diagram: edge proxy servers with caches spread across the Internet serve the user (browser cache) from nearby, avoiding congestion between the user and the origin web server]


    Summary

    > The keys to success

    • Analyse requirements for content serving

    • Use Compression

    • Cater for Clustered Environments

    • Analyse and Monitor Performance

    • Consider Content Delivery Networks


    Summary

    > The end

    Questions?

    [email protected]


    References

    Resources

    HTTP/1.1 specification (RFC 2616): useful for understanding more than just the basics; details caching headers and recommendations for proxy caches. http://www.w3.org/Protocols/rfc2616/rfc2616.html

    AOL proxy cache set-up: http://webmaster.info.aol.com/caching.html

    The Cache Now! campaign: http://vancouver-webpages.com/CacheNow/

    Test your cacheability: http://www.ircache.net/cgi-bin/cacheability.py?query=http%3a//www.argos.co.uk/static/Home.htm&descend=on

    MS IE innerHTML bug: http://support.microsoft.com/default.aspx?scid=kb;en-us;319546

    General reading

    The benefits and drawbacks of HTTP compression: http://www3.lehigh.edu/images/userImages/cdh3/Page_3456/LU-CSE-02-002.pdf

    HTTP/1.1 response headers and their meanings with respect to servlets: http://java.sun.com/developer/Books/javaserverpages/cservletsjsp/chapter7.pdf

    HTTP headers for optimal performance: http://modperlbook.org/pdf/ch16.pdf

    What's wrong with HTTP (and why it doesn't matter): http://www.usenix.org/events/usenix99/invited_talks/mogul.pdf

    Web Cache Consistency: http://www.cs.duke.edu/~chase/cps212-archive/slides/webconsist6.pdf

