Optimising Web Server Utilisation and Bandwidth Usage

Alex Bagehot – Salmon Ltd.

Presentation Transcript
slide2
Introduction
  • Who am I
  • Who is Salmon
slide3
About Salmon
  • Leading Systems Integrator & IBM Premier Business Partner.
  • 16 years experience of translating leading edge technologies into viable solutions.
  • Specialists in delivering solutions using IBM’s key software products (WebSphere) as well as, BEA, Sun & Oracle.
  • Experts in the management & delivery of fixed price, managed risk contracts.
  • Part of The Novell Group.
slide5
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static content, Clustering and Entity tags
  • Dynamic content
  • Tools and monitoring
  • The Edge
  • Summary
slide6
Introduction

What are web servers?

  • Machines that send content to users
  • Negotiate security and compression
  • Forward dynamic requests to other servers

[Diagram: User, Internet, Web Server, Application Server]

slide7
Introduction

Why Optimise web servers?

  • User experience depends on response times
  • Capacity of the site
  • Cost of bandwidth
  • Is there a problem?
    • planning and monitoring
slide8
Introduction

What are we aiming to optimise?

  • Page response time
      • Network time
      • Performance of different cache types
      • Total page and image time
  • Network bandwidth utilisation
      • Capacity in terms of how much content you can serve
      • Expressed in Mbps (megabits per second)
      • Capacity depends on the hosting hardware
        • 100Mbps ethernet
        • Gigabit ethernet
      • Avoid using over 70% of your capacity
  • CPU utilisation
      • Use of re-write rules
slide9
Introduction

[Diagram: cycle of Analyse, Monitor, Implement]

Repeatable and iterative performance methodology

  • Analyse and review requirements
  • Develop solutions
  • Monitor key metrics
slide10
Agenda
  • Introduction
  • Serving Web Content
  • Tools
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide11
Serving Web Content

How is content served to Users?

  • Browsers communicate via the internet with web servers
  • HTTP (HyperText Transfer Protocol) is used
  • Data is sent and received in plain text
    • It is human readable

[Diagram: the browser sends "GET /home.html"; the server replies "OK" with the contents of home.html]

slide12
Serving Web Content
  • Example HTTP request

GET / HTTP/1.1
Host: www.salmon.com

  • Response snippet

HTTP/1.1 200 OK
Date: Tue, 28 Feb 2006 10:52:20 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

Salmon - Salmon.com Homepage

slide13
Serving Web Content

What is HTTP?

  • Defined by the RFC 2616 standard
    • An application-level protocol for distributed, collaborative, hypermedia information systems
    • A request / response protocol in plain text
    • Includes several features
      • Message
      • Methods
      • Status codes
      • Content negotiation
      • Caching
      • Security
slide14
Serving Web Content

How it works:

  • User agent sends request including
    • Request method
    • URI (Uniform Resource Identifier)
    • Protocol version
    • Host header
    • Message (other headers, client information, body)

GET / HTTP/1.1
Host: www.salmon.com
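
As a rough illustration of how the request parts listed above fit together, the following Python sketch assembles the plain-text message. The function name `build_request` is hypothetical and only the mandatory Host header is included; a real user agent sends many more headers.

```python
def build_request(method: str, uri: str, host: str, version: str = "HTTP/1.1") -> str:
    """Assemble a minimal HTTP request as plain text (illustrative helper)."""
    # Request line: method, URI and protocol version separated by single spaces.
    request_line = f"{method} {uri} {version}"
    # HTTP/1.1 requires a Host header on every request.
    headers = [f"Host: {host}"]
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return "\r\n".join([request_line, *headers, "", ""])


request = build_request("GET", "/", "www.salmon.com")
print(request)
```

The spaces in the request line matter: the server splits it into exactly three tokens.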

slide15
Serving Web Content

How HTTP works:

  • The server responds with
    • Status (success or error)
    • Message including
      • Server information
      • Meta data
      • Content

HTTP/1.1 200 OK

Server: MyServer

Date: Tue, 28 Feb 2006 10:52:20 GMT

Connection: close

Content-Type: text/html; charset=UTF-8

Salmon Homepage
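
The response structure can be sketched the same way. This minimal Python example splits a response into status, headers and content; `parse_response` is a hypothetical helper, and it assumes a well-formed message with CRLF line endings and no repeated headers.

```python
def parse_response(raw: str):
    """Split a plain-text HTTP response into status, headers and body."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: protocol version, numeric status code, reason phrase.
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status_code), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: MyServer\r\n"
    "Date: Tue, 28 Feb 2006 10:52:20 GMT\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "Salmon Homepage"
)
status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"])
```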

slide16
Serving Web Content

Headers used for:

  • Compression
  • Caching
  • Content negotiation

HTTP/1.1 200 OK

Server: MyServer

Date: Tue, 28 Feb 2006 10:52:20 GMT

Connection: close

Content-Type: text/html; charset=UTF-8

Salmon Homepage

slide17
Serving Web Content

Summary

  • Users and Web servers communicate via a plain text protocol called HTTP
  • HTTP header information (or metadata) is used for many other features, including
    • Caching
    • Compression
    • Such features
      • Improve performance in terms of reduced response times and bandwidth utilisation
      • Reduce hardware requirements
      • Deliver savings both immediately and down the line
slide18
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide19
Compression

Web Pages

  • Transferred in plain text via HTTP
  • Written in verbose languages, e.g. HTML, CSS
  • Can be large, with repeated text that is ideal for compression

Images

  • Transferred in binary format
  • Normally these formats are compressed already
    • GIF, JPEG

IBM HTTP Server

  • Compression must be configured
slide20
Compression - Images

Images

  • Normally account for most of the bandwidth utilisation
  • Reduce image sizes first!
  • Then ensure that you are using a format that is compressed
    • GIF – better for simple generated images, few colours
    • JPEG – better for photo type images
  • Image compression is the responsibility of web designers and content managers
  • No specific web server configuration necessary
slide21
Compression - pages

Web pages and other ASCII content

  • Typical saving is 80% of the original size
  • Are there any specific requirements that may prevent compression?
    • Supporting specific browsers (IE on Mac)
  • Compression is negotiated using HTTP headers:

GET /home.html HTTP/1.1
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip

[Diagram: the server returns home.html in compressed form]
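
The negotiation can be sketched in Python as follows. This is a simplified illustration, not how any particular web server implements it: `encode_body` is a hypothetical helper, q-values in Accept-Encoding are ignored, and the sample page is far more repetitive than real HTML, so its compression ratio is not representative.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body only if the client advertised gzip support."""
    # Accept-Encoding is a comma-separated list; drop any ";q=..." parameters.
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        # The Content-Encoding header tells the client how to decode the payload.
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    # No gzip support advertised: send the body unchanged.
    return {}, body


page = b"<p>Blah blah blah</p>\n" * 200  # hypothetical, highly repetitive page
headers, payload = encode_body(page, "gzip,deflate")
plain_headers, plain_payload = encode_body(page, "")
print(headers, len(page), len(payload))
```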
slide22
Compression

Summary

  • If compression is not already in use, enabling it provides large benefits for little effort
  • Content managers and web designers must
    • Minimise image use/size where possible
    • Use appropriate compressed formats
  • Text files, e.g. web pages, JavaScript
    • Can be compressed on the web server when the request is served
    • Web server needs configuration
    • Not all responses will be compressed: it depends on how many clients indicate that they can accept compressed content, typically two thirds
  • Typical saving is 80%
    • If images are already in a compressed format, the overall saving is limited to 80% of the bandwidth used by pages
  • Small additional CPU cost
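
To see how these figures combine, here is a back-of-the-envelope Python estimate. The 80% saving and the two-thirds client share come from the slide; the 30% share of bandwidth used by text content is a purely hypothetical input that should be replaced with a measured value.

```python
def overall_saving(text_share: float,
                   gzip_client_share: float = 2 / 3,
                   text_saving: float = 0.8) -> float:
    """Estimate the fraction of total bandwidth saved by compressing text content.

    text_share        -- fraction of bandwidth used by compressible text (assumed input)
    gzip_client_share -- fraction of clients accepting compressed content
    text_saving       -- size reduction achieved on compressed text
    """
    return text_share * gzip_client_share * text_saving


# Hypothetical site where text makes up 30% of the bytes served.
estimate = overall_saving(text_share=0.30)
print(f"Estimated overall bandwidth saving: {estimate:.0%}")
```

With these inputs the estimate comes to roughly 16% of total bandwidth, which shows why reducing image sizes comes first.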
slide23
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide24
Static content, Clustering and Entity tags

[Diagram: User (browser cache), Internet, proxy server (cache), Internet, Web server (the 'Origin')]

Overview of HTTP/1.1 Caching

  • Aim to avoid the round trip of sending a request at all; failing that, avoid sending a full response
  • Content can be cached at various locations
  • Expiration Model
  • Validation Model
slide25
Static content, Clustering and Entity tags

Scenario

  • Browser requests an image that is not in the browser cache
  • Server sends the image with caching metadata
    • Caching headers:
      • Last-Modified (validation)
      • ETag (validation)
      • Expires (expiration)

GET /logo.gif HTTP/1.1

HTTP/1.1 200 OK
Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT
ETag: "228-a198ca00-45ncn"
Expires: Tue, 28 Feb 2006 17:00:00 GMT

[Diagram: logo.gif held in the browser cache, labelled "Thu, 08 Sep 2005 13:05:45 GMT" and ETag= "228-a198ca00-45ncn"]
Static content, Clustering and Entity tags

Scenario continued…

  • Browser requests the image again (after 17:00 on 28 Feb 2006, once the Expires time has passed)
  • Server determines if the image is still valid

GET /logo.gif HTTP/1.1
If-None-Match: "228-a198ca00-45ncn"
If-Modified-Since: Wed, 18 Jan 2006 13:00:24 GMT

HTTP/1.1 304 Not Modified
Last-Modified: Wed, 18 Jan 2006 13:00:24 GMT
ETag: "228-a198ca00-45ncn"

[Diagram: logo.gif held in the browser cache, labelled "Thu, 08 Sep 2005 13:05:45 GMT" and ETag= "228-a198ca00-45ncn"]
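
The server-side validation decision can be sketched in Python as below. This is a simplified model rather than any web server's actual logic: `validate` is a hypothetical helper, it compares a single entity tag by exact string match, and it lets If-None-Match take precedence over If-Modified-Since when both are present.

```python
from email.utils import parsedate_to_datetime


def validate(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # Entity tag validation: the cached copy is valid if the tags match.
        return 304 if if_none_match.strip() == etag else 200
    if if_modified_since is not None:
        # Date validation: valid if the resource has not changed since that time.
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        return 304 if unchanged else 200
    # Unconditional request: always send the full content.
    return 200


current_etag = '"228-a198ca00-45ncn"'
current_modified = "Wed, 18 Jan 2006 13:00:24 GMT"
status = validate(
    {"If-None-Match": '"228-a198ca00-45ncn"',
     "If-Modified-Since": "Wed, 18 Jan 2006 13:00:24 GMT"},
    current_etag,
    current_modified,
)
print(status)
```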
slide27
Static content, Clustering and Entity tags

Caching and Expiration

  • The server provides a TTL (time to live) for a resource
  • Apache uses the mod_expires module
  • Example expiry setting:

ExpiresActive on
ExpiresByType text/html "access plus 1 hour"
ExpiresByType image/gif "access plus 30 minutes"

  • Static content must be analysed for TTLs
    • Can be defined by content type (above)
    • Or by location

slide28
Static content, Clustering and Entity tags

Most Web sites require more than 1 web server

  • As will be seen, this can cause wasted bandwidth
  • Waste occurs when static content is served from a web server cluster and is cached by users

Scenario:

[Diagram: user with cache, Internet, load balancer, three web servers]

slide29
Static content, Clustering and Entity tags
  • Entity tags (ETag) are used in the HTTP validation model
  • How are ETags calculated?
    • Apache has a default:
      • Etag = INode - Mtime - Size
      • Encoded values are concatenated together
  • ETag value for the same resource on different web nodes: [table not recoverable; example values appear on the next slide]


slide30
Static content, Clustering and Entity tags

  • ETags need to match across web servers, else:
  • User requests image
    • Served from server1
    • Cached in the browser for 10 minutes
  • User makes another request 10 minutes later
    • Conditional request made to server2
    • ETag differs
    • Image served again (response code 200)

[Diagram: user with cache, Internet, load balancer and three web servers. Logo.gif ETag values: server1 "ef6-479-8f6", server2 "da4-4e6-8f6", server3 "2ee-a9f-8f6"; the user's cached copy has ETag "ef6-479-8f6"]

slide31
Static content, Clustering and Entity tags
  • Entity tags (ETag) should be the same for a resource across all web nodes
  • Re-define ETag in Apache
    • ETag = mtime - size (directive: FileETag MTime Size)
  • Ensure file timestamps are equal on every node
    • touch -t [[CC]YY]MMDDhhmm[.SS] file
  • ‘Not modified’ (304) response sent instead of ‘OK’ (200) full content
    • Not modified response uses less bandwidth
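
The effect of dropping the inode from the ETag can be illustrated with a small Python sketch. It follows the component order shown on the slides (inode, mtime, size, hex-encoded and joined with hyphens) and is not claimed to reproduce Apache's exact encoding; `make_etag` and the file attributes below are hypothetical.

```python
def make_etag(inode: int, mtime: int, size: int, use_inode: bool = True) -> str:
    """Build an entity tag from file attributes, optionally leaving out the inode."""
    parts = ([inode] if use_inode else []) + [mtime, size]
    # Each component is hex-encoded and the results are joined with hyphens.
    return '"' + "-".join(format(part, "x") for part in parts) + '"'


# Same file deployed to two nodes: identical size and (after touch) identical
# mtime, but the inode differs because each node has its own filesystem.
node1 = {"inode": 0xEF6, "mtime": 0x479, "size": 0x8F6}
node2 = {"inode": 0xDA4, "mtime": 0x479, "size": 0x8F6}

default_1 = make_etag(**node1)
default_2 = make_etag(**node2)
portable_1 = make_etag(**node1, use_inode=False)
portable_2 = make_etag(**node2, use_inode=False)
print(default_1, default_2, portable_1, portable_2)
```

With the inode included the two nodes disagree, so a conditional request routed to the other node yields a full 200 response. Without it the tags match, provided the timestamps have been aligned.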
slide32
Static content, Clustering and Entity tags

Re-define ETag in Apache (in stages)

  • ETag = mtime - size
  • Default is: inode, mtime, size

[Graph: bandwidth against time; labels: "Change ETag", "Invalidated on all caches", "Content re-cached"]


slide33
Static content, Clustering and Entity tags

Summary

  • In a clustered environment care must be taken with static content and the way it is deployed
  • ETag must be modified
  • Files must be stored on cloned web servers with the same timestamp on each
  • When this is done
    • Cache hit ratio improved
    • Potentially 10% of bandwidth is saved; the saving is not instant but builds up over days
slide34
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide35
Dynamic Content

  • Cache static and dynamic content close to the user
  • Content can be cached in several locations:
    • Client cache
    • Content delivery network, proxy caches
    • External cache, reverse proxy
    • WebSphere Dynacache

[Diagram: User (browser cache), Internet, proxy server (cache), Internet, reverse proxy, origin Web server, App server (Dynacache)]

slide36
Dynamic Content
  • Traditional HTTP caching: cache-control header
    • Candidates for caching include:
      • Frequently accessed pages
      • Pages that do not contain volatile data
      • Pages that contain content applicable to many users
      • Pages that do not contain sensitive information
  • Cache-Control response headers
    • max-age: defines the length of time the content remains in the cache, overrides ‘Expires’
    • no-store: pages cannot be cached at all
    • private: pages can only be cached in ‘private’ caches (browser)
  • For greater control, WebSphere Dynacache provides fragment caching, cache IDs and ESI
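
How a cache might act on the three directives above can be sketched in Python. This is a deliberately small model: `parse_cache_control` and `may_store` are hypothetical helpers, only no-store, private and max-age are considered, and quoted arguments that contain commas are not handled.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header into a {directive: argument-or-None} dict."""
    directives = {}
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        name, _, arg = token.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(directives: dict, shared_cache: bool) -> bool:
    """Decide whether a cache is allowed to keep the response."""
    if "no-store" in directives:
        return False  # must not be cached anywhere
    if shared_cache and "private" in directives:
        return False  # only private (browser) caches may keep it
    return True


page = parse_cache_control("private, max-age=600")
print(may_store(page, shared_cache=True), may_store(page, shared_cache=False), page["max-age"])
```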
slide37
Dynamic Content
  • Traditional HTTP caching: cache-control header
    • Apache sets a max-age Cache-Control header for static content when the ExpiresActive directive is on:

httpd.conf:

ExpiresActive on
ExpiresDefault "access plus 90 minutes"

Response headers:

Date: Tue, 28 Feb 2006 10:04:29 GMT
Cache-Control: max-age=5400
Expires: Tue, 28 Feb 2006 11:34:29 GMT
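
The relationship between the TTL, max-age and Expires can be checked with a few lines of Python. This sketch only reproduces the arithmetic behind the headers shown on the slide; `expiry_headers` is a hypothetical helper, not part of Apache or mod_expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time: datetime, ttl: timedelta) -> dict:
    """Derive Date, Cache-Control and Expires for an 'access plus <ttl>' rule."""
    return {
        # HTTP dates are always expressed in GMT.
        "Date": format_datetime(access_time, usegmt=True),
        # max-age is the TTL in seconds.
        "Cache-Control": f"max-age={int(ttl.total_seconds())}",
        # Expires is the absolute time at which the TTL runs out.
        "Expires": format_datetime(access_time + ttl, usegmt=True),
    }


accessed = datetime(2006, 2, 28, 10, 4, 29, tzinfo=timezone.utc)
headers = expiry_headers(accessed, timedelta(minutes=90))
for name, value in headers.items():
    print(f"{name}: {value}")
```

Ninety minutes is 5400 seconds, and 10:04:29 plus ninety minutes is 11:34:29, matching the slide's response headers.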

slide38
Dynamic Content
  • Traditional HTTP caching: cache-control header
    • By default dynamic pages should be served with no-cache/no-store:

Request:

GET /webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=10001&catalogId=10151 HTTP/1.1
Host: www.halfords.com

Response headers:

Expires: Thu, 01 Dec 1994 16:00:00 GMT
Cache-Control: no-cache="set-cookie,set-cookie2"

slide39
Dynamic Content

Internet proxy servers

  • Caching controlled using HTTP headers
    • Expires
    • Cache-Control (s-maxage)
    • If missing, may use heuristics (rule of thumb, 20% of age)
  • Impossible to purge this kind of cache as it is not in our control!
    • must-revalidate feature

[Diagram: User (browser cache), Internet, proxy server (cache, governed by cache-control), Internet, origin Web server, App server (Dynacache)]
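
The heuristic rule of thumb can be written out as a short Python sketch. It assumes that "age" means the time between Last-Modified and the response Date, which the slide does not spell out; `heuristic_lifetime` is a hypothetical helper and real proxies vary in the fraction and caps they apply.

```python
from datetime import datetime, timedelta, timezone


def heuristic_lifetime(response_date: datetime,
                       last_modified: datetime,
                       fraction: float = 0.2) -> timedelta:
    """Guess a freshness lifetime when no Expires or Cache-Control is present."""
    # Assumed meaning of "age": how long ago the resource last changed.
    age = response_date - last_modified
    # Guard against clock skew producing a negative age.
    return max(age, timedelta(0)) * fraction


served = datetime(2006, 2, 28, 13, 0, 24, tzinfo=timezone.utc)
modified = served - timedelta(days=10)  # hypothetical: unchanged for 10 days
lifetime = heuristic_lifetime(served, modified)
print(lifetime)
```

A resource that has been stable for a long time is therefore cached for longer, which is why explicit headers are needed for content that changes on a schedule.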
slide40
Dynamic Content

  • Reverse proxy (IBM Caching proxy)
    • Extra layer in front of WebSphere Dynacache to improve performance
    • Disk cache provides more capacity than the (in-memory) web server plugin
    • Integration with Dynacache
    • Re-write rules may cause problems

[Diagram: User (browser cache), Internet, proxy server (cache), Internet, reverse proxy, origin Web server, App server (Dynacache)]

slide41
Dynamic Content

  • WAS5/6 Web server plugin external cache
    • Used to cache JSP page results on the Web server
    • Invalidation can be controlled from the app server cache
    • Cached content stored in memory only
      • Memory utilisation important
      • Separate copies of the cache per Apache process… ThreadsPerChild

[Diagram: User (browser cache), Internet, proxy server (cache), Internet, reverse proxy, origin Web server, App server (Dynacache)]
slide42
Dynamic Content

Summary

  • Dynamic content can be cached
    • On the web server to reduce app server load
    • In proxy or browser caches
  • Care must be taken to identify which pages to be cached
    • Personalisation
    • Lifetime in cache
    • Whole pages or fragments
  • Caution caching on public proxy servers
    • private, no-store, must-revalidate
slide43
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide44
Tools and monitoring
  • Is there a problem?
  • Browser proxy
  • Where are the bottlenecks?
  • What to monitor? Know thy enemy
    • Hardware information
      • Memory
      • Cpu, processes
      • Disk (logs)
      • Network
    • Web Logs
      • Process using awk and other unix tools
      • Hits, total time, %hits over 1 second, bytes served
      • HTTP Response codes
      • The reporting interval matters, as long intervals obscure problems: 10 minutes, 1 hour, 1 day
    • Cache statistics
      • Cache monitor application
      • PMI cache statistics
slide45
Tools and monitoring
  • Is there a problem?
  • Browser proxies
    • Fiddler
    • Page Detailer
slide46
Tools and monitoring
  • What to monitor?
    • Hardware information
      • Memory
        • vmstat, svmon
        • lsps -a
      • Cpu, processes
        • vmstat
      • Disk (logs and static content)
        • iostat
      • Network utilisation can be estimated
        • entstat, example:

entstat ${interface}|grep Bytes|awk '{print "tx= "$2" rcv= "$4}' >> ${outfile}
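
Two samples of the byte counters can be turned into a utilisation figure as in this Python sketch. The counter values and the 100 Mbps link are hypothetical, `utilisation` is an illustrative helper, and counter wrap-around between samples is not handled.

```python
def utilisation(bytes_start: int, bytes_end: int,
                interval_seconds: float, capacity_mbps: float) -> float:
    """Return link utilisation (0.0 to 1.0) between two byte-counter samples."""
    # Convert the byte delta to megabits (1 byte = 8 bits, 1 Mb = 1,000,000 bits).
    megabits = (bytes_end - bytes_start) * 8 / 1_000_000
    # Average throughput over the interval, as a fraction of link capacity.
    return (megabits / interval_seconds) / capacity_mbps


# Hypothetical samples taken 60 seconds apart on a 100 Mbps interface.
used = utilisation(bytes_start=0, bytes_end=562_500_000,
                   interval_seconds=60, capacity_mbps=100)
over_threshold = used > 0.70  # guideline from earlier: stay under 70% of capacity
print(f"{used:.0%} utilised, over threshold: {over_threshold}")
```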

slide47
Tools and monitoring
  • What to monitor?
    • Web Logs
      • Common Log format: host, user identity, user name, timestamp, request, response, size
    • Example awk for Home.htm hits and bytes:

index($7,"home.htm") {homehits=homehits+1;homebytes=homebytes+$10}

END {print "homehits= "homehits" homebytes= "homebytes}

slide48
Tools and monitoring
  • What to monitor?
    • Web Logs
      • Common Log format: host, user identity, user name, timestamp, request, response, size
    • Example awk for HTTP Response codes (ratio of 200 to 304) :

index($7,"home.htm") && $9 == 200 {h200=h200+1}

index($7,"home.htm") && $9 == 304 {h304=h304+1}

END {print "home 304 / any response ratio = " h304/(h304+h200)}
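
For comparison, the same log analysis can be sketched in Python. The sample log lines are invented for illustration, and the field positions assume the Common Log Format split on whitespace, where the request path is the 7th field, the status the 9th and the size the 10th.

```python
SAMPLE_LOG = """\
10.0.0.1 - - [28/Feb/2006:10:52:20 +0000] "GET /static/home.htm HTTP/1.1" 200 2294
10.0.0.2 - - [28/Feb/2006:10:52:21 +0000] "GET /static/home.htm HTTP/1.1" 304 -
10.0.0.1 - - [28/Feb/2006:10:52:25 +0000] "GET /logo.gif HTTP/1.1" 200 552
10.0.0.3 - - [28/Feb/2006:10:53:02 +0000] "GET /static/home.htm HTTP/1.1" 304 -
"""


def summarise(log_text: str, needle: str = "home.htm") -> dict:
    """Count hits, bytes and the 304 ratio for requests whose path contains needle."""
    hits = total_bytes = h200 = h304 = 0
    for line in log_text.splitlines():
        fields = line.split()
        # fields[6] is the request path (awk's $7); skip short or unrelated lines.
        if len(fields) < 10 or needle not in fields[6]:
            continue
        hits += 1
        status, size = fields[8], fields[9]
        if size.isdigit():  # size is "-" when no body was sent
            total_bytes += int(size)
        if status == "200":
            h200 += 1
        elif status == "304":
            h304 += 1
    # Avoid dividing by zero when there were no matching 200 or 304 responses.
    ratio = h304 / (h200 + h304) if (h200 + h304) else 0.0
    return {"hits": hits, "bytes": total_bytes, "ratio_304": ratio}


summary = summarise(SAMPLE_LOG)
print(summary)
```

A rising 304 ratio suggests that more conditional requests are being answered without resending the content.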

slide49
Tools and monitoring
  • Summary
    • Is there a problem?
    • Where are the bottlenecks?
      • What to monitor?
    • Hardware information
      • Memory, Cpu, processes, disk, Network
    • Web Logs
      • Process using awk and other unix tools
    • WAS Cache monitor application
      • PMI cache statistics
slide50
Agenda
  • Introduction
  • Serving Web Content
  • Compression
  • Static Content, Clustering and Entity tags
  • Dynamic Content
  • Tools and Monitoring
  • The Edge
  • Summary
slide51
The Edge

  • Content provider network, e.g. Akamai
    • Many services including content provision
      • Cache based on content metadata
      • Provide cache invalidation features
    • Serve content geographically close to the user
    • Integration with WebSphere via ESI

[Diagram: User (browser cache), Internet with congestion, five proxy servers each with a cache, origin Web server]

summary
Summary

> The keys to success

  • Analyse requirements for content serving
  • Use Compression
  • Cater for Clustered Environments
  • Analyse and Monitor Performance
  • Consider Content Provider Networks
summary53
Summary

> The end

Questions?

[email protected]

references
References

Resources

HTTP/1.1 specification, useful for understanding more than just the basics; details caching headers and recommendations for proxy caches: http://www.w3.org/Protocols/rfc2616/rfc2616.html

AOL proxy cache set up: http://webmaster.info.aol.com/caching.html

the Cache Now! campaign - http://vancouver-webpages.com/CacheNow/

test your cacheability: http://www.ircache.net/cgi-bin/cacheability.py?query=http%3a//www.argos.co.uk/static/Home.htm&descend=on

MS IE innerHTML bug: http://support.microsoft.com/default.aspx?scid=kb;en-us;319546

General reading

The benefits and drawbacks of HTTP compression http://www3.lehigh.edu/images/userImages/cdh3/Page_3456/LU-CSE-02-002.pdf

HTTP1.1 response headers and their meanings wrt servlets http://java.sun.com/developer/Books/javaserverpages/cservletsjsp/chapter7.pdf

http headers for optimal performance http://modperlbook.org/pdf/ch16.pdf

What's wrong with HTTP (and why it doesn't matter) http://www.usenix.org/events/usenix99/invited_talks/mogul.pdf

Web Cache Consistency http://www.cs.duke.edu/~chase/cps212-archive/slides/webconsist6.pdf
