Analyzing Web Server Log Files

Eric Landrieu

e-mail: [email protected]

Lead Developer, PerfMan for Web Servers

The Information Systems Manager, Inc.


Growth of Web Server

  • Has become a vital part of the business model

  • Internet web servers must be reliable, as they are truly an international 24x7x365 sales mechanism

  • Site content can be just as damaging in users’ eyes as poor performance – a two-edged sword


So how do we monitor the web server?

  • OS-level tools

    • Performance Monitor (Windows NT)

    • SMF, RMF (OS/390)

    • Third-party offerings

  • “Active” web site monitors (give a client-side view of the site)

  • Database/Application monitoring

  • Web server log files


So how do we monitor the web server?

[Figure: Database Health, OS Statistics, Active Site Monitoring, Log File Analysis]

  • No one method can give you the whole picture on your web server’s health and performance


What’s in the Log Files?

  • View of client-server “transactions” – client request, with the server response

  • Multiple “transactions” can be required for a web page

  • Example: request GET /parking/space.asp, response 404 File Not Found


What’s in the Log Files?

  • Each “transaction” is totally separate in the log file

  • Any “user-level” data must be manually grouped using criteria available in the particular log file


So what is in these log files?


Information in the log files

  • Client IP - Usually the IP address, but the web server can resolve it to a DNS host name (not recommended)

  • File requested by client (including directory)

  • Method used in request (GET, POST, etc.)


Information in the log files

  • Return Code - was it successful, and if not, why?

  • Bytes Sent back to the client in the response

  • Referring URL – where did the user find the link to this request?

  • Browser String telling what browser is being used


Information in the log files

  • Username - anonymous or authenticated access

  • Cookie – The cookie relating to this “transaction”, if any

  • Bytes Received by the server in the request

  • Time Taken by the server to process the request


Standardized Log Formats

  • Common Log Format (CLF)

  • Extended Common Log Format

  • W3C Extended Log Format

  • Other formats may be product-specific, and many are extensions of the CLF or Extended CLF formats.


Common Log Format

  • Advantages

    • Supported by just about every web server ever written

  • Disadvantages

    • Inflexible

    • Contains very limited data: no Bytes Received, Time Taken, User agent (Browser), or Referer fields available.


Common Log Format

64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] "GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" 404 211

64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] "GET /java/FixFontHeadline.class HTTP/1.0" 200 2898

64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] "GET /graphics/trombone.gif HTTP/1.0" 200 1050

64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] "GET /images/joinband.jpg HTTP/1.0" 200 13457

64.12.97.9 - - [16/Feb/2001:07:00:30 -0800] "GET /images/parade.jpg HTTP/1.0" 200 22754

128.93.11.53 - - [16/Feb/2001:10:20:53 -0800] "GET /schedule.shtml HTTP/1.0" 200 7103

128.93.11.53 - - [16/Feb/2001:10:26:48 -0800] "GET /index.shtml HTTP/1.0" 200 8650

128.93.11.53 - - [16/Feb/2001:10:21:18 -0800] "GET /about.shtml HTTP/1.0" 200 9151

128.93.11.53 - - [16/Feb/2001:10:26:25 -0800] "GET /communty.shtml HTTP/1.0" 200 5731

128.93.11.53 - - [16/Feb/2001:10:18:25 -0800] "GET /join.shtml HTTP/1.0" 200 5056

128.93.11.53 - - [16/Feb/2001:10:24:53 -0800] "GET /write.shtml HTTP/1.0" 200 9633

128.93.11.53 - - [16/Feb/2001:10:54:05 -0800] "GET /robots.txt HTTP/1.0" 404 204
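Lines like the ones above can be split into fields with a regular expression. The following is a minimal sketch in Python, not part of the original presentation; the names CLF_PATTERN and parse_clf and the dictionary keys are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Field order in CLF: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_clf(line):
    """Return a dict for one CLF line, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request line is normally "METHOD path PROTOCOL".
    parts = record["request"].split()
    record["method"] = parts[0] if parts else None
    record["path"] = parts[1] if len(parts) > 1 else None
    return record

sample = ('64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] '
          '"GET /java/FixFontHeadline.class HTTP/1.0" 200 2898')
rec = parse_clf(sample)
print(rec["host"], rec["method"], rec["path"], rec["status"], rec["size"])
```

Returning None for lines that do not match lets a caller count malformed lines rather than stop on the first one.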


Extended Common Log Format

  • Adds User Agent (Browser) and Referrer to Common Log Format

  • Advantages

    • Most web servers support it

    • More information available than CLF

  • Disadvantages

    • Still no Time Taken or Bytes Received

    • Still inflexible


Extended Common Log Format

64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] "GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" 404 211 "http://www.mycommunityband.org/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"

64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] "GET /java/FixFontHeadline.class HTTP/1.0" 200 2898 "-" "Java 1.1"

64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] "GET /graphics/trombone.gif HTTP/1.0" 200 1050 "http://www.mycommunityband.org/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"

64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] "GET /images/joinband.jpg HTTP/1.0" 200 13457 "http://www.mycommunityband.org/join.shtml" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"

64.12.97.9 - - [16/Feb/2001:07:00:30 -0800] "GET /images/parade.jpg HTTP/1.0" 200 22754 "http://www.mycommunityband.org/about.shtml" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"

128.93.11.53 - - [16/Feb/2001:10:20:53 -0800] "GET /schedule.shtml HTTP/1.0" 200 7103 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:26:48 -0800] "GET /index.shtml HTTP/1.0" 200 8650 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:21:18 -0800] "GET /about.shtml HTTP/1.0" 200 9151 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:26:25 -0800] "GET /communty.shtml HTTP/1.0" 200 5731 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:18:25 -0800] "GET /join.shtml HTTP/1.0" 200 5056 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:24:53 -0800] "GET /write.shtml HTTP/1.0" 200 9633 "-" "xyro_([email protected])"

128.93.11.53 - - [16/Feb/2001:10:54:05 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "-"
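The two extra quoted fields can be picked up by extending a CLF-style regular expression. This is a hedged sketch in Python; EXTENDED_PATTERN and parse_extended are hypothetical names, and treating the trailing pair as optional (so plain CLF lines also match) is a design assumption, not something the slides prescribe.

```python
import re

# CLF fields, optionally followed by "referer" "user agent"
EXTENDED_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)

def parse_extended(line):
    """Return a dict of fields, or None if the line does not match."""
    match = EXTENDED_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # "-" means the client sent no value; store it as None.
    for key in ("referer", "agent"):
        if record[key] == "-":
            record[key] = None
    return record

with_referer = parse_extended(
    '64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] '
    '"GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" 404 211 '
    '"http://www.mycommunityband.org/" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"'
)
plain_clf = parse_extended(
    '64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] '
    '"GET /graphics/trombone.gif HTTP/1.0" 200 1050'
)
print(with_referer["referer"], "|", plain_clf["referer"])
```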


W3C Extended Log Format

  • http://www.w3.org/TR/WD-logfile

  • Advantages

    • Very Flexible

    • Extensible

  • Disadvantages

    • Not as universally supported by web servers


W3C Extended Log Format

#Software: Microsoft Internet Information Services 5.0

#Version: 1.0

#Date: 2001-03-18 05:01:20

#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs-host cs(User-Agent) cs(Cookie) cs(Referer)

2001-03-18 05:01:20 144.249.14.154 - 144.249.252.75 GET /Default.asp - 200 40606 253 16 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e -

2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /corporate.css - 304 160 436 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/

2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/vDivider2.gif - 304 209 444 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/

2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/toc_quicklink.gif - 304 209 448 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/

2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/region_am.jpg - 304 209 444 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/

2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/orange_square_bullet.gif - 304 209 455 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/

2001-03-18 05:01:22 144.249.14.154 - 144.249.252.75 GET /corpnews/images/org_pointer_2.gif - 304 209 456 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
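Because the #Fields directive names the columns, a reader for this format does not need a fixed pattern. The sketch below, in Python, assumes each record sits on one line with whitespace-separated values, as in the sample above; parse_w3c is a hypothetical name, not an API of any web server or library.

```python
def parse_w3c(lines):
    """Yield one dict per record, keyed by the names in the latest #Fields line."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the #Fields directive affects parsing; others are skipped.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # record does not match the declared fields
        yield dict(zip(fields, values))

log = [
    "#Software: Microsoft Internet Information Services 5.0",
    "#Version: 1.0",
    "#Date: 2001-03-18 05:01:20",
    "#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem "
    "cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs-host "
    "cs(User-Agent) cs(Cookie) cs(Referer)",
    "2001-03-18 05:01:20 144.249.14.154 - 144.249.252.75 GET /Default.asp - "
    "200 40606 253 16 HTTP/1.1 entry.corp.com "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) "
    "SITESERVER=ID=547754cdab354b60fcd92cd09351121e -",
    "2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /corporate.css - "
    "304 160 436 0 HTTP/1.1 entry.corp.com "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) "
    "SITESERVER=ID=547754cdab354b60fcd92cd09351121e;"
    "+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/",
]
records = list(parse_w3c(log))
for record in records:
    print(record["cs-uri-stem"], record["sc-status"], record["time-taken"])
```

Re-reading the field list whenever a new #Fields line appears keeps the reader correct if the logged fields change partway through a file.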


Which Format(s) Does My Web Server Support


Limitations

-or-

Why we can’t ignore other sources of information


Log File Limitations

[Figure: Client, Internet, Web Site, Web Server, Back End, DB]

  • Not enough information to get the whole picture on the site’s performance and health

    • We need to correlate the log data with other sources.

      • OS-level statistics (Performance Monitor, SMF, 3rd party)

      • “Active” web analysis (e.g. Keynote)

      • Data on databases or other components of the site


Log File Limitations

  • Only when the log data is fit together with the other pieces do we get the complete picture of your total web site health.

Log File Limitations

  • You may also have to deal with log file formats which don’t include all of the information that you would like.

[Figure: Common Log Format; Bytes Received, Time Taken, User Agent, Referrer]

Issues With Log Files

  • User or Session level statistics

  • Caching

  • Clustering

  • What constitutes a “site”?


User or Session Level Statistics

  • The server doesn’t give you statistics for the user (e.g. how long were they on the site?)

  • You have to mine these yourself from the data available

  • You will only be able to get approximations with this data, not exact figures


How do we group records for user-level statistics?

  • Client’s IP address

    • Proxy Servers and firewalls with Network Address Translation (NAT) will make all users from behind the firewall look like one user

    • If the proxy or firewall has multiple IP addresses (or is an array), multiple requests to the site from one user may look like multiple users


How do we group records for user-level statistics?

  • Cookies

    • If the site assigns cookies to track users through the site, you can group the records based upon the cookie

    • Users who disable cookies in their browser break this grouping

    • Not all log file formats include the cookie


How do we group records for user-level statistics?

  • User name

    • Useful for intranets, but the server must disallow anonymous access

    • Impractical for most internet sites (except restricted access)
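The three grouping criteria (client IP, cookie, user name) can be combined into a single key per record. The Python sketch below is one possible arrangement, not the presenter's method: the preference order cookie, then user name, then IP is an assumption, as are the names visitor_key and group_by_visitor and the record keys "host", "user" and "cookie".

```python
from collections import defaultdict

def visitor_key(record):
    """Pick the most specific identifier a record offers."""
    cookie = record.get("cookie")
    if cookie and cookie != "-":
        return ("cookie", cookie)
    user = record.get("user")
    if user and user != "-":
        return ("user", user)
    # Fall back to the client IP, with the proxy/NAT caveats noted above.
    return ("ip", record["host"])

def group_by_visitor(records):
    """Map each visitor key to the list of that visitor's records."""
    groups = defaultdict(list)
    for record in records:
        groups[visitor_key(record)].append(record)
    return dict(groups)

records = [
    {"host": "144.249.14.154", "user": "-",
     "cookie": "SITESERVER=ID=547754cdab354b60fcd92cd09351121e"},
    {"host": "144.249.14.154", "user": "-",
     "cookie": "SITESERVER=ID=547754cdab354b60fcd92cd09351121e"},
    {"host": "128.93.11.53", "user": "-", "cookie": "-"},
]
groups = group_by_visitor(records)
for key, items in groups.items():
    print(key, len(items))
```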


Caching

  • Content from the web site may be cached outside of the web server

  • The web server may not get notification of requests for content that are serviced by these caches

  • The caches may be in Proxy Servers, Browsers, or elsewhere


Clustering

  • Each server in a web cluster may maintain its own log file

  • You have to combine the log files to get information relevant to the entire site

  • One user accessing your site may get data from multiple servers

  • You may still want information on each individual server, to verify that they are load-balancing properly
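Combining per-server logs while keeping track of which server handled each request might look like the following Python sketch. It assumes each server's records are already parsed into dicts with a "time" value and are in time order; merge_server_logs and the server names web1 and web2 are hypothetical, and the assignment of sample requests to servers is illustrative only.

```python
import heapq
from datetime import datetime

def merge_server_logs(logs_by_server):
    """Merge time-ordered record lists into one list, tagging each record
    with the name of the server it came from."""
    tagged = [
        [dict(record, server=name) for record in records]
        for name, records in logs_by_server.items()
    ]
    return list(heapq.merge(*tagged, key=lambda record: record["time"]))

logs = {
    "web1": [
        {"time": datetime(2001, 3, 18, 5, 1, 20), "path": "/Default.asp"},
        {"time": datetime(2001, 3, 18, 5, 1, 22),
         "path": "/corpnews/images/org_pointer_2.gif"},
    ],
    "web2": [
        {"time": datetime(2001, 3, 18, 5, 1, 21), "path": "/corporate.css"},
    ],
}
merged = merge_server_logs(logs)
print([record["server"] for record in merged])
```

Keeping the server tag on every record means the same merged list can serve both site-wide and per-server statistics.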


What constitutes a web site?

  • You have to decide exactly what you want to call a site:

    • A load-balanced cluster

    • A single site running on a dedicated server

    • A single site on a server running multiple sites

    • A directory within a site on a server

    • Multiple servers which act as your web presence (home, support, e-commerce…)


What good is analyzing log files?

  • OS-level analysis can’t:

    • Provide user (session)-level info

    • Break down by return code, file type or name, directory, etc.
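A breakdown by return code, file type and directory is a matter of counting parsed records. A minimal Python sketch, assuming records are dicts with "status" and "path" keys (hypothetical names), using paths and status codes from the sample log lines earlier:

```python
from collections import Counter
import posixpath

def breakdown(records):
    """Count requests by status code, file extension, and directory."""
    by_status, by_type, by_dir = Counter(), Counter(), Counter()
    for record in records:
        path = record["path"].split("?", 1)[0]  # drop any query string
        by_status[record["status"]] += 1
        by_type[posixpath.splitext(path)[1].lower() or "(none)"] += 1
        by_dir[posixpath.dirname(path) or "/"] += 1
    return by_status, by_type, by_dir

records = [
    {"status": 200, "path": "/images/joinband.jpg"},
    {"status": 200, "path": "/images/parade.jpg"},
    {"status": 404, "path": "/robots.txt"},
    {"status": 404, "path": "/cgi-bin/Count.cgi?df=gecbhome&dd=B"},
]
by_status, by_type, by_dir = breakdown(records)
print(dict(by_status), dict(by_type), dict(by_dir))
```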


What good is analyzing log files?

  • “Active” monitoring:

    • Gives the client-side perspective

    • May not distinguish between a slow link/router and a slow response from server

    • Some are concerned only with response to the testing system, not server load

    • If a browser-based product, it may have trouble with browser incompatibilities


So what’s the key to analyzing log files?

  • Grouping your log file records into useful statistics that will help you understand what is going on with your site


Example: 404 Errors

  • When a user gets a 404 Error (File Not Found), they may perceive a lack of “professionalism” or “quality” with your site.

  • You want to know not only what non-existent files are being requested, but why they are being requested (outdated link?)
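Pairing each missing file with the referring URL is one way to see why it is being requested. The Python sketch below assumes a log format that records the referrer and records parsed into dicts with "status", "path" and "referer" keys (hypothetical names); the repeated Count.cgi entry is illustrative, not real traffic data.

```python
from collections import Counter

def not_found_report(records):
    """Count 404 responses per (path, referrer) pair, most frequent first."""
    missing = Counter()
    for record in records:
        if record["status"] == 404:
            referer = record.get("referer") or "(no referrer)"
            missing[(record["path"], referer)] += 1
    return missing.most_common()

records = [
    {"status": 404, "path": "/cgi-bin/Count.cgi",
     "referer": "http://www.mycommunityband.org/"},
    {"status": 404, "path": "/cgi-bin/Count.cgi",
     "referer": "http://www.mycommunityband.org/"},
    {"status": 404, "path": "/robots.txt", "referer": None},
    {"status": 200, "path": "/images/parade.jpg",
     "referer": "http://www.mycommunityband.org/about.shtml"},
]
report = not_found_report(records)
for (path, referer), count in report:
    print(count, path, referer)
```

A 404 whose referrer is a page on your own site points to an outdated internal link; one with no referrer is more likely a robot or a typed URL.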


Example: User Session Time

  • You want as useful an approximation as possible of how long users stay at your site (at least, marketing will)

  • Obviously, the longer they are browsing your site, the more interested they may be in what you have to offer

  • You can use their first and last requests for files to get a rough approximation
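The first-to-last-request approximation can be sketched in Python as below. It takes the request times already grouped for one visitor; the 30-minute idle gap used to split separate visits is an assumption introduced here, not a figure from the presentation, and session_durations is a hypothetical name. The sample times are the 128.93.11.53 requests from the log excerpt earlier.

```python
from datetime import datetime, timedelta

def session_durations(times, idle_gap=timedelta(minutes=30)):
    """Split one visitor's request times into sessions and return the
    first-to-last duration of each session."""
    durations = []
    start = previous = None
    for moment in sorted(times):
        if start is None:
            start = previous = moment
        elif moment - previous > idle_gap:
            # Long silence: close the current session and start a new one.
            durations.append(previous - start)
            start = previous = moment
        else:
            previous = moment
    if start is not None:
        durations.append(previous - start)
    return durations

day = (2001, 2, 16)
times = [
    datetime(*day, 10, 20, 53), datetime(*day, 10, 26, 48),
    datetime(*day, 10, 21, 18), datetime(*day, 10, 26, 25),
    datetime(*day, 10, 18, 25), datetime(*day, 10, 24, 53),
    datetime(*day, 10, 54, 5),
]
one_session = session_durations(times)
two_sessions = session_durations(times, idle_gap=timedelta(minutes=20))
print(one_session, two_sessions)
```

The time spent reading the last page never reaches the log, so a figure computed this way tends to underestimate, and a one-request session comes out as zero.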


Example: User Session Time

  • Most sessions were very short (1-2 pages)

  • This was an “Entry server” cluster, which passed off to other sites

  • A few (<20% of total sessions) were very long


Example: Cluster Load-Balancing

  • Ideally, your clustered servers for the site would be sharing the load equally

  • If one server is carrying a larger load, it can lead to overall perceived slowdown of your site (most people going to a heavily loaded server while an idle server sits and does nothing)
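One rough check on load sharing is each server's fraction of total requests. A minimal Python sketch, assuming every record carries a "server" tag naming the machine whose log it came from; load_share and the server names are hypothetical, and the counts are illustrative only.

```python
from collections import Counter

def load_share(records):
    """Return each server's fraction of the total request count."""
    requests = Counter(record["server"] for record in records)
    total = sum(requests.values())
    return {server: count / total for server, count in requests.items()}

records = [
    {"server": "web1"}, {"server": "web1"}, {"server": "web1"},
    {"server": "web2"},
]
shares = load_share(records)
print(shares)
```

Request counts are only one measure of load; the same grouping could be applied to bytes sent or time taken where the log format records them.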


So What Should I Take Out Of This?

-or-

Is there a point???


Summary

  • Web server log file analysis is an important part of monitoring your web servers

  • Log file analysis alone will not give you the complete picture of your web server, but you can’t get the complete picture without it

  • Know what is useful in the log files, what limitations are inherent in them, and how to analyze them

