
Internet / Intranet CIS-536


burrm

Presentation Transcript


  1. Internet / Intranet CIS-536 • Class 4 • Web Server Technology • HTTP Protocol • Log Files

  2. Class 4 Agenda • Discuss Homework • Overview of Web Servers and Server Technology • HTTP • The Protocol For Communication Between Web Browser and Server • Log Files

  3. Web Servers • A Basic Web Server is Just a File Server • Client Requests a File via HTTP Protocol • Server Delivers the File via HTTP Protocol • Server Maps URL to a Subdirectory • Web Server Needs Appropriate Permissions to Access Files/Directories • Supports Non-HTTP Protocols • FTP, Gopher, etc. • A Web Server is Not HTML Specific • Typically Identifies a Filetype by Extension • Or Directory Where File Exists
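
The mapping from URL to file described on this slide can be sketched roughly as follows. This is a minimal illustration, not how any particular server does it: the document root `/var/www/html`, the default document `index.html`, and the function name `map_url_path` are all assumptions made for the example.

```python
import mimetypes
import posixpath

DOC_ROOT = "/var/www/html"   # hypothetical document root
DEFAULT_DOC = "index.html"   # hypothetical default document


def map_url_path(url_path, doc_root=DOC_ROOT, default_doc=DEFAULT_DOC):
    """Map a URL path to a file path under doc_root and guess its MIME type."""
    # Collapse "." and ".." segments so the result cannot climb above "/".
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # A request for a directory falls back to the default document.
    if url_path.endswith("/") or clean == "/":
        clean = posixpath.join(clean, default_doc)
    file_path = doc_root.rstrip("/") + clean
    # The file type is identified by its extension.
    mime_type, _encoding = mimetypes.guess_type(file_path)
    return file_path, mime_type or "application/octet-stream"
```

For example, `map_url_path("/docs/page.html")` yields a path under the document root together with the type `text/html`. A file with no recognizable extension falls back to a generic binary type.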

  4. Additional Common Web Server Features • Additional Security Beyond That Provided by O/S • Scripting • Ability to Dynamically Create a Web Page • Run a Program Instead of Returning a File (CGI) • Return the Program Output as the Requested File • Administration • Log Files • Performance Monitoring

  5. Advanced Web Server Features • Virtual Hosting • Allow Multiple URLs to Map to Same Computer • Performance Optimization • Caching • Reliability • Scalability • Proxy Servers (For Security and Performance) • Fetch Documents That are on Other Computers • Cache Them Locally • Allows for Easy Scalability • Multiple Proxy Servers Can Cache Documents From One Source Computer • Embedded Scripting • Server Side Includes • Custom Scripting Languages • Server API

  6. Web Servers – Added Functionality • Database Connectivity • SQL, MySQL • Directory Listings • Icons, etc. • Built-In Search Engines • Built-In ImageMap Handling • Multimedia Support • Session Emulation • Streaming Multimedia • Advanced Security • Encrypted HTTP • S-HTTP (Secure HTTP) – CommerceNet • SSL (Secure Sockets Layer) - Netscape • Web Server “Add-Ons” • CGI Substitutes / CGI Optimizations • Cold Fusion

  7. Web Server History • All Web Servers Have a Common Root • httpd (NCSA) • UNIX Orientation • Many Features are Essentially UNIX Features • Apache • Website (O’Reilly) • Netscape Enterprise Server • Microsoft Internet Information Server • A Slew of Others

  8. Apache • UNIX Origins – Now Ported to NT • Evolved From httpd • Freeware • Typical UNIX Application • Public Source Code • Many Defaults, Conventions • BUT: All is Configurable • No GUI Interface • Configured via Scripts, Shell Commands, Config Files • Various “Flavors” • Many Optional Features • API • ApacheSSL

  9. IIS / Netscape • Microsoft IIS • Not Strictly Derived From httpd/Apache • Windows NT • However: Functionally Very Similar to Apache • Emulates Many UNIX Conventions • E.g. Forward Slashes • Configuration via GUI • Personal Web Server • Peer Web Server • Netscape • Multi-Platform • UNIX is Preferred Platform • Less “Open” Than Apache • More Secure?

  10. UNIX File Structure • Forward Slashes (/) to Separate Filenames, Directories • Case Sensitive File Names • Windows is Not • No Limit on Filename Size / Extensions • Extensions are by Convention • Root is “/” • User Home Directory is: “~/” • Symbolic Links / Aliases • Directories Can Be Spread Over Multiple Drives • Can Create Non-Hierarchical Structure • File Permissions • Read, Write, Execute • Separate Permissions for Owner, Group, All • Directories are Special Cases of Files • Execute Permissions = Able to Browse Directory

  11. Web Server Configuration • Directory Structure • Virtual Document Tree • Access to User Directories • UNIX: ~user • Symbolic Links • Be Careful: May Link You Out of Directory Structure • Case Sensitivity • Ownership Access • Server is a Process Started by a User. • Has the Permissions of the User Who Started It. • Default Documents • Allow Directory Browsing • Scripting • Who is Allowed to Run Scripts? • How are Scripts Identified?
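
The warning about symbolic links can be made concrete with a small check. The sketch below assumes a POSIX file system; the function name `is_inside_root` is hypothetical. It resolves links before deciding whether a requested path still lies inside the document tree.

```python
import os


def is_inside_root(doc_root, requested_path):
    """Return True if requested_path, after resolving symlinks, stays under doc_root."""
    root = os.path.realpath(doc_root)
    # realpath follows symbolic links and collapses ".." segments.
    target = os.path.realpath(os.path.join(root, requested_path.lstrip("/")))
    # The resolved target must share the whole root as its common prefix.
    return os.path.commonpath([root, target]) == root
```

A server might call a check like this before opening a file and answer 403 Forbidden when it returns False. Real servers expose this as a configuration choice rather than a fixed rule.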

  12. Web Server File Access Control / Security • Directory • O/S Level Security • IP, Domain Level Security • Spoofing • Directory Access • .htaccess • Microsoft Front-Page Extensions • Encryption • S-HTTP • Web Protocols Only • SSL • TCP/IP Level • V1.0 – V2.X : Security Holes Found, Fixed • V3.0 Is Current • Uses Port 443 • Microsoft PCT • Response to Holes in SSL 2.0 • Now Use SSL

  13. Server Administration • Need Sysadmin and O/S Expertise • Lots of “Holes” and Gotchas Whenever Scripts are Allowed • FTP • Who is Allowed to Change Documents? • Who is Allowed to Change Server Configuration? • How do They Get Access? • Direct Access • Remote Access (e.g. FTP) • Log Files • Accessibility • Directory Structure • Management

  14. HTTP • The Protocol For Requesting and Delivering Web Pages • Not Restricted to Returning HTML Files • Client Server Model • Request / Response • TCP/IP Protocol Using Port 80 • Supports Other Ports, Can Be Run Over Other Protocols • “Replaced” FTP as the Primary Method For Internet File Transfer • Stateless • Uses MIME Format to Encapsulate Data • Message Structure Similar to SMTP Mail Messages • Message Header (metadata) • Message Body (data) • Separated From Header by a Blank Line • Browser Only Displays Body, Not Header • No Restrictions on Message Size / Format (as with SMTP)
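
The header/body structure on this slide can be illustrated with a short parser. This is a simplified sketch: the function name `split_http_message` is hypothetical, and the code ignores details such as folded or repeated header fields.

```python
def split_http_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body)."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header field names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

Only the returned `body` is what a browser would render; the start line and headers are metadata about it.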

  15. HTTP Versions • HTTP 1.0 - Commonly Used Version • HTTP 1.1 • Formalizes Many Extensions to Version 1.0 • Supports Persistent Connections • Supports Compression/Decompression • Supports Virtual Hosting • Multiple Sites Can Share a Single Server and IP Address (via the Host Header) • Supports Multiple Languages • Supports Byte Range Transfers • Useful For Re-Sending Interrupted Data Transfers • Similar to Process Used By XMODEM, etc.
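
Byte range transfers can be sketched as below. The helper names `resume_range_header` and `apply_range` are hypothetical, and the code handles only the simple forms `bytes=first-` and `bytes=first-last`, not suffix or multi-part ranges.

```python
def resume_range_header(bytes_already_received):
    """Build the request header a client could send to resume a download."""
    return {"Range": "bytes=%d-" % bytes_already_received}


def apply_range(content, range_value):
    """Return (partial_body, content_range_value) for a simple byte range."""
    unit, _eq, spec = range_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first, _dash, last = spec.partition("-")
    start = int(first)
    # An open-ended range runs to the last byte of the content.
    end = int(last) if last else len(content) - 1
    end = min(end, len(content) - 1)
    return content[start:end + 1], "bytes %d-%d/%d" % (start, end, len(content))
```

A server that honors the range answers with status 206 (Partial Content) and a `Content-Range` header like the second value returned here.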

  16. HTTP Overview • [Diagram] Client (Browser) Sends an HTTP Request to the Web Server • Web Server Returns HTML in an HTTP Response • HTML Comes From the File System or From a Server Application via CGI

  17. HTTP Commands • Simple Structure • Main Methods • GET <URI> HTTP/1.0 • Request the File Specified By the URL • Request URI is Typically the URL Without Protocol/Host/Port • HEAD • Request the HTTP Header Information Only • Don’t Return the File Itself • POST • Sends Data to The Server • Typically Data From a Form • Defined, But Not Widely Implemented • PUT • DELETE • LINK • UNLINK
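
The request line on this slide can be built from a full URL as sketched below. The function name `build_request` and the example host are assumptions; the `Host` line is included for illustration and is only mandatory in HTTP/1.1.

```python
from urllib.parse import urlsplit


def build_request(method, url, version="HTTP/1.0"):
    """Build the text of a minimal HTTP request for the given URL."""
    parts = urlsplit(url)
    # The request target is the URL without protocol, host, or port.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        "%s %s %s" % (method, target, version),
        "Host: %s" % parts.netloc,
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)
```

Calling `build_request("HEAD", url)` produces the same request with a different method, which asks the server for the headers only.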

  18. Common HTTP Header Fields • Additional “Parameters” to the HTTP Commands • Used in HTTP Requests: • Accept • Lists the MIME Types That Client Can Accept • E.g. Accept: text/plain, text/html or Accept: */* • Accept-Charset • Lists Character Sets That Client Can Accept • ASCII, ISO-8859-1 Are Assumed • Accept-Encoding • Accept-Language • Authorization • Basic – UserName:Password (Base64 Encoding) • Cookie • From • E-mail Address of Requesting User • Not Typically Used For Privacy Reasons • Primarily Used By Automated Clients (e.g. Bots)
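
The Basic form of the Authorization header is simple enough to show in full. The function names below are hypothetical. Note that Base64 is an encoding, not encryption, so anyone who sees the header can recover the password.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an Authorization header using the Basic scheme."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _space, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password itself may contain colons.
    username, _colon, password = decoded.partition(":")
    return username, password
```

Because decoding is this easy, Basic authentication is normally paired with an encrypted connection such as SSL.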

  19. Common HTTP Header Fields (2) • Host • Virtual Host – One Server Handles Multiple Sites • If-Modified-Since • Only Return Data if it Has Been Modified Since This Date • Pragma • General Purpose For “Additional” Headers Not in Standard • Referer (sic; the Header Name is Spelled This Way in the HTTP Specification) • The URL That Referred One to This URL • User-Agent • Name/Version of the HTTP Client • Used in HTTP Responses: • Allow • Lists the Available Commands Supported by Server • Content-Encoding • Allows for Passing Data in Compressed Formats • Content-Language • Describes the Natural Language of the Intended Audience
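
The If-Modified-Since header drives a simple server-side decision, sketched below. The function name `conditional_status` is hypothetical, and treating an unparseable date as absent is a choice made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the file is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date is treated as if the header were absent.
        return 200
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

Here `last_modified` is expected to be a timezone-aware `datetime`, for example one built from the file's modification time with `tzinfo=timezone.utc`.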

  20. Common HTTP Header Fields (3) • Content-Length • Size of the Message Body • Content-Type • The MIME Type For the Data • Date • Expires • HTTP Clients Should Not Cache Data After This Date • Last-Modified • Location • Used For Redirection • MIME-Version • Pragma • E.g. no-cache • Retry-After • When Server is Unavailable. Info On When to Try Back • Server • Name/Version of the HTTP Server

  21. Common HTTP Header Fields (4) • Title • Descriptive Title of the File • WWW-Authenticate • When Authorization Denied, Tells Client Which Methods of Authentication are Supported • HTTP Status Codes • Returned By the Server In First Line of Response • Informational (100-199) • Successful (200-299) • Redirection (300-399) • Location in HTTP Header Specifies Redirection • Client Error (400-499) • Server Error (500-599)

  22. Common Status Values • 200 - OK • 201 - Created (Post Request Was Fulfilled) • 204 - No Content (OK. Nothing For Client to Display) • 300 - Multiple Choices • Requested Resource Available From Multiple Locations. • List of Locations Returned in the Response. • 301 - Moved Permanently • 302 - Moved Temporarily • 304 - Not Modified • Document Hasn’t Been Modified Since If-Modified-Since Date • 400 - Bad Request • 401 - Unauthorized • 403 - Forbidden • 404 - Not Found • 500 - Internal Server Error • 501 - Not Implemented (Server Does Not Support This Request) • 502 - Bad Gateway (Invalid Response From Upstream Server) • 503 - Service Unavailable
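
The status classes and values above can be looked up programmatically. The sketch below uses Python's standard `http.HTTPStatus` table; the function name `describe_status` is hypothetical.

```python
from http import HTTPStatus

# The five classes of status code, keyed by the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return a string such as '404 Not Found (Client Error)'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not in Python's table of registered status codes.
        phrase = "Unrecognized"
    return "%d %s (%s)" % (code, phrase, status_class)
```

Reason phrases are informational only, and they have shifted over time: Python's table reports 302 as "Found" rather than the older "Moved Temporarily".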

  23. Cookies • Cookies Are Name Value Pairs • Stored by the Client • Passed in the HTTP Header • Cookies Have Associated Expiration • Session (Default) • Date / Time • Associated With a URL Path, Not a Page! • Allows Passing Parameters Between Web Pages • Thus Cookies are Used to Provide State Information to a Stateless Protocol
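
Cookie handling can be sketched with Python's standard `http.cookies` module. The function names and the cookie names in the usage note are made up for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/", max_age=None):
    """Build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    # The cookie is associated with a URL path, not a single page.
    cookie[name]["path"] = path
    if max_age is not None:
        # Without an expiration the cookie lasts only for the session.
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Parse the Cookie request header into a dict of name/value pairs."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

A server might send `make_set_cookie("session", "abc123", path="/shop")` once, then read the value back with `parse_cookie_header` on each later request under `/shop`. That round trip is how state is layered on top of a stateless protocol.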

  24. Web Server HTTP Functionality • Content Negotiation • Choose From Several Different Formats Based on Request • Language Negotiation • Choose From Versions of Same Document Based on Request • Support for HTTP-Put, HTTP-Delete • Keep-Alive • As-Is • Server Doesn’t Add HTTP Headers • Allows You to Create Specific Behavior • Redirect to Another Site • Never Saved in Browser’s Cache

  25. Some Definitions • Hits • Each HTTP Request is a Hit • Accessing a Web Page May Result in Multiple Hits • E.g. Each Graphic is a Hit • Page Views • Accessing a Single Web Page is a Page View • E.g. Typing in a URL or Clicking on a Link • Visits • A Single Client’s Visit to Your Entire Site (Session) • May Include Multiple Page Views • What Constitutes a Second Visit From the Same Client? • Why is This Important? • Terms are Sometimes Used Interchangeably and Improperly • Compare Apples to Apples • Important for Commercial Web Sites • Advertising is Based on Site Access • Typically Sold on Page View Basis
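
The difference between hits and page views can be shown with a small counter. Which extensions count as a "page" is an assumption of this sketch (`PAGE_EXTENSIONS`); real analysis tools make this configurable.

```python
import posixpath

# Assumption: paths with these extensions (or none) are pages, not embedded files.
PAGE_EXTENSIONS = {"", ".htm", ".html"}


def count_hits_and_page_views(request_paths):
    """Return (hits, page_views) for a list of requested URL paths."""
    hits = 0
    page_views = 0
    for path in request_paths:
        hits += 1  # every HTTP request is a hit
        # Ignore any query string before looking at the extension.
        extension = posixpath.splitext(path.split("?", 1)[0])[1].lower()
        if extension in PAGE_EXTENSIONS:
            page_views += 1
    return hits, page_views
```

A single page with two graphics and a style sheet therefore produces four hits but only one page view, which is why the two terms should not be compared with each other.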

  26. Server Log Files • Many Variations to Web Server Log File Formats • Four Log Files • Access (Transfer) Log • Each Hit is Recorded • User, Date/Time, HTTP Request, etc. • Error Log • Date/Time, Error • Referrer Log • Referring Page, Destination Page • Agent (User) Log • Client’s Browser • Clearly a Need for Standardization • Linking the Four Log Files Together

  27. Common Log Format • Host • IP Address (or Hostname) of Client • Some Servers Perform Lookup of IP Address • RFC931 • Remote Identity of the Client as Reported by identd (RFC 931) • Seldom Used; Usually Logged as “-” • Authuser • HTTP Request: Authorization • UserName if Username Authorization is Required • Time Stamp • HTTP Response: Date • E.g. [10/Jun/1998:14:23:34 -0700] • Request • The Actual HTTP Request • E.g. GET /index.htm HTTP/1.1

  28. Common Log Format (2) • Status • The HTTP Response Status Code • Transfer Volume • HTTP Response: Content-Length
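
The seven Common Log Format fields from the last two slides can be parsed with a regular expression. This is a minimal sketch: the function name `parse_clf` and the sample address in the usage note are illustrative, and real logs may contain lines that need more forgiving handling.

```python
import re
from datetime import datetime

# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"].strip(), "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no body was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

For a line such as `192.0.2.10 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326`, the result carries the client host, the parsed time stamp with its time zone offset, the request, the status, and the transfer volume.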

  29. Extended Log File Format • Seven Common Log Format Fields Plus • Referrer • HTTP Request: Referer • User Agent • HTTP Request: User-Agent • Identifies Browser • Other Common Fields • Cookies • Can Help Identify Users
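
Adding the two extra fields to a Common Log Format parser is a small change. The sketch assumes the referrer and user agent appear as two quoted strings after the seven standard fields, which is a common layout but not the only one; the function name is hypothetical.

```python
import re

# Seven Common Log Format fields followed by "referrer" and "user agent".
EXTENDED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_extended(line):
    """Parse one extended-format log line into a dict of strings, or None."""
    match = EXTENDED_PATTERN.match(line)
    return match.groupdict() if match else None
```

With both fields in the same line as the request, there is no need to link separate referrer and agent logs back to the access log.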

  30. Issues • Client vs. User • Typically Don’t Have User Level Information • Only Record IP Address of Computer Used For Access • If Fixed IP Address For a Single User’s Machine • This Can Identify the User • Dynamically Assigned IP Addresses • Identifies the Overall Domain (e.g. AOL.com) • Proxy Servers • All Clients Have IP Address of Proxy Server • Multiple “Sessions” at Same Time • Impossible to Have Truly Accurate Information • Log File Analysis Software Has Algorithms to Identify Page Views, Visits • Client Level Caching Affects Logs • “ISP” Level Caching Affects Logs • E.g. AOL Maintains a Cache • No Requirement for Clients, ISPs to Follow Expiration Info
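
One common heuristic for identifying visits is an inactivity timeout, sketched below. The 30-minute threshold and the use of a bare client key (for example an IP address) are assumptions of this example, and the function name is hypothetical. As the slide notes, no such algorithm can be truly accurate.

```python
from datetime import datetime, timedelta


def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits from (client, timestamp) pairs using an inactivity timeout."""
    last_seen = {}
    visits = 0
    # Process hits in time order so gaps are measured correctly.
    for client, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(client)
        # A new visit starts on first sight or after a long enough gap.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[client] = when
    return visits
```

Keying on IP address alone merges all users behind one proxy into a single client. Combining the address with the user agent or a cookie narrows the error but does not remove it.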

  31. Log File Maintenance on Server • Log Files Grow Rapidly • Log Files Compress Very Nicely • Server Configurable • Generate Daily/Weekly/Monthly Logs • Maintenance Scripts to Clean Up Log Files • Compress • Archive • Cycle • E.g. Maintain Current Month’s Files
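
A maintenance script for the compress step might look like the sketch below. The function name `compress_log` is hypothetical, and the code assumes the server has already stopped writing to the file, for example because the log was rotated.

```python
import gzip
import os
import shutil


def compress_log(path):
    """Compress a closed log file to path + '.gz' and remove the original."""
    gz_path = path + ".gz"
    with open(path, "rb") as source, gzip.open(gz_path, "wb") as target:
        # Copy in chunks so large logs need not fit in memory.
        shutil.copyfileobj(source, target)
    os.remove(path)
    return gz_path
```

Log files compress well because their lines are highly repetitive. The compressed files can then be archived or cycled on whatever schedule the site chooses.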

  32. Log File Analysis • Big Business • Bread and Butter of Sites Driven By Advertising Revenue • Evaluation Factors • Log File Formats Supported • Ability to Link Multiple Logs • How Log Files are Accessed (e.g. via FTP) • Display Methodology • E.g. Available Via Web Pages • Lookup Capabilities • E.g. Map User-Agent to Browser • E.g. Resolve IP Addresses to Domains, Regions • Level of Analysis • E.g. Calculating Visits, Return Visitors • Configurability • Drill-Down Capabilities • Enterprise Capabilities • Ability to Manage Multiple Sites

  33. Log File Analysis Options • Important to Understand the Core Log Files • Log File Analysis Programs Make Some Assumptions • Freeware • Commercial • Service Bureaus

  34. Resources • HTTP • Server Comparison • http://webcompare.internet.com/chart.htm • Apache Server • www.apache.org • Website Server • http://website.ora.com • Microsoft IIS • http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp
