Web basics

Presentation Transcript

  1. Web basics • HTTP • http://www.ietf.org/rfc/rfc2616.txt • http://www2002.org/CDROM/refereed/444/ • URI/L/Ns • http://www.ietf.org/rfc/rfc2396.txt • HTML • http://www.w3.org/TR/html401/

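  2. HTTP operation: basic (top) vs. with intermediaries • Basic: the User Agent sends a request to the Origin Server, which returns a response • With intermediaries: a request chain runs from the User Agent to the Origin Server, and a response chain runs back • Intermediaries: proxies, gateways, tunnels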

  3. HTTP Terminology • User Agent (UA): program acting on behalf of user. • Resource: data object or service identified by a URI. • Origin server (OS): server originating a resource • Connection: transport session initiated by UA (but not always direct to OS). Typically TCP or SSL.

  4. HTTP Terminology • Message: formatted sequence of bytes: • Request: from client to server • Response: from server to client • Message = startline + headers + body

  5. Request and response messages
     Request:
       GET /index.html HTTP/1.1
       Host: www.hello.ucsc.edu
       User-Agent: Mozilla
       <blank line>
     Response:
       HTTP/1.1 200 OK
       Content-Length: 45
       Content-Language: en-us
       Content-Type: text/html
       <blank line>
       <html> <body> Hello world </body> </html>
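The "startline + headers + body" structure can be illustrated with a small parser. This is a minimal sketch, not a complete HTTP implementation: the function name `parse_message` is hypothetical, and it assumes CRLF line endings, no folded header lines, and no repeated header names.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, headers, body).

    Sketch only: assumes CRLF line endings and one line per header.
    """
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# The request from the slide, written out with explicit CRLFs.
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.hello.ucsc.edu\r\n"
    b"User-Agent: Mozilla\r\n"
    b"\r\n"
)

# A response in the same shape; Content-Length is computed, not copied.
html = b"<html><body>Hello world</body></html>"
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: " + str(len(html)).encode() + b"\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n" + html
)

req_start, req_headers, req_body = parse_message(request)
res_start, res_headers, res_body = parse_message(response)
```

A request with no body still ends in a blank line, which is why `req_body` comes back empty here.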

  6. Requests • GET, HEAD, POST • PUT, DELETE • OPTIONS, TRACE, CONNECT

  7. Common request headers • Host (required), User-Agent • Referer • Authorization • If-Modified-Since, Cache-Control • Accept[-Language/-Charset/-Encoding]
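As a rough illustration of how these headers appear on the wire, the sketch below assembles a GET request with the required Host header and an optional If-Modified-Since. The function `build_get` and the default user-agent string are assumptions made for the example; only `email.utils.formatdate` is a real standard-library call.

```python
from email.utils import formatdate


def build_get(host, path, last_fetch_epoch=None, user_agent="example-crawler/0.1"):
    """Return the text of a GET request (sketch; user_agent is a made-up value)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",            # the only header HTTP/1.1 requires
        f"User-Agent: {user_agent}",
    ]
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT; usegmt=True emits the "GMT" suffix.
        lines.append("If-Modified-Since: " + formatdate(last_fetch_epoch, usegmt=True))
    # A blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


conditional = build_get("www.hello.ucsc.edu", "/index.html", last_fetch_epoch=0)
plain = build_get("www.hello.ucsc.edu", "/index.html")
```

A server that honors the conditional header can answer 304 Not Modified with no body, which saves a crawler from re-downloading unchanged pages.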

  8. Common response codes • 200 OK • 301 Moved Permanently, 307 Temporary Redirect • 400 Bad Request • 401 Unauthorized, 403 Forbidden • 404 Not Found • 500 Internal Server Error
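The first digit of a status code gives its class, which is often all a client needs to decide what to do next. A small sketch using Python's standard `http.HTTPStatus` enum follows; the helper name `describe` and the class labels are choices made for this example.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for a known status code."""
    return f"{code} {HTTPStatus(code).phrase} ({CLASSES[code // 100]})"


summary = [describe(c) for c in (200, 301, 307, 400, 401, 403, 404, 500)]
```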

  9. Common response headers • Content-Type, Content-Length, Content-Language • Date, Last-Modified, Expires • Location [for 3xx responses] • Server

  10. Response generation: theory (top) vs. practice • Theory: Resource → Variant → Instance → Entity → Message • Steps in the theoretical model, in order: selection (negotiation, UA optimization), content encoding (gzip), instance manipulations (range, delta), transfer encoding (chunking, encryption) • Practice: Resource → Variant/Instance → Message, with selection (UA optimization) • Understanding the full model is necessary for a good understanding of caching, but we are going to ignore caching

  11. Cookies • Not part of official HTTP spec, but see: • http://www.ietf.org/rfc/rfc2109.txt • http://www.ietf.org/rfc/rfc2965.txt • Adding state to “stateless” protocol • OS adds Set-Cookie header to response: • Set-Cookie: sid=113a8fbc;version=1;path=/ • UA adds Cookie header to future requests: • Cookie: sid=113a8fbc;$version=1;$path=/
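The round trip in the example above can be sketched as follows. This mirrors only the slide's two header lines, not the full RFC 2109 rules (which, for instance, also cover quoting and attribute ordering); the function name `cookie_from_set_cookie` is hypothetical.

```python
def cookie_from_set_cookie(set_cookie_value: str) -> str:
    """Turn a Set-Cookie value into the Cookie value the UA sends back.

    Sketch only: the first pair is the cookie itself; every later pair is
    treated as an attribute and echoed with a '$' prefix, as on the slide.
    """
    parts = [p.strip() for p in set_cookie_value.split(";") if p.strip()]
    cookie_pair, attributes = parts[0], parts[1:]
    return ";".join([cookie_pair] + ["$" + attr for attr in attributes])


# Value of the Set-Cookie header from the slide.
echoed = cookie_from_set_cookie("sid=113a8fbc;version=1;path=/")
```

The server can then look up `sid` on each later request to recover the session, which is how state is layered on top of the stateless protocol.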

  12. URI/L/N • Uniform Resource… • Name: a persistent identifier • (Under development) • Locator: (perhaps transient) locator information • Typically: address plus access method • Identifier: either a URN or URL • RFC2396 provides syntactic rules that all URIs must obey

  13. HTTP URLs • http://host:port/path?query • “Fragments” are not strictly part of URLs • Relative URIs • Canonicalization • Aggressively avoid false distinctions • But always keep a working URL
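One possible reading of "avoid false distinctions, but keep a working URL" is sketched below with Python's `urllib.parse`. The specific rules chosen here (lowercase the scheme and host, drop the default port 80 for http, drop the fragment, use "/" for an empty path) are assumptions for illustration, not a list taken from the slides, and the names `canonicalize` and `resolve` are hypothetical. The path and query are left untouched because changing them could break the URL.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize an absolute http URL without changing what it points to."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # host names are case-insensitive
    port = parts.port
    # Port 80 is the default for http, so ":80" is a false distinction.
    if port is None or (scheme == "http" and port == 80):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"                # "http://host" and "http://host/" agree
    # The fragment is handled by the UA and never sent to the server; drop it.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def resolve(base: str, reference: str) -> str:
    """Resolve a relative URI against a base URL, then canonicalize it."""
    return canonicalize(urljoin(base, reference))


a = canonicalize("HTTP://WWW.Example.COM:80/a/b?x=1#frag")
b = resolve("http://Example.com/a/b.html", "../c.html#top")
```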

  14. HTML • Do a bit of review on the way frames and JavaScript work

  15. Problems for Archiving • Links obscured by increasing use of Flash, JavaScript, DHTML, PDF, Word, … • Soft 404s, 30x redirects (Big pain!!) • Great example of non-cooperation • Browser-specific content • Servers lie about content • E.g., incorrect or missing Content-Type
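A soft 404 is a "not found" page served with status 200, so the status code alone cannot be trusted. One common heuristic, which is an assumption here and not something the slides prescribe, is to request a path that almost certainly does not exist and compare. In the sketch below, `fetch` is a hypothetical caller-supplied function returning `(status_code, body_bytes)`, and exact body equality is a deliberately crude comparison.

```python
import uuid
from urllib.parse import urljoin


def looks_like_soft_404(url, fetch):
    """Guess whether `url` is a soft 404.

    `fetch(url) -> (status_code, body_bytes)` is a hypothetical callable
    supplied by the caller (e.g. a thin wrapper around an HTTP client).
    """
    # Probe a random path on the same host that should not exist.
    probe_url = urljoin(url, "/" + uuid.uuid4().hex)
    probe_status, probe_body = fetch(probe_url)
    if probe_status != 200:
        return False  # the server reports missing pages honestly
    status, body = fetch(url)
    # Same 200 response as for a nonexistent page: probably an error page.
    return status == 200 and body == probe_body


def fake_fetch(url):
    """Stand-in server for the example: only /real.html exists."""
    if url.endswith("/real.html"):
        return 200, b"real page"
    return 200, b"Sorry, page not found"


missing = looks_like_soft_404("http://host.example/missing.html", fake_fetch)
present = looks_like_soft_404("http://host.example/real.html", fake_fetch)
```

Real error pages often embed the requested URL or a timestamp, so a practical version would need a fuzzier comparison than byte equality.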

  16. Problems for Archiving • Aliasing • Material is copied • Host has multiple names (www.foo.com and foo.com typically the same) • Resource has multiple names (e.g., case-insensitivity)
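One simple way to spot aliases after fetching is to group URLs by a digest of the response body. This is only a sketch under the assumption that aliased pages are byte-identical, which often fails in practice (ads, timestamps); the function name `group_aliases` and the sample URLs and bodies are made up for the example.

```python
import hashlib
from collections import defaultdict


def group_aliases(pages):
    """Group URLs whose bodies are byte-identical.

    `pages` maps URL -> body bytes. Returns a list of sorted URL groups,
    keeping only groups with more than one member.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        by_digest[hashlib.sha256(body).hexdigest()].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Hypothetical crawl results: two names for one host, plus a distinct page.
pages = {
    "http://www.foo.com/": b"<html>home</html>",
    "http://foo.com/": b"<html>home</html>",
    "http://foo.com/other": b"<html>other</html>",
}
aliases = group_aliases(pages)
```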

  17. Problems for archiving • And this ignores spamming!