1 / 50

URIs and RFC 3986

COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012). URIs and RFC 3986. Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah. Goals. Learn the detailed design of URIs See how the naming principles we’ve explored are reflected in URIs

kelvin
Download Presentation

URIs and RFC 3986

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012) URIsandRFC 3986 Noah Mendelsohn Tufts UniversityEmail: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

  2. Goals • Learn the detailed design of URIs • See how the naming principles we’ve explored are reflected in URIs • Learn to read RFCs and to study the art of writing specifications • Understand why grammars are important

  3. ReviewWeb Architecture Basics

  4. Architecting a universal Web • Identification: URIs • Interaction: HTTP • Data formats: HTML, JPEG, etc.

  5. What Happens When We Browse a Web Page? Consider:What are all the things that are “named”in the interaction between browser and Web Server?

  6. The user clicks on a link URI is http://webarch.noahdemo.com/demo1/test.html URI is http://webarch.noahdemo.com/demo1/test.html

  7. The http “scheme” tells client to send HTTP GET msg URI is http://webarch.noahdemo.com/demo1/test.html URI is http://webarch.noahdemo.com/demo1/test.html HTTP GET

  8. demo1/test.html GET /demo1/test.html HTTP/1.0 Host: webarch.noahdemo.com User-Agent: Noahs Demo HttpClient v1.0 Accept: */* Accept-language: en-us The client sends an HTTP GET URI is http://webarch.noahdemo.com/demo1/test.html HTTP GET Host: webarch.noahdemo.com

  9. HTTP/1.1 200 OK Date: Tue, 28 Aug 2007 01:49:33 GMT Server: Apache Transfer-Encoding: chunked Content-Type: text/html <html> <head> <title>Demo #1</title> </head> <body> <h1>A very simple Web page</h1> </body> </html> The server sends an HTTP Response HTTP GET HTTP Status Code200 Means Success! demo1/test.html Host: webarch.noahdemo.com HTTP RESPONSE

  10. HTTP/1.1 200 OK Date: Tue, 28 Aug 2007 01:49:33 GMT Server: Apache Transfer-Encoding: chunked Content-Type: text/html <html> <head> <title>Demo #1</title> </head> <body> <h1>A very simple Web page</h1> </body> </html> The server sends an HTTP Response HTTP GET demo1/test.html Host: webarch.noahdemo.com The “representation” returned is an HTML document HTTP RESPONSE

  11. Architecting a universal Web • Identification: URIs • Interaction: HTTP • Data formats: HTML, JPEG, etc.

  12. Assign URIs for all Resources • A resource is something that has information (e.g. a Web page) • If a resource doesn’t have a URI, you can’t link to it…it’s not part of the Web.

  13. Review:Naming Questions 14

  14. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too few names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of names

  15. The Structure of URIs 16

  16. A simple URI http://uss.tufts.edu/stuserv/acadcal/

  17. A simple URI http://uss.tufts.edu/stuserv/acadcal/

  18. A simple URI Scheme http://uss.tufts.edu/stuserv/acadcal/

  19. Schemes Scheme http://uss.tufts.edu/stuserv/acadcal/ mailto:noah@cs.tufts.edu Schemes let us name different kinds of things, accessed in different ways.

  20. A simple URI Authority http://uss.tufts.edu/stuserv/acadcal/ Authority: who controls allocation of this name?

  21. A simple URI // Fixed in grammar to indicate authority follows http://uss.tufts.edu/stuserv/acadcal/

  22. A simple URI Path http://uss.tufts.edu/stuserv/acadcal/ Path: provides for hierarchical naming…… also supports “../xxx” relative syntax?

  23. A simple URI Path http://uss.tufts.edu/stuserv/acadcal/ Path: provides for hierarchical naming…… maps well to heirarchicalinformation systems

  24. A more complex URI http://www.tufts.edu?student=smith

  25. Query component A more complex URI http://www.tufts.edu?student=smith The query is part of the URI... However, in many cases, all URIs with a common path are processed by the same server-side code Also…HTML forms are useful for filling in the query components

  26. Fragments identify parts of documents Fragments http://tools.ietf.org/html/rfc3986#section-3.5 Fragment interpretation depends on the media type of the returned representation (text/html)…this is useful but tricky and causes a variety of problems.

  27. Characteristics of URIs 28

  28. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs

  29. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Both supported Some characteristics of URIs

  30. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Depends on scheme Some characteristics of URIs

  31. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Yes…URIs “on the side of a bus”is an important goal…but some URIs are complex Some characteristics of URIs

  32. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs Allowed but not required

  33. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? With most schemes, absolute URIs are global Some characteristics of URIs

  34. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? FILE: scheme is not global! Some characteristics of URIs

  35. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? NO!!Status code 404 is key to Web scalability Some characteristics of URIs

  36. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some aliases requirede.g.: http vs. HTTP…worst cases depend on users Some characteristics of URIs

  37. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Depends on scheme and user…see Metadata in URI finding Some characteristics of URIs

  38. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? URIs are the structuringmechanism for the Web as a whole Some characteristics of URIs

  39. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Designed to allow mappings to hierarchical systems Some characteristics of URIs

  40. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs Decentralized allocation except:

  41. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs scheme names centrally registered with IANA

  42. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs For http and mailto schemes:central Domain Name (DNS) registration required for authority

  43. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs Yes. E.g.: ASCII-only, spaces and some punctuation must be %encoded

  44. Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? Some characteristics of URIs URIs are silent on this…but HTTP redirection provides for indirect identification

  45. Grammars 46

  46. What are formal grammars? • Grammars are formal languages for specifying other languages • A grammar allows you to: • Always: determine whether a given string is “in” the specified language • Often: associate structures in the grammar with parts of the string • The Chomsky hierarchy: • Different grammars have different expressive power • Regular expressions recognize “regular languages” (ab*)  a, ab, abb, abbb • Context-free grammars are more powerful: typically used for programming languages • The ABNF used in RFC’s is a context-free grammar • Context-free grammars can be recognized (parsed) by a finite-state pushdown automaton Tutorial at: http://en.wikipedia.org/wiki/Chomsky_hierarchy

  47. Why use formal grammars for specifying languages? • Precise and rigorous • Less ambiguous than an explanation in English • Membership of a string in a language can be checked automatically • Tools to process the language can often be constructed automatically from the grammar

  48. ABNF: the grammar for IETF RFCs • ABNF example from RFC 3986:URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] • ABNF is itself specified in RFC 2234 • ABNF is convenient to use in fixed-font specification documents like RFCs

  49. Summary 50

  50. Summary • The structure and interpretation of URIs is set out in RFC 3986 • URIs embody many of the principles we have studied • Formal grammars are powerful tools for specifying names • The design decisions embodied in URIs are keys to the success of the Web!!

More Related