
URLs and Resources

URLs and Resources. Herng-Yow Chen. Outline: navigating the Internet’s resources; URL syntax and what the various URLs mean and do; URL shortcuts that many web clients support: relative URLs and expanded URLs; URL encoding and character rules; common URL schemes.


Presentation Transcript


  1. URLs and Resources Herng-Yow Chen

  2. Outline • Navigating the Internet’s resources • URL syntax and what the various URLs mean and do • URL shortcuts that many web clients support: relative URLs and expanded URLs • URL encoding and character rules • Common URL schemes • The future of URLs, including URNs

  3. Navigating a resource by URL. A URL tells a web client: • URL scheme: how to access the resource • Server location: where the resource is hosted • Resource path: which particular local resource on the server is being requested • Example (a web page): in http://english.csie.ncnu.edu.tw/demo/index.html, “http” is the scheme (how), “english.csie.ncnu.edu.tw” is the host (where), and “/demo/index.html” is the path (what).

  4. URLs • URLs can direct you to resources available through protocols other than HTTP: • An email account: mailto:hychen@csie.ncnu.edu.tw • A file residing on an FTP server: ftp://ftp.ncnu.edu.tw/a_file.txt • A video streamed by a video server: rtsp://www.cnn.com/headline.rm • Most URLs share the same “scheme://server location/path” structure.


  6. URL Syntax • <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
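To see how the general syntax maps onto concrete fields, Python's standard `urllib.parse.urlparse` can split a URL into these components. The URL below is a made-up example (the host, port, params, query, and fragment values are illustrative only) chosen so that every component is present.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general syntax.
url = ("http://joe:joespasswd@www.joes-hardware.com:8080"
       "/tools/hammers.html;type=a?item=12731#drills")

parts = urlparse(url)
print("scheme  :", parts.scheme)    # http
print("user    :", parts.username)  # joe
print("password:", parts.password)  # joespasswd
print("host    :", parts.hostname)  # www.joes-hardware.com
print("port    :", parts.port)      # 8080
print("path    :", parts.path)      # /tools/hammers.html
print("params  :", parts.params)    # type=a
print("query   :", parts.query)     # item=12731
print("frag    :", parts.fragment)  # drills
```

Note that `urlparse` only separates the params of the *last* path segment; per-segment params are discussed under Paths.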

  7. Scheme: what protocol to use • The scheme is really the main identifier of how to access a given resource. • The scheme must start with an alphabetic character, and it is separated from the rest of the URL by the first “:” character. • Scheme names are case-insensitive.
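A minimal sketch of the two scheme rules above: split at the first ":" and normalize case. The validity check uses the scheme grammar from RFC 3986 (a letter followed by letters, digits, "+", "-", or "."); the function name `split_scheme` is hypothetical.

```python
import re

# RFC 3986 scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def split_scheme(url: str):
    """Hypothetical helper: return (scheme, rest), splitting at the first ':'."""
    colon = url.find(":")
    if colon == -1:
        raise ValueError("no scheme: missing ':'")
    scheme = url[:colon]
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme: {scheme!r}")
    # Scheme names are case-insensitive, so normalize them to lower case.
    return scheme.lower(), url[colon + 1:]

print(split_scheme("HTTP://www.csie.ncnu.edu.tw/"))
print(split_scheme("mailto:hychen@csie.ncnu.edu.tw"))
```

Splitting at the *first* colon matters because later colons (in user:password or host:port) belong to other components.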

  8. Usernames and Passwords • Many servers require a username and password before you can access data through them. For example: • ftp://ftp.prep.ai.mit.edu/pub/gnu • ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu • ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu • http://joe:joespasswd@www.joes-hardware.com/sales_info.txt • Default username and password, used when the URL omits them: • “anonymous” for the username • For the password, Internet Explorer sends “IEUser”, while Netscape sends “mozilla”.
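The defaulting behavior described above can be sketched as follows. The helper name `ftp_credentials` is hypothetical, and the default password parameter simply mirrors the Internet Explorer value from the slide; real clients differ.

```python
from urllib.parse import urlparse

def ftp_credentials(url: str, default_password: str = "IEUser"):
    """Hypothetical helper: return (username, password) for an FTP URL,
    falling back to anonymous login when the URL omits them."""
    parts = urlparse(url)
    user = parts.username if parts.username is not None else "anonymous"
    password = parts.password if parts.password is not None else default_password
    return user, password

print(ftp_credentials("ftp://ftp.prep.ai.mit.edu/pub/gnu"))
print(ftp_credentials("ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu"))
```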

  9. Hosts and Ports • The host component (an IP address or a domain name) identifies the host machine on the Internet that has access to the resource. • The port component identifies the network port on which the server is listening. • Different services use different default ports: • HTTP: 80 • FTP: 21 • Telnet: 23 • SMTP: 25
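When a URL omits the port, the client falls back to the scheme's default. A minimal sketch, assuming a small hand-written table that covers only the URL schemes from the slide's list (`DEFAULT_PORTS` and `effective_port` are hypothetical names):

```python
from urllib.parse import urlparse

# Default ports for the URL schemes listed on the slide.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}

def effective_port(url: str) -> int:
    parts = urlparse(url)
    if parts.port is not None:      # an explicit ":port" in the URL wins
        return parts.port
    try:
        return DEFAULT_PORTS[parts.scheme]
    except KeyError:
        raise ValueError(f"no default port known for {parts.scheme!r}") from None

print(effective_port("http://www.csie.ncnu.edu.tw/"))       # 80
print(effective_port("http://www.csie.ncnu.edu.tw:8080/"))  # 8080
```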

  10. Paths • The path component of the URL specifies where on the server machine the resource lives. • The path often resembles a hierarchical filesystem path. For example, in http://www.csie.ncnu.edu.tw/course/1998.html the path is “/course/1998.html”, which resembles a file path on a UNIX filesystem. • The path component of HTTP URLs can be divided into path segments separated by “/”. Each path segment can have its own params component (described later).
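A minimal sketch of splitting a path into segments, each with its own ";"-separated params. The example path and the helper name `split_path` are hypothetical.

```python
def split_path(path: str):
    """Hypothetical helper: split an HTTP path into (segment, [params]) pairs."""
    result = []
    for segment in path.lstrip("/").split("/"):
        name, *params = segment.split(";")   # params follow the first ';'
        result.append((name, params))
    return result

print(split_path("/hammers;sale=false/index.html;graphics=true"))
# [('hammers', ['sale=false']), ('index.html', ['graphics=true'])]
```

For contrast, Python's `urllib.parse.urlparse` only separates the params of the last segment, leaving `;sale=false` inside its `path` field.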

  11. Parameters • For many schemes, a simple host and path to the object just aren’t enough. • Aside from what port the server is listening on, and whether you have access to the resource with a username and password, many protocols require more information to work. • For example, the FTP type parameter tells the client whether to transfer a file in ASCII (text) mode (type=a) or image/binary mode (type=i): • ftp://ftp.ncnu.edu.tw/image.gif;type=a • ftp://ftp.ncnu.edu.tw/program.exe;type=i
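A sketch of reading the FTP `type` parameter. The typecodes a, i, and d come from RFC 1738; the function name `ftp_transfer_type` and its descriptive return strings are hypothetical.

```python
from urllib.parse import urlparse

# RFC 1738 FTP typecodes: a = ASCII text, i = image (binary), d = directory listing.
TYPECODES = {"a": "ASCII", "i": "image (binary)", "d": "directory listing"}

def ftp_transfer_type(url: str) -> str:
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")
    for param in parts.params.split(";"):
        key, _, value = param.partition("=")
        if key.lower() == "type" and value.lower() in TYPECODES:
            return TYPECODES[value.lower()]
    return "unspecified (client chooses)"

print(ftp_transfer_type("ftp://ftp.ncnu.edu.tw/program.exe;type=i"))  # image (binary)
```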

  12. Query strings • Some resources, such as databases, can be queried with input strings. For example: • http://www.xxx.tw/a.cgi?id=123&name=abc • There is no required format for the query component, except that some characters are illegal. By convention, many gateways expect the query to be formatted as a series of “name=value” pairs separated by “&” characters.
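The "name=value&..." convention can be parsed and built with Python's standard `urllib.parse` helpers. The second example's values are illustrative only.

```python
from urllib.parse import urlparse, parse_qsl, urlencode

url = "http://www.xxx.tw/a.cgi?id=123&name=abc"
pairs = parse_qsl(urlparse(url).query)
print(pairs)        # [('id', '123'), ('name', 'abc')]
print(dict(pairs))  # {'id': '123', 'name': 'abc'}

# Going the other way: build a query string from name/value pairs.
built = urlencode({"id": 123, "name": "a b"})
print(built)        # id=123&name=a+b
```

`urlencode` encodes a space as "+" by default, the HTML form convention; this is one of the escaping rules discussed later.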

  13. Query Strings [Figure: the client sends http://english.csie.ncnu.edu.tw/course/NWSMLViewer.php?lectureid=rctlee-20030909125212 over the Internet; the server passes the query string lectureid=rctlee-20030909125212 to the “viewer” gateway.]

  14. Fragments • Finer-grained parts of a resource, such as sections of a large HTML document, can be referenced directly. For example: • http://engquiz.csie.ncnu.edu.tw/e-book/html/B001.html#page10 • Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don’t pass fragments along to servers. The whole object is retrieved, and the browser displays it starting at the fragment. • Note that with the range-request feature of HTTP/1.1, agents may request byte ranges of objects (covered in later lectures).

  15. Fragments (the fragment is NOT sent to the server) • (a) The user selects a link to “http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource” • (b) The browser makes a request over the Internet to www.csie.ncnu.edu.tw for http://www.csie.ncnu.edu.tw/~hychen/web_tech/ • (c) The server returns the entire HTML page • (d) The browser displays the HTML page, scrolled to start at the named “Resource” fragment
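Step (b) above, where the client drops the fragment before sending the request, can be reproduced with Python's standard `urllib.parse.urldefrag`. This is a sketch of the client-side split only, not of any particular browser's code.

```python
from urllib.parse import urldefrag

link = "http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource"
request_url, fragment = urldefrag(link)
print(request_url)  # http://www.csie.ncnu.edu.tw/~hychen/web_tech/  (sent to the server)
print(fragment)     # Resource  (kept locally to scroll the page)
```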

  16. URL shortcuts • Web clients understand and use a few URL shortcuts. • Many browsers also support automatic expansion of URLs, where the user can type in a key (memorable) part of a URL, and the browser fills in the rest. • Relative URLs • Base URLs • Resolving relative references • Expanded URLs

  17. Relative URLs • URLs come in two flavors: absolute and relative. • So far, we have looked only at absolute URLs, which contain all the information you need to access a resource. • A relative URL, on the other hand, is incomplete. To get all the information needed to access a resource, a relative URL must be interpreted relative to another URL, called its base.

  18. HTML snippet with a relative URL:
      <HTML>
      <HEAD><TITLE>Joe’s Tools</TITLE></HEAD>
      <BODY>
      <H1>Tools page</H1>
      <H2>Hammers</H2>
      <P>Joe’s HARDWARE online has the largest selection of <A HREF="./hammers.html">hammers</A> on earth.
      </BODY>
      </HTML>

  19. Using a base URL • Relative URL: ./hammers.html • Base URL: http://www.joes-hardware.com/tools.html • New absolute URL: http://www.joes-hardware.com/hammers.html
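Python's standard `urllib.parse.urljoin` performs this combination. The last two relative URLs below are hypothetical additions showing other common forms.

```python
from urllib.parse import urljoin

base = "http://www.joes-hardware.com/tools.html"
print(urljoin(base, "./hammers.html"))         # http://www.joes-hardware.com/hammers.html
print(urljoin(base, "catalog/drills.html"))    # http://www.joes-hardware.com/catalog/drills.html
print(urljoin(base, "/index.html"))            # http://www.joes-hardware.com/index.html
```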

  20. Base URLs • The first step in the conversion process is to find a base URL, which can come from a few places. • Explicitly provided in the resource • Use the <BASE> tag to define the base URL. • Base URL of the encapsulating resource • If the resource does not explicitly specify a base URL, use the URL of the resource in which it is embedded as the base, as in the example on the preceding slide. • No base URL • In some instances, there is no base URL. This often means that you have an absolute URL; however, sometimes you just have an incomplete or broken URL.
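A minimal sketch of the first two cases, assuming the document is HTML: use the first `<BASE HREF>` if one exists, otherwise fall back to the document's own URL. It relies on the standard `html.parser` module. The names `BaseFinder` and `effective_base`, as well as the catalog URL in the usage line, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class BaseFinder(HTMLParser):
    """Records the href of the first <BASE> tag (tag names arrive lower-cased)."""
    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href

def effective_base(document_url: str, html_text: str) -> str:
    finder = BaseFinder()
    finder.feed(html_text)
    finder.close()
    if finder.base_href is None:
        return document_url                    # fall back to the resource's own URL
    return urljoin(document_url, finder.base_href)

page = '<HTML><HEAD><BASE HREF="http://www.joes-hardware.com/catalog/"></HEAD></HTML>'
print(effective_base("http://www.joes-hardware.com/tools.html", page))
```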

  21. Resolving relative references
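Resolving a relative reference against its base can be sketched with the algorithm from RFC 3986 §5.2. That RFC supersedes the RFC 1808 and RFC 2396 rules listed in slide 31, and the details differ slightly from the older rules. This is a compact, non-authoritative transcription (strict parser, no normalization). The function names are hypothetical; the test cases come from the RFC's own examples.

```python
import re

# RFC 3986 Appendix B: splits a URI reference into its five components.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_reference(ref):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

def remove_dot_segments(path):
    """Remove '.' and '..' segments (RFC 3986 section 5.2.4)."""
    output = []  # each entry is one segment, including its leading '/', if any
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading '/') to the output.
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)

def merge(base_authority, base_path, ref_path):
    """Append ref_path to the base path's directory (section 5.2.3)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path

def recompose(scheme, authority, path, query, fragment):
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result

def resolve(base, ref):
    b_scheme, b_auth, b_path, b_query, _ = split_reference(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split_reference(ref)
    if r_scheme is not None:                       # already absolute
        return recompose(r_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_auth is not None:                         # "//host/..." reference
        return recompose(b_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_path == "":                               # only query and/or fragment
        query = r_query if r_query is not None else b_query
        return recompose(b_scheme, b_auth, b_path, query, r_frag)
    if r_path.startswith("/"):                     # absolute path
        path = remove_dot_segments(r_path)
    else:                                          # relative path
        path = remove_dot_segments(merge(b_auth, b_path, r_path))
    return recompose(b_scheme, b_auth, path, r_query, r_frag)

print(resolve("http://www.joes-hardware.com/tools.html", "./hammers.html"))
```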

  22. Expanded URLs • Some browsers try to expand URLs automatically, either after you submit the URL or while you’re typing. This gives users a shortcut: they don’t have to type in the complete URL. • Hostname expansion • Ex: yahoo → www.yahoo.com • History expansion • Ex: http://www.ncnu → http://www.ncnu.edu.tw
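The two kinds of expansion can be illustrated with deliberately simple, hypothetical heuristics. Real browsers use their own, more elaborate rules (search fallbacks, ranked history, and so on), so treat this only as a sketch of the idea.

```python
from typing import List

def expand_hostname(text: str) -> str:
    """Hypothetical hostname expansion: 'yahoo' -> 'http://www.yahoo.com'."""
    if "://" in text:
        return text                          # already a full URL
    host = text if "." in text else f"www.{text}.com"
    return f"http://{host}"

def expand_from_history(prefix: str, history: List[str]) -> List[str]:
    """Hypothetical history expansion: previously visited URLs that start with prefix."""
    return [url for url in history if url.startswith(prefix)]

visited = ["http://www.ncnu.edu.tw", "http://www.csie.ncnu.edu.tw"]
print(expand_hostname("yahoo"))                        # http://www.yahoo.com
print(expand_from_history("http://www.ncnu", visited))  # ['http://www.ncnu.edu.tw']
```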

  23. Shady characters in URLs • URLs were designed to be portable, to uniformly name all the resources on the Internet. This means URLs will be transmitted through various protocols. • Because different protocols (schemes) use different transmission mechanisms, URLs must be transmitted safely, without losing information, through any protocol on the network. • Some protocols, such as the Simple Mail Transfer Protocol (SMTP) for email, use a 7-bit encoding for messages; this can strip off certain characters if the source is encoded in 8 bits or more.

  24. Shady characters in URLs • URLs are permitted to contain only characters from a relatively small, universally safe alphabet. • In addition to being portable, URLs should be readable. Hence, invisible, nonprinting characters are also prohibited in URLs, even though these characters may pass through mailers. • To complicate matters further, URLs also need to be complete. People may want URLs to contain binary data or characters outside the universally safe alphabet, so an escape mechanism was added.

  25. The URL Character Set • US-ASCII is very portable because of its long legacy. It uses 7 bits to represent most keys available on an English typewriter and a few nonprinting control characters for text formatting and hardware signaling. But it doesn’t support the inflected characters common in European languages or the characters of non-Roman languages. • URLs may also need to contain arbitrary binary data. • Escape sequences allow arbitrary values to be encoded using a restricted subset of the US-ASCII character set, yielding both portability and completeness.

  26. Encoding mechanism • An unsafe character is represented by an “escape” notation: a percent sign (%) followed by two hexadecimal digits giving the character’s code. • For example: • ~ → 0x7E: http://www.ncnu.edu.tw/%7Ehychen • Space → 0x20: http://www.abc.com/web%20tools.html • % → 0x25: http://www.abc.com/100%25satisfaction.html
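A minimal sketch of the escape mechanism: encode the text as UTF-8 bytes, then replace each byte outside a chosen safe set with "%" plus two uppercase hex digits. The safe set here (ASCII letters, digits, "-", "_", ".", and "/" so path delimiters survive) is an assumption, and `percent_encode` is a hypothetical name. It escapes "~" to match the slide. Python's own `urllib.parse.quote` leaves "~" alone since Python 3.7, following RFC 3986.

```python
import string
from urllib.parse import unquote

# Assumed safe set: letters, digits, "-", "_", "." and "/" (kept as the path delimiter).
SAFE = set(string.ascii_letters + string.digits + "-_./")

def percent_encode(path: str) -> str:
    out = []
    for byte in path.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")   # '%' followed by two hex digits
    return "".join(out)

print(percent_encode("/~hychen"))               # /%7Ehychen
print(percent_encode("/web tools.html"))        # /web%20tools.html
print(percent_encode("/100%satisfaction.html")) # /100%25satisfaction.html
print(unquote("/%7Ehychen"))                    # /~hychen
```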

  27. Character Restrictions
      • % : escape token for encoded characters
      • / : path delimiter
      • . : path component
      • .. : path component
      • # : fragment delimiter
      • ? : query-string delimiter
      • ; : params delimiter
      • : : delimits the scheme, user/password, and host/port
      • $ , + : reserved
      • @ & = : reserved; special meaning in some schemes
      • { } | \ ^ ~ [ ] ` : restricted because of unsafe handling by various transport agents, such as gateways
      • < > " : unsafe; should be encoded because these characters have meaning outside the scope of the URL
      • 0x00-0x1F, 0x7F : restricted; these fall within the nonprintable range of US-ASCII
      • > 0x7F : restricted; these do not fall within the 7-bit range of US-ASCII
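The unsafe and restricted rows of the table above can be turned into a simple checker that reports which characters in a string must be %-encoded before they appear in a URL. This is a sketch: `needs_escaping` is a hypothetical name, and the space (0x20, escaped in the encoding examples) is included as an assumption. Reserved characters such as "/" or "?" are not flagged, because they only need escaping when used as data rather than as delimiters.

```python
# Unsafe (< > ") and restricted ({ } | \ ^ ~ [ ] `) characters from the table.
UNSAFE = set('<>"{}|\\^~[]`')

def needs_escaping(text: str):
    """Return the distinct characters in text that should be %-encoded."""
    flagged = []
    for ch in text:
        code = ord(ch)
        # Control characters and space (<= 0x20), DEL (0x7F), and non-ASCII (> 0x7F).
        if ch in UNSAFE or code <= 0x20 or code >= 0x7F:
            if ch not in flagged:
                flagged.append(ch)
    return flagged

print(needs_escaping("/web tools.html"))  # [' ']
print(needs_escaping("/a<b>|c"))          # ['<', '>', '|']
```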

  28. Common scheme formats • http, https • mailto • ftp • rtsp, rtspu • file • news • telnet

  29. The Future: URNs? [Figure: Step 1: the client sends “GET http://purl.oclc.org/jhardware/” across the Internet to the resource resolver at purl.oclc.org, asking for the location of Joe’s Hardware, and receives the resource’s current location, http://www.joes-hardware.com/. Step 2: the client sends “GET http://www.joes-hardware.com” to www.joes-hardware.com to retrieve the actual resource.]

  30. URI: Universal Resource Identifier • URIs were defined in RFC 1630 (1994). • URI is a superset of URL and URN. • Full URI: proto://hostname/path, e.g. http://www.csie.ncnu.edu.tw:80/~hychen/ (identifies the server) • Partial URI: /path, e.g. /~hychen/ (no server mentioned)

  31. URLs information • http://www.w3.org/Addressing/ • The W3C page about naming and addressing URIs and URLs. • http://www.ietf.org/rfc/rfc1738.txt • RFC 1738, “Uniform Resource Locators (URL),” by T. Berners-Lee, L. Masinter, and M. McCahill. • http://www.ietf.org/rfc/rfc2396.txt • RFC 2396, “Uniform Resource Identifiers (URI): Generic Syntax,” by T. Berners-Lee, R. Fielding, and L. Masinter. • http://www.ietf.org/rfc/rfc2141.txt • RFC 2141, “URN Syntax,” by R. Moats. • http://purl.oclc.org • The persistent uniform resource locator web site. • http://www.ietf.org/rfc/rfc1808.txt • RFC 1808, “Relative Uniform Resource Locators,” by R. Fielding.
