This Lecture

ES 101-02. Module 5: Uniform Resource Locators, Hypertext Transfer Protocol, & Common Gateway Interface

This Lecture • Uniform Resource Locators (URL) • Hypertext Transfer Protocol (HTTP) • Common Gateway Interface (CGI)

Definitions • We previously discussed the Domain Name System, or DNS • Distributed database hosted by DNS servers • Maps host IP addresses to mnemonic names • Easier for humans to remember • Universal registration, i.e., every domain name on the Internet is unique • In order to find resources on a particular server, we must introduce the concept of a URL

Uniform Resource Locators • A URL allows a client browser to send search data to a server for further processing (e.g., as a query string following a “?”) • URLs are a scheme for specifying Internet resources using a single line of printable ASCII characters • No control characters are allowed • The URL structure and syntax allow the web client to access all major Internet protocols via TCP • File Transfer Protocol (FTP) • Hypertext Transfer Protocol (HTTP) • Etc. • URLs can also be used within HTML documents to provide “links” to other documents

URL Contents • A URL contains the following: • Protocol to use when accessing the server, e.g. HTTP • Internet domain name (or IP address) of the host on which the server is running • Port number of the target application (optional) • Location of the resource in the directory structure • Example of a URL: http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html

URL Contents (cont’d) • The previous URL references the file: Implementation.html • This file is located in the directory: /hypertext/WWW/RDBgate, which is located on the server www.cern.ch • The protocol used is HTTP • Note that this is an exact reference. Abbreviated references are allowed under certain conditions.
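The decomposition above can be reproduced with `urlsplit` from Python's standard `urllib.parse` module. This is a minimal sketch; the helper name `describe_url` and its dictionary keys are illustrative choices, not part of any standard.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into protocol, host, port, directory, and file name."""
    parts = urlsplit(url)
    # rpartition splits the path at its last "/": directory on the left, file on the right.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,        # e.g. "http"
        "host": parts.hostname,          # domain name or numeric IP address
        "port": parts.port,              # None when the URL does not give one
        "directory": directory or "/",   # "/" if the file sits at the top level
        "file": filename,                # empty when the URL names a directory
    }


print(describe_url("http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html"))
```

For the example URL, the protocol is `http`, the host is `www.cern.ch`, no port is given, and the file `Implementation.html` sits in `/hypertext/WWW/RDBgate`.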

Allowed Characters in URLs • Every URL must be written using printable ASCII characters • This ensures that URLs can be sent by electronic mail • Many mail programs would mishandle control characters • However, any non-printable ASCII character can be included in a URL by using a character encoding scheme

ASCII/IRA Character Set

Character Encoding • Any ASCII character, including a control character, can be represented by the three-character sequence %xy, where “xy” is the two-digit hexadecimal code of the character of interest • Because “%” introduces an encoded character, it cannot appear literally in a URL; it must itself be written as %25 • There are other characters that must be encoded: • “Space” and “TAB” characters and double quotation marks (") are examples of forbidden characters • “Slash” (/) is reserved as the directory separator, so a slash that is part of a name must be encoded as %2F
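The %xy scheme is simple enough to sketch by hand, and Python's `urllib.parse.quote` and `unquote` apply the same scheme to whole strings. The helper `encode_char` and the sample file name below are hypothetical, for illustration only.

```python
from urllib.parse import quote, unquote


def encode_char(ch: str) -> str:
    """Return the %xy form of a single ASCII character (xy = its hex code)."""
    return "%{:02X}".format(ord(ch))


print(encode_char(" "))   # %20
print(encode_char("\t"))  # %09
print(encode_char('"'))   # %22
print(encode_char("%"))   # %25
print(encode_char("/"))   # %2F

# quote() leaves "/" alone by default; safe="" makes it encode "/" too.
print(quote('my file "v2".txt', safe=""))  # my%20file%20%22v2%22.txt
print(unquote("my%20file%20%22v2%22.txt"))  # my file "v2".txt
```

Note that `quote` keeps "/" unencoded by default precisely because a slash usually serves as the directory separator.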

Ports and IP Addresses • Port designations are usually “assumed” if they are not specified in the URL, and the host’s IP address is normally obtained from its domain name via DNS • However, a port can be included within a URL without causing problems: http://www.address.edu:80/path/subdir/file.ext • If a port number is not included in the URL, the protocol assumes that the port number is the default for that protocol • As an example, using the “HTTP” protocol implies port “80”
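The default-port rule can be sketched as follows. The `DEFAULT_PORTS` table is a small illustrative subset (HTTP 80, HTTPS 443, FTP 21), and `effective_port` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Illustrative subset of well-known default ports, keyed by URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> int:
    """Port the client connects to: the explicit one if given, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # raises KeyError for schemes not in the table


print(effective_port("http://www.address.edu:80/path/subdir/file.ext"))  # 80
print(effective_port("http://www.address.edu/path/subdir/file.ext"))     # 80
print(effective_port("http://www.site.edu:3232/cgi-bin/srch"))           # 3232
```

The first two URLs reach the same port, which is why writing “:80” in an HTTP URL causes no problems.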

Ports and IP Addresses (cont’d) • Numeric IP addresses can be used in place of domain names: http://132.206.9.22/pathname • You could also include the username and password in the URL, in the form http://username:password@host/path • This is not recommended, since the password is not encrypted. Very bad security practice!!
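To see why embedding credentials is risky, note that anyone who can read the URL can pull the password straight out of it. A minimal sketch using `urlsplit`; the user name `alice` and password `secret123` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical credentials embedded in a URL that uses a numeric IP address.
url = "http://alice:secret123@132.206.9.22/pathname"
parts = urlsplit(url)

print(parts.hostname)  # 132.206.9.22 (numeric address in place of a domain name)
print(parts.username)  # alice
print(parts.password)  # secret123, readable by anyone who sees the URL
print(parts.path)      # /pathname
```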

Partial URLs • If you are within a given HTML document, it is not necessary to specify the complete URL • Any information not included in the URL is assumed to be the same as that used to access the current document • Partial URLs are very useful when constructing large collections of HTML documents that will be kept “together” • Caveat: Partial URLs are resolved relative to the current document, so if you move some of these documents to a different folder or server without the rest of the collection, the links between them will not work
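The rule “missing information comes from the current document” is what Python's `urllib.parse.urljoin` implements. A minimal sketch; the site and document names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URL of the document currently being viewed.
current = "http://www.site.edu/htmldocs/chapter1/intro.html"

print(urljoin(current, "figures.html"))        # same directory as the current document
print(urljoin(current, "../chapter2/a.html"))  # one directory up, then into chapter2
print(urljoin(current, "/index.html"))         # same server, path from the top

# The same partial URL, seen from a moved copy of intro.html, points somewhere else:
print(urljoin("http://other.edu/archive/intro.html", "figures.html"))
```

The last line shows the caveat: the link only works if `figures.html` was moved along with `intro.html`.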

URL Forms • Let’s look at a couple of examples: • File Transfer Protocol (FTP) • Hypertext Transfer Protocol (HTTP)

FTP URLs • FTP URLs designate the files and directories that are accessible using the FTP protocol • In the absence of any username and password, anonymous FTP access is assumed • This connects you to the server as user “anonymous”, with your email address given as the password by convention • Examples: • ftp://internet.address.edu/path/ • ftp://ftp.prenhall.com/pub/esm/computer_science.s-041/stallings/Figures/DCC7e_PDF_Figures/CHAP-02/ • Note that the final “slash” indicates a directory • The web browser would display this URL as a listing of the directory’s contents
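The anonymous-access default and the trailing-slash rule can be sketched as a small URL-inspection helper. The function `ftp_login_info` and the email address are hypothetical; the sketch only inspects the URL and does not contact any FTP server.

```python
from urllib.parse import urlsplit


def ftp_login_info(url: str, email: str) -> tuple:
    """Return (user, password, path, is_directory) for an FTP URL.

    With no username in the URL, fall back to anonymous FTP, using the
    caller's email address as the password.
    """
    parts = urlsplit(url)
    user = parts.username or "anonymous"
    password = parts.password if parts.username else email
    path = parts.path or "/"
    return user, password, path, path.endswith("/")  # final slash = directory


print(ftp_login_info("ftp://internet.address.edu/path/", "student@site.edu"))
print(ftp_login_info("ftp://bob:pw@internet.address.edu/path/file.txt", "student@site.edu"))
```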

FTP Directory Example

HTTP URLs • HTTP URLs designate files, directories, or server-side programs that are accessible using the HTTP protocol • Example: http://www.site.edu:3232/cgi-bin/srch • This example references the program “srch” at the site www.site.edu, accessible through the HTTP server, using Port = 3232 • An HTTP URL must always point to either a file (which may be a program) or a directory • A directory is indicated by terminating the URL with a “slash” • Example: http://www.site.edu/htmldocs/ (note the trailing slash)
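A rough classifier for the cases above might look like this. Treating any path under `/cgi-bin/` as a program is an assumption based on the common server convention, not a rule of HTTP itself; `classify` is a hypothetical name.

```python
from urllib.parse import urlsplit


def classify(url: str) -> str:
    """Guess what an HTTP URL names: a directory, a gateway program, or a file."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        return "directory"            # trailing slash marks a directory
    if path.startswith("/cgi-bin/"):
        return "gateway program"      # assumed convention for server-side programs
    return "file"


print(classify("http://www.site.edu:3232/cgi-bin/srch"))  # gateway program
print(classify("http://www.site.edu/htmldocs/"))          # directory
print(classify("http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html"))  # file
```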

HTTP History • HTTP is a protocol utilized for transmitting information with the efficiency necessary for making hypertext “jumps” • HTTP/1.1 was documented in the IETF standards as RFC 2616, which has since been superseded (currently by RFCs 9110 through 9112) • It is a transaction-oriented, client/server protocol • The most common use of HTTP is to handle communications between a web browser (client) and a web server • Other examples: Accessing a CD using HTTP • To provide reliability, HTTP utilizes TCP

TCP/IP Architecture

Hypertext Transfer Protocol • In order to develop interactive HTML documents, we need to first review the interaction between a WWW client (browser) and an HTTP server • A web site is a directory of interactive HTML documents and programs • This interaction involves two distinct, but closely related issues • HTTP communication methods • How an HTTP server handles a client request

HTTP Communication Methods • HTTP provides a number of communication methods, such as: • GET, POST, HEAD, etc. • These methods allow a client to receive information from the server, and send information to the server

HTTP Request Handling • If the client requests a file, the server simply locates the file and sends it to the client • If the file is not available, an error message is returned to the client • Consider the situation in which the client wants to send information to the server for more complicated processing • The HTTP server software does not do this processing, but hands it off to another program via the Common Gateway Interface (CGI) • The program that receives the processing request is referred to as a “gateway program” • This implies that there are two interfaces to the HTTP server • HTTP client interactions • CGI interactions
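The three outcomes described above (send the file, return an error, or hand off to a gateway program) can be sketched as a dispatch function. Everything here is hypothetical: an in-memory dictionary stands in for files on disk, and `srch` is a stand-in gateway program.

```python
import html

# Hypothetical in-memory "document root" standing in for files on disk.
DOCUMENTS = {"/index.html": "<html><body>Welcome</body></html>"}


def srch(query: str) -> str:
    """Hypothetical gateway program that does the 'more complicated processing'."""
    return f"<html><body>Results for {html.escape(query)}</body></html>"


GATEWAYS = {"/cgi-bin/srch": srch}


def handle_request(path: str, query: str = "") -> tuple:
    """Return (status_code, body) for a requested path."""
    if path in GATEWAYS:          # hand the data off to a gateway program
        return 200, GATEWAYS[path](query)
    if path in DOCUMENTS:         # plain file: locate it and send it back
        return 200, DOCUMENTS[path]
    return 404, "<html><body>Not Found</body></html>"  # file not available


print(handle_request("/index.html"))
print(handle_request("/cgi-bin/srch", "ftp"))
print(handle_request("/missing.html"))
```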

Gateway Programs • Gateway programs can be referenced using URLs • When the HTTP server needs to activate the program, it invokes the CGI mechanism to pass the data to the target program • The CGI program acts on the data, and returns the results to the HTTP server • In order to understand the CGI program, we must first discuss the HTTP protocol • After this discussion, we will cover the CGI

HTTP Overview • HTTP is an Internet-based, client/server protocol that has been designed for the rapid and efficient delivery of HTML documents • The client can make multiple concurrent requests of the HTTP server • Each request is processed individually • The server has no recollection of previous connections • This type of protocol is “stateless” • Statelessness is a very important feature of HTTP • Speeds up processing of requests

HTTP Communications • All HTTP communications utilize 8-bit characters • This allows the safe transmission of any type of data, such as HTML documents • An HTTP connection has four stages: • Open the connection • Request • Response • Close the connection

HTTP Open Connection • The client contacts the server at the correct IP address, using TCP Port 80 (the default for HTTP, unless the URL specifies another port) • Note that the DNS servers allow mapping mnemonic names to IP addresses • TCP Port 80 is a “well known” port

TCP Well Known Ports

HTTP Request • The client sends a message to the server requesting service • The request begins with a request line naming the “method” requested for the transaction (e.g., GET or POST), the resource, and the HTTP version • This is followed by request headers describing the capabilities of the client, then a blank line, and then the data to be sent to the HTTP server, if any
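The request layout above can be made concrete by building the raw bytes a client would send. This is a minimal sketch, assuming HTTP/1.0 (one request per connection, as in this lecture); `build_request` and the `User-Agent` value are hypothetical. HTTP requires CRLF (`\r\n`) line endings, and an empty line ends the headers.

```python
def build_request(method: str, path: str, host: str, body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.0 request message."""
    lines = [
        f"{method} {path} HTTP/1.0",      # request line: method, resource, version
        f"Host: {host}",
        "User-Agent: es101-sketch/0.1",   # hypothetical client name
        "Accept: text/html",              # a client capability
    ]
    if body:
        lines.append(f"Content-Length: {len(body)}")  # size of the data that follows
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers
    return head.encode("ascii") + body


print(build_request("GET", "/index.html", "www.site.edu").decode("ascii"))
print(build_request("POST", "/cgi-bin/srch", "www.site.edu", b"q=ftp").decode("ascii"))
```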

HTTP Response • The server sends a response to the client • The response begins with a status line (e.g., “200 OK”) and “response headers” describing the state of the transaction • These are followed by a blank line and any data required by the client
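On the client side, the response can be split back into status, headers, and data. A minimal sketch that handles only the simple case shown (no chunked transfer, no repeated header names); `parse_response` and the sample response are hypothetical.

```python
def parse_response(raw: bytes) -> tuple:
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line separates headers from data
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = lines[0].split(" ", 2)  # status line, e.g. "HTTP/1.0 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
print(parse_response(raw))
```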

HTTP Close Connection • The connection is closed; in HTTP/1.0, the server normally closes it after sending the response, although either side may close it

HTTP Procedure • The procedure outlined previously implies that only a single download or process can be handled per connection (this is the HTTP/1.0 model; HTTP/1.1 persistent connections allow several requests over one connection) • This has some implications regarding handling of a request • Consider the following scenarios: • Single Transaction per Connection • Statelessness of the Connection

Single Transaction per Connection • Suppose HTTP is utilized to access an HTML document that contains ten different images • As a result, retrieving the document requires 11 distinct connections: • One request for the HTML document • Ten additional requests for the images
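The connection count can be estimated by scanning the page for `<img>` tags with Python's standard `html.parser`. This sketch assumes one connection per request and that each image is referenced once (no caching); `ImageCounter`, `connections_needed`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def connections_needed(page: str) -> int:
    """One connection for the document itself plus one per image."""
    counter = ImageCounter()
    counter.feed(page)
    return 1 + len(counter.sources)


page = "<html><body>" + "".join(f'<img src="pic{i}.gif">' for i in range(10)) + "</body></html>"
print(connections_needed(page))  # 11
```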

Statelessness of the Connection • Suppose a user retrieves a “fill-in” HTML form from the HTTP server • The user would then enter their username and password in order to access restricted data • After the client submits the form data, the HTTP server hands off the information to the CGI program • The CGI program then processes the data, and returns the result as an HTML document, which is then delivered to the client • Note that the HTTP server would not retain any knowledge of this connection; any state information must be included in the form data
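Because the server remembers nothing, the state travels with each submission as encoded form data. A minimal sketch using `urlencode` and `parse_qs` from `urllib.parse`; the field names and values are hypothetical, with `step` standing in for state that a CGI program might have written into the form as a hidden field.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields: what the user typed, plus a "step" field that
# carries the state forward because the server keeps no memory of it.
form = {"username": "alice", "password": "secret123", "step": "2"}

body = urlencode(form)
print(body)  # username=alice&password=secret123&step=2

# The CGI program recovers everything, including the state, from the data alone.
received = parse_qs(body)
print(received["step"])  # ['2']
```

The password is visible in the encoded data, which is one reason the next point about eavesdropping matters.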

Eavesdropping • Recall that with plain HTTP, all information is passed back and forth between the client and server in unencrypted form • This implies that a machine on the network path could “listen” to the data sent over Port 80 between the HTTP server and the client • If security is required, a secure form of HTTP must be used (HTTPS) • Secure communication is beyond the scope of this course

Common Gateway Interface • This is the standard method for communication between HTTP servers and server-side gateway programs • When access to a gateway program is required, the HTTP server uses CGI to activate the program and send it any required data (through environment variables and standard input) • When the processing is finished, the program writes its output to standard output, and the HTTP server sends this information back to the client • Gateway programs can be compiled programs written in any high-level language, or scripts • High-level languages: C, C++, Pascal • Scripting languages: Perl, Tcl, Unix shell, etc.
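A gateway program can be written in Python as well. This minimal sketch handles only GET requests, where the server passes the form data in the `QUERY_STRING` environment variable; POST data, which arrives on standard input, is not handled. The `name` parameter, the greeting, and the helper `cgi_response` are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build the text a CGI program writes to standard output for a GET request."""
    # For GET requests, the HTTP server passes the form data in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["world"])[0]
    # Output = header lines, a blank line, then the document body.
    return (
        "Content-Type: text/html\n"
        "\n"
        f"<html><body>Hello, {html.escape(name)}!</body></html>\n"
    )


if __name__ == "__main__":
    # When launched by the HTTP server, request data arrives as environment variables.
    sys.stdout.write(cgi_response(dict(os.environ)))
```

Placed in the server's gateway directory and made executable, a script like this would answer a URL such as http://www.site.edu/cgi-bin/hello?name=Ada, where `hello` is a hypothetical script name.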

Common Gateway Interface (cont’d) • Gateway programs conventionally reside in the “/cgi-bin” folder of the server (the exact location is set in the server’s configuration)

Next Lecture(s) • This presentation concludes our discussion of HTTP/URLs, which are Layer 5 constructs • The next topic of discussion will be on utilities that are of use in web development, and HTML • At the conclusion of these lectures, we will discuss how to use these tools to build a web site
