Internet Applications

The World Wide Web • By far the best-known distributed application is the World Wide Web (WWW), or the Web for short. Technically, the Web is a distributed system of HTTP servers and clients, more commonly known as web servers and web browsers. • Prior to the emergence of the Web, the user community of the Internet consisted largely of researchers and academics who used network services such as electronic mail and file transfer to exchange data. • The World Wide Web originated with Tim Berners-Lee in late 1990 at CERN, the European Particle Physics Laboratory in Geneva, Switzerland. A proposal for a "universal hypertext system" was submitted in November 1990 by Tim Berners-Lee and Robert Cailliau.

The World Wide Web Since the original proposal, the growth of the World-Wide Web has been extraordinary (see Figure 1): the Web has expanded far beyond the research and academic community into all sectors worldwide, including commerce and private homes. The continued development of Web technology is currently coordinated by the World-Wide Web Consortium (W3C).

The World Wide Web The genius of the World-Wide Web is that it combines three important and well-established computing technologies: • Hypertext documents: documents in which chosen words or phrases, typically highlighted, can be marked as links to other documents, so that a user is able to access the linked documents by clicking with a mouse on the highlighted text. • Network-based information retrieval: the File Transfer Protocol (FTP) service was the most widely used service for such information retrieval. • Standard Generalized Markup Language (SGML), an ISO standard which allows documents to be “marked up” with tags so that they can be displayed in a uniform format on any platform, independent of the presentation mechanics.

The World Wide Web • At its most basic, the World-Wide Web is a client-server application based on a protocol named the HyperText Transfer Protocol (HTTP). • A web server is a connection-oriented server that implements HTTP. By default, an HTTP server runs at the well-known port 80. • A user runs a World-Wide Web client (sometimes referred to as a browser) on a local computer. The client interacts with a web server according to HTTP, specifying a document to be fetched. If the document is located by the server in its directory, the document’s contents are returned to the client, which presents them to the user.

The Hypertext Markup Language (HTML) • HTML is a markup language used to create documents that can be retrieved using the World Wide Web. • HTML is based on SGML, with semantics that are appropriate for representing information of a wide range of types. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.

HTML
    <HTML>
    <HEAD>
    <TITLE>A Sample Web Page</TITLE>
    </HEAD>
    <BODY>
    <HR>
    <center>
    <H1>My Home Page</H1>
    <IMG SRC="/images/myPhoto.gif">
    <b>Welcome to Kelly's page!</b>
    <p>
    <!-- A list of hyperlinks follows. -->
    <a href="/doc/myResume.html">My resume</a>.
    <p>
    <a href="http://www.someUniversity.edu/">My university</a>
    </center>
    <HR>
    </BODY>
    </HTML>

The Extensible Markup Language XML • Whereas HTML is a language that allows a document to be marked up for the presentation or display of the information contained in a document, XML allows a document to be marked up for structured information. • Also based on SGML, XML uses tags to describe the information contained in a document.
    <message>
      <to>you@yourAddress.com</to>
      <from>me@myAddress.com</from>
      <subject>This is a message</subject>
      <text> Hello world! </text>
    </message>
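
Because XML tags describe the information rather than its presentation, a program can pull individual fields out of such a document. The following is a minimal sketch, assuming the standard Java DOM parser; the class name XmlMessageReader and the method fieldOf are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of the original slides).
class XmlMessageReader {

    // Returns the trimmed text of the first element named 'tag', or null if absent.
    static String fieldOf(String xml, String tag) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0
                ? null
                : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<message>"
            + "<to>you@yourAddress.com</to>"
            + "<from>me@myAddress.com</from>"
            + "<subject>This is a message</subject>"
            + "<text> Hello world! </text>"
            + "</message>";
        System.out.println("To:      " + fieldOf(xml, "to"));
        System.out.println("Subject: " + fieldOf(xml, "subject"));
        System.out.println("Text:    " + fieldOf(xml, "text"));
    }
}
```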

HTTP

The HyperText Transfer Protocol (HTTP) • Originally conceived for fetching and displaying text files, HTTP has been extended to allow the transferring of web contents of virtually unlimited types. • The first version of HTTP, HTTP/0.9, was a simple protocol for raw data transfer. • The most widely used HTTP version is HTTP/1.0, which has a draft proposed by Tim Berners-Lee [13] but no formal specification, although its "common usage" is described in RFC 1945 [8]. • Since then, an improved protocol, known as HTTP/1.1, has been developed and is often adopted. HTTP/1.1 is a far more extensive protocol than HTTP/1.0. However, the basics of the protocol are well represented in the simpler HTTP/1.0.

The HyperText Transfer Protocol (HTTP) • HTTP is a connection-oriented, stateless, request-response protocol. • An HTTP server, or web server, runs on TCP port 80 by default. • HTTP clients, colloquially called web browsers, are processes which implement HTTP to interact with a web server and retrieve documents written in HTML, whose contents are displayed according to the documents’ markup.

The HyperText Transfer Protocol (HTTP) • In HTTP/1.0, each connection allows only one round of request-response. • A client obtains a connection and issues a request. • The server processes the request, issues a response, and closes the connection thereafter.

The HyperText Transfer Protocol (HTTP) • HTTP is text-based: requests and responses are character strings. • Each request and response is composed of these parts, in order: • The request/response line • A header section • A blank line • The body

A sample HTTP session

The HTTP request • A client request is sent to the server after the client has established a connection to the server. • A request line is of the following form: <HTTP method><space><Request-URI><space><protocol specification>\r\n where • <HTTP method> is the name of a method defined for the protocol, • <Request-URI> is the URI of a web document, or, more generally, a web object, • <protocol specification> is a specification of the protocol observed by the client, and • <space> is a space character. • An example client request is as follows: GET /index.html HTTP/1.0
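
The three-field layout of the request line can be sketched in code as follows. This is an illustration only; the class RequestLine and its members are hypothetical names, and the parser assumes that the fields are separated by exactly one space, as in the form given above.

```java
// Hypothetical value class for an HTTP request line (illustration only).
class RequestLine {
    final String method;    // e.g. "GET"
    final String uri;       // e.g. "/index.html"
    final String protocol;  // e.g. "HTTP/1.0"

    RequestLine(String method, String uri, String protocol) {
        this.method = method;
        this.uri = uri;
        this.protocol = protocol;
    }

    // Parses "<method> <Request-URI> <protocol>", with or without the trailing CRLF.
    static RequestLine parse(String line) {
        String trimmed = line.endsWith("\r\n")
            ? line.substring(0, line.length() - 2)
            : line;
        String[] parts = trimmed.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + trimmed);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    // Formats the request line as it is sent on the wire, terminated by CRLF.
    String format() {
        return method + " " + uri + " " + protocol + "\r\n";
    }

    public static void main(String[] args) {
        RequestLine line = RequestLine.parse("GET /index.html HTTP/1.0\r\n");
        System.out.println("method   = " + line.method);
        System.out.println("uri      = " + line.uri);
        System.out.println("protocol = " + line.protocol);
    }
}
```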

HTTP Methods in a client request • The HTTP method in a client request is a reserved word (in uppercase) which specifies an operation of the server that the client desires. • Some of the key client request methods are listed below: ~ GET: for retrieving the contents of the web object referenced by the specified URI. ~ HEAD: for retrieving a header from the server only, not the object itself. ~ POST: used to send data to a process on the server host. ~ PUT: used to request the server to store the contents enclosed with the request on the server machine, in the file location specified by the URI.

The Request Header • “The request header fields allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers, with semantics equivalent to the parameters on a programming language method (procedure) invocation.” • A header is composed of one or more lines, each line in the form of <keyword>: <value>\r\n

The Request Header Some of the keywords and values that may appear in a request header are: • Accept: content types acceptable to the client • User-Agent: specifies the type of browser • Connection: “Keep-Alive” can be specified so that the server does not immediately close a connection after sending a response. • Host: host name of the server An example request header is as follows:
    Accept: */*
    Connection: Keep-Alive
    Host: www.someU.edu
    User-Agent: Generic

Request Body • A request optionally ends with a request body, which contains data that needs to be transferred to the server in association with the request. • For example, if the POST method is specified in the request line, then the body contains data to be passed to the target process. (This is an important feature and will become clearer when we discuss CGI, servlet, and SOAP.)

Examples of a complete client request
Example 1:
    GET / HTTP/1.0
    <blank line>
Example 2:
    HEAD / HTTP/1.1
    Accept: */*
    Connection: Keep-Alive
    Host: somehost.com
    User-Agent: Generic
    <blank line>

Examples of a complete client request
Example 3:
    POST /servlet/myServer.servlet HTTP/1.0
    Accept: */*
    Connection: Keep-Alive
    Host: somehost.com
    User-Agent: Generic
    <blank line>
    Name=donald&email=donald@someU.edu
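
A complete POST request like Example 3 can be assembled programmatically. The sketch below is an illustration under stated assumptions: the class PostRequestBuilder and its methods are hypothetical, and the Content-Type and Content-Length header lines are additions that do not appear in Example 3 (RFC 1945 asks for a valid Content-Length on any request that carries a body). Note also that java.net.URLEncoder escapes reserved characters, so the "@" in the e-mail address is sent as "%40".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds an HTTP/1.0 POST request as a string.
class PostRequestBuilder {

    // Encodes name/value pairs as name1=value1&name2=value2 (URL-encoded).
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Request line, header section, blank line, then the body, in that order.
    static String build(String uri, String host, Map<String, String> fields) {
        String body = encodeForm(fields);
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + uri + " HTTP/1.0\r\n"
            + "Host: " + host + "\r\n"
            + "Content-Type: application/x-www-form-urlencoded\r\n"
            + "Content-Length: " + length + "\r\n"
            + "\r\n"
            + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Name", "donald");
        fields.put("email", "donald@someU.edu");
        System.out.println(build("/servlet/myServer.servlet", "somehost.com", fields));
    }
}
```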

The HTTP Server Response • In response to a request received from a client, the HTTP server sends a response back to that client. • Like the request, an HTTP response is composed of these parts, in order: 1. The response or status line 2. A header section 3. A blank line 4. The body

The response status line The status line is in the form of:
    <protocol><sp><status-code><sp><description>\r\n
The status code designations are as follows:
    100-199  Informational
    200-299  Client request successful
    300-399  Client request redirected
    400-499  Client request incomplete
    500-599  Server errors
Example 1:
    HTTP/1.0 200 OK
Example 2:
    HTTP/1.1 404 NOT FOUND
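
A client can split a status line into its three fields and map the status code to one of the ranges listed above. The following is a minimal sketch; the class StatusLine and the method category are hypothetical names, and the category labels are copied from the designations on this slide.

```java
// Hypothetical value class for an HTTP status line (illustration only).
class StatusLine {
    final String protocol;     // e.g. "HTTP/1.1"
    final int code;            // e.g. 404
    final String description;  // e.g. "NOT FOUND"

    StatusLine(String protocol, int code, String description) {
        this.protocol = protocol;
        this.code = code;
        this.description = description;
    }

    // Parses "<protocol> <status-code> <description>"; the description may contain spaces.
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        String description = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), description);
    }

    // Maps the status code to the range it falls in.
    String category() {
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Client request successful";
            case 3:  return "Client request redirected";
            case 4:  return "Client request incomplete";
            case 5:  return "Server errors";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine status = StatusLine.parse("HTTP/1.1 404 NOT FOUND\r\n");
        System.out.println(status.code + " (" + status.description + "): " + status.category());
    }
}
```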

HTTP Response Header • The status line is followed by a response header. A response header is composed of one or more lines, each line in the form of <keyword>: <value>\r\n • There are two types of response header lines: • Response header lines • Entity header lines

HTTP Response Header Response header lines – these header lines return information about the response, the server, and further access to the resource requested, as follows:
    Age: seconds
    Location: URI
    Retry-After: date|seconds
    Server: string
    WWW-Authenticate: scheme realm

HTTP Response Header Entity header lines – these header lines contain information about the contents of the object requested by the client, as follows:
    Content-Encoding
    Content-Length
    Content-Type: type/subtype (see MIME)
    Expires: date
    Last-Modified: date

HTTP Response Header An example response header is as follows:
    Date: Mon, 30 Oct 2000 18:52:08 GMT
    Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
    Last-modified: Mon, 17 June 2001 16:45:13 GMT
    Content-Length: 1255
    Connection: close
    Content-Type: text/html
• The Content-Type specifies the type of the data, using the content type designation of the MIME protocol. • The Content-Encoding specifies the encoding scheme (such as gzip or compress) applied to the data, usually for the purpose of data compression. • The expiration date gives the date/time (specified in a format defined with HTTP) after which the web object should be considered stale. • The Last-Modified date specifies the date that the object was last modified.

HTTP Response Body The body of the response follows the header and a blank line, and contains the contents of the web object requested.
    HTTP/1.1 200 OK
    Date: Sat, 15 Sep 2001 06:55:30 GMT
    Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
    Last-Modified: Mon, 30 Apr 2001 23:02:36 GMT
    ETag: "5b381-ec-3aedef0c"
    Accept-Ranges: bytes
    Content-Length: 236
    Connection: close
    Content-Type: text/html
    <blank line>
    <html>
    <head>
    <title>My web page </title>
    </head>
    <body>
    Hello world!
    </BODY></HTML>
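
The four-part layout (status line, header section, blank line, body) makes a response straightforward to take apart. The sketch below is an illustration only: the class HttpResponse is a hypothetical name, the whole response is assumed to be available as one string with CRLF line endings, and header names are stored in lowercase because HTTP treats them as case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical holder for the parts of an HTTP response (illustration only).
class HttpResponse {
    final String statusLine;
    final Map<String, String> headers;  // keys are lowercased header names
    final String body;

    HttpResponse(String statusLine, Map<String, String> headers, String body) {
        this.statusLine = statusLine;
        this.headers = headers;
        this.body = body;
    }

    static HttpResponse parse(String raw) {
        // The first blank line (CRLF CRLF) separates the header section from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        String body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            // Split at the first colon only; values such as dates contain colons too.
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return new HttpResponse(lines[0], headers, body);
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
            + "Date: Sat, 15 Sep 2001 06:55:30 GMT\r\n"
            + "Content-Type: text/html\r\n"
            + "\r\n"
            + "<html><body>Hello world!</body></html>";
        HttpResponse response = parse(raw);
        System.out.println(response.statusLine);
        System.out.println(response.headers);
        System.out.println(response.body);
    }
}
```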

Content Type – MIME Protocol

Content Type and the MIME Protocol • One of the header lines returned in a server response is the Content-Type of the object requested. • Specification of the content type follows the scheme established in a protocol known as MIME (Multipurpose Internet Mail Extensions). • Originally used for email, MIME is now widely used for describing the content of a document sent over a network. • It supports a large and evolving set of predefined content types, specified in the format Type/Subtype.
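
The Type/Subtype format can be split with a few lines of code. The following is a minimal sketch; the class MimeType is a hypothetical name, and the handling of an optional parameter after a semicolon (for example "; charset=UTF-8") is an assumption that goes beyond what this slide describes.

```java
import java.util.Locale;

// Hypothetical value class for a MIME content type (illustration only).
class MimeType {
    final String type;     // e.g. "text"
    final String subtype;  // e.g. "html"

    MimeType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    // Parses "type/subtype", ignoring any parameters that follow a semicolon.
    static MimeType parse(String contentType) {
        int semicolon = contentType.indexOf(';');
        String essence =
            (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType).trim();
        int slash = essence.indexOf('/');
        if (slash <= 0 || slash == essence.length() - 1) {
            throw new IllegalArgumentException("Not in type/subtype form: " + contentType);
        }
        // Type and subtype names are case-insensitive, so normalize to lowercase.
        return new MimeType(
            essence.substring(0, slash).toLowerCase(Locale.ROOT),
            essence.substring(slash + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        MimeType mime = MimeType.parse("text/html; charset=UTF-8");
        System.out.println("type = " + mime.type + ", subtype = " + mime.subtype);
    }
}
```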

The MIME Protocol A small subset of the types and subtypes are:

Simple implementations of an HTTP Client

A Basic HTTP Client implementation
    // Requires: import java.net.InetAddress;
    // MyStreamSocket is a helper class that is not shown in this section.
    InetAddress host = InetAddress.getByName(args[0]);
    int port = Integer.parseInt(args[1]);
    String fileName = args[2].trim();
    // The request line and the terminating blank line each end in CRLF.
    String request = "GET " + fileName + " HTTP/1.0\r\n\r\n";
    MyStreamSocket mySocket = new MyStreamSocket(host, port);
    mySocket.sendMessage(request);
    // now receive the response from the HTTP server
    String response = mySocket.receiveMessage();
    // read and display one line at a time
    while (response != null) {
        System.out.println(response);
        response = mySocket.receiveMessage();
    }
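
For readers who do not have the MyStreamSocket helper at hand, the same exchange can be sketched with java.net.Socket from the standard library. This is an illustration under stated assumptions, not the implementation used on the slide: the class SimpleHttpClient and the method exchange are hypothetical names, a Host header line is added, and the code relies on the HTTP/1.0 behavior of the server closing the connection after its response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical HTTP/1.0 client built directly on java.net.Socket.
class SimpleHttpClient {

    // Sends one GET request on 'out' and returns everything read from 'in'
    // until the server closes the connection.
    static String exchange(OutputStream out, InputStream in, String path, String host) {
        try {
            String request = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: java SimpleHttpClient <host> <port> <path>
    public static void main(String[] args) throws IOException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String path = args[2].trim();
        try (Socket socket = new Socket(host, port)) {
            System.out.println(
                exchange(socket.getOutputStream(), socket.getInputStream(), path, host));
        }
    }
}
```

Passing the streams to exchange, rather than the socket itself, keeps the protocol logic separate from connection handling.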

The Java URL Class The Java API provides a class called URL specifically for retrieving the data from a web object identified using a URI.

The URLBrowser
    // Requires: import java.net.URL; import java.io.*;
    String host = args[0];
    String port = args[1].trim();
    String fileName = args[2].trim();
    String HTTPString = "http://" + host + ":" + port + "/" + fileName;
    URL theURL = new URL(HTTPString);
    InputStream inStream = theURL.openStream();
    BufferedReader input = new BufferedReader(new InputStreamReader(inStream));
    String response = input.readLine();
    // read and display one line at a time
    while (response != null) {
        System.out.println(response);
        response = input.readLine();
    } //end while

Characteristics of HTTP

HTTP is a Connection-Oriented Protocol With HTTP/1.0, a connection to a server is automatically closed as soon as the server returns a response. Thus exactly one round of exchange is allowed between a client and a web server; if a client needs to contact the same server again in one session, it must reconnect to the server to issue another request.

HTTP is a Connection-Oriented Protocol The scheme is adequate for the original intent of HTTP for retrieving simple network documents. It is inefficient for documents such as those that contain a large number of links to image objects to be fetched from the server, since fetching each of these links requires the reestablishment of a connection. It is also insufficient for sophisticated web applications based on HTTP (such as shopping carts).

HTTP is a stateless Protocol HTTP 1.0 (as well as version 1.1) is also a stateless protocol: the server does not maintain any state information on a client’s session. Regardless of whether the connection is kept alive, each request is handled by a server as a new request. As with the non-persistent connections originally in practice with HTTP, a stateless protocol is adequate for the original intent of the protocol, but not so for the more complex applications for which HTTP has been extended, the next topic that we will study.

HTTP is a Connection-Oriented Protocol HTTP/1.0 was extended to allow a request header line Connection: Keep-Alive to be issued by a client that wishes to maintain a persistent connection with the server; a cooperating server will keep the connection open after sending a response. In HTTP/1.1, connections are persistent by default. Such a connection allows multiple requests to be sent over the same TCP connection.
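
The difference between the two defaults can be sketched as a small decision function. This is an illustration only: the class ConnectionPolicy and the method isPersistent are hypothetical names, and the treatment of "Connection: close" as the way to opt out of persistence in HTTP/1.1 comes from the HTTP/1.1 specification rather than from this slide.

```java
// Hypothetical helper that decides whether a connection stays open (illustration only).
class ConnectionPolicy {

    // protocol: "HTTP/1.0" or "HTTP/1.1"; connectionHeader: value of the
    // Connection header line, or null if the request did not include one.
    static boolean isPersistent(String protocol, String connectionHeader) {
        String token = connectionHeader == null ? "" : connectionHeader.trim();
        if ("HTTP/1.1".equals(protocol)) {
            // Persistent by default; "close" asks for the connection to be closed.
            return !token.equalsIgnoreCase("close");
        }
        // HTTP/1.0 closes by default; Keep-Alive asks to keep the connection open.
        return token.equalsIgnoreCase("Keep-Alive");
    }

    public static void main(String[] args) {
        System.out.println(isPersistent("HTTP/1.0", null));
        System.out.println(isPersistent("HTTP/1.0", "Keep-Alive"));
        System.out.println(isPersistent("HTTP/1.1", null));
        System.out.println(isPersistent("HTTP/1.1", "close"));
    }
}
```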

Dynamically generated web contents

Dynamically-generated Web Contents • In the beginning, HTTP was employed to transfer static contents, that is, contents that exist in a constant state, such as a plain text file or an image file. • As the web evolved, applications began to use HTTP for a purpose not originally intended: an application which allows a browser user to retrieve data based on dynamic information entered during an HTTP session.

Dynamically-generated Web Contents • A typical web application, such as a shopping cart, requires fetching remote data based on data entered by a client at runtime. • For example, an enterprise application typically allows a user to key in data, which is then used to formulate a query to retrieve data from a database, and the outcome is displayed to the user. • Applied to the web, it is desirable to allow a client to submit data during a web session to retrieve data from the web server host, to be displayed by the web browser.

Dynamically-generated Web Contents • A generic HTTP server does not possess the application logic for fetching the data from the data source. • Instead, an external process that has the application logic will serve as an intermediary. • The external process runs on the server host, accepts input data from the web server, exercises its application logic to obtain data from the data source, and returns the outcome to the web server, which transmits the outcome to the client.

Dynamically-generated Web Contents • The first widely adopted protocol to augment HTTP in supporting run-time generated web contents is the Common Gateway Interface (CGI) protocol. • Although rudimentary by comparison, CGI is the predecessor of more sophisticated protocols and facilities (such as the Java Servlet) that serve a similar purpose. • An understanding of CGI and some of its supplementary protocols is important in that it prepares us for understanding more advanced protocols and facilities.

The Common Gateway Interface (CGI) Protocol

Common Gateway Interface (CGI) • The Common Gateway Interface (CGI) is a standard for providing an interface, or a gateway, between an information server and an external process (that is, a process external to the server). • Using the protocol, a web client may specify a program, known as a CGI script, as the target web object in an HTTP request. • The web server fetches the CGI script and activates it as a process, passing to the process the input data transmitted by the web client. The CGI script executes and transmits its output to the web server, which returns the script-generated data as the body of a response to the web client.

CGI - 2 • An HTTP request may specify a CGI program, or CGI script. • A CGI program can be written in: • Programming languages such as C, Ada, C++, or Fortran; such a program needs to be compiled to generate an executable. • Script languages such as Perl, Tcl, cobra; such a program, referred to as a CGI script, requires the appropriate language interpreter to be present at the server host. • CGI programs are commonly used for processing user input from HTML forms, and subsequently composing a web page sent as part of the server response.

CGI Program - 3 • When a web server receives a request whose URI specifies a web program, the web server initiates the execution of the web program. • The web program formulates its output in HTML, which is sent to the server and forwarded to the web client as the HTTP response.
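
The shape of such a web program can be sketched in Java as follows. This is a sketch under stated assumptions, not a definitive CGI implementation: the class HelloCgi and its methods are hypothetical, the input is assumed to arrive in the QUERY_STRING environment variable as in the CGI convention for GET requests, and the web server is assumed to be configured to launch the program and forward its standard output. The program writes a Content-Type line, a blank line, and then the HTML body.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical CGI-style program (illustration only).
class HelloCgi {

    // Returns the decoded value of 'key' in a query string such as "name=donald&x=1".
    static String param(String query, String key) {
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    // Escapes characters that have special meaning in HTML.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the program output: a header line, a blank line, then the HTML body.
    static String respond(String query) {
        String name = param(query, "name");
        String who = name == null ? "world" : escapeHtml(name);
        return "Content-Type: text/html\n\n"
            + "<html><body><h1>Hello, " + who + "!</h1></body></html>\n";
    }

    public static void main(String[] args) {
        // The web server is assumed to set QUERY_STRING before starting the program.
        System.out.print(respond(System.getenv("QUERY_STRING")));
    }
}
```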

CGI program
