1.63k likes | 1.77k Views
This lecture introduces the concepts of Common Gateway Interface (CGI) and Java Servlets in web programming. It highlights the differences between traditional CGI methods and the more modern approach of using Servlets, which run within a Java Virtual Machine and are more efficient for processing user inputs. The material covers HTML forms, the HTTP GET request, and the servlet lifecycle, along with advantages of using Servlets for server-side processing. References to key texts and the Tomcat server setup are also included for further exploration.
E N D
Part 6: Introduction to CGI and Servlets CIS 5930-04 – Spring 2001 http://aspen.csit.fsu.edu/it1spring01 Instructors: Geoffrey Fox , Bryan Carpenter Computational Science and Information Technology Florida State University Acknowledgements: Nancy McCracken Syracuse University dbc@csit.fsu.edu
Introduction • RMI gave us one approach to client/server programming. • The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading. • We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web. • RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if actually we only need to interact with users through Web pages. • For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client. dbc@csit.fsu.edu
HTML Forms and CGI • There are long-established techniques for getting information from users through Web browsers (predating the appearance of Java on the Web). • The FORM element of HTML can contain a variety of input fields. • The inputted data is harvested by the browser, suitably encoded, and forwarded to the Web server. • On the server side, the Web server is configured to execute an arbitrary program that processes the user’s form inputs. • This program typically outputs a dynamically generated HTML document containing an appropriate response to the user’s input. • The server-side mechanism is called CGI: Common Gateway Interface. dbc@csit.fsu.edu
CGI and Servlets • In conventional CGI, a Web site developer writes the executable programs that process form inputs in a language such as Perl or C. • The program (or script) is executed once each time a form is submitted. • Servlets provide a more modern, Java-centric approach. • The server incorporates a Java Virtual Machine, which is running continuously. • Invocation of a CGI script is replaced invocation of a method on a servlet object. dbc@csit.fsu.edu
Advantages of Servlets • Invocation of a single Java method is typically much cheaper than starting a whole new program. So servlets are typically more efficient than CGI scripts. • This is important if we planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script). • Besides this we have the usual advantages of Java: • Portability, • A fully object-oriented environment for large-scale program development. • Library infrastructure for decoding form data, handling cookies, etc (although many of these things are also available in Perl). • Servlets are the foundation for Java Server Pages. dbc@csit.fsu.edu
Plan of this Lecture Set • Review HTML forms and associated HTTP requests. • Briefly describe traditional CGI programming. • Detailed discussion of Java servlets: • Deploying Tomcat as a standalone Web server. • Simple servlets. • The servlet life cycle. • Servlet requests and responses. More on the HTTP protocol. • Approaches to session tracking. Handling cookies. • The servlet session-tracking API. dbc@csit.fsu.edu
References • Core Servlets and JavaServer Pages, Marty Hall, Prentice Hall, 2000. • Good coverage and current, with some discussion of the Tomcat server. • Java Servlet Programming, Jason Hunter and William Grawford, O’Reilly, 1998. • Also good, with some good examples. Slightly out of date. • Java Servlet Specification, v2.2, and other documents, at: http://java.sun.com/products/servlet/ dbc@csit.fsu.edu
HTML Forms dbc@csit.fsu.edu
The HTTP GET request • Before discussing forms, let’s look again at how the GET request normally works. • The following server program listens for HTTP requests, and simply prints the received request to the console. dbc@csit.fsu.edu
A Dummy Web Server public class DummyServer { public static void main(String [] args) throws Exception { ServerSocket server = new ServerSocket(8080) ; while(true) { Socket sock = server.accept() ; BufferedReader in = new BufferedReader( new InputStreamReader(sock.getInputStream())) ; String method = in.readLine() ; System.out.println(method) ; while(true) { String field = in.readLine() ; System.out.println(field) ; if(field.length() == 0) break ; } . . . Send a dummy response to client socket . . . } } dbc@csit.fsu.edu
A GET Request • On the hostsirahI run the dummy server: sirah$ java DummyServer • Now I point a browser at http://sirah.csit.fsu.edu:8080/index.html • The dummy server program might print: GET /index.html HTTP/1.0 Connection: Keep-Alive User-Agent: Mozilla/4.51 [en] (X11; I; ...) Host: sirah.csit.fsu.edu:8080 Accept: image/gif, ..., */* Accept-Encoding: gzip Accept-Language: en Accept-Charset: iso-8859-1,*,utf-8 <blank line> dbc@csit.fsu.edu
Fields of the GET request • The HTTP GET request consists of a series text fields on separate lines, ended by an empty line. • The first line is the most important: it is called the method field. • In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server. dbc@csit.fsu.edu
A Simple HTML Form • The form element includes one or more input elements, along with any normal HTML terms: <html> <body> <form method=get action=“http://sirah.csit.fsu.edu:8080/dummy”> Name: <input type=text name=who size=32> <p> <input type=submit> </form> </body> </html> dbc@csit.fsu.edu
Remarks • The form tag includes important attributes method and action. • The method attribute defines the kind of HTTP request sent when the form is submitted: its value can be get or post (see later). • The action attribute is a URL. In normal use it will locate an executable program on the server. In this case it is a reference to my “dummy server”. • An input tag with type attribute text represents a text input field. • An input tag with type attribute submit represents a “submit” button. dbc@csit.fsu.edu
Displaying the Form • If I place this HTML document on a Web Server at a suitable location, and visit its URL with a browser, I see something like: dbc@csit.fsu.edu
Submitting the Form • If I type my name, and click on the “Submit Query”button, the dummy server running on sirah prints: GET /dummy?who=Bryan HTTP/1.0 Connection: Keep-Alive User-Agent: Mozilla/4.51 [en] (X11; I; ...) Host: sirah.csit.fsu.edu:8080 Accept: image/gif, ..., */* Accept-Encoding: gzip Accept-Language: en Accept-Charset: iso-8859-1,*,utf-8 <blank> dbc@csit.fsu.edu
Remarks • When the form specifying the get method is submitted, the values inputted by the user are effectively appended to the end of the URL specified in the action attribute. • In the HTTP GET request—sent when the submit button is pressed—they appear attached to the second token of the first line of the request. • In simple cases the appended string begins with a ? • This isfollowed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user. • If the form has multiple input fields, the pairs are separated by & dbc@csit.fsu.edu
POST requests • This method of attaching input data to the URL is handy if the user has a relatively simple query (e.g. for a search engine). • For more complex forms it is usually recommended to specify the post method in the form tag, e.g.: <form method=post action=“http://sirah.csit.fsu.edu:8080/dummy”> • In the HTTP protocol, a POST request differs from a GET request by having some data appended after the headers. dbc@csit.fsu.edu
A Form Using the POST Method <form method=post action=“http://sirah.csit.fsu.edu:8080/dummy”> Surname: <input type=text name=surname size=32> <p> Surname: <input type=text name=fornames size=40> <p> <input type=submit> </form> dbc@csit.fsu.edu
Extending the Dummy Server • We can modify the dummy server to display POST requests, by declaring a variable contentLength, adding the lines if(field.stubstring(0, 16).equalsIgnoreCase(“Content-Length: ”)) ; contentLength = Integer.parseInt(field.substring(16)) ; inside the loop that reads the headers, and adding for(int i = 0 ; i < contentLength ; i++) int b = in.read() ; System.out.println((char) b) ; } after that loop. dbc@csit.fsu.edu
Submitting the Form • When I click on the “Submit Query” button, the dummy server prints: POST /dummy HTTP/1.0 Connection: Keep-Alive User-Agent: Mozilla/4.51 [en] (X11; I; ...) Host: sirah.csit.fsu.edu:8080 Accept: image/gif, ..., */* Accept-Encoding: gzip Accept-Language: en Accept-Charset: iso-8859-1,*,utf-8 Content-type: application/x-www-form-urlencoded Content-Length: 39 surname=Carpenter&forenames=David+Bryan dbc@csit.fsu.edu
Remarks • The method field (the first line) now starts with the word POST instead of GET; the data is not appended to the URL. • There are a couple more fields in the header, describing the format of the data. • Most importantly, the form data is now on a separate line at the end of file. • However, the form data is still URL-encoded. dbc@csit.fsu.edu
URL Encoding • URL encoding is a method of wrapping up form-data in a way that will make a legal URL for a GET request. • We have seen that the encoded data consists of a sequence of name=value pairs, separated by &. • In the last example we saw that spaces are replaced by +. • Non-alphanumeric characters are converted to the form %XX, where XX is a two digit hexadecimal code. • In particular, line breaks in multi-line form data (e.g. addresses) become %0D%0A—the hex ASCII codes for a carriage-return, new-line sequence. • URL encoding is somewhat redundant for the POST method, but it is the default anyway. dbc@csit.fsu.edu
More Options for the input Tag • We can make a group of radio buttons in an HTML form by using a set of input tags with the type attribute set to radio. • Tags belonging to the same button group should have the same name attribute, and distinct value attributes, e.g.: <form method=post action=“http://sirah.csit.fsu.edu:8080/dummy”> Favorite primary color: <p> Red: <input type=radio name=color value=red> Blue: <input type=radio name=color value=blue> Green: <input type=radio name=color value=green> <p> <input type=submit> </form> dbc@csit.fsu.edu
Radio Buttons • The message sent to the server is: ... Content-type: application/x-www-form-urlencoded Content-Length: 10 color=blue dbc@csit.fsu.edu
Checkboxes <form method=post action=“http://sirah.csit.fsu.edu:8080/dummy”> What pets do you own? <p> <input type=checkbox name=pets value=dog checked> Dog <br> <input type=checkbox name=pets value=cat> Cat <br> <input type=checkbox name=pets value=bird> Bird <br> <input type=checkbox name=pets value=fish> Fish <p> <input type=submit> </form> • Example from “HTML and XHTML: The Definitive Guide”, O’Reilly. dbc@csit.fsu.edu
Checkboxes • The message posted to the server is: ... pets=dog&pets=bird • Note there is no requirement that a form map a name to a unique value. dbc@csit.fsu.edu
File-Selection • You can name a local file in an input element, and have the entire contents of the file posted by browser to server. • This is not allowed using the default URL-encoding for form data. Instead you must specify multi-part MIME encoding in the formelement, e.g.: <form method=post enctype=“multipart/form-data” action=“http://sirah.csit.fsu.edu:8080/dummy”> Course: <input name=course size=20> <p> Students file: <input type=file name=students size=32> <p> <input type=submit> </form> dbc@csit.fsu.edu
File-Selection Entry • With multi-part encoding, the data is no longer sent on a single line. • On submission the DummyServer prints. . . dbc@csit.fsu.edu
Output of DummyServer on submit POST /dummy HTTP/1.0 Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html ... Content-type: multipart/form-data; boundary=---------------------------269912718414714 Content-Length: 455 -----------------------------269912718414714 Content-Disposition: form-data; name="course" CIS6930 -----------------------------269912718414714 Content-Disposition: form-data; name="students"; filename="students" wcao flora Fulay gao ... zhao6930 zheng -----------------------------269912718414714-- dbc@csit.fsu.edu
Remarks • Each form field has its own section in the posted file, separated by a delimiter specified in the Content-type field of the header. • Within each section there are one or more header lines, followed by a blank line, followed by the form data. • The values can contain binary data. There is no “URL-encoding”. dbc@csit.fsu.edu
Masked and Hidden fields • The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen. • If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts. • Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms—hidden fields may contain values characterizing the session. • Use of hidden fields will be one of the topics in the lectures on servlets. dbc@csit.fsu.edu
Text Areas • Similar to text input fields, but allow multi-line input. • Included in a form by using the textarea tag, e.g.: <textarea name=address cols=40 rows=3> . . . optional default text goes here . . . </textarea> • With default (URL) encoding, lines of input are separated by carriage return/newline, coded as %0D%0A. dbc@csit.fsu.edu
Text Area Input • Data posted to server: address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120 dbc@csit.fsu.edu
Scrollable Menus (Lists) • For long lists of options, when checkboxes become too tedious: <select name=pets size=3 multiple> <option value=dog> Dog <option value=cat> Cat <option value=bird> Bird <option value=fish> Fish </select> • The value attribute in the option tag is optional: default value returned is the displayed string, immediately following the tag. • Without the multiple attribute, only a single option can be selected. dbc@csit.fsu.edu
List Input • The message posted to the server is: ... pets=dog&pets=bird dbc@csit.fsu.edu
Conventional CGI dbc@csit.fsu.edu
Handling Form Data on the Server • In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy. • A common server convention is that these executables live in a subdirectory of cgi-bin/ • The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script. • The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script. • The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser. dbc@csit.fsu.edu
Operation of a CGI Script • At the most basic level, a CGI script must • Parse the input (the form data) from the server, and • Generate a response. • Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers. • In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically. • Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message. • Otherwise the server will not close the connection to the client, and a browser error will occur. dbc@csit.fsu.edu
“Hello World” CGI Script • In the directory /home/httpd/cgi-bin/users/dbc on sirah, I create the file hello.pl, with contents: #!/usr/bin/perl print “Content-type: text/html\n\n” ; print “<html><body><h1>Hello World!</h1></body></html>” ; • I mark this file world readable, and mark it executable: sirah$ chmod o+r hello.pl sirah$ chmod +x hello.pl • Now I point my browser at the URL: http://sirah/cgi-bin/users/dbc/hello.pl dbc@csit.fsu.edu
Output from CGI Script • The novel feature here is the the HTML was dynamically generated: it was printed out on the fly by the Perl script. dbc@csit.fsu.edu
Retrieving Form Data • Several environment variables are set up by the server to pass information about the request to the Perl script. • If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character. • If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script. dbc@csit.fsu.edu
GET example • I change our first form to submit data to a CGI script: <form method=get action=“http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl”> Name: <input type=text name=who size=32> <p> <input type=submit> </form> and define getEg.pl by: #!/usr/bin/perl print “Content-type: text/html\n\n” ; print “<html><body><h1>Hello $ENV{QUERY_STRING}!</h1></body></html>\n” ; • When I point the browser at the form, enter my name, and submit the form, the page returned to the browser contains the message: Hello who=Bryan! dbc@csit.fsu.edu
POST example • Change the form as follows: <form method=post action=“http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl”> Name: <input type=text name=who size=32> <p> <input type=submit> </form> and define postEg.pl by: #!/usr/bin/perl print “Content-type: text/html\n\n” ; for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) { $in .= getc ; } print “<html><body><h1>Hello $i!</h1></body></html>\n” ; dbc@csit.fsu.edu
Using the CGI module • The previous example illustrate the underlying mechanisms used to communicate between server and CGI program. • One could go on to use the text processing features of Perl to parse the form data and generate meaningful responses. • In modern Perl you can (and presumably should) use the CGI module to hide many of these details—especially extracting form parameter. dbc@csit.fsu.edu
CGI module example • Change the form as follows: <form method=post action=“http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl”> Name: <input type=text name=who size=32> <p> <input type=submit> </form> and define CGIEg.pl by: #!/usr/bin/perl use CGI qw( :standard) ; $name = param(“who”) ; print “Content-type: text/html\n\n” ; print “<html><body><h1>Hello $name!</h1></body></html>\n” ; • Now the browser gets a more friendly message like: Hello Bryan! dbc@csit.fsu.edu
Getting Started with Servlets dbc@csit.fsu.edu
Server Software • Standard Web servers typically need some additional software to allow them to run servlets. Options include: • Apache Tomcat The official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server. • JavaServer Web Development Kit (JSWDK) A small standalone Web server mainly intended for servlet development. • Sun’s Java Web server An early server supporting servlets. Now apparently obsolete. • Allaire JRun, New Atlanta’s ServletExec, . . . dbc@csit.fsu.edu
Tomcat • In these lectures we will use Apache Tomcat for examples. • For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing. • The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish your classes are working smoothly before they are deployed in a production server. • Hence you will be encouraged to install your own private server for developing Web applications. • Tomcat is the flagship product of the Jakarta project, which produces server software based on Java. dbc@csit.fsu.edu
Typical Modes of Operation of Tomcat Tomcat Browser 1. Stand-alone 8080 Servlet Request Client Server Apache 2. In-process servlet container Browser Tomcat 80 Servlet Request Client Server Apache 3. Out-of- process servlet container 80 Browser Servlet Request Tomcat Client 8007 Server dbc@csit.fsu.edu