Web Applications

linnea

Presentation Transcript


  1. Web Applications • To this point, we have used only static and simple pages to test out Apache • We can enhance our pages through • client side scripting – code that runs in the browser to manipulate what the user sees or to let the user interact with the page • typical client side scripting is done in JavaScript or Java Applets • server side scripting – code that runs on the server (or is invoked by the server but runs on another machine) to tailor the web page being delivered • this allows dynamic web page content – that is, web pages that are created dynamically • typically server side scripting is done in php, perl, asp or possibly JavaScript, although it could also be done in Linux shell scripting • this is sometimes loosely called dynamic html, although the term DHTML more commonly refers to client side scripting that changes a page after it has loaded • To support server side scripting, Apache must be configured for it; that’s what we study here

  2. Server Side Includes • One approach to server side scripting is to use server side includes (SSI) • we embed commands in our html files that will alter the web page by allowing us to • insert the content of other files • access environment variables to make decisions • insert the result of executing either a program or a Linux command • all SSI commands have the form • <!--#command argument="value" --> • where command is one of the available SSI commands (see the next slide) • and argument is the parameter that is to be supplied the given value • this becomes clearer with examples • we will use perl for some of the code examples • files that include SSI commands must be executable rather than just readable • we will set these files to be 745 or 755 instead of 644 – this is explained later

  3. SSI Commands • An SSI is a statement that Apache executes when it appears in the html document • therefore, whatever the SSI returns is inserted right into the html page (thus giving the page dynamic content) • The commands are • echo – output a variable’s value at this point of the web page • fsize – provide the file size of the specified file • flastmod – provide the last modification date of the specified file • config – used to format how output will appear in the page when the output is produced by another SSI command as listed above • include – include the specified file • place the file’s content at this point of the web page • printenv – print out all existing environment and SSI variables • this will not have much use in general, but could be used in debugging • exec – execute the named program if the argument is cgi, execute the named Linux command if the argument is cmd • set – set a value of a variable • used to pass parameters to a cgi program

  4. CGI • Before we examine these SSI commands in detail, what is CGI? • it stands for common gateway interface • it provides a standard for interfacing the web server with server applications (programs/scripts) • in Linux, typical input comes from STDIN (usually the keyboard) but in CGI, information about the request comes from environment variables (the body of a POST request is the exception; it arrives on STDIN) • you can set your own variable(s) using the SSI set command • in Linux, typical output goes to STDOUT (usually the monitor) but in CGI, the output goes directly to Apache which then inserts (or appends) the output into a web page being created, which is then returned to a browser • in Linux, error messages go to STDERR but in CGI, the error messages go to Apache to be logged • or sent to a pipe if you have set it up so that errors are piped to some Linux command(s)
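The CGI contract described above (input from environment variables, output to STDOUT, errors to STDERR) can be sketched as a small bash program, since CGI programs may also be Linux shell scripts. This is a minimal sketch rather than a production CGI; the script name env-demo.cgi is hypothetical, and it sends text/plain so that request data is shown literally instead of being parsed as html.

```bash
#!/bin/bash
# env-demo.cgi (hypothetical name): a minimal bash CGI program.
# Input arrives in environment variables set by Apache; STDOUT goes back
# to Apache, and anything written to STDERR ends up in Apache's error log.

cgi_main() {
  # Header block first: a Content-Type line, then a blank line.
  printf 'Content-Type: text/plain\n\n'
  # Body: report a few CGI variables, showing "(unset)" when one is missing.
  printf 'Method: %s\n' "${REQUEST_METHOD:-(unset)}"
  printf 'Client address: %s\n' "${REMOTE_ADDR:-(unset)}"
  printf 'Query string: %s\n' "${QUERY_STRING:-(unset)}"
  # A diagnostic message on STDERR, which Apache writes to error_log.
  printf 'env-demo.cgi handled a %s request\n' "${REQUEST_METHOD:-(unset)}" >&2
}

cgi_main
```

From a shell you can call it the way Apache would, for instance `REQUEST_METHOD=GET QUERY_STRING='a=1' ./env-demo.cgi`.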

  5. The include Command • Here we will focus on the various SSI commands, starting with the easiest one, include • the include command takes the contents of the specified file and includes it at this point in the html document • you might use this if you have a common section such as a navigation bar or footer • you can have as many include statements as you like; in fact, an html document could be virtually all include statements whereby all content is loaded at run-time • the included file(s) need permission of 644 (at least) • specify the file to be included using • file="filename" if the file is in the same directory as the html file (or a subdirectory of it; file= cannot use ../ or an absolute path) • or virtual="/directorypath/filename" if the file is elsewhere; this is a URL path, normally resolved from DocumentRoot • you have one file/virtual argument for each include command • <!--#include file="news.txt"--> • <!--#include virtual="/pub/includes/news.txt"-->

  6. The echo Command • The echo command takes the argument var="name" where name is one of the environment variables • the variables include CONTENT_TYPE, DATE_GMT, DATE_LOCAL, DOCUMENT_NAME, DOCUMENT_URI, HTTP_REFERER, HTTP_USER_AGENT, LAST_MODIFIED, REMOTE_HOST, REMOTE_ADDR, REQUEST_METHOD, SERVER_ADDR, SERVER_NAME • most of the above are self-explanatory; HTTP_REFERER is the URL of the page whose link led you to this page (or blank if you typed the URL in the browser’s location box), and HTTP_USER_AGENT identifies the user’s browser • example: <!--#echo var="DATE_LOCAL"--> • this outputs the date at the point of the html file where this SSI appears

  7. The config Command • You will use the config command to format the output that will appear in the web page • you would couple config either with an echo command to format the environment variable that you want output, or with the fsize or flastmod command • in any case, the config command comes first • The config command permits three possible attributes • errmsg – the message sent back in case of a parsing error • e.g., <!--#config errmsg="An SSI error occurred"--> • sizefmt – how a file’s size is shown when the output is a file size • options are bytes (the full count in bytes) or abbrev (an abbreviated size in KB or MB) • e.g., <!--#config sizefmt="bytes"-->

  8. Third Attribute • timefmt – how you want the time/date to be output, as a format string that consists of some combination of the following • %a, %A – abbreviated or full name of the day of the week • %b, %B – abbreviated or full month name • %d, %e – day of the month, zero-padded (e.g., 05) or space-padded • %H, %I – hour using the 24 hour clock or the 12 hour clock • %j, %m – day of the year (001-366), month (01-12) • %M – minute (00-59) • %p – AM or PM • %S – seconds • %U, %W – week number of the year, with week 1 starting on the first Sunday (%U) or the first Monday (%W) • %w – day of the week as a number, with Sunday being 0 • %y, %Y – year to 2 digits or to 4 digits • %Z – time zone • %% – the % character

  9. Examples • <!--#config timefmt="%A, %B %e, %Y"--> • <!--#echo var="DATE_LOCAL"--> • output might be Tuesday, March 15, 2011 • <!--#config timefmt="%a %d %b %y"--> • <!--#echo var="LAST_MODIFIED"--> • output might be Wed 05 Jan 11 • <!--#config sizefmt="bytes"--> • <!--#fsize file="data.txt"--> • we might add text before these last two SSIs so that the output is more than just a number, for instance File Size: … • We may want to do more than just output one of these values, so we might write our own CGI program that accesses any of these through environment variables and skip the echo/fsize/flastmod and config commands
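Because timefmt uses the same % codes as the C strftime() function, one way to preview a format string before putting it into a config statement is GNU date, which accepts a strftime-style format after a +. This is a minimal sketch assuming GNU date (for -d) on a Linux host; preview_timefmt is a hypothetical helper, and LC_ALL=C and TZ=UTC are set so the day and month names and the time zone are predictable.

```bash
#!/bin/bash
# preview_timefmt (hypothetical helper): show what a timefmt string produces
# for a given date. Assumes GNU date, whose -d option parses the date text.

preview_timefmt() {
  # $1: a date that GNU date can parse; $2: the strftime-style format to try
  LC_ALL=C TZ=UTC date -d "$1" +"$2"
}

preview_timefmt '2011-03-15 12:00' '%A, %B %e, %Y'   # Tuesday, March 15, 2011
preview_timefmt '2011-01-05 12:00' '%a %d %b %y'     # Wed 05 Jan 11
preview_timefmt '2011-03-05 12:00' '%d|%e'           # 05| 5 (%e pads with a space)
```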

  10. Include & Echo Example • Let’s assume that we want to include at the bottom of every html page a footer which consists of our company’s name and the last modification date of the given page • We can add the company’s name as a footer in a div that CSS styles • <div class="footer">company name<br /> • However, to get the last modification date on the page, we either have to add this manually every time we edit any html page, or we use the LAST_MODIFIED variable • add the text last modified on <!--#echo var="LAST_MODIFIED"--> • and precede the echo with a config statement • Alternatively, we can create our own footer page which consists of the company’s name and the above echo (and config) statement(s) • We include the footer on every html page with • <!--#include virtual="/outfooterpage.html"--> • no matter where we are under DocumentRoot, we get the same footer page • since the footer page has SSI in it (echo and possibly config), we need to make it executable (745 or 755)

  11. Another Example • Let’s assume you want to build your own error page to be called via an ErrorDocument directive • by using #echo, you can make your error document output useful information to the user • what useful information might we want to output? • HTTP_REFERER will tell us whether the URL was entered by the user (it is empty) or reached through a link (it holds the linking page) • REQUEST_URI will let us output the incorrect URL (inside an error document, DOCUMENT_URI holds the error page’s own URI instead) • REQUEST_METHOD will let us output the method attempted • REMOTE_HOST will let us output the host we are sending this page to (it is only filled in when Apache looks up host names; otherwise REMOTE_ADDR gives the IP address)
<html><head><title>404 Error</title></head>
<body>
I’m sorry, but the requested file <!--#echo var="REQUEST_URI"--> is not available via the <!--#echo var="REQUEST_METHOD"--> method from the referring page <!--#echo var="HTTP_REFERER"-->, and so I cannot return a file to you at <!--#echo var="REMOTE_HOST"-->.
</body></html>

  12. How to Use #include • Create your html file as before • include any SSI statements needed • e.g., <!--#include file="stuff.txt"--> • the file to be included must either be • in the same directory as the html file • or somewhere under DocumentRoot, in which case instead of file= you specify virtual="full path from DocumentRoot" as in virtual="/cgi-bin/foxr/stuff.txt" • The html file must be executable rather than just readable • set the file’s permission to 745 (or 755) and add the directive XBitHack on in your conf file • note: you can avoid this by naming the file .shtml and adding the proper handler for shtml, but the 745 approach is better • the file being included must be readable (chmod 644 filename)

  13. The exec Command • The exec command executes code • there are two types of code you can execute • specify a cgi file to execute using cgi="…" • specify a Linux command using cmd="…" • When calling a CGI program, all of the environment variables are available as well as any SSI include variables (specified using the set command) • if calling a Linux command, SSI variables specified by set can be accessed, but none of the CGI environment variables are available • If the cgi program is located in the current directory, then you list the file by name; otherwise you must specify the file’s path starting from DocumentRoot • note: if your cgi programs are not under DocumentRoot, you have to map a URL path to them through a ScriptAlias directive

  14. Content-Type • Recall that any SSI command that produces output will be inserted in the html file right at the point where the SSI command is listed • a CGI program might even be used to generate the entire contents of a web page • in either case, the output of the CGI program must begin with proper header information for apache • apache automatically adds the remaining header information, so at a minimum, the CGI program must output the Content-Type header followed by a blank line, e.g., Content-Type: text/html • without this, the exec command will cause an error when placing its output into the html document

  15. Example • We have an html page which has the following SSI • <!--#exec cgi="helloworld.sh"--> • The directory containing our html page has the following shell script, whose file is named helloworld.sh • The echo statements write their output to STDOUT, which Apache inserts into the html page, and thus running the script produces output for this page • of course, this is a trivial example; more often our CGI program will read data from various files (perhaps a database) to build the output and generate something dynamic • notice the first echo is the Content-Type header, the second provides a blank line, and the third provides the actual content including any necessary html tags (the quotes stop the shell from treating < and > as redirections)
#!/bin/bash
echo "Content-Type: text/html"
echo
echo "<html><body><p>Hello World!</p></body></html>"

  16. File Permissions • Any html file that includes SSI commands must be executable rather than just readable • typically our html files have permission of 644 • now we need to set them to be 745 or 755 • with XBitHack on, apache treats a page as SSI only if the owner’s execute bit is set; apache itself runs as a user other than the page’s owner, so everyone else needs read permission • notice that such a file does not need to be group executable, so we could make the permissions 745 instead of 755 (strictly, 744 is enough for an SSI page, since only the owner’s execute bit is checked) • As an example, assume we have a file foo.html with an exec SSI which invokes a program bar.cgi, and bar.cgi reads and writes to a data file bar.dat • foo.html will be 745 because it contains SSI • bar.cgi will also be 745 because it is a program that apache, running as a user other than the owner, must be able to read and execute • bar.dat will need to be 666 (or 646) because apache will have to write to it
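To see what 745 and the other modes above actually grant, it helps to decode each octal digit into the r, w and x bits that ls -l displays (owner, then group, then everyone else). This is a minimal sketch; mode_to_rwx is a hypothetical helper that expects exactly three octal digits.

```bash
#!/bin/bash
# mode_to_rwx (hypothetical helper): turn a 3-digit octal mode such as 745
# into the rwx string that ls -l shows. Each digit adds 4 (r), 2 (w), 1 (x).

mode_to_rwx() {
  local mode=$1 out="" digit i
  if [[ ! $mode =~ ^[0-7]{3}$ ]]; then
    echo "mode_to_rwx: expected three octal digits, got '$mode'" >&2
    return 1
  fi
  for (( i = 0; i < 3; i++ )); do       # owner, group, others
    digit=${mode:i:1}
    (( digit & 4 )) && out+=r || out+=-
    (( digit & 2 )) && out+=w || out+=-
    (( digit & 1 )) && out+=x || out+=-
  done
  printf '%s\n' "$out"
}

for m in 644 745 755 666 646; do
  printf '%s  %s\n' "$m" "$(mode_to_rwx "$m")"
done
```

For 745 this prints rwxr--r-x: the owner can read, write and execute, the group can only read, and everyone else (which includes the user apache runs as) can read and execute.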

  17. Executing a Perl Program via Exec • As an example, we want to count the number of visits that a given html page has had • we need two extra files aside from the html page • a text file that stores the current number of visits • a CGI program which will • open and read this file (a single number) • add one to this value • store the value back to the file • optionally, print the number (to output it to the web page) • our web page will have the following SSI command • <!--#exec cgi="counter.cgi"--> • or <!--#exec cgi="/cgi-bin/counter.cgi"--> if the file is stored in some central repository (say cgi-bin) • permissions are as follows: • our web page needs 745 since it includes SSI • counter.cgi needs 745 since it is executable • the data file needs 666 (or 646) since we are reading and writing to it

  18. Example • You have a web page and you want it to generate a random quote whenever someone visits it • amid the text in the web page, you invoke a perl program which generates a random number and uses it to index into an array of strings; it then prints out the selected string • since this is the output of the perl program, it appears in the web page
The web page:
<html><body>
Here’s my vegetarian note of the day:
<!--#exec cgi="random.pl"-->
Ok?
</body></html>
The Perl program:
#!/usr/bin/perl -wT
use strict;
srand;
my @quotes=("…","…",…,"…");           # the list of quotes
my $number=int(rand(@quotes));         # random index from 0 to $#quotes
print "Content-Type: text/html\n\n";
print "<p>$quotes[$number]</p>\n";
Both files need permission of 745

  19. The set Command • This command is used to create and initialize SSI include variables • <!--#set var="variable name" value="value"--> • You can only set one variable per set instruction • if you have multiple values to pass to either a cgi program or a Linux command, you will need multiple set instructions • for example, we might want to pass to a program the variables x and y initialized to 1 and 3 respectively • <!--#set var="x" value="1"--> • <!--#set var="y" value="3"--> • <!--#exec cgi="someprogram.pl"--> • in the program, you access the variables’ values using $ENV{'name'} as in $ENV{'x'}
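Variables created with set reach the program as ordinary environment variables, so a shell script reads them directly as $x and $y where the Perl program uses $ENV{'x'}. This is a minimal sketch; the name sum.cgi is hypothetical (standing in for someprogram.pl), and it accepts only non-negative whole numbers because the values arrive as untrusted text.

```bash
#!/bin/bash
# sum.cgi (hypothetical): reads the SSI variables x and y, which Apache
# passes to the program as environment variables, and prints their sum.

sum_main() {
  printf 'Content-Type: text/html\n\n'
  # Missing variables default to 0.
  local x=${x:-0} y=${y:-0}
  # Accept only non-negative whole numbers; 10# stops bash from reading
  # a value such as 08 as an (invalid) octal number.
  if [[ $x =~ ^[0-9]+$ && $y =~ ^[0-9]+$ ]]; then
    printf '<p>%s + %s = %d</p>\n' "$x" "$y" $(( 10#$x + 10#$y ))
  else
    printf '<p>x and y must be whole numbers</p>\n'
  fi
}

sum_main
```

In the page, this would follow the two set statements as <!--#exec cgi="sum.cgi"--> and would display 1 + 3 = 4.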

  20. Example • Imagine that you want to generate a different random quote for different pages but want to use the same random quote program from two slides back • here we can use the set command • on our vegetarian page, we do • <!--#set var="type" value="veggie"--> • <!--#exec cgi="random.pl"--> • and on our Frank Zappa page, we do • <!--#set var="type" value="zappa"--> • <!--#exec cgi="random.pl"--> • the program defines two arrays, @quotes and @zappa, and we use an if statement to decide what to print (eq compares strings in Perl, while == compares numbers) • if ($ENV{'type'} eq "veggie") { print "<p>$quotes[$number]</p>"; } • else { print "<p>$zappa[$number]</p>"; }
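The same idea can be sketched in bash: the script reads the type variable set by the page and picks a quote list accordingly. This is a minimal sketch; random.cgi is a hypothetical name, the quote strings are placeholders, and $RANDOM % n is only roughly uniform, which is fine for picking a quote. Inside [[ ]], bash compares strings with ==, much as Perl uses eq.

```bash
#!/bin/bash
# random.cgi (hypothetical): pick a random quote from the list selected by
# the SSI variable "type". The quotes are placeholders.

veggie_quotes=("veggie quote 1" "veggie quote 2" "veggie quote 3")
zappa_quotes=("zappa quote 1" "zappa quote 2")

random_quote() {
  local -a quotes
  if [[ ${type:-} == zappa ]]; then
    quotes=("${zappa_quotes[@]}")
  else
    quotes=("${veggie_quotes[@]}")   # default when type is veggie or unset
  fi
  # $RANDOM is 0..32767; the remainder picks an index into the chosen list.
  printf '<p>%s</p>\n' "${quotes[RANDOM % ${#quotes[@]}]}"
}

printf 'Content-Type: text/html\n\n'
random_quote
```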

  21. Configuring Apache for CGI • To execute cgi scripts, you need to use the mod_cgi module • this is part of apache 2.2’s base so we do not need to separately compile or load it • One directive that might be of use is the ScriptAlias directive to establish the location of all CGI scripts • ScriptAlias /cgi-bin/ /var/web/scripts/cgi-bin/ • in this case, we place our cgi-bin scripts under a location separate from DocumentRoot for security purposes (DocumentRoot would be /var/web/html) • if one of the two paths ends with a trailing /, the other must as well, otherwise requests will not map correctly • we will probably want a <Directory> container for the cgi directory so that we can establish directives specific to our cgi scripts (see the next slide)

  22. Directives • For our cgi-bin directory, we need to establish the proper directives • since access to the / directory is denied, we will probably need to grant access here (in apache 2.2, Order allow,deny followed by Allow from all) • the directory needs to establish that cgi files can be executed • Options +ExecCGI • the proper handler is set up for cgi files • AddHandler cgi-script .cgi • other extensions can be added such as .pl for perl or .php for php • make sure html pages with the owner’s execute bit set (e.g., 745) are parsed for SSI • XBitHack on • these last two can go outside of this directory container • Our SSI statements now reference the scripts through the URL path given in ScriptAlias • <!--#include virtual="/cgi-bin/filename…"--> • <!--#exec cgi="/cgi-bin/filename…"--> • apache translates the URL path /cgi-bin/ into the directory /var/web/scripts/cgi-bin/

  23. How to Use #exec • This will be similar to the steps for #include, with a few modifications • add your SSI <!--#exec cgi="…"--> • the file listed must have permissions of 745, not 644 • the directory containing this program must have Options ExecCGI and the AddHandler statement • if we place the CGI file in a cgi directory, we need the proper ScriptAlias directive • Unlike other SSI statements, we have to make sure our script actually runs correctly • it must output Content-type: text/html followed by a blank line • contain no syntax errors • not generate run-time errors • If any of these are not true, then the result is a mysterious server error message which does not tell you what happened • to figure out what went wrong, consult your error log, which might give you more detail; if not, it’s up to you as a programmer to debug the script – which can be very challenging

  24. Formatting the Output of a Program • Recall that whatever you output from your exec command is placed into the html file which is being returned by Apache to the client’s browser, and will be parsed as html • if your program formats its output with \n and \t characters, the browser collapses them into single spaces • your program should generate html formatting tags such as <p> or <br> • or, use <pre>…</pre> tags (preformatting) around the SSI statement • Consider using <!--#exec cmd="ls -l"--> • the "ls -l" command returns a long listing whose columns are aligned with spaces and whose files are on separate lines, but the browser would collapse that whitespace, making the output very hard to read • so instead, you would want to do • <pre> • <!--#exec cmd="ls -l"--> • </pre>
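The <pre> approach can also be applied inside the program itself. The sketch below (listing.cgi and html_escape are hypothetical names) wraps the ls -l output in <pre> tags and also escapes &, < and >, because <pre> preserves whitespace but the browser still parses any markup inside it, so a file name such as <b>.txt would otherwise be read as a tag.

```bash
#!/bin/bash
# listing.cgi (hypothetical): emit a directory listing that survives html parsing.

html_escape() {
  # In a sed replacement, & means "the matched text", so a literal & is written \&.
  # & is replaced first so the &lt; and &gt; added afterwards are not escaped again.
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

listing_main() {
  printf 'Content-Type: text/html\n\n'
  printf '<pre>\n'
  ls -l | html_escape      # keep the spacing, neutralize any markup in file names
  printf '</pre>\n'
}

listing_main
```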

  25. CGI Debugging • You have written a CGI file and then an html file with an exec statement • You attempt to load the html file and • you get the html content but not the CGI output – why? • there are many possible reasons, such as the CGI program producing errors • one error might be that your CGI program is not producing the proper header (Content-Type: text/html) • you get a server error • it is possible that apache is not configured properly or the CGI program has the wrong permissions • By examining your error_log file, you might find some clues • you can also try to run your cgi program from a command line prompt to see if it works as expected • perl programs can be run by typing perl -wT filename • shell programs can be run by ./shellname
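Running the program from the command line can be partly automated. This is a minimal sketch assuming bash; check_cgi is a hypothetical helper that runs a script with a couple of CGI variables set and then reports two of the problems listed above, a non-zero exit status or a missing Content-Type header. It only accepts the simplest header layout, Content-Type on the first line followed by a blank line.

```bash
#!/bin/bash
# check_cgi (hypothetical helper): run a CGI program roughly as Apache would
# and check the start of its output. Usage: ./check_cgi.sh ./counter.cgi

check_cgi() {
  local script=$1 out status first second
  # Supply a couple of CGI variables the program may expect.
  out=$(REQUEST_METHOD=GET GATEWAY_INTERFACE=CGI/1.1 "$script")
  status=$?
  if (( status != 0 )); then
    echo "FAIL: $script exited with status $status" >&2
    return 1
  fi
  first=$(printf '%s\n' "$out" | sed -n 1p)
  second=$(printf '%s\n' "$out" | sed -n 2p)
  second=${second%$'\r'}                 # tolerate CRLF line endings
  if [[ ${first,,} != content-type:* ]]; then
    echo "FAIL: first line is not a Content-Type header: $first" >&2
    return 1
  fi
  if [[ -n $second ]]; then
    echo "FAIL: no blank line after the Content-Type header" >&2
    return 1
  fi
  echo "OK: $script"
}

if (( $# > 0 )); then
  check_cgi "$1"
fi
```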

  26. The Counter Program • The program to count the number of visits is shown below, written in Perl
#!/usr/bin/perl -wT
use CGI qw(:standard);
use strict;
use Fcntl qw(:flock :seek);               # we will use the functions flock and seek
print "Content-type: text/html\n\n";      # print content-type header
open(IN, "+<", "counter.dat") or die "cannot open counter.dat: $!";  # open the data file for reading and writing
flock(IN, LOCK_EX);                       # lock the file while it is in use
seek(IN, 0, SEEK_SET);                    # start at the beginning of the data file
my $count = <IN>;                         # read the datum in the file, store in $count
$count = 0 unless defined $count;         # an empty file means no visits yet
chomp $count;                             # remove the trailing newline
$count = $count + 1;                      # increment count
truncate(IN, 0);                          # erase the file
seek(IN, 0, SEEK_SET);                    # start at beginning of data file
print IN "$count\n";                      # output count to data file
close(IN);                                # close data file (this also releases the lock)
print "You are visitor $count.<p>\n";     # optional, output to html page
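For comparison, the same read, increment and write cycle can be written in bash. This sketch relies on the flock command from util-linux for the exclusive lock, and the data file name comes from a hypothetical COUNTER_FILE variable (default counter.dat). The main part only runs when GATEWAY_INTERFACE is set, which Apache does for CGI programs, so loading the functions for testing does not touch counter.dat.

```bash
#!/bin/bash
# counter.cgi in bash: read the visit count, add one, write it back.
# Assumes the flock(1) command from util-linux; the data file must be
# writable by the user Apache runs as (e.g. mode 666).

COUNTER_FILE=${COUNTER_FILE:-counter.dat}

count_visit() {
  local file=$1 count
  {
    flock -x 9                         # wait for an exclusive lock on fd 9
    count=$(< "$file")                 # current total; an empty file gives ""
    count=$(( ${count:-0} + 1 ))       # add one visit
    printf '%s\n' "$count" > "$file"   # replace the old total
    printf '%s\n' "$count"             # report the new total to the caller
  } 9>> "$file"                        # fd 9 is opened (creating the file if needed) only to hold the lock
}

if [[ -n ${GATEWAY_INTERFACE:-} ]]; then
  printf 'Content-Type: text/html\n\n'
  printf 'You are visitor %s.<p>\n' "$(count_visit "$COUNTER_FILE")"
fi
```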

  27. 404 Error Log Example • As we saw in chapter 7, we can set up our own error page based on the error type using ErrorDocument • Here we create our own error page which uses a cgi program to log the error • recall from chapter 7 that an error is automatically logged, but here we combine the error page and the logging • Set up an html file that will display the error message (in this case a 404) • assume this is stored under DocumentRoot at /error/error-page404.html • Add this SSI directive to error-page404.html (somewhere) • <!--#exec cgi="/cgi-bin/errors/log404.cgi"--> • Add to httpd.conf (or an .htaccess file) • ErrorDocument 404 /error/error-page404.html • error-page404.html should have 745 permission • the log404.cgi program should have 745 permission • the log404.cgi program will open a file called errorlog404.txt, and this file should have 666 permission

  28. log404.cgi Program • Also written in Perl, it looks similar to counter.cgi • in this case, we do not erase the previous file’s contents but instead append to the end of the file (opening with >> already appends; the SEEK_END statement makes this explicit) • We use environment variables: • HTTP_REFERER – the page whose link led to the 404 error (empty if the URL was typed in) • REQUEST_URI – the path/filename that was requested (possibly by software such as a web crawler rather than a person)
#!/usr/bin/perl -wT
use CGI qw(:standard);
use strict;
use Fcntl qw(:flock :seek);
print header;                                  # CGI.pm's header() prints a Content-Type header and a blank line
open(OUT, ">>", "errorlog404.txt") or exit;    # open errorlog404.txt for appending, but exit on error
flock(OUT, LOCK_EX);
seek(OUT, 0, SEEK_END);                        # move to end of file
my $referer = defined $ENV{HTTP_REFERER} ? $ENV{HTTP_REFERER} : "";   # may be unset
print OUT "Referer: $referer URI: $ENV{REQUEST_URI}\n";
close(OUT);
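The logging program can be sketched in bash as well. As with the Perl version, the log is opened in append mode, so new lines always land at the end of the file. This sketch assumes the flock command from util-linux and a hypothetical LOG_FILE variable (default errorlog404.txt); missing variables are logged as (none), and the main part runs only when GATEWAY_INTERFACE is set, as it is for CGI programs.

```bash
#!/bin/bash
# log404.cgi in bash: append the referring page and requested URI to a log.
# Assumes the flock(1) command from util-linux; the log must be writable by apache.

LOG_FILE=${LOG_FILE:-errorlog404.txt}

log_404() {
  local file=$1
  {
    flock -x 9                     # one writer at a time
    printf 'Referer: %s URI: %s\n' "${HTTP_REFERER:-(none)}" "${REQUEST_URI:-(none)}" >&9
  } 9>> "$file"                    # >> always appends, so no explicit seek to the end is needed
}

if [[ -n ${GATEWAY_INTERFACE:-} ]]; then
  printf 'Content-Type: text/html\n\n'
  log_404 "$LOG_FILE"
fi
```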

  29. Control Directives • Another SSI command is #if, which allows you to test a condition to determine what the html page should do • the condition is based on values of environment variables • based on the result of the comparison, you can issue a specific SSI statement • The basic form is • <!--#if expr="$varname = value"--> • Action • <!--#endif--> • Conditions can be combined using && (and) and || (or) • The action specifies what will be placed at that position of the html page; it can be • text • html • images (<img src=…>) • further SSI calls

  30. Examples • Imagine that you want to display a particular type of image but you know that the image will display correctly in Mozilla and not IE • <!--#if expr="$HTTP_USER_AGENT != /MSIE/"--> • <img src="someimage.xyz"> • <!--#endif--> • the slashes make this a regular expression test; we test for MSIE rather than Mozilla because IE’s user agent string also contains the word Mozilla • by accessing the value of HTTP_USER_AGENT, a piece of information sent in the request header, your html page can make a decision on whether to display the image or not • Another use of this directive is to determine if the user reached this page directly or via a link and respond appropriately • <!--#if expr="$HTTP_REFERER = ''"--> • Please note that this URL will be changing to … in the future! • <!--#endif--> • here, we alert users who directly entered the URL of the change to take place, but this text message does not get displayed to users who reached this URL via a link

  31. #else and #elif • For more complicated logic, you may want to use an if-else or a nested if-else structure • an else clause can be provided using the <!--#else--> directive and a nested if by the <!--#elif expr="…"--> directive • you would use elif if you have another condition to test • you would use else if you have just two options (do one action if the condition is true and the other action otherwise), or after the last elif statement if you have a “default” case • here is the format for an if-else structure • <!--#if expr="$varname = value"--> • action • <!--#else--> • action • <!--#endif--> • The elif has its own condition, as in • <!--#elif expr="$varname"--> • You can have as many elif directives as you want prior to your endif, and at most one else before an endif

  32. #if-elif-else Example • We have a web page, therealfile.html, that will not appear correctly in the Macintosh default browser or in IE • we create a “wrapper” file called thewrapperfile.html which includes the SSI statements below, and we add these directives to httpd.conf (or an .htaccess file) • BrowserMatchNoCase macintosh mac • BrowserMatchNoCase MSIE ie • these directives set the variables mac and ie if the browser is macintosh or MSIE; if neither, then these variables do not get established • <!--#if expr="${mac}"--> • I’m sorry, this page is not set up for macintosh browsers, please use Mozilla • <!--#elif expr="${ie}"--> • I’m sorry, this page is not set up for Internet Explorer browsers, please use Mozilla • <!--#else--> • <!--#include file="therealfile.html"--> • <!--#endif--> • so either therealfile.html is included into this page or we get one of the two apology messages • thewrapperfile.html needs a permission of 745 while therealfile.html needs permission of 644
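When the decision needs more than a pattern test, the same check can be made in a CGI program. This is a minimal bash sketch, assuming the browser string arrives in HTTP_USER_AGENT as usual; browser_family and wrapper.cgi are hypothetical names. The helper lower-cases the string so the match ignores case, as BrowserMatchNoCase does, and checks for macintosh before MSIE in the same order as the SSI example.

```bash
#!/bin/bash
# wrapper.cgi (hypothetical): choose a response based on the browser,
# mirroring the #if/#elif/#else example.

browser_family() {
  local ua=${1,,}              # lower-case copy so the match ignores case
  case $ua in
    *macintosh*) echo mac ;;   # tested first, like the #if branch
    *msie*)      echo ie ;;
    *)           echo other ;;
  esac
}

wrapper_main() {
  printf 'Content-Type: text/html\n\n'
  case $(browser_family "${HTTP_USER_AGENT:-}") in
    mac) echo "I'm sorry, this page is not set up for macintosh browsers, please use Mozilla" ;;
    ie)  echo "I'm sorry, this page is not set up for Internet Explorer browsers, please use Mozilla" ;;
    *)   cat therealfile.html ;;   # the real content, as in the #else branch
  esac
}

# Run only when invoked by Apache as a CGI program.
if [[ -n ${GATEWAY_INTERFACE:-} ]]; then
  wrapper_main
fi
```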

  33. CGI and Efficiency • There is some concern that CGI is not very efficient • the apache server must invoke another program to run the script because the script is written in another language (e.g., perl, shell script, javascript) • thus, apache first loads that language’s interpreter, runs the script, and pipes the output to the html document • apache must do this for every script run – even if the same language is run several times in a row by the same or different web pages, apache is reloading the appropriate interpreter each time • FastCGI gets around this problem by keeping the language interpreter’s process persistent so that it can be invoked later as needed • there is a FastCGI module (mod_fastcgi) available from www.fastcgi.com/dist – it must be downloaded, compiled and installed and then the module must be loaded into apache

  34. Embedded Languages • For commonly used programming languages, it might be better to embed those languages in apache via a language module • there are numerous language modules for all of the common (and some uncommon) scripting languages • asp, perl, php, python, ruby, tcl are fairly common • some of the language modules available were implemented for apache 1.3, but because their languages are not very popular or the modules are no longer maintained, they do not work in later versions of apache • Aside from efficiency, by including a language module, you can then embed code of that language right into your html document • in the next slide, we see an example embedding php into an html file • of course, this is only useful if you want to program in that language!

  35. Embedding PHP • First, we need to make sure php is installed on the same machine as apache • Next, we have to load the php module into apache and enable php handling in the directory container for the file(s) that will use PHP • We add a LoadModule directive (this goes in the main server configuration, not inside a container) and then the directive that tells apache how to handle .php files • AddType application/x-httpd-php .php • if we were going to use a different language, we would need the appropriate MIME type for that language • application/x-c .c for C programs • text/x-script.scheme .scm for Scheme files (Scheme is a Lisp-like language) • Finally, embed your php code in the html file • <?php • // php code here • ?>

  36. Two Brief Examples
<html><body>
Here is some information about the version of php running on this computer: <p>
<?php phpinfo(); ?>
</body></html>

<html><body>
Did you know that
<?php
$x = 5;
$y = 7;
$z = $x * $y;
echo "<p>$x * $y = $z?</p>";
?>
A PHP program told me that!
</body></html>

  37. Combining Languages • As you saw on the previous slide, it’s possible (and likely) that you will combine html and php • in fact, you can combine multiple languages in any document; you saw earlier a combination of html and cgi (ssi) • html, ssi, perl, php, javascript • Are there any rules or conventions for combining languages? • not really – you just have to make sure that the language is one that the server knows (javascript that runs in the browser is the exception, since the browser rather than the server executes it) • the server knows html and SSI but not necessarily php or perl, so if you use languages that the server does not know, you will have to load the proper module(s) • and you have to make sure that you use the proper file extension for the AddHandler statement (for instance, if you have a php handler set for .php files, make sure the file name ends in .php and not .cgi) • you would also most likely not use php, javascript or perl to write pages that have static content (e.g., the second example from the last slide makes no sense – why use php to output something static?)

  38. Content and Language • Users can establish their preference for how content appears • fonts • character encoding • language • in Mozilla, go to Tools > Options > Content and click on the buttons for Advanced… under Fonts & Colors and Choose… for Languages • An apache web server can select a file to return and/or an encoding method to use based on negotiation • basically, what does the user want? • can I accommodate that? • if not, how close can I come? • This is known as content and language negotiation • negotiation is performed using file extensions (MultiViews) or type map files that list the variants

  39. MultiViews • To permit apache to negotiate, you have to establish the MultiViews option in whichever directory(ies) you want this capability • recall that MultiViews is not included when you say Options All, so it must be explicitly enabled, e.g., Options +MultiViews • The MultiViews option works as follows • if the requested file does not exist in the specified directory, then MultiViews causes apache to gather all files whose names begin with the requested name followed by further extensions • if the request is for /dir/sub/file1.html and there are files /dir/sub/file1.html.en and /dir/sub/file1.html.de, then apache chooses which of the two files would be most preferred • with MultiViews, apache selects the file that best matches the client’s browser specification as stated in the request header

  40. Negotiation with MultiViews • With MultiViews enabled for a directory, you can then specify how negotiable items map to files by including statements in the <Directory> container or the directory’s .htaccess file • for languages, use AddLanguage language .extension as in AddLanguage en .en • this would require that any file that may have language negotiation end in .en for English, as in foo1.html.en versus foo1.html.de (for German) • you would not create a file called foo1.html, as it would automatically be selected every time and thus defeat the purpose of using language negotiation; instead create foo1.html.en, foo1.html.de, etc. • for multimedia types, we have already seen how to map types to file extensions in the mime.types file and/or with the AddType directive • for encoding types, use the AddEncoding directive, e.g., AddEncoding x-gzip .gz or AddEncoding x-compress .Z

  41. Type Maps • These are files that specify the preference of content encodings, types, and languages • a type map can contain specifiers for any or all of these • To add a type map, use AddHandler type-map .extension in your httpd.conf file • Then create your type map whose name will be the name of the file to be negotiated, with a further extension as specified in the AddHandler statement • so for instance, if you have foo1.html.en and foo1.html.de and you have specified AddHandler type-map .var, then your type map for these files will be named foo1.html.var • The type map will primarily consist of entries describing the available type(s), language(s) and encoding(s) but may (optionally) also include the file’s • length – if not specified, this is filled in by the server • URI – relative to the type map • the body of the file itself

  42. Type Map Specifiers • Each variant is described by its own record in the type map (records are separated by blank lines), and a record can specify any combination of the dimensions (language, type, encoding) • language is specified using Content-Language, which may list more than one language • Content-Language: en, it, fr, de • type is specified using Content-Type and can include optional parameters • level – an integer that specifies the version of that type (e.g., the HTML version for text/html) • qs – the “quality” of the source, rated from 0.0 to 1.0 (best) – this can be used to denote the relative quality of different types, for instance a .png versus a .jpg versus a .gif file • example: Content-Type: image/jpeg; qs=0.8 in one record and Content-Type: image/gif; qs=0.5 in another • encoding is specified using Content-Encoding, and the encodings listed must previously be defined using AddEncoding directives
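
The parameter syntax on these header lines can be pulled apart as follows. This minimal sketch assumes parameters are ";"-separated name=value pairs and treats a missing qs as 1.0; the function names are hypothetical.

```python
def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a type-map Content-Type value into (media type, parameters)."""
    media_type, *params = [part.strip() for part in value.split(";")]
    parsed: dict[str, str] = {}
    for param in params:
        name, _, val = param.partition("=")
        parsed[name.strip().lower()] = val.strip()
    return media_type.lower(), parsed


def source_quality(value: str) -> float:
    """The qs parameter as a float; this sketch treats a missing qs as 1.0 (best)."""
    return float(parse_content_type(value)[1].get("qs", "1.0"))


def parse_content_language(value: str) -> list[str]:
    """Content-Language may list several languages separated by commas."""
    return [lang.strip().lower() for lang in value.split(",") if lang.strip()]


print(parse_content_type("text/html; level=3; qs=0.9"))
# ('text/html', {'level': '3', 'qs': '0.9'})
print(source_quality("image/jpeg; qs=0.8"), source_quality("image/gif"))  # 0.8 1.0
print(parse_content_language("en, it, fr, de"))  # ['en', 'it', 'fr', 'de']
```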

  43. Example Type Map • Imagine that we have a file named foo.html that is available in English, French, or German • the French and German versions share one file that uses the character set iso-8859-2 (first type map below) • Or, imagine several versions of the same content that includes an image: one with a jpg, one with a gif, and one where the image is omitted in favor of a text description (second type map below)

Type map for the language versions:

URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de

Type map for the image versions:

URI: foo

URI: foo.jpg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01
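
Reading a type map back is a matter of splitting it into blank-line-separated records. The sketch below parses the language example; the parser is a hypothetical illustration, and treating header names case-insensitively (as HTTP does) is an assumption.

```python
import re

TYPE_MAP = """\
URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
"""


def parse_type_map(text: str) -> list[dict[str, str]]:
    """Split a type map into records (separated by blank lines) of header -> value.

    Header names are lower-cased so that Content-type and Content-Type match.
    """
    records = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in chunk.splitlines():
            name, _, value = line.partition(":")
            record[name.strip().lower()] = value.strip()
        records.append(record)
    return records


records = parse_type_map(TYPE_MAP)
# Here the first record only names the resource (URI: foo); the records that
# carry a Content-type are the variants the server can choose between.
variants = [r for r in records if "content-type" in r]
for v in variants:
    print(v["uri"], "->", v["content-type"], "/", v["content-language"])
```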

  44. Browser Preferences • For negotiation to be used, the user’s browser must state its own preferences • These are set by the user (or left at the browser’s defaults) and sent to the server in the request header • a list of acceptable languages is given with the Accept-Language specifier • a list of acceptable media types is given with the Accept specifier • a list of acceptable character sets is given with the Accept-Charset specifier • a list of acceptable encodings is given with the Accept-Encoding specifier • each entry can have a quality (q) value attached • Here is an example of a user who prefers French over English, html over any other form of text, gif and jpeg (equally) over any other form of image, and any other media type least of all • Accept-Language: fr;q=1.0, en;q=0.5 • Accept: text/html;q=1.0, text/*;q=0.8, image/gif;q=0.6, image/jpeg;q=0.6, image/*;q=0.5, */*;q=0.1 • notice there is no space between the ; and the q
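
A browser's Accept lines can be turned into q-value lookups as sketched below. The helpers are hypothetical; they assume a missing q means 1.0 and that the most specific matching range (exact type, then type/*, then */*) supplies the q value for a media type.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each entry of an Accept-* header to its q value (1.0 when omitted)."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        token, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[token.lower()] = q
    return prefs


def media_quality(media_type: str, prefs: dict[str, float]) -> float:
    """q for a concrete media type, using the most specific matching range."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0  # not covered by any entry: not acceptable


langs = parse_accept("fr;q=1.0, en;q=0.5")
types = parse_accept("text/html;q=1.0, text/*;q=0.8, image/gif;q=0.6, "
                     "image/jpeg;q=0.6, image/*;q=0.5, */*;q=0.1")
print(langs)                                    # {'fr': 1.0, 'en': 0.5}
print(media_quality("text/plain", types))       # 0.8 (via text/*)
print(media_quality("image/png", types))        # 0.5 (via image/*)
print(media_quality("application/pdf", types))  # 0.1 (via */*)
```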

  45. Negotiation Algorithm • The browser presents its acceptable values to the server in the request header, using the Accept clauses (e.g., Accept, Accept-Language, Accept-Charset) • The server generates a list of all available variants of the resource • for instance, en/jpeg/x-compress, en/jpeg/x-gzip, fr/jpeg/x-compress, fr/gif, fr/gif/x-gzip • for each variant, if any of its dimensions is not acceptable to the browser, that variant is discarded • The server then selects the best variant by process of elimination • multiply the browser’s q value for the variant’s media type by the variant’s qs value and keep the variants with the highest product • on a tie, keep the variants with the best language • on a tie, keep the variants with the highest media-type level • on a tie, keep the variants with the best charset • on a tie, keep the variants with the best encoding • on a tie, keep the variant with the smallest content length • on a tie, pick the first variant • if no variant is acceptable, return a 406 error (Not Acceptable)
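
The elimination rounds can be sketched as a loop that keeps only the best-scoring variants at each step. This is a simplified model covering only the q × qs product, language, and content length (level, charset, and encoding are omitted), with the round order taken from Apache's content negotiation documentation; the Variant class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    uri: str
    media_type: str
    language: str
    qs: float = 1.0  # source quality from the type map
    length: int = 0  # content length in bytes


def media_q(media_type: str, accept: dict[str, float]) -> float:
    if not accept:  # no Accept header: every type is acceptable
        return 1.0
    major = media_type.split("/")[0]
    for rng in (media_type, major + "/*", "*/*"):
        if rng in accept:
            return accept[rng]
    return 0.0


def language_q(language: str, accept_language: dict[str, float]) -> float:
    if not accept_language:  # no Accept-Language header: every language is acceptable
        return 1.0
    return accept_language.get(language, accept_language.get("*", 0.0))


def negotiate(variants, accept, accept_language):
    """Return the chosen Variant, or None to mean 406 Not Acceptable."""
    # Discard variants with any dimension the browser does not accept.
    alive = [v for v in variants
             if media_q(v.media_type, accept) > 0
             and language_q(v.language, accept_language) > 0]
    # Elimination rounds: each keeps only the variants with the best score.
    rounds = [
        lambda v: round(media_q(v.media_type, accept) * v.qs, 5),  # q x qs
        lambda v: language_q(v.language, accept_language),         # best language
        lambda v: -v.length,                                        # smallest length
    ]
    for score in rounds:
        if not alive:
            break
        best = max(score(v) for v in alive)
        alive = [v for v in alive if score(v) == best]
    return alive[0] if alive else None  # remaining tie: the first variant wins


images = [
    Variant("foo.jpg", "image/jpeg", "en", qs=0.8),
    Variant("foo.gif", "image/gif", "en", qs=0.5),
    Variant("foo.txt", "text/plain", "en", qs=0.01),
]
accept = {"image/gif": 1.0, "image/jpeg": 1.0, "*/*": 0.01}
print(negotiate(images, accept, {}).uri)          # foo.jpg (1.0 x 0.8 wins)
print(negotiate(images, {"text/html": 1.0}, {}))  # None, i.e. a 406
```

Because each round only narrows the surviving set, a later criterion such as content length can never overturn a difference found in an earlier one.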

  46. Forcing Languages • If no variant matches any of the browser’s languages, the server returns a 406 error; if several variants are equally acceptable, it may instead return a 300 status code (Multiple Choices) • this can be avoided by telling the server to force a choice based on a language priority list • Add these directives • ForceLanguagePriority Prefer • alternatives to Prefer are Fallback and None, and Prefer and Fallback can be combined (ForceLanguagePriority Prefer Fallback) • LanguagePriority list • the list gives the languages, in order of priority, that should be attempted (e.g., en fr de) • Prefer and Fallback both use the LanguagePriority list to settle on a single valid language instead of returning an error • they differ in that Prefer avoids the 300 by serving the first of the equally acceptable variants in LanguagePriority order, and Fallback avoids the 406 by serving the first available variant in LanguagePriority order
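
The difference between the modes can be illustrated with a toy Python model. It is not mod_negotiation's logic: it only compares Accept-Language q values (the real module applies further tie-breakers first), and the function name and its arguments are hypothetical.

```python
def pick_language(available, accept_language, priority, force):
    """Toy model of ForceLanguagePriority; returns (HTTP status, chosen language).

    available       -- languages that have a variant, e.g. ["de", "en", "fr"]
    accept_language -- browser q values, e.g. {"en": 1.0, "fr": 1.0}
    priority        -- the LanguagePriority list, e.g. ["fr", "en", "de"]
    force           -- set of modes: {"Prefer"}, {"Fallback"},
                       {"Prefer", "Fallback"}, or set() for None
    """
    def by_priority(langs):
        # Sort by position in LanguagePriority; unlisted languages go last.
        return sorted(langs, key=lambda lang: priority.index(lang)
                      if lang in priority else len(priority))

    scores = {lang: accept_language.get(lang, 0.0) for lang in available}
    best = max(scores.values(), default=0.0)
    if best == 0.0:  # no acceptable language at all
        if "Fallback" in force and available:
            return 200, by_priority(available)[0]
        return 406, None
    tied = [lang for lang, q in scores.items() if q == best]
    if len(tied) > 1:  # several equally acceptable choices
        if "Prefer" in force:
            return 200, by_priority(tied)[0]
        return 300, None
    return 200, tied[0]


langs = ["de", "en", "fr"]
order = ["fr", "en", "de"]
print(pick_language(langs, {"en": 1.0, "fr": 1.0}, order, set()))       # (300, None)
print(pick_language(langs, {"en": 1.0, "fr": 1.0}, order, {"Prefer"}))  # (200, 'fr')
print(pick_language(langs, {"it": 1.0}, order, {"Prefer"}))             # (406, None)
print(pick_language(langs, {"it": 1.0}, order, {"Fallback"}))           # (200, 'fr')
```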

  47. Server “Fiddling” • In some cases, browsers send accept lists without q values • in such a case, the Apache server can add q values to make the accept list more sensible • for instance • Accept: text/html, text/plain, image/gif, image/jpeg, */* • the idea here is that any of these types are acceptable • yet, read literally, this makes */* seem as acceptable as the rest, so Apache will interpret it as • Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01 • since an entry with no q value is interpreted as 1.0, this makes */* the least desirable; Apache does this only when the header contains no q values at all
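
The adjustment can be sketched as a parser that demotes wildcards only when no entry carries a q value. The function name is hypothetical; the 0.02 given to type/* ranges follows Apache's content negotiation documentation, which ranks them just above */*.

```python
def accept_with_wildcard_defaults(header: str) -> dict[str, float]:
    """Parse an Accept header; if it has no q values at all, demote the wildcards."""
    prefs: dict[str, float] = {}
    saw_q = False
    for item in header.split(","):
        token, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # an entry with no q value counts as 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q, saw_q = float(value), True
        prefs[token.lower()] = q
    if not saw_q:  # only "fiddle" when the browser gave no q values
        for token in prefs:
            if token == "*/*":
                prefs[token] = 0.01
            elif token.endswith("/*"):
                prefs[token] = 0.02
    return prefs


print(accept_with_wildcard_defaults("text/html, text/plain, image/gif, image/jpeg, */*"))
# {'text/html': 1.0, 'text/plain': 1.0, 'image/gif': 1.0, 'image/jpeg': 1.0, '*/*': 0.01}
print(accept_with_wildcard_defaults("text/html, */*;q=0.5"))
# {'text/html': 1.0, '*/*': 0.5}
```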

  48. Negotiation and Caching • Consider that you have requested some file, foo1.html, and through negotiation the server returns foo1.html.en • this response is cached (by your browser or by a proxy) • The next time foo1.html is requested, since the URL matches something in the cache, the cached version is returned to your browser • but if you have changed your preferences since then, you now get the wrong file • to avoid this problem, the Apache server marks any response produced by negotiation as non-cacheable • if you want to alter this behavior, add CacheNegotiatedDocs On to your httpd.conf file (or to a <VirtualHost> container) • note that this directive only affects requests from HTTP/1.0 clients, not HTTP/1.1 clients
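
The stale-cache problem is easy to reproduce with a toy cache keyed only on the URL, as an HTTP/1.0 cache is. Everything here is hypothetical: `negotiated_server` stands in for Apache, and the `cache_negotiated` flag plays the role of CacheNegotiatedDocs.

```python
def negotiated_server(url: str, accept_language: str) -> str:
    """Hypothetical origin server: foo1.html is negotiated into foo1.html.<lang>."""
    lang = "de" if accept_language.startswith("de") else "en"
    return f"{url}.{lang}"


class UrlKeyedCache:
    """A toy cache that keys stored responses on the URL only."""

    def __init__(self, cache_negotiated: bool):
        self.cache_negotiated = cache_negotiated  # stands in for CacheNegotiatedDocs
        self.store: dict[str, str] = {}

    def fetch(self, url: str, accept_language: str) -> str:
        if url in self.store:
            return self.store[url]  # URL matches: cached copy, preferences ignored
        body = negotiated_server(url, accept_language)
        if self.cache_negotiated:  # otherwise negotiated responses are not stored
            self.store[url] = body
        return body


results = {}
for flag in (True, False):
    cache = UrlKeyedCache(cache_negotiated=flag)
    first = cache.fetch("foo1.html", "en")
    second = cache.fetch("foo1.html", "de")  # the user has switched to German
    results[flag] = (first, second)
    print(flag, first, second)
# True foo1.html.en foo1.html.en   (stale: wrong language served from the cache)
# False foo1.html.en foo1.html.de
```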
