Evolution of WWW and Web Browsers: A Historical Overview

Computer Networks 2 The Application Layer: WWW Veton Këpuska

WWW: Brief Historical Overview
• Originated from work done at CERN (Switzerland) by physicist Tim Berners-Lee.
• March 1989: Tim Berners-Lee proposed a web of linked documents as a solution to the problem of communication among the many researchers, from various European countries, participating in nuclear research.
• September 1990: the first (text-based) prototype was operational.
• December 1991: the first public demonstration was given at the Hypertext '91 conference in San Antonio, Texas.
• February 1993: Marc Andreessen at the University of Illinois started developing the first graphical browser, Mosaic.
• Mosaic became so popular that it led Andreessen to form Netscape Communications Corp., whose goal was to develop clients, servers, and other Web software:
  • 1995: record initial public offering of $1.5 billion.
  • 1998: sold to AOL for $4.2 billion.
• W3C (World Wide Web Consortium), set up by MIT and CERN: www.w3.org

WWW: Architectural Overview
• The WWW consists of a vast worldwide collection of documents, or Web pages.
• Each page may contain links (hyperlinks) to other pages anywhere in the world.
• A link can be followed by clicking on it.
• Vannevar Bush, a visionary MIT professor of electrical engineering, came up with the idea of having one page point to another (now called hypertext) in 1945.
• Browsers are programs that allow users to view those pages.
• Hyperlinks are strings of text that link to other pages. They are often highlighted by underlining, by display in a different color, or both.
• Non-graphical browsers (Lynx) are not as popular as graphical browsers (Microsoft's Internet Explorer, Netscape's Navigator).
• Voice-based browsers are currently being developed.
• The basic model of how the Web works is shown in Fig. 7-19.

WWW: Architectural Overview
Figure: The parts of the Web model.

The Client Side
• A browser is a program that can:
  • Display a Web page.
  • Catch mouse clicks on items on the displayed page.
• When an item is selected, the browser follows the hyperlink and fetches the selected page.
• An embedded hyperlink needs a way to name any other page on the Web.
• Pages are named using URLs (Uniform Resource Locators).

The Client Side
• A URL has three parts:
  • The name of the protocol (http).
  • The DNS name of the machine where the page is located (www.abcd.com).
  • The (optional) name of the file containing the page (products.html).
• Steps that occur when a link is selected (e.g., http://www.itu.org/home/index.html):
  • The browser determines the URL of the selected link.
  • The browser asks DNS for the IP address of www.itu.org.
  • DNS replies with 195.167.168.15.
  • The browser makes a TCP connection to port 80 on 195.167.168.15.
  • It sends a request for the file /home/index.html.
  • The www.itu.org server sends the file /home/index.html.
  • The TCP connection is released.
  • The browser displays all the text in /home/index.html.
  • The browser fetches and displays all images in this file.
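The steps listed on this slide can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not how any particular browser is implemented: the function names build_request and fetch are made up for the example, the request uses a bare HTTP/1.0 GET, and no error handling or image fetching is attempted.

```python
import socket
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, bytes]:
    """Split a URL into (host, port, raw request) without touching the network."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # port 80 is the default for http
    path = parts.path or "/"     # the file name part of a URL is optional
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    return host, port, request


def fetch(url: str) -> bytes:
    """DNS lookup, TCP connect, send request, read reply, release connection."""
    host, port, request = build_request(url)
    ip_address = socket.gethostbyname(host)          # ask DNS for the IP address
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(request)                        # request the file
        chunks = []
        while chunk := conn.recv(4096):              # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)                          # connection released on exit
```

Calling fetch("http://www.itu.org/home/index.html") would walk through the DNS, TCP, and request steps in order; the returned bytes contain the server's response headers followed by the page.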

Browsers
• "Understanding" pages requires standardization of the language used to compose them: HTML (HyperText Markup Language).
• Not all pages contain HTML. A page may contain:
  • a document in PDF format,
  • an icon in GIF format,
  • a photograph in JPEG format,
  • a song in MP3 format,
  • a video in MPEG format, or
  • any of hundreds of other file types.
• Standard HTML pages may link to any of these documents, so the browser has a problem when it encounters a page it cannot interpret.

Browsers
• Rather than making browsers larger and larger by building in interpreters for a rapidly growing collection of file types, most browsers have chosen a more general solution:
  • When a server returns a page, it also returns some additional information about the page. This information includes the MIME type of the page (see Fig. 7-12).
  • If the MIME type is not one of the built-in ones, the browser consults its table of MIME types to find out how to display the page. This table associates a MIME type with a viewer.
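A minimal sketch of the table lookup described on this slide might look like the following. The set of built-in types, the table entries, and the viewer names are all assumptions chosen for illustration; real browsers keep this information in their own configuration formats.

```python
# Types the (hypothetical) browser can render by itself.
BUILT_IN_TYPES = {"text/html", "image/gif", "image/jpeg"}

# Hypothetical table associating a MIME type with a viewer.
VIEWER_TABLE = {
    "application/pdf": "pdf-viewer",
    "audio/mpeg": "audio-player",
}


def choose_viewer(mime_type: str) -> str:
    """Return who should display a page of the given MIME type."""
    if mime_type in BUILT_IN_TYPES:
        return "browser"                           # handled internally
    return VIEWER_TABLE.get(mime_type, "ask-user") # fall back when unknown
```

With these assumed entries, choose_viewer("application/pdf") selects the PDF viewer, while an unregistered type falls through to asking the user.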

Browsers
• Two possibilities for viewers:
  • Plug-ins
  • Helper applications
• Plug-ins:
  • A plug-in is a code module that the browser fetches from a special directory on disk and installs as an extension to itself, as illustrated in Fig. 7-20(a).
  • Plug-ins run inside the browser.
  • Plug-ins are removed from the browser's memory when the job is done.
  • Each browser has a set of procedures that all plug-ins must implement so that the browser can call the plug-in. This set of procedures is the plug-in's interface and is browser specific.
• Installation:
  • Download and install the file from the plug-in's Web site.
  • Registry settings are updated so that the plug-in's MIME types are associated with the plug-in.

Helper Applications
• Helper applications are complete programs running as separate processes, as illustrated in Fig. 7-20(b).
• A helper does not use browser services, nor does it offer an interface to the browser.
• Helpers are typically large programs that exist independently of the browser, such as Adobe's Acrobat Reader (for displaying PDF files) or Microsoft Word.
• Some helper programs (e.g., Adobe's) come with a plug-in that invokes the helper.
• Many helper applications use the MIME type application.

Helper Applications
• Helpers are not restricted to the application MIME type:
  • Adobe Photoshop uses image/x-photoshop.
  • RealOne Player handles audio/mp3.
• On Windows, when a program is installed on the computer, it registers the MIME types it wants to handle.
• This mechanism leads to conflicts when multiple viewers are available for some subtype, e.g., video/mpg.
• The last program registered overwrites existing (MIME type, helper application) associations, capturing the type for itself.
• Consequently, installing a new program may change the way a browser handles existing types.
• On UNIX, this registration process is generally not automatic. The user must manually update certain configuration files (more work, fewer surprises).

Helper Applications
• The ability to extend the browser with a large number of new types is convenient but can also lead to trouble.
• When a browser fetches a file with the extension exe and finds that no helper is associated with it, the obvious action is to run it. This originally created an enormous security hole.
• To prevent problems such as unintentionally executing a virus (an exe file), browsers (e.g., Internet Explorer) can be configured to be selective about running unknown programs automatically.
• On UNIX an analogous problem can exist with shell scripts, but it requires the user to consciously install the shell as a helper. This installation process is sufficiently complicated that nobody could possibly do it by accident.

The Server Side
• The browser parses the URL and interprets the part between "http://" and the next "/" as a DNS name to look up.
• After obtaining the IP address of the server, the browser establishes a TCP connection to port 80 on that server and sends a command containing the rest of the URL (the name of the file on that server).
• The server returns the file for the browser to display.
• The server's main steps are thus:
  • Accept a TCP connection from a client (a browser).
  • Get the name of the file requested.
  • Get the file.
  • Send the file to the client.
  • Release the TCP connection.
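The five server steps above could be sketched in Python along these lines. This is a deliberately simplified illustration, not a production server: the names handle_connection and serve, the default port 8080, and the bare HTTP/1.0 status lines are choices made for the example, and the code assumes the whole request arrives in a single recv call.

```python
import socket
from pathlib import Path


def handle_connection(conn: socket.socket, doc_root: Path) -> None:
    """Serve one request on an already-accepted TCP connection."""
    with conn:                                              # release the connection on exit
        request_line = conn.recv(4096).split(b"\r\n", 1)[0].decode("ascii")
        _method, path, _version = request_line.split(" ")   # get the file name
        root = doc_root.resolve()
        target = (root / path.lstrip("/")).resolve()
        # Only serve files that really live under the document root.
        if root in target.parents and target.is_file():
            header = b"HTTP/1.0 200 OK\r\n\r\n"
            body = target.read_bytes()                      # get the file (disk access)
        else:
            header = b"HTTP/1.0 404 Not Found\r\n\r\n"
            body = b"not found"
        conn.sendall(header + body)                         # send the file to the client


def serve(doc_root: Path, port: int = 8080) -> None:
    """Accept TCP connections one at a time and handle each in turn."""
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _addr = listener.accept()                 # accept a TCP connection
            handle_connection(conn, doc_root)
```

Because serve handles one connection at a time and reads the file from disk on every request, it exhibits exactly the single-threaded, one-disk-access-per-request behavior that the following slides set out to improve.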

The Server Side
• The problem with this design is that every request requires a disk access to get the file.
• Consequently, the Web server cannot serve more requests per second than it can make disk accesses.
• A high-end SCSI disk has an average access time of around 5 msec, which limits the server to at most 200 requests/sec (1 sec / 5 msec), or fewer if the files are large. For a major Web site this figure is too low.

The Server Side
• Caching:
  • Frequently accessed files are cached in memory.
• Multithreaded server:
  • The server is designed with a front-end module that accepts all incoming requests and k processing modules, as shown in Fig. 7-21.
  • The k + 1 threads all belong to the same process, so the processing modules all have access to the cache within the process' address space.
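One way to picture the front end, the k processing modules, and the shared cache is the following Python sketch. It is an assumption-laden illustration: the function name serve_requests, the use of a queue between front end and workers, and the read_from_disk callback standing in for real disk access are all invented for the example.

```python
import queue
import threading


def serve_requests(requests, read_from_disk, k=4):
    """Front end plus k processing threads sharing one in-process cache."""
    cache = {}                       # shared by all threads in the process
    replies = {}
    lock = threading.Lock()          # protects cache and replies
    work = queue.Queue()

    def processing_module():
        while True:
            name = work.get()
            if name is None:         # sentinel: no more work
                break
            with lock:
                data = cache.get(name)
            if data is None:
                data = read_from_disk(name)   # slow path, done outside the lock
                with lock:
                    cache[name] = data
            with lock:
                replies[name] = data

    workers = [threading.Thread(target=processing_module) for _ in range(k)]
    for worker in workers:
        worker.start()
    for name in requests:            # the front end hands out incoming requests
        work.put(name)
    for _ in workers:                # one sentinel per worker
        work.put(None)
    for worker in workers:
        worker.join()
    return replies, cache
```

Two threads may occasionally miss the cache for the same file at the same moment and both read it from disk; the sketch accepts that duplication rather than holding the lock during the slow read.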

The Server Side
• To get a real improvement over the single-threaded model it is necessary to have multiple disks. With k processing modules and k disks, the throughput can be as much as k times higher than with a single-threaded server and one disk.
• Modern Web servers do more than just accept file names and return files. The actual processing of each request can get quite complicated. For this reason, in many servers each processing module performs a series of steps, depending on the case at hand, as outlined below.
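The throughput bounds quoted on these slides follow from simple arithmetic, sketched below. The function name is made up for the example, and the model assumes the best case in which every request costs exactly one disk access and the k disks are kept perfectly busy.

```python
def max_requests_per_second(access_time_ms: float, disks: int = 1) -> float:
    """Upper bound on throughput when each request needs one disk access."""
    return disks * 1000.0 / access_time_ms


single_disk = max_requests_per_second(5)          # 5 msec per access, one disk
four_disks = max_requests_per_second(5, disks=4)  # k = 4 modules and 4 disks
```

With the 5 msec access time mentioned earlier, the single-disk bound is 200 requests/sec, and k disks raise the bound by a factor of k.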

The Server Side
• Resolve the name of the Web page requested.
• Authenticate the client.
• Perform access control on the client.
• Perform access control on the Web page.
• Check the cache.
• Fetch the requested page from disk.
• Determine the MIME type to include in the response.
• Take care of miscellaneous odds and ends.
• Return the reply to the client.
• Make an entry in the server log.

The Server Side
• If too many requests come in, the CPU will not be able to handle the processing load, no matter how many disks are used in parallel.
• The solution is to add more nodes, possibly with replicated disks to avoid having the disk become the next bottleneck.
• Server replication leads to the creation of a server farm, as depicted in Fig. 7-22.

The Server Side
• Problems with server farms:
  • There is no shared cache, because each processing node has its own memory (unless an expensive shared-memory multiprocessor is used).
  • One way to reduce the impact of this problem is to have the front end keep track of where it sends each request and send subsequent requests for the same page to the same node. This makes each node a specialist in certain pages, so that cache space is not wasted by having every file in every cache.
  • The client's TCP connection terminates at the front end, so the reply must go through the front end, as depicted in Fig. 7-23(a).
  • A trick called TCP handoff is used to get around this problem, as depicted in Fig. 7-23(b).
Figure 7-23: (a) Normal request-reply message sequence. (b) Sequence when TCP handoff is used.
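The idea of a front end that remembers where it sent each page can be sketched as follows. This is only one possible realization: the class name FrontEnd, the round-robin choice for pages seen for the first time, and the node names in the usage note are assumptions for illustration.

```python
from itertools import cycle


class FrontEnd:
    """Sends repeat requests for a page to the node that served it before."""

    def __init__(self, nodes):
        self._next_node = cycle(nodes)   # round-robin over nodes for new pages
        self._page_to_node = {}          # remembers where each page was sent

    def route(self, page: str) -> str:
        if page not in self._page_to_node:
            self._page_to_node[page] = next(self._next_node)
        return self._page_to_node[page]
```

With two hypothetical nodes "n1" and "n2", the first request for /a goes to n1, /b goes to n2, and every later request for /a goes back to n1, so only n1 needs to keep /a in its cache.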

URLs – Uniform Resource Locators
• When the Web was created, it was immediately clear that it needed mechanisms for naming and locating pages.
• In particular, three questions had to be answered before a selected page could be displayed:
  • What is the page called?
  • Where is the page located?
  • How can the page be accessed?
• The URL answers all three questions at once.

URLs – Uniform Resource Locators
• URLs have three parts:
  • The protocol (also known as the scheme).
  • The DNS name of the machine on which the page is located.
  • A local name uniquely indicating the specific page (usually just the file name on the machine where it resides).
• Example: http://www.cs.vu.nl/video/index-en.html

URLs – Uniform Resource Locators
Protocols (schemes):
• http: HyperText Transfer Protocol, the Web's native language.
• ftp: File Transfer Protocol, used to download files from FTP servers.
• file: used to access local files (it does not use FTP, which would require an FTP server).
• news: the USENET news system, which originated long before the Web. The browser uses the news protocol to call up news articles as though they were Web pages. Many browsers have an interface that makes reading USENET news even easier than with standard news readers.
  • Newsgroup format: used to get a list of articles from a preconfigured news site.
  • Specific file format: used to access specific articles.
  • Articles are transferred using NNTP (Network News Transfer Protocol).
• gopher: an information retrieval scheme supporting only text and no images.
• mailto: used to send e-mail.
• telnet: used to establish an on-line connection to a remote machine. Most browsers just call the telnet program as a helper application.
Figure: Some common URLs.

URLs – Uniform Resource Locators
• URNs (Uniform Resource Names):
  • URLs do not provide any way to reference a page without simultaneously telling where that page is. That is, there is no way to request page xyz and indicate that it does not matter where it comes from.
  • URNs are a system that the IETF is working on to allow pages to be replicated. They form a generalized URL scheme.

Statelessness and Cookies
• The Web is stateless:
  • There is no concept of a login session.
• This causes problems when implementing various additional functions:
  • Web sites that require clients to register.
  • E-commerce applications (how does the server keep track of the contents of the cart?).
  • Customized Web portals (e.g., Yahoo, MSN).

Statelessness and Cookies
• Having a server keep track of users by observing their IP addresses is a bad idea:
  • Users may log in from various computers (work, home, a laptop at a remote location, etc.).
  • NAT makes all outgoing packets from all users of its network bear the same IP address.
• Netscape devised a much-criticized technique called cookies.
• When a client requests a Web page, the server can supply additional information along with the requested page. This information may include a cookie, which is a small (at most 4 KB) file (or string).
• Browsers store offered cookies in a cookie directory on the client's hard disk unless the user has disabled cookies.
• In principle a cookie could contain a virus, but since cookies are treated as data, there is no official way for the virus to actually run and do damage.
• A cookie may contain up to five fields, as shown in Fig. 7-25.

Statelessness and Cookies
• Domain: the originating server, i.e., where the cookie came from. Browsers are supposed to check that servers are not lying about their domain. Each domain may store up to 20 cookies per client.
• Path: a path in the server's directory structure that identifies which parts of the server's file tree may use the cookie. It is often "/", which means the whole tree.
• Content: the field where the cookie content is stored. It takes the form name = value. Both name and value can be set to anything by the server.
• Expires: specifies when the cookie expires. If the field is absent, the browser discards the cookie when it exits; such a cookie is called a nonpersistent cookie. If a time and date are supplied, the cookie is said to be persistent and is kept until it expires. Expiration times are given in Greenwich Mean Time.
• Secure: indicates that the browser may only return the cookie to a secure server. This feature is used for e-commerce, banking, and other secure applications.
Figure: Some examples of cookies.
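The five fields can be seen together in a short Python sketch that builds the header a server might send, using the standard library's http.cookies module. The cookie name cart, its value, the domain shop.example, and the expiry date are all made-up values for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

# Expiration times are given in GMT; this particular date is arbitrary.
expires = format_datetime(
    datetime(2031, 10, 15, 17, 0, tzinfo=timezone.utc), usegmt=True
)

cookie = SimpleCookie()
cookie["cart"] = "1-00501"                  # Content: name = value
cookie["cart"]["domain"] = "shop.example"   # Domain (hypothetical server)
cookie["cart"]["path"] = "/"                # Path: the whole tree
cookie["cart"]["expires"] = expires         # Expires: makes it persistent
cookie["cart"]["secure"] = True             # Secure: only return to a secure server
header = cookie.output()                    # the "Set-Cookie: ..." header line

# What the browser later sends back is just the name = value pair.
returned = SimpleCookie()
returned.load("cart=1-00501")
```

Leaving out the expires line would produce a nonpersistent cookie in the sense described above, since the browser would have no expiry time to honor.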

Statelessness and Cookies
• Cookies can also be used to secretly collect information about users' Web browsing habits.

Static Web Documents
• In the simplest form, Web pages are just files residing on some server. Those pages are static.
• Web pages are written in a language called HTML (HyperText Markup Language).

HTML
• HTML consists of a number of formatting directives, called tags, enclosed in < >.
• Some tags have parameters, called attributes.

HTML: Forms
• HTML 1.0 was designed mainly to provide one-way traffic from server to client. There was a need to be able to take orders for products via Web stores. In addition, Web stores wanted their customers to be able to search their product catalogs.
• Those demands led to HTML 2.0, which included support for forms.

HTML
Figure: HTML versions.

HTML
• The information supplied by the user (after a form is filled in) is sent to the server as one long string:
  • It is the responsibility of the server to make sense of this string.
  • On secure sites this information is encrypted to protect it against interception.
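What "one long string" looks like can be illustrated with Python's urllib.parse, which implements the usual URL-encoding of form fields. The field names and values here are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields as the user filled them in.
fields = {"customer": "John Doe", "product": "widget"}

# Browser side: the fields are packed into a single string.
query = urlencode(fields)

# Server side: the string has to be taken apart again.
parsed = parse_qs(query)
```

Here query is "customer=John+Doe&product=widget": fields are separated by &, names and values by =, and the space has been replaced by +. parse_qs recovers a dictionary that maps each field name to a list of values.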

XML and XSL
• More complex applications are putting increasing pressure on HTML to separate the content of a page from its formatting, which standard HTML does not support.
• W3C has developed enhancements to HTML to allow Web pages to be structured for automated processing:
  • XML (eXtensible Markup Language) describes Web content in a structured way.
  • XSL (eXtensible Style Language) describes the formatting independently of the content.

XML and XSL example
Figure: A simple Web page in XML.
Figure: A style sheet in XSL.

XML Extensions
• VXML (Voice eXtensible Markup Language).
• "What is VoiceXML? Well it's an XML language for writing Web pages you interact with by listening to spoken prompts and jingles, and control by means of spoken input. VoiceXML brings the Web to telephones. If you want to get a hands on feeling for what this is like, there are an increasing number of voice portals which you can phone into and try out for yourself. Several sites also offer free hosting for VoiceXML"
• For more information visit: http://www.w3.org/Voice/

XHTML – The eXtended HyperText Markup Language
• Many people in industry feel that in the future the majority of Web-enabled devices will not be PCs but wireless handheld PDA-type devices.
• Those devices have too little memory for large browsers full of heuristics that try to somehow deal with syntactically incorrect Web pages.
• The next step after HTML 4 is a language called XHTML rather than HTML 5, because it is essentially HTML 4 reformulated in XML:
  • Tags such as <h1> have no intrinsic meaning.
  • To get the HTML 4 effect, one needs a definition in the XSL file.
• XHTML has thus become a new Web standard and should be used for all new Web pages to achieve maximum portability across platforms and browsers.
• There are six major differences between XHTML and HTML 4:

XHTML – The eXtended HyperText Markup Language
• XHTML pages and browsers must strictly comply with the standard.
• All tags and attributes must be in lower case.
• Closing tags are required. For tags that have no natural closing tag, such as <br> (line break), <hr> (horizontal rule), and <img>, a slash must precede the closing ">": <img src="pic001.jpg" />
• Attribute values must be contained within quotation marks: <img src="pic001.jpg" height="500" />
• Tags must nest properly: <center><b>Vacation Pictures</center></b> is not legal in XHTML; the correct form is <center><b>Vacation Pictures</b></center>
• Every document must specify its document type.
For more details see www.w3.org
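Several of these rules (closing tags, quoted attribute values, proper nesting) are ordinary XML well-formedness rules, so a generic XML parser can check them. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed is made up, and the check does not cover the lower-case rule or the document type requirement.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


bad_nesting = "<center><b>Vacation Pictures</center></b>"
good_nesting = "<center><b>Vacation Pictures</b></center>"
```

The parser rejects bad_nesting and accepts good_nesting. It likewise rejects a bare <br> and an unquoted attribute value such as <img src=pic001.jpg />, but accepts <img src="pic001.jpg" />.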

Dynamic Web Documents
• In recent years more and more of the content of Web pages has become dynamic: it is generated on demand rather than stored on disk.
• Content generation can take place either on the server side or on the client side.

Server-Side Dynamic Web Generation
• To understand why server-side content generation is needed, consider the following example:
  • When a user fills in a form and clicks the submit button, a message is sent to the server indicating that it contains the contents of a form, along with the fields the user filled in.
  • Note that this message is not the name of a file to return.
  • What is needed is for the message to be handed to a program or script for processing.

Server-Side Dynamic Web Generation
• Processing involves:
  • Using the user-supplied information to look up a record in a database on the server's disk, and
  • Generating a custom HTML page to send back to the client.
• E-commerce application:
  • The browser returns the cookie containing the contents of the shopping cart after the user clicks on "Proceed to Checkout".
  • A process (a program or a script) on the server side has to be invoked to process the cookie and generate an HTML page in response.
• The steps required to process the information from an HTML form are illustrated in Fig. 7-33.

Server-Side Dynamic Web Generation
Figure: Steps in processing the information from an HTML form.
• CGI (Common Gateway Interface): a standardized interface that allows Web servers to talk to back-end programs and scripts that accept input (e.g., from forms) and generate HTML pages in response. Perl and Python are commonly used to write CGI scripts.
• PHP (PHP: Hypertext Preprocessor): a scripting language embedded within HTML pages. Servers must understand the PHP extensions to HTML. PHP was designed specifically to work well with Apache, the most widely used Web server, which is also open source.
• JSP (JavaServer Pages): similar to PHP, except that the dynamic part is written in the Java programming language instead of PHP.
• ASP (Active Server Pages): Microsoft's version of PHP and JavaServer Pages. It uses Microsoft's proprietary scripting language, Visual Basic Script, to generate the dynamic content.
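A CGI-style script in Python might look roughly like the sketch below. It assumes the form was submitted with the GET method, so that the server passes the form string in the QUERY_STRING environment variable as CGI specifies; the field name customer and the page wording are invented, and the database lookup mentioned earlier is left out.

```python
import html
import os
from urllib.parse import parse_qs


def render_reply(environ) -> str:
    """Build a CGI response (header, blank line, HTML) from the form data."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("customer", ["stranger"])[0]   # hypothetical form field
    # Escape user input before placing it in the generated page.
    body = f"<html><body><h1>Thank you, {html.escape(name)}</h1></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A Web server running this as a CGI script reads the reply from stdout.
    print(render_reply(os.environ), end="")
```

The function takes the environment as a parameter so that it can be exercised without a Web server, for example with render_reply({"QUERY_STRING": "customer=John+Doe"}).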

Client-Side Dynamic Web Generation
• CGI, PHP, JSP, and ASP cannot interact directly with the user (e.g., respond to mouse movements).
• For this purpose HTML 4.0 was extended to allow scripts to be embedded in HTML pages and executed on the client machine rather than on the server. The <script> tag is used for this purpose.
• The most popular client-side scripting language is JavaScript.

Client-Side Dynamic Web Generation
• JavaScript:
  • Inspired by the Java programming language, but
  • definitely not Java.
  • Due to its popularity and its widespread, speedy adoption, it mutated into numerous browser-specific versions, and the ECMAScript standard (ECMA-262) was not implemented uniformly.
  • This poses a portability problem across platforms.

Server & Client-Side Dynamic Web Generation
Figure: Server-side scripting with PHP.
Figure: Client-side scripting with JavaScript.

Client-Side Dynamic Web Generation
• Java applets:
  • Another way to make pages highly interactive.
  • Applets are small Java programs that are compiled into machine instructions for a virtual computer called the JVM (Java Virtual Machine).
  • Applets are embedded into HTML using <applet> </applet> tags.
• Microsoft's response to Sun's Java applets was to allow Web pages to hold ActiveX controls.
  • ActiveX controls are programs compiled to Pentium machine language and executed on the bare hardware. This makes them vastly faster and more flexible than interpreted Java applets.
  • However, downloading and running foreign programs (ActiveX) raises security issues.

Summary
• Complete pages can be generated on the fly (dynamically) by various scripts on the server machine.
• The scripts can be written in Perl, Python, PHP, JSP, or ASP, as shown in Fig. 7-40.
• Once those pages are received by the client, they are displayed as normal HTML pages.
• Content can also be generated dynamically on the client side.
• Web pages can be written in XML and then converted to HTML according to an XSL script (residing in a separate XSL file).
• JavaScript programs can perform arbitrary computations and are embedded in HTML pages.
• Various plug-ins and helper applications can be used to display content in a variety of formats.

Wireless Web
• There is considerable interest in small portable devices capable of accessing the Web via a wireless link.
• The first wide-area wireless Web systems:
  • WAP (Wireless Application Protocol)
  • i-mode

WAP – Wireless Application Protocol
• Developed and controlled by a consortium initially led by Nokia, Ericsson, Motorola, and Phone.com.
• A WAP device may be:
  • a mobile phone,
  • a PDA, or
  • a notebook computer.
• The device calls a WAP gateway over the wireless link and sends it requests for Web pages.

WAP – Wireless Application Protocol
• Designed to accommodate low-bandwidth connections for devices with:
  • slow CPUs,
  • little memory, and
  • small screens.
• WDP (Wireless Datagram Protocol) is essentially UDP.
• WML (Wireless Markup Language) is not HTML; it is an application of XML.
Figure: The WAP protocol stack.

WAP – Wireless Application Protocol
Figure: The WAP architecture.
