
Basic WWW Technologies



Presentation Transcript


  1. Basic WWW Technologies Thanks to P. Smyth, Hayes, Mark Sapossnekk, B. Arms.

  2. Web and Internet • Focus • Infrastructure • Standards • Languages • Structure (crawlers) • Access

  3. What Is the World Wide Web? • The world wide web (web) is a network of information resources. The web relies on three mechanisms to make these resources readily available to the widest possible audience: • 1. A uniform naming scheme for locating resources on the web (e.g., URIs). • 2. Protocols, for access to named resources over the web (e.g., HTTP). • 3. Hypertext, for easy navigation among resources (e.g., HTML).

  4. Internet vs. Web • Internet: • Internet is a more general term • Includes the physical aspect of the underlying networks and mechanisms such as email, FTP, HTTP… • Web: • Associated with information stored on the Internet • Can also refer to a broader class of networks, e.g. the Web of English Literature • Both Internet and web are networks

  5. Networks vs Graphs Examples? http://www.cybergeography.org/

  6. Essential Components of WWW • Resources: • Conceptual mappings to concrete or abstract entities, which do not change in the short term • ex: IST512 website (web pages and other kinds of files) • Resource identifiers (hyperlinks): • Strings of characters that represent generalized addresses and may contain instructions for accessing the identified resource • http://clgiles.ist.psu.edu/IST512 is used to identify our course homepage • Transfer protocols: • Conventions that regulate the communication between a browser (web user agent) and a server

  7. Internet Technologies: The World Wide Web • A way to access and share information • Technical papers, marketing materials, recipes, ... • A huge network of computers: the Internet • Graphical, not just textual • Information is linked to other information • Application development platform • Shop from home • Provide self-help applications for customers and partners • ...

  8. Internet Technologies: WWW Architecture • Client/Server, Request/Response architecture • You request a Web page • e.g. http://www.msn.com/default.asp • HTTP request • The Web server responds with data in the form of a Web page • HTTP response • Web page is expressed as HTML • Pages are identified by a Uniform Resource Locator (URL) • Protocol: http • Web server: www.msn.com • Web page: default.asp • Can also provide parameters: ?name=Leon
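
The request half of this exchange can be sketched in a few lines of Python. This is a minimal illustration rather than a full client: it only builds the text of an HTTP/1.1 GET request for the slide's example URL and does not open a network connection. The function name `build_get_request` and the `Connection: close` header are assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    # The request target is the path plus the optional query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # the Host header is required in HTTP/1.1
        "Connection: close",
        "",  # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


request_text = build_get_request("http://www.msn.com/default.asp?name=Leon")
print(request_text)
```

A server would answer such a request with a status line, response headers, and the HTML of the page.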

  9. Internet Technologies: Web Standards • Internet Engineering Task Force (IETF) • http://www.ietf.org/ • Founded 1986 • Request For Comments (RFC) at http://www.ietf.org/rfc.html • World Wide Web Consortium (W3C) • http://www.w3.org • Founded 1994 by Tim Berners-Lee • Publishes technical reports and recommendations

  10. Internet Technologies: Web Design Principles • Interoperability: Web languages and protocols must be compatible with one another independent of hardware and software. • Evolution: The Web must be able to accommodate future technologies. Encourages simplicity, modularity and extensibility. • Decentralization: Facilitates scalability and robustness.

  11. Languages of the WWW • Markup languages • A markup language combines text and extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. The best-known markup language in modern use is HTML (Hypertext Markup Language), one of the foundations of the World Wide Web. Historically, markup was (and is) used in the publishing industry in the communication of printed work between authors, editors, and printers.

  12. What is a markup language? • Textual (i.e. person readable) language where significant elements are indicated by markers • <TITLE>XML</TITLE> • Examples are RTF, HTML, XML, TEX etc. • Easy to process and can be manipulated by a variety of application programs

  13. Standard Generalized Markup Language (SGML) • Based on GML (generalized markup language), developed by IBM in the 1960s • An international standard (ISO 8879:1986) that defines how descriptive markup should be embedded in a document • Can define any document format of any complexity • Enables extensibility, structure and validation • Too many optional features for the Web • Gave birth to the extensible markup language (XML), W3C recommendation in 1998

  14. The Purpose of SGML • SGML is designed to make your information last longer than the systems that created it. Such longevity also implies immunity to short-term changes -- such as a change from one application program to another -- so SGML is also inherently designed for re-purposing and portability.

  15. What is SGML? • SGML (and its derivatives, HTML and XML) are ASCII character based representations of electronic data • Remember, it's all bits--meaning is derived from how they are organized… • Think of SGML docs as strings that must be parsed--a web browser parses an HTML doc and uses the markup codes to display the data contained • Since it's all ASCII, these docs can also be handled by non-parsing tools (such as vi, emacs, perl, etc.)

  16. What is SGML? • SGML: • is very large, powerful and complex • has been in heavy industrial and commercial use for two decades (ISO standard 1986) • XML is a lightweight, cut-down version of SGML

  17. SGMLXMLHTML • SGML is the “mother tongue” – but is overkill for most common desktop applications. • XML is an abbreviated version of SGML • easier to define own document types • easier for programmers to write programs to handle documents (and data) • omits all the options (and most of more complex and less-used parts) of SGML) • HTML is just one of many SGML or XML “applications” – most frequently used on the Web

  18. SGML Components • SGML documents have three parts: • Declaration: specifies which characters and delimiters may appear in the application • DTD (document type definition) / style sheet: defines the syntax of markup constructs • Document instance: actual text (with the tags) of the document • More information can be found at: http://www.W3.Org/markup/SGML

  19. Structure of SGML documents • Prolog • SGML Declaration--information about the dialect of SGML used, codes used, delimiters. • Document Type Definition (DTD)--external description of the relationship of data elements • Instance • Content • Descriptive Markup • Output Specifications (e.g. a style sheet) • DSSSL (Document Style Semantics and Specification Language) • FOSI (Formatting Output Specification Instance)

  20. SGML Markup • Looks like HTML (really, HTML looks like SGML, because it is SGML!) • What is done with tagged text is determined by applications <anthology> <poem><title>The SICK ROSE <stanza> <line>O Rose thou art sick. <line>The invisible worm, <line>That flies in the night <line>In the howling storm: <stanza> <line>Has found out thy bed <line>Of crimson joy: <line>And his dark secret love <line>Does thy life destroy. </poem> <!-- more poems go here --> </anthology>

  21. The DTD • In SGML, documents are given a type, defined in the Document Type Definition • The DTD is just another text file that: • Lists Constituent Parts in a series of Declaration Statements • Ensures Consistent Structure • Think of this as being analogous to objects, which have specific properties and values • SGML Documents can be checked against the DTD by a parser

  22. Simple Example of a DTD • "!" marks a Declaration Statement • Elements are named and have Start Tags and End Tags • Declarations consist of three parts: • Type and Name, consisting of other Elements or Reserved keywords (eg. #PCDATA or Element) • Minimization Rules governing tags • Content Model, with Occurrence Indicators (+ ? *) or Group Connectors (, & |) <!ELEMENT anthology - - (poem+)> <!ELEMENT poem - - (title?, stanza+)> <!ELEMENT title - O (#PCDATA) > <!ELEMENT stanza - O (line+) > <!ELEMENT line O O (#PCDATA) >
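
To make the content models concrete, the sketch below checks a document against the same structural rules. It is only an approximation under stated assumptions: Python's standard library has no SGML or DTD validator, so the three content models are rewritten by hand as regular expressions (`CONTENT_MODELS` and `validate` are names invented here), and the input is well-formed XML with every end tag written out, since tag minimization is not available in XML.

```python
import re
import xml.etree.ElementTree as ET

# Each content model from the DTD, rewritten as a regular expression over
# a space-terminated list of child element names.
CONTENT_MODELS = {
    "anthology": r"(poem )+",          # (poem+)
    "poem": r"(title )?(stanza )+",    # (title?, stanza+)
    "stanza": r"(line )+",             # (line+)
}


def validate(element: ET.Element) -> list[str]:
    """Return a list of content-model violations found at or below `element`."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(model, children):
            errors.append(
                f"<{element.tag}> has invalid children: {children.strip()!r}"
            )
    # Elements declared as #PCDATA (title, line) have no model to check.
    for child in element:
        errors.extend(validate(child))
    return errors


good = ET.fromstring(
    "<anthology><poem><title>The SICK ROSE</title>"
    "<stanza><line>O Rose thou art sick.</line></stanza></poem></anthology>"
)
bad = ET.fromstring("<anthology><poem><title>No stanzas</title></poem></anthology>")

good_errors = validate(good)
bad_errors = validate(bad)
print(good_errors)
print(bad_errors)
```

The second document fails because `poem` requires at least one `stanza` (the `+` occurrence indicator), while the missing or present `title` is accepted either way (the `?` indicator).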

  23. Using Data: The "Traditional" Model CGI stands for Common Gateway Interface. CGI allows HTML pages to interact with programming applications. Open Database Connectivity (ODBC) is a standard software API for connecting to database management systems (DBMS).

  24. Using Data: The SGML Approach

  25. The Up Side • Data Independence--data structure controlled by use of an open standard • Longevity--structure is determined by DTD, not a monolithic and possibly proprietary application • Flexibility--separation of formatting and content description yields multiple uses by different parsing systems

  26. The Down Side • Strict encoding • DTDs • Lack of SGML Applications

  27. HTML Background • HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. • The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML. • HTML standards are organized by W3C : http://www.w3.org/MarkUp/

  28. HTML Functionalities • HTML gives authors the means to: • Publish online documents with headings, text, tables, lists, photos, etc • Include spread-sheets, video clips, sound clips, and other applications directly in their documents • Link information via hypertext links, at the click of a button • Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc

  29. HTML Versions • HTML 4.01 is a revision of the HTML 4.0 Recommendation first released on 18th December 1997. • HTML 4.01 Specification: • http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt • HTML 4.0 was first released as a W3C Recommendation on 18 December 1997 • HTML 3.2 was W3C's first Recommendation for HTML which represented the consensus on HTML features for 1996 • HTML 2.0 (RFC 1866) was developed by the IETF's HTML Working Group, which set the standard for core HTML features based upon current practice in 1994.

  30. Sample Webpage HTML Structure • <HTML> • <HEAD> • <TITLE>The title of the webpage</TITLE> </HEAD> • <BODY> <P>Body of the webpage • </BODY> • </HTML>

  31. HTML Structure • An HTML document is divided into a head section (here, between <HEAD> and </HEAD>) and a body (here, between <BODY> and </BODY>) • The title of the document appears in the head (along with other information about the document) • The content of the document appears in the body. The body in this example contains just one paragraph, marked up with <P>
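
The head/body split can be observed with Python's standard `html.parser` module. The following is a rough sketch, not a browser-grade parser: the class name `HeadBodySplitter` is made up for this example, and it simply records which text falls inside `<TITLE>` and which inside `<BODY>` for the sample page from the previous slide.

```python
from html.parser import HTMLParser


class HeadBodySplitter(HTMLParser):
    """Collect the text found in <title> and the text found in <body>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements (lowercased)
        self.title = ""
        self.body_text = ""

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; this tolerates omitted
        # end tags such as the missing </P> in the sample page.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title += data
        elif "body" in self.open_tags:
            self.body_text += data


PAGE = (
    "<HTML><HEAD><TITLE>The title of the webpage</TITLE></HEAD>"
    "<BODY><P>Body of the webpage</BODY></HTML>"
)

parser = HeadBodySplitter()
parser.feed(PAGE)
parser.close()
print(parser.title)
print(parser.body_text)
```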

  32. HTML Hyperlink • <a href="relations/alumni">alumni</a> • A link is a connection from one Web resource to another • It has two ends, called anchors, and a direction • Starts at the "source" anchor and points to the "destination" anchor, which may be any Web resource (e.g., an image, a video clip, a sound bite, a program, an HTML document)
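
A program that follows links (a crawler, for instance) has to turn the source anchor's `href` into the address of the destination. The sketch below shows one way to do that with the standard library; the base address `http://www.example.edu/` is a placeholder chosen for the example, not the page the slide's link actually came from, and `LinkExtractor` is a hypothetical class name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the destination of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # A relative href is resolved against the page's own address.
                self.links.append(urljoin(self.base_url, href))


BASE_URL = "http://www.example.edu/"  # assumed address of the page holding the link
extractor = LinkExtractor(BASE_URL)
extractor.feed('<a href="relations/alumni">alumni</a>')
extractor.close()
print(extractor.links)
```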

  33. What is XML? • XML – eXtensible Markup Language • designed to improve the functionality of the Web by providing more flexible and adaptable information and identification • “extensible” because not a fixed format like HTML • a language for describing other languages (a meta-language) • design your own customised markup language

  34. Why use XML? • XML is written in SGML – the Standard Generalized Markup Language, an international standard (ISO 8879) • XML = very simple dialect of SGML • goal = enable generic SGML to be served, received and processed on the Web in ways not possible with HTML

  35. Why use XML? • XML is not just for Web pages • use to store any kind of structured document • to enclose/encapsulate information in order to pass it between different computing systems that are otherwise unable to communicate

  36. Key feature of XML • An application is free to use XML tagged data in many different ways, e.g. • produce an image • generate a formatted text listing • display the XML document’s markup in pretty colors • restructure the data into a format for storing in a database, transmission over a network, input to another program.

  37. XML is important because... • Removes 2 constraints that held back Web development: • dependence on a single, inflexible document type (HTML) [much abused] • the complexity of full SGML [many options but hard to program]

  38. XML… allows the flexible development of user-defined document types. • provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web

  39. XML Software? • hundreds (probably thousands) of programs are “XML ready” already today. • xml.coverpages.org covers news of new additions to XML

  40. Is XML a Computer Language? • XML is not C or C++ or like any other programming language • By itself, it cannot specify calculations, actions, decisions to be carried out in any order • XML is a markup specification language

  41. XML - a Markup Language • with XML, you can design ways of describing information (text or data), usually for storage, transmission or processing by a program • XML conveys no information about what should be done with the data or text – it merely describes it. • By itself, XML does not do anything – it is a data description format

  42. How do I run or execute an XML file? • You can’t and you don’t ! • XML is not a programming language • XML is a markup specification language • XML files are just data (waiting for a program to do something with them) • XML files can be viewed with an XML editor or XML-compatible browser

  43. Things to Remember • XML does not replace HTML – it provides an alternative which allows you to define your own set of markup elements to a published standard: • <?xml version="1.0" standalone="yes"?> • <conversation> • <greeting>Hello, world!</greeting> • <response>Stop the planet, I want to get off!</response> • </conversation>
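
Because an XML file is only data, something has to read it before anything happens. As a small illustration, the sketch below loads the conversation document with Python's standard `xml.etree.ElementTree` module and lists each element with its text; what to do with that list is entirely up to the application.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>"""

root = ET.fromstring(DOCUMENT)

# Each child of <conversation> becomes a (tag, text) pair.
turns = [(child.tag, child.text) for child in root]
for tag, text in turns:
    print(f"{tag}: {text}")
```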

  44. Things to Remember • All parts of an XML document are case sEnSiTiVe • Element type names are case sensitive, so <BODY> … </body> is out. • Attribute names are case sensitive … • <PIC width="7cm"/> and • <PIC WIDTH="6cm"/> • describe different attributes, not just different values for the "width" attribute of PIC.
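
Both rules are easy to confirm with any XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is an invention for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start and end tags must match exactly, including case.
mismatched = is_well_formed("<BODY>text</body>")
matched = is_well_formed("<BODY>text</BODY>")

# "width" and "WIDTH" are two unrelated attribute names.
lower = ET.fromstring('<PIC width="7cm"/>')
upper = ET.fromstring('<PIC WIDTH="6cm"/>')

print(mismatched, matched)
print(lower.attrib, upper.attrib)
```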

  45. What is XQuery? • XQuery is the language for querying XML data • The best way to explain XQuery is to say that XQuery is to XML what SQL is to database tables. • XQuery uses XPath expressions to extract XML data. • XPath is a language for finding information in an XML document. • XPath is used to navigate through elements and attributes in an XML document. • XQuery is defined by the W3C. • XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.) • XQuery 1.0 became a W3C Recommendation in January 2007.
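
Python's standard library does not include an XQuery engine, so the sketch below only illustrates the XPath side of the story, using the limited XPath subset that `xml.etree.ElementTree` supports. The bookstore document is invented sample data. The last query shows a numeric filter done in plain Python, which is the kind of condition a full XQuery expression could state directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this illustration.
CATALOG = """<bookstore>
  <book category="web"><title>Learning XML</title><price>39.95</price></book>
  <book category="cooking"><title>Everyday Italian</title><price>30.00</price></book>
  <book category="web"><title>XQuery Kick Start</title><price>49.99</price></book>
</bookstore>"""

root = ET.fromstring(CATALOG)

# Navigate through elements: every <title> under every <book>.
all_titles = [t.text for t in root.findall("./book/title")]

# Navigate using an attribute predicate: only books in the "web" category.
web_titles = [t.text for t in root.findall("./book[@category='web']/title")]

# A numeric comparison is outside ElementTree's XPath subset,
# so it is applied in Python after selecting the <book> elements.
cheap_titles = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) < 40
]

print(all_titles)
print(web_titles)
print(cheap_titles)
```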

  46. Resource Identifiers • URI: Uniform Resource Identifiers • URL: Uniform Resource Locators • URN: Uniform Resource Names • Legacy, not used • Ex.: urn:isbn:4322347

  47. Ping – TCP/IP, IP discovery • PING - Packet Internet Groper; a utility used to determine whether a particular computer is currently connected to the Internet. It works by sending a packet to the specified IP address and waiting for a reply.
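
The packet that ping sends is an ICMP echo request. As a hedged sketch of what that packet looks like, the code below builds one in memory and computes its checksum; it does not send anything, because doing so normally needs a raw socket and elevated privileges. The identifier, sequence number, and payload are arbitrary values chosen for the example, and the function names are illustrative.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request; the echo reply uses type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (see RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: 8-byte header followed by the payload."""
    # Header fields: type, code, checksum, identifier, sequence number.
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())
```

A receiver can verify the packet by running the same checksum over all of its bytes; a correct packet yields zero.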

  48. Ping (Packet Internet Groper) ping command

  49. Introduction to URIs • Every resource available on the Web has an address that may be encoded by a URI • URIs typically consist of three pieces: • The naming scheme of the mechanism used to access the resource. (HTTP, FTP) • The name of the machine hosting the resource • The name of the resource itself, given as a path
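
The three pieces can be pulled apart with `urllib.parse.urlsplit` from Python's standard library. The sketch below applies it to the course homepage address from slide 6 and to a made-up FTP address (`ftp.example.org` is a placeholder); `uri_pieces` is a helper name invented for this example.

```python
from urllib.parse import urlsplit


def uri_pieces(uri: str) -> tuple:
    """Split a URI into (naming scheme, host machine, resource path)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.path


http_pieces = uri_pieces("http://clgiles.ist.psu.edu/IST512")
ftp_pieces = uri_pieces("ftp://ftp.example.org/pub/readme.txt")  # hypothetical address

print(http_pieces)
print(ftp_pieces)
```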
