Introduction to Web Science - PowerPoint PPT Presentation

PowerPoint Slideshow about ' Introduction to Web Science' - kevork

Presentation Transcript

Introducing Web 1.0

  • Packet switching network

  • IP Addressing

  • Internet Applications

  • The WWW and markup

  • Searching the WWW

  • Intelligent Agents

  • Internet Governance

Packet-Switched Networks (1)

  • Local area network (LAN)

    • Network of computers located close together

  • Wide area networks (WANs)

    • Networks of computers connected over greater distances

  • Circuit

    • Combination of telephone lines and closed switches that connect them to each other

Packet-Switched Networks (2)

  • Circuit switching is used in telephone communication

  • The Internet uses packet switching

  • Packet switching needs computers called ‘routers’ and the programs called ‘routing algorithms’

Packet-Switched Networks (3)

  • Information is divided into packets

  • It is passed from node to node

  • It is recomposed as one chunk on the destination server

Routing Packets

  • Routing computers

    • Computers that decide how best to forward packets

  • Routing algorithms

    • Rules contained in programs on router computers that determine the best path on which to send packets

    • Programs apply their routing algorithms to information they have stored in routing tables
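
The routing-table lookup described above can be sketched as follows. This is a minimal illustration, not how any particular router is implemented: the table entries and next-hop names are hypothetical, and the "most specific matching network wins" rule (longest-prefix match) is assumed as the routing algorithm.

```python
import ipaddress

# Hypothetical routing table: destination network -> name of the next hop.
ROUTING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-gateway",   # default route
    ipaddress.ip_network("10.0.0.0/8"): "core-router",
    ipaddress.ip_network("10.1.2.0/24"): "branch-router",
}


def next_hop(destination: str) -> str:
    """Return the next hop for the most specific network containing the address."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in ROUTING_TABLE if address in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTING_TABLE[best]


print(next_hop("10.1.2.7"))    # branch-router
print(next_hop("10.9.9.9"))    # core-router
print(next_hop("192.0.2.1"))   # upstream-gateway
```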

TCP/IP

  • Communications protocol suite

    • Packet switched protocol

      • No end-to-end connection is required

      • Each message broken down into small pieces called packets

      • Packets possibly routed to destination over different paths

    • Transmission Control Protocol (TCP)

      • Breaks messages into packets

      • Numbers packets in order

      • Reorders packets at the destination

    • Internet Protocol (IP)

      • Routes packets to the proper destination
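
The division of labour listed above (break a message into packets, number them, reorder them at the destination) can be sketched in a few lines. This is a toy model only: the function names and the 8-character packet size are assumptions for illustration, and real TCP also handles loss, retransmission and flow control.

```python
import random


def to_packets(message: str, size: int = 8) -> list[tuple[int, str]]:
    """Break a message into numbered packets: (sequence number, chunk)."""
    return [
        (sequence, message[start:start + size])
        for sequence, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join the chunks back together."""
    return "".join(chunk for _, chunk in sorted(packets))


message = "Packets may arrive out of order."
packets = to_packets(message)
random.shuffle(packets)              # simulate packets taking different paths
print(reassemble(packets) == message)  # True
```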

Open Systems Interconnections Model

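OSI model layers (from the highest to the lowest). The TCP/IP protocol suite is a separate, simpler layered model that is often mapped onto the OSI layers:

  • Application

  • Presentation

  • Session

  • Transport

  • Network

  • Data link

  • Physical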

IP Address

  • Internet addresses are based on a 32-bit number called an IP address

  • IP addresses appear as a series of four separate numbers (each from 0 to 255) delineated by periods

  • Such an address uniquely identifies a computer connected to the Internet

  • IP subnetting conceptually divides a large network into smaller sub-networks
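
To make the 32-bit number and the idea of subnetting concrete, here is a short sketch using Python's standard `ipaddress` module. The address and network are assumptions chosen from a range reserved for documentation; they do not come from the slides.

```python
import ipaddress

# Example address (assumed for illustration) from a documentation-only range.
address = ipaddress.IPv4Address("192.0.2.10")
as_int = int(address)          # the same address as a single 32-bit number
print(as_int)                  # 3221225994

# Subnetting: divide one /24 network into four smaller /26 sub-networks.
network = ipaddress.ip_network("192.0.2.0/24")
subnets = list(network.subnets(new_prefix=26))
for subnet in subnets:
    print(subnet, "contains the address:", address in subnet)
```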

Without subnetting …

  • Explosion in size of IP routing tables.

  • Every time more address space was needed, the administrator would have to apply for a new block of addresses.

  • Any changes to the internal structure of a company's network would potentially affect devices and sites outside the organization.

  • Keeping track of all those different Class C networks would be a bit of a headache in its own right.

Benefits of Subnetting

  • Better Match to Physical Network Structure

  • Flexibility

  • Invisibility To Public Internet

  • No Need To Request New IP Addresses

  • No Routing Table Entry Proliferation

IPv6 (or IP Next Generation)

  • Network Layer

  • Developed in 1994

  • Will replace the IPv4 standard

    • limits on network addresses will eventually lead to exhaustion of available addresses (by 2023)

    • IPv4 supports only 4,294,967,296 addresses (32 bits)

  • Improvements include

    • providing future cell phones and mobile devices their own unique & permanent addresses

    • IPv6 supports about 3.4 × 10^38 addresses (128 bits)
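
The two address counts follow directly from the address widths: an n-bit address allows 2^n distinct values. A quick check (the variable names are just for illustration):

```python
ipv4_addresses = 2 ** 32    # 32-bit addresses
ipv6_addresses = 2 ** 128   # 128-bit addresses

print(ipv4_addresses)             # 4294967296
print(f"{ipv6_addresses:.1e}")    # 3.4e+38
```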

Domain Names

  • A Uniform Resource Locator (URL) consists of names and abbreviations that are much easier to remember than IP addresses

  • The HTTP protocol defines how an Internet resource is accessed

  • The name that identifies the host in such an address is called a domain name

  • Domain Name System (DNS)

    • A database of Internet names

    • DNS Servers convert Internet names to IP addresses

    • Top level domains
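
As a rough sketch of how the pieces above fit together, the snippet below splits a URL into its protocol and domain name, then looks the name up in a toy table. The URL, the table and the `resolve` helper are hypothetical; a real lookup would query DNS servers rather than a local dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical name-to-address table standing in for a DNS server's database.
# 192.0.2.10 is taken from a range reserved for documentation.
DNS_TABLE = {"www.example.com": "192.0.2.10"}


def resolve(url: str) -> str:
    """Extract the domain name from a URL and look up its IP address."""
    host = urlsplit(url).hostname
    return DNS_TABLE[host]


url = "http://www.example.com/index.html"
parts = urlsplit(url)
print(parts.scheme)                         # http
print(parts.hostname)                       # www.example.com
print(parts.hostname.rsplit(".", 1)[-1])    # com (the top-level domain)
print(resolve(url))                         # 192.0.2.10
```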

Top-Level Domain Names

  • Internet Corporation for Assigned Names and Numbers (ICANN)

    • Responsible for managing domain names and coordinating them with IP address registrars

Domain Name case study

  • The web was not an ‘open’ place

  • Only one company, Network Solutions, sold .com, .net and .org domains

  • Price of 100 dollars and a two year minimum

  • Back then, there was a good chance you could still register a dictionary word as a .com domain

  • In 2000, Network Solutions lost its monopoly position and domain prices dropped by over 95%

  • Since then, innovation halted and Network Solutions became one of thousands of anonymous domain registrars

Internet Applications

  • E-Mail

  • File transfers

  • Instant messaging (IM)

  • Newsgroups

  • Streaming audio and video

  • Internet telephony

  • World Wide Web (WWW)

E-Mail

  • Most popular and widely used Internet application

  • 30 billion e-mails sent every day

    • Spam – junk e-mail messages

    • Spam costs corporate America $9 billion per year

  • Every e-mail message contains a header that describes the source and destination of the message

  • E-mail messages are text, but may have attachments of many types of digital data

    • Viruses often transmitted via e-mail
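
The header-plus-text-plus-attachment structure can be seen by building a message with Python's standard `email` package. The addresses, subject and attachment bytes below are placeholders invented for this sketch.

```python
from email.message import EmailMessage

msg = EmailMessage()
# Header fields: describe the source and destination of the message.
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "Lecture notes"

# The body is plain text.
msg.set_content("The slides are attached.")

# Other digital data travels as an attachment (placeholder bytes here).
msg.add_attachment(
    b"placeholder file contents",
    maintype="application",
    subtype="octet-stream",
    filename="slides.bin",
)

print(msg["From"], "->", msg["To"])
print([part.get_filename() for part in msg.iter_attachments()])
```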

SMTP, POP, and IMAP (1)

  • E-mail sent across the Internet is managed and stored by mail servers

  • Simple Mail Transfer Protocol (SMTP) is the standard for sending mail to the server

  • Post Office Protocol (POP) is the standard for retrieving mail from the server

  • The Internet Message Access Protocol (IMAP) is a newer e-mail protocol

Controlling Spam

  • Use complex email addresses rather than name and surname combination

    • Why? Bots? Name Directories?

  • Control exposure of email address

    • How? JavaScript? JPEG?

  • Use multiple email addresses for different purposes

    • In what occasions?

  • Use content-filtering software

    • Blacklist spam filter

    • Whitelist spam filter

    • Challenge-response using graphical challenges?
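
One possible way to combine the three filtering ideas is sketched below: whitelisted senders are accepted, blacklisted domains are rejected, and everyone else is sent a challenge. The lists, the `classify` function and the ordering of the checks are all assumptions made for illustration.

```python
# Hypothetical lists; a real filter would load these from user settings.
WHITELIST = {"colleague@example.com"}
BLACKLISTED_DOMAINS = {"spam.example"}


def classify(sender: str) -> str:
    """Decide what to do with a message based only on its sender address."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in WHITELIST:
        return "accept"
    if domain in BLACKLISTED_DOMAINS:
        return "reject"
    # Unknown senders must answer a challenge before delivery.
    return "challenge"


print(classify("colleague@example.com"))   # accept
print(classify("offers@spam.example"))     # reject
print(classify("stranger@example.net"))    # challenge
```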

E-Mail Case Study

  • Hotmail (1995)

  • First place to get a free email address, disconnected from an ISP

  • 4 years later, 30 million people worldwide were exchanging @hotmail email addresses

  • Bought by Microsoft at the end of 1997 for just 400 million dollars

  • 2007: the end of Hotmail

    • transformation to “Live” mail to become an integrated part of Microsoft’s “Live” family

File Transfers

  • File transfer protocol (FTP)

    • Protocol providing for transmission of a file between an Internet server and a user’s computer

  • Peer-to-peer (P2P) file sharing

    • Share data from one computer to another

    • Every user can be a server

    • Examples

      • Napster

      • Kazaa

      • Gnutella

      • BitTorrent

    • With P2P, every user on the network can make data available to every other user on the network

Instant Messaging

  • Allows a user to create a private chat session with another user

  • IM started with AOL

  • IM is sneaking into corporate networks

  • Many Web-based companies use IM technology for customer service

    • eBay

ICQ case study

  • ICQ is a play on the phrase “I seek you”

  • 1996: the first easy-to-use instant messenger program, in which you could add friends to your list and see whether they were online

  • Back then it was revolutionary for the masses and it became the ‘application’ everybody had installed

  • Acquired by AOL in June 1998 for a whopping $287 million  

  • Eventually the program acquired too many additional features, which made the application heavy and disorganized

  • Competition from AOL IM, Yahoo IM, and MSN Messenger increased, and friends on your ICQ list left the application, eventually resulting in a mass abandonment of the network

Usenet Newsgroups

  • Online, bulletin board discussion forums

  • Users post and read messages

  • More than 100,000 newsgroups

  • Millions of newsgroup readers

  • Important information resource, especially for technical issues and products

  • Newsgroup messages are distributed using an open standard

    • Many are uncensored

Streaming Audio and Video

  • Creating and sending audio and video files

    • Sports

      • Basketball at

      • Major league baseball

    • News

      • Fox News

      • CNN radio

    • Business

      • ZDNet

    • Education

      • Warriors of the Net

Internet Telephony

  • Voice-over Internet Protocol (VoIP)

  • Use your computer like a telephone

  • Software connects computers via the Internet and transmits voice data

  • Savings come from eliminating toll charges between locations

The World Wide Web

  • Collection of hyperlinked computer files on the Internet

  • Client-server application

    • Web servers

    • Web browsers as clients

  • WWW standards

    • Hypertext markup language (HTML)

      • Current standard for writing Web pages

      • Tags in HTML instruct the client browser how to format and display the Web page content

    • Hypertext transfer protocol (HTTP)

      • Defines how a Web client requests a resource and how the Web server responds

    • Extensible markup language (XML)

      • A meta-markup language

      • Gives meaning to the data enclosed within XML tags

Website case study

  • GeoCities: create your own free homepage on the web

  • 1997: fifth most popular website, with over 500,000 homepages created

  • Yahoo bought GeoCities two years later for $3.57 billion and started to actively commercialize the homepages with various types of advertising, which proved to be their death sentence

  • With ‘real’ web hosting becoming affordable for anybody, the need for free homepages in this form vanished

Overview of Markup Languages

  • SGML is a rich meta language that is useful for defining markup languages

  • HTML is particularly useful for displaying Web pages

  • XML defines data structures for electronic commerce (and much more …)

Development of Markup Languages

Standard Generalized Markup Language

  • ISO adopted SGML as a standard in 1986

  • SGML is nonproprietary and platform-independent

  • SGML supports user-defined tags and architecture to complement the required richness of documents

Extensible Markup Language

  • XML is a descendant of SGML

  • XML allows designers to easily describe and deliver structured data from any application in a standard, consistent way

  • XML can be embedded within an HTML document

  • XML allows you to create your own customized markup language.

Learn XML in a slide

  • Tag – a piece of Markup

    • An opening tag <name>

    • A closing tag </name>

  • Element – well formed usage of tags

    • <name>Alexiei</name>

  • Attribute – properties

    • <name length="7">Alexiei</name>

  • Rules to keep XML well formed

    • Can be nested but not overlapping

    • Case sensitivity

    • Quoted attributes

    • Required end tag

  • Short hand

    • <abc></abc> is equivalent to <abc/>
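
The terms on this slide map directly onto what an XML parser reports. The sketch below uses Python's standard `xml.etree.ElementTree` module on the slide's own element, and also checks that the shorthand form parses to the same empty element.

```python
import xml.etree.ElementTree as ET

# Parse the element from the slide (attribute value in straight quotes).
element = ET.fromstring('<name length="7">Alexiei</name>')
print(element.tag)      # name
print(element.attrib)   # {'length': '7'}
print(element.text)     # Alexiei

# The long and short forms of an empty element are equivalent.
empty_long = ET.fromstring("<abc></abc>")
empty_short = ET.fromstring("<abc/>")
print(ET.tostring(empty_long) == ET.tostring(empty_short))   # True
```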

Some XML examples


<book pages=100>E-Commerce</book>

<book pages="100"><title>E-Commerce</book></title>

<book pages="100"><title>E-Commerce</title></book>

<book pages="100">
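
These four snippets can be tested against the well-formedness rules by handing them to a parser. The sketch below is one way to do that with Python's standard library, assuming the attribute values are written with straight quotes; the `is_well_formed` helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET

EXAMPLES = [
    '<book pages=100>E-Commerce</book>',                    # unquoted attribute
    '<book pages="100"><title>E-Commerce</book></title>',   # overlapping tags
    '<book pages="100"><title>E-Commerce</title></book>',   # properly nested
    '<book pages="100">',                                   # missing end tag
]


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for example in EXAMPLES:
    print(is_well_formed(example), example)
```

Only the third snippet parses; each of the others breaks one of the rules listed on the previous slide.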







Processing a Request for an XML Page

  • Why go through all this hassle?

  • How would you go about displaying HTML on a

    • PC

    • Handheld

    • Mobile

Hypertext Markup Language

  • Tim Berners-Lee invented HTML

  • HTML is a document production language that includes a set of tags that define the format and style of a document

  • HTML is based on SGML

  • HTML is an application of SGML, defined by one particular SGML Document Type Definition (DTD)

HTML Tags

  • An HTML document contains both document content and tags

  • The tags are the HTML codes inserted in a document to specify the format on screen

  • Each tag is enclosed in angle brackets (< >)

  • Most tags are two-sided – opening and closing tags

  • Well formed tags, bots, meta tags?? Why are they important?

HTML Links

  • Hyperlinks are bits of text that connect the current document to:

    • Another location in the same document

    • Another document on the same host machine

    • Another document on the Internet

    • Can they link to a toaster at home?

  • Hyperlinks are created using the HTML anchor tag

  • Two popular link structures:

    • Linear hyperlink structure

    • Hierarchical hyperlink structure
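
The three kinds of link target can be seen by pulling the `href` values out of anchor tags. This is a small sketch using Python's standard `html.parser`; the sample page and the `LinkCollector` class are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor (<a>) tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page with one link of each kind from the list above.
PAGE = """
<p>
  <a href="#top">Same document</a>
  <a href="notes.html">Same host</a>
  <a href="http://www.example.com/">Elsewhere on the Internet</a>
</p>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```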

HTML Version History

  • HTML version 1.0 was introduced in 1991

  • HTML 2.0 was released in Sept. 1995

  • HTML 3.2 was introduced in 1997

  • HTML 4.0 was released by W3C in Dec 1997

  • HTML 4.01 was released in Dec 1999

  • XHTML 1.0 became a W3C recommendation in Jan 2000

HTML Editors (1)

  • Low-end editors display HTML code on the screen and allow you to insert HTML tag pairs by clicking selected buttons

  • High-end editors are Web site builder programs that provide a rich environment displaying the Web page rather than the HTML code

  • Microsoft FrontPage and Macromedia Dreamweaver are examples of Web site builders

Static versus Dynamic Pages

  • HTML and XML only display and exchange data

  • No interactivity; no processing of data

  • Scripting languages

    • Provide basic interactivity

      • Rollovers

      • Crawling text

    • JavaScript

    • VBScript

  • Full-featured Web programming

    • Java

    • Client side scripting or browser side scripting

    • Applets

    • J2EE

  • Common Gateway Interface (CGI)

    • Allows passing of data between a static HTML page and a computer program
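
As a rough illustration of the CGI idea: when an HTML form is submitted with GET, the web server hands the form data to the program in the `QUERY_STRING` environment variable, and the program writes headers plus a body as its response. The sketch below takes the environment as a plain dictionary so it can be run without a web server; the function name and the `name` form field are assumptions.

```python
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build a plain-text response from the form data in QUERY_STRING."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["stranger"])[0]
    # A CGI program emits headers, a blank line, then the body.
    return f"Content-Type: text/plain\r\n\r\nHello, {name}!"


# Under a real web server the program would pass os.environ instead.
print(cgi_response({"QUERY_STRING": "name=Ada&course=web"}))
```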

Searching the WWW

  • Most data on the Internet is part of the WWW

  • Search engines – large databases that index WWW content

  • Building the search engine database

    • Submit a site to the search engine administrator for listing

    • Spiders

      • Metatags

    • Google

    • Yahoo

Search Engines

  • A search engine is a special kind of Web page software that finds other Web pages that match a word or phrase you entered

  • A Web directory is a listing of hyperlinks to Web pages that is organized into hierarchical categories

  • Search engines contain three major parts: spider, index, and utility
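
The index and the search utility can be sketched with a simple inverted index (word to set of pages). The spider is not modelled here: the `PAGES` dictionary stands in for pages it would have fetched, and all names and page contents are invented for illustration.

```python
from collections import defaultdict

# Stand-in for pages a spider has already fetched (hypothetical content).
PAGES = {
    "a.html": "packet switching on the internet",
    "b.html": "internet governance and standards",
    "c.html": "markup languages and standards",
}


def build_index(pages: dict) -> dict:
    """Index: map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Utility: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "standards")))            # ['b.html', 'c.html']
print(sorted(search(index, "internet standards")))   # ['b.html']
```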

Search Engine case study

  • Search engine AltaVista was the Google of the last millennium

  • First real effort to index the World Wide Web

  • One of the few search engines that actually came up with good search results

  • Had a hard time fighting spam listings in their results

  • While spam grew rapidly in AltaVista, a company named Google found a way to prioritize web pages more intelligently, and thus keep spam out better

Case Study: Google’s PageRank

  • PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value

  • Google interprets a link from page A to page B as a vote, by page A, for page B

  • But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote

  • Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
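
The "weighted votes" idea can be illustrated with a simplified PageRank iteration. This is a sketch, not Google's production algorithm: the four-page link graph is made up, and the damping factor of 0.85 and the fixed 50 iterations are assumptions commonly used in textbook versions.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = list(links)
    count = len(pages)
    rank = {page: 1 / count for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / count for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links spreads its vote evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / count
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: page -> pages it links to (its "votes").
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C collects votes from three pages and ranks highest; page A receives only one vote, but it comes from the important page C, so A still outranks B and D.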

Intelligent Agents

  • An intelligent agent is a program that runs in the background on behalf of a person or entity and performs functions such as

    • information gathering

    • information filtering

    • mediation

  • What agents can you think of?

Intelligent Agents (2)

  • Search Agents

    • Improve your information retrieval on the Internet

    • Used to find pages on the Web easily and quickly

      • Meta Agents, Specialised (MP3), etc

  • Web Agents

    • Improve browsing experience

      • Automate form filling, off-line browsing, etc

  • Monitoring Agents

    • Monitor web sites or specific themes

    • Used to get automatic alerts about the latest news

Intelligent Agents (3)

  • Virtual Assistants

    • Artificial life

    • Characters, plants, animals or people living on your desktop

  • Shop Bots

    • Allow users to compare prices on the Internet

    • Find the best price for books, CDs, movies, etc.

  • Webmastering Agents

    • Make it easy to manage a Web site and make it more effective

    • Monitor broken links, content gathering etc.

Intelligent Agents (4)

  • Other agents …

    • Development agents

      • Used to develop other agents

    • Games agents

      • Used in games

Ms Dewey: not your ordinary search agent!

Internet Governance

  • Internet Engineering Task Force (IETF)

    • Works in groups to develop standards

  • Internet Engineering Steering Group (IESG)

    • Approves or disapproves standards developed by the IETF

  • Internet Architecture Board (IAB)

    • The oversight authority for the standards development process

  • World Wide Web Consortium (W3C)

    • Promotes the WWW and develops new web technologies and standards


  • We’re all very familiar with Web 1.0

  • But what makes Web 2.0?

  • Next lecture …