Programming for WWW (ICE 1338)

Lecture #4
July 2, 2004
In-Young Ko
iko .AT. icu.ac.kr
Information and Communications University (ICU)

Announcements
  • Our TA
    • Name: Mr. Trinh Minh Cuong
    • Email: minhcuong .AT. icu.ac.kr
    • Office: F641
    • Office Hours: Tuesday 11-12PM, Thursday 2-4PM
  • Please send the instructor your team information
  • Please send the instructor your information for creating a Unix account
  • Submit your Homework #1 (a URL or the HTML source) by tomorrow

Programming for WWW (Lecture#4) In-Young Ko, Information Communications University

Review of the Previous Lecture
  • Cascading Style Sheets
  • Web-based Information Integration
    • Examples
    • Information Mediators
    • Information Wrappers (Web Wrappers)

Contents of Today’s Lecture
  • Basic UNIX Commands
  • More on Web-based Information Integration
  • JavaScript

UNIX Operating System
  • A multi-user, multi-tasking operating system
  • Developed by Ken Thompson and Dennis Ritchie at Bell Labs in the early 1970s
  • Success factors of UNIX
    • Written in a high-level language (C language) – improving readability and portability
    • Support of primitives (system calls) – permitting complex programs to be built efficiently
    • A hierarchical file system – easy maintenance
    • Hiding the machine architecture from the user – allowing programs to be run on different machines
  • http://www.unix-systems.org/

Architecture of UNIX Systems

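[Figure: layered architecture of UNIX systems. Hardware is at the center, surrounded by the kernel, then by utility programs (sh, who, nroff, a.out, cpp, date, comp, cc, wc, as, ld, grep, vi, ed), with other application programs in the outermost layer.]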

Basic UNIX Shell Commands
  • cd - Changes directories to the one named
  • pwd - Displays the current working directory
  • ls - Lists the contents of the current directory
  • ls -l - Same as above, but it lists with more information
  • mkdir - Make a directory
  • rmdir - Remove a directory
  • cat - Concatenate files or show a file's contents
  • cp - Copy a file
  • mv - Rename or move a file to a different name or directory
  • rm - Remove a file
  • logout - Terminates a Unix Shell session
  • man - Access manual pages

http://infohost.nmt.edu/tcc/help/unix/unix_cmd.html

Publishing Web Pages on the Server
  • Copy your files to the ‘public_html’ directory under your home directory on the server
  • Use FTP to copy files from a local directory to the server directory

    ftp vega.icu.ac.kr        (log in with your user ID)
    cd public_html
    lcd d:\myweb
    put index.html            (or: mput *.html)
    quit

  • Your homepage is now accessible at

    http://vega.icu.ac.kr/~yourid

Connections Between Web Clients and Servers

[Figure: a Web browser and a Web server communicating through port 80. Labels in the figure: Listen, Connect, Accept, Write, Process, Read, Return.]

A Web server is a daemon process that executes in the background waiting for some event to occur

Sockets

  • A socket is an end point for communication between two machines
  • A socket is an association of a protocol, address and process to an end point of communication

[Figure: a Web browser and a Web server communicating through port 80 (Listen, Connect, Accept, Write, Process, Read, Return), with the communication end points labeled "Sockets".]
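
The Listen, Accept, Connect, Write, and Read steps named on this slide can be tried out on a single machine. The following is a minimal sketch, not code from the lecture: the class name LoopbackDemo, the "echo: " reply, and the use of an OS-assigned port instead of port 80 are all assumptions made so that the example runs without a real Web server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: one server socket and one client socket on the loopback interface.
class LoopbackDemo {

    // Sends one request line to a one-shot server and returns the server's reply line.
    static String roundTrip(String request) {
        // Listen: port 0 asks the OS for any free port (a real Web server would listen on 80).
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();

            // Connect, write the request, then read the response.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                out.println(request);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Accept one connection, read one line, "process" it, and return a reply.
    static void serveOnce(ServerSocket server) {
        try (Socket peer = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(peer.getInputStream()));
             PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("GET /index.html")); // echo: GET /index.html
    }
}
```

Each side holds its own socket object; the pair of them forms the connection that the two bullet points describe.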

Accessing Web Contents from Java Programs via Sockets

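    import java.io.*;
    import java.net.*;

    public class SocketFetch {
        public static void main(String[] args) throws IOException {
            // Socket creation: the host name must be a quoted string
            Socket sk = new Socket("www.icu.ac.kr", 80);

            // Write request: HTTP request lines end with CRLF; an empty line ends the request
            OutputStream os = sk.getOutputStream();
            PrintWriter pw = new PrintWriter(os);
            pw.print("GET /index.html HTTP/1.0\r\n");
            pw.print("\r\n");
            pw.flush();

            // Read results: print the response line by line
            InputStream is = sk.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            sk.close();
        }
    }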

Accessing Web Contents from Java Programs via URL Connections

    import java.io.*;
    import java.net.*;

    public class UrlFetch {
        public static void main(String[] args) throws IOException {
            // URL object creation
            URL url = new URL("http://www.icu.ac.kr");

            // URL connection creation
            URLConnection urlc = url.openConnection();

            // Read results: print the page line by line
            InputStream is = urlc.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }

Java String Manipulation Methods for Result Parsing
  • int indexOf(String str, int fromIndex)
  • int lastIndexOf(String str, int fromIndex)
  • boolean startsWith(String prefix)
  • boolean endsWith(String suffix)
  • boolean matches(String regex)
  • String[] split(String regex)
  • String substring(int beginIndex, int endIndex)
  • String toLowerCase()
  • String toUpperCase()

http://java.sun.com/j2se/1.4.2/docs/api/index.html
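
A short sketch of how the methods listed above behave on one line of markup may help. The input string and the class and method names (StringMethodsDemo, titleText) are made up for illustration and do not come from the lecture.

```java
// Hypothetical demo class exercising the java.lang.String methods listed on the slide.
class StringMethodsDemo {

    // Returns the text between the first '>' and the last '<' of a line holding one element.
    static String titleText(String line) {
        int open = line.indexOf(">", 0);                  // first '>' at or after index 0
        int close = line.lastIndexOf("<", line.length()); // last '<' searching backwards
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String line = "<TITLE>ICE 1338</TITLE>"; // made-up input line

        System.out.println(titleText(line));                            // ICE 1338
        System.out.println(line.toLowerCase().startsWith("<title>"));   // true
        System.out.println(line.toUpperCase().endsWith("</TITLE>"));    // true
        System.out.println(line.matches("<TITLE>.*</TITLE>"));          // true (whole-string match)
        System.out.println("URL, Title,Summary".split(",\\s*")[1]);     // Title
    }
}
```

Note that matches tests the entire string against the regular expression, so a pattern meant to find a fragment needs leading and trailing ".*".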

Web Wrapper for Naver.com

[Figure: a Naver.com result item with its Title, Summary, and URL marked.]

Result Parsing Strategies
  • Structure-based Parsing
    • Analyzes Web pages based on tag hierarchies
    • Cannot be used for ill-formed HTML documents
  • Pattern-based Parsing
    • Searches for a unique string pattern to locate a result item
    • Requires identifying such unique string patterns first

Structure-based Result Parsing
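
One way to experiment with structure-based parsing is the XML parser that ships with Java. The sketch below is an illustration under stated assumptions, not the wrapper used in the lecture: it assumes the page is well-formed XHTML, and the class and method names (StructureParseDemo, linksOrNull) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: builds a tag tree and reads values out of it.
class StructureParseDemo {

    // Returns the href of every <a> element, or null when the markup is not well formed.
    static List<String> linksOrNull(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
            }
            return hrefs;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return null; // the parser rejects ill-formed input, e.g. an unclosed tag
        }
    }

    public static void main(String[] args) {
        String good = "<table><tr><td><a href=\"http://www.icu.ac.kr/\">ICU</a></td></tr></table>";
        String bad = "<table><tr><td><a href=\"http://www.icu.ac.kr/\">ICU</td></tr></table>";
        System.out.println(linksOrNull(good)); // [http://www.icu.ac.kr/]
        System.out.println(linksOrNull(bad));  // null (the <a> element is never closed)
    }
}
```

The second call shows the limitation of this strategy: a single missing end tag is enough to make the whole page unparseable.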

Pattern-based Result Parsing
  • Find a unique pattern to locate a result item
    • e.g., “<tr><td><font” in the Naver result pages
  • Find the prefix and suffix patterns to extract an information piece (e.g., URL, title, summary) from the result item
    • e.g., “a href=” to extract a URL from a result line
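
The two steps above can be sketched with plain string searches. This is a minimal illustration rather than the lecture's wrapper: the sample lines are made up, the class name PatternParseDemo is hypothetical, and the sketch assumes that attribute values are enclosed in double quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: locate result items by a marker, then cut out the URL.
class PatternParseDemo {
    static final String ITEM_PATTERN = "<tr><td><font"; // item marker quoted on the slide
    static final String URL_PREFIX = "a href=\"";         // assumption: double-quoted attribute
    static final String URL_SUFFIX = "\"";

    static List<String> extractUrls(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            // Step 1: skip every line that is not a result item.
            if (!line.contains(ITEM_PATTERN)) {
                continue;
            }
            // Step 2: take the text between the prefix and the next suffix.
            int start = line.indexOf(URL_PREFIX);
            if (start < 0) {
                continue;
            }
            start += URL_PREFIX.length();
            int end = line.indexOf(URL_SUFFIX, start);
            if (end > start) {
                urls.add(line.substring(start, end));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
                "<tr><td><font size=2><a href=\"http://www.icu.ac.kr/\">ICU</a></font></td></tr>",
                "<p>no result item on this line</p>",
                "<a href=\"http://ignored.example/\">link outside a result item</a>");
        System.out.println(extractUrls(page)); // [http://www.icu.ac.kr/]
    }
}
```

Unlike a tree-based parser, this approach keeps working on ill-formed pages, but it breaks as soon as the site changes the marker strings.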

Java Implementation of Web Wrapper

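    import java.io.*;
    import java.net.*;

    // Method of the wrapper class; the enclosing class declaration is not shown on the slide.
    public void WebWrapper(String host, String path, String query, int startIndex, int pageSize) {
        try {
            // Query translation: build the search URL from the wrapper's parameters.
            // Note that + "1" is string concatenation: startIndex 0 gives start=01, 1 gives start=11.
            String address = "http://" + host + path + "?where=webkr" + "&query=" + query +
                    "&start=" + startIndex + "1" + "&display=" + pageSize;
            URL url = new URL(address);
            URLConnection urlc = url.openConnection();
            urlc.setRequestProperty("Accept", "*/*");
            urlc.setRequestProperty("User-Agent", "Mozilla/4.0");

            // Parsing results: read the result page line by line.
            InputStream is = urlc.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            while ((line = in.readLine()) != null) {
                // Placeholder for the result parsing code; this version only prints each line.
                System.out.println(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }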
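
The query translation step can be looked at in isolation, without opening a connection. The sketch below keeps the parameter layout of the wrapper method and adds URL encoding of the query text, which is an addition of this sketch and not part of the slide; the class name QueryTranslationDemo and the host search.example.com are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: turns wrapper parameters into a search URL string.
class QueryTranslationDemo {

    static String buildAddress(String host, String path, String query, int startIndex, int pageSize) {
        try {
            // Encode the query so that spaces and non-ASCII characters survive in the URL.
            String encoded = URLEncoder.encode(query, "UTF-8");
            // As in the wrapper method, "1" is appended as text after startIndex.
            return "http://" + host + path + "?where=webkr" + "&query=" + encoded
                    + "&start=" + startIndex + "1" + "&display=" + pageSize;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAddress("search.example.com", "/search", "web wrapper", 0, 10));
        // http://search.example.com/search?where=webkr&query=web+wrapper&start=01&display=10
    }
}
```

Without the encoding step, a query containing a space or an ampersand would produce a malformed or misinterpreted URL.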

Web Robots
  • A Web robot is a program (agent) that collects information while following all the links on a Web page
  • Web Robots = Crawlers = Spiders
  • Web search engines use Web robots to collect and index Web documents
    • A tag that tells Web robots not to index a page or follow its links: <meta name="robots" content="noindex,nofollow"/>
  • Crawling methods:
    • Breadth-first crawling
    • Depth-first crawling
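
The difference between the two crawling methods shows up in the order in which pages are visited. The sketch below walks a small in-memory link graph instead of the real Web; the page names A to E, the class name CrawlOrderDemo, and the sampleSite helper are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class: compares visiting orders on a tiny link graph.
class CrawlOrderDemo {

    // Made-up site: A links to B and C, B to D, C to E, and D back to A.
    static Map<String, List<String>> sampleSite() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("D"));
        links.put("C", Arrays.asList("E"));
        links.put("D", Arrays.asList("A"));
        return links;
    }

    // Breadth-first: visit all pages one link away, then two links away, and so on.
    static List<String> breadthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(Collections.singleton(start));
        Deque<String> queue = new ArrayDeque<>(Collections.singleton(start));
        while (!queue.isEmpty()) {
            String page = queue.removeFirst();
            order.add(page);
            for (String next : links.getOrDefault(page, Collections.<String>emptyList())) {
                if (seen.add(next)) { // add returns false for pages already queued or visited
                    queue.addLast(next);
                }
            }
        }
        return order;
    }

    // Depth-first: follow the first link as far as possible before backing up.
    static List<String> depthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        visit(links, start, new HashSet<String>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> links, String page,
                              Set<String> seen, List<String> order) {
        if (!seen.add(page)) {
            return; // already visited; this also stops cycles such as D -> A
        }
        order.add(page);
        for (String next : links.getOrDefault(page, Collections.<String>emptyList())) {
            visit(links, next, seen, order);
        }
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(sampleSite(), "A")); // [A, B, C, D, E]
        System.out.println(depthFirst(sampleSite(), "A"));   // [A, B, D, C, E]
    }
}
```

In both variants the set of seen pages is what keeps the crawler from looping forever on cyclic links.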

Breadth First Crawlers

http://ibook.ics.uci.edu/Slides/39

Depth First Crawlers

http://ibook.ics.uci.edu/Slides/39

Web-based Information Management Applications (Example Scenario)

Identify Recurring Disaster Areas in China, e.g. Locations of Floods

Cross-product between place names and the disaster-type categories

A Web document collection about ‘China disasters’

Classify documents based on the disaster types mentioned

For each map layer displayed, get the set of place names and classify the documents based on the place names

Plot the document clusters on the map to figure out the major flooding areas
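
The cross-product and classification steps of this scenario can be sketched as a simple counting loop. Everything concrete in the sketch is hypothetical: the place names Riverton and Hillside, the three sample documents, and the class name DisasterMapDemo are invented, and plain substring matching stands in for whatever classifier the real application used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: counts documents per (place name, disaster type) pair.
class DisasterMapDemo {

    static Map<String, Integer> countByPlaceAndType(List<String> docs,
                                                    List<String> places,
                                                    List<String> types) {
        Map<String, Integer> counts = new TreeMap<>();
        // Cross-product between place names and disaster-type categories.
        for (String place : places) {
            for (String type : types) {
                int matches = 0;
                for (String doc : docs) {
                    String text = doc.toLowerCase();
                    // A document falls into a cell when it mentions both terms.
                    if (text.contains(place.toLowerCase()) && text.contains(type.toLowerCase())) {
                        matches++;
                    }
                }
                if (matches > 0) {
                    counts.put(place + "/" + type, matches);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(          // invented sample documents
                "Flood reported in Riverton",
                "Riverton flood warning extended",
                "Earthquake felt near Hillside");
        List<String> places = Arrays.asList("Riverton", "Hillside");
        List<String> types = Arrays.asList("flood", "earthquake");
        System.out.println(countByPlaceAndType(docs, places, types));
        // {Hillside/earthquake=1, Riverton/flood=2}
    }
}
```

The resulting counts are the kind of per-place clusters that the last step of the scenario would plot on the map.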

Web-based Information Management Applications (Example App. Design)

[Figure: example application design. Components: Search Engines, Keyword Extractor, Keyword Editor, Product Categories, Place Name Generator, Place Name Extractor, Mapping Clusters. The legend distinguishes sequential connections from pipelined connections. Annotations: "Pipelined components" and "Generate multiple sets of place names".]

Problems in Composing Large-scale Information Management Applications
  • Time-consuming to explore and test a large number of options
    • Hard to choose appropriate services for collections
    • Hard to quickly substitute and test a service within a sequence of steps
  • Difficulties of capturing and reusing shared patterns of information management steps
    • Difficult to record and recurrently perform information management steps
    • Necessity of extracting abstract patterns of information management steps and reusing them
    • Hard to cope with dynamic aspects of Web resources

Characteristics of Large-scale Information Management Tasks
  • Incremental development of information management steps for an abstract task goal
  • Recurrent executions of the steps
  • Evolving requirements of users
  • Shared patterns of management steps
  • Collection-based information processing
  • Dynamic aspects of information sources and services
  • Large and growing number of component services

Improvement Goals
  • Significantly reduce construction time, keeping costs low
  • Enable very rapid construction/adaptation of new applications
  • Provide static and run-time diagnostic tools, facilitating debugging and performance tuning tasks

Rapid Composition and Reconfiguration of Large-scale Custom Applications

JavaScript
  • The goal of JavaScript is to provide programming capability at both the client and server ends of a Web connection
  • Originally developed by Netscape, as LiveScript
  • Became a joint venture of Netscape and Sun in 1995, renamed JavaScript
  • Now standardized by the European Computer Manufacturers Association as ECMA-262 (also ISO/IEC 16262)
  • User interactions with HTML documents in JavaScript use the event-driven model of computation

A Popup Window

    <html>
    <head><title>ICE1338</title>
    <style type="text/css">
    <!--
    p { font-size: 12pt; color: blue; background-color: yellow }
    h2, h3 { font-size: 16pt; color: red; font-style: oblique }
    -->
    </style>
    <script language="JavaScript">
    function displayDate() {
      alert("Today's date is: " + new Date() + "!!");
    }
    </script>
    </head>
    <body onLoad="displayDate()">
    <br/>
    <h2>Programming for WWW</h2>
    </body>
    </html>

JavaScript vs. Java
  • Both share similar syntax
  • JavaScript is primarily a scripting language, whereas Java is a general-purpose compiled language
  • JavaScript is an interpreter-based language
  • JavaScript is dynamically typed
  • JavaScript does not support class-based inheritance
  • JavaScript code is usually embedded in HTML documents

General Syntax of JavaScript
  • Direct embedding of JavaScript code:

    <script language="JavaScript">
      -- JavaScript script --
    </script>

  • Indirect JavaScript specification:

    <script language="JavaScript" src="myScript.js"></script>

  • Identifier form: begin with a letter or underscore, followed by any number of letters, underscores, and digits
  • Case sensitive
  • 25 reserved words, plus future reserved words
  • Comments: both // and /* … */

Document Object Model HTML

“A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents”

    <html>
      <head>
        <title>My Document</title>
      </head>
      <body>
        <h1>Header</h1>
        <p>Paragraph</p>
      </body>
    </html>

    var header = document.getElementsByTagName("H1").item(0);
    header.firstChild.data = "A dynamic document";

http://www.mozilla.org/docs/dom/technote/intro/
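
Because the DOM is language-neutral, the same two steps can also be tried from Java through the standard org.w3c.dom interfaces. This is a sketch under stated assumptions, not part of the lecture: the class and method names (DomUpdateDemo, retitle) are hypothetical, the document is parsed as XML, and the tag name is therefore written in lower case because XML names are case-sensitive.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: finds the first <h1> and replaces its text through the DOM.
class DomUpdateDemo {

    // Returns the text of the first <h1> element after it has been replaced by newHeader.
    static String retitle(String xhtml, String newHeader) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // Same navigation as the script: first element named h1, then its text node.
            Node header = doc.getElementsByTagName("h1").item(0);
            header.getFirstChild().setNodeValue(newHeader);
            return header.getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("document is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>My Document</title></head>"
                + "<body><h1>Header</h1><p>Paragraph</p></body></html>";
        System.out.println(retitle(page, "A dynamic document")); // A dynamic document
    }
}
```

In a browser the change is rendered immediately; in this sketch it only affects the in-memory tree.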

DOM Specification
  • http://www.w3.org/TR/DOM-Level-2-HTML/html.html

Screen Outputs
  • The model for the browser display window is the Window object
    • Properties:
      • window.document
      • window.screenLeft
      • window.screenTop
    • Methods:
      • alert
      • confirm
      • prompt

http://devedge.netscape.com/central/javascript/
