CS246: Web Information Systems

Junghoo “John” Cho

Spring 2013

CS246 by John Cho

Course Information
  • Web page: http://oak.cs.ucla.edu/cs246/
  • Topic: Web information management
  • Time: MW 2:00 -- 3:50 pm
  • Place: Boelter Hall 5264
  • Instructor: Junghoo “John” Cho
    • office: 3732J Boelter Hall
    • email: cho@cs.ucla.edu
      • please use subject “CS246: …”
    • office hours: Mon 10:30-11:30 am.

Who is this class for?
  • Strong interest in research
  • Interest in Web information systems
  • Time commitment:
    • Around 2-3 papers every week
      • Typically one full day of paper reading
    • One independent project
      • Similar to paper writing
        • In fact, we read papers from past student projects!
      • Or an interesting application implementation

Today’s Topics
  • Overview of the course topics
  • Course logistics
    • Paper reading assignments
    • Class project


Prerequisites
  • Introductory database, e.g., CS143
    • e.g.: query? SQL?
  • Basic algorithms and data structures
  • Basic probability and statistics
    • P(A|C), Bayes rule, …
  • Design and implementation experience
    • Basic C++
  • Quick test: Grab a sample paper
    • See if you can read, understand and build it
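As a quick refresher on the probability prerequisite, Bayes' rule expresses P(A|C) in terms of P(C|A). Here A and C are generic events and ¬A is the complement of A; the numbers in the second part are purely illustrative and do not come from the course.

```latex
P(A \mid C) = \frac{P(C \mid A)\,P(A)}{P(C)}, \qquad
P(C) = P(C \mid A)\,P(A) + P(C \mid \neg A)\,P(\neg A)

% Illustrative numbers: P(A) = 0.01,\; P(C \mid A) = 0.9,\; P(C \mid \neg A) = 0.05
P(C) = 0.9 \cdot 0.01 + 0.05 \cdot 0.99 = 0.0585, \qquad
P(A \mid C) = \frac{0.009}{0.0585} \approx 0.154
```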

Tell Us About You
  • Name
  • Department & Program
  • Before coming to UCLA
  • Brief history at UCLA
  • Technical/research interests
  • Expectation from the class

Information Galore
  (Figure: information sources such as a biblio server, a legacy database, and plain text files)

Central Problem
  • How to manage/access information on the Web?
  • Three major approaches
    • Central indexing
      • E.g., Web search engine
    • Dynamic integration
      • E.g., comparison shopping services
    • Data extraction
      • E.g., spamming companies

Topic: Web Search (Central Indexing)
  (Figure: Central Index)
  • Web: collection of passive HTML pages
    • Find Web pages relevant to a query
  • Traditional Information Retrieval:
    • Web = collection of HTML pages
    • HTML page = a bag of words
  • More than that?
    • Links, structure of the Web
    • User access patterns
    • HTML tags (markups)
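To make the "bag of words" view concrete, here is a minimal sketch of an inverted index answering Boolean AND queries. It rests on simplifying assumptions (regex-based tag stripping instead of a real HTML parser, no ranking, no use of links or tags), and the page ids and contents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def bag_of_words(html: str) -> Counter:
    """Strip tags and count lowercase word tokens; word order is discarded."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag removal (an assumption, not an HTML parser)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, html in pages.items():
        for term in bag_of_words(html):
            index[term].add(page_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query term (Boolean AND)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical pages, for illustration only.
pages = {
    "p1": "<html><b>UCLA</b> web search course</html>",
    "p2": "<p>Web data extraction</p>",
    "p3": "<p>database course notes</p>",
}
index = build_index(pages)
print(sorted(search(index, "web course")))  # ['p1']
```

Everything listed under "More than that?" (links, user access patterns, markup) is information this representation throws away.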

Topic: Dynamic Integration
  (Figure: Source 1, Source 2, …, Source n)
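In contrast to central indexing, nothing is collected ahead of time here: the query is forwarded to each source when the user asks, and the answers are merged on the fly, as a comparison shopping service does. The sketch below is illustrative only; the sources are hypothetical in-memory stand-ins for remote sites, and the names `Offer`, `make_source`, and `integrate` are made up for this example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Offer:
    source: str
    item: str
    price: float

# Hypothetical source: a real one would wrap a remote site (e.g., its query form).
def make_source(name: str, catalog: dict[str, float]) -> Callable[[str], list[Offer]]:
    def query(item: str) -> list[Offer]:
        return [Offer(name, item, catalog[item])] if item in catalog else []
    return query

def integrate(sources: list[Callable[[str], list[Offer]]], item: str) -> list[Offer]:
    """Send the query to every source at query time and merge the answers, cheapest first."""
    offers: list[Offer] = []
    for source in sources:
        offers.extend(source(item))
    return sorted(offers, key=lambda offer: offer.price)

sources = [
    make_source("Source 1", {"camera": 250.0}),
    make_source("Source 2", {"camera": 229.0, "lens": 99.0}),
    make_source("Source n", {"lens": 89.0}),
]
for offer in integrate(sources, "camera"):
    print(offer.source, offer.price)
# Source 2 229.0
# Source 1 250.0
```

A real mediator would also have to translate the query into each source's interface and reconcile differing schemas, which this sketch ignores.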

Topic: Data Extraction

Structured data


Beatles $10

Madonna $20

NSync $20

  • How can we extract “structured data” from free text automatically?
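One naive answer is a hand-written pattern, sketched below on a hypothetical sentence containing the slide's example records. The pattern is an assumption tuned to this single phrasing; the question on the slide is precisely how to obtain such extractors automatically instead of writing one by hand for every source.

```python
import re

# Hypothetical free-text snippet containing the slide's example records.
TEXT = "Albums on sale this week: Beatles for $10, Madonna for $20, and NSync for $20."

# Hand-written extraction pattern (an assumption for illustration):
# a capitalized name, the word "for", then a dollar amount.
PATTERN = re.compile(r"\b([A-Z][A-Za-z]*) for \$(\d+(?:\.\d\d)?)")

def extract(text: str) -> list[tuple[str, float]]:
    """Turn free text into structured (name, price) rows."""
    return [(name, float(price)) for name, price in PATTERN.findall(text)]

for name, price in extract(TEXT):
    print(f"{name}\t${price:.0f}")
```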

Main Course Workload
  • Paper reading
    • Paper reading assignments
    • Class discussion
    • We mainly focus on “central indexing”
  • Independent projects

High-Level Goal
  • Learn core ideas and techniques
    • Some of the techniques can be useful in other fields
  • Learn how to read papers
  • Hopefully learn what it is like to do research
    • Sometimes very frustrating but often very rewarding

Paper Reading
  • Why:
    • Something that you will do all the time as a researcher
    • Learn to be critical and communicate well
    • Acquire knowledge to conduct research/project
  • About 20 papers from
    • Conferences: SIGMOD, VLDB, WWW, and …
  • Before the class:
    • Everyone: read and review the paper
  • During the class:
    • Instructor: present his own understanding and lead class discussion
    • Everyone: participate!!!

How to Get Papers
  • From the class homepage
    • http://oak.cs.ucla.edu/cs246/
  • Some of the materials are password protected
    • User name: cs246
    • Password: papers
  • Let me know if you have any problems

How to Read Papers
  • Understand the “Big Picture”
  • What is the problem?
  • Why is it important?
  • Why is it difficult?
  • What has this paper done?
  • What have others done?

Paper Reviews (1)
  • Due by the preceding Sunday
    • Submit through our Web submission interface on the class Web page
  • Required components: at most 3 paragraphs
    • Summary (1 paragraph): in your own words
      • e.g., “This paper discusses how to optimize queries with...”
    • Comments/criticisms (1-2 paragraphs): the good & the bad
      • e.g., “It addresses a real problem and the solution is interesting …”
      • e.g., “But I feel the experiments are not realistic because...”
  • Optional: questions, as many as you want
    • e.g., “Why do the authors assume that queries are independent?”

Paper Reviews (2)
  • You may skip up to 3 paper reviews without penalty
  • Most reviews will get full score unless they are written extremely poorly
    • 10% Excellent
    • 80% Good
    • 10% Poor

Class Project
  • Why:
    • Work on a specific problem and learn to find a solution
  • 40% of the grade
  • Team of up to 3
  • Topic: any problem related to the central problem of managing/accessing information on the Web
  • Open style
    • Rigorous study of a research problem or
    • Any interesting system implementation

Class Project Schedule
  • Important Milestones
    • Group formation: 4/10 (2nd week Wed)
    • Project proposal: 4/21 (3rd week Sun)
    • Project progress: 5/08 (6th week Wed)
    • Final report: 5/22 (8th week Sun)
    • Project presentation: 9th and 10th weeks
  • You are responsible for staying on track
  • Make appointments with the instructor as needed

Project: Please Remember
  • Set your aims high, but be realistic
  • Expect to read at least 4-5 papers along the way
  • Start early
    • Don’t leave it until right before the deadline
    • There are always unexpected obstacles
    • In previous quarters, most students could not finish
      • Please, please start early
  • You are responsible for staying on track


Grading
  • Midterm: 40%
  • Paper reviews: 20%
  • Project: 40%
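Assuming each component is scored out of 100, the final score is simply the weighted sum of the three components. The sketch below restates that arithmetic; the 0-100 scale is an assumption and the example scores are hypothetical.

```python
# Component weights as listed on the slide.
WEIGHTS = {"midterm": 0.40, "paper_reviews": 0.20, "project": 0.40}

# Sanity check: the weights should add up to 100%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

def final_score(scores: dict[str, float]) -> float:
    """Weighted sum of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Hypothetical student scores, for illustration only.
example = {"midterm": 80, "paper_reviews": 95, "project": 90}
print(round(final_score(example), 2))
```

For these hypothetical scores, 0.4 · 80 + 0.2 · 95 + 0.4 · 90 = 32 + 19 + 36 = 87.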


First Paper Review
  • First review due Sunday 4/07
    • Three papers for classes 3 and 4
      • The Anatomy of a Large-Scale Hypertextual …
      • Authoritative Sources in a Hyperlinked Environment
      • Indexing by Latent Semantic Analysis
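The first reading, “The Anatomy of a Large-Scale Hypertextual …”, describes PageRank, one way of exploiting the link structure of the Web listed earlier under Web search. Below is a minimal power-iteration sketch of the basic idea. The link graph is hypothetical, the damping factor 0.85 is a commonly used value rather than a requirement, and spreading the rank of pages without out-links uniformly is one of several conventions, not necessarily the paper's.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration; the scores always sum to 1."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Every page first gets the "random jump" share (1 - d) / n.
        new = {u: (1.0 - d) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # A page splits d * rank evenly among the pages it links to.
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page (no out-links): spread its rank uniformly (an assumption).
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

# Hypothetical link graph: page -> pages it links to.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page D has no in-links, so it only ever receives the random-jump share, while C, linked from three pages, ends up with the highest score.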
