From Memex to Google in 120 minutes

Rivka Taub

Amit Levin


“As We May Think”

By Vannevar Bush

A paper that talks about the future


Vannevar Bush: Biography

Vannevar Bush (1890-1974)

* Was born in Massachusetts

* Studied engineering at Tufts College

* Earned his bachelor's and master's degrees in 1913

* Earned his doctorate in engineering in 1917


* In 1919, Bush joined MIT’s electrical engineering department and stayed there for 25 years

* Completed the differential analyzer in 1931

* During the 1930s, worked on microfilm-based technology for document retrieval and information organization

* In 1938, designed and built the microfilm rapid selector, rumored to have been used for cryptanalysis during WWII


* Was the planner and chairman of the NDRC, a committee that brought together government, the military, business and scientists

* Supervised the Manhattan Project, which developed the first atomic bomb

* In reply to President Roosevelt’s request for post-war direction, published the articles “As We May Think” (1945) and “Science, the Endless Frontier” (1945)

* Served as chairman of the MIT Corporation

* Continued pushing for analog computers as digital computers rose to prominence


Bush’s Vision

Organizing the information: by science, for science


The Record: Technological Predictions

Acquisition: dry photography, head-mounted camera, dictation technology

Storage: improved microfilm


The Record: Technological Predictions

Machines will manipulate and analyze data

Retrieval: microfilm rapid selector

Calculation and automation: calculation of “advanced math” and logical thought


Microfilm Rapid Selector

* Microfilm storage was popular during the 1920s and 1930s

* The problem: selecting documents

* Option: punched cards. But they are too slow, and they retrieve only the address of the document, not the document itself

* Goal: a system that combines documents and index




The Memex

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory” (As We May Think, 1945)




The Memex - Features

* Storage on microfilm

* Workstation for stored documents and for projection

* An option of adding new images

* An option of adding personal comments to a document

* Retrieval by document and code


So, What’s New?

Associative annotation and selection: “trails”.

Imitation of the human brain


From Memex to Hypertext

“The 1987 Hypertext conference: The influence of Bush’s essay ‘As We May Think’ on the emerging field of hypertext was widely acknowledged” (“From Memex to Hypertext”, Nyce & Kahn, 1991)

“To a large part we have MEMEXes on our desks today…a web browser with an editor gives quite a good substitute for a MEMEX.” (Berners-Lee, talk at Bush symposium, MIT, 1995)


BUT…

* Emanuel Goldberg’s statistical machine, a microfilm selector. A US patent was issued in 1931.

* Paul Otlet, 1934: “Traité de Documentation”. Described a workstation for scholars that enables them to read, write, and select documents. Scholars can connect documents. Coined the term ‘link’.


The Memex - Critique

* Trails are artificial, not an objective measure

* Every user has his own Memex; there is no networking

* Bush predicted the effect of the record on laboratory research, law, and business accounting, not on the “ordinary person”


The Birth of the Internet and the WWW

* 1969: The Advanced Research Projects Agency (ARPA) prepared a plan for the United States to maintain control over its missiles and bombers after a nuclear attack. Through this work the Internet was born.

* Almost 20 years after the birth of the Internet, the World Wide Web was born to allow the public exchange of information on a global basis. It was built on the backbone of the Internet.


A Brief History of Search Engines

WWWW (1993): Indexed titles and URLs. Listed results in the order it found them.

Excite (1993): Used statistical analysis of word relationships to make searching more efficient.

Yahoo (1994): A collection of favorite websites that became a searchable directory. It provided a description with each URL.


WebCrawler (1994): Indexed entire web pages. Was bought by Excite in 1997.

Lycos (1994): Provided ranked relevance retrieval and prefix matching.

Alta Vista (1995): Had nearly unlimited bandwidth (for that time), allowed natural language queries and advanced searching techniques, and allowed users to add or delete their own URL within 24 hours.


“The Anatomy of a Large-Scale Hypertextual Web Search Engine”

By S. Brin and L. Page


Google

* Google was born at Stanford University

* Was launched in 1998

* Main goal: high-quality search

Quality = Relevance


Obstacles

Web:

* Scalability of the web and a growing number of queries

* There is no control over what goes onto the web: it is a heterogeneous collection

Search Engines:

* Textual search returns many ‘junk results’ (for example, a search for a search engine's own name may not return that engine among the top 10 results)

* Commercial search engines: loss of relevance

* Spam


How Google Achieves Quality Search

It makes use of hypertextual information. In particular, it utilizes:

1. The link structure of the web to calculate a quality ranking for each web page (PageRank)

2. Anchor text, which is associated with the page it points to. This improves search results and makes it possible to return results that are not text-based

3. Other features such as proximity and visual presentation details (e.g. font size)
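
As an illustration of point 1, here is a minimal sketch of how a link-based quality score can be computed iteratively. It assumes the formulation from the Brin and Page paper, PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T that link to A, with damping factor d = 0.85. The function name `pagerank`, the toy graph, and the handling of pages without outgoing links are assumptions made for this example, not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    links maps each page to the list of pages it links to.
    C(T) is the number of outgoing links of page T.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_rank = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # simplification: a page with no links passes on no rank
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "c" is linked from both "a" and "b".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In this toy graph, page "c" ends up with the highest rank because it receives links from both other pages, while "b" receives only half of the rank that "a" passes on.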


Google’s Architecture

Major functions:

1. Crawling

2. Indexing

3. Ranking

4. Searching


Google’s Architecture

URL Server

- sends lists of URLs to crawlers

Crawler

- downloads web pages

Store Server

- compresses and stores web pages in the repository

Indexer

- reads the repository and uncompresses the documents

- parses the documents

- creates the forward index

- parses out the links


URL Resolver

- converts relative URLs from the anchors file to absolute URLs and then to docIDs

- generates a database of links

- puts the anchor text into the forward index

Sorter

- generates the inverted index

Searcher

- answers queries
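
Two of the steps above can be sketched in a few lines: resolving a relative URL against the page it was found on, and turning a forward index (document to words) into an inverted index (word to documents). This is a simplified sketch, not the original system: it uses plain strings instead of wordIDs, ignores barrels and hit lists, and the names `resolve` and `invert` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urljoin


def resolve(base_url, href):
    """Turn a relative link found on base_url into an absolute URL."""
    return urljoin(base_url, href)


def invert(forward_index):
    """Build word -> sorted list of docIDs from docID -> list of words."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward_index):
        # dict.fromkeys drops repeated words while keeping their order
        for word in dict.fromkeys(forward_index[doc_id]):
            inverted[word].append(doc_id)
    return dict(inverted)


# Hypothetical data for illustration only.
forward_index = {1: ["memex", "trail"], 2: ["google", "memex"]}
inverted_index = invert(forward_index)
absolute = resolve("http://example.com/docs/page.html", "../about.html")
```

The inverted index is what lets the searcher go directly from a query word to the documents that contain it, instead of scanning every document.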


Crawling the Web


Searching the Web

1. Parse the query.

2. Convert words into wordIDs.

3. Seek to the start of the doclist in the short barrel for every word.

4. Scan through the doclists until there is a document that matches all the search terms.


5. Compute the rank of that document for the query.

6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.

7. If we are not at the end of any doclist, go to step 4.

8. Sort the documents that have matched by rank and return the top k.
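
The query evaluation steps can be sketched roughly as follows. This is a simplified illustration under several assumptions: each barrel is modeled as a dictionary from wordID to a list of docIDs, doclists are intersected as sets rather than scanned in order, and the names `search`, `lexicon`, `short_barrel`, `full_barrel`, and the toy `rank` function are hypothetical.

```python
def search(query, lexicon, short_barrel, full_barrel, rank, k=10):
    """Return the top k docIDs that contain every query word."""
    words = query.lower().split()                       # step 1: parse the query
    if not words or any(w not in lexicon for w in words):
        return []                                       # no document can match all terms
    word_ids = [lexicon[w] for w in words]              # step 2: words -> wordIDs

    matched = {}
    for barrel in (short_barrel, full_barrel):          # steps 3 and 6: short, then full
        doclists = [set(barrel.get(w, ())) for w in word_ids]
        for doc_id in set.intersection(*doclists):      # step 4: matches all terms
            if doc_id not in matched:
                matched[doc_id] = rank(doc_id, word_ids)  # step 5: compute the rank

    # step 8: sort by rank (highest first, docID breaks ties) and keep the top k
    ranked = sorted(matched, key=lambda doc_id: (-matched[doc_id], doc_id))
    return ranked[:k]


# Hypothetical toy data for illustration only.
lexicon = {"memex": 0, "google": 1, "trail": 2}
short_barrel = {0: [1], 1: [2]}
full_barrel = {0: [1, 2, 3], 1: [2, 3], 2: [1]}
scores = {1: 0.5, 2: 0.9, 3: 0.2}


def toy_rank(doc_id, word_ids):
    """Stand-in for the real ranking function; ignores the query."""
    return scores[doc_id]


results = search("memex google", lexicon, short_barrel, full_barrel, toy_rank)
```

With this toy data, no document matches both words in the short barrel, so the matches come from the full barrel: documents 2 and 3, returned in that order because document 2 has the higher score.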


The Ranker

* Uses hit lists, anchor text hits and PageRank

* Types of hits: title, anchor, URL, plain text small font…


Vectors:

* Type-weight vector: indexed by hit type, for single-word queries

* Type-prox-weight vector: for multiple-word queries

* Count-weight vector

* The IR score is the dot product of the count-weight and type-weight vectors
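
For a single-word query, the dot product can be sketched as below. The weight values, the cap used to taper off large hit counts, and the names `TYPE_WEIGHTS`, `count_weight`, and `ir_score` are assumptions chosen for illustration; the real weights are not given in this presentation.

```python
# Hit types named on the slide; the weights are hypothetical.
HIT_TYPES = ["title", "anchor", "url", "plain_small"]
TYPE_WEIGHTS = [8.0, 6.0, 4.0, 1.0]


def count_weight(count, cap=4):
    """Grow linearly with the hit count, then stop growing at the cap (assumed shape)."""
    return float(min(count, cap))


def ir_score(hit_counts):
    """Dot product of the count-weight vector and the type-weight vector."""
    count_weights = [count_weight(hit_counts.get(t, 0)) for t in HIT_TYPES]
    return sum(c * w for c, w in zip(count_weights, TYPE_WEIGHTS))


# One title hit and ten small-font hits: 1 * 8.0 + 4 * 1.0
score = ir_score({"title": 1, "plain_small": 10})
```

Capping the count means that repeating a word many times in the body text adds little beyond the first few occurrences, while a single hit in a heavily weighted position such as the title still counts strongly.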


What we saw so far:

Bush: Memex, Hypertext, Goldberg, Otlet

Google: goal, obstacles, how to achieve quality, architecture



TO BE CONTINUED...

