
Why Search Engines are used increasingly to Offload Queries from Databases

  • Bjørn Olstad

  • CTO FAST Search & Transfer

  • Adjunct Prof. The Norwegian University of Science & Technology

  • Email: [email protected]

  • Cell: +47 48011157





The RDBMS Experience

High input barrier

“You are viewing 5 random jobs out of 2461 jobs in total....”


CareerBuilder Use Scenario, Part 1

Step 1

30956 jobs


CareerBuilder Use Scenario, Part 2

Step 2

1084 jobs


CareerBuilder Use Scenario, Part 3

Step 3

30 jobs


CareerBuilder Use Scenario, Part 4

5 jobs

30956 → 5 targeted jobs in 3 steps
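The narrowing from a large result set to a handful of targeted jobs can be sketched as successive refinements on facet values. This is a minimal illustration, not the CareerBuilder or FAST ESP implementation: the job records, the field names and the helpers `facet_counts` and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical job records; fields and values are invented for illustration.
JOBS = [
    {"title": "Java Developer", "category": "IT", "state": "CA", "type": "Full-time"},
    {"title": "DBA", "category": "IT", "state": "CA", "type": "Contract"},
    {"title": "Search Engineer", "category": "IT", "state": "NY", "type": "Full-time"},
    {"title": "Nurse", "category": "Healthcare", "state": "CA", "type": "Full-time"},
    {"title": "Accountant", "category": "Finance", "state": "TX", "type": "Full-time"},
]


def facet_counts(hits, field):
    """Count how many hits fall under each value of `field`."""
    return Counter(hit[field] for hit in hits)


def refine(hits, field, value):
    """One drill-down step: keep only hits whose `field` equals `value`."""
    return [hit for hit in hits if hit[field] == value]


hits = JOBS
trail = [len(hits)]  # result-set size after each step
for field, value in [("category", "IT"), ("state", "CA"), ("type", "Full-time")]:
    hits = refine(hits, field, value)
    trail.append(len(hits))

print(facet_counts(JOBS, "category"))
print(" -> ".join(str(n) for n in trail))
```

Showing the facet counts next to the results is what lets a user choose the next refinement without having to formulate a precise query up front.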


Challenger Shuttle Launch

Fax to NASA from contractor with O-ring concern



IYP: A Disruptive Change

ESP: Cleansing, Mining, Relevance and Discovery

Taylor or Gibson guitar?

Good local offers?

Compare offerings

Phone / Directions

BTW: I’m using my iPAQ

What is the phone number for Will’s Barber shop?

Product & Services, Blogs++

Company web site


ISVs: A Disruptive Change

Search

Siebel 2000

Siebel 2005

“my” CRM Application

“my” CRM Application

Information Access Layer

3rd party content

Search is a tactical afterthought

Search is a strategic enabler


Revisit the Assumptions …

Relational algebra

  • Large, but “finite”, data sets

  • Structured data

  • SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03

Search & Explore focused

  • “Infinite” data sets

  • Unstructured & Structured

80% Unstructured

[Timeline chart, GIGABYTES: Cave paintings, bone tools 40,000 BCE; Writing 3500 BCE; 0 C.E.; Paper 105; Printing 1450; Electricity, Telephone 1870; Transistor 1947; Computing 1950; Internet (DARPA) late 1960s; The Web 1993; 1999; 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B]


Extreme Capabilities?

  • Feeding/streaming, transaction, retrieval or analytics centric?

  • Content size: M, L, VL, VVVL or Vn∞ L?

  • Schema centric, Semi-structured XML, Text, Agnostic?

  • Fuzzy & Value vs. Binary & Completeness?

  • Discovery primitives?

  • User interaction part of design target?


Query Latency: RDBMS vs ESP

Test Data:

  • Structured data:

    • 5 million records;

    • 13 fields per record

  • Structured queries:

    • 22 SQL queries (representative in ERP)

The Result:

  • #1: FAST ESP w/ disk

    • Mean = 99 [ms]

    • St.dev. = 36 [ms]

  • #2: Oracle w/ memory mapping

    • Mean = 4 057 [ms]

    • St.dev. = 9 368 [ms]
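The latency figures on this slide can be put side by side with a little arithmetic. The sketch below uses only the four numbers given on the slide; the variable names and the choice of coefficient of variation as a spread measure are assumptions made for illustration.

```python
# Figures copied from the slide, in milliseconds.
ESP_MEAN_MS, ESP_STD_MS = 99.0, 36.0
RDBMS_MEAN_MS, RDBMS_STD_MS = 4057.0, 9368.0


def coefficient_of_variation(mean, std):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std / mean


speedup = RDBMS_MEAN_MS / ESP_MEAN_MS
cv_esp = coefficient_of_variation(ESP_MEAN_MS, ESP_STD_MS)
cv_rdbms = coefficient_of_variation(RDBMS_MEAN_MS, RDBMS_STD_MS)

print(f"Mean latency ratio (RDBMS / ESP): {speedup:.1f}x")
print(f"Relative spread, ESP:   {cv_esp:.2f}")
print(f"Relative spread, RDBMS: {cv_rdbms:.2f}")
```

A relative spread above 1 for the RDBMS means its standard deviation exceeds its mean, so a few very slow queries dominate the average, while the ESP latencies stay comparatively close to their mean.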


Queries Per Second: RDBMS vs ESP

QPS

Identical HW: single node, 2 CPU, 4 GB RAM, 3 SCSI disks

Identical data: auction data from eBay, 3.6 million docs

Identical queries: 200 queries defined by Oracle


Disruptive Change

Relational Model

Queries that fit The Model

Queries that don’t fit The Model

Alternative I

  • Star, snowflake schemas++

  • Cubes / datamarts++

  • Incremental fixes to painful shortcomings

  • Adds complexity

Alternative II

  • Schema agnostic

  • Scalable ad-hoc querying

  • BLOBS → Contextual Insight

  • Real-time fusion of disparate data models

  • Massive fault tolerant scalability


Extreme Capabilities: ESP Design Targets

Contextual Insight

Value/Noise SNR

User Interaction

Contextual Refinement

Powering Search Derivative Applications (SDAs)

Game Changer driven by Extreme Retrieval and on-the-fly Analytics


Database Query Offloading Example: AutoTrader.com

RDBMS:

  • HW-cost: $320K (32 CPU on 4 Sun servers)

  • 90% sub-second query response; average = 12 s for the rest ….

  • Relevance = Sorting

  • 5 FTE to maintain

ESP:

  • HW-cost: $90K

  • 100% sub-second query response

  • Flexible relevance and discovery

  • 0.5 FTE to maintain

Car Dealers - Product Supply
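The RDBMS response-time figures above imply a range for the overall mean query latency, even though the slide does not state it. The sketch below works that range out, together with the hardware and staffing ratios. The helper `mean_latency_bounds` is hypothetical, and the bounds assume only that a "sub-second" query takes between 0 and 1 second.

```python
FAST_SHARE = 0.90    # share of RDBMS queries answered in under one second (slide)
SLOW_MEAN_S = 12.0   # average latency of the remaining queries, in seconds (slide)


def mean_latency_bounds(fast_share, slow_mean_s, fast_limit_s=1.0):
    """Bounds on the overall mean latency when fast queries take 0..fast_limit_s."""
    slow_part = (1.0 - fast_share) * slow_mean_s
    lower = slow_part                              # fast queries cost ~0 s
    upper = fast_share * fast_limit_s + slow_part  # fast queries cost ~1 s each
    return lower, upper


low, high = mean_latency_bounds(FAST_SHARE, SLOW_MEAN_S)
hw_ratio = 320 / 90   # hardware cost, RDBMS : ESP (in $K, from the slide)
fte_ratio = 5 / 0.5   # maintenance staff, RDBMS : ESP (from the slide)

print(f"RDBMS mean latency between {low:.1f} s and {high:.1f} s")
print(f"Hardware cost ratio: {hw_ratio:.1f}x, staffing ratio: {fte_ratio:.0f}x")
```

Even with 90% of queries under a second, the slow tail alone pushes the RDBMS mean above one second.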


Content Scalability: RDBMS vs ESP

Examples of ESP deployments

  • Compliance case:

    • 50B documents @ 80k average

    • → 4 PB (around 100 web indexes)

  • Storage:

    • Intelligent content addressable storage

    • XML metadata and full content

    • EMC Centera: N * 256TB (N=1..400)

  • Webmining – Webfountain:

    • 60,000 : 1 in query capacity (ESP : DB)
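The compliance-case figures can be checked with integer arithmetic. The sketch below assumes that "80k" means 80 kilobytes per document and that decimal units apply (1 PB = 10^15 bytes); the slide states neither explicitly.

```python
DOCS = 50 * 10**9           # 50B documents (slide)
AVG_DOC_BYTES = 80 * 10**3  # assumption: "80k" = 80 kB, decimal units
TB = 10**12
PB = 10**15

total_bytes = DOCS * AVG_DOC_BYTES
total_pb = total_bytes / PB

# "Around 100 web indexes" then implies this size per web index.
per_web_index_tb = total_bytes / 100 / TB

# Largest Centera configuration on the slide: N * 256 TB with N = 400.
centera_max_pb = 400 * 256 * TB / PB

print(f"Compliance archive: {total_pb:.1f} PB")
print(f"Implied size of one web index: {per_web_index_tb:.0f} TB")
print(f"Largest Centera configuration: {centera_max_pb:.1f} PB")
```

Under these assumptions the 4 PB on the slide follows directly from 50 billion documents at 80 kB each.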


Intelligent Storage: Storage and Search Unite

Discover

Simple

Scalable

Secure


Contextual Search

From ACCESS To INSIGHT

  • “Best of Web”: Recommender / Authority

  • “Best of Enterprise”: Linguistic / Statistic

Any new suspicious financial transaction patterns?

Where is the email from Peter about ROI analysis?

FIND

EXPLORE

Contextual Relevance

Contextual Navigation

  • Contextual fact discovery

  • On-the-fly meta-data analysis


Turning around the Pyramid: HBZ.de – Leading German Library Service Center

From: Librarians

To: Researchers

Single Field Search

Querying

[Architecture diagram: WWW (HTML, XML, WML, JavaScript); FAST ESP; SQL LIB; five DBs; STRUCTURED]


ESP @ SCOPUS

  • >200M articles / 180M citations

  • 180TB capacity / 14000 journals

David Goodman standing up and declaring in public, that Scopus is the best-designed database he's ever seen …


Relevance Drives Revenue

Search Reduces Clicks to Purchase and Browsing…

… and Drives Revenue

  • Reduced # of clicks to buy content from > 4 to < 2

  • 50% reduction in ringtone browsing

  • 100% increase in search

  • 20% increase in ringtone revenue

[Charts, Week 1 to Week 10, with “Launched search” marked: Clicks to Purchase (page views per sale, axis 0.00 to 4.50); Search, Browsing and Revenue (percentage change, axis -60% to 140%)]


Business Analytics: Processing of real-time streams

Example: Norwegian Customs Foreign Exchange Transaction Monitoring

ØKOKRIM

[Architecture diagram: SECURITY ACCESS MODULE (ACL Monitor, User Monitor); Real-time Registration; Queries; Message Queue; Results; Alerts; Database connector; Transaction Log; Data Validation; two Firewalls]



Business Intelligence: ESP vs. RDBMS Technology

OBSERVATION: The Enterprise Search Platform (ESP), a relatively new concept, integrating advanced technologies typically associated with search engines, database tools, and analytical systems, is fast becoming able to solve modern business intelligence problems (using both structured and unstructured data) in a way that is fundamentally different from, and ultimately superior to, that of other currently available analytical or database software.

PREDICTION: Enterprise Search Platform and search centric application technology represents a true paradigm shift in the way data will be stored, analyzed and reported on in the future. Resulting realignments in the marketplace may be both rapid and tumultuous.

- Chief strategist leading BI vendor


If your only tool is a hammer ....

... every problem looks like a nail



Text → Structure

<Category>FINANCIAL</Category>

<Author>George Stein</Author>

BC-dynegy-enron-offer-update5

Dynegy May Offer at Least $8 Bln to Acquire Enron (Update5)

By George Stein

SOURCE: c.2001 Bloomberg News

BODY

<Company>Dynegy Inc</Company>

<Person>Roger Hamilton</Person>

<Company>John Hancock Advisers Inc.</Company>

<PersonPositionCompany>

<OFFLEN OFFSET="3576" LENGTH="63" />

<Person>Roger Hamilton</Person>

<Position>money manager</Position>

<Company>John Hancock Advisers Inc.</Company>

</PersonPositionCompany>

…….

``Dynegy has to act fast,'' said Roger Hamilton, a money manager with John Hancock Advisers Inc., which sold its Enron shares in recent weeks. ``If Enron can't get financing and its bonds go to junk, they lose counterparties and their marvelous business vanishes.''

Moody's Investors Service lowered its rating on Enron's bonds to ``Baa2'' and Standard & Poor's cut the debt to ``BBB.'' in the past two weeks.

……

Fact

<Company>Enron Corp</Company>

<Company>Moody's Investors Service</Company>

<CreditRating>

<OFFLEN OFFSET="3814" LENGTH="61" />

<Company_Source>Moody's Investors Service</Company_Source>

<Company_Rated>Enron Corp</Company_Rated>

<Trend>downgraded</Trend><Rank_New>Baa2</Rank_New>

<__Type>bonds</__Type>

</CreditRating>

Event
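Once facts and events are marked up like this, they can be loaded as ordinary structured records. The sketch below parses the credit-rating event with Python's standard `xml.etree.ElementTree`. The markup is copied from the slide with word spacing inside the values restored, which is an assumption. The helper `fact_to_record` and the flattened key names are hypothetical, not part of FAST ESP.

```python
import xml.etree.ElementTree as ET

# Event markup from the slide; word spacing inside values is restored (assumption).
EVENT_XML = """
<CreditRating>
  <OFFLEN OFFSET="3814" LENGTH="61" />
  <Company_Source>Moody's Investors Service</Company_Source>
  <Company_Rated>Enron Corp</Company_Rated>
  <Trend>downgraded</Trend>
  <Rank_New>Baa2</Rank_New>
  <__Type>bonds</__Type>
</CreditRating>
"""


def fact_to_record(xml_text):
    """Flatten one extracted fact or event element into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    record = {"fact_type": root.tag}
    for child in root:
        if child.tag == "OFFLEN":
            # Position of the supporting passage inside the source document.
            record["offset"] = int(child.attrib["OFFSET"])
            record["length"] = int(child.attrib["LENGTH"])
        else:
            key = child.tag.strip("_").lower()
            record[key] = (child.text or "").strip()
    return record


record = fact_to_record(EVENT_XML)
print(record)
```

Keeping the offset and length alongside the extracted fields means a hit on the structured record can still be traced back to the exact sentence it came from.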


The BI “hammer” Approach

Document Vector

Antibiotics, Peptidyl, Eubacteria, RNA, Mg, …

SVD Analysis

( λ1, λ2, ..., λn )

{ λ1, λ2, ..., λn, Structured attributes }
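The pipeline on this slide reduces a document's term vector to a few SVD components and then appends them to the structured attributes. As a rough sketch of that idea, the code below finds only the leading singular value and direction of a tiny term-count matrix by power iteration, in pure Python. The matrix, the `year` attribute and the function name are invented for illustration; a real system would use a numerical library and keep several components.

```python
import math


def top_singular_pair(matrix, iterations=200):
    """Approximate the largest singular value and its right singular vector
    by power iteration on A^T A (rows = documents, columns = terms)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]      # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]                                    # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, v


# Hypothetical term counts: three documents over three terms.
DOCS = [[2.0, 1.0, 0.0],
        [1.0, 2.0, 0.0],
        [0.0, 0.0, 1.0]]

sigma, direction = top_singular_pair(DOCS)

# One latent coordinate per document, appended to a made-up structured attribute.
records = [
    {"lambda1": sum(a * b for a, b in zip(row, direction)), "year": 2001}
    for row in DOCS
]
print(sigma, records)
```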


Contextual Refinement: ETL and Semantic understanding unite

Direct access to RDBMSs for info from some Telcos

XML feed from other Telcos

Flat files (CSV or fixed) from the ‘laggards’

XML

Logic for cleansing

ESP lookup

Ordered hits (by quality)

Cleansed data to ESP

clean data

Ambiguous data (close hits or unidentified)

‘Error’ database for manual inspection, correction, storage/learning

Master database for persistent storage
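The cleansing flow on this slide looks each incoming record up, orders the hits by quality, and routes the record to either the clean path or the manual-inspection path. The sketch below shows one way that routing could work. It substitutes `difflib.SequenceMatcher` from the Python standard library for the ESP lookup, and the master list, thresholds and function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical master list of known business names.
MASTER = ["Will's Barber Shop", "Taylor Guitars", "Gibson Guitar Center"]


def ranked_hits(name, master):
    """Return (score, candidate) pairs ordered by match quality, best first."""
    scored = [
        (SequenceMatcher(None, name.lower(), candidate.lower()).ratio(), candidate)
        for candidate in master
    ]
    return sorted(scored, reverse=True)


def route(name, master, accept=0.9, review=0.6):
    """Decide where an incoming record goes, based on its best hit."""
    score, best = ranked_hits(name, master)[0]
    if score >= accept:
        return ("clean", best)       # goes on as cleansed data
    if score >= review:
        return ("ambiguous", best)   # close hit: manual inspection
    return ("unidentified", None)    # no usable hit: manual inspection


print(route("Wills Barber Shop", MASTER))
print(route("Zzz Qqq", MASTER))
```

Corrections made during manual inspection could be added back to the master list, which corresponds to the storage/learning loop shown on the slide.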


Contextual Insight: Query-time fact analysis @ sub-document level

“…entry probe carried to [Saturn]’s moon Titan as part of the…”

Intent

Concepts


Contextual Navigation: ThisIsTravel

Automated visitor ratings


Revisit the Assumptions …

Scalable Search

SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03

80% Unstructured

[Timeline chart, GIGABYTES: Cave paintings, bone tools 40,000 BCE; Writing 3500 BCE; 0 C.E.; Paper 105; Printing 1450; Electricity, Telephone 1870; Transistor 1947; Computing 1950; Internet (DARPA) late 1960s; The Web 1993; 1999; 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B]

