
Why Search Engines are used increasingly to Offload Queries from Databases

Bjørn Olstad, CTO, FAST Search & Transfer; Adjunct Prof., The Norwegian University of Science & Technology. Email: bjorn.olstad@fast.no. Cell: +47 48011157


Presentation Transcript


  1. Why Search Engines are used increasingly to Offload Queries from Databases • Bjørn Olstad • CTO FAST Search & Transfer • Adjunct Prof. The Norwegian University of Science & Technology • Email: bjorn.olstad@fast.no • Cell: +47 48011157

  2. The Typo Problem...

  3. Talent Offloading ....

  4. The Web Search Experience

  5. The RDBMS Experience • High input barrier • “You are viewing 5 random jobs out of 2461 jobs in total....”

  6. CareerBuilder use scenario, part 1 • 30956 jobs

  7. CareerBuilder use scenario, part 2 • 1084 jobs

  8. CareerBuilder use scenario, part 3 • 30 jobs

  9. CareerBuilder use scenario, part 4 • 5 jobs • 30956 → 5 targeted jobs in 3 steps
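
The four CareerBuilder slides report only result counts, so the narrowing at each refinement step can be worked out directly. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not part of the presentation.

```python
# Result counts reported on the four CareerBuilder slides.
job_counts = [30956, 1084, 30, 5]


def narrowing_factors(counts):
    """Return how many times smaller the result set gets at each step."""
    return [before / after for before, after in zip(counts, counts[1:])]


factors = narrowing_factors(job_counts)
overall = job_counts[0] / job_counts[-1]

for step, factor in enumerate(factors, start=1):
    print(f"step {step}: result set shrinks {factor:.1f}x")
print(f"overall: {overall:.0f}x fewer jobs to read")
```

Each of the three steps cuts the result set by a factor of six or more, which is what turns 30956 jobs into a list short enough to read.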

  10. Challenger Shuttle Launch • Fax to NASA from contractor with O-ring concern

  11. Presentation Matters …

  12. ESP: Cleansing, Mining, Relevance and Discovery • IYP: A Disruptive Change • Taylor or Gibson guitar? • Good local offers? • Compare offerings • Phone / Directions • BTW: I’m using my iPAQ • What is the phone number to Will’s Barber shop? • Product & Services • Blogs++ • Company web site

  13. Search ISVs: A Disruptive Change • Siebel 2000: “my” CRM Application; search is a tactical afterthought • Siebel 2005: “my” CRM Application, Information Access Layer, 3rd party content; search is a strategic enabler

  14. Revisit the Assumptions … • Relational algebra: large but “finite” data sets, structured data • SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03 • Search & Explore focused: “infinite” data sets, unstructured & structured • 80% unstructured • [Chart: data volume in gigabytes over time. Milestones: cave paintings and bone tools, 40,000 BCE; writing, 3500 BCE; paper, 105; printing, 1450; electricity and telephone, 1870; transistor, 1947; computing, 1950; Internet (DARPA), late 1960s; the Web, 1993. Data points: 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B]

  15. Extreme Capabilities? • Feeding/streaming, transaction, retrieval or analytics centric? • Content size: M, L, VL, VVVL or Vn∞ L? • Schema centric, Semi-structured XML, Text, Agnostic? • Fuzzy & Value vs. Binary & Completeness? • Discovery primitives? • User interaction part of design target?

  16. RDBMS Query Latency: RDBMS vs ESP • Test data: structured data, 5 million records, 13 fields per record • Structured queries: 22 SQL queries (representative in ERP) • The result: • #1: FAST ESP w/ disk: mean = 99 ms, st.dev. = 36 ms • #2: Oracle w/ memory mapping: mean = 4 057 ms, st.dev. = 9 368 ms

  17. Queries Per Second: RDBMS vs ESP • QPS • Identical HW: single node, 2 CPU, 4GB RAM, 3 SCSI disks • Identical data: auction data from eBay, 3.6 million docs • Identical queries: 200 queries defined by Oracle
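
The latency slide gives a mean and a standard deviation for each system, so two derived figures follow from it: the ratio of the means and the spread relative to the mean. A minimal sketch of that arithmetic, using only the numbers on the slide; the helper name is illustrative.

```python
# Latency figures as reported on the slide, in milliseconds.
latency_ms = {
    "FAST ESP w/ disk": {"mean": 99.0, "stdev": 36.0},
    "Oracle w/ memory mapping": {"mean": 4057.0, "stdev": 9368.0},
}


def coefficient_of_variation(stats):
    """Standard deviation relative to the mean: a unitless measure of spread."""
    return stats["stdev"] / stats["mean"]


esp = latency_ms["FAST ESP w/ disk"]
oracle = latency_ms["Oracle w/ memory mapping"]
mean_ratio = oracle["mean"] / esp["mean"]

print(f"mean latency ratio (Oracle : ESP) = {mean_ratio:.1f}")
for name, stats in latency_ms.items():
    print(f"{name}: st.dev. / mean = {coefficient_of_variation(stats):.2f}")
```

A standard deviation larger than the mean, as in the Oracle figures, would be consistent with a small number of very slow queries dominating the average, although the slide does not show the per-query distribution.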

  18. Relational Model • Disruptive Change • Queries that fit The Model • Queries that don’t fit The Model • Alternative I: star, snowflake schemas++; cubes / datamarts++; incremental fixes to painful shortcomings; adds complexity • Alternative II: schema agnostic; scalable ad-hoc querying; BLOBs → Contextual Insight; real-time fusion of disparate data models; massive fault-tolerant scalability

  19. Extreme Capabilities: ESP Design Targets • Contextual Insight • Value/Noise (SNR) • User Interaction • Contextual Refinement • Powering Search Derivative Applications (SDAs) • Game changer driven by extreme retrieval and on-the-fly analytics

  20. ESP Database Query Offloading • Example: AutoTrader.com • Car Dealers - Product Supply • RDBMS: HW cost $320K (32 CPUs on 4 Sun servers); 90% sub-second query response, average = 12 s for the rest; relevance = sorting; 5 FTE to maintain • ESP: HW cost $90K; 100% sub-second query response; flexible relevance and discovery; 0.5 FTE to maintain

  21. Content Scalability: RDBMS vs ESP • Examples of ESP deployments • Compliance case: • 50B documents @ 80k average • → 4 PB (around 100 web indexes) • Storage: • Intelligent content addressable storage • XML metadata and full content • EMC Centera: N * 256TB (N=1..400) • Webmining – Webfountain: • 60,000 : 1 in query capacity (ESP : DB)

  22. Intelligent Storage: Storage and Search Unite • Discover • Simple • Scalable • Secure
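
The storage figures on the content scalability slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "80k" means 80 kilobytes and that TB and PB are decimal units; the slide states neither, and the per-index size at the end is only what the slide's own "around 100 web indexes" comparison implies.

```python
# Assumption: "80k average" means 80 kilobytes, and units are decimal.
KB = 10**3
TB = 10**12
PB = 10**15

documents = 50 * 10**9          # 50B documents in the compliance case
avg_doc_bytes = 80 * KB

total_bytes = documents * avg_doc_bytes
total_pb = total_bytes / PB

# "around 100 web indexes" implies roughly this much per web index.
implied_web_index_tb = total_bytes / 100 / TB

# EMC Centera: N * 256TB with N = 1..400, so the largest configuration is:
centera_max_tb = 400 * 256
centera_max_pb = centera_max_tb * TB / PB

print(f"compliance case: {total_pb:.0f} PB")
print(f"implied web index size: {implied_web_index_tb:.0f} TB")
print(f"largest Centera configuration: {centera_max_pb:.1f} PB")
```

Under these assumptions the 4 PB total on the slide is reproduced exactly, and the largest Centera configuration listed would hold it many times over.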

  23. From ACCESS To INSIGHT • Contextual Search • “Best of Web”: Recommender / Authority • “Best of Enterprise”: Linguistic / Statistic • “Any new suspicious financial transaction patterns?” • “Where is the email from Peter about ROI analysis?” • FIND • EXPLORE • Contextual Relevance • Contextual Navigation • Contextual fact discovery • On-the-fly metadata analysis

  24. Turning around the Pyramid: HBZ.de – Leading German Library Service Center • From: Librarians • To: Researchers • Single Field Search • Querying • FAST ESP • WWW (HTML, XML, WML, JavaScript) • SQL • LIB … • DB DB DB DB DB • STRUCTURED

  25. ESP @ SCOPUS • >200M articles / 180M citations • 180TB capacity / 14000 journals • David Goodman standing up and declaring in public that Scopus is the best-designed database he's ever seen …

  26. Relevance Drives Revenue • Search Reduces Clicks to Purchase and Browsing … and Drives Revenue • Reduced # of clicks to buy content from > 4 to < 2 • 50% reduction in ringtone browsing • 100% increase in search • 20% increase in ringtone revenue • [Charts, Week 1 to Week 10, with “Launched search” marked: clicks to purchase; search page views per sale; browsing; revenue]

  27. ØKOKRIM Business Analytics: Processing of real-time streams • Example: Norwegian Customs Foreign Exchange Transaction Monitoring • [Architecture diagram: Security Access Module (ACL Monitor, User Monitor); Real-time Registration; Queries; Message Queue; Results; Alerts; Database connector; Transaction Log; Data Validation; Firewall; Firewall]

  28. Technology Maturity ... RDBMS vs ESP

  29. Business Intelligence: ESP vs. RDBMS Technology • OBSERVATION: The Enterprise Search Platform (ESP), a relatively new concept, integrating advanced technologies typically associated with search engines, database tools, and analytical systems, is fast becoming able to solve modern business intelligence problems (using both structured and unstructured data) in a way that is fundamentally different from, and ultimately superior to, that of other currently available analytical or database software. • PREDICTION: Enterprise Search Platform and search-centric application technology represents a true paradigm shift in the way data will be stored, analyzed and reported on in the future. Resulting realignments in the marketplace may be both rapid and tumultuous. - Chief strategist, leading BI vendor

  30. If your only tool is a hammer .... ... every problem looks like a nail

  31. UIMA: Architecture

  32. Text → Structure • <Category>FINANCIAL</Category> <Author>George Stein</Author> • BC-dynegy-enron-offer-update5 Dynegy May Offer at Least $8 Bln to Acquire Enron (Update5) By George Stein SOURCE c.2001 Bloomberg News BODY • <Company>Dynegy Inc</Company> <Person>Roger Hamilton</Person> <Company>John Hancock Advisers Inc.</Company> • <PersonPositionCompany> <OFFLEN OFFSET="3576" LENGTH="63" /> <Person>Roger Hamilton</Person> <Position>money manager</Position> <Company>John Hancock Advisers Inc.</Company> </PersonPositionCompany> • ……. ``Dynegy has to act fast,'' said Roger Hamilton, a money manager with John Hancock Advisers Inc., which sold its Enron shares in recent weeks. ``If Enron can't get financing and its bonds go to junk, they lose counterparties and their marvelous business vanishes.'' Moody's Investors Service lowered its rating on Enron's bonds to ``Baa2'' and Standard & Poor's cut the debt to ``BBB.'' in the past two weeks. …… • Fact • <Company>Enron Corp</Company> <Company>Moody's Investors Service</Company> • <CreditRating> <OFFLEN OFFSET="3814" LENGTH="61" /> <Company_Source>Moody's Investors Service</Company_Source> <Company_Rated>Enron Corp</Company_Rated> <Trend>downgraded</Trend> <Rank_New>Baa2</Rank_New> <__Type>bonds</__Type> </CreditRating> • Event

  33. The BI “hammer” Approach • Document Vector: Antibiotics, Peptidyl, Eubacteria, RNA, Mg, … • SVD Analysis: ( λ1, λ2, ..., λn ) • { λ1, λ2, ..., λn, Structured attributes }
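
Once text has been annotated this way, the annotation can be handled like any structured record. The following is a minimal sketch, not the ESP implementation, that loads the CreditRating annotation from the slide into a flat dictionary with Python's standard XML parser. It assumes the flattened "OFFLENOFFSET" on the slide was an OFFLEN element with OFFSET and LENGTH attributes, and that the run-together values originally contained spaces; the function and key names are illustrative.

```python
import xml.etree.ElementTree as ET

# CreditRating annotation from the slide, with whitespace restored.
# Assumption: "OFFLENOFFSET" was an <OFFLEN> element carrying OFFSET and
# LENGTH attributes that point back into the source text.
ANNOTATION = """
<CreditRating>
  <OFFLEN OFFSET="3814" LENGTH="61" />
  <Company_Source>Moody's Investors Service</Company_Source>
  <Company_Rated>Enron Corp</Company_Rated>
  <Trend>downgraded</Trend>
  <Rank_New>Baa2</Rank_New>
  <__Type>bonds</__Type>
</CreditRating>
"""


def annotation_to_record(xml_text):
    """Flatten one annotation element into a dict of field name -> value."""
    root = ET.fromstring(xml_text.strip())
    record = {"annotation_type": root.tag}
    for child in root:
        if child.attrib:
            # Span marker: keep the numeric offset and length.
            for key, value in child.attrib.items():
                record[key.lower()] = int(value)
        else:
            # Field element: normalise the tag name and keep the text.
            record[child.tag.strip("_").lower()] = child.text
    return record


record = annotation_to_record(ANNOTATION)
for field, value in record.items():
    print(f"{field}: {value}")
```

The resulting record has one entry per annotated field, so it could be filtered or sorted on like a database row while the offset and length still point back to the sentence it came from.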

  34. Contextual Refinement: ETL and Semantic understanding unite • Direct access to RDBMSs for info from some Telcos • XML feed from other Telcos • Flat files (CSV or fixed) from the ‘laggards’ • Logic for cleansing • ESP lookup • Ordered hits (by quality) • Cleansed data to ESP (XML) • Ambiguous data (close hits or unidentified) • Clean data • ‘Error’ database for manual inspection, correction, storage/learning • Master database for persistent storage
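
The cleansing flow on this slide (look up each incoming record, order the hits by quality, send clean matches to the master database and ambiguous ones to the error database) can be sketched in a few lines. This is an illustration under stated assumptions rather than how ESP does it: the lookup is replaced by `difflib.SequenceMatcher` from the Python standard library, and the master list, the 0.9 threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical master list and threshold; the slide gives neither.
MASTER = ["Will's Barber Shop", "Taylor Guitars", "Gibson Guitar Showroom"]
CLEAN_THRESHOLD = 0.9


def ranked_hits(name, master=MASTER):
    """Return (score, candidate) pairs ordered by match quality, best first."""
    scored = [
        (SequenceMatcher(None, name.lower(), candidate.lower()).ratio(), candidate)
        for candidate in master
    ]
    return sorted(scored, reverse=True)


def route(name):
    """Send a record to the master database or to the error database."""
    score, best = ranked_hits(name)[0]
    if score >= CLEAN_THRESHOLD:
        return ("master", best)   # clean data: store the canonical name
    return ("error", name)        # close hit or unidentified: manual inspection


for incoming in ["Wills Barber Shop", "Acme Plumbing"]:
    print(incoming, "->", route(incoming))
```

A single threshold keeps the sketch short; a real deployment would presumably tune it per source and feed manual corrections back into the lookup, as the "storage/learning" label on the slide suggests.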

  35. Contextual Insight: Query-time fact analysis @ sub-document level • “…entry probe carried to [Saturn]’s moon Titan as part of the…” • Intent • Concepts

  36. Contextual Navigation: ThisIsTravel • Automated visitor ratings

  37. Revisit the Assumptions … • Scalable Search • SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03 • 80% unstructured • [Chart: data volume in gigabytes over time. Milestones: cave paintings and bone tools, 40,000 BCE; writing, 3500 BCE; paper, 105; printing, 1450; electricity and telephone, 1870; transistor, 1947; computing, 1950; Internet (DARPA), late 1960s; the Web, 1993. Data points: 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B]
