
Proxy-Server Architectures for OLAP

Panos Kalnis, Dimitris Papadias

THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY


The Problem

  • Data warehouses: Large repositories of historical summarized information

    • Distribution: centralized or decentralized, but the structure is static

  • WWW: new opportunities to access warehouses. Example: stock market data

    • Professional brokers: access the warehouse directly through special-purpose OLAP software

    • Individual investors around the world: use web browsers. Slow network? Server overloading? Caching?

[Figure: London Stock Market Warehouse, reached over the Internet by OLAP clients in Tokyo, Singapore, and Hong Kong]



OLAP Cache Servers (OCS)

  • Similar to WWW Proxy-Servers

  • Geographically distributed and connected through an arbitrary network

  • They cache results from OLAP queries

  • Can derive new results from the cached data

  • Clients connect to an OCS. If the OCS cannot answer, the query is redirected to a neighbor OCS or to the warehouse

  • Result: Lower network cost, better scalability, lower response time

[Figure: OLAP clients connect to OCSs, which reach the London Stock Market Warehouse over the Internet]


OCS vs. WWW Proxy-Servers

  • OCS has computational capabilities.

  • The cache admission and replacement policies are optimized for OLAP operations.

  • OCS can update its contents incrementally, instead of invalidating the cached data


Background

  • Data Cube Lattice: Interdependencies among views

    SELECT P_id, T_id, SUM(Sales)
    FROM data
    GROUP BY P_id, T_id

  • Client-Server OLAP Caching

    • Watchman: Semantic caching

    • Dynamat: Stores fragments

    • Caching chunks

  • OCSs may use any of these methods

    • The prototype caches entire views
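The lattice idea above can be illustrated with a small sketch: a view grouped by a set of dimensions can be derived from any cached view whose grouping dimensions are a superset. The function names `can_derive` and `roll_up`, the in-memory row layout, and the sample data are assumptions made for illustration; they are not taken from the presentation.

```python
from collections import defaultdict


def can_derive(target_dims, source_dims):
    """A view is derivable from a source view that groups by a superset of its dimensions."""
    return set(target_dims) <= set(source_dims)


def roll_up(rows, source_dims, target_dims):
    """Re-aggregate SUM(Sales) from a finer view into a coarser one.

    rows: iterable of tuples (dim values in source_dims order..., sales).
    """
    if not can_derive(target_dims, source_dims):
        raise ValueError("target view is not derivable from the source view")
    positions = [source_dims.index(d) for d in target_dims]
    totals = defaultdict(int)
    for *key, sales in rows:
        totals[tuple(key[i] for i in positions)] += sales
    return dict(totals)


# Hypothetical cached result of: SELECT P_id, T_id, SUM(Sales) ... GROUP BY P_id, T_id
pt_view = [("p1", "t1", 10), ("p1", "t2", 5), ("p2", "t1", 7)]

# Derive the coarser view GROUP BY P_id without contacting the warehouse.
by_product = roll_up(pt_view, ["P_id", "T_id"], ["P_id"])
```

This only covers distributive aggregates such as SUM; other aggregate functions would need different handling.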


System Architecture

  • Multiple levels of caching

  • Cooperation among OCSs

  • Physical organization and fragmentation may differ in each OCS

  • Centralized: Query optimization and cache control in a central site (intranet)

  • Semi-centralized: Only query optimization in central site. Each OCS controls its local cache

  • Autonomous: All decisions are taken locally (internet)


Query Optimizer

  • A client sends a query q

    Autonomous policy:

    • OCS has the exact answer

    • OCS cannot answer q

    • OCS can derive q

Cost = Read + Transfer
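A minimal sketch of how an autonomous OCS might choose where to answer q, using Cost = Read + Transfer. The `Candidate` record, the row-count read cost, and the per-row transfer cost are hypothetical simplifications, not the cost model of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    site: str                 # "local", a neighbor OCS, or the warehouse
    view_dims: frozenset      # grouping dimensions of the view held at that site
    view_rows: int            # rows that must be read to answer from this view
    transfer_per_row: float   # assumed network cost per result row (0.0 if local)


def plan(query_dims, result_rows, candidates):
    """Return (cost, site) of the cheapest site that can answer or derive the query."""
    best = None
    for c in candidates:
        if not query_dims <= c.view_dims:
            continue  # this site can neither answer nor derive q
        cost = c.view_rows + result_rows * c.transfer_per_row  # Read + Transfer
        if best is None or cost < best[0]:
            best = (cost, c.site)
    return best


candidates = [
    Candidate("local", frozenset({"P_id", "T_id"}), 10_000, 0.0),  # can derive q
    Candidate("neighbor", frozenset({"P_id"}), 100, 2.0),          # exact answer
    Candidate("warehouse", frozenset({"P_id"}), 100, 50.0),        # exact answer
]
choice = plan(frozenset({"P_id"}), 100, candidates)
```

With these made-up numbers, deriving locally costs 10000, the neighbor costs 100 + 200 = 300, and the warehouse costs 100 + 5000 = 5100, so the query is redirected to the neighbor.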


Query Optimizer (cont.)

  • Autonomous: Scalable, easy to implement, high availability.

    • Large, unstructured, dynamic environments

    • BUT may produce inefficient plans

  • Centralized (and semi-centralized):

    • A central site has global information for all OCSs.

    • Creates the execution and routing plan for all queries

    • Low availability, low scalability

    • Suitable for intranets


Caching Policy: Autonomous

  • Lower Benefit First (LBF): considers interdependencies, but:

    • Cost() is difficult to calculate: if v cannot be answered locally, we assume that it is answered by the warehouse

    • The complexity of LBF grows quadratically with the number of materialized views

  • A set of views is evicted from the cache only if its combined benefit < benefit(u). The victim set is selected with an idea similar to [HRU96]
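The admission test above can be sketched as follows. This is a simplified illustration, not LBF itself: it assumes u is the result being considered for admission, treats each benefit as a fixed number (the real policy accounts for interdependencies among views), and picks victims greedily in order of increasing benefit. The function name and cache layout are hypothetical.

```python
def choose_victims(cache, capacity, u_size, u_benefit):
    """Pick a victim set that makes room for u, or return None to reject u.

    cache: dict mapping view name -> (size, benefit).
    """
    used = sum(size for size, _ in cache.values())
    need = used + u_size - capacity  # space that must be freed
    victims, freed, combined = [], 0, 0.0
    # Greedy assumption: consider the lowest-benefit views first.
    for name, (size, benefit) in sorted(cache.items(), key=lambda kv: kv[1][1]):
        if freed >= need:
            break
        victims.append(name)
        freed += size
        combined += benefit
    # Admit u only if enough space is freed and the victims are worth less than u.
    if freed < need or combined >= u_benefit:
        return None
    return victims


cache = {"a": (40, 5.0), "b": (30, 2.0), "c": (30, 9.0)}
admitted = choose_victims(cache, capacity=100, u_size=50, u_benefit=8.0)
rejected = choose_victims(cache, capacity=100, u_size=50, u_benefit=6.0)
```

In the first call the victims "b" and "a" have a combined benefit of 7.0, below 8.0, so u is admitted; in the second call the same victim set is worth more than u, so the cache is left unchanged.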


Caching Policy: Centralized

  • All the decisions are taken at the central site

  • The centralized policy uses Smaller Penalty First (SPF)

    • Experiments show that the difference between SPF and LBF is not significant

  • In general: a bad decision by the caching algorithm does not affect performance significantly, BUT a bad decision by the optimizer has a significant impact


Updates

  • Changes are propagated periodically to the warehouse. It computes deltas for its materialized views

  • No down time for the OCSs

  • OCS updates its cache on-demand: Invalidate vs. incrementally update

  • Deltas are treated as normal data

  • Deltas are evicted at the end of the update period

  • Non-updated results are also evicted
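The "incrementally update" option can be illustrated with a short sketch. It assumes a cached view of SUM aggregates held as a dictionary and a delta that carries signed changes per group; both representations, and the function name `apply_delta`, are assumptions made for illustration.

```python
def apply_delta(cached_view, delta):
    """Return the cached SUM view brought up to date with a delta.

    cached_view, delta: dicts mapping a group key -> summed value.
    The inputs are left unmodified.
    """
    updated = dict(cached_view)
    for key, change in delta.items():
        updated[key] = updated.get(key, 0) + change
    return updated


# Hypothetical cached view GROUP BY P_id, and the delta for the same view.
cached = {("p1",): 15, ("p2",): 7}
delta = {("p1",): 3, ("p3",): 4}
fresh = apply_delta(cached, delta)
```

Under these assumptions the OCS keeps serving the old result until a query asks for the view, merges the delta on demand, and avoids re-fetching the whole view from the warehouse.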



Experimental Setup

  • APB and TPC-H

  • Cmax = maximum cache size as a percentage of the entire cube

  • 1500 queries at each OCS

[Figure: DCSR vs. Cmax. Labels: Worst case, OCS configuration, Client-Side-Cache]


Effect of Network Cost

  • 3 OCSs – we vary the speed of the links to the DW

  • In slow networks, OCSs utilize the contents of their neighbors

  • In fast networks, many queries reach the warehouse, because the computation cost is lower

[Figures: DCSR vs. Cmax; Warehouse Hit Ratio vs. Cmax]


Autonomous vs. Semi-centralized

[Figures: DCSR vs. tightness; DCSR vs. number of OCSs]

  • Centralized  Semi-Centralized

  • High tightness or many OCSs  Autonomous  Semi-Centralized

[Figure label: 100 OCSs]


Conclusions

  • OCS: Architecture for caching OLAP results

  • Beneficial for ad-hoc, geographically distributed and possibly mobile users, who sporadically need to access a warehouse

  • Complementary to both client-side-cache systems and distributed OLAP approaches

  • Future work: Prototype on top of a DBMS, support of multiple DWs, finer granularity of cached data, special queries.

