Proxy-Server Architectures for OLAP

Panos Kalnis, Dimitris Papadias
The Hong Kong University of Science and Technology

The Problem

  • Data warehouses: Large repositories of historical summarized information

    • Distributed: Centralized or decentralized. Static structure!

  • WWW: New opportunities to access warehouses. Example: Stock market data

    • Professional brokers: Access the warehouse directly through special-purpose OLAP software

    • Individual investors around the world: Use web browsers. Slow network? Server overloading? Caching?

[Figure labels: London Stock Market Warehouse; OLAP clients; Hong Kong]


OLAP Cache Servers (OCS)

  • Similar to WWW Proxy-Servers

  • Geographically spanned and connected through an arbitrary network

  • They cache results from OLAP queries

  • Can derive new results from the cached data

  • Clients connect to an OCS. If the OCS cannot answer, the query is redirected to a neighbor OCS or to the warehouse

  • Result: Lower network cost, better scalability, lower response time

[Figure labels: London Stock Market Warehouse; OLAP clients]

OCS vs. WWW Proxy-Servers

  • OCS has computational capabilities.

  • The cache admission and replacement policies are optimized for OLAP operations.

  • OCS can update its contents incrementally, instead of invalidating the cached data

Background

  • Data Cube Lattice: Interdependencies among views

    SELECT P_id, T_id, SUM(Sales)
    FROM data
    GROUP BY P_id, T_id

  • Client-Server OLAP Caching

    • WATCHMAN: Semantic caching

    • DynaMat: Stores fragments

    • Caching chunks

  • OCSs may use any of these methods

    • The prototype caches entire views
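
The interdependencies among views in the data cube lattice can be illustrated with a small sketch: a view grouped on a set of dimensions can be computed from any cached view grouped on a superset of those dimensions. The Python below is only an illustration under assumptions; the sample rows, the C_id dimension, and the helper names can_derive and roll_up are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

def can_derive(target_dims, source_dims):
    """A view is derivable from another if its group-by set is a subset."""
    return set(target_dims) <= set(source_dims)

def roll_up(rows, source_dims, target_dims):
    """Aggregate SUM(Sales) from a finer view down to a coarser one."""
    if not can_derive(target_dims, source_dims):
        raise ValueError("target view is not derivable from source view")
    # Positions of the target dimensions inside each source row.
    idx = [source_dims.index(d) for d in target_dims]
    totals = defaultdict(int)
    for *key, sales in rows:
        totals[tuple(key[i] for i in idx)] += sales
    return dict(totals)

# Hypothetical cached view: GROUP BY P_id, T_id (rows are P_id, T_id, SUM(Sales)).
view_pt = [("p1", "t1", 10), ("p1", "t2", 5), ("p2", "t1", 7)]

# Derive GROUP BY P_id from the cached view instead of asking the warehouse.
by_product = roll_up(view_pt, ["P_id", "T_id"], ["P_id"])
```

Under these assumptions, an OCS that caches the (P_id, T_id) view can answer a query on P_id alone, but not a query on a dimension the cached view lacks.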

System Architecture

  • Multiple levels of caching

  • Cooperation among OCSs

  • Physical organization and fragmentation may differ in each OCS

  • Centralized: Query optimization and cache control in a central site (intranet)

  • Semi-centralized: Only query optimization in central site. Each OCS controls its local cache

  • Autonomous: All decisions are taken locally (internet)
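
The three control schemes differ only in where query optimization and cache control take place, which can be written down as a small table. This is a sketch for orientation; the Site and Architecture types are hypothetical names, not part of the prototype.

```python
from dataclasses import dataclass
from enum import Enum

class Site(Enum):
    CENTRAL = "central site"
    LOCAL = "each OCS"

@dataclass(frozen=True)
class Architecture:
    name: str
    query_optimization: Site  # who builds the execution and routing plan
    cache_control: Site       # who decides admission and replacement

ARCHITECTURES = [
    Architecture("centralized", Site.CENTRAL, Site.CENTRAL),
    Architecture("semi-centralized", Site.CENTRAL, Site.LOCAL),
    Architecture("autonomous", Site.LOCAL, Site.LOCAL),
]

# Schemes in which every OCS controls its own cache.
local_cache_control = [a.name for a in ARCHITECTURES if a.cache_control is Site.LOCAL]
```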

Query Optimizer

  • A client sends a query q to an OCS

  • Autonomous policy: the OCS distinguishes three cases:

    • The OCS has the exact answer

    • The OCS cannot answer q

    • The OCS can derive q from its cached data

Cost = Read + Transfer
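
The three cases of the autonomous policy can be sketched as a comparison of plans under Cost = Read + Transfer. This is a minimal illustration, not the authors' optimizer: the Plan type, the numeric costs, and the rules "use a cached exact answer directly" and "otherwise pick the cheapest plan" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Plan:
    source: str      # "local", a neighbor OCS, or "warehouse" (hypothetical labels)
    read: float      # cost of reading (and aggregating) the data at the source
    transfer: float  # cost of shipping the result to the client's OCS

    @property
    def cost(self) -> float:
        return self.read + self.transfer

def choose_plan(exact: Optional[Plan], derived: Optional[Plan], remote: List[Plan]) -> Plan:
    """Pick a plan for query q at one OCS.

    exact:   local plan if the OCS caches the exact answer, else None
    derived: local plan if q can be derived from a cached view, else None
    remote:  plans that redirect q to a neighbor OCS or to the warehouse
    """
    if exact is not None:
        return exact                    # case: exact answer is cached (assumed always best)
    candidates = list(remote)
    if derived is not None:
        candidates.append(derived)      # case: local derivation competes with redirection
    if not candidates:
        raise ValueError("no plan can answer the query")
    return min(candidates, key=lambda p: p.cost)  # case: cannot answer locally, redirect

# Hypothetical costs: deriving locally is possible but expensive.
remote = [Plan("neighbor", read=40, transfer=15), Plan("warehouse", read=5, transfer=80)]
best = choose_plan(None, Plan("local", read=70, transfer=0), remote)
```

With these made-up numbers the neighbor (cost 55) beats both local derivation (70) and the warehouse (85), so the query is redirected even though the OCS could derive it.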

Query Optimizer (cont.)

  • Autonomous: Scalable, easy to implement, high availability.

    • Large, unstructured, dynamic environments

    • BUT may produce inefficient plans

  • Centralized (and semi-centralized):

    • A central site has global information for all OCSs.

    • Creates the execution and routing plan for all queries

    • Low availability, low scalability

    • Suitable for intranets

Caching Policy: Autonomous

  • Lower Benefit First (LBF): Considers interdependencies, but:

    • Cost() is difficult to calculate; if v cannot be answered locally, we assume that it is answered by the warehouse

    • The complexity of LBF grows quadratically with the number of materialized views

  • We evict a set from the cache if its combined benefit < benefit(u). Selecting the victim set: similar idea to [HRU96]
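
The eviction rule can be sketched as follows. The slide does not say how benefit is computed or how the victim set is chosen beyond the pointer to [HRU96], so this example assumes that benefits and sizes are given as plain numbers, that u is a result being considered for admission, and that victims are picked greedily in increasing order of benefit until u fits. It illustrates the admission test only and is not the LBF algorithm itself; Cached and select_victims are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Cached:
    name: str
    size: int
    benefit: float

def select_victims(cache: List[Cached], capacity: int, u: Cached) -> Optional[List[Cached]]:
    """Return the views to evict so that u fits, or None if u is rejected."""
    free = capacity - sum(v.size for v in cache)
    victims = []
    # Assumed greedy choice: lowest-benefit views are considered first.
    for v in sorted(cache, key=lambda v: v.benefit):
        if free >= u.size:
            break
        victims.append(v)
        free += v.size
    if free < u.size:
        return None  # u does not fit even after evicting everything
    # Admit u only if the victims are jointly worth less than u.
    if not victims or sum(v.benefit for v in victims) < u.benefit:
        return victims
    return None

cache = [Cached("a", 4, 10.0), Cached("b", 3, 2.0), Cached("c", 3, 3.0)]
evicted = select_victims(cache, capacity=10, u=Cached("u", 5, 6.0))
rejected = select_victims(cache, capacity=10, u=Cached("u", 5, 4.0))
```

In this made-up cache, a candidate with benefit 6 displaces b and c (combined benefit 5), while a candidate with benefit 4 is rejected.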

Caching Policy: Centralized

  • All the decisions are taken at the central site

  • Centralized policy uses Smaller Penalty First (SPF)

    • Experiments show that the difference between SPF and LBF is not significant

  • In general: A bad decision of the caching algorithm does not affect the performance significantly BUT a bad decision of the optimizer has significant impact

Updates

  • Changes are propagated periodically to the warehouse, which computes deltas for its materialized views

  • No down time for the OCSs

  • OCS updates its cache on-demand: Invalidate vs. incrementally update

  • Deltas are treated as normal data

  • Deltas are evicted at the end of the update period

  • Non-updated results are also evicted
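
The difference between invalidating a cached result and updating it incrementally can be illustrated for SUM aggregates, where a delta is simply added to the cached totals. The dictionary layout and the apply_delta helper are assumptions for illustration; the slides do not describe the prototype's update code.

```python
def apply_delta(cached_view, delta):
    """Merge a delta into a cached SUM view without recomputing it.

    Both arguments map a group-by key to SUM(Sales); the delta holds the
    sums of the newly propagated changes for the same group-by set.
    """
    updated = dict(cached_view)
    for key, change in delta.items():
        updated[key] = updated.get(key, 0) + change
    return updated

# Hypothetical cached view on (P_id, T_id) and the delta for one update period.
cached = {("p1", "t1"): 10, ("p2", "t1"): 7}
delta = {("p1", "t1"): 3, ("p3", "t2"): 4}

# Incremental update: the OCS keeps serving from its cache, now refreshed.
fresh = apply_delta(cached, delta)
```

Invalidation would instead drop the cached entry, so the next query on that view has to be answered by a neighbor OCS or the warehouse.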

Experimental Setup

  • APB and TPC-H

  • Cmax = maximum cache size as a percentage of the entire cube

  • 1500 queries at each OCS

[Figure labels: OCS configuration; DCSR vs. Cmax; Worst case]

Effect of Network Cost

  • 3 OCSs; we vary the speed of the links to the data warehouse (DW)

  • In slow networks, OCSs utilize the contents of their neighbors

  • In fast networks, many queries reach the warehouse, because the computation cost is lower

[Figure panels: DCSR vs. Cmax; Warehouse Hit Ratio vs. Cmax]

Autonomous vs. Semi-centralized

  • Centralized  Semi-Centralized

  • High tightness or many OCSs  Autonomous  Semi-Centralized

[Figure panels: DCSR vs. tightness; DCSR vs. # of OCSs; label: 100 OCSs]

Conclusions

  • OCS: Architecture for caching OLAP results

  • Beneficial for ad-hoc, geographically spanned and possibly mobile users, who sporadically need to access a warehouse

  • Complementary to both client-side-cache systems and distributed OLAP approaches

  • Future work: Prototype on top of a DBMS, support of multiple DWs, finer granularity of cached data, special queries.