Proxy-Server Architectures for OLAP

Panos Kalnis, Dimitris Papadias
The Hong Kong University of Science and Technology

The Problem
  • Data warehouses: Large repositories of historical summarized information
    • Distributed: Centralized or decentralized. Static structure!
  • WWW: New opportunities to access warehouses. Example: Stock market data
    • Professional brokers: Access the warehouse directly through special-purpose OLAP software
    • Individual investors around the world: Use web browsers. Slow network? Server overloading? Caching?

[Figure: OLAP clients; London Stock Market Warehouse; Hong Kong]

OLAP Cache Servers (OCS)
  • Similar to WWW Proxy-Servers
  • Geographically dispersed and connected through an arbitrary network
  • They cache results from OLAP queries
  • Can derive new results from the cached data
  • Clients connect to an OCS. If the OCS cannot answer, the query is redirected to a neighbor OCS or to the warehouse
  • Result: Lower network cost, better scalability, lower response time
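
The redirection behaviour described in this slide can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `route` function, the node names, and the fixed lookup order (local cache, then direct neighbors, then the warehouse) are all hypothetical, and a query is treated as answerable only when it is cached verbatim.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical OCS (or warehouse) with a set of cached query results."""
    name: str
    cached: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)


def route(query, ocs, warehouse):
    """Return the name of the node that answers `query`.

    Assumed order: local cache first, then direct neighbors,
    and finally the warehouse as the fallback.
    """
    if query in ocs.cached:
        return ocs.name
    for neighbor in ocs.neighbors:
        if query in neighbor.cached:
            return neighbor.name
    return warehouse.name


# Hypothetical topology: two cache servers and one warehouse.
hong_kong = Node("OCS-HK", cached={"q1"})
neighbor = Node("OCS-2", cached={"q2"})
hong_kong.neighbors.append(neighbor)
london = Node("warehouse")

print(route("q1", hong_kong, london))
print(route("q2", hong_kong, london))
print(route("q3", hong_kong, london))
```

In this sketch only a query that no cache can serve reaches the warehouse, which is where the lower network cost in the slide would come from.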

[Figure: OLAP clients; London Stock Market Warehouse]

OCS vs. WWW Proxy-Servers
  • OCS has computational capabilities.
  • The cache admission and replacement policies are optimized for OLAP operations.
  • OCS can update its contents incrementally, instead of invalidating the cached data
  • Data Cube Lattice: Interdependencies among views

SELECT P_id, T_id, SUM(Sales)
FROM data
GROUP BY P_id, T_id
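
To illustrate the interdependencies in the data cube lattice, here is a minimal sketch of how a coarser view could be derived from a cached result of the query above. The function names `can_derive` and `roll_up`, the sample rows, and the restriction to SUM aggregates are assumptions made for this example.

```python
from collections import defaultdict


def can_derive(target_dims, cached_dims):
    """A view is derivable if its grouping columns are a subset of the cached view's."""
    return set(target_dims) <= set(cached_dims)


def roll_up(rows, cached_dims, target_dims):
    """Aggregate cached (key, sum) rows down to the coarser grouping `target_dims`."""
    positions = [cached_dims.index(d) for d in target_dims]
    totals = defaultdict(int)
    for key, total in rows:
        totals[tuple(key[i] for i in positions)] += total
    return dict(totals)


# Hypothetical cached result of: GROUP BY P_id, T_id
cached_dims = ("P_id", "T_id")
rows = [(("p1", "t1"), 10), (("p1", "t2"), 5), (("p2", "t1"), 7)]

if can_derive(("P_id",), cached_dims):
    print(roll_up(rows, cached_dims, ("P_id",)))
```

The view grouped by P_id alone is answered from the cached (P_id, T_id) view without contacting the warehouse, whereas a view grouped by a column that is not in the cached view could not be derived this way.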

  • Client-Server OLAP Caching
    • WATCHMAN: Semantic caching
    • DynaMat: Stores fragments
    • Caching chunks
  • OCSs may use any of these methods
    • The prototype caches entire views
System Architecture
  • Multiple levels of caching
  • Cooperation among OCSs
  • Physical organization and fragmentation may differ in each OCS
  • Centralized: Query optimization and cache control in a central site (intranet)
  • Semi-centralized: Only query optimization in central site. Each OCS controls its local cache
  • Autonomous: All decisions are taken locally (internet)
Query Optimizer
  • A client sends a query q to an OCS
  • Autonomous policy: the OCS distinguishes three cases
    • The OCS has the exact answer
    • The OCS can derive q
    • The OCS cannot answer q
  • Cost = Read + Transfer
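
One way to read Cost = Read + Transfer is as a per-candidate estimate that the OCS minimizes. The sketch below assumes a linear cost model that is not stated in the slides: reading costs one unit per row scanned, and transfer costs a per-row network price times the result size. The names `plan_cost` and `choose_source` and all numbers are hypothetical.

```python
def plan_cost(read_rows, result_rows, transfer_cost_per_row):
    """Cost = Read + Transfer under the assumed linear model."""
    return read_rows + result_rows * transfer_cost_per_row


def choose_source(candidates, result_rows):
    """Pick the cheapest candidate.

    `candidates` is a list of (name, rows_to_read, transfer_cost_per_row).
    A local source has a transfer cost of 0.
    """
    best = min(candidates, key=lambda c: plan_cost(c[1], result_rows, c[2]))
    return best[0]


# Hypothetical situation: q returns 100 rows.
candidates = [
    ("derive locally from a large view", 5000, 0),   # cost 5000
    ("neighbor OCS with the exact answer", 100, 10),  # cost 100 + 1000
    ("warehouse", 100, 20),                           # cost 100 + 2000
]
print(choose_source(candidates, result_rows=100))
```

With these numbers the neighbor wins even though the local OCS could derive q, which shows why the exact, derivable, and unanswerable cases need a cost comparison rather than a fixed preference.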

Query Optimizer (cont.)
  • Autonomous: Scalable, easy to implement, high availability.
    • Large, unstructured, dynamic environments
    • BUT may produce inefficient plans
  • Centralized (and semi-centralized):
    • A central site has global information for all OCSs.
    • Creates the execution and routing plan for all queries
    • Low availability, low scalability
    • Suitable for intranets
Caching Policy: Autonomous
  • Lower Benefit First (LBF): Considers interdependencies among views, but:
    • Cost() is difficult to calculate; if v cannot be answered locally, we assume that it is answered by the warehouse
    • The complexity of LBF grows quadratically with the number of materialized views
  • We evict a set of views from the cache if their combined benefit < benefit(u). The victim set is selected with an idea similar to [HRU96]
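
The eviction rule can be illustrated with a simplified sketch. It is not the authors' LBF algorithm: it assumes that u is a new result competing for cache space, that benefits are given as fixed numbers (so the interdependencies among views that LBF considers are ignored), and that the victim set is chosen greedily in order of increasing benefit. The names `select_victims` and `admit` are hypothetical.

```python
def select_victims(cache, needed_space):
    """Greedily pick lowest-benefit views until `needed_space` is freed.

    `cache` maps view name -> (size, benefit).
    Returns (victims, combined_benefit), or (None, 0.0) if space cannot be freed.
    """
    victims, freed, combined = [], 0, 0.0
    for name, (size, benefit) in sorted(cache.items(), key=lambda kv: kv[1][1]):
        if freed >= needed_space:
            break
        victims.append(name)
        freed += size
        combined += benefit
    if freed < needed_space:
        return None, 0.0
    return victims, combined


def admit(cache, capacity, name, size, benefit):
    """Admit view `name` only if the victims' combined benefit is lower than its own."""
    used = sum(s for s, _ in cache.values())
    needed = used + size - capacity
    if needed > 0:
        victims, combined = select_victims(cache, needed)
        if victims is None or combined >= benefit:
            return False  # keep the current cache contents
        for victim in victims:
            del cache[victim]
    cache[name] = (size, benefit)
    return True


cache = {"a": (40, 5.0), "b": (40, 2.0), "c": (20, 1.0)}
print(admit(cache, 100, "u", 30, 4.0), sorted(cache))
```

In the example, views b and c together are worth less than u, so both are evicted and u is admitted.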
Caching Policy: Centralized
  • All the decisions are taken at the central site
  • The centralized policy uses Smaller Penalty First (SPF)
    • Experiments show that the difference between SPF and LBF is not significant
  • In general: A bad decision of the caching algorithm does not affect the performance significantly BUT a bad decision of the optimizer has significant impact
  • Changes are propagated periodically to the warehouse. It computes deltas for its materialized views
  • No down time for the OCSs
  • OCS updates its cache on-demand: Invalidate vs. incrementally update
  • Deltas are treated as normal data
  • Deltas are evicted at the end of the update period
  • Non-updated results are also evicted
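
The choice between invalidating and incrementally updating a cached result can be sketched as below. The example assumes a SUM view stored as a mapping from group key to total and a delta with the same shape, which only works for additive aggregates; `apply_delta` and `refresh` are hypothetical names, not part of the described system.

```python
def apply_delta(cached_view, delta):
    """Merge a delta (group key -> change in SUM) into a cached SUM view."""
    updated = dict(cached_view)
    for key, change in delta.items():
        updated[key] = updated.get(key, 0) + change
    return updated


def refresh(cached_view, delta):
    """Update incrementally when a delta is at hand; otherwise invalidate (None)."""
    if delta is None:
        return None
    return apply_delta(cached_view, delta)


view = {("p1", "t1"): 10, ("p2", "t1"): 7}
delta = {("p1", "t1"): 3, ("p3", "t2"): 4}
print(refresh(view, delta))
print(refresh(view, None))
```

The incremental path touches only the groups that changed, so the OCS keeps serving the view without re-fetching it from the warehouse.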
Experimental Setup
  • APB and TPC-H
  • Cmax = maximum cache size as a percentage of the entire cube
  • 1500 queries at each OCS

[Figures: DCSR vs. Cmax; worst case; OCS configuration]

Effect of Network Cost
  • 3 OCSs – we vary the speed of the links to the DW
  • In slow networks, OCSs utilize the contents of their neighbors
  • In fast networks, many queries reach the warehouse, because the computation cost is lower

[Figures: DCSR vs. Cmax; Warehouse Hit Ratio vs. Cmax]

Autonomous vs. Semi-centralized
  • Centralized  Semi-Centralized
  • High tightness or many OCSs  Autonomous  Semi-Centralized

[Figures: DCSR vs. tightness; DCSR vs. number of OCSs; 100 OCSs]

Conclusions
  • OCS: Architecture for caching OLAP results
  • Beneficial for ad-hoc, geographically dispersed, and possibly mobile users who sporadically need to access a warehouse
  • Complementary to both client-side cache systems and distributed OLAP approaches
  • Future work: Prototype on top of a DBMS, support of multiple DWs, finer granularity of cached data, special queries.