adaptive data structures
Download
Skip this Video
Download Presentation
Adaptive Data Structures

Loading in 2 Seconds...

play fullscreen
1 / 45

Adaptive Data Structures - PowerPoint PPT Presentation


  • 85 Views
  • Uploaded on

Adaptive Data Structures. Towards Declarative Queries on. Simon Zeltser. Based on the article by Nicolas Bruno and Pablo Castro. Contents. 1. Introduction. 2. LINQ on Rich Data Structures. 3. LINQ Query Optimization. 4. Conclusions and Discussion. Introduction. THE PROBLEM

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Adaptive Data Structures' - aaron


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
adaptive data structures

Adaptive Data Structures

Towards Declarative Queries on

Simon Zeltser

Based on the article by Nicolas Bruno and Pablo Castro

contents
Contents

1

Introduction

2

LINQ on Rich Data Structures

3

LINQ Query Optimization

4

Conclusions and Discussion

Technion

introduction
Introduction
  • THE PROBLEM
    • There is an increasing number of applications that need to manage data outside the DBMS
    • Need for a solution to simplify the interaction between objects and data sources
    • Current solutions lack rich declarative query mechanism
  • THE NEED
    • Unified way to query various data sources
  • THE SOLUTION
    • LINQ (Language Integrated Query)

Technion

introduction1
Introduction
  • LINQ : Microsoft.NET 3.5 Solution
    • Accessing multiple data sources via the same API
    • Technology integrated into the programming language
    • Supports operations:
      • Traversal – grouping, joins
      • Filter – which rows
      • Projection –which columns

vargraduates =fromstudent in students

wherestudent.Degree = “Graduate”

orderbystudent.Name, student.Gender, student.Age

selectstudent;

  • BUT… The default implementation is simplistic
    • Appropriate for small ad-hoc structures in memory

Technion

introduction2
Introduction
  • THE GOAL OF THIS SESSION
    • Introduce LINQ key principles
    • Show model of customization of LINQ’s Execution Model on Rich Data Structures
    • Evaluate the results

Technion

linq high level architecture
LINQ – High Level Architecture

C# 3.0

Visual Basic

Other Languages…

.NET Language Integrated Query (LINQ)

LINQ – Enabled Data Sources

LINQ

To Objects

LINQ

To Entities

LINQ

To XML

LINQ

To Datasets

LINQ

To SQL

<xml

>

Objects

Databases

XML

Technion

compare two approaches
Compare two approaches
  • Iteration

List<String>matches = new List<String>();

// Find the matches

foreach(stringitemindata) {

if (item.StartsWith("Eric")){

matches.Add(item);

}

}

// Sort the matches

matches.Sort();

// Print out the matches

foreach(stringiteminmatches)

}

Console.WriteLine(item);

{

  • LINQ

// Find and sort matches

varmatches = fromn in data

wheren.StartsWith("Eric")

orderbyn

selectn;

// Print out the matches

foreach(varmatchinmatches)

}

Console.WriteLine(match);

{

Technion

language integration
Language Integration

Lambda

Expressions

Function

intStringLength(String s)

{

returns.Length();

{

Lambda Expression

s => s.Length();

Query

Syntax

var matches = from n in data

wheren.StartsWith("Eric")

orderby n

select n;

var matches = data

.Where(n => n.StartsWith("Eric"))

.OrderBy(n => n)

.Select(n => n)

Extension

Methods

publicstaticIEnumerable<TSource> Where<TSource> (thisIEnumerable<TSource> source, Func<TSource, bool> predicate)

var name = "Eric";

var age = 43;

var person = new { Name = "Eric", Age = 43 };

var names = new [] {"Eric", "Ryan", "Paul" };

foreach (var item in names)

Anonymous

Types

Technion

linq example
LINQ - Example

Lambda

Expressions

// Retrieve all CS students with more

// than 105 points

var query =

from stud in students

where ( stud.Faculty == “CS” && stud.Points > 105)

orderbystud.Pointsdescending

selectnew { Details = stud.Name + “:” + stud.Phone };

// Iterate over results

foreach(var student in query)

{

Console.WriteLine(student.Details);

}

Query

Syntax

Extension

Methods

Anonymous

Types

Technion

customizing linq execution model
Customizing LINQ Execution Model

EXPRESSION TREES

  • LINQ represents queries as in-memory abstract syntax tree
  • Query description and implementation are not tied together

THE PROBLEM

  • The default implementation of the operations uses fixed, general purpose algorithms

SUGGESTED SOLUTION

  • Change how the query is executed without changing how it’s expressed
  • Analyze alternative implementations of a given query and dynamically choose the most appropriate version depending on the context.

*

+

1

5

7

Technion

customizing linq execution model 2
Customizing LINQ Execution Model (2)

PROBLEM EXAMPLE

WHERE operator is implemented by performing a sequential scan over the input and evaluating the selection predicateon each tuple!

1

int[] A = {1, 2, 3, 10, 20, 30};

var q = from x in A

where x < 5

select 2*x;

foreach(intiin q)

Console.WriteLine(i);

var q = A.Where(x=>x<5)

.Select(x=>2*x);

2

IEnumerable<int> q = Enumerable.Project(

Enumerable.Where(A, AF1),

AF2);

bool AF1(int x) { return x<5; }

int AF2(int x) { return 2*x; }

3

Query Implementation:

IEnumerable<int> res = new List<int>();

foreach(int a in A)

if (AF1(a)) res.Add(AF2(a));

return res;

Technion

rich data structures dataset

ForeignKeyConstraint

Rich Data Structures - DataSet
  • In-memory cache of data
  • Typically populated from a database
  • Supports indexing of DataColumns via DataViews

DataSet object

DataTable object

DataTable object

DataRow

Unique

Constraint

Data

Column

Unique

Constraint

  • We will use LINQ on DataSet for demonstrating query optimization techniques

Technion

linq on rich data structures
LINQ on Rich Data Structures
  • Enable LINQ to work over DataSets.

EXAMPLE

    • Given R and S – two DataTables

from r in R.AsEnumerable()

join s in S.AsEnumerable()

on r.Field<int>(“x”) equals

s.Field<int>(“y”)

selectnew { a = r.Field<int>(“a”),

b = s.Field<int>(“b”) };

Compile and run-time phases on an implementation of our prototype

LINQ on DataSet

Standard

C# Code

Interm.

Language

Expression

Tree

Optimized

Expression

Tree

Interm.

Language

Compile Time

Run Time

Self-tuning

State

DataSet

Technion

expression tree optimizer
Expression Tree Optimizer

Our solution will be built according to the following architecture

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion

query cost estimator
Query Cost Estimator

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion

query estimation cost model
Query Estimation - Cost Model
  • Follow traditional database approach:
    • COST: {execution plans} -> [expected execution time]
    • Relies on:
      • a set of statistics maintained in DataTablesfor some of its columns
      • formulas to estimate selectivity of predicates and cardinality of sub-plans
      • formulas to estimate the expected costs of query execution for every operator

Technion

cardinality estimation
Cardinality Estimation
  • Returns an approximate number of rows that each operator in a query plan would output
    • To reduce the overhead, we will use only these statistical estimators:
      • maxVal – maximum number of distinct values
      • minVal– minimum number of distinct values
      • dVal – number of distinct values in a column
    • If statistics are unavailable, rely on “magic numbers” until automatically creation of statistics

Technion

predicate selectivity estimation
Predicate Selectivity Estimation
  • Let: σp(T ) be an arbitrary expression.
  • The cardinality of T is defined: Card(σp(T )) =sel(p)·
    • Under this definition we define:
  • COSTT(Execution Plan) = Σ (COST(p))
  • EXAMPLE:Consider full table scan of table T):

COST(T) = Card(T) * MEM_ACCESS_COST

For each p in {operators of T}

Average Cost Of Memory Access

Technion

predicate selectivity estimation1
Predicate Selectivity Estimation

Intuition:

We model sel(co<=c<=c1) as the probability to get a “c” value in interval [c0, c1] among all possible “c” values

c1

maxVal(c)

c

  • Let: σp(T ) be an arbitrary expression.
  • The cardinality of T is defined: Card(σp(T )) =sel(p)·
    • Under this definition we define:
  • COSTT(Execution Plan) = Σ (COST(p))
  • EXAMPLE:Consider full table scan of table T:

COST(T) = Card(T) * MEM_ACCESS_COST

c0

minVal(c)

For each p in {operators of T}

Average Cost Of Memory Access

Technion

predicate selectivity estimation2
Predicate Selectivity Estimation

Consider now a join predicate: T1 c1=c2T2

  • Card(T1 c1=c2 T2)=
  • Let: σp(T ) be an arbitrary expression.
  • The cardinality of T is defined: Card(σp(T )) =sel(p)·
    • Under this definition we define:
  • COSTT(Execution Plan) = Σ (COST(p))
  • EXAMPLE:Consider full table scan of table T):

COST(T) = Card(T) * MEM_ACCESS_COST

For each p in {operators of T}

Average Cost Of Memory Access

Technion

query analyzer
Query Analyzer

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion

execution alternatives
Execution Alternatives
  • Rely on indexes on DataColumns when possible
    • Example: σa=7∧(b+c)<20

5

Index on “a”

column

3

7

Alternative 1:

Alternative 2:

2

4

Full Table Scan

a=7

b+c < 20

Technion

analyzing execution plans
Analyzing Execution Plans
  • Global vs. Local Execution Plan – EXAMPLE:
  • Local Execution Plan
      • HashJoin?
      • IndexJoin?
      • MergeJoin?

Global Execution Plan

Technion

enumeration architecture
Enumeration Architecture
  • Two phases:
    • First phase: Join reordering based on estimated cardinalities
    • Second phase: Choose the best physical implementation for each operator
  • EXAMPLE:Suppose we analyze JOIN operator.
    • We evaluate the following JOIN implementations:
      • Hash Join
      • Merge Join (inputs must be sorted in the join columns)
      • Index Join (index on the inner join column must be available)
      • Other possible calculation options
  • Choose the alternative with the smallest cost

Technion

query analysis
Query Analysis

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion

self tuning organization
Self Tuning Organization
  • We want to reach the smallest query execution time.
    • Indexes can be used to speedup query execution
  • PROBLEM:
    • It might become problematic to forecast in advance what indexes to build for optimum performance
  • SOLUTION:
    • Continuous monitoring/tuning component that addresses the challenge of choosing and building adequate indexes and statistics automatically

Technion

self tuning organization example
Self Tuning Organization - Example
  • Consider the following execution plan:
  • The selection predicate Name=“Pam” over CustomersDataTable can be improved if an index on Customers(Name) is built
  • Both hash joins can be improved if indexes I2 and I3 are available, since we can transform hash join into index join

* The three sub-plans enclosed in

dotted lines might be improved if

suitable indexes were present

Technion

index tuning
Index Tuning
  • High-Level Description:
    • Identify a good set of candidate indexesthat would improve performance if they were available.
    • Later, when the optimized queries are evaluated, we aggregate the relative benefits of both candidate and existing indexes.
    • Based on this information, we periodically trigger index creations or deletions, taking into account storage constraints, overall utility of the resulting indexes, and the cost to creating and maintaining them.

Technion

index tuning algorithm
Index tuning algorithm
  • Notation:
    • H – a set of candidate indexes to materialize
    • T – task set for query qi
      • Ii – either a candidate or an existing index
      • δIi – amount that I would speed up query q

H (initially empty)

Task Set

I1, δI1

I2, δI2

. . …

In, δIn

Technion

index tuning algorithm1
Index tuning algorithm
  • Notation:
    • ΔI – value maintained for each index I
    • Materialized index – already created one
      • SELECT query: ΔI = ΔI + δI
      • UPDATE query: ΔI = ΔI – δI

H

I1

Task Set

I1, δI1

I1, δI1

I2, δI2

. . …

In, δIn

Technion

index tuning algorithm2
Index Tuning Algorithm
  • The purpose of ΔI:
  • We maintain ΔI on every query evaluation
  • If the potential aggregated benefit of materializing a candidate index exceeds its creation cost, we should create it, since we gathered enough evidence that the index is useful

Technion

index tuning algorithm3
Index tuning algorithm
  • Remove “bad” indexes phase
  • Notation:
    • Δmin– minimum Δ value for index I
    • Δmax– maximum Δ value for index I
    • BI – the cost of creating index I
    • Residual(I)= BI – (Δmax – Δ)

(the “slack” an index has before being deemed “droppable”)

IF (Residual(I)) <= 0) THEN Drop(I)

    • Net-Benefit(I) = (Δ-Δmin)-BI

(the benefit from creating the index)

IF (Net-Benefit(I) >= 0) THEN Add(I)

Technion

index tuning algorithm4
Index tuning algorithm
  • Notation:
    • ITM – all the indexes from H which creation is cost effective
    • ITD – subset of existing indexes such that:
      • ITD fits in existing memory
      • It’s still cost effective to create new index I after possibly dropping members from ITD
    • If creating index I is more effective than maintaining existing indexes in ITD, DROP(ITD) && CREATE(I)
    • Remove I from H (set of candidate indexes to materialize)

Technion

experimental evaluation
Experimental Evaluation

Consider the following schema:

checkCarts($1) =

from p inProducts.AsEnumerable()

join cart inCarts.AsEnumerable()

onp.Field<int>("id")

equalscart.Field<int>("p_id")

join c inCustomers.AsEnumerable()

oncart.Field<int>("cu_id")

equalsc.Field<int>("id")

where c.name = $1

selectnew { cart, p }

browseProducts($1) =

from p inProducts.AsEnumerable()

join c inCategories.AsEnumerable()

onp.Field<int>("ca_id") equals

c.Field<int>("id")

where c.par id = $1

select p

  • Generated:
    • 200,000 products
    • 50,000 customers
    • 1,000 categories
    • 5,000 items in the shopping carts

Possible Indexes

I1 Categories(par_id)

I2 Products(c_id)

I3 Carts(cu_id)

I4 Products(ca_id)

I5 Customers(name)

Technion

experimental evaluation cont
Experimental Evaluation – Cont.

Generated schedule when tuning was disabled

Technion

experimental evaluation cont1
Experimental Evaluation – Cont.

Generated schedule when tuning was enabled

Technion

summary
Summary
  • We’ve discussed:
    • LINQ – for declarative query formulation
    • DataSet - a uniform way of representing in-memory data.
    • A lightweight optimizer for automatically adjusting query execution strategies
  • Article’s main contribution:
    • NOT a new query processing technique
    • BUT: careful engineering of traditional database concepts in a new context

Technion

simon zeltser

Thank You !

Simon Zeltser

linq execution model
LINQ Execution Model
  • Compiler merges LINQ extension methods
  • Query syntax is converted to function calls and lambda expressions
  • Lambda expressions are converted to expression trees
  • Adds query operations to IEnumerable<T>
  • Expressions are evaluated at run-time
  • Parsed and type checked at compile-time
  • At compile time
  • Compiler finds a query pattern
  • Query is executed lazily
  • Compiler infers types produced by queries
  • Datasets are strongly typed
  • Operations on
  • data sets are strongly typed
  • Specialized or base
  • Can optimize and
  • re-write query
  • Expressions and operations
  • can execute remotely
  • At run-time, when results are used
  • We can force evaluations (ToArray())

Technion

ad