Adaptive data structures
This presentation is the property of its rightful owner.
Sponsored Links
1 / 45

Adaptive Data Structures PowerPoint PPT Presentation


  • 64 Views
  • Uploaded on
  • Presentation posted in: General

Adaptive Data Structures. Towards Declarative Queries on. Simon Zeltser. Based on the article by Nicolas Bruno and Pablo Castro. Contents. 1. Introduction. 2. LINQ on Rich Data Structures. 3. LINQ Query Optimization. 4. Conclusions and Discussion. Introduction. THE PROBLEM

Download Presentation

Adaptive Data Structures

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Adaptive data structures

Adaptive Data Structures

Towards Declarative Queries on

Simon Zeltser

Based on the article by Nicolas Bruno and Pablo Castro


Contents

Contents

1

Introduction

2

LINQ on Rich Data Structures

3

LINQ Query Optimization

4

Conclusions and Discussion

Technion


Introduction

Introduction

  • THE PROBLEM

    • There is an increasing number of applications that need to manage data outside the DBMS

    • Need for a solution to simplify the interaction between objects and data sources

    • Current solutions lack rich declarative query mechanism

  • THE NEED

    • Unified way to query various data sources

  • THE SOLUTION

    • LINQ (Language Integrated Query)

Technion


Introduction1

Introduction

  • LINQ : Microsoft.NET 3.5 Solution

    • Accessing multiple data sources via the same API

    • Technology integrated into the programming language

    • Supports operations:

      • Traversal – grouping, joins

      • Filter – which rows

      • Projection –which columns

        vargraduates =fromstudent in students

        wherestudent.Degree = “Graduate”

        orderbystudent.Name, student.Gender, student.Age

        selectstudent;

  • BUT… The default implementation is simplistic

    • Appropriate for small ad-hoc structures in memory

Technion


Introduction2

Introduction

  • THE GOAL OF THIS SESSION

    • Introduce LINQ key principles

    • Show model of customization of LINQ’s Execution Model on Rich Data Structures

    • Evaluate the results

Technion


Linq high level architecture

LINQ – High Level Architecture

C# 3.0

Visual Basic

Other Languages…

.NET Language Integrated Query (LINQ)

LINQ – Enabled Data Sources

LINQ

To Objects

LINQ

To Entities

LINQ

To XML

LINQ

To Datasets

LINQ

To SQL

<xml

>

Objects

Databases

XML

Technion


Compare two approaches

Compare two approaches

  • Iteration

    List<String>matches = new List<String>();

    // Find the matches

    foreach(stringitemindata) {

    if (item.StartsWith("Eric")){

    matches.Add(item);

    }

    }

    // Sort the matches

    matches.Sort();

    // Print out the matches

    foreach(stringiteminmatches)

    }

    Console.WriteLine(item);

    {

  • LINQ

    // Find and sort matches

    varmatches = fromn in data

    wheren.StartsWith("Eric")

    orderbyn

    selectn;

    // Print out the matches

    foreach(varmatchinmatches)

    }

    Console.WriteLine(match);

    {

Technion


Language integration

Language Integration

Lambda

Expressions

Function

intStringLength(String s)

{

returns.Length();

{

Lambda Expression

s => s.Length();

Query

Syntax

var matches = from n in data

wheren.StartsWith("Eric")

orderby n

select n;

var matches = data

.Where(n => n.StartsWith("Eric"))

.OrderBy(n => n)

.Select(n => n)

Extension

Methods

publicstaticIEnumerable<TSource> Where<TSource>(thisIEnumerable<TSource> source, Func<TSource, bool> predicate)

var name = "Eric";

var age = 43;

var person = new { Name = "Eric", Age = 43 };

var names = new [] {"Eric", "Ryan", "Paul" };

foreach (var item in names)

Anonymous

Types

Technion


Linq example

LINQ - Example

Lambda

Expressions

// Retrieve all CS students with more

// than 105 points

var query =

from stud in students

where ( stud.Faculty == “CS” && stud.Points > 105)

orderbystud.Pointsdescending

selectnew { Details = stud.Name + “:” + stud.Phone };

// Iterate over results

foreach(var student in query)

{

Console.WriteLine(student.Details);

}

Query

Syntax

Extension

Methods

Anonymous

Types

Technion


Customizing linq execution model

Customizing LINQ Execution Model

EXPRESSION TREES

  • LINQ represents queries as in-memory abstract syntax tree

  • Query description and implementation are not tied together

    THE PROBLEM

  • The default implementation of the operations uses fixed, general purpose algorithms

    SUGGESTED SOLUTION

  • Change how the query is executed without changing how it’s expressed

  • Analyze alternative implementations of a given query and dynamically choose the most appropriate version depending on the context.

*

+

1

5

7

Technion


Customizing linq execution model 2

Customizing LINQ Execution Model (2)

PROBLEM EXAMPLE

WHERE operator is implemented by performing a sequential scan over the input and evaluating the selection predicateon each tuple!

1

int[] A = {1, 2, 3, 10, 20, 30};

var q = from x in A

where x < 5

select 2*x;

foreach(intiin q)

Console.WriteLine(i);

var q = A.Where(x=>x<5)

.Select(x=>2*x);

2

IEnumerable<int> q = Enumerable.Project(

Enumerable.Where(A, AF1),

AF2);

bool AF1(int x) { return x<5;}

int AF2(int x) { return 2*x;}

3

Query Implementation:

IEnumerable<int> res = new List<int>();

foreach(int a in A)

if (AF1(a)) res.Add(AF2(a));

return res;

Technion


Rich data structures dataset

ForeignKeyConstraint

Rich Data Structures - DataSet

  • In-memory cache of data

  • Typically populated from a database

  • Supports indexing of DataColumns via DataViews

DataSet object

DataTable object

DataTable object

DataRow

Unique

Constraint

Data

Column

Unique

Constraint

  • We will use LINQ on DataSet for demonstrating query optimization techniques

Technion


Linq on rich data structures

LINQ on Rich Data Structures

  • Enable LINQ to work over DataSets.

    EXAMPLE

    • Given R and S – two DataTables

from r in R.AsEnumerable()

join s in S.AsEnumerable()

on r.Field<int>(“x”) equals

s.Field<int>(“y”)

selectnew { a = r.Field<int>(“a”),

b = s.Field<int>(“b”) };

Compile and run-time phases on an implementation of our prototype

LINQ on DataSet

Standard

C# Code

Interm.

Language

Expression

Tree

Optimized

Expression

Tree

Interm.

Language

Compile Time

Run Time

Self-tuning

State

DataSet

Technion


Expression tree optimizer

Expression Tree Optimizer

Our solution will be built according to the following architecture

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion


Query cost estimator

Query Cost Estimator

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion


Query estimation cost model

Query Estimation - Cost Model

  • Follow traditional database approach:

    • COST: {execution plans} -> [expected execution time]

    • Relies on:

      • a set of statistics maintained in DataTablesfor some of its columns

      • formulas to estimate selectivity of predicates and cardinality of sub-plans

      • formulas to estimate the expected costs of query execution for every operator

Technion


Cardinality estimation

Cardinality Estimation

  • Returns an approximate number of rows that each operator in a query plan would output

    • To reduce the overhead, we will use only these statistical estimators:

      • maxVal – maximum number of distinct values

      • minVal– minimum number of distinct values

      • dVal – number of distinct values in a column

    • If statistics are unavailable, rely on “magic numbers” until automatically creation of statistics

Technion


Predicate selectivity estimation

Predicate Selectivity Estimation

  • Let: σp(T ) be an arbitrary expression.

  • The cardinality of T is defined: Card(σp(T )) =sel(p)·

    • Under this definition we define:

  • COSTT(Execution Plan) = Σ (COST(p))

  • EXAMPLE:Consider full table scan of table T):

    COST(T) = Card(T) * MEM_ACCESS_COST

For each p in {operators of T}

Average Cost Of Memory Access

Technion


Predicate selectivity estimation1

Predicate Selectivity Estimation

Intuition:

We model sel(co<=c<=c1) as the probability to get a “c” value in interval [c0, c1] among all possible “c” values

c1

maxVal(c)

c

  • Let: σp(T ) be an arbitrary expression.

  • The cardinality of T is defined: Card(σp(T )) =sel(p)·

    • Under this definition we define:

  • COSTT(Execution Plan) = Σ (COST(p))

  • EXAMPLE:Consider full table scan of table T:

    COST(T) = Card(T) * MEM_ACCESS_COST

c0

minVal(c)

For each p in {operators of T}

Average Cost Of Memory Access

Technion


Predicate selectivity estimation2

Predicate Selectivity Estimation

Consider now a join predicate: T1 c1=c2T2

  • Card(T1 c1=c2 T2)=

  • Let: σp(T ) be an arbitrary expression.

  • The cardinality of T is defined: Card(σp(T )) =sel(p)·

    • Under this definition we define:

  • COSTT(Execution Plan) = Σ (COST(p))

  • EXAMPLE:Consider full table scan of table T):

    COST(T) = Card(T) * MEM_ACCESS_COST

For each p in {operators of T}

Average Cost Of Memory Access

Technion


Query analyzer

Query Analyzer

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion


Execution alternatives

Execution Alternatives

  • Rely on indexes on DataColumns when possible

    • Example: σa=7∧(b+c)<20

5

Index on “a”

column

3

7

Alternative 1:

Alternative 2:

2

4

Full Table Scan

a=7

b+c < 20

Technion


Analyzing execution plans

Analyzing Execution Plans

  • Global vs. Local Execution Plan – EXAMPLE:

  • Local Execution Plan

    • HashJoin?

    • IndexJoin?

    • MergeJoin?

Global Execution Plan

Technion


Enumeration architecture

Enumeration Architecture

  • Two phases:

    • First phase: Join reordering based on estimated cardinalities

    • Second phase: Choose the best physical implementation for each operator

  • EXAMPLE:Suppose we analyze JOIN operator.

    • We evaluate the following JOIN implementations:

      • Hash Join

      • Merge Join (inputs must be sorted in the join columns)

      • Index Join (index on the inner join column must be available)

      • Other possible calculation options

  • Choose the alternative with the smallest cost

Technion


Query analysis

Query Analysis

Self Tuning Organizer

Query

Analyzer

Index

Reorganizer

Oscillation

Manager

Cost Model

Query Cost

Estimator

Statistics

Manager

Technion


Self tuning organization

Self Tuning Organization

  • We want to reach the smallest query execution time.

    • Indexes can be used to speedup query execution

  • PROBLEM:

    • It might become problematic to forecast in advance what indexes to build for optimum performance

  • SOLUTION:

    • Continuous monitoring/tuning component that addresses the challenge of choosing and building adequate indexes and statistics automatically

Technion


Self tuning organization example

Self Tuning Organization - Example

  • Consider the following execution plan:

  • The selection predicate Name=“Pam” over CustomersDataTable can be improved if an index on Customers(Name) is built

  • Both hash joins can be improved if indexes I2 and I3 are available, since we can transform hash join into index join

* The three sub-plans enclosed in

dotted lines might be improved if

suitable indexes were present

Technion


Algorithm for automatic index tuning

Algorithm for automatic index tuning

Technion


Index tuning

Index Tuning

  • High-Level Description:

    • Identify a good set of candidate indexesthat would improve performance if they were available.

    • Later, when the optimized queries are evaluated, we aggregate the relative benefits of both candidate and existing indexes.

    • Based on this information, we periodically trigger index creations or deletions, taking into account storage constraints, overall utility of the resulting indexes, and the cost to creating and maintaining them.

Technion


Algorithm for automatic index tuning1

Algorithm for automatic index tuning

Technion


Index tuning algorithm

Index tuning algorithm

  • Notation:

    • H – a set of candidate indexes to materialize

    • T – task set for query qi

      • Ii – either a candidate or an existing index

      • δIi – amount that I would speed up query q

H (initially empty)

Task Set

I1, δI1

I2, δI2

. . …

In, δIn

Technion


Algorithm for automatic index tuning2

Algorithm for automatic index tuning

Technion


Index tuning algorithm1

Index tuning algorithm

  • Notation:

    • ΔI – value maintained for each index I

    • Materialized index – already created one

      • SELECT query: ΔI = ΔI + δI

      • UPDATE query: ΔI = ΔI – δI

H

I1

Task Set

I1, δI1

I1, δI1

I2, δI2

. . …

In, δIn

Technion


Index tuning algorithm2

Index Tuning Algorithm

  • The purpose of ΔI:

  • We maintain ΔI on every query evaluation

  • If the potential aggregated benefit of materializing a candidate index exceeds its creation cost, we should create it, since we gathered enough evidence that the index is useful

Technion


Algorithm for automatic index tuning3

Algorithm for automatic index tuning

Technion


Index tuning algorithm3

Index tuning algorithm

  • Remove “bad” indexes phase

  • Notation:

    • Δmin– minimum Δ value for index I

    • Δmax– maximum Δ value for index I

    • BI – the cost of creating index I

    • Residual(I)= BI – (Δmax – Δ)

      (the “slack” an index has before being deemed “droppable”)

      IF (Residual(I)) <= 0) THEN Drop(I)

    • Net-Benefit(I) = (Δ-Δmin)-BI

      (the benefit from creating the index)

      IF (Net-Benefit(I) >= 0) THEN Add(I)

Technion


Algorithm for automatic index tuning4

Algorithm for automatic index tuning

Technion


Index tuning algorithm4

Index tuning algorithm

  • Notation:

    • ITM – all the indexes from H which creation is cost effective

    • ITD – subset of existing indexes such that:

      • ITD fits in existing memory

      • It’s still cost effective to create new index I after possibly dropping members from ITD

    • If creating index I is more effective than maintaining existing indexes in ITD, DROP(ITD) && CREATE(I)

    • Remove I from H (set of candidate indexes to materialize)

Technion


Experimental evaluation

Experimental Evaluation

Consider the following schema:

checkCarts($1) =

from p inProducts.AsEnumerable()

join cart inCarts.AsEnumerable()

onp.Field<int>("id")

equalscart.Field<int>("p_id")

join c inCustomers.AsEnumerable()

oncart.Field<int>("cu_id")

equalsc.Field<int>("id")

where c.name = $1

selectnew { cart, p }

browseProducts($1) =

from p inProducts.AsEnumerable()

join c inCategories.AsEnumerable()

onp.Field<int>("ca_id") equals

c.Field<int>("id")

where c.par id = $1

select p

  • Generated:

    • 200,000 products

    • 50,000 customers

    • 1,000 categories

    • 5,000 items in the shopping carts

Possible Indexes

I1 Categories(par_id)

I2 Products(c_id)

I3 Carts(cu_id)

I4 Products(ca_id)

I5 Customers(name)

Technion


Execution plans for evaluation queries

Execution plans for evaluation queries

Technion


Experimental evaluation cont

Experimental Evaluation – Cont.

Generated schedule when tuning was disabled

Technion


Experimental evaluation cont1

Experimental Evaluation – Cont.

Generated schedule when tuning was enabled

Technion


Summary

Summary

  • We’ve discussed:

    • LINQ – for declarative query formulation

    • DataSet - a uniform way of representing in-memory data.

    • A lightweight optimizer for automatically adjusting query execution strategies

  • Article’s main contribution:

    • NOT a new query processing technique

    • BUT: careful engineering of traditional database concepts in a new context

Technion


Simon zeltser

Thank You !

Simon Zeltser


Linq execution model

LINQ Execution Model

  • Compiler merges LINQ extension methods

  • Query syntax is converted to function calls and lambda expressions

  • Lambda expressions are converted to expression trees

  • Adds query operations to IEnumerable<T>

  • Expressions are evaluated at run-time

  • Parsed and type checked at compile-time

  • At compile time

  • Compiler finds a query pattern

  • Query is executed lazily

  • Compiler infers types produced by queries

  • Datasets are strongly typed

  • Operations on

  • data sets are strongly typed

  • Specialized or base

  • Can optimize and

  • re-write query

  • Expressions and operations

  • can execute remotely

  • At run-time, when results are used

  • We can force evaluations (ToArray())

Technion


  • Login