
To Share or Not to Share?

Ryan Johnson

Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi

PARALLEL DATA LABORATORY

Carnegie Mellon University

**HP LABS


Motivation For Work Sharing

[Figure: two query plans over the Student and Dept tables. The query "What is the average GPA in the ECE dept.?" scans Student, scans Dept, joins, aggregates, and outputs; the query "What is the highest undergraduate GPA?" scans Student, aggregates, and outputs. Both plans scan the Student table.]



Motivation For Work Sharing (cont.)

[Figure: the same two query plans as above.]

  • Many queries in system

    • Similar requests

    • Redundant work

  • Work Sharing

    • Detect redundant work

    • Compute results once and share

  • Big win for I/O, uniprocessors

  • 2x speedup for TPC-H queries [hariz05]
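
To make the sharing concrete, here is a toy sketch in Python (not Cordoba/QPipe code; the table contents, column names, and helper functions are invented for illustration) in which the two example queries above consume a single shared scan of the Student table instead of scanning it twice:

    # Toy illustration only: one scan of Student feeds both queries.
    students = [
        {"name": "Ada", "dept": "ECE", "gpa": 3.9, "undergrad": True},
        {"name": "Bob", "dept": "CS",  "gpa": 3.4, "undergrad": True},
        {"name": "Eve", "dept": "ECE", "gpa": 3.7, "undergrad": False},
    ]

    def shared_scan(table, consumers):
        """Read each tuple once and push it to every interested query."""
        for row in table:
            for consume in consumers:
                consume(row)

    # Query 1: average GPA in the ECE dept. (dept inlined here to skip the join)
    ece_gpas = []
    def q1(row):
        if row["dept"] == "ECE":
            ece_gpas.append(row["gpa"])

    # Query 2: highest undergraduate GPA
    best_undergrad_gpa = [0.0]
    def q2(row):
        if row["undergrad"]:
            best_undergrad_gpa[0] = max(best_undergrad_gpa[0], row["gpa"])

    shared_scan(students, [q1, q2])
    print(sum(ece_gpas) / len(ece_gpas))   # 3.8
    print(best_undergrad_gpa[0])           # 3.9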



Work Sharing on Modern Hardware

[Figure: a diagram of an 8-CPU multicore (cores with private L1 caches, shared L2, memory) beside a plot of speedup due to work sharing vs. number of shared queries (0-45) for 1-CPU and 8-CPU runs, annotated "7x".]

  • Work sharing can hurt performance!




Contributions

  • Observation

    • Work sharing can hurt performance on parallel hardware

  • Analysis

    • Develop intuitive analytical model of work sharing

    • Identify trade-off between total work, critical path

  • Application

    • Model-based policy outperforms static ones by up to 6x




Outline

  • Introduction

  • Part I: Intuition and Model

  • Part II: Analysis and Experiments




Challenges of Exploiting Work Sharing

  • Independent execution only?

    • Load reduction from work sharing can be useful

  • Work sharing only?

    • Indiscriminate application can hurt performance

  • To share or not to share?

    • System and workload dependent

    • Adapt decisions at runtime

  • Must understand work sharing to exploit it fully



Work Sharing vs. Parallelism

[Figure: independent execution of Query 1 and Query 2, each built from scan, join, and aggregate operators, with each query's response time and critical path marked; P = 4.33.]



Work Sharing vs. Parallelism (cont.)

[Figure: shared execution of the same two queries; the critical path is now longer, imposing a penalty on response time, and P drops from 4.33 to 2.75.]

  • Total work and critical path both important
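
If P here denotes average parallelism (total work divided by critical-path length, a standard definition the slide does not spell out), the drop from 4.33 to 2.75 follows directly: sharing shrinks the numerator (total work) but lengthens the denominator (critical path), so the ratio falls.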




Understanding Work Sharing

  • Performance depends on two factors: total work and critical path length

  • Work sharing presents a trade-off

    • Reduces total work

    • Potentially lengthens critical path

  • Balance both factors or performance suffers




Basis for a Model

  • “Closed” system

    • Consistent high load

    • Throughput computing

    • Assumed in most benchmarks

    • Fixed number of clients

  • Little’s Law governs throughput

    • Higher response time = lower throughput

    • Total work not a direct factor!

  • Load reduction secondary to response time
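
Concretely, Little's Law says N = X · R (clients = throughput times response time). With a fixed number of clients N, throughput is X = N / R, so response time is what matters directly; total work influences throughput only through its effect on R.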




Predicting Response Time

  • Case 1: Compute-bound

  • Case 2: Critical path-bound

  • Larger bottleneck determines response time

  • Model provides u and pmax
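
Reading the two cases together (a reconstruction from this slide's terms, not a formula quoted from the paper): response time is roughly R ≈ max(u / n, p_max), where u is the requested utilization (total work), n the number of processors, and p_max the longest pipeline stage. The compute-bound term dominates when CPUs are scarce; the critical-path term dominates when they are plentiful.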



An Analytical Model of Work Sharing

[Equation shown as a figure on the original slide: throughput for m queries and n processors, in terms of U (requested utilization, improved by work sharing) and Pmax (longest pipe stage, potentially worsened by work sharing).]

  • Sharing helpful when Xshared > Xalone
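
The formula itself appears only as a figure on the original slide. The sketch below is one plausible reconstruction, built from the definitions above (U improved by sharing, Pmax potentially worsened by it) and from the response-time bound on the previous slide; treat its exact form and all numbers as illustrative assumptions, not the authors' equation.

    # Reconstructed sketch (assumption, not the slide's exact formula):
    # m in-flight queries on n CPUs are limited by the larger of the
    # compute-bound term U/n and the longest pipe stage p_max.

    def throughput(m, n, total_work, p_max):
        """Queries completed per unit time under the max-of-bottlenecks model."""
        response_time = max(total_work / n, p_max)
        return m / response_time

    def should_share(m, n, alone, shared):
        """Share only if the model predicts X_shared > X_alone.
        `alone` and `shared` are (total_work, p_max) pairs for each plan."""
        return throughput(m, n, *shared) > throughput(m, n, *alone)

    # Hypothetical numbers: sharing cuts total work 10 -> 6 but stretches
    # the longest stage 1 -> 3. Sharing wins on 2 CPUs and loses on 8.
    print(should_share(4, 2, alone=(10, 1), shared=(6, 3)))   # True
    print(should_share(4, 8, alone=(10, 1), shared=(6, 3)))   # False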




Outline

  • Introduction

  • Part I: Intuition and Model

  • Part II: Analysis and Experiments




Experimental Setup

  • Hardware

    • Sun T2000 “Niagara” with 16 GB RAM

    • 8 cores (32 threads)

    • Solaris processor sets vary effective CPU count

  • Cordoba

    • Staged DBMS

    • Naturally exposes work sharing

    • Flexible work sharing policies

  • 1 GB TPC-H dataset

    • Fixed client and CPU counts per run



Model Validation: TPC-H Q1

[Figure "Predicted vs. Measured Performance": speedup due to work sharing vs. number of shared queries (0-45), measured and modeled, for 1, 2, 8, and 32 CPUs.]

  • Avg/max error: 5.7% / 22%



Model Validation: TPC-H Q4

  • Behavior varies with both system and workload



Exploring WS vs. Parallelism

  • Work sharing splits query into three parts

    • Independent work

      • Per-query, parallel

      • Total work

    • Serial work

      • Per-query, serial

      • Critical path

    • Shared work

      • Computed once

      • “Free” after first query

Example breakdown (pie chart on the original slide): 37% independent work, 4% serial work, 59% shared work.
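
Under one plausible reading of this breakdown (an inference, not stated on the slide): with m queries sharing, the independent and serial portions are paid m times while the shared portion is paid once, so total work is roughly (0.37 + 0.04)·m + 0.59 query-units, while the 4% serial portion accumulates on the critical path. That is why even a small serial fraction matters as more queries share, as the serial-work slide later shows.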



Exploring WS vs. Parallelism (cont.)

[Figure "Potential Speedup": benefit from work sharing vs. number of shared queries (0-32) for 2, 4, 8, 16, and 32 CPUs.]

  • Behavior matches previously published results



[Figure: the same plot, with curves annotated "Saturated".]

  • More processors shift bottleneck to critical path
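
This is what the bottleneck relation sketched in Part I predicts: adding processors shrinks the compute term U/n but leaves p_max untouched, so beyond some CPU count the shared plan's critical path, not total work, limits throughput.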



Performance Impact of Serial Work

[Figure "Impact of Serial Work (32 CPU)": benefit from work sharing vs. number of shared queries (0-40) for serial-work fractions of 0%, 1%, 2%, and 7%.]

  • Critical path quickly becomes major bottleneck




Model-guided Work Sharing

  • Integrate predictive model into Cordoba

    • Predict benefit of work sharing for each new query

  • Consider multiple groups of queries at once

    • Shorter critical path, increased parallelism

  • Experimental setup

    • Profile run with 2 clients, 2 CPUs

    • Extract model parameters with profiling tools

    • 20 clients submit a mix of TPC-H Q1 and Q4

  • Compare against always-, never-share policies



Comparison of Work Sharing Strategies

[Figure: throughput (queries/min) under always-share, model-guided, and never-share policies on 2 CPUs and 32 CPUs, for query ratios of all Q1, 50/50, and all Q4.]

  • Model-based policy balances critical path and load




Related Work

  • Many existing work sharing schemes

    • Identification occurs at different stages in the query’s lifetime

    • All allow pipelined query execution

Schemes ordered from early to late identification in the query's lifetime:

  • Materialized Views [rouss82]: schema design

  • Multiple Query Optimization [roy00]: query compilation

  • Staged DBMS [hariz05]: query execution

  • Synchronized Scanning [lang07]: buffer pool access

  • Model describes all types of work sharing




Conclusions

  • Work sharing can hurt performance

    • On highly parallel machines with memory-resident data

  • Intuitive analytical model captures behavior

    • Trade-off between load reduction and critical path

  • Model-guided work sharing highly effective

    • Outperforms static policies by up to 6x

http://www.cs.cmu.edu/~StagedDB/




References

  • [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.

  • [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.

  • [rouss82] N. Roussopoulos. “View Indexing in Relational Databases.” ACM TODS, 7(2):258-290, 1982.

  • [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.


