

Code Layout Optimization for Transaction Processing Workloads

Alex Ramirez, Luiz André Barroso, Kourosh Gharachorloo,

Robert Cohn, Josep Larriba-Pey, P. Geoffrey Lowney, and Mateo Valero

2006/05/29

KINS

Kyuhwan Kim


Introduction

  • OLTP (OnLine Transaction Processing)

    • A form of transaction processing conducted via computer network.

    • Electronic banking, order processing, e-commerce.

    • Large number of clients who continually access and update small portions of the database through short running transactions.

    • Large memory stall time, caused by large instruction and data footprints and high communication miss rates.


Introduction (cont.)

  • Code Layout Optimization

    • Large applications have a particular problem:

      • They contain a large number of instructions.

      • The entire application cannot be held on-chip at any one time.

      • The processor stalls while waiting to fetch new instructions from memory.

    • Holding more useful instructions on-chip improves performance.


Outline

  • Introduction

  • Code Layout Optimizations

  • Methodology

  • Behavior of the Database Application in Isolation

  • Combined Database Application and O/S Behavior

  • Conclusion


Code Layout Optimizations

  • Spike

    • DTKS tool for performing code optimization after linking

    • Profile-driven optimization.

  • Three parts of the Spike optimization algorithm

    • Basic Block Chaining

    • Fine-Grain Procedure Splitting

    • Procedure Ordering


Basic Block Chaining

  • Definition

    • Order the basic blocks within a procedure.

  • Algorithm

    • Simple greedy algorithm

    • Sort the flow edges by weight.

    • Chain the two blocks joined by the heaviest remaining edge.

  • Gain

    • Improve instruction cache behavior
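
The greedy steps on this slide can be sketched roughly as follows. This is a minimal illustration, not Spike's actual implementation: the function name `chain_basic_blocks`, the edge-dictionary input format, and the four-block control-flow graph in the usage lines are all hypothetical.

```python
def chain_basic_blocks(blocks, edges):
    """Greedy chaining sketch (hypothetical helper, not Spike's code).

    blocks: block names in their original order.
    edges:  dict mapping (src, dst) -> profiled edge weight.
    Returns a list of chains, each a list of block names.
    """
    chain_of = {b: [b] for b in blocks}  # block -> the chain it currently belongs to
    # Visit flow edges from heaviest to lightest (sorted() is stable on ties).
    for (src, dst), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Link only when src ends one chain and dst starts a different one,
        # so that dst becomes the fall-through successor of src.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Collect each distinct chain once, in original block order.
    seen, chains = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical CFG: a hot path entry -> hot -> exit and a rarely taken cold block.
cfg_edges = {
    ("entry", "hot"): 90, ("entry", "cold"): 10,
    ("hot", "exit"): 90, ("cold", "exit"): 10,
}
print(chain_basic_blocks(["entry", "hot", "cold", "exit"], cfg_edges))
# [['entry', 'hot', 'exit'], ['cold']]
```

The hot path ends up contiguous, so its branches become fall-throughs, while the cold block is left in its own chain.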


Ex) Basic Block Chaining

[Figure: control-flow graph of basic blocks A1 to A8, shown in its original and chained layouts. Legend: unconditional branch / fall-through, conditional branch, node weight, branch probability. Node weights: A1 = A2 = A3 = 10, A4 = 6, A5 = 4, A6 = 2.4, A7 = 7.6, A8 = 10. Conditional branches split 0.6 / 0.4.]


Fine-Grain Procedure Splitting

  • Definition

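    • Divide each chain into multiple code segments, each of which becomes a new procedure.

  • Algorithm

    • Fine-grain splitting: end a segment at each unconditional branch or return (used in this study only).

    • Hot/cold splitting: split each procedure into a hot part and a cold part (the currently available version).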

  • Gain

    • Extra degree of flexibility for the procedure ordering algorithm.
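
The fine-grain variant can be illustrated with a short sketch that cuts a chain after every block ending in an unconditional branch or a return. The function name `split_chain` and the representation of a chain as a list of block names are assumptions made for this example only.

```python
def split_chain(chain, ends_segment):
    """Split a chain into segments (hypothetical helper).

    chain:        list of block names in layout order.
    ends_segment: set of blocks whose last instruction is an
                  unconditional branch or a subroutine return.
    """
    segments, current = [], []
    for block in chain:
        current.append(block)
        if block in ends_segment:
            # Control never falls through past this block, so the
            # following code can be placed anywhere: close the segment.
            segments.append(current)
            current = []
    if current:  # trailing blocks that did not end in a branch or return
        segments.append(current)
    return segments


# Hypothetical chain of five blocks; b2 ends in a branch, b4 in a return.
print(split_chain(["b1", "b2", "b3", "b4", "b5"], {"b2", "b4"}))
# [['b1', 'b2'], ['b3', 'b4'], ['b5']]
```

Each resulting segment can then be treated as a separate procedure by the ordering step.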


Ex) Fine-Grain Procedure Splitting

[Figure: a chain split into Procedures 1 to 4. Procedure 1 ends at an unconditional branch; Procedures 2, 3, and 4 each end at a subroutine return (RET).]


Procedure Ordering

  • Definition

    • Place related procedures near one another.

  • Algorithm

    • Build the call graph and weight each edge by the number of calls.

    • Select the most heavily weighted edge and merge its two nodes.

    • When merging, use the weights from the original graph.

    • Iterate until the graph is reduced to a single node.

  • Gain

    • Improve instruction cache behavior
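
A rough sketch of this merge loop is shown below. It is an illustration under stated assumptions rather than the exact algorithm from the slides: the function name `order_procedures`, the undirected `frozenset` edge keys, and the sample call counts are hypothetical, and using the original weights to pick which ends of two clusters are joined is one reading of the "use weights in original graph" step.

```python
def order_procedures(calls):
    """Greedy call-graph merging sketch (hypothetical helper).

    calls: non-empty dict mapping frozenset({p, q}) -> number of calls
           between procedures p and q (direction ignored).
    Returns one list giving the final procedure layout.
    """
    def w(p, q):
        return calls.get(frozenset((p, q)), 0)

    procs = sorted({p for pair in calls for p in pair})
    clusters = [[p] for p in procs]
    while len(clusters) > 1:
        # Cluster-to-cluster weight is the sum of the original edge weights.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                weight = sum(w(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or weight > best[0]:
                    best = (weight, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        # Four ways to concatenate; keep the one whose two junction
        # procedures share the heaviest edge in the original graph.
        candidates = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
        merged = max(candidates, key=lambda c: w(c[len(a) - 1], c[len(a)]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical call counts between four procedures.
sample = {
    frozenset(("a", "c")): 10,
    frozenset(("b", "d")): 8,
    frozenset(("a", "b")): 7,
    frozenset(("c", "d")): 1,
}
print(order_procedures(sample))
# ['c', 'a', 'b', 'd']
```

Here a and c merge first (weight 10), then b and d (weight 8); the final merge places a next to b because that is the heaviest original edge between the two clusters.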


Ex) Procedure Ordering

[Figure: weighted call graph over procedures A, B, C, D, and E, merged step by step. Intermediate nodes shown: (A,C), (B,D), and (D,B,A,C). Final order: E, D, B, A, C.]




Methodology

  • OLTP Workload

    • TPC-B

    • Oracle 8.0.4

  • Collecting Profiles

    • OLTP profile data: collected with Pixie.

    • Kernel profile: collected with the Tru64 Unix kprofile tool.

  • Hardware and Simulation Platforms

    • SimOS-Alpha environment




Behavior of the DB App. Only

  • Instruction cache miss

    • X-axis: cache line size

    • Y-axis: number of instruction cache misses

    • Misses are reduced by 55% to 65%.

[Figure: instruction cache misses versus cache line size for the baseline and the optimized OLTP binary.]
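
To make the measured quantity concrete, the sketch below counts instruction cache misses over a trace of instruction addresses. It assumes a direct-mapped cache, which is a simplification chosen for brevity and not necessarily the configuration simulated in the study; the function name `count_icache_misses` and the tiny traces are hypothetical.

```python
def count_icache_misses(pcs, cache_bytes, line_bytes):
    """Count misses in a direct-mapped instruction cache (illustrative only).

    pcs:         iterable of instruction addresses in execution order.
    cache_bytes: total cache capacity in bytes.
    line_bytes:  cache line size in bytes.
    """
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # line address currently held in each slot
    misses = 0
    for pc in pcs:
        line_addr = pc // line_bytes
        slot = line_addr % num_lines
        if resident[slot] != line_addr:
            resident[slot] = line_addr  # fetch the line, evicting the old one
            misses += 1
    return misses


# Hypothetical traces for a 64-byte cache with 16-byte lines.
loop = [0, 4, 8, 12, 16, 20, 24, 28] * 2  # compact loop: only cold misses
conflict = [0, 64, 0, 64]                 # two addresses mapping to one slot
print(count_icache_misses(loop, 64, 16))      # 2
print(count_icache_misses(conflict, 64, 16))  # 4
```

The second trace shows the kind of conflict that a better code layout tries to avoid: code that executes close together in time but maps to the same cache slot.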


Experiment (cont.)

  • Impact of the different code layout optimizations.

    • Procedure ordering on its own increases cache misses.

    • The largest benefit comes from basic block chaining.

    • Procedure ordering applied after splitting improves performance further.


Experiment (cont.)

  • Sequentially executed instructions.

    • The optimized binary raises the average from 7.3 to over 10 instructions.

  • Temporal locality.

    • Measured as the number of instructions reused before eviction.

    • The optimized binary increases the number of instructions reused.
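
The sequential-execution metric can be computed from an instruction address trace along the following lines. This sketch assumes fixed 4-byte instructions, as on Alpha, and treats any address that is not the previous address plus 4 as the start of a new run; the function name `average_run_length` and the sample trace are hypothetical.

```python
def average_run_length(pcs, insn_bytes=4):
    """Average number of instructions executed sequentially between breaks.

    pcs: list of instruction addresses in execution order.
    A new run starts whenever an address does not directly follow
    the previous one (that is, after a taken branch, call, or return).
    """
    if not pcs:
        return 0.0
    runs = 1
    for prev, cur in zip(pcs, pcs[1:]):
        if cur != prev + insn_bytes:
            runs += 1
    return len(pcs) / runs


# Hypothetical trace: runs of 3, 2, and 1 instructions.
print(average_run_length([0, 4, 8, 100, 104, 200]))  # 2.0
```

A layout that turns frequently taken branches into fall-throughs lengthens these runs, which is the effect reported on this slide.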




Behavior of Combined DB App. & OS

  • Instruction cache miss

    • Misses are reduced by 45% to 60%.

    • For comparison, the application in isolation sees a 55% to 65% reduction.

[Figure: instruction cache misses for the baseline and the optimized OLTP binary, application and OS combined.]


Experiment (cont.)

  • Interference between App. and OS

    • The majority of application misses are due to self-interference.

    • The kernel interferes very little with itself.

[Figure: application and OS interference for the baseline and the optimized OLTP binary.]


Conclusion

  • Profile-driven compiler optimizations improve code layout in OLTP workloads.

  • Application in isolation: instruction cache misses are reduced by 55% to 65%.

  • With the OS: instruction cache misses are reduced by 45% to 60%.

  • Overall, these optimizations improve performance by a factor of 1.33.

