Code Layout Optimization for Transaction Processing Workloads

Alex Ramirez, Luiz Andre Barroso, Kourosh Gharachorloo, Robert Cohn, Josep Larriba-Pey, P. Geoffrey Lowney, and Mateo Valero. 2006/05/29 KINS Kyuhwan Kim.


Introduction
  • OLTP (OnLine Transaction Processing)
    • A form of transaction processing conducted via computer network.
    • Electronic banking, order processing, e-commerce.
    • Large number of clients who continually access and update small portions of the database through short running transactions.
    • Large memory stall time, caused by large instruction and data footprints and high communication miss rates.
Introduction (cont.)
  • Code Layout Optimization
    • Large applications have a particular problem:
      • They contain a very large number of instructions.
      • The entire application cannot be held on-chip at any one time.
      • The processor stalls while waiting to fetch new instructions from memory.
    • Hold more useful instructions on-chip → improved performance.
Outline
  • Introduction
  • Code Layout Optimizations
  • Methodology
  • Behavior of the Database Application in Isolation
  • Combined Database Application and O/S Behavior
  • Conclusion
Code Layout Optimizations
  • Spike
    • DTKS tool for performing code optimization after linking
    • Profile-driven optimization.
  • Three parts of Spike optimizer algorithm
    • Basic Block Chaining
    • Fine-Grain Procedure Splitting
    • Procedure Ordering
Basic Block Chaining
  • Definition
    • Order the basic blocks within a procedure.
  • Algorithm
    • Simple greedy algorithm
    • Sort flow edges by weight
    • Chain the two blocks connected by the heaviest remaining edge.
  • Gain
    • Improve instruction cache behavior
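The greedy chaining steps above can be sketched as follows. This is a minimal illustration, not Spike's actual implementation: the function name `chain_basic_blocks`, the block names, and the edge weights are all hypothetical. It assumes an edge may join two chains only when its source is the last block of one chain and its destination is the first block of another.

```python
def chain_basic_blocks(blocks, edges):
    """Greedy basic block chaining (illustrative sketch).

    blocks: block names in their original layout order.
    edges:  {(src, dst): weight} from profile data (hypothetical values here).
    Returns a list of chains, each a list of block names.
    """
    chain_of = {b: [b] for b in blocks}  # every block starts as its own chain
    # Visit flow edges from heaviest to lightest.
    for (src, dst), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Join only if src ends one chain and dst starts a different chain.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Collect the distinct chains in order of their first appearance.
    seen, chains = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical procedure: the hot path is entry -> hot -> exit.
blocks = ["entry", "cold", "hot", "exit"]
edges = {("entry", "hot"): 90, ("entry", "cold"): 10,
         ("hot", "exit"): 90, ("cold", "exit"): 10}
print(chain_basic_blocks(blocks, edges))  # [['entry', 'hot', 'exit'], ['cold']]
```

The hot blocks end up contiguous, so the common path runs as fall-through code, while the rarely executed block is left in its own chain.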
Ex) Basic Block Chaining

[Figure: control-flow graph of basic blocks A1–A8 before and after chaining. Legend: unconditional branch / fall-through edges, conditional branch edges, node weights, branch probabilities (0.6 / 0.4). Node weights: A1 = 10, A2 = 10, A3 = 10, A4 = 6, A5 = 4, A6 = 2.4, A7 = 7.6, A8 = 10.]
Fine-Grain Procedure Splitting
  • Definition
    • Divide each chain into multiple code segments → each segment becomes a new procedure.
  • Algorithm
    • Fine-grain: end a segment at each unconditional branch or return. (studied only)
    • Hot/cold: split the procedure into a hot part and a cold part. (currently available)
  • Gain
    • Extra degree of flexibility for the procedure ordering algorithm.
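The fine-grain rule above (end a segment at each unconditional branch or return) can be sketched as follows. The function name `split_chain` and the terminator labels `"cond"`, `"jump"`, `"fallthrough"`, and `"return"` are hypothetical, chosen only for this illustration.

```python
def split_chain(chain):
    """Split a chain of basic blocks into segments (illustrative sketch).

    chain: list of (block_name, terminator) pairs, where terminator is one of
    the hypothetical labels "cond", "jump", "fallthrough", or "return".
    A segment ends after any block that cannot fall through to the next one.
    """
    segments, current = [], []
    for name, terminator in chain:
        current.append(name)
        if terminator in ("jump", "return"):  # control never falls through
            segments.append(current)
            current = []
    if current:  # trailing blocks that never reached a jump or return
        segments.append(current)
    return segments


chain = [("b1", "cond"), ("b2", "jump"), ("b3", "fallthrough"),
         ("b4", "return"), ("b5", "cond")]
print(split_chain(chain))  # [['b1', 'b2'], ['b3', 'b4'], ['b5']]
```

Because no segment falls through into the next, each one can be placed independently by the procedure ordering step.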
Ex) Fine-Grain Procedure Splitting

[Figure: a chain split into four new procedures. Procedure 1 ends at an unconditional branch; Procedures 2, 3, and 4 each end at a subroutine return (RET).]

Procedure Ordering
  • Definition
    • Place related procedures near one another.
  • Algorithm
    • Build the call graph and assign each edge a weight (number of calls).
    • Select the most heavily weighted edge and merge its two nodes.
    • When merging, use the weights in the original graph to decide how the two node lists are joined.
    • Iterate until graph is reduced to a single node.
  • Gain
    • Improve instruction cache behavior
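The merging loop above can be sketched as follows. This is a simplified illustration and not necessarily the exact procedure Spike uses: the function name `order_procedures` and the call counts are hypothetical, calls are treated as undirected, and the orientation of each merge is chosen by the original-graph weight between the two procedures that become adjacent.

```python
from itertools import combinations


def order_procedures(calls):
    """Greedy call-graph procedure ordering (illustrative sketch).

    calls: {(caller, callee): call_count}; direction is ignored.
    Returns one procedure order that keeps heavily connected procedures close.
    """
    weight = {}
    for (p, q), n in calls.items():
        key = frozenset((p, q))
        weight[key] = weight.get(key, 0) + n

    def w(p, q):  # weight of the edge p-q in the ORIGINAL graph
        return weight.get(frozenset((p, q)), 0)

    def between(a, b):  # weight of the merged edge between two groups
        return sum(w(p, q) for p in a for q in b)

    groups = [[p] for p in sorted({p for pair in calls for p in pair})]
    while len(groups) > 1:
        a, b = max(combinations(groups, 2), key=lambda ab: between(*ab))
        if between(a, b) == 0:
            break  # remaining groups are unconnected
        # Try each orientation; keep the one whose junction is heaviest.
        candidates = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
        merged = max(candidates, key=lambda c: w(c[len(a) - 1], c[len(a)]))
        groups = [g for g in groups if g is not a and g is not b] + [merged]
    return [p for g in groups for p in g]


# Hypothetical call counts, not taken from the slides.
calls = {("A", "C"): 10, ("B", "D"): 8, ("A", "B"): 4, ("D", "E"): 2}
print(order_procedures(calls))  # ['E', 'D', 'B', 'A', 'C']
```

With these illustrative weights, A and C merge first, then B and D, then the two pairs are joined so that A and B sit next to each other, and E is attached next to D.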
Ex) Procedure Ordering

[Figure: weighted call graph over procedures A, B, C, D, and E, reduced by successive merges. Merged nodes shown: A,C; B,D; D,B,A,C. Final order: E,D,B,A,C.]

Methodology
  • OLTP Workload
    • TPC-B
    • Oracle 8.0.4
  • Collecting Profiles
    • OLTP profile data → Pixie.
    • Kernel profile → Tru64 Unix kprofile tool.
  • Hardware and Simulation Platforms
    • SimOS-Alpha environment
Behavior of the DB App. Only
  • Instruction cache misses
    • X-axis: cache line size
    • Y-axis: number of instruction cache misses
    • The optimized binary reduces misses by 55–65%.

[Figure: instruction cache misses vs. cache line size, baseline OLTP binary vs. optimized OLTP binary.]
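To make the measured quantity concrete, the sketch below counts instruction cache misses for a fetch-address trace. It assumes a direct-mapped cache, which is a simplification chosen for brevity and not necessarily the configuration simulated in the study. The function name `count_icache_misses` and both traces are hypothetical.

```python
def count_icache_misses(trace, cache_bytes, line_bytes):
    """Count misses in a direct-mapped instruction cache (illustrative sketch).

    trace: sequence of instruction fetch addresses (hypothetical input).
    """
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # memory line currently held in each slot
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        slot = line % num_lines
        if resident[slot] != line:  # line not present: fetch and replace
            resident[slot] = line
            misses += 1
    return misses


# Two hot code regions that map to the same slot (baseline layout) versus
# the same regions placed in adjacent lines (optimized layout).
baseline = [0, 64] * 4
optimized = [0, 16] * 4
print(count_icache_misses(baseline, cache_bytes=64, line_bytes=16))   # 8
print(count_icache_misses(optimized, cache_bytes=64, line_bytes=16))  # 2
```

The example only shows how layout changes conflict misses; the percentages reported above come from the study, not from this toy trace.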

Experiment (cont.)
  • Impact of the different code layout optimizations.
    • Procedure ordering applied alone → increases cache misses.
    • The largest benefit comes from basic block chaining.
    • Procedure ordering after splitting → improves performance further.
Experiment (cont.)
  • Sequentially executed instructions.
    • Optimized binary → sequence length increases from 7.3 to over 10 instructions.
  • Temporal locality.
    • Measured as the number of instructions reused before eviction.
    • Optimized binary → increases the number of instructions reused.
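One way to compute a sequential-execution metric of this kind from an instruction address trace is sketched below. The function name `average_sequence_length` and the trace are hypothetical, and the sketch assumes fixed-size 4-byte instructions; the study's exact measurement method is not given in these slides.

```python
def average_sequence_length(trace, insn_bytes=4):
    """Average number of instructions executed between control-flow breaks.

    trace: list of executed instruction addresses (hypothetical input).
    A new sequence starts whenever an address does not directly follow
    the previous one.
    """
    if not trace:
        return 0.0
    sequences = 1
    for prev, cur in zip(trace, trace[1:]):
        if cur != prev + insn_bytes:  # taken branch, call, or return
            sequences += 1
    return len(trace) / sequences


# Three sequential runs covering six instructions in total.
print(average_sequence_length([0, 4, 8, 100, 104, 200]))  # 2.0
```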
Behavior of Combined DB App. & OS
  • Instruction cache misses
    • With the OS included, the reduction in misses is 45–60%.
    • For comparison, the reduction is 55–65% for the application in isolation.

[Figure: instruction cache misses, baseline OLTP binary vs. optimized OLTP binary.]

Experiment (cont.)
  • Interference between App. and OS
    • Majority of app. misses arise due to self interference.
    • Kernel interferes very little with itself.
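A possible way to attribute misses to self interference or cross interference is sketched below. It assumes a direct-mapped cache and charges each non-cold miss to whichever owner evicted the missing line. This attribution rule, the function name `classify_misses`, and the `"app"` / `"os"` labels are assumptions made for illustration, not the study's stated method.

```python
from collections import Counter


def classify_misses(trace, cache_bytes, line_bytes):
    """Classify misses by who evicted the line (illustrative sketch).

    trace: list of (owner, address) pairs, e.g. ("app", 0) or ("os", 64).
    Returns Counter keyed by (owner_of_miss, culprit), where culprit is the
    owner that evicted the line, or "cold" for a first-time fetch.
    """
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # memory line held in each slot
    evicted_by = {}                # evicted line -> owner that displaced it
    counts = Counter()
    for owner, addr in trace:
        line = addr // line_bytes
        slot = line % num_lines
        if resident[slot] == line:
            continue  # hit
        counts[(owner, evicted_by.pop(line, "cold"))] += 1
        if resident[slot] is not None:
            evicted_by[resident[slot]] = owner  # remember who displaced it
        resident[slot] = line
    return counts


trace = [("app", 0), ("app", 32), ("app", 0), ("os", 64), ("app", 0)]
result = classify_misses(trace, cache_bytes=32, line_bytes=16)
print(result[("app", "app")], result[("app", "os")])  # 1 1
```

Here one application miss is self interference (the application evicted its own line) and one is caused by the kernel.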

[Figure: interference between application and OS, baseline OLTP binary vs. optimized OLTP binary.]

Conclusion
  • Profile-driven compiler optimization improves code layout for OLTP workloads.
  • Application in isolation → 55–65% fewer instruction cache misses.
  • With the OS → 45–60% fewer instruction cache misses.
  • Overall, these optimizations yield a 1.33× improvement in performance.