Code Layout Optimization for Transaction Processing Workloads

Alex Ramirez, Luiz André Barroso, Kourosh Gharachorloo,

Robert Cohn, Josep Larriba-Pey, P. Geoffrey Lowney, and Mateo Valero

2006/05/29

KINS

Kyuhwan Kim

Introduction
  • OLTP (OnLine Transaction Processing)
    • A form of transaction processing conducted over a computer network.
    • Examples: electronic banking, order processing, e-commerce.
    • A large number of clients continually access and update small portions of the database through short-running transactions.
    • Large memory stall times, caused by large instruction and data footprints and high communication miss rates.

Introduction (cont.)
  • Code Layout Optimization
    • Large applications have a particular problem:
      • They execute a very large number of distinct instructions.
      • The entire application cannot be held on-chip at any one time.
      • The processor stalls waiting to fetch new instructions from memory.
    • Holding more useful instructions on-chip → improved performance.

Outline
  • Introduction
  • Code Layout Optimizations
  • Methodology
  • Behavior of the Database Application in Isolation
  • Combined Database Application and O/S Behavior
  • Conclusion

Code Layout Optimizations
  • Spike
    • DTKS tool for performing code optimization after linking
    • Profile-driven optimization.
  • Three parts of Spike optimizer algorithm
    • Basic Block Chaining
    • Fine-Grain Procedure Splitting
    • Procedure Ordering

Basic Block Chaining
  • Definition
    • Order the basic blocks within a procedure.
  • Algorithm
    • Simple greedy algorithm
    • Sort flow edges by weight
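    • Chain the two blocks joined by the heaviest edge, then continue with the next-heaviest edge.
  • Gain
    • Improves instruction cache behavior by laying out the most frequently executed path sequentially.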
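The greedy chaining steps above can be sketched as follows. This is a minimal Python illustration and is not Spike's code. It assumes the profile arrives as a list of (source, destination, count) flow edges. The joining rule is also an assumption: a chain's last block may only be linked to another chain's first block, which is the usual Pettis-Hansen-style formulation. The block names and counts are hypothetical.

```python
from typing import Dict, List, Tuple

# A profiled flow edge: (source block, destination block, execution count).
Edge = Tuple[str, str, float]


def chain_basic_blocks(blocks: List[str], edges: List[Edge]) -> List[List[str]]:
    """Greedy basic block chaining (illustrative sketch).

    Flow edges are visited from heaviest to lightest. An edge src -> dst joins
    two chains only if src is the last block of its chain and dst is the first
    block of a different chain, so dst becomes src's fall-through successor.
    """
    chain_of: Dict[str, List[str]] = {b: [b] for b in blocks}
    for src, dst, _count in sorted(edges, key=lambda e: e[2], reverse=True):
        src_chain, dst_chain = chain_of[src], chain_of[dst]
        if src_chain is dst_chain:
            continue  # joining would close a cycle
        if src_chain[-1] != src or dst_chain[0] != dst:
            continue  # src already has a fall-through, or dst already has a predecessor
        src_chain.extend(dst_chain)
        for block in dst_chain:
            chain_of[block] = src_chain
    # Emit each distinct chain once, in the order its first-listed block appears,
    # so the chain holding the entry block (listed first) comes first.
    seen, chains = set(), []
    for block in blocks:
        chain = chain_of[block]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical if-then-else profile: the "then" path is taken 90% of the time.
blocks = ["entry", "then", "else", "join"]
edges = [("entry", "then", 90), ("entry", "else", 10),
         ("then", "join", 90), ("else", "join", 10)]
print(chain_basic_blocks(blocks, edges))  # [['entry', 'then', 'join'], ['else']]
```

Laying the chains out in the returned order (entry, then, join, else) makes the hot path fall through, and the rarely executed else block moves out of the way.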

Ex) Basic Block Chaining
[Figure: a control-flow graph of basic blocks A1 to A8 annotated with node weights and branch probabilities (legend: unconditional branch / fall-through, conditional branch, node weight, branch probability), and the resulting chained layout A1 (10), A2 (10), A3 (10), A4 (6), A7 (7.6), A8 (10), A5 (4), A6 (2.4), with node weights in parentheses]

Fine-Grain Procedure Splitting
  • Definition
    • Divide each chain into multiple code segments, each of which becomes a new procedure.
  • Algorithm
    • Split after every unconditional branch or subroutine return (studied in this work only).
    • Split into a hot part and a cold part (currently available in Spike).
  • Gain
    • Gives the procedure ordering algorithm an extra degree of flexibility.
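Both splitting strategies are simple to express over a chain of basic blocks. The sketch below is illustrative only. The `Block` record, its terminator labels and the example chain are hypothetical. Real splitting in a binary optimizer must also patch branches, which this sketch omits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    name: str
    terminator: str  # hypothetical labels: "fallthrough", "cond", "uncond", "return"
    count: int       # profiled execution count


def split_fine_grain(chain: List[Block]) -> List[List[Block]]:
    """Cut a chain after every unconditional branch or return.

    Control never falls through those points, so each resulting segment can
    be treated as a new procedure and placed independently.
    """
    segments: List[List[Block]] = [[]]
    for block in chain:
        segments[-1].append(block)
        if block.terminator in ("uncond", "return"):
            segments.append([])
    return [seg for seg in segments if seg]  # drop a trailing empty segment


def split_hot_cold(chain: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Coarser variant: executed blocks form the hot part, the rest the cold part.

    A real optimizer must add explicit branches wherever a fall-through edge
    now crosses from one part to the other; that fix-up is omitted here.
    """
    hot = [b for b in chain if b.count > 0]
    cold = [b for b in chain if b.count == 0]
    return hot, cold


chain = [Block("b0", "cond", 10), Block("b1", "uncond", 10),
         Block("b2", "return", 8), Block("b3", "fallthrough", 0),
         Block("b4", "return", 0)]
print([[b.name for b in seg] for seg in split_fine_grain(chain)])
# [['b0', 'b1'], ['b2'], ['b3', 'b4']]
hot, cold = split_hot_cold(chain)
print([b.name for b in hot], [b.name for b in cold])  # ['b0', 'b1', 'b2'] ['b3', 'b4']
```

The fine-grain version produces more, smaller units, and this is what gives procedure ordering its extra freedom. The hot/cold version produces just two parts per procedure.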

Ex) Fine-Grain Procedure Splitting
[Figure: a single chained procedure divided into new Procedures 1 to 4; the split points are an unconditional branch and subroutine returns (RET)]

Procedure Ordering
  • Definition
    • Place related procedures near one another.
  • Algorithm
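    • Build the call graph, weighting each edge by its number of calls.
    • Select the most heavily weighted edge and merge its two nodes.
    • When merging, use the edge weights of the original graph to decide which ends of the two nodes are placed next to each other.
    • Iterate until the graph is reduced to a single node.
  • Gain
    • Improves instruction cache behavior.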
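A compact sketch of this merge loop, under stated assumptions:
- Call counts are given as a dict keyed by (caller, callee).
- Edges are treated as undirected.
- A merged node's edge weight is the sum of its members' original weights.
- "Use weights in the original graph" is read as choosing the orientation whose adjacent ends share the heaviest original edge, as in the Pettis-Hansen algorithm.

The call counts are made up, and this is not Spike's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def order_procedures(calls: Dict[Tuple[str, str], int]) -> List[str]:
    """Pettis-Hansen-style greedy procedure ordering (illustrative sketch).

    calls maps (caller, callee) to a profiled call count.
    """
    # Undirected call graph: weight = number of calls in either direction.
    orig: Dict[FrozenSet[str], int] = {}
    for (caller, callee), n in calls.items():
        if caller != callee:  # self-recursion does not affect placement
            key = frozenset((caller, callee))
            orig[key] = orig.get(key, 0) + n

    def w(p: str, q: str) -> int:
        return orig.get(frozenset((p, q)), 0)

    def merged_weight(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
        # A merged node's edge weight is the sum of its members' original edges.
        return sum(w(p, q) for p in a for q in b)

    # Every node starts as a one-procedure chain.
    chains: List[Tuple[str, ...]] = [(p,) for p in sorted({p for pair in calls for p in pair})]
    while len(chains) > 1:
        a, b = max(combinations(chains, 2), key=lambda ab: merged_weight(*ab))
        if merged_weight(a, b) == 0:
            break  # no edges left: remaining chains are just concatenated
        # Orient the pair so the two procedures that end up adjacent share the
        # heaviest edge in the ORIGINAL graph.
        options = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
        merged = max(options, key=lambda c: w(c[len(a) - 1], c[len(a)]))
        chains = [c for c in chains if c != a and c != b] + [merged]
    return [p for chain in chains for p in chain]


# Hypothetical call counts.
calls = {("A", "C"): 10, ("B", "D"): 8, ("A", "B"): 4, ("C", "D"): 3, ("D", "E"): 2}
print(order_procedures(calls))  # ['E', 'D', 'B', 'A', 'C']
```

On these made-up counts, A,C merge first, then B,D. The two pairs then join as C,A,B,D, with A next to B because A-B is the heaviest original edge between their ends. Finally E attaches next to D.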

Ex) Procedure Ordering
[Figure: a weighted call graph over procedures A to E is reduced step by step by merging along the heaviest edge: A,C first, then B,D, then D,B,A,C, and finally E, giving the final order E,D,B,A,C]

Methodology
  • OLTP Workload
    • TPC-B
    • Oracle 8.0.4
  • Collecting Profiles
    • OLTP profile data: collected with Pixie.
    • Kernel profile: collected with the Tru64 Unix kprofile tool.
  • Hardware and Simulation Platforms
    • SimOS-Alpha environment

Behavior of the DB App. Only
  • Instruction cache misses
    • X-axis: cache line size
    • Y-axis: number of instruction cache misses
    • Misses are reduced by 55 to 65%.

[Figure: instruction cache misses vs. cache line size, baseline OLTP binary vs. optimized OLTP binary]
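A toy direct-mapped cache model is enough to see why layout alone changes the miss count. This is not the paper's SimOS-Alpha setup. The cache size, block sizes and addresses are hypothetical. The model ignores the kernel, prefetching and associativity, and it assumes fixed 4-byte instructions, as on Alpha.

```python
from typing import Dict, Iterable, List


def icache_misses(trace: Iterable[int], cache_bytes: int, line_bytes: int) -> int:
    """Misses of a direct-mapped instruction cache for a trace of byte addresses."""
    num_sets = cache_bytes // line_bytes
    resident: List[int] = [-1] * num_sets  # line number held by each set (-1 = empty)
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        index = line % num_sets
        if resident[index] != line:
            resident[index] = line
            misses += 1
    return misses


def block_trace(sequence: List[str], layout: Dict[str, int],
                sizes: Dict[str, int], reps: int) -> List[int]:
    """Expand a repeated block sequence into fetch addresses (4-byte instructions)."""
    trace: List[int] = []
    for _ in range(reps):
        for name in sequence:
            start = layout[name]
            trace.extend(range(start, start + sizes[name], 4))
    return trace


# Hypothetical code: two hot blocks H1, H2 and a never-executed cold block C.
sizes = {"H1": 32, "C": 992, "H2": 32}
baseline = {"H1": 0, "C": 32, "H2": 1024}   # C sits between the hot blocks
optimized = {"H1": 0, "H2": 32, "C": 64}    # hot blocks are contiguous
hot_path = ["H1", "H2"]
for name, layout in (("baseline", baseline), ("optimized", optimized)):
    trace = block_trace(hot_path, layout, sizes, reps=100)
    print(name, [icache_misses(trace, 1024, line) for line in (32, 64)])
# baseline [200, 200]
# optimized [2, 1]
```

In the baseline layout the two hot blocks map to the same set and evict each other on every iteration. Placing them next to each other removes the conflict, and with 64-byte lines both blocks fit in a single line.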

Experiment (cont.)
  • Impact of the individual code layout optimizations.
    • Procedure ordering → increased cache misses.
    • The largest benefit comes from basic block chaining.
    • Procedure ordering after splitting → further performance improvement.

Experiment (cont.)
  • Sequentially executed instructions.
    • Optimized binary: increases from 7.3 to over 10 instructions.
  • Temporal locality.
    • Number of instructions reused before eviction.
    • Optimized binary: more instructions are reused before eviction.
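The sequential-run metric can be computed directly from an instruction address trace. This is a minimal sketch under two assumptions: the metric is the mean number of instructions fetched at consecutive addresses between control transfers, and instructions are 4 bytes (as on Alpha). The paper's exact definition may differ, and the temporal-locality metric is not modelled here.

```python
from typing import List


def mean_sequential_run(trace: List[int], insn_bytes: int = 4) -> float:
    """Average length, in instructions, of runs fetched at consecutive addresses.

    A run ends whenever the next fetch is not at pc + insn_bytes, i.e. after a
    taken branch, call, or return.
    """
    if not trace:
        return 0.0
    runs = 1
    for prev, cur in zip(trace, trace[1:]):
        if cur != prev + insn_bytes:
            runs += 1
    return len(trace) / runs


# Hypothetical fetch trace: 3 sequential instructions, a taken branch,
# 2 more, a jump back to 0, then 4 more.
trace = [0, 4, 8, 100, 104, 0, 4, 8, 12]
print(mean_sequential_run(trace))  # 3.0
```

Chaining raises this number because the likely successor is placed at the fall-through address, so fewer fetches end a run.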

Behavior of Combined DB App. & OS
  • Instruction cache misses
    • Misses are reduced by 45 to 60%.
    • For comparison, the application in isolation: 55 to 65%.

[Figure: instruction cache misses, baseline OLTP binary vs. optimized OLTP binary]

Experiment (cont.)
  • Interference between application and OS
    • The majority of application misses arise from self-interference.
    • The kernel interferes very little with itself.

[Figure: application/kernel interference, baseline OLTP binary vs. optimized OLTP binary]
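One way to attribute misses to self- or cross-interference is to remember, for every evicted line, whose fetch displaced it. The sketch below does this for a direct-mapped cache. The owner labels ("app", "kernel"), the cache geometry and the trace are hypothetical. It assumes application and kernel code occupy distinct addresses. The paper's classification method may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def classify_misses(trace: Iterable[Tuple[str, int]], cache_bytes: int,
                    line_bytes: int) -> Counter:
    """Count misses of a direct-mapped cache by (requester, kind).

    kind is "cold" for a first reference, otherwise "self" if the line was
    last evicted by code of the same owner (e.g. app evicting app) and
    "cross" if it was evicted by the other owner (e.g. kernel evicting app).
    """
    num_sets = cache_bytes // line_bytes
    resident: List[int] = [-1] * num_sets  # line number held by each set
    evicted_by: Dict[int, str] = {}        # line -> owner whose fetch evicted it
    counts: Counter = Counter()
    for owner, addr in trace:
        line = addr // line_bytes
        index = line % num_sets
        if resident[index] == line:
            continue  # hit
        if line in evicted_by:
            kind = "self" if evicted_by[line] == owner else "cross"
        else:
            kind = "cold"
        counts[(owner, kind)] += 1
        if resident[index] != -1:
            evicted_by[resident[index]] = owner
        resident[index] = line
    return counts


# Hypothetical trace on a tiny 2-set cache (64 bytes, 32-byte lines).
trace = [("app", 0), ("app", 64), ("app", 0), ("kernel", 128), ("app", 0)]
print(sorted(classify_misses(trace, 64, 32).items()))
# [(('app', 'cold'), 2), (('app', 'cross'), 1), (('app', 'self'), 1), (('kernel', 'cold'), 1)]
```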

Conclusion
  • Profile-driven compiler optimization to improve code layout in OLTP workloads.
  • Application in isolation: 55 to 65% fewer instruction cache misses.
  • Application with OS: 45 to 60% fewer instruction cache misses.
  • Overall, these optimizations improve performance by a factor of 1.33.