
Instruction Selection
Presented by Huang Kuo-An, Lu Kuo-Chang
Subproject 3

A. Aho, M. Lam, R. Sethi, and J. Ullman, "Instruction Selection by Tree Rewriting," in Compilers: Principles, Techniques & Tools, 2nd edition, Pearson Education, Inc., 2007, pp. 558-563.

“The LLVM Target-Independent Code Generator: Instruction Selection.” http://llvm.org/docs/CodeGenerator.html#instselect


Outline

  • Introducing LLVM

  • Instruction Selection

    • Tree Rewriting

  • Why we use LLVM

  • Progress


Introducing LLVM

  • The LLVM compiler infrastructure:

    • Provides modular & reusable components.

    • Reduces the time & cost of building a particular compiler.

    • These components are shared across different compilers.


The Steps of the LLVM Compiler

[Pipeline diagram: C or C++ source → Language Front-end → LLVM IR → Mid-level Optimizer → LLVM IR → Code Generation → .s file → executable]

  • Either C or C++ (one or the other) is fed to the language front-end.

  • The LLVM IR is an intermediate representation:

    • Lower than the high-level language (simple instructions, no for loops, etc.)

    • Higher than the machine code (no opcodes, no registers, etc.)

    • Source language independent and target processor independent.

  • Code Generation itself takes LLVM IR to target machine instructions through a series of steps: Instruction Selection → Scheduling → Register Allocation → Machine-specific Optimizations → Code Emission.
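To illustrate what "lower than the high-level language" means, here is a small sketch written as ordinary C/C++ (the real LLVM IR has its own textual syntax with basic blocks and branches; the goto form below is only an analogy, not actual IR):

#include <stddef.h>

/* High-level form: structured control flow. */
int sum_array(const int* a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Roughly what the IR-level version expresses: simple instructions,
 * conditional and unconditional branches, no for loops. */
int sum_array_lowered(const int* a, size_t n) {
    int sum = 0;
    size_t i = 0;
loop:
    if (!(i < n)) goto done;   /* conditional branch */
    sum = sum + a[i];          /* simple instructions */
    i = i + 1;
    goto loop;                 /* unconditional branch */
done:
    return sum;
}

int main(void) {
    int a[4] = {1, 2, 3, 4};
    return sum_array(a, 4) == sum_array_lowered(a, 4) ? 0 : 1;
}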


Instruction Selection

How does the compiler translate a C statement like this:

a[i] = b + 1

into machine code like this?

LD  R0, #a
ADD R0, R0, SP
ADD R0, R0, i(SP)
LD  R1, b
INC R1
ST  *R0, R1

First answer: break the translation into two steps. The statement is first lowered to an intermediate representation (IR) tree:

=( ind( +( +(Ca, Rsp), ind( +(Ci, Rsp) ) ) ), +(Mb, C1) )

Here Ca and Ci are the constant frame offsets of a and i, Rsp is the stack-pointer register, Mb is the memory location of b, C1 is the constant 1, and ind means "the contents of the address below it".

New question: how do we go from the IR tree to machine code?

One answer: use tree rewriting.


Tree Rewriting

Each machine instruction is described by a tree-rewriting rule: a tree pattern to match in the IR, the register or memory node that replaces the matched subtree, and the instruction emitted when the rule fires.

  • Ri ← Ca                          {LD Ri, #a}          (load a constant; covers ld Ri, #a)
  • Ri ← Mx                          {LD Ri, x}           (load from memory; covers ld Ri, x)
  • =(Mx, Ri)                        {ST x, Ri}           (store to memory; covers st x, Ri)
  • =(ind(Ri), Rj)                   {ST *Ri, Rj}         (indirect store; covers st *Ri, Rj)
  • Ri ← ind( +(Ca, Rj) )            {LD Ri, a(Rj)}       (indexed load; covers add Rx, Rj, #a; ld Ri, Rx)
  • Ri ← +( Ri, ind( +(Ca, Rj) ) )   {ADD Ri, Ri, a(Rj)}  (indexed add; covers add Rx, Rj, #a; add Ri, Ri, Rx)
  • Ri ← +( Ri, Rj )                 {ADD Ri, Ri, Rj}     (register add; covers add Ri, Ri, Rj)
  • Ri ← +( Ri, C1 )                 {INC Ri}             (increment; covers add Ri, Ri, #1)

The slide shows these rules alongside the IR tree for a[i] = b + 1; the rewriting of that tree proceeds step by step below.


Tree Rewriting (continued)

Applying the rules to the IR tree for a[i] = b + 1 reduces it bottom-up. Each match replaces a subtree with a register node and emits one instruction:

1. Ca (the offset of a) matches the load-constant rule: the leaf becomes R0, emit LD R0, #a.
2. +(R0, Rsp) matches the register-add rule: the subtree becomes R0, emit ADD R0, R0, SP.
3. +(R0, ind(+(Ci, Rsp))) matches the indexed-add rule: the subtree becomes R0, emit ADD R0, R0, i(SP).
4. Mb matches the load-from-memory rule: the leaf becomes R1, emit LD R1, b.
5. +(R1, C1) matches the increment rule: the subtree becomes R1, emit INC R1.
6. =(ind(R0), R1) matches the indirect-store rule: the whole tree is reduced, emit ST *R0, R1.

When no more rules apply, the emitted sequence is exactly the machine code shown earlier:

LD  R0, #a
ADD R0, R0, SP
ADD R0, R0, i(SP)
LD  R1, b
INC R1
ST  *R0, R1
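To make the mechanism concrete, here is a minimal sketch in C++ of a bottom-up tree-rewriting selector for a small subset of the rules above. It is neither the presentation's code nor LLVM's selector; the node kinds, helper names, and the trivial register allocator are illustrative assumptions.

#include <cstdio>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Tiny IR tree: constants (Ca/C1), memory locations (Mx), registers, and +.
enum class Kind { Const, Mem, Reg, Add };

struct Node {
    Kind kind;
    std::string name;            // constant value, memory name, or register name
    std::unique_ptr<Node> l, r;  // children of a + node
};

static std::unique_ptr<Node> leaf(Kind k, std::string n) {
    auto p = std::make_unique<Node>();
    p->kind = k;
    p->name = std::move(n);
    return p;
}

static std::unique_ptr<Node> add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
    auto p = std::make_unique<Node>();
    p->kind = Kind::Add;
    p->l = std::move(a);
    p->r = std::move(b);
    return p;
}

static int nextReg = 1;                 // R0 is assumed to hold the target address already
static std::vector<std::string> code;   // emitted instructions, in order

// Reduce a value subtree to a register, emitting one instruction per rule match.
static std::string reduce(const Node* n) {
    switch (n->kind) {
    case Kind::Reg:
        return n->name;                                  // already a register node
    case Kind::Const: {                                  // Ri <- Ca        {LD Ri, #a}
        std::string r = "R" + std::to_string(nextReg++);
        code.push_back("LD " + r + ", #" + n->name);
        return r;
    }
    case Kind::Mem: {                                    // Ri <- Mx        {LD Ri, x}
        std::string r = "R" + std::to_string(nextReg++);
        code.push_back("LD " + r + ", " + n->name);
        return r;
    }
    case Kind::Add: {
        std::string a = reduce(n->l.get());
        if (n->r->kind == Kind::Const && n->r->name == "1") {
            code.push_back("INC " + a);                  // Ri <- +(Ri, C1) {INC Ri}
        } else {
            std::string b = reduce(n->r.get());
            code.push_back("ADD " + a + ", " + a + ", " + b);  // Ri <- +(Ri, Rj) {ADD Ri, Ri, Rj}
        }
        return a;
    }
    }
    return "";
}

int main() {
    // Right-hand side of a[i] = b + 1, i.e. the subtree +(Mb, C1).
    auto rhs = add(leaf(Kind::Mem, "b"), leaf(Kind::Const, "1"));
    std::string v = reduce(rhs.get());
    code.push_back("ST *R0, " + v);                      // =(ind R0, Rj)   {ST *Ri, Rj}
    for (const auto& line : code) std::printf("%s\n", line.c_str());
    // Prints: LD R1, b / INC R1 / ST *R0, R1
}

A real selector keeps the rules in a table (in LLVM, in the .td files shown later) instead of hard-coding them in a switch; a generic matcher then applies whichever rules cover the tree.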


But actually, something is missing…

  • The IR immediate value #a has no size limit, but the actual machine has a limited number of bits for an immediate (say, 16 bits).

  • So we ought to state that the load-constant rule applies only when the immediate can be expressed in 16 bits:

    ld Ri, #a  (a ≤ 0xFFFF)        {LD Ri, #a}

  • But what if a cannot be expressed in 16 bits? Then we need a new rule:

    ld Ri, #a  (a > 0xFFFF)        {LD Ri, low16(#a)
                                    LD Rj, high16(#a)
                                    SHL Rj, #16
                                    ADD Ri, Ri, Rj}

  • The problem is that the target processor has no instruction for loading a 32-bit immediate. Instead, a set of machine instructions is needed; we call this set a pattern.
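The predicate on the rule and the one-to-many expansion are easy to express in ordinary code. Below is a minimal sketch in C++, assuming the slides' pseudo-machine mnemonics (LD/SHL/ADD), a 16-bit immediate field, and a free scratch register; it is illustrative only, not part of the project's compiler.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Emit code to load the 32-bit constant a into register ri, mirroring the two
// rules above: one instruction if a fits in 16 bits, otherwise a
// four-instruction pattern that uses the scratch register rj.
static std::vector<std::string> loadImmediate(uint32_t a,
                                              const std::string& ri,
                                              const std::string& rj) {
    std::vector<std::string> out;
    if (a <= 0xFFFFu) {
        out.push_back("LD " + ri + ", #" + std::to_string(a));        // one-to-one rule
    } else {
        uint32_t low16  = a & 0xFFFFu;                                 // bottom 16 bits
        uint32_t high16 = a >> 16;                                     // top 16 bits
        out.push_back("LD "  + ri + ", #" + std::to_string(low16));    // one-to-many rule:
        out.push_back("LD "  + rj + ", #" + std::to_string(high16));   // a single IR load
        out.push_back("SHL " + rj + ", #16");                          // becomes a pattern
        out.push_back("ADD " + ri + ", " + ri + ", " + rj);            // of four instructions
    }
    return out;
}

int main() {
    for (uint32_t a : {0x1234u, 0xDEADBEEFu}) {
        std::printf("; load #0x%X\n", (unsigned)a);
        for (const auto& line : loadImmediate(a, "R0", "R1"))
            std::printf("%s\n", line.c_str());
    }
}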


Kinds of the tree rewriting rules

  • One-to-One

    add Ri, Ri, #1             →  INC Ri

  • Many-to-One

    add Rx, Rj, #a             →  ADD Ri, Ri, a(Rj)
    add Ri, Ri, Rx

  • One-to-Many

    ld Ri, #a  (a > 0xFFFF)    →  LD Ri, low16(#a)
                                  LD Rj, high16(#a)
                                  SHL Rj, #16
                                  ADD Ri, Ri, Rj


So, what's the point?

  • To design an instruction selector, you do not need to write a program. Just define a set of rewriting rules.

  • Then use an existing instruction-selection program to apply your set of rules. The LLVM compiler has such a selector.


Instruction Selection

  • Suppose you want to use the LLVM compiler to create PowerPC code.

  • The PowerPC has a single-precision floating-point add instruction:

    FADDS T1, A, B

  • Q: How can we get the LLVM compiler to generate FADDS instructions?

  • A: We need to create a tree rewriting rule in the LLVM format:

    def FADDS:Aform_2<59, 21,
      (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
      "FADDS $FRT, $FRA, $FRB",
      [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;

    The rule's tree pattern is FRT ← +(FRA, FRB), matching the IR operation fadd RT, RA, RB and emitting {FADDS FRT, FRA, FRB}.

  • The PowerPC also has a single-precision floating-point multiply instruction:

    FMULS T1, X, Y

    so we need to create a tree rewriting rule for it too:

    def FMULS:Aform_3<59, 25,
      (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
      "FMULS $FRT, $FRA, $FRB",
      [(set F4RC:$FRT, (fmul F4RC:$FRA, F4RC:$FRB))]>;

    Its tree pattern is FRT ← *(FRA, FRB), matching the IR operation fmul RT, RA, RB and emitting {FMULS FRT, FRA, FRB}.


Instruction Selection

With these two rules, we can now generate PowerPC code for the following LLVM IR:

%t1 = mul float %X, %Y       →  FMULS t1, X, Y    (matched by the fmul:f32 pattern)
%t2 = add float %t1, %Z      →  FADDS t2, t1, Z   (matched by the fadd:f32 pattern)


Instruction Selection

  • But wait! The PowerPC has the FMADDS instruction, which performs both a multiply and an add. Why didn't the compiler choose that instruction?

    • Because no tree rewriting rule was defined for FMADDS.

  • What are the consequences of not giving a rule for FMADDS?

    • A broken compiler? No. The FMADDS instruction's function can also be performed by other PowerPC instructions that were defined. (But if FADDS had not been defined, the compiler would be broken.)

    • A bad compiler? Yes. FMADDS will never be used, and it is faster than FMULS followed by FADDS.

%t1 = mul float %X, %Y       →  FMULS t1, X, Y
%t2 = add float %t1, %Z      →  FADDS t2, t1, Z


Instruction Selection

We can add a new rule for the PowerPC's multiply-add instruction:

FMADDS T1, A, B, C

Instruction Selector (FADDS and FMULS as before, plus the new rule):

def FMADDS:Aform_1<59, 29,
  (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
  "FMADDS $FRT, $FRA, $FRC, $FRB",
  [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
                         F4RC:$FRB))]>;

Its tree pattern is FRT ← +(*(FRA, FRC), FRB): the IR sequence

fmul RT1, RA, RC
fadd RT2, RT1, RB

is covered by the single instruction {FMADDS FRT, FRA, FRC, FRB}.
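With the FMADDS rule in place, the selector can cover the two IR operations from the earlier example with one instruction. Following the rule's operand order ($FRT, $FRA, $FRC, $FRB), the expected selection would be roughly:

%t1 = mul float %X, %Y
%t2 = add float %t1, %Z      →  FMADDS t2, X, Y, Z

(This is a hand-worked illustration of the rule above, not output from an actual LLVM build.)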


3 Kinds of the tree rewriting rules

  • One-to-One

    add Ri, Ri, #1             →  INC Ri

  • Many-to-One

    add Rx, Rj, #a             →  ADD Ri, Ri, a(Rj)
    add Ri, Ri, Rx

  • One-to-Many

    ld Ri, #a  (a > 0xFFFF)    →  LD Ri, low16(#a)
                                  LD Rj, high16(#a)
                                  SHL Rj, #16
                                  ADD Ri, Ri, Rj

  • FMADDS is a many-to-one rule.

  • FMADDS is not needed for a basic compiler; in fact, all of the many-to-one rules can be skipped.


We will use the LLVM compiler

Because:

  • It has good optimizations.

  • It has good documentation.

  • It is designed to be a little easier to retarget to a new processor.

  • It was the compiler used by Subproject 3 in year 1, so there is some existing infrastructure.


But there are some difficulties with the LLVM compiler

  • It compiles C, not OpenGL 2.0.

  • Though it has backends for several processors, none of them is SIMD.

  • So the LLVM IR is not SIMD.


How we will use the LLVM compiler

Our work proceeds along two parallel paths:

  • Fast track: use Subproject 2's code to convert OpenGL to C.

  • Slow track: use Subproject 3 year 1's code to generate SIMD instructions in the LLVM IR.


A quick reminder

  • OpenGL 2.0 shader code is stored in a string array.

  • It is not compiled until the game is actually running.

  • At some point while the game is running, it calls glCompileShader, which takes the string array as an input argument and returns an object file.

  • Maybe the player entered a new level, and the new level has brick walls. The previous level did not have brick walls, so the graphics processor does not yet have a shader for rendering bricks.

  • The brick shader must therefore be compiled, linked, and loaded to the graphics processor. This is accomplished through three calls from within the game:

    glCompileShader(…)
    glLinkProgram(…)
    glUseProgram(…)

  • Our current work is only on the implementation of glCompileShader.

  • glCompileShader is a program that runs on the ARM processor when called by the ARM's OS.

  • So our compiler (which is written in C++) is compiled into an ARM executable; when this compiler executable is run, it generates a shader executable.


A sample OpenGL code

Here is some shader code (stored in the shader string array):

uniform vec3 LightPosition;
const float SpecularContribution = 0.3;
const float DiffuseContribution = 1.0 - SpecularContribution;
varying float LightIntensity;
varying vec2 MCposition;

void main(void) {
    vec3 ecPosition = vec3(gl_ModelViewMatrix * gl_Vertex);
    vec3 tnorm      = normalize(gl_NormalMatrix * gl_Normal);
    vec3 lightVec   = normalize(LightPosition - ecPosition);
    vec3 reflectVec = reflect(-lightVec, tnorm);
    vec3 viewVec    = normalize(-ecPosition);
    float diffuse   = max(dot(lightVec, tnorm), 0.0);
    float spec      = 0.0;
    if (diffuse > 0.0) {
        spec = max(dot(reflectVec, viewVec), 0.0);
        spec = pow(spec, 16.0);
    }
    LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;
    MCposition  = gl_Vertex.xy;
    gl_Position = ftransform();
}


A sample compilation trigger

And here is a function inside the game (running on the ARM) that compiles and loads the shader from the string array:

void AddBrickFragments(GLuint currentProgram) {
    GLuint brickFS = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(brickFS, 1, brickStringArray, NULL);
    glCompileShader(brickFS);
    glAttachShader(currentProgram, brickFS);
    glLinkProgram(currentProgram);
    glUseProgram(currentProgram);
}

The fast track compiler process

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

The fast track compiler process

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

shader

string array


The fast track compiler process1

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

The fast track compiler process

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

shader

string array


The fast track compiler process2

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

The fast track compiler process

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

Proj2

converter

shader

string array

equivalent C code


The fast track compiler process3

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

The fast track compiler process

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

Proj2

converter

shader

string array

equivalent C code


Instruction selection presented by huang kuo an lu kuo chang subproject 3

The fast track compiler process

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

Proj2

converter

LLVM

frontend

shader

string array

equivalent C code

equivalent LLVM IR


Instruction selection presented by huang kuo an lu kuo chang subproject 3

The fast track compiler process

void AddBrickFragments(Gluint

currentProgram) {

GLuint brickFS =glCreateShader(

GL_FRAGMENT_SHADER);

glShaderSource(brickFS, 1, brickStringArray,

NULL);

glCompileShader(brickFS);

glAttachShader(currentProgram,brickFS);

glLinkProgram(currentProgram);

glUseProgram(currentProgram);

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

iform vec3 LightPosition;

const float SpecularContribution = 0.3;

const float DiffuseContribution = 1.0 - SpecularContribution;

varying float LightIntensity;

varying vec2 MCposition;

void main(void) {

vec3 ecPosition = vec3 (gl_ModelViewMatrix * gl_Vertex);

vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);

vec3 lightVec = normalize(LightPosition - ecPosition);

vec3 reflectVec = reflect(-lightVec, tnorm);

vec3 viewVec = normalize(-ecPosition);

float diffuse = max(dot(lightVec, tnorm), 0.0);

float spec = 0.0;

if (diffuse > 0.0) {

spec = max(dot(reflectVec, viewVec), 0.0);

spec = pow(spec, 16.0);

}

LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;

MCposition = gl_Vertex.xy;

gl_Position = ftransform();

}

game running

on ARM

So now, as the game runs, this call to glCompileShader happens

Then the ARM processor calls the LLVM compiler, passing in this code for compilation

The LLVM compiler then:

Runs Proj2Converter to make C code

Runs the LLVM front end to create IR

Runs our new LLVM backend to create shader object file

Sends the object file back to the game

Proj2

converter

LLVM

frontend

shader

string array

equivalent C code

equivalent LLVM IR


Instruction selection presented by huang kuo an lu kuo chang subproject 3

  • So now, as the game runs, this call to glCompileShader happens

  • Then the ARM processor calls the LLVM compiler, passing in this code for compilation

  • The LLVM compiler then:

    • Runs Proj2Converter to make C code

    • Runs the LLVM front end to create IR

    • Runs our new LLVM backend to create shader object file

    • Sends the object file back to the game

The fast track compiler process

The game-side code that triggers compilation looks like this:

void AddBrickFragments(GLuint currentProgram) {
    GLuint brickFS = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(brickFS, 1, brickStringArray, NULL);
    glCompileShader(brickFS);
    glAttachShader(currentProgram, brickFS);
    glLinkProgram(currentProgram);
    glUseProgram(currentProgram);
}

The shader string array passed to glShaderSource holds the GLSL source shown earlier (uniform vec3 LightPosition; ...).

[Diagram: game running on ARM: shader string array → Proj2 converter → equivalent C code → LLVM frontend → equivalent LLVM IR → fast track backend → equivalent shader object file, e.g.

MUL R1, R2, R3
MADD R4, R1, R5]



The slow track compiler process

It is not a good idea to keep relying on subproject 2's converter:

  • The compiler runs during game execution, so the conversion step adds overhead

  • The conversion destroys the vectors, so you can't generate SIMD code (see the sketch below)

    • After all, if C were a good fit for 3D shaders, we wouldn't need the OpenGL Shading Language!
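
To make the second point concrete, here is a rough illustration in C (our own sketch, not the actual Proj2Converter output; the names vec4 and scale are hypothetical): once a GLSL vector operation has been lowered to scalar C, the backend can no longer see that a single SIMD instruction would do the whole job.

typedef struct { float x, y, z, w; } vec4;   /* assumed C stand-in for a GLSL vec4 */

/* A GLSL "v * s" on a vec4, after conversion to scalar C: */
static vec4 scale(vec4 v, float s) {
    vec4 r;
    r.x = v.x * s;   /* four independent scalar multiplies ...       */
    r.y = v.y * s;
    r.z = v.z * s;
    r.w = v.w * s;   /* ... where one 4-wide SIMD MUL would suffice  */
    return r;
}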



The slow track compiler process

The subproject 3, year 1 team addressed this problem:

  • They modified the LLVM frontend to read OpenGL shader code instead of C code

    • To handle the SIMD information expressed in the OpenGL code (such as variables declared as "vec4"), they added vector types to the LLVM IR

  • The problem is that the LLVM backend was not modified, so their result is a non-standard, augmented LLVM IR that currently cannot be compiled

    • The gist of our slow track development is therefore modifying the backend to understand the augmented IR


The slow track compiler process

[Diagram: the shader string array feeds two paths:

  fast track: Proj2 converter → equivalent C code → LLVM frontend → equivalent LLVM IR → fast track backend → equivalent shader object file (e.g. MUL R1, R2, R3 / MADD R4, R1, R5)

  slow track: Proj3Y1 LLVM frontend → equivalent, augmented LLVM IR → slow track backend → equivalent shader object file (e.g. SQRT R1, R2 / RCP R4, R1)]


Instruction selection summary

There are three steps in our instruction selector:

1st cut: fast track selection

- Backend changed to target the shader processors

- Works, but has no SIMD operations

2nd cut: slow track selection

- Merge in the second backend change so that it understands the augmented IR

- Update the instruction selector to make SIMD choices

3rd cut: Create tree rewriting rules for the complex processor instructions, like SQRT and LOG


Progress

The following table shows the shader instructions; our goal is to map LLVM instructions onto these shader instructions.

Some LLVM instructions obviously map onto a shader instruction, and we have already mapped some of them. But there are many more LLVM IR instructions, and if an LLVM IR instruction has no tree rewriting rule, you are not going to get a working compiler.

  • The remaining instructions are harder to map, and we will cover them one by one. "Harder to map" means one of two things:

    • It will require a more complicated mapping

    • It can be skipped (for now), because it's a many-to-one mapping


Progress

For example, here is how we map the SHR instruction, which is easy.

We started from the MIPS backend; it defines the instruction like this:

def SHR : SetCC_R<0x00, 0x2a, "shr", setlt>;

Then we turn it into the following code:

def SHR : SetCC_R<0x00, 0x2a, "SHR", setlt>;

Because this is a simple mapping, we only need to change the mnemonic string, which is what we actually see in the assembly file.

For now our target is just to get correct assembly, not executables.

But some instructions are harder to map, for example the ASHR instruction.


First, a reminder of what arithmetic shift right is:

It's a right shift that preserves the sign bit (sign extension).

Consider:

if R0 = 10101010101010101010101010101010

then SHR R0,10 = 00000000001010101010101010101010

but ASHR R0,10 = 11111111111010101010101010101010

The shr was easy to make a rule for, because the shader has an SHR instruction. But it doesn't have an ASHR.

Q: How then can we make a rule to deal with the LLVM ashr IR instruction?

A: We'll need to use multiple shader instructions (a 1-to-many mapping); a small C check of the two shift behaviours follows.
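
As a quick sanity check of the two behaviours, here is a small, self-contained C program (our own illustration, not shader code) that reproduces the example above. The arithmetic shift is written out explicitly rather than relying on >> of a signed value, whose behaviour C leaves implementation-defined.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t r0   = 0xAAAAAAAAu;         /* 101010...10, sign bit set          */
    uint32_t shr  = r0 >> 10;            /* logical shift: vacated bits are 0  */
    uint32_t sign = r0 >> 31;            /* 1, because the input is "negative" */
    uint32_t ashr = shr | ((sign ? 0xFFFFFFFFu : 0u) << (32 - 10));
    printf("SHR  R0,10 = %08X\n", (unsigned) shr);   /* prints 002AAAAA */
    printf("ASHR R0,10 = %08X\n", (unsigned) ashr);  /* prints FFEAAAAA */
    return 0;
}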


But how do we define a pattern of shader instructions?

From the example on the previous slide, we see that

SHR R0,10 = 00000000001010101010101010101010

ASHR R0,10 = 11111111111010101010101010101010

The right part (the low 22 bits) is always the same for both shifts; only the left part (the top 10 bits) can differ. For ASHR on this input, the left part is a run of 10 ones; read as a 10-bit field, that run is 1023 = 2^10 - 1, which can be computed as (1 << 10) - 1.

So it looks like the answer here is to compute the right part with SHR, and the left part as (TopBit << ShiftAmount) - 1, shifted up into the top bit positions.

Now we can start to build the ASHR instruction.

First we define a pattern called RED. Recall that shader registers have four 32-bit fields: Red, Green, Blue, and Alpha. Since we are not using SIMD yet, we deal with only one 32-bit field, and that is what RED selects. Here is the LLVM pattern:

def : PAT<(RED Rx), (AND Rx, 0xFFFFFFFF)>

Second we define a pattern of shader instructions for computing the left part:

def : PAT<(TOPBITS Rx, Ry),
          (SHL (SUB (SHL (SHR (RED Rx), 31), Ry), 1), (SUB 32, Ry))>

Working through it on the example input 10101010101010101010101010101010, with y = 10:

  • (SHR (RED Rx), 31) strips out everything but the sign bit, which is now in the bottom bit position:

    00000000000000000000000000000001

  • (SHL ..., Ry) pushes the sign bit up y places; thus it computes 2^y, if the sign bit is 1:

    00000000000000000000010000000000

  • (SUB ..., 1) subtracts 1, giving 2^y - 1: a run of y ones in the low bit positions:

    00000000000000000000001111111111

  • (SHL ..., (SUB 32, Ry)) finally shifts the sign-extension bits up to where they go, completing the left part:

    11111111110000000000000000000000

Third we define a pattern of shader instructions for merging the left and right parts:

def : PAT<(ASHR Rx, Ry),
          (OR (TOPBITS Rx, Ry), (SHR (RED Rx), Ry))>

From the previous slide, the left part (TOPBITS) is:

11111111110000000000000000000000

The right part, (SHR (RED Rx), Ry), is:

00000000001010101010101010101010

And a bitwise OR of the two parts yields:

11111111111010101010101010101010


  • So, we defined 3 patterns:

    def : PAT<(RED Rx), (AND Rx, 0xFFFFFFFF)>

    def : PAT<(TOPBITS Rx, Ry),
              (SHL (SUB (SHL (SHR (RED Rx), 31), Ry), 1), (SUB 32, Ry))>

    def : PAT<(ASHR Rx, Ry),
              (OR (TOPBITS Rx, Ry), (SHR (RED Rx), Ry))>

  • As a result, there is now a rewriting rule for ashr (a C sketch of the same computation follows below)

  • It's awkward, but it works

    • Besides, it's unclear how often shaders would do an ashr

  • We must similarly build patterns for every LLVM IR instruction that does not naturally map to a shader processor instruction
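
To tie the three patterns together, here is a minimal C mirror of the same computation (our own sketch for checking the logic, not generated code; the function names simply echo the pattern names). It reproduces the worked example, and the comment at the end notes that the pattern as written assumes a negative input, since with a zero sign bit (0 << y) - 1 would still set the top bits.

#include <stdint.h>

static uint32_t RED(uint32_t rx) {
    return rx & 0xFFFFFFFFu;                /* (AND Rx, 0xFFFFFFFF)              */
}

static uint32_t TOPBITS(uint32_t rx, uint32_t ry) {
    uint32_t sign = RED(rx) >> 31;          /* sign bit in the bottom position   */
    uint32_t run  = (sign << ry) - 1u;      /* 2^y - 1 when the sign bit is 1    */
    return run << (32u - ry);               /* shift the run into the top y bits */
}

static uint32_t ASHR(uint32_t rx, uint32_t ry) {
    return TOPBITS(rx, ry) | (RED(rx) >> ry);   /* OR the left and right parts   */
}
/* ASHR(0xAAAAAAAAu, 10) == 0xFFEAAAAAu, matching the example on the earlier slides;
 * as on the slides, correctness is only claimed for inputs with the sign bit set. */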


Future work

All of the above is just for the first-cut compiler.

1st cut: fast track selection

- Backend changed to target the shader processors

- Works, but has no SIMD operations

2nd cut: slow track selection

- Merge in the second backend change so that it understands the augmented IR

- Update the instruction selector to make SIMD choices

3rd cut: Create tree rewriting rules for the complex processor instructions, like SQRT and LOG

