Transforming Linear Algebra Libraries: From Abstraction to Parallelism

Ernie Chan

HIPS 2010


Motivation

Statically

Outline

  • Inversion of a Triangular Matrix

  • Requisite Semantic Information

  • Static Generation of a Directed Acyclic Graph

  • Performance

  • Conclusion

Inversion of a Triangular Matrix

  • Formal Linear Algebra Methods Environment (FLAME)

    • High-level abstractions for expressing linear algebra algorithms

  • Triangular Inversion (Trinv)

    R := U^{-1}

  • LAPACK-style Implementation

          DO J = 1, N, NB
             JB = MIN( NB, N-J+1 )
    *        A12 := -inv( A11 ) * A12
             CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
         $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
         $               A( J, J+JB ), LDA )
    *        A02 := A02 + A01 * A12
             CALL DGEMM( 'No transpose', 'No transpose',
         $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
         $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
    *        A01 := A01 * inv( A11 )
             CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
         $               J-1, JB, ONE, A( J, J ), LDA,
         $               A( 1, J ), LDA )
    *        A11 := inv( A11 )
             CALL DTRTI2( 'Upper', 'Non-unit',
         $                JB, A( J, J ), LDA, INFO )
          ENDDO
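
The order of the four updates in the loop above can be traced with 1x1 blocks in plain Python. This is only a sketch under stated assumptions: the function name `trinv_upper` and the list-of-lists storage are illustrative and are not part of LAPACK or FLAME, and no check for a zero diagonal is made.

```python
def trinv_upper(u):
    """Invert an upper triangular matrix in place, using 1x1 blocks.

    Follows the same order of updates as the blocked loop:
    Trsm (left), Gemm, Trsm (right), then inversion of the diagonal block.
    """
    n = len(u)
    for j in range(n):
        a11 = u[j][j]
        # A12 := -inv(A11) * A12   (row j, right of the diagonal)
        for c in range(j + 1, n):
            u[j][c] = -u[j][c] / a11
        # A02 := A02 + A01 * A12   (rows above j, columns right of j)
        for r in range(j):
            for c in range(j + 1, n):
                u[r][c] += u[r][j] * u[j][c]
        # A01 := A01 * inv(A11)    (column j, above the diagonal)
        for r in range(j):
            u[r][j] = u[r][j] / a11
        # A11 := inv(A11)
        u[j][j] = 1.0 / a11
    return u


u = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 4.0],
     [0.0, 0.0, 1.0]]
r = trinv_upper(u)
```

Note that the Gemm step reads A01 before the right-hand Trsm overwrites it, which is why the statement order matters.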

  • FLASH

    • Matrix of matrices

    FLA_Part_2x2( A,    &ATL, &ATR,
                        &ABL, &ABR,     0, 0, FLA_TL );

    while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
    {
      FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                          /* ******** */        /* **************** */
                                                  &A10, /**/ &A11, &A12,
                             ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                             1, 1, FLA_BR );
      /*-------------------------------------------------------*/
      FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_MINUS_ONE, A11, A12 );
      FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
                  FLA_ONE, A01, A12, FLA_ONE, A02 );
      FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_ONE, A11, A01 );
      FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
      /*-------------------------------------------------------*/
      FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                       A10, A11, /**/ A12,
                              /* ********** */      /* ************* */
                                &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                                FLA_TL );
    }

  • Extensible Markup Language (XML)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard>
        <Update>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_LEFT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_MINUS_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="12">A</Parameter>
          </Statement>
          <Statement name="FLA_Gemm">
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="01">A</Parameter>
            <Parameter partition="12">A</Parameter>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="02">A</Parameter>
          </Statement>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_RIGHT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="01">A</Parameter>
          </Statement>
          <Statement name="FLA_Trinv">
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter partition="11">A</Parameter>
          </Statement>
        </Update>
      </Loop>
    </Function>
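
A description in this form can be read mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser rather than whatever tool the authors used; the helper name `read_statements` and the abbreviated XML string (two statements only, options mostly dropped) are illustrative.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the description: only two statements are kept.
XML_TEXT = """
<Function name="FLA_Trinv" type="blk" variant="3">
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
"""


def read_statements(xml_text):
    """Return [(statement name, [operands such as 'A11'])] in loop order."""
    root = ET.fromstring(xml_text)
    statements = []
    for stmt in root.iter("Statement"):
        operands = [
            p.text + p.get("partition")
            for p in stmt.findall("Parameter")
            if p.get("partition") is not None  # skip scalars such as FLA_ONE
        ]
        statements.append((stmt.get("name"), operands))
    return statements


statements = read_statements(XML_TEXT)
```

Each partitioned parameter becomes a submatrix name (operand name plus partition suffix), which is the form the FLASH code uses.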

Requisite Semantic Information

  • Partitioning Scheme

  • Problem Size*

  • Updates

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
        <Update>
          <Statement name="FLA_Trsm">
            <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
          </Statement>
          <Statement name="FLA_Gemm">
            <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
          </Statement>
          <Statement name="FLA_Trsm">
            <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
          </Statement>
          <Statement name="FLA_Trinv">
            <!-- 'Upper', 'Non-unit', A11 -->
          </Statement>
        </Update>
      </Loop>
    </Function>

  • Input and Output Parameters

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trsm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="both">B</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Gemm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="in">B</Operand>
        <Operand type="scalar" inout="in">beta</Operand>
        <Operand type="matrix" inout="both">C</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Trinv">
      <Declaration>
        <Operand type="matrix" inout="both">A</Operand>
      </Declaration>
    </Function>
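
The `inout` attributes are what allow a tool to decide which blocks a call reads and which it writes. A minimal sketch of that lookup follows; the table `DECLARATIONS` copies the directions from the declarations above, while the function name `read_write_sets`, the `"out"` direction (not used in the declarations shown), and the convention that scalars start with `FLA_` are assumptions made for illustration.

```python
# Operand directions in argument order, copied from the declarations.
DECLARATIONS = {
    "FLA_Trsm": ["in", "in", "both"],              # alpha, A, B
    "FLA_Gemm": ["in", "in", "in", "in", "both"],  # alpha, A, B, beta, C
    "FLA_Trinv": ["both"],                         # A
}


def read_write_sets(name, args):
    """Return (reads, writes) as sets of submatrix names for one call."""
    reads, writes = set(), set()
    for direction, arg in zip(DECLARATIONS[name], args):
        if arg.startswith("FLA_"):  # assumed: scalar constant, not a block
            continue
        if direction in ("in", "both"):
            reads.add(arg)
        if direction in ("out", "both"):  # "out" is hypothetical here
            writes.add(arg)
    return reads, writes


gemm_reads, gemm_writes = read_write_sets(
    "FLA_Gemm", ["FLA_ONE", "A01", "A12", "FLA_ONE", "A02"]
)
```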

Static Generation of a DAG

  • Code Generation

    • Convert XML representation to FLASH code generation intermediary

      • Annotated with input and output information

    • Create directed acyclic graph (DAG) by statically unrolling the loop

      • Operations on submatrix blocks (tasks) are vertices

      • Data dependencies between tasks are edges

  • Data Dependencies

    • Flow (read-after-write)

      S1: A = B + C;

      S2: D = A + E;

    • Anti (write-after-read)

      S3: F = A + G;

      S4: A = H + I;

    • Output (write-after-write)

      S5: A = J + K;

      S6: A = L + M;
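
The three cases above reduce to set intersections on what each statement reads and writes. A small sketch, assuming each statement is modelled as a (writes, reads) pair of variable-name sets; the function name `dependencies` is illustrative.

```python
def dependencies(first, second):
    """Classify how `second` depends on `first` (which comes earlier).

    Each statement is a (writes, reads) pair of sets of variable names.
    """
    w1, r1 = first
    w2, r2 = second
    kinds = []
    if w1 & r2:
        kinds.append("flow")    # read-after-write
    if r1 & w2:
        kinds.append("anti")    # write-after-read
    if w1 & w2:
        kinds.append("output")  # write-after-write
    return kinds


S1 = ({"A"}, {"B", "C"})  # A = B + C
S2 = ({"D"}, {"A", "E"})  # D = A + E
S3 = ({"F"}, {"A", "G"})  # F = A + G
S4 = ({"A"}, {"H", "I"})  # A = H + I
S5 = ({"A"}, {"J", "K"})  # A = J + K
S6 = ({"A"}, {"L", "M"})  # A = L + M
```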

  • Problem Size

    • Problem size cannot be determined a priori

    • Fix the block size or loop unrolling factor

      • Balance between instruction footprint and data granularity of tasks

  • Example

    • Trinv on 3x3 matrix of blocks

  • Trinv

    • Iteration 1: Trsm0, Trsm1, Trinv2

    • Iteration 2: Trsm3, Gemm4, Trsm5, Trinv6

    • Iteration 3: Trsm7, Trsm8, Trinv9

    • Complete DAG (figure not reproduced): Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9
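
The task labels Trsm0 through Trinv9 can be reproduced by unrolling the Trinv loop over a 3x3 matrix of blocks and numbering tasks in program order. The sketch below is an assumption about how such a generator could look, not the authors' tool: the function name `unroll_trinv`, the use of (row, column) block indices, and the all-pairs edge test are illustrative, and the edge list deliberately keeps redundant transitive edges.

```python
def unroll_trinv(nb):
    """Unroll Trinv (variant 3) over an nb x nb matrix of blocks.

    Returns (tasks, edges). Each task is (label, reads, writes) with sets
    of (row, col) block indices; an edge (i, j) means task j waits for i.
    """
    tasks = []

    def add(op, reads, writes):
        tasks.append((f"{op}{len(tasks)}", set(reads), set(writes)))

    for k in range(nb):
        a11 = (k, k)
        # Trsm (left): A12 := -inv(A11) * A12, one task per block of A12
        for j in range(k + 1, nb):
            add("Trsm", [a11, (k, j)], [(k, j)])
        # Gemm: A02 := A02 + A01 * A12, one task per block of A02
        for i in range(k):
            for j in range(k + 1, nb):
                add("Gemm", [(i, k), (k, j), (i, j)], [(i, j)])
        # Trsm (right): A01 := A01 * inv(A11), one task per block of A01
        for i in range(k):
            add("Trsm", [a11, (i, k)], [(i, k)])
        # Trinv: A11 := inv(A11)
        add("Trinv", [a11], [a11])

    edges = []
    for j, (_, r2, w2) in enumerate(tasks):
        for i in range(j):
            _, r1, w1 = tasks[i]
            # flow, anti, or output dependency on a shared block
            if (w1 & r2) or (r1 & w2) or (w1 & w2):
                edges.append((i, j))
    return tasks, edges


tasks, edges = unroll_trinv(3)
labels = [label for label, _, _ in tasks]
```

Because every edge points from an earlier task to a later one, the resulting graph is acyclic by construction. Trsm0 and Trsm1 share no edge, so they may run in parallel, while Trinv2 must wait for both because it overwrites the block they read.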

Performance

  • LabVIEW

    • Graphical, data flow programming language (G)

      • Anti-dependencies cannot exist in G

        • Copies are made when wire is split

  • Target Architecture

    • 16-core AMD processor

      • 4 socket quad-core Opteron

      • 1.9 GHz

      • 4 GB of RAM per socket

    • LabVIEW 8.6

      • Windows XP

    • Basic Linear Algebra Subprograms (BLAS)

      • MKL 7.2

  • Results

    • Parallelism

      • Exploit parallelism inherent within DAG

    • Hierarchical matrix storage

      • Spatial locality

    • Overhead

      • Copy matrix from flat row-major storage to hierarchical matrix and back
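
The copy overhead mentioned above comes from converting between the two layouts. A minimal sketch of that conversion, assuming a square matrix whose size is a multiple of the block size; the function names `flat_to_blocks` and `blocks_to_flat` and the nested-list representation are illustrative and do not reflect the FLASH or LabVIEW data structures.

```python
def flat_to_blocks(flat, n, b):
    """Copy an n x n row-major list into a grid of b x b blocks."""
    if n % b != 0:
        raise ValueError("sketch assumes the block size divides n")
    nb = n // b
    return [
        [
            [[flat[(bi * b + r) * n + (bj * b + c)] for c in range(b)]
             for r in range(b)]
            for bj in range(nb)
        ]
        for bi in range(nb)
    ]


def blocks_to_flat(blocks, n, b):
    """Copy a grid of b x b blocks back into an n x n row-major list."""
    flat = [0] * (n * n)
    for bi, block_row in enumerate(blocks):
        for bj, block in enumerate(block_row):
            for r in range(b):
                for c in range(b):
                    flat[(bi * b + r) * n + (bj * b + c)] = block[r][c]
    return flat


flat = list(range(16))              # 4 x 4 matrix, row-major
blocks = flat_to_blocks(flat, 4, 2)  # 2 x 2 grid of 2 x 2 blocks
```

Each block holds its elements contiguously, which is the source of the spatial locality listed in the results; the price is one full pass over the matrix in each direction.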

Conclusion

  • Instantiate linear algebra algorithm using a code generation intermediary

  • Statically produce a directed acyclic graph by fixing block size or loop unrolling factor

    XML → FLASH → DAG

Acknowledgments

  • Jim Nagle, Robert van de Geijn

    • We thank the other members of the FLAME team for their support

  • Funding

    • National Instruments

    • NSF Grants

      • CCF-0540926

      • CCF-0702714

Conclusion

  • More Information

    http://www.cs.utexas.edu/~flame

  • Questions?

    [email protected]
