

Efficient Run-Time Dispatching in Generic Programming with Minimal Code Bloat

Lubomir Bourdev

Advanced Technology Labs

Adobe Systems

Jaakko Järvi

Computer Science Department

Texas A&M University


Agenda

  • Context & problem statement

  • Background – previous approaches

  • Our approach to code bloat reduction

  • Code bloat reduction in run-time dispatch

  • Results & conclusion




Context: Image Manipulation

  • Images vary in many different ways

  • Writing generic and efficient image processing algorithms is challenging


Image Representations

  • Images differ along several dimensions:

    • color space (RGB, CMYK…)

    • channel depth (8-bit, 16-bit…)

    • channel order (RGB vs. BGR)

    • planar vs. interleaved layout

    • optional padding at the end of rows

  • Example: a 4x3 image in which the second pixel is highlighted

  • In interleaved form: [figure not preserved]

  • In planar form: [figure not preserved]


Generic Image Library (GIL)

  • Adobe’s Open Source Image Library

    http://opensource.adobe.com/gil

  • Abstracts image representations from algorithms on images

  • Allows for writing the algorithm once & having it work on images of any representation, without loss of performance


Problem Statement

  • How do we write image processing algorithms that are:

    • Generic

    • Efficient

    • Compact

    • Run-Time Flexible




Image Algorithms via Inheritance & Polymorphism

struct pixel { virtual void invert() = 0; };

struct rgb_pixel : public pixel {
    virtual void invert();
};

struct gray_pixel : public pixel {
    virtual void invert();
};

struct image {
    size_t size() const;
    pixel* operator[](size_t i);
};

void invert(image* img) {
    for (size_t i = 0; i < img->size(); ++i)
        (*img)[i]->invert();   // virtual call for every pixel
}

Generic X

Efficient X

Compact √

Run-Time Flexible √

Performance problem: dynamic dispatch once per pixel


Image Algorithms via Generic Programming

struct rgb_pixel {…};
struct gray_pixel {…};

void invert_pixel(rgb_pixel&) {…}
void invert_pixel(gray_pixel&) {…}

template <typename Pixel>
struct image {
    size_t size() const;
    Pixel& operator[](size_t i);
};

template <typename Image>
void invert(Image& img) {
    for (size_t i = 0; i < img.size(); ++i)
        invert_pixel(img[i]);   // overload chosen at compile time
}

Generic √

Efficient √

Compact √

Run-Time Flexible X


Generic Code Lacks Flexibility

  • We need run-time flexibility:

    typedef boost::mpl::vector<rgb8_image, gray8_image> images;
    gil::any_image<images> runtime_image;
    gil::jpeg_read_image(runtime_image, "test.jpg");
    invert(runtime_image);

  • How can we do that without loss of performance?

    • Variant construct (see boost::variant)

    • runtime_image holds:

      • index: index to the type of image

      • bits: buffer containing the currently instantiated image

    • To invoke an algorithm, go through a switch statement & cast

    • Efficient: invoke dynamic dispatch only once per algorithm


Variant invocation

void invert_image(void* bits, int index) {
    switch (index) {
        case kLAB: invert(*(image<lab_pixel>*)(bits)); break;
        case kRGB: invert(*(image<rgb_pixel>*)(bits)); break;
    }
}

Generic version:

template <typename Op>
void apply_operation(void* bits, int index, Op op) {
    switch (index) {
        case kLAB: op(*(image<lab_pixel>*)(bits)); break;
        case kRGB: op(*(image<rgb_pixel>*)(bits)); break;
    }
}

Generic √

Efficient √

Compact X

Run-Time Flexible √


Solution: Template Hoisting

  • Define a class hierarchy:

    template <int k> class k_channel_image {…};
    class rgb_image : public k_channel_image<3> {};
    class lab_image : public k_channel_image<3> {};

  • Define the algorithm at the appropriate level of the hierarchy:

    template <int k> void invert(k_channel_image<k>&) {…}

Generic X

  • enforces a specific hierarchy

  • different algorithms may need different hierarchies

Efficient √

Compact

  • switch statement overhead remains

  • does not help when the function is inlined

Run-Time Flexible √




Type Reduction

  • Every algorithm partitions the space of its argument types into a set of equivalence classes

  • Members of an equivalence class result in the same assembly code when instantiated

  • The algorithm is instantiated only with one representative from each equivalence class


Type Reduction Implementation

  • Metafunction to define the partition:

    template <typename Op, typename T>
    struct reduce {
        typedef T type;
    };

  • Generic algorithm invocation:

    template <typename Op, typename T>
    inline void apply_operation(const T& argument, Op op) {
        typedef typename reduce<Op,T>::type base_t;
        op(reinterpret_cast<const base_t&>(argument));
    }


Example: The invert algorithm

  • Define the algorithm as a function object:

    struct invert_op {
        template <typename Image> void operator()(Image&) {…}
    };

  • Provide a function overload to invoke it:

    template <typename Image>
    inline void invert(Image& image) {
        apply_operation(image, invert_op());
    }

  • Inverting RGB and LAB images is assembly-level identical:

    template <> struct reduce<invert_op, lab8_image_t> {
        typedef rgb8_image_t type;
    };


The technique generalizes to multiple dimensions

template <typename Op, typename T1, typename T2>
void apply_operation(T1& arg1, T2& arg2, Op op) {
    // unary pre-reduction of each argument
    typedef typename reduce<Op,T1>::type base1_t;
    typedef typename reduce<Op,T2>::type base2_t;

    // binary reduction of the pair of reduced types
    typedef std::pair<base1_t*, base2_t*> pair_t;
    typedef typename reduce<Op,pair_t>::type base_pair_t;

    std::pair<void*,void*> p(&arg1,&arg2);
    op(reinterpret_cast<base_pair_t&>(p));
}

template <>
struct reduce<copy_pixels_op, lab8_image_t> {…};

template <>
struct reduce<copy_pixels_op,
              std::pair<lab8_image_t*, lab8_image_t*> > {…};


Defining Reduce Specializations

  • Reduce dimensions separately, then combine:

    template <typename Image>
    struct reduce<invert_pixels_op, Image> {
        typedef typename reduce_cs<typename Image::color_space_t>::type cs;
        typedef typename reduce_ch<typename Image::channel_t>::type channel;
        typedef typename image_type<cs,channel,…>::type type;
    };

  • Reuse structures via metafunction forwarding:

    template <typename T1, typename T2>
    struct reduce<resample_pixels_op, std::pair<T1,T2> >
        : public reduce<copy_pixels_op, std::pair<T1,T2> > {};




Reduction in variants

Input: a variant of:

input_types: [rgb8_image, lab8_image, cmyk16_image, rgba16_image]

input_index: 2

  • Step 1: Reduce each member of the vector:

    reduced_t: [rgb8_image, rgb8_image, rgba16_image, rgba16_image]

  • Step 2: Remove duplicates:

    output_types_t: [rgb8_image, rgba16_image]

  • Step 3: Create index vector from reduced_t to output_types_t:

    indices_t: [0, 0, 1, 1]

  • Step 4: Use indices_t to map the input index to an output index:

    output_index = indices_t[input_index] = indices_t[2] = 1

    Invoke the algorithm on a variant of:

    output_types_t: [rgb8_image, rgba16_image]

    output_index: 1


Binary reduction in variants

  • Step 1: Perform unary pre-reduction on each argument

    [A1, A2, A3, A4] with index 2 -> [A1, A3] with out_index1 = 1

    [B1, B2, B3] with index 3 -> [B1, B2] with out_index2 = 0

  • Step 2: Compute a vector of the cross-products of types

    [(A1,B1), (A1,B2), (A3,B1), (A3,B2)]

  • Step 3: Apply unary reduction on it:

    output_types_t = [(A1,B1), (A1,B2), (A3,B2)]

  • Step 4: Compute the index in the output vector

    out_index = out_index1 * size(Vec1) + out_index2

    Invoke the algorithm on a single variant of:

    output_types_t = [(A1,B1), (A1,B2), (A3,B2)]

    out_index




Tests

  • Test sets

    • Set A: 90 types (10 color spaces, 3 channel types, other variations)

    • Set B: 10 types (4 color spaces, other)

    • Set C: 12 types (3 color spaces, planar/interleaved, step/nonstep)

  • Tests

    • Test 1: copy_pixels on Set B (inlined binary algorithm)

    • Test 2: copy_pixels on Set C (inlined binary algorithm)

    • Test 3: resample_pixels on Set B (non-inlined binary algorithm)

    • Test 4: resample_pixels on Set C (non-inlined binary algorithm)

    • Test 5: invert_pixels on Set A (inlined unary algorithm)


Results

  • Charts (not preserved in this transcript): reduction in code bloat; effect on compile time


Conclusion

  • Drawbacks

    • Unsafe

    • Requires intimate knowledge of the types and the algorithm

    • Some compilers can already optimize away most of the code bloat

  • Benefits

    • Works even when functions are inlined

    • Simplifies code generated by variants (especially double dispatch)

    • Does not impose class hierarchy (essential for generic code!)

    • Works when algorithms differ in requirements