Speeding up STL Set/Map Usage in C++ Applications SIPEW 2008


Dibyendu Das, Madhavi Valluri, Michael Wong, Chris Cambly dibyendu.das@in.ibm.com,mvalluri@us.ibm.com, michaelw@ca.ibm.com, ccambly@ca.ibm.com. Software/Systems Tech Group. Rational. Speeding up STL Set/Map Usage in C++ Applications SIPEW 2008 . An idea and implementation.

Presentation Transcript

Dibyendu Das, Madhavi Valluri, Michael Wong, Chris Cambly

dibyendu.das@in.ibm.com, mvalluri@us.ibm.com, michaelw@ca.ibm.com, ccambly@ca.ibm.com

Software/Systems Tech Group

Rational

Speeding up STL Set/Map Usage in C++ Applications SIPEW 2008

SPEC CPU 2006

An idea and implementation
  • A way to speed up SPEC CPU 2006 dealII
    • that can work for all compiler vendors
  • Without violating C++ Std library rules
    • the small increase in memory usage does not change cache behavior
  • IBM’s P5+/P6 show ~20% improvement
    • Delivered on IBM’s xlC C++ compiler V10.1

IBM

C++ Standard Template Library and Generic Programming
  • A better data structure can provide the best speed gain
  • Generic programming is about lifting common algorithms and data structures
  • The C++ Standard Template Library unifies algorithms with data structures, glued together by iterators
    • Effectively matches any algorithm with any data structure through the abstraction of iterators
    • Universally supplied by all C++ compiler vendors
    • vector, deque, list, set, map
    • With limits on performance and memory usage
    • Written by the best C++ programmers to be reusable and composable

The right tool for the right time
  • What data structures are used in each SPEC CPU 2006 C++ benchmark

Which data structure to choose
  • Depends on how you balance the cost of lookups, erasures, insertions, copies, and traversals (++/--)
  • Found that dealII slows down due to long traversal time (++/-- is costly) for set<>
  • In the traditional binary search tree implementation of set<>/map<>
    • Optimized for a mixed sequence of insertions and erasures, then some lookups and traversals, then maybe more insertions, etc.

What are we allowed to do?
  • Can’t change the data structure in SPEC CPU benchmarks
  • However, we are allowed to alter the underlying vendor implementation of libraries if we can sense how data is used
    • Sometimes usage patterns are indeed chaotic
    • Sometimes they are more organized
      • Setup through insertion
      • Lookup to find information
      • Traversal for doing something applicable to many elements
      • Reorganize to a more suitable set, then return to lookup


Details of normal set implementation
  • Implemented as a balanced binary tree known as a red-black tree
  • O(log n) for insertion and deletion
  • O(log n) for lookups (find)
  • O(1) amortized cost for traversals via ++/-- iterators
  • Figure: a set<int> iSet as a red-black tree


What does O(1) amortized cost for ++/-- mean?
  • Starting with sitr=iSet.begin()
  • Advance (++sitr) will put it on node (2) after 1 link
  • Advance again will put it on node (5) after 2 links
  • Advance again will put it on node (7) after 1 link
  • Advance again will put it on node (8) after 1 link
  • Advance again will put it on node (11) after 3 links
  • Advance again will put it on node (12) after 2 links
  • Advance again will put it on node (14) after 1 link
  • Advance again will put it on node (15) after 1 link
  • Total is 12 links after 9 traversals = 1.3 links/traversal
  • Figure: a set<int> iSet as a red-black tree

Our Implementation
  • Add a doubly-linked list on top of the red-black tree
    • Using _Next and _Prev pointers to the next sorted tree node in non-decreasing order and non-increasing order respectively
    • Now ++/-- operations are exactly Θ(1)
    • But insert and delete have an added O(1) cost, still within the O(log n) required by the C++ Standard
    • Copy adds O(1) for every copied node

New in IBM xlC 10.1 compiler
  • Just released June 2008 with many new features
  • Compiler defined flag to enable
    • -D __IBM_FAST_SET_MAP_ITERATOR
  • Default is to not enable this behavior
  • Entire application must be compiled with this, or we can have erroneous behavior.

Results of our implementation
  • dealII, xalancbmk, omnetpp all use set and map
  • Only dealII and xalancbmk will benefit
  • omnetpp's use of set is cold
  • Measured in peak mode (-O5 with profile-directed feedback enabled)
  • Verified no cache effect

Other work and future investigation
  • All commercial implementations use some form of red-black tree
  • No commercial implementations use a doubly-linked list to augment the red-black tree
  • Some research uses a B-tree
    • But it slows deletion compared to RB trees
  • Advised dealII author to switch to sorted vector instead of associative container

Insertion
  • Insert a node _Z in a red-black tree
  • If it is inserted to the left of a node _Y
  • Then update the links of RB_Prev(_Y), _Y, and _Z

Erasure
  • Delete node 5
  • Need to modify _Left, _Right, _Parent pointers
  • Increment and Decrement only need to follow 1 link instead of multiple links

Copy
  • When we use = in C++, it will create a copy
  • Allocate new nodes, copy contents from source to destination tree
  • Scan from first to last node in new tree in sorted order
  • Set up _Prev and _Next pointers
  • This scan follows multiple links per step, using the original increment and decrement
  • Requires additional O(1) amortized time for every copied node
