Chameleon: Automatic Selection of Collections

Ohad Shacham, Martin Vechev, Eran Yahav

Tel Aviv University; IBM T.J. Watson Research Center


Collections

Set: HashSet, LinkedSet, ArraySet, LazySet

Map: HashMap, LinkedMap, ArrayMap, LazyMap

List: LinkedList, ArrayList, LazyList

  • Abstract data types

  • Many implementations

  • Different space/time tradeoffs

  • A poorly matched choice can lead to

    • runtime degradation

    • space bloat (wasted space)


Collection Bloat

Collection bloat is unjustified space overhead for storing data in collections.

List s = new ArrayList();

s.add(1);

Bloat for s is 9: the backing array has a default capacity of 10 slots, and only 1 is in use.


Collection Bloat

Collection bloat is a serious problem in practice

  • Observed to occupy 90% of the heap in real-world applications

  • Hard to detect and fix

    • Accumulation: death by a thousand cuts

    • Correction: need to correlate bloat to program code

How to pick the right implementation?

  • Minimize bloat

  • But without degrading running time


Our Vision

Programmer declares the ADT to be used

Set s = new Set();

Programmer defines what metric to optimize

e.g. space-time

Runtime automatically selects implementation based on metric

Online: detect application usage of Set

Online: select appropriate implementation of Set

Candidate implementations of Set: HashSet, ArraySet, LinkedSet


This Work

Programmer defines the implementation to be used

Set s = new HashSet();

Programmer defines what metric to optimize

space-time product

Space = Bloat

Runtime suggests implementation based on metric

Online: automatically detect application usage of HashSet()

Online: automatically suggest alternative to HashSet()

Offline: programmer modifies program accordingly

e.g. Set s = new ArraySet();


How Can We Calculate Bloat?

Data structure bloat = Occupied Data – Used Data

Example:

List s = new ArrayList();

s.add(1);

Bloat for s is 9 (10 occupied slots – 1 used slot)
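As a minimal sketch of the Occupied – Used definition, the following Java class models an array-backed list and reports its bloat in slots. `TinyArrayList`, its default capacity of 10, its growth policy, and the `bloat()` method are illustrative assumptions; they are not part of Chameleon or the JDK.

```java
import java.util.Arrays;

// Hypothetical array-backed list, used only to illustrate the bloat formula.
final class TinyArrayList {
    private Object[] elementData = new Object[10]; // occupied data (assumed default capacity)
    private int size = 0;                          // used data

    void add(Object value) {
        if (size == elementData.length) {
            // Grow by roughly 1.5x; the exact policy is an assumption of this sketch.
            elementData = Arrays.copyOf(elementData, size + (size >> 1) + 1);
        }
        elementData[size++] = value;
    }

    int used()     { return size; }
    int occupied() { return elementData.length; }

    // Bloat = Occupied Data - Used Data, measured in array slots.
    int bloat()    { return occupied() - used(); }
}

final class BloatDemo {
    public static void main(String[] args) {
        TinyArrayList s = new TinyArrayList();
        s.add(1);
        System.out.println("Bloat for s is " + s.bloat()); // prints "Bloat for s is 9"
    }
}
```

Counting slots rather than bytes keeps the sketch simple; a real measurement would multiply by the reference size and add object headers.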


How to Detect Collection Bloat?

Each collection maintains a field for used data

Language runtime can find out actually occupied data

Bloat = Occupied Data – Used Data

Solution: Garbage Collector Computes Bloat Online

Reads used data fields from collections

Low-overhead: can work online in production


Semantic Maps

How Collections Communicate Information to GC

Includes size and pointers to actual data fields

Allows for trivial support of Custom Collections

[Diagram: ArrayList and HashMap each expose a semantic map to the GC. A semantic map identifies the field holding used data (int size, elementCount) and the field holding occupied data (Object[] array, elementData).]
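One way to picture a semantic map is as a small per-class descriptor that tells a heap walker how to read used and occupied sizes. The sketch below is plain Java under stated assumptions: `SemanticMap`, `HeapWalker`, `ToyList`, and `ToyHashMap` are hypothetical names, and in the system described here this logic would sit inside the garbage collector rather than in library code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical descriptor: how to read used and occupied slots from one collection type.
final class SemanticMap {
    final ToIntFunction<Object> used;
    final ToIntFunction<Object> occupied;

    SemanticMap(ToIntFunction<Object> used, ToIntFunction<Object> occupied) {
        this.used = used;
        this.occupied = occupied;
    }
}

// Toy collections exposing the kind of fields a semantic map points at.
final class ToyList {
    int size;
    Object[] array = new Object[10];
}

final class ToyHashMap {
    int elementCount;
    Object[] elementData = new Object[16];
}

// Stand-in for the GC: looks up the semantic map by class and sums bloat.
final class HeapWalker {
    private final Map<Class<?>, SemanticMap> maps = new HashMap<>();

    // Supporting a custom collection only requires registering one more map.
    void register(Class<?> type, SemanticMap map) {
        maps.put(type, map);
    }

    long totalBloat(Iterable<?> liveObjects) {
        long bloat = 0;
        for (Object o : liveObjects) {
            SemanticMap map = maps.get(o.getClass());
            if (map != null) { // objects without a semantic map are skipped
                bloat += map.occupied.applyAsInt(o) - map.used.applyAsInt(o);
            }
        }
        return bloat;
    }

    static HeapWalker withToyCollections() {
        HeapWalker walker = new HeapWalker();
        walker.register(ToyList.class, new SemanticMap(
                o -> ((ToyList) o).size, o -> ((ToyList) o).array.length));
        walker.register(ToyHashMap.class, new SemanticMap(
                o -> ((ToyHashMap) o).elementCount, o -> ((ToyHashMap) o).elementData.length));
        return walker;
    }
}

final class SemanticMapDemo {
    public static void main(String[] args) {
        ToyList list = new ToyList();
        list.size = 1;
        ToyHashMap map = new ToyHashMap();
        map.elementCount = 4;
        HeapWalker walker = HeapWalker.withToyCollections();
        System.out.println(walker.totalBloat(Arrays.asList(list, map, "not a collection")));
    }
}
```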





Example: Collections Bloat in TVLA

Lower bound for bloat


Fixing Bloat

Must correlate all bloat statistics to program points

Need Trace Information

Remember: do not want to degrade time


Correlating Code and Bloat

public final class ConcreteKAryPredicate extends ConcretePredicate {
    public void modify() {
        values = HashMapFactory.make(this.values);
    }
}

public class GenericBlur extends Blur {
    public void blur(TVS structure) {
        Map invCanonicName = HashMapFactory.make(structure.nodes().size());
    }
}

public class HashMapFactory {
    public static Map make(int size) {
        return new HashMap(size);
    }
}

Bloat share per allocation context (chart labels): Ctx1 40%, Ctx2 11%, Ctx3 5%, Ctx4 7%, Ctx5 5%, Ctx6 3%, Ctx7 7%, Ctx8 3%

  • Aggregate bloat potential per allocation context

  • Done by the garbage collector
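The per-context aggregation can be sketched as a running sum keyed by allocation context. This is only an illustration: `ContextBloat`, its methods, and the context labels are hypothetical, and the sketch counts slots in ordinary Java code, whereas the slides describe the garbage collector doing this work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical accumulator for bloat per allocation context.
final class ContextBloat {
    private final Map<String, Long> bloatByContext = new LinkedHashMap<>();
    private long total;

    // Called once per live collection found during a heap traversal.
    void record(String context, int occupied, int used) {
        long bloat = occupied - used;
        bloatByContext.merge(context, bloat, Long::sum);
        total += bloat;
    }

    long bloat(String context) {
        return bloatByContext.getOrDefault(context, 0L);
    }

    // Share of all recorded bloat attributed to one context, in percent.
    double sharePercent(String context) {
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * bloat(context) / total;
    }
}

final class ContextBloatDemo {
    public static void main(String[] args) {
        ContextBloat stats = new ContextBloat();
        // Context labels are made up for the example.
        stats.record("HashMapFactory.make <- GenericBlur.blur", 16, 4);
        stats.record("HashMapFactory.make <- GenericBlur.blur", 10, 6);
        stats.record("HashMapFactory.make <- ConcreteKAryPredicate.modify", 10, 6);
        System.out.println(stats.sharePercent("HashMapFactory.make <- GenericBlur.blur"));
    }
}
```

Keying on the calling context rather than the allocation site matters here because every `HashMap` is created inside the same factory method.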


Trace Information

Track Collection Usage in Library:

Distribution of operations

Distribution of size

Aggregated per allocation context

ctx1: Size = 7, Get = 3, Add = 9, …

ctx2: Size = 1, Contains = 100, Insert = 1, …

ctx3: Size = 103, Contains = 10041, Insert = 140, Remove = 20

ctxi: …
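A library-side trace of this kind could be kept as a small table of counters per allocation context. The sketch below is an assumption about shape only: `CollectionOp`, `ContextTrace`, and `TraceTable` are hypothetical names, and it records just operation counts and the maximum observed size rather than full distributions.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Operations tracked in this sketch; the set is illustrative.
enum CollectionOp { ADD, GET, CONTAINS, REMOVE }

// Counters for all collections allocated in one context.
final class ContextTrace {
    private final EnumMap<CollectionOp, Long> opCounts = new EnumMap<>(CollectionOp.class);
    private int maxSize;

    // Called by an instrumented collection after each operation.
    void record(CollectionOp op, int sizeAfter) {
        opCounts.merge(op, 1L, Long::sum);
        maxSize = Math.max(maxSize, sizeAfter);
    }

    long count(CollectionOp op) {
        return opCounts.getOrDefault(op, 0L);
    }

    int maxSize() {
        return maxSize;
    }
}

// One ContextTrace per allocation context, created on first use.
final class TraceTable {
    private final Map<String, ContextTrace> byContext = new HashMap<>();

    ContextTrace forContext(String context) {
        return byContext.computeIfAbsent(context, key -> new ContextTrace());
    }
}

final class TraceDemo {
    public static void main(String[] args) {
        TraceTable table = new TraceTable();
        ContextTrace ctx2 = table.forContext("ctx2");
        ctx2.record(CollectionOp.ADD, 1);
        for (int i = 0; i < 100; i++) {
            ctx2.record(CollectionOp.CONTAINS, 1);
        }
        System.out.println("Size = " + ctx2.maxSize()
                + ", Contains = " + ctx2.count(CollectionOp.CONTAINS)
                + ", Add = " + ctx2.count(CollectionOp.ADD));
    }
}
```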


But How to Choose the New Collection?

Rule Engine: user defined rules

Input: Heap and Trace Statistics per-context

Output: Suggested Collection for that context

Rules based on trace and heap information

HashMap: #contains < X ∧ CollmaxSize < Y → ArrayMap

HashMap: #contains < X ∧ CollmaxSize < Y+10 ∧ %liveHeap > Z → ArrayMap

Rule Engine

HashMap: maxSize < X → ArrayMap

LinkedList: NoListOp → ArrayList

HashMap: (#contains < X ∧ CollmaxSize < Y+10 ∧ %liveHeap > Z) → ArrayMap
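Rules of this form can be sketched as predicates over per-context statistics that map a source collection type to a suggested replacement. Everything named here is hypothetical (`ContextStats`, `SelectionRule`, `RuleEngine`), and the thresholds standing in for X, Y, and Z are placeholders chosen for the example, since the slides leave them symbolic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Per-context statistics a rule can inspect (field names are illustrative).
final class ContextStats {
    final String collectionType;  // e.g. "HashMap"
    final long containsCount;     // #contains observed in this context
    final int maxSize;            // largest collection size observed
    final double liveHeapPercent; // share of the live heap held by this context

    ContextStats(String collectionType, long containsCount, int maxSize, double liveHeapPercent) {
        this.collectionType = collectionType;
        this.containsCount = containsCount;
        this.maxSize = maxSize;
        this.liveHeapPercent = liveHeapPercent;
    }
}

// One rule: "sourceType: condition -> suggestion".
final class SelectionRule {
    final String sourceType;
    final Predicate<ContextStats> condition;
    final String suggestion;

    SelectionRule(String sourceType, Predicate<ContextStats> condition, String suggestion) {
        this.sourceType = sourceType;
        this.condition = condition;
        this.suggestion = suggestion;
    }
}

final class RuleEngine {
    private final List<SelectionRule> rules = new ArrayList<>();

    RuleEngine add(String sourceType, Predicate<ContextStats> condition, String suggestion) {
        rules.add(new SelectionRule(sourceType, condition, suggestion));
        return this;
    }

    // Returns the suggestion of the first matching rule, if any.
    Optional<String> suggest(ContextStats stats) {
        for (SelectionRule rule : rules) {
            if (rule.sourceType.equals(stats.collectionType) && rule.condition.test(stats)) {
                return Optional.of(rule.suggestion);
            }
        }
        return Optional.empty();
    }

    // Example rule set; x, y and z are placeholder thresholds for this sketch.
    static RuleEngine withExampleRules() {
        final long x = 1_000;
        final int y = 16;
        final double z = 5.0;
        return new RuleEngine()
                .add("HashMap", s -> s.containsCount < x && s.maxSize < y, "ArrayMap")
                .add("HashMap", s -> s.containsCount < x && s.maxSize < y + 10
                        && s.liveHeapPercent > z, "ArrayMap");
    }
}

final class RuleEngineDemo {
    public static void main(String[] args) {
        RuleEngine engine = RuleEngine.withExampleRules();
        ContextStats smallMap = new ContextStats("HashMap", 100, 1, 11.0);
        System.out.println(engine.suggest(smallMap).orElse("keep current implementation"));
    }
}
```

Returning the first match keeps the sketch short; a real engine might rank competing suggestions instead.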


Overall Picture

[Diagram components: Program, Semantic maps, Semantic Profiler, Potential report, per-context trace statistics (ctx1, ctx2, …), Rules, Rule Engine, Recommendations]


Correcting Collection Bloat: Typical Usage

Step 1 (automatic): Profile for bloat without context

  • Low overhead, can run in production

  • If a problem is detected, go to step 2

Step 2 (automatic): Combine heap information with trace information per context

  • Can switch automatically to step 2 from step 1

  • Higher overhead than step 1

  • Prior to Chameleon, this was a very hard manual step

Step 3 (automatic): Suggest fixes to the user based on rules

Step 4 (manual): Programmer applies the suggested fixes


Chameleon on TVLA

[Screenshot: per-context Potential, Operations and Size panels]

Statistics shown in the screenshot:

Max     15     26    7    7
Avg     11.33  6.31  4.8  4.8
Stddev  1.36   5.05  1.17 1.17

Suggested fixes:

1: HashMap:tvla...HashMapFactory:31;tvla.core.base.BaseTVS:50 → replace with ArrayMap

4: ArrayList:BaseHashTVSSet:112;tvla...base.BaseHashTVSSet:60 → set initial capacity


Implementation

Built on top of IBM’s JVM

Modifications to Parallel Mark and Sweep GC

Modular changes, readily applicable to other GCs

Modifications to collection libraries

Runtime overhead

Detection Phase: Negligible

Correction Phase: ~2x (due to cost of getting context)

Could use Probabilistic Calling Context (PCC) by Bond & McKinley to lower the cost of getting context
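To see why obtaining an allocation context can be expensive, consider the straightforward approach of walking the stack on every allocation. The sketch below does exactly that in plain Java. It is an assumption made for illustration, not a description of how Chameleon captures context: `AllocationContext` and `TracedMapFactory` are hypothetical names, and the "Class:line" format only mimics the labels seen in the TVLA report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: captures a calling context by walking the stack.
final class AllocationContext {
    // Returns up to `depth` frames, starting at the caller of capture(),
    // formatted as "Class:line;Class:line".
    static String capture(int depth) {
        // Building the stack trace is the costly part of this approach.
        StackTraceElement[] frames = new Throwable().getStackTrace();
        StringBuilder context = new StringBuilder();
        // frames[0] is normally capture() itself, so the caller starts at index 1.
        for (int i = 1; i < frames.length && i <= depth; i++) {
            if (context.length() > 0) {
                context.append(';');
            }
            context.append(frames[i].getClassName())
                   .append(':')
                   .append(frames[i].getLineNumber());
        }
        return context.toString();
    }
}

// Hypothetical factory that tags each allocation with its context.
final class TracedMapFactory {
    static String lastContext; // kept only so the sketch can show the result

    static Map<Object, Object> make(int size) {
        lastContext = AllocationContext.capture(2); // factory frame plus its caller
        return new HashMap<>(size);
    }
}

final class AllocationContextDemo {
    public static void main(String[] args) {
        TracedMapFactory.make(8);
        System.out.println(TracedMapFactory.lastContext);
    }
}
```

A depth of 2 is used because the factory frame alone is the same for every caller and would not separate the contexts.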




Related Work

  • Large volume of work on SETL

    • Automatic data structure selection in SETL [Schonberg et al., POPL'79]

    • SETL representation sublanguage [Dewar et al., TOPLAS'79]

  • Bloat

    • The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]


Summary

  • Collection selection is a real problem

    • Runtime penalty

    • Bloat

  • Chameleon integrates trace and heap information for choosing a collection implementation

    • based on predefined rules

  • Using Chameleon, we reduced the footprint of several applications

    • Never degrading running time, often improving it

  • First step towards automatic collection selection as part of the runtime system

