An Empirical Evaluation of Extendible Arrays

Stelios Joannou & Rajeev Raman, University of Leicester
10th International Symposium on Experimental Algorithms

Introduction
• There is increasing use of in-memory (RAM) dynamic data structures (DS) to store large dynamic data sets, e.g. succinct DS, Bloom filters and hash tables
• Unlike traditional (pointer-based) dynamic DS, RAM DS allocate and free variable-sized chunks of memory → memory fragmentation!

Does Fragmentation Matter?
• Many recent computers have VM with a 64-bit address space (16 exabytes of addressable memory), but:
  • Many 32-bit machines are still around
  • The Java VM has a 2 GB limit
  • Some OSs have no VM (e.g. Android)
• Fragmentation can lead to thrashing, even when the allocated memory is clearly less than the available physical memory
• Many studies address fragmentation in general (from [B. Randell, Comm. ACM '69] to [Brodal et al., Acta Inf. '05]), but not for specific DS

Introduction
• Explicit memory management for dynamic DS is infeasible in practice, so:
  • Use fragmentation-friendly DS
• We consider the extendible array (EA) and the collection of EAs (CEA)
• CEAs can be used to construct complex DS [e.g. Raman/Rao ICALP '03]
• Aim of this paper: study implementations of CEAs from a fragmentation perspective

EAs and CEAs
• EA: a dynamic array that can grow/shrink at one end, supporting:
  • grow() / shrink()
  • access(i)
• CEA (collection of EAs): similar to an EA, supporting:
  • create()
  • destroy(A)
  • access(i, A)
  • grow(A) / shrink(A)
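
To make the operation lists above concrete, here is a minimal C++ sketch of a CEA interface. It keeps one std::vector per EA, so it is only the naive baseline, not one of the fragmentation-friendly designs discussed later; the class name NaiveCEA, the integer handles and the optional value argument to grow are our own illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive CEA: one std::vector per EA, handles are table indices.
template <typename T>
class NaiveCEA {
public:
    using Handle = std::size_t;

    // create(): a new, empty EA (reusing a destroyed handle if available).
    Handle create() {
        if (!free_.empty()) {
            Handle h = free_.back();
            free_.pop_back();
            return h;
        }
        eas_.emplace_back();
        return eas_.size() - 1;
    }

    // destroy(A): release EA A's memory; its handle may be reused later.
    void destroy(Handle a) {
        std::vector<T>().swap(eas_[a]);  // swap with an empty vector frees the buffer
        free_.push_back(a);
    }

    // access(i, A): the i-th element of EA A.
    T& access(std::size_t i, Handle a) {
        assert(i < eas_[a].size());
        return eas_[a][i];
    }

    // grow(A) / shrink(A): add or remove one element at the end of EA A.
    void grow(Handle a, const T& value = T()) { eas_[a].push_back(value); }
    void shrink(Handle a) { eas_[a].pop_back(); }

    std::size_t size(Handle a) const { return eas_[a].size(); }

private:
    std::vector<std::vector<T>> eas_;  // one EA per slot
    std::vector<Handle> free_;         // handles of destroyed EAs
};
```

Because every EA here is an independent std::vector, the CEA inherits the vector's reallocation behaviour, which is exactly what the fragmentation study examines.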

Vector EA
• Included in the C++ STL (std::vector)
• Data stored in an array
• When the array is full, do “doubling”:
  • Create a new array of double the size
  • Copy everything to the new array, delete the old array
• Advantages
  • access: O(1) worst-case time
  • grow/shrink: O(1) amortized time
• Disadvantages
  • Can have internal fragmentation of Θ(n) words
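
The doubling step above can be sketched as follows. This is an illustrative int-only class (DoublingArray is a hypothetical name, not std::vector's actual implementation); it counts the element copies caused by reallocation to show why grow is O(1) amortized, and exposes the capacity to show the Θ(n) internal fragmentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Vector-style EA sketch: when full, allocate an array of double the size,
// copy everything over, then free the old array.
class DoublingArray {
public:
    void grow(int value) {
        if (size_ == capacity_) {
            std::size_t new_cap = capacity_ == 0 ? 1 : 2 * capacity_;
            std::unique_ptr<int[]> bigger(new int[new_cap]);
            std::copy(data_.get(), data_.get() + size_, bigger.get());
            copies_ += size_;            // elements moved by this doubling
            data_ = std::move(bigger);   // the old array is freed here
            capacity_ = new_cap;
        }
        data_[size_++] = value;
    }
    void shrink() {                      // like std::vector, never reallocates
        assert(size_ > 0);
        --size_;
    }
    int access(std::size_t i) const { return data_[i]; }  // O(1) worst case
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t copies() const { return copies_; }

private:
    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0, capacity_ = 0, copies_ = 0;
};
```

After 1025 grows the capacity is 2048, so 1023 slots (about n) are allocated but unused, while the total copying is 1 + 2 + ... + 1024 = 2047 element moves, i.e. fewer than 2n.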

Simple EA
• Data blocks (DBs) contain the data; all DBs have the same, constant size
• An index block (IB) keeps track of the DBs; the IB is doubled when full
• DB and IB sizes are powers of 2
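
A minimal sketch of the Simple layout described above, under stated assumptions: the IB is modelled as a std::vector of DB pointers (its capacity doubling stands in for "double IB when full"), and the DB size 2^LogB is a hypothetical template parameter. Because the DB size is a power of 2, access(i) needs only a shift and a mask.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simple EA sketch: every DB holds B = 2^LogB elements; the IB holds DB pointers.
template <typename T, unsigned LogB>
class SimpleEA {
    static constexpr std::size_t B = std::size_t{1} << LogB;

public:
    void grow(const T& value) {
        if (size_ == ib_.size() * B)                  // every DB is full
            ib_.push_back(std::make_unique<T[]>(B));  // allocate one fixed-size DB
        (*this)[size_++] = value;
    }
    void shrink() {
        assert(size_ > 0);
        --size_;
        if (size_ == (ib_.size() - 1) * B) ib_.pop_back();  // free the now-empty last DB
    }
    // access(i): DB number i / B and offset i % B, computed by shift and mask.
    T& operator[](std::size_t i) { return ib_[i >> LogB][i & (B - 1)]; }
    std::size_t size() const { return size_; }
    std::size_t num_blocks() const { return ib_.size(); }

private:
    std::vector<std::unique_ptr<T[]>> ib_;
    std::size_t size_ = 0;
};
```

All DB allocations have the same size, which is what lets freed DBs be reused by other EAs in a CEA without leaving odd-sized holes.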

Simple EA
• Advantages
  • access: O(1) worst-case time
  • grow/shrink: O(1) amortized time
  • Most memory is allocated/freed in fixed-size blocks, so external fragmentation is reduced when used in a CEA (fragmentation-friendly)
• Disadvantages
  • A small DB size leads to a huge IB
  • A big DB size may lead to internal fragmentation
  • The optimal DB size may be data dependent!
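
The DB-size trade-off above is simple arithmetic: with n elements and DB size b, an EA needs ⌈n/b⌉ IB entries and leaves ⌈n/b⌉·b − n DB slots empty. The helper below (a hypothetical name) computes both, ignoring allocator headers and the IB's own spare capacity.

```cpp
#include <cassert>
#include <cstddef>

// Per-EA overhead of Simple with n elements and DB size b (in elements).
struct SimpleOverhead {
    std::size_t ib_entries;    // one IB pointer per DB: about n / b
    std::size_t unused_slots;  // empty tail of the last DB: at most b - 1
};

SimpleOverhead simple_overhead(std::size_t n, std::size_t b) {
    assert(b > 0);
    std::size_t blocks = (n + b - 1) / b;  // ceil(n / b)
    return {blocks, blocks * b - n};
}
```

For n = 10^6, b = 16 needs 62,500 IB entries and wastes no DB slots, b = 1024 needs 977 entries and wastes 448 slots, and b = 65,536 needs only 16 entries but leaves 48,576 slots empty. In a CEA the empty-slot cost is paid once per EA, so the best b depends on how the data is spread over the EAs.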

Self-tuning
• Self-tuning DS choose their main parameters automatically and rearrange the data accordingly
• Simple is not self-tuning
• We want EAs that are both fragmentation-friendly and self-tuning

Brodnik EA ([Brodnik et al. WADS '99])
• Data is split into super blocks (SBs); SB_k has size $2^k$
• Each SB_k is further split into $2^{\lfloor k/2 \rfloor}$ data blocks (DBs) of size $2^{\lceil k/2 \rceil}$
• An index block (IB) keeps track of the DBs

Brodnik EA ([Brodnik et al. WADS '99])
• Advantages
  • access: O(1) worst-case time
  • grow/shrink: O(1) amortized time
  • Wasted space is $O(\sqrt{n})$
  • Self-tuning DS
• Disadvantages
  • access(i) is CPU-intensive
  • Different DB sizes can lead to fragmentation
• Note: the original paper contains a bug
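
The access(i) computation for the Brodnik layout can be sketched as below, assuming the layout of [Brodnik et al. WADS '99] in which superblock SB_k holds 2^⌊k/2⌋ DBs of 2^⌈k/2⌉ elements and DBs are numbered consecutively. The count p of DBs preceding SB_k is derived here from that layout (sum of 2^⌊j/2⌋ for j < k) and checked against a brute-force enumeration; it is not copied from the paper, and the names are ours. The log2 plus several shifts and masks is the CPU cost the slide refers to.

```cpp
#include <cassert>
#include <cstddef>

// Element i lives at position `offset` of DB number `block`.
struct Location {
    std::size_t block;
    std::size_t offset;
};

// floor(log2 x) by a loop; real code would use a leading-zero-count instruction.
unsigned floor_log2(std::size_t x) {
    assert(x != 0);
    unsigned k = 0;
    while (x >>= 1) ++k;
    return k;
}

Location brodnik_locate(std::size_t i) {
    std::size_t r = i + 1;
    unsigned k = floor_log2(r);      // element i lies in superblock SB_k
    unsigned lo = k / 2;             // floor(k/2) bits pick the DB within SB_k
    unsigned hi = k - lo;            // ceil(k/2) bits give the offset in the DB
    std::size_t e = r & ((std::size_t{1} << hi) - 1);
    std::size_t b = (r >> hi) & ((std::size_t{1} << lo) - 1);
    // Number of DBs in SB_0 .. SB_{k-1}: sum of 2^floor(j/2) for j < k.
    std::size_t p = (k % 2 == 0) ? 2 * ((std::size_t{1} << lo) - 1)
                                 : 3 * (std::size_t{1} << lo) - 2;
    return {p + b, e};
}
```

For example, i = 5 gives r = 6 (110 in binary), so k = 2, b = 1, e = 0 and p = 2: element 5 is at position 0 of DB 3.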

Modified Brodnik EA
• Combines Brodnik and Simple
• All DBs have the same size; DB and IB sizes are powers of 2
• When growing, alternately doubles the IB size and the DB size

Modified Brodnik EA
• Advantages
  • access: O(1) worst-case time
  • grow/shrink: O(1) amortized time
  • Wasted space is $O(\sqrt{n})$
  • After the DB size changes, newly allocated DBs may be contiguous in memory
  • Uses less CPU during access(i) than the Brodnik EA
  • Fragmentation-friendly when there is only one instance of the EA
• Disadvantages
  • When used as part of a CEA whose EAs have various sizes, fragmentation can lead to underuse of physical memory
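
A sketch of the Modified Brodnik growth rule, under assumptions of our own: when a new DB is needed and the IB is full, we double whichever of the IB capacity and the DB size is smaller (so the two doublings alternate), and doubling the DB size merges pairs of existing DBs into new, larger ones. The class name and the merging strategy are illustrative, not the authors' code; shrink is omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// All DBs share one size B = 2^log_b_; the IB has room for cap_ DB pointers.
template <typename T>
class ModBrodnikEA {
public:
    void grow(const T& value) {
        if (size_ == ib_.size() * block_size()) {      // every DB is full
            if (ib_.size() == cap_) {                  // and so is the IB
                if (cap_ <= block_size()) {
                    cap_ *= 2;                         // double the IB
                    ib_.reserve(cap_);
                } else {
                    merge_pairs();                     // double the DB size
                }
            }
            ib_.push_back(std::make_unique<T[]>(block_size()));
        }
        (*this)[size_++] = value;
    }
    // access(i): one shift and one mask, as in Simple; no log2 as in Brodnik.
    T& operator[](std::size_t i) { return ib_[i >> log_b_][i & (block_size() - 1)]; }
    std::size_t size() const { return size_; }
    std::size_t block_size() const { return std::size_t{1} << log_b_; }
    std::size_t ib_capacity() const { return cap_; }

private:
    void merge_pairs() {
        assert(ib_.size() % 2 == 0);
        const std::size_t b = block_size();
        std::vector<std::unique_ptr<T[]>> merged;
        merged.reserve(cap_);
        for (std::size_t j = 0; j < ib_.size(); j += 2) {
            auto big = std::make_unique<T[]>(2 * b);
            std::copy(ib_[j].get(), ib_[j].get() + b, big.get());
            std::copy(ib_[j + 1].get(), ib_[j + 1].get() + b, big.get() + b);
            merged.push_back(std::move(big));
        }
        ib_ = std::move(merged);  // the old, smaller DBs are freed here
        ++log_b_;
    }

    std::vector<std::unique_ptr<T[]>> ib_;  // the IB: one pointer per DB
    std::size_t cap_ = 1;                   // IB capacity in DB pointers
    unsigned log_b_ = 0;                    // DB size is 2^log_b_
    std::size_t size_ = 0;
};
```

With this rule the IB capacity is always either B or 2B, so both stay close to √n; the unused tail of the last DB plus the spare IB entries is then O(√n), matching the wasted-space bound on the slide.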

Global Brodnik CEA
• Keeps the same DB size across all individual EAs
• Tries to maintain an ideal DB size of $\sqrt{N/t}$ across all EAs, where:
  • t is the number of EAs currently created
  • N is their total size
• The array containing the EAs doubles when full

Global Brodnik CEA
• Advantages
  • Self-tuning CEA
  • access: O(1) worst-case time
  • grow/shrink: O(1) amortized time
  • Fragmentation-friendly because DBs are equal-sized across EAs
• Disadvantages
  • Wasted space is $O(\sqrt{Nt})$ words
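
The choice of a shared DB size can be illustrated with a little arithmetic, assuming the ideal DB size is √(N/t) for t EAs holding N elements in total: with DB size b, the EAs waste up to t·b words in partly empty DBs and need about N/b IB entries, and b = √(N/t) makes both terms equal to √(Nt). The helpers below are hypothetical; rounding down to a power of 2 is our assumption.

```cpp
#include <cassert>
#include <cstddef>

// Shared DB size: the largest power of 2 not exceeding sqrt(N / t).
std::size_t global_db_size(std::size_t total_elems, std::size_t num_eas) {
    assert(num_eas > 0);
    std::size_t avg = total_elems / num_eas;  // N / t (0 if N < t)
    unsigned lg = 0;
    while (avg >>= 1) ++lg;                   // floor(log2(N / t)), 0 if avg <= 1
    return std::size_t{1} << (lg / 2);
}

// Rough overhead with shared DB size b: up to one partly empty DB per EA
// (t * b) plus one IB entry per DB overall (about N / b).
std::size_t global_overhead(std::size_t total_elems, std::size_t num_eas, std::size_t b) {
    assert(b > 0);
    return num_eas * b + total_elems / b;
}
```

For N = 2^20 and t = 2^10 this gives b = 32 and an overhead of 65,536 words (2√(Nt)), whereas b = 8 or b = 128 would each cost 139,264.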

Experimental Results
• Speed tests for different ratios of #EAs to #elements
• Sequential access:
  • Vector is the fastest
  • Brodnik is the slowest
• Random access:
  • Simple is the slowest in the first test
  • Vector is faster in the 2nd and 3rd tests (no indirection)

Experimental Results - 2
• 80-20 test: 20% of the EAs contain 80% of the total elements
• Go through the CEA: shrink an EA, then grow a random one chosen according to the 80-20 rule
• Repeat this as many times as the total number of elements
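
One reading of the 80-20 workload, sketched over EA sizes only (a real run would call grow/shrink on a CEA): each step visits the next EA in order and shrinks it, then grows a random EA drawn from the first 20% of EAs with probability 0.8 and from the rest otherwise. The selection rule, the function name run_80_20 and the seed parameter are our assumptions, not the authors' exact procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

std::vector<std::size_t> run_80_20(std::vector<std::size_t> sizes, unsigned seed) {
    const std::size_t t = sizes.size();
    assert(t >= 5);                          // need a non-empty hot set
    const std::size_t hot = t / 5;           // EAs 0 .. hot-1 are "hot"
    std::mt19937 rng(seed);
    std::bernoulli_distribution pick_hot(0.8);
    std::uniform_int_distribution<std::size_t> hot_ea(0, hot - 1);
    std::uniform_int_distribution<std::size_t> cold_ea(hot, t - 1);

    std::size_t total = 0;
    for (std::size_t s : sizes) total += s;

    // As many steps as there are elements in total.
    for (std::size_t step = 0; step < total; ++step) {
        const std::size_t j = step % t;      // go through the CEA in order
        if (sizes[j] == 0) continue;         // nothing to shrink here
        --sizes[j];                          // shrink(A_j)
        const std::size_t k = pick_hot(rng) ? hot_ea(rng) : cold_ea(rng);
        ++sizes[k];                          // grow(A_k)
    }
    return sizes;
}
```

Each shrink is paired with a grow, so the total number of elements is unchanged while elements drift towards the hot 20% of EAs, leaving EAs of very different sizes and a fragmented heap behind.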

Experimental Results - 3
• Thrashing: run a speed test after the 80-20 test
• Measured CPU time and wall time (in seconds)
• Used EAs of 1200 elements each; the resulting memory usage was close to physical memory
• Thrashing occurred in the Brodnik, Mod. Brodnik and Vector EAs
• Simple and Global Brodnik kept memory usage low, so no thrashing occurred
• Results verified by examining CPU usage and page faults
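
CPU time, wall time and page faults of the kind mentioned above can be collected as in this sketch for Linux and BSD-like systems. getrusage is POSIX, but the ru_majflt field is a common extension rather than a POSIX requirement; RunStats and measure are hypothetical names. A run that thrashes shows wall time far above CPU time together with many major page faults.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <sys/resource.h>

struct RunStats {
    double cpu_seconds;
    double wall_seconds;
    long major_faults;
};

template <typename Workload>
RunStats measure(Workload&& work) {
    rusage before{}, after{};
    int rc = getrusage(RUSAGE_SELF, &before);
    assert(rc == 0);
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();

    work();

    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    rc = getrusage(RUSAGE_SELF, &after);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is set
    return {static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(w1 - w0).count(),
            after.ru_majflt - before.ru_majflt};
}
```

Comparing wall_seconds with cpu_seconds, and checking major_faults, for each EA implementation after the 80-20 phase is one way to detect the thrashing reported here.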

Conclusion
• The increasing importance of RAM DS makes fragmentation important: we demonstrated, e.g., that thrashing occurs even in simple DS, even when the allocated memory is well below physical memory
• Introduced and established “self-tuning” and “fragmentation-friendliness” as desirable features of dynamic RAM data structures
• For CEAs:
  • The standard solution (vector) is not efficient
  • Fragmentation-friendly and self-tuning DS seem to be good all-round performers
• Further testing in real-world applications is needed
