
Cache-aware batch scheduling policies for large-scale scientific data processing


Presentation Transcript


  1. Cache-aware batch scheduling policies for large-scale scientific data processing
  HPC Society 2013 Winter Workshop (HPC 연구회 2013 동계 워크샵)
  남범석, Data Intensive Computing Lab, School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology, Korea

  2. Multi-dimensional Queries for Scientific Data Analysis Applications
  • Data-intensive scientific applications process large datasets.
  • The common access pattern into scientific datasets is the multi-dimensional query, e.g.
    SELECT * FROM dataset WHERE xpos = 20 AND ypos = 50

  3. Distributed and Parallel Query Processing Architecture
  [Figure: multiple clients submit queries to a front-end query scheduler, which forwards them to back-end application servers that read from the data repository.]
  • How can we maximize overall system throughput?
  • Load balancing plays an important role.
  • A large cache space is available in the back-end application servers.
  • How can we leverage the cached results to improve system throughput even further?
  • We need a more intelligent scheduling policy that considers both load balancing and cache hit ratio.

  4. Semantic Caching in Back-end Application Servers
  [Figure: a query for data (60,85) submitted through the front-end hits the cached object (60,85) in a back-end server's semantic buffer cache.]
  • Each server stores cached data together with its semantic metadata (e.g. x, y positions) for future reuse.
  • Hit: generate the query results using the cached object.
  • Miss: read the raw datasets from the data repository.
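To make the hit/miss logic on this slide concrete, here is a minimal Python sketch of a semantic buffer cache keyed by (x, y) position with LRU replacement. The names (SemanticBufferCache, process_query, read_raw_dataset) are illustrative assumptions, not the talk's actual implementation.

```python
# Minimal sketch of a semantic buffer cache keyed by (x, y) position.
from collections import OrderedDict

class SemanticBufferCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()          # (x, y) -> cached query result

    def lookup(self, x, y):
        """Return the cached result for (x, y), or None on a miss."""
        key = (x, y)
        if key in self.objects:
            self.objects.move_to_end(key)     # LRU: mark as most recently used
            return self.objects[key]
        return None

    def insert(self, x, y, result):
        """Insert a result, evicting the least recently used object if full."""
        self.objects[(x, y)] = result
        if len(self.objects) > self.capacity:
            self.objects.popitem(last=False)  # evict the LRU entry

def process_query(cache, x, y, read_raw_dataset):
    result = cache.lookup(x, y)               # hit: reuse the cached object
    if result is None:                        # miss: read the raw dataset
        result = read_raw_dataset(x, y)
        cache.insert(x, y, result)
    return result
```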

  5. Distributed Query Scheduling: Round-Robin? Load-based?
  • Round-Robin does not consider which server has what cached data.
  [Figure: a query for (80,40) arrives at the query scheduler; server A already caches (80,40) while server B does not. Where should the query be forwarded?]

  6. Distributed Query Scheduling: Is a Cache Hit Always Good?
  • A cache-hit-aware scheduling policy may hurt load balancing.
  [Figure: queries for popular data all pile up on server A, which caches that data, while server B stays idle; more waiting queries on server A also mean more cache misses.]

  7. Cache-Aware Scheduling Policy: Cache Hit? Load Balancing? Spatial Clustering
  • The DEMA (Distributed Exponential Moving Average) scheduling policy estimates the cached contents by calculating the EMA of past queries:
    EMA_t = α · Center_of_Query_t + (1 − α) · EMA_{t−1}   (α: weight factor)
  • The EMA point approximates the recently cached data in a back-end application server.
  • EMA assumes an LRU cache replacement policy.
  • A larger α should be chosen for small caches.
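The EMA formula above translates directly into code. A minimal sketch, assuming 2-D query centers represented as (x, y) tuples (the function name update_ema is illustrative):

```python
def update_ema(ema, query_center, alpha):
    """EMA_t = alpha * Center_of_Query_t + (1 - alpha) * EMA_{t-1},
    applied coordinate-wise to 2-D points."""
    ex, ey = ema
    qx, qy = query_center
    return (alpha * qx + (1 - alpha) * ex,
            alpha * qy + (1 - alpha) * ey)

# With a small alpha the EMA point drifts slowly toward recent query centers.
ema = (100.0, 100.0)
for center in [(120, 45), (60, 85), (320, 240)]:
    ema = update_ema(ema, center, alpha=0.1)
```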

  8. Cache-Aware Scheduling Policy (Cont'd): Cache Hit? Load Balancing? Spatial Clustering
  • DEMA (Distributed Exponential Moving Average) scheduling policy:
    EMA_t[s] = α · Center_of_Query_t + (1 − α) · EMA_{t−1}[s]   (s: server ID, α: weight factor)
  • The front-end keeps track of an array of EMA points, one per back-end server.
  • For an incoming query, the front-end assigns the query to the back-end server whose EMA point is closest to the query.
  • Similar queries are clustered on the same server (locality).
  • More EMA points (servers) fall in popular regions, so a hot spot is divided among multiple servers (load balancing).
  • The DEMA scheduling policy was shown to outperform other traditional scheduling policies.
  [Figure: the distances dA, dB, dC from a query to the EMA points of servers A, B, and C determine which server receives the query.]
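Combining the per-server EMA update with nearest-EMA assignment gives the following minimal DEMA front-end sketch in Python (the class name DEMAScheduler and the initial EMA points are illustrative assumptions):

```python
import math

class DEMAScheduler:
    """Minimal DEMA front-end sketch: one EMA point per back-end server."""

    def __init__(self, initial_ema_points, alpha=0.1):
        self.ema = list(initial_ema_points)    # ema[s] = (x, y) for server s
        self.alpha = alpha

    def schedule(self, query_center):
        """Assign the query to the server whose EMA point is closest to it,
        then move that server's EMA point toward the query center."""
        qx, qy = query_center
        s = min(range(len(self.ema)),
                key=lambda i: math.hypot(self.ema[i][0] - qx,
                                         self.ema[i][1] - qy))
        ex, ey = self.ema[s]
        self.ema[s] = (self.alpha * qx + (1 - self.alpha) * ex,
                       self.alpha * qy + (1 - self.alpha) * ey)
        return s

# Usage: three back-end servers with initial EMA points spread over the space.
scheduler = DEMAScheduler([(100, 100), (300, 100), (200, 300)])
for q in [(120, 45), (60, 85), (320, 240), (110, 30)]:
    server = scheduler.schedule(q)
```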

  9. DEMA Demo
  • The DEMA scheduling policy tries to preserve both load balancing and spatial locality.
  • Under a uniform query distribution, the sizes of the Voronoi cells become similar as more queries arrive.

  10. DEMA with Various Static Query Distributions

  11. DEMA Load Balancing
  • DEMA converges to a load-balanced state when viewed as a min-max optimization.
  [Figure: EMA points E1–E5, the EMA intervals I1–I5 between them, and their local maximum and local minimum.]

  12. Weakness of DEMA: Load Imbalance Problem
  • DEMA suffers from temporary load imbalance when the query arrival pattern suddenly changes.
  • In the scenario shown, server 5 may handle all the incoming queries.
  • The EMA points of the other servers will not move unless the hot spot changes gradually.
  [Figure: EMA points EMA1–EMA5 laid out along a 1-dimensional problem space.]

  13. BEMA (Balanced DEMA): Improvement for Dynamic Workloads
  [Figure: a new query q falls between EMA k (# of queries = 30) and EMA k+1 (# of queries = 10); the boundaries Bound(k, k+1) and Bound(k+1, k) differ depending on the servers' loads.]
  • Q: What if server k is loaded with more queries than server k+1?
  • A: The boundaries between servers should be chosen considering their relative workloads.

  14. Boundaries with BEMA in the Form of Apollonius Circles
  • The BEMA scheduler assigns a query to the server whose Apollonius circle encloses the query point (see the sketch below).
  • Better load balancing than DEMA.
  • Similar cache hit ratio to DEMA.
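The slides do not spell out the exact BEMA weighting, so the sketch below only approximates the idea: each server's distance to the query is scaled by a weight that grows with its current load, which makes the pairwise boundaries Apollonius circles around the EMA points. The specific weight sqrt(1 + load) is an assumption for illustration, not the published BEMA formula.

```python
import math

def bema_schedule(query_center, ema_points, loads, alpha=0.1):
    """Assign the query to the server minimizing a load-weighted distance to
    its EMA point.  With constant distance ratios, the boundary between any
    two servers is an Apollonius circle around their EMA points; the weight
    sqrt(1 + load) below is only an illustrative choice."""
    qx, qy = query_center

    def weighted_distance(s):
        ex, ey = ema_points[s]
        return math.hypot(ex - qx, ey - qy) * math.sqrt(1 + loads[s])

    s = min(range(len(ema_points)), key=weighted_distance)
    loads[s] += 1                              # the chosen server gets one more query
    ex, ey = ema_points[s]                     # same EMA update as in DEMA
    ema_points[s] = (alpha * qx + (1 - alpha) * ex,
                     alpha * qy + (1 - alpha) * ey)
    return s
```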

  15. DEMB Scheduling: Another Policy for Dynamic Workloads
  • DEMB (Distributed Exponential Moving Boundaries)
  • The front-end server keeps track of the most recent k queries in a sliding-window queue.
  • Using a Hilbert curve, we transform multi-dimensional queries into one-dimensional values.
  • We estimate the CDF in the 1D space and partition the space into N sub-ranges so that each sub-range has the same number of queries as the others (N: number of back-end servers); a sketch follows below.
  [Figure: each incoming query enters the sliding window of the most recent k queries, which is used to update the CDF.]
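A minimal Python sketch of the boundary computation described above. The hilbert_index function is the standard (x, y)-to-distance conversion for a Hilbert curve of a given order; the other names (compute_boundaries, demb_schedule) and the integer-coordinate assumption are illustrative, not the talk's implementation.

```python
import bisect
from collections import deque

def hilbert_index(x, y, order=16):
    """Map integer (x, y) in [0, 2**order) to its 1-D distance along a
    Hilbert curve (standard xy-to-d conversion)."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                            # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def compute_boundaries(window, num_servers):
    """Cut the sorted Hilbert values of the windowed queries into num_servers
    sub-ranges with (roughly) equal query counts; return the N-1 cut points."""
    values = sorted(hilbert_index(x, y) for (x, y) in window)
    k = len(values)
    return [values[i * k // num_servers] for i in range(1, num_servers)]

def demb_schedule(query_center, boundaries):
    """Assign the query to the server whose sub-range contains its Hilbert value."""
    return bisect.bisect_right(boundaries, hilbert_index(*query_center))

# Keep the most recent k queries and recompute the boundaries periodically.
window = deque([(120, 45), (60, 85), (320, 240), (110, 30)], maxlen=200)
boundaries = compute_boundaries(window, num_servers=2)
server = demb_schedule((90, 45), boundaries)
```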

  16. DEMB Scheduling Policy: An Alternative Cache-Aware Scheduling Policy
  • DEMB (Distributed Exponential Moving Boundaries) adjusts the boundaries of all the servers together, unlike DEMA.
  • It employs a Hilbert space-filling curve to enumerate the recent multi-dimensional data.
  • It determines the boundaries of the servers so that each server is assigned an equal number of recent queries.
  [Figure: Hilbert space-filling curve; server boundaries derived from the Hilbert curve and the query CDF.]

  17. DEMB Cache-Aware Scheduling Algorithm (Cont'd)
  • The queries in the sliding window may misrepresent the query probability distribution, because the number of queries in the sliding window is smaller than the total size of the distributed caches.
  • Using EMA, we smooth out the short-term fluctuations of rapid boundary changes and reduce the error in the query CDF estimation:
    Boundary[i]_t = α · BoundaryUsingCDF[i]_t + (1 − α) · Boundary[i]_{t−1}
    (i: ID of back-end server, α: weight factor, t: number of boundary updates)
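The smoothing step maps directly to code; a minimal sketch, assuming the boundaries are kept as lists of 1-D Hilbert values as in the previous sketch:

```python
def smooth_boundaries(prev_boundaries, cdf_boundaries, alpha):
    """Boundary[i]_t = alpha * BoundaryUsingCDF[i]_t + (1 - alpha) * Boundary[i]_{t-1}."""
    return [alpha * new + (1 - alpha) * old
            for new, old in zip(cdf_boundaries, prev_boundaries)]
```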

  18. DEMB Cache-Aware Scheduling: Automated Parameter Adjustment
  • Performance factors:
  • α (weight factor): determines how fast the scheduler loses information about past queries.
  • WS (window size): determines how many queries the scheduler stores in the queue.
  • UI (update interval): determines how many queries to wait before updating the boundaries.
  • How should the parameters of DEMB be chosen?
  • An algorithm adjusts the parameters as the workload changes: it compares the past CDF with the current CDF using the Kullback-Leibler divergence (see the sketch below).
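A minimal sketch of the CDF comparison, assuming both distributions are kept as histograms over the same 1-D (Hilbert-value) bins. The idea of shrinking or growing the update interval based on a divergence threshold is an illustrative assumption; the slides only state that the past and current CDFs are compared using the Kullback-Leibler divergence.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) for two histograms defined over the same bins.
    eps avoids log(0) and division by zero for empty bins."""
    p_total = sum(p) or 1
    q_total = sum(q) or 1
    d = 0.0
    for pi, qi in zip(p, q):
        pp = pi / p_total + eps
        qq = qi / q_total + eps
        d += pp * math.log(pp / qq)
    return d

def adjust_update_interval(update_interval, past_hist, current_hist, threshold=0.1):
    """If the current window's distribution diverges strongly from the past one,
    react faster (shorter interval); otherwise update less often."""
    if kl_divergence(current_hist, past_hist) > threshold:
        return max(update_interval // 2, 1)
    return update_interval * 2
```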

  19. Experiments: Comparative Study
  • Compared the following scheduling policies:
  • Round robin
  • Fixed (hash): partitions the problem space into sub-spaces that are not adjusted as queries are processed.
  • DEMA
  • DEMB
  • Two workloads:
  • Dynamic (unpredictable) query distribution: hot spots and the query distribution change unpredictably.
  • Customer Behavior Model Graph (CBMG): a realistic query workload generation model.

  20. Experiments: Biomedical Image Analysis Application
  • Large-scale images
  • Image compression
  • Image resolution change
  [Figure: back-end application servers cache query results in their semantic buffer caches in front of the data repository.]

  21. Experiments: Comparative Study
  • Although the Fixed policy achieves the highest cache hit ratio, it suffers from serious load imbalance, resulting in poor query response time.
  • Round robin achieves the best load balance, but has the lowest cache hit rate.
  • Both DEMA and DEMB achieve a good balance between cache hit ratio and load balance, yielding the best query response times.

  22. Experiments: Comparative Study (Dynamic)
  • DEMA suffers from load imbalance and a low cache hit ratio because it responds slowly to rapid changes in the incoming queries.
  • DEMB achieves the highest cache hit ratio and good load balance because it quickly adjusts the boundaries to the rapid changes, yielding the best query response time.

  23. Conclusion
  • In distributed query processing systems with multiple caching infrastructures, both leveraging cached results and maintaining good load balance are important.

  24. BACK-UP

  25. Indexing in Emerging HW Architectures: Parallel R-tree on GPGPU
  • The fundamental problem of the data-parallel programming model is keeping the massively large number of cores busy.
  • Inherently, a tree-structured index requires irregular tree traversal, which is not the right search model on a GPU.
  • Parallel exhaustive search is often faster in CUDA.
  • 3-phase CUDA index search: [1] leftmost leaf, [2] rightmost leaf, [3] parallel scan of the candidate leaf nodes in between (see the sketch below).
  [Figure: ordinary search path vs. CUDA-enabled KDB-tree search path; the leftmost and rightmost searches bound the range of candidate leaf nodes B1–BN, which are then scanned in parallel.]
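A simplified 1-D illustration of the three phases in Python, assuming the leaf nodes are sorted by their key ranges. On the GPU, the phase-3 loop is what becomes a data-parallel CUDA scan; this CPU-side sketch only shows the control flow.

```python
from bisect import bisect_right

def three_phase_search(lows, highs, leaf_keys, q_low, q_high):
    """Sorted, non-overlapping leaves: leaf i covers the key range
    [lows[i], highs[i]) and stores the keys in leaf_keys[i]."""
    # Phase 1: leftmost leaf whose range can overlap [q_low, q_high]
    left = bisect_right(highs, q_low)
    # Phase 2: rightmost leaf whose range can overlap [q_low, q_high]
    right = bisect_right(lows, q_high) - 1
    # Phase 3: exhaustive scan of every candidate leaf in between;
    # on the GPU this loop is distributed over CUDA threads.
    results = []
    for i in range(left, right + 1):
        results.extend(k for k in leaf_keys[i] if q_low <= k <= q_high)
    return results

# Three leaves covering [0,10), [10,20), [20,30):
lows, highs = [0, 10, 20], [10, 20, 30]
leaf_keys = [[1, 4, 7], [11, 15, 19], [22, 28]]
print(three_phase_search(lows, highs, leaf_keys, 5, 21))   # -> [7, 11, 15, 19]
```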

  26. BACK-UP

  27. Experiments: Weight Factor α
  • # of servers: 50, WS: 200 queries
  • With α smaller than 0.1, the boundaries move too slowly, resulting in poor load balancing.
  • With α greater than 0.1, the query response time is flat.
  • 200 recent queries seem to be enough to approximately reflect the current query distribution in both workloads.

  28. Experiments: Window Size WS (CBMG)
  • # of servers: 50, α: 1 (to see the effect of WS clearly)
  • CBMG: realistic distribution (relatively stable)
  • Both the cache hit rate and load balancing improve as the window size increases.
  • For stationary workloads, a longer record of past queries enables better scheduling.

  29. Experiments: Window Size WS (Dynamic)
  • Dynamic: unpredictable distribution
  • With WS smaller than 1000, both the cache hit rate and load balancing improve: a proper number of recent queries helps the scheduling.
  • With WS larger than 1000, both the cache hit rate and load balancing suffer from inflexible boundaries: too large a WS makes the boundaries change slowly.
