Extend Core UDF Framework for GPU-Enabled Analytical Query Evaluation

Qiming Chen, Ren Wu, Meichun Hsu, Bin Zhang
HP Labs, Palo Alto, CA, USA
12 September 2014 IDEAS 2011

Problems
• The work is motivated by pushing analytics down to the DB layer for fast data access and reduced data movement,
• which requires integrating analytic computation into the query pipeline using UDFs.
• Existing UDFs cannot act as block operators with chunk-wise input, and are therefore
  • unable to handle application semantics defined on a set of incoming tuples (e.g. tuples representing an object)
  • unable to leverage external computation engines (e.g. GPUs) for efficient batch processing.

Why We Need Block UDFs
[Figure labels: Graph relation, a graph tuple-set, UDF, computation node (GPU, SAS server), an MST tuple-set, MST relation]
• From a semantic point of view, many applications are defined on a set of tuples
  • Minimal Spanning Tree (MST) computation is defined on a tuple-set representing a graph and returns a tuple-set representing the MST
• From a performance point of view, an external engine should process data in batches rather than copying data back and forth on a per-tuple basis

Solution: Set-In Set-Out (SISO) UDF
[Figure labels: pipelined input, pooling, SISO UDF, GPU, materialize result, pipelined output]
• Introduce a new kind of UDF called Set-In Set-Out (SISO), a block operator that processes the input tuples chunk by chunk from the query processing pipeline. A SISO UDF
  • pools a chunk of input tuples,
  • dispatches them to GPUs or an analytic engine for batch computation,
  • materializes the computation results and then streams them out tuple by tuple to the query processing pipeline
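The pool, dispatch, materialize, and stream-out steps above can be sketched in a few lines of Python. This is only an illustration of the dataflow, not the engine-level implementation: `siso`, `double_all`, and `batch_sizes` are hypothetical names, `double_all` stands in for the GPU or analytic-engine call, and flushing a trailing partial chunk is an assumption the slides do not address.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[float, ...]


def siso(rows: Iterable[Row], chunk_size: int,
         batch_fn: Callable[[Sequence[Row]], List[Row]]) -> Iterator[Row]:
    """Pool rows into chunks, run batch_fn once per chunk, stream results out."""
    pool: List[Row] = []
    for row in rows:
        pool.append(row)                 # pooling: no output for this input row
        if len(pool) == chunk_size:
            results = batch_fn(pool)     # batch computation, results materialized
            pool = []
            yield from results           # stream out tuple by tuple
    if pool:
        # Assumption: a trailing partial chunk is still processed.
        yield from batch_fn(pool)


batch_sizes: List[int] = []


def double_all(chunk: Sequence[Row]) -> List[Row]:
    """Hypothetical stand-in for the external batch engine."""
    batch_sizes.append(len(chunk))
    return [tuple(2 * v for v in row) for row in chunk]


points = [(float(i), float(i + 1)) for i in range(5)]
out = list(siso(points, 2, double_all))
```

With five input rows and a chunk size of 2, the batch function is called three times (two full chunks and one partial chunk), while the consumer still sees an ordinary tuple-by-tuple stream.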

SISO Example: select vectorize(x,y,10) from point_table
• Build phase: on t1, …, t9, do “ETL” but return NULL, acting like a scalar function
• On t10, act like a table function
  • Compute phase (FIRST CALL): run the computation on the 10 tuples
  • Stream-out phase (normal CALLs): return 1 result tuple per call, so the output is pipelined tuple by tuple again
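Seen from the caller's side, the build, compute, and stream-out phases look roughly like the sketch below. The slides do not say what `vectorize` computes, so `compute` here is a placeholder that returns the pooled points unchanged; the class name `Vectorize` and its `call` method are also hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def compute(chunk: List[Point]) -> List[Point]:
    """Placeholder for the real batch computation (assumption: identity)."""
    return list(chunk)


class Vectorize:
    """Hypothetical per-call view of vectorize(x, y, n)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.pool: List[Point] = []

    def call(self, x: float, y: float) -> Optional[List[Point]]:
        self.pool.append((x, y))          # build phase: pool the tuple
        if len(self.pool) < self.n:
            return None                   # t1..t(n-1): scalar-like, returns NULL
        chunk, self.pool = self.pool, []  # reset the pool for the next chunk
        return compute(chunk)             # tn: compute, then results to stream out


udf = Vectorize(10)
outputs = [udf.call(float(i), float(i)) for i in range(10)]
```

The first nine calls return `None`; only the tenth call returns a result set, which a real engine would hand back one tuple per call.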

Comparison with Scalar, Table UDF
• Scalar UDF
  • 1 tuple in, 1 value/tuple out (a tuple as a composite value)
  • Access to per-function state and per-tuple state
• Table UDF
  • 1 tuple in, N tuples out
  • Access to per-tuple (input) state and per-return state
• SISO
  • N tuples in, M values/tuples out
  • Access to four levels of state: per-function, per-chunk, per-tuple (input), per-return
  • Runs chunk by chunk; each chunk contains N tuples; returns nothing for tuples 1 to N-1 and returns a result set for the Nth tuple

Comparison with UDA
• Agg operator or UDA
  • No general form of set output (except group-by)
  • No chunk-wise semantics
• SISO
  • Flexible forms of set output
  • Chunk-wise semantics

Comparison with RVF
• RVF
  • Input relation initially as static data
  • Input relation is loaded entirely rather than by chunks
• SISO
  • Input tuple-set chunk by chunk along query processing
  • Input tuple-set as dynamic data

Extending Query Engine to Support SISO UDF
• Support SISO as a block operator along the tuple-by-tuple query processing pipeline
• With hybrid behavior in processing a chunk of N tuples
  • For input tuples 1, …, N-1, it acts like a scalar function: 1 call per input tuple, returning nothing
  • For tuple N, it acts like a table function: multiple calls for that input tuple, returning a set
• Need to extend the UDF accessible states
• Need to extend the invocation pattern

UDF Memory Context
• A UDF is called multiple times during query processing
  • In the FIRST_CALL a buffer can be initialized
  • Each NORMAL_CALL then references and updates the buffer, which keeps state across calls
  • After the FINAL_CALL, the buffer is discarded
• The multi-call context differs between scalar and table UDFs
  • For a scalar UDF, 1 call per input
  • For a table UDF, N calls per input
  • Therefore their memory contexts are different
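The buffer lifecycle described above can be illustrated with a small sketch of a scalar-style UDF that counts the tuples it has seen. `Phase`, `CallContext`, and `running_count` are hypothetical names, and letting the first call both initialize the buffer and process its input is an assumption about how the engine drives the function.

```python
from enum import Enum, auto
from typing import Any, List, Optional


class Phase(Enum):
    FIRST_CALL = auto()
    NORMAL_CALL = auto()
    FINAL_CALL = auto()


class CallContext:
    """Memory context that survives across calls of one UDF invocation."""

    def __init__(self) -> None:
        self.buffer: Optional[dict] = None


def running_count(ctx: CallContext, phase: Phase, value: Any) -> Optional[int]:
    if phase is Phase.FIRST_CALL:
        ctx.buffer = {"seen": 0}      # initialize the buffer
    if phase is Phase.FINAL_CALL:
        ctx.buffer = None             # discard the buffer
        return None
    ctx.buffer["seen"] += 1           # reference and update state across calls
    return ctx.buffer["seen"]


# Scalar-style driver: one call per input tuple, then a final call.
ctx = CallContext()
outs: List[Optional[int]] = []
for i, v in enumerate([10, 20, 30]):
    phase = Phase.FIRST_CALL if i == 0 else Phase.NORMAL_CALL
    outs.append(running_count(ctx, phase, v))
running_count(ctx, Phase.FINAL_CALL, None)
```

A table UDF would use the same three phases but cycle through them once per input tuple, which is why its memory context has a different lifetime.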

Extend UDF Accessible States
• Scalar UDF: per-function state, per-tuple state
• Table UDF: per-tuple state, per-return state
• SISO UDF: per-function state, per-chunk state, per-tuple state, per-return state

Extend Call Skeleton
[Figure: call skeletons of the scalar UDF, table UDF, and SISO UDF shown side by side. Call types that appear: Global First Call; Per-chunk First Call; Per-tuple single Call (no return); Per-tuple First Call and Last-tuple First Call; Normal Call (1 return); Per-tuple Final Call and Last-tuple Last Call; Per-chunk Final Call and Per-chunk Last Call; Final call optional (system specific).]

SISO Call Skeleton Explained
• Global First Call: set up the function-call global context for chunk-wise invocation (extend from fun-call node)
• Per-chunk First Call: set up the chunk-based buffer for pooling data
• Per-tuple single Call (no return), repeated for the tuples before the last one in the chunk: pool tuples (vectorizing), return null
• Per-tuple First Call, on the last tuple in the chunk: pool the last tuple in the chunk, make the batch analytic computation
• Normal Call (1 return), repeated: return the materialized results one tuple at a time
• Per-tuple Final Call: advance the chunk-oriented tuple index, return null
• Per-chunk Final Call: rewind the chunk-oriented tuple index; clean up the buffer
• The sequence from Per-chunk First Call to Per-chunk Final Call repeats for each following chunk
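To make the ordering of calls concrete, the sketch below lists the call types a SISO UDF would see for a given input size. It is a reading of the skeleton on this slide rather than engine code: `siso_call_trace` is a hypothetical helper, and it assumes the number of tuples is a multiple of the chunk size and that every chunk yields the same number of result tuples.

```python
from typing import List


def siso_call_trace(num_tuples: int, chunk_size: int,
                    results_per_chunk: int) -> List[str]:
    """Return the sequence of call types for a SISO UDF invocation."""
    if num_tuples % chunk_size != 0:
        raise ValueError("sketch assumes full chunks only")
    trace = ["Global First Call"]
    for i in range(num_tuples):
        pos = i % chunk_size
        if pos == 0:
            trace.append("Per-chunk First Call")       # set up the chunk buffer
        if pos < chunk_size - 1:
            # Pool the tuple and return null.
            trace.append("Per-tuple single Call (no return)")
        else:
            # Last tuple of the chunk: pool it, run the batch computation,
            # then hand back the materialized results one per call.
            trace.append("Per-tuple First Call")
            trace.extend(["Normal Call (1 return)"] * results_per_chunk)
            trace.append("Per-tuple Final Call")
            trace.append("Per-chunk Final Call")       # clean up the buffer
    return trace


one_chunk = siso_call_trace(3, 3, 2)
two_chunks = siso_call_trace(6, 3, 2)
```

For one chunk of three tuples with two result tuples, the trace contains nine calls; doubling the input repeats the per-chunk portion while the global first call still happens once.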

Integrate Query Processing with GPU Computation using SISO UDF
• Using General Purpose GPUs (GPGPU) to accelerate analytic query processing lets us leverage both SQL’s analysis power and the GPU’s computation power
• However, their operational patterns are different
  • GPU computation is a kind of batch processing with data parallelism
  • Query processing is pipelined tuple by tuple
• We solve this problem by using SISO UDFs in queries
  • to handle batch GPU computation within the query dataflow pipeline

Experiment on Accelerating K-Means Clustering of Very Large Data Sets
[Figure: flowchart with steps Init. Centers, Assign Center, Calc Centers, Convergence Check, Done]
• K-Means clustering is an iterative process. In each iteration
  • each point is assigned to the nearest cluster center as a member of that cluster
  • then, for each center, its coordinates are re-calculated as the “mean” of the coordinates of its member points
• The process is repeated until convergence is achieved.
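As a reference point for the iteration just described, here is a minimal plain-Python sketch of 2-D K-Means. It is not the GPU implementation used in the experiment. The function names are illustrative, the convergence test (centers no longer change) and the `max_iter` cap are assumptions, and a center with no members is simply kept where it is.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Assign each point to the index of its nearest center."""
    return [
        min(range(len(centers)),
            key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
        for x, y in points
    ]


def recompute(points: List[Point], labels: List[int],
              centers: List[Point]) -> List[Point]:
    """Move each center to the mean of its member points."""
    updated = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            updated.append(old)   # assumption: an empty cluster keeps its center
            continue
        n = len(members)
        updated.append((sum(p[0] for p in members) / n,
                        sum(p[1] for p in members) / n))
    return updated


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100) -> List[Point]:
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = recompute(points, labels, centers)
        if updated == centers:    # assumption: converged when nothing moves
            break
        centers = updated
    return centers


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers = kmeans(pts, [(0.0, 0.0), (10.0, 0.0)])
```

The assignment step is the data-parallel part, since each point's nearest center can be found independently of the others.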

Single Iteration of K-Means by SQL and SISO UDF
[Figure: dataflow of one iteration. Points (xp, yp) enter the SISO UDF assign_center() chunk-wise; Centers (cid, xc, yc) are read initially; the UDF emits (cid, xp, yp); AVG with GROUP BY produces the new centers (cid, xc, yc).]

    SELECT (p).cid, AVG((p).x) AS cx, AVG((p).y) AS cy
    FROM (
        SELECT assign_center_siso(x, y, 'SELECT * FROM Centers', N) AS p
        FROM Points
    ) r
    GROUP BY (p).cid;
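The dataflow of this query can be mimicked in plain Python to show which part is chunk-wise and which part is ordinary aggregation. This is a sketch under assumptions: the Python `assign_center_siso` below only borrows the name from the query, takes the centers as a list instead of a query string, and runs the nearest-center search on the CPU where the real UDF would call the GPU.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]


def assign_center_siso(points: List[Point], centers: List[Point],
                       chunk_size: int) -> Iterator[Tuple[int, float, float]]:
    """Emit (cid, x, y) rows, computing assignments one chunk at a time."""
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Batch step: this is where a GPU kernel would process the whole chunk.
        cids = [
            min(range(len(centers)),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in chunk
        ]
        for cid, (x, y) in zip(cids, chunk):
            yield cid, x, y               # streamed back tuple by tuple


def one_iteration(points: List[Point], centers: List[Point],
                  chunk_size: int) -> Dict[int, Point]:
    """Equivalent of AVG(x), AVG(y) ... GROUP BY cid over the UDF output."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for cid, x, y in assign_center_siso(points, centers, chunk_size):
        acc = sums[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {cid: (sx / n, sy / n) for cid, (sx, sy, n) in sorted(sums.items())}


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
new_centers = one_iteration(pts, [(1.0, 1.0), (9.0, 1.0)], chunk_size=3)
```

The chunk size affects only how the assignment work is batched; the grouped averages are the same for any chunk size.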

Experiment Results Comparison
• We compare the performance of
  • a scalar UDF-wrapped, CPU-based implementation
  • a SISO UDF-wrapped, CPU-based implementation
  • a SISO UDF-wrapped, GPU-accelerated implementation
[Figure: overall end-to-end query performance, Scalar UDF/CPU vs. SISO/CPU vs. SISO/GPUs]

Scalar UDF vs. SISO UDF
• The number of clusters is set to 1000
• The number of data points ranges from 1M to 100M
• The chunk size is fixed at 1M
• Beyond 1M (1000K), the performance gain gradually diminishes with further increases in chunk size

Conclusions
• In-DB analytics has been extensively investigated, but has not yet become a scalable approach
• An important reason is the lack of block UDFs that can handle application semantics defined on a set of tuples and leverage external computation units such as GPUs for efficient batch processing
• To solve this problem, we developed SISO as a new kind of UDF
• Integrating SISO with parallel DBs is under further investigation
