
Presentation Transcript


  1. Massively Parallel Near-Linear Scalability Algorithms with Application to Unstructured Video Analysis. Robert Farber and Harold Trease, Pacific Northwest National Laboratory. Acknowledgments: Adam Wynne (PNNL), Lynn Trease (PNNL), Tim Carlson (PNNL), Ryan Mooney (now at google.com).

  2. Image/Video Analysis and Applications: "Have we seen this person's face before?"
  • Goals: image/video content analysis at 1 million frames per second of processing capability (~1 TByte/sec)
  • Streaming, unstructured video represents high-volume, low-information-content data
  • Huge volumes of archival data
  • Requirement: scalable algorithms to transform unstructured data into large sparse graphs for analysis
  • This talk focuses on the Principal Component Analysis (PCA) of video signatures
  • The framework is generally applicable to other problems!
  • Video analysis has many applications: face recognition (and object recognition), social networks, and many others

  3. First Task: Isolate the Faces. Four panels: (1) original frame, (2) RGB-to-HSI conversion, (3) Sobel edge detection, (4) only skin pixels. The bottom row contains frames of skin-pixel patches that identify the three faces in this frame.
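The transcript doesn't include the pipeline's code, but the RGB-to-HSI conversion it names is standard. Below is a minimal C sketch of steps 2 and 4: converting a pixel to HSI and testing it against a skin range. The threshold values and function names are illustrative assumptions, not the talk's actual parameters.

```c
#include <math.h>
#include <stdbool.h>

/* Convert one RGB pixel (components in [0,1]) to HSI.
 * H is returned in degrees [0,360); S and I lie in [0,1]. */
static void rgb_to_hsi(double r, double g, double b,
                       double *h, double *s, double *i)
{
    double sum = r + g + b;
    double min = fmin(r, fmin(g, b));

    *i = sum / 3.0;
    *s = (sum > 0.0) ? 1.0 - 3.0 * min / sum : 0.0;

    double num = 0.5 * ((r - g) + (r - b));
    double den = sqrt((r - g) * (r - g) + (r - b) * (g - b));
    double theta = (den > 0.0) ? acos(num / den) : 0.0;  /* radians */

    *h = (b <= g) ? theta : 2.0 * M_PI - theta;
    *h *= 180.0 / M_PI;  /* convert to degrees */
}

/* Hypothetical skin test: hue near red/orange with moderate saturation.
 * The exact thresholds used in the talk are not given; these values
 * are placeholders for illustration only. */
static bool is_skin_pixel(double r, double g, double b)
{
    double h, s, i;
    rgb_to_hsi(r, g, b, &h, &s, &i);
    return (h < 50.0 || h > 340.0) && s > 0.1 && s < 0.6 && i > 0.2;
}
```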

  4. Signatures Workflow. Pipeline: archival data (YouTube, huge!) and streaming input (10k cameras = ~300,000 fps = ~300 GB/sec) → split into frames and calculate entropic measures → algorithms (PCA, NLPCA, MDS, clustering, others).
  • Frames/faces are separable
  • Faces form trajectories
  • Face DB
  • Derive social networks

  5. Signatures
  • The first steps are embarrassingly parallel:
  • Split the video into separate frames
  • Calculate the signature of each frame and write it to a file (a sketch follows)
  Workflow: archival data (YouTube, huge!) and streaming input (10k cameras = ~300,000 fps = ~300 GB/sec) → split into frames and calculate entropic measures
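The slides reduce each frame to "entropic measures" without defining them; one plausible component is the Shannon entropy of the frame's intensity histogram. A minimal C sketch, assuming 8-bit grayscale input:

```c
#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Shannon entropy (in bits) of an 8-bit grayscale frame's histogram.
 * The transcript only says frames are reduced to "entropic measures";
 * a histogram entropy like this is one plausible signature component. */
static double frame_entropy(const uint8_t *pixels, size_t n)
{
    size_t hist[256] = {0};
    for (size_t i = 0; i < n; ++i)
        hist[pixels[i]]++;

    double h = 0.0;
    for (int v = 0; v < 256; ++v) {
        if (hist[v] == 0) continue;          /* 0 * log(0) := 0 */
        double p = (double)hist[v] / (double)n;
        h -= p * log2(p);
    }
    return h;
}
```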

  6. Signatures Workflow (continued): split into frames and calculate entropic measures, then feed the signatures to the analysis algorithms (PCA, NLPCA, MDS, clustering, others).
  • Frames/faces are separable
  • Faces form trajectories
  • Face DB
  • Derive social networks

  7. Working with Large Data Sets (think BIG: 10^8 signatures and greater)
  • Formulate PCA (NLPCA, MDS, and others) as an objective function
  • Use your favorite solver (e.g., conjugate gradient)
  • Map to massively parallel hardware (SIMD, MIMD, SPMD, etc.): Ranger, NVIDIA GPUs, others
  Massive parallelism is needed to handle large data sets:
  • 10,000 video cameras = ~300,000 fps = ~300 GB/sec
  • Consider all of YouTube as a video archive
  • Our Supercomputing 2005 data set = 2.2M frames
  • Our test YouTube dataset consisted of over 22M frames

  8. Formulate PCA as an Objective Function
  • Calculate the PCA by passing information through a bottleneck layer in a linear feed-forward neural network
  • Oja, Erkki (November 1982). "Simplified neuron model as a principal component analyzer". Journal of Mathematical Biology 15 (3): 267-273.
  • Sanger, Terence D. (1989). "Optimal unsupervised learning in a single-layer linear feedforward neural network". Neural Networks 2 (6): 459-473.
  • Use your favorite solver (conjugate gradient, ...)
  • William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. "Numerical Recipes in C: The Art of Scientific Computing". Cambridge University Press, 1993.
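The transcript doesn't write the objective out, but the construction the Oja and Sanger references describe is the linear bottleneck ("autoencoder") reconstruction error, whose minimizer spans the leading principal subspace:

```latex
% Linear bottleneck objective: x_i are the frame-signature vectors,
% W_1 projects to the k-dimensional bottleneck, W_2 reconstructs.
E(W_1, W_2) \;=\; \sum_{i=1}^{M} \left\| x_i - W_2 W_1 x_i \right\|^2,
\qquad W_1 \in \mathbb{R}^{k \times n},\; W_2 \in \mathbb{R}^{n \times k},\; k \ll n .
```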

  9. Pass the information through a bottleneck. (Figure: a linear feed-forward network with a narrow bottleneck layer between input and output.)

  10. Map to Massively Parallel Hardware
  • Large data sets require a parallel data load to deliver the necessary bandwidth
  • Use Lustre because it scales: PNNL achieved 136 GB/s sustained read, 86 GB/s sustained write
  • Broadcast the filename plus data size and file offset to each MPI client
  • Each client opens the data file, seeks to its location, and reads the appropriate data (sketched below)
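A minimal C/MPI sketch of the load pattern the slide describes: broadcast the filename and extent, then every rank opens the file itself, seeks to its own offset, and reads its share. The function name and even-chunking scheme are illustrative assumptions; error handling and the real record layout are omitted.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative parallel load: rank 0 supplies fname_in and total_bytes_in;
 * every rank receives them by broadcast and reads its own slice. */
static char *load_my_chunk(const char *fname_in, long total_bytes_in,
                           int rank, int nproc, long *my_bytes)
{
    char fname[4096];
    long total_bytes = total_bytes_in;

    if (rank == 0)
        snprintf(fname, sizeof fname, "%s", fname_in);
    /* Broadcast filename and data size to all MPI clients. */
    MPI_Bcast(fname, sizeof fname, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(&total_bytes, 1, MPI_LONG, 0, MPI_COMM_WORLD);

    long chunk  = total_bytes / nproc;
    long offset = (long)rank * chunk;
    if (rank == nproc - 1)              /* last rank takes the remainder */
        chunk = total_bytes - offset;

    /* Each client opens the file, seeks to its location, and reads. */
    FILE *fp = fopen(fname, "rb");
    fseek(fp, offset, SEEK_SET);
    char *buf = malloc(chunk);
    fread(buf, 1, chunk, fp);
    fclose(fp);

    *my_bytes = chunk;
    return buf;
}
```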

  11. Evaluate the objective function in a massively parallel manner. (Diagram, reconstructed:)
  • An optimization routine (Powell, conjugate gradient, or another method) proposes parameters; Energy = func(P1, P2, ..., PN)
  • Step 1: broadcast the parameters P1, P2, ..., PN to all cores (scales by P)
  • Step 2: each core calculates partial energies over its own share of the examples (examples 1..N on core 1, N+1..2N on core 2, and so on; scales by data/Nproc)
  • Step 3: sum the partial energies with a reduction, O(log2(Nproc))
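A minimal C/MPI sketch of one objective-function evaluation in this three-step pattern. The local_energy callback stands in for the per-example error computation (here, the PCA reconstruction error over this rank's examples), which the diagram leaves abstract.

```c
#include <mpi.h>

/* One evaluation of the objective: broadcast, partial sums, reduce.
 * `params` holds the optimizer's current parameter vector on rank 0. */
double parallel_energy(double *params, int nparams,
                       const float *my_examples, long my_count,
                       double (*local_energy)(const double *, int,
                                              const float *, long))
{
    /* Step 1: broadcast the current parameters to every rank. */
    MPI_Bcast(params, nparams, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Step 2: each rank evaluates its partial sum of the errors. */
    double partial = local_energy(params, nparams, my_examples, my_count);

    /* Step 3: O(log2(Nproc)) reduction to the global energy. */
    double total = 0.0;
    MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;
}
```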

  12. Report the Effective Rate
  • Every evaluation of the objective function requires:
  • Broadcasting a new set of parameters
  • Calculating the partial sum of the errors on each node
  • Obtaining the global sum of the partial sums of the errors
  • T_reduce is highly network dependent: low bandwidth and/or high latency is bad!
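Written out as a cost model (this decomposition is my reading of the slide, not stated in the transcript), with alpha the per-message latency, beta the inverse bandwidth, and m the message size:

```latex
% Rough per-evaluation time implied by the slide's three steps.
T_{\mathrm{eval}} \;\approx\; T_{\mathrm{bcast}}
  \;+\; \frac{T_{\mathrm{data}}}{N_{\mathrm{proc}}}
  \;+\; T_{\mathrm{reduce}},
\qquad
T_{\mathrm{reduce}} \;\sim\; \log_2(N_{\mathrm{proc}})\,(\alpha + \beta m).
```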

  13. Very efficient and near-linear scaling on Ranger. Note: 32k- and 64k-core runs will be performed when possible.

  14. The reduce operation does affect scaling.

  15. Objective function performance scaling by data size on Ranger (synthetic benchmark with no communications)
  • Without prefetch: achieved 8 GF/s per core using SSE; interesting performance segregation
  • With prefetch: achieved nearly 8 GF/s per core; bizarre jump at 800k examples

  16. Most of the time (>90%) is spent in the objective function when solving the PCA problem. Note: data sizes were kept constant per node, which meant each trial trained on different data.

  17. The mapping works with other problems (and architectures)
  • The SIMD version has been used by Farber since the early 1980s on 64k-processor Connection Machines (and other SIMD, MIMD, SPMD, vector, and cluster architectures)
  • R. M. Farber, "Efficiently Modeling Neural Networks on Massively Parallel Computers", Los Alamos National Laboratory Technical Report LA-UR-92-3568.
  • Kurt Thearling, "Massively Parallel Architectures and Algorithms for Time Series Analysis", in 1993 Lectures in Complex Systems, edited by L. Nadel and D. Stein, Addison-Wesley, 1995.
  • Alexander Singer, "Implementations of Artificial Neural Networks on the Connection Machine", Technical Report RL90-2, Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142, January 1990.
  • Many different applications aside from PCA: Independent Component Analysis, k-means, Fourier approximation, Expectation Maximization, logistic regression, Gaussian Discriminant Analysis, locally weighted linear regression, Naïve Bayes, Support Vector Machines, and others

  18. PCA components form trajectories in 3-space
  • Separable trajectories: we can build a face DB!
  • Different faces form separate tracks
  • The same faces are continuous across cameras
  • Multiple faces extracted from individual frames: we can infer social networks!

  19. Preliminary Results Using PCA
  • Public ground-truth datasets are scarce; this is work in progress
  • PCA was the first step (funding limited); NLPCA, MDS, and other methods promise to increase accuracy
  • Using Euclidean distance between points as a recognition metric (see the sketch after this slide):
  • 99.9% accuracy in one data set
  • 2 false positives in a 2k database of known faces
  • Each face in the database was compared against the entire database as a self-consistency check
  • Social networks have been created and are being evaluated; again, ground-truth data is scarce
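A minimal C sketch of the recognition metric as described: brute-force nearest neighbor by Euclidean distance in PCA space. The match threshold and the linear scan are illustrative assumptions; the transcript specifies neither.

```c
#include <stddef.h>

/* Squared Euclidean distance between two k-dimensional PCA signatures. */
static double dist2(const double *a, const double *b, int k)
{
    double d = 0.0;
    for (int i = 0; i < k; ++i) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

/* Brute-force nearest neighbor in the face DB; returns its index, or
 * -1 if nothing falls within `threshold` of the query signature. */
static long match_face(const double *query, const double *db,
                       size_t nfaces, int k, double threshold)
{
    long best = -1;
    double best_d = threshold * threshold;  /* compare squared distances */
    for (size_t i = 0; i < nfaces; ++i) {
        double d = dist2(query, db + i * (size_t)k, k);
        if (d < best_d) { best_d = d; best = (long)i; }
    }
    return best;
}
```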

  20. Summary: High-Performance Video Analysis
  • Streaming video → face database (SC05 videos)
  • Building social-network graphs from face data: face DB → social network
  • Partitioning face-based graphs to discover relationships

  21. Two video examples (in conjunction with Blogosphere text analysis by Michelle Gregory and Andrew Cowell)
  • 351 videos, ~3.6 million frames, ~4.4 TBytes (each point is a video frame, each color is a different video; coordinates are the PCA projection of an N-d feature vector into 3-D)
  • 512 YouTube videos, ~22.6 million frames, ~5.2 TBytes

  22. Connecting the points and forming the sparse graph connectivity for analysis
  • Delaunay/Voronoi mesh: shows the mesh connections, where "points" are connected by "edges", which together form a graph (see the sketch below)
  • Adjacency matrix: each row represents a frame; columns represent connected frames
  • The clusters and social network define how one frame (face) relates to another
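A minimal C sketch of the mesh-to-adjacency step, assuming the Delaunay mesher emits an undirected edge list. A dense matrix is used only for clarity; at the data sizes quoted earlier the real code would need a sparse format such as CSR.

```c
#include <stdlib.h>

/* Build a dense boolean adjacency matrix from a Delaunay mesh edge
 * list, as on the slide: row i marks the frames connected to frame i. */
static char *adjacency_from_edges(const long (*edges)[2], size_t nedges,
                                  size_t nframes)
{
    char *adj = calloc(nframes * nframes, 1);
    for (size_t e = 0; e < nedges; ++e) {
        long i = edges[e][0], j = edges[e][1];
        adj[i * (long)nframes + j] = 1;   /* mesh edges are undirected, */
        adj[j * (long)nframes + i] = 1;   /* so mark both directions    */
    }
    return adj;
}
```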

  23. Graph Partitioning (using Voronoi/Delaunay mesh-connected graphs). Figures: adjacency before partitioning, adjacency after partitioning, Delaunay/Voronoi mesh, point distribution, partitioned mesh.

  24. Classification, Characterization, and Clustering of High-Dimensional Data.
