
Multimedia Content Analysis on Clusters and Grids

This paper discusses the use of parallel computing in multimedia content analysis (MMCA), focusing on image processing. It explores the challenges of analyzing multimedia content at scale and presents a software platform, Parallel-Horus, for user-transparent parallel image processing on clusters and grids. Realistic problem scenarios such as CCTV analysis and web video search are presented, highlighting the need for efficient analysis techniques. The paper also discusses potential applications of MMCA in fields such as healthcare, astronomy, and remote sensing.


Presentation Transcript


  1. Multimedia Content Analysis on Clusters and Grids Frank J. Seinstra (fjseins@cs.vu.nl) Computer Systems Group, Faculty of Sciences, Vrije Universiteit, Amsterdam Parallel Computing 2010 – Vrije Universiteit, Amsterdam

  2. Overview (1) • Part 1: What is Multimedia Content Analysis (MMCA)? • Part 2: Why parallel computing in MMCA – and how? • Part 3: Software Platform: Parallel-Horus • Part 4: Example – Parallel Image Processing on Clusters

  3. Overview (2) • Part 5: ‘Grids’ and their specific problems • Part 6: A Software Platform for MMCA on ‘Grids’? • Part 7: Large-scale MMCA applications on ‘Grids’ • Part 8: Future research directions => Jungle Computing

  4. Introduction • A Few Realistic Problem Scenarios

  5. A Real Problem… • News broadcast, September 21, 2005: • Police investigation: over 80,000 CCTV recordings • First match found only 2.5 months after the attacks • Automatic analysis?

  6. Another real problem… • Web Video Search: • Search based on annotations • Known to be notoriously bad (e.g., YouTube) • Instead: search based on video content [screenshot: video search example, “Sarah Palin”]

  7. Are these realistic problems? • Beeld&Geluid (Dutch Institute for Sound and Vision, Hilversum): • Interactive access to Dutch national TV history • NFI (Dutch Forensics Institute, Den Haag): • Surveillance Camera Analysis • Crime Scene Reconstruction

  8. But there are many more: • Healthcare • Astronomy • Remote Sensing • Entertainment (e.g. see: PhotoSynth.net) • ….

  9. Part 1 • What is Multimedia Content Analysis?

  10. Multimedia • Multimedia = Text + Sound + Image + Video + …. • Video = image + image + image + …. • In many (not all) multimedia applications: • calculations are executed on each separate video frame independently • So: we focus on Image Processing (+ Computer Vision)

  11. What is a Digital Image? • “An image is a continuous function that has been discretized in spatial coordinates, brightness and color frequencies” • Most often: 2-D with ‘pixels’ as scalar or vector value • However: • Image dimensionality can range from 1-D to n-D • Example (medical): 5-D = x, y, z, time, emission wavelength • Pixel dimensionality can range from 1-D to n-D • Generally: 1-D = binary/grayscale; 3-D = color (e.g. RGB); n-D = hyper-spectral (e.g. remote sensing by satellites)
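  To make the definition concrete, here is a minimal C++ sketch of such an image type (illustrative only, not the Horus API): a 2-D pixel grid in which each pixel holds a configurable number of scalar values (1 = binary/grayscale, 3 = RGB, n = hyper-spectral).

    #include <vector>

    // Minimal sketch: a 2-D image whose pixels each hold 'channels'
    // scalar values, stored row-major with interleaved channels.
    struct Image {
        int width, height, channels;
        std::vector<float> data;

        Image(int w, int h, int c)
            : width(w), height(h), channels(c),
              data(static_cast<size_t>(w) * h * c, 0.0f) {}

        // Access channel c of the pixel at (x, y).
        float& at(int x, int y, int c) {
            return data[(static_cast<size_t>(y) * width + x) * channels + c];
        }
    };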

  12. Complete A-Z Multimedia Applications • [diagram: an A-to-Z application pipeline built on the Impala / (Parallel-) Horus libraries; in: image, out: ‘meaning’ful result] • Low level operations: Image ===> (sub-) Image • Intermediate level operations: Image ===> Scalar / Vector Value ===> Feature Vector; Image ===> Array of S/V Values • High level operations: e.g. “Blue Car”, “Pres. Bush stepping off Airforce 1”, “Supernova at X,Y,t…”

  13. Low Level Image Processing Patterns (1) • Unary Pixel Operation (example: absolute value) • Binary Pixel Operation (example: addition) • N-ary Pixel Operation… • Template / Kernel / Filter / Neighborhood Operation (example: Gauss filter)
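  As a sketch of what the first two patterns amount to (using the hypothetical Image type above; the function names are illustrative, not the Horus API), both are pure per-pixel loops, which is exactly what makes them easy to parallelize later on:

    #include <cmath>

    // Unary pixel operation: apply f to every value independently
    // (example: absolute value). Assumes dst has the same size as src.
    template <typename F>
    void unaryPixOp(Image& dst, const Image& src, F f) {
        for (size_t i = 0; i < src.data.size(); ++i)
            dst.data[i] = f(src.data[i]);
    }

    // Binary pixel operation: combine two images element-wise
    // (example: addition). Assumes a, b, dst all have the same size.
    template <typename F>
    void binaryPixOp(Image& dst, const Image& a, const Image& b, F f) {
        for (size_t i = 0; i < a.data.size(); ++i)
            dst.data[i] = f(a.data[i], b.data[i]);
    }

    // Usage:
    //   unaryPixOp(out, in, [](float v) { return std::fabs(v); });
    //   binaryPixOp(out, a, b, [](float x, float y) { return x + y; });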

  14. Low Level Image Processing Patterns (2) • Reduction Operation (example: sum) • N-Reduction Operation (example: histogram) • Geometric Transformation (example: rotation, via a transformation matrix)
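  A similar sketch for the reduction patterns (again hypothetical code, reusing the Image type from above): a reduction folds all pixel values into one scalar, an N-reduction into an array of N values.

    #include <vector>

    // Reduction operation (example: sum): all pixels -> one scalar.
    float reduceSum(const Image& im) {
        float sum = 0.0f;
        for (float v : im.data) sum += v;
        return sum;
    }

    // N-reduction operation (example: histogram): all pixels -> N bins,
    // here counting the values that fall in [lo, hi).
    std::vector<int> histogram(const Image& im, int bins, float lo, float hi) {
        std::vector<int> h(bins, 0);
        for (float v : im.data) {
            int b = static_cast<int>((v - lo) / (hi - lo) * bins);
            if (b >= 0 && b < bins) ++h[b];
        }
        return h;
    }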

  15. Example Application: Template Matching (input image + template ===> result image)

  for all images {
    inputIm = readFile( … );
    unaryPixOpI( sqrdInIm, inputIm, “set” );
    binaryPixOpI( sqrdInIm, inputIm, “mul” );
    for all symbol images {
      symbol = readFile( … );
      weight = readFile( … );
      unaryPixOpI( filtIm1, sqrdInIm, “set” );
      unaryPixOpI( filtIm2, inputIm, “set” );
      genNeighborhoodOp( filtIm1, borderMirror, weight, “mul”, “sum” );
      binaryPixOpI( symbol, weight, “mul” );
      genNeighborhoodOp( filtIm2, borderMirror, symbol, “mul”, “sum” );
      binaryPixOpI( filtIm1, filtIm2, “sub” );
      binaryPixOpI( maxIm, filtIm1, “max” );
    }
    writeFile( …, maxIm, … );
  }

  See: http://www.cs.vu.nl/~fjseins/ParHorusCode/

  16. Part 2 • Why Parallel Computing in MMCA (and how)?

  17. The ‘Need for Speed’ in MMCA • Growing interest in international ‘benchmark evaluations’ • Task: find ‘semantic concepts’ automatically • Example: NIST TRECVID (200+ hours of video) • A problem of scale: • At least 30-50 hours of processing time per hour of video • Beeld&Geluid: 20,000 hours of TV broadcasts per year • NASA: over 10 TB of hyper-spectral image data per day • London Underground: over 120,000 years of processing…!!!

  18. Question: • What type of high-performance hardware is most suitable? [diagram: High Performance Computing options – General Purpose CPUs, GPUs, Accelerators, Clusters, Grids] • Solution: • Parallel & distributed computing at a very large scale • Our initial choice: • Clusters of general purpose CPUs (e.g. DAS-cluster) • For many pragmatic reasons…

  19. User Transparent Parallelization Tools • For non-experts in Parallel Computing? [diagram: tools ordered along an axis from programmer effort to efficiency] • Message Passing Libraries (e.g., MPI, PVM) • Shared Memory Specifications (e.g., OpenMP) • Parallel Languages (e.g., Occam, Orca) • Extended High Level Languages (e.g., HPF) • Parallel Image Processing Languages (e.g., Apply, IAL) • Automatic Parallelizing Compilers • Parallel Image Processing Libraries

  20. Existing Parallel Image Processing Libs • Suffer from many problems: • No ‘familiar’ programming model: • Identifying parallelism still the responsibility of the programmer (e.g. data partitioning [Taniguchi97], loop parallelism [Niculescu02, Olk95]) • Reduced maintainability / portability: • Multiple implementations for each operation [Jamieson94] • Restricted to particular machine [Moore97, Webb93] • Non-optimal efficiency of parallel execution: • Ignore machine characteristics for optimization [Juhasz98, Lee97] • Ignore optimization across library calls [all]

  21. Our Approach • Sustainable software library for user-transparent parallel image processing • (1) Sustainability: • Maintainability, extensibility, portability (i.e. from Horus) • Applicability to commodity clusters • (2) User transparency: • Strictly sequential API (identical to Horus) • Intra-operation efficiency & inter-operation efficiency

  22. Part 3 (a) • Software Platform: Parallel-Horus (parallel algorithms)

  23. What Type(s) of Parallelism to support? • Data parallelism: • “exploitation of concurrency that derives from the application of the same operation to multiple elements of a data structure” [Foster, 1995] • Task parallelism: • “a model of parallel computing in which many different operations may be executed concurrently” [Wilson, 1995]

  24. Why Data Parallelism (only)? • Natural approach for low level image processing • Scalability (in general: #pixels >> #different tasks) • Load balancing is easy • Finding independent tasks automatically is hard • In other words: it’s just the best starting point… (but not necessarily optimal at all times)

  25. Many Algorithms Embarrassingly Parallel

  Parallel Operation on Image {
    Scatter Image (1)
    Sequential Operation on Partial Image (2)
    Gather Result Data (3)
  }

  • Works (with minor issues) for: unary, binary, n-ary operations & (n-) reduction operations [diagram: steps (1)–(3) on 2 CPUs]
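  A minimal MPI sketch of this scatter / compute / gather pattern for a unary pixel operation (Parallel-Horus builds on MPI, but this code is illustrative only; it assumes the image height divides evenly over the processes and ignores remainders and borders):

    #include <mpi.h>
    #include <cmath>
    #include <vector>

    // Scatter the rows, apply a sequential operation to the local part,
    // gather the result on the root.
    void parallelAbs(std::vector<float>& image, int width, int height) {
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int localCount = (height / size) * width;   // assumes size divides height
        std::vector<float> part(localCount);

        // (1) Scatter: each process receives a horizontal slice of rows.
        MPI_Scatter(image.data(), localCount, MPI_FLOAT,
                    part.data(), localCount, MPI_FLOAT, 0, MPI_COMM_WORLD);

        // (2) Sequential operation on the partial image.
        for (float& v : part) v = std::fabs(v);

        // (3) Gather the partial results back on the root.
        MPI_Gather(part.data(), localCount, MPI_FLOAT,
                   image.data(), localCount, MPI_FLOAT, 0, MPI_COMM_WORLD);
    }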

  26. Others only marginally more complex (1)

  Parallel Filter Operation on Image {
    Scatter Image (1)
    Allocate Scratch (2)
    Copy Image into Scratch (3)
    Handle / Communicate Borders (4)
    Sequential Filter Operation on Scratch (5)
    Gather Image (6)
  }

  • Also possible: ‘overlapping’ scatter • But not very useful in iterative filtering [diagram: steps on 2 CPUs, scatter / gather omitted]
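  Step (4), the border (halo) exchange, is the only genuinely parallel-specific part. A hedged MPI sketch (illustrative, not the Parallel-Horus implementation): each process’s slice is padded with r extra rows above and below, and those halo rows are filled from the neighboring processes.

    #include <mpi.h>
    #include <vector>

    // 'part' holds r halo rows + localRows data rows + r halo rows,
    // each of width w (so part.size() == (localRows + 2*r) * w).
    void exchangeBorders(std::vector<float>& part, int w, int localRows,
                         int r, int rank, int size) {
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
        int n = r * w;                                          // halo block size
        float* firstRows = part.data() + n;                     // my top data rows
        float* lastRows  = part.data() + (size_t)localRows * w; // my bottom data rows

        // Send my top rows up; receive my bottom halo from below.
        MPI_Sendrecv(firstRows, n, MPI_FLOAT, up, 0,
                     lastRows + n, n, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Send my bottom rows down; receive my top halo from above.
        MPI_Sendrecv(lastRows, n, MPI_FLOAT, down, 1,
                     part.data(), n, MPI_FLOAT, up, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }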

  27. Others only marginally more complex (2)

  Parallel Geometric Transformation on Image {
    Broadcast Image (1)
    Create Partial Image (2)
    Sequential Transform on Partial Image (3)
    Gather Result Image (4)
  }

  • Potentially faster implementations for special cases [diagram: steps on 2 CPUs, broadcast / gather omitted]

  28. Challenge: Separable Recursive Filtering • A 2-D Template / Kernel / Filter / Neighborhood Operation (example: Gauss filter) is equivalent to a 1-D filter in one direction… followed by a 1-D filter in the other
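  To illustrate the separability half of the challenge, a small sequential C++ sketch (hypothetical code, not Horus): one horizontal 1-D pass into a temporary image, then one vertical 1-D pass, cutting the per-pixel work from O(k²) to O(2k) for a k-tap kernel.

    #include <vector>

    // Separable convolution with a 1-D kernel k1d of odd length
    // (image borders are simply skipped here for brevity).
    void separableFilter(std::vector<float>& im, int w, int h,
                         const std::vector<float>& k1d) {
        int r = static_cast<int>(k1d.size()) / 2;
        std::vector<float> tmp(im.size(), 0.0f);
        for (int y = 0; y < h; ++y)               // 1-D pass in x-direction
            for (int x = r; x < w - r; ++x) {
                float s = 0.0f;
                for (int t = -r; t <= r; ++t)
                    s += k1d[t + r] * im[y * w + x + t];
                tmp[y * w + x] = s;
            }
        for (int y = r; y < h - r; ++y)           // 1-D pass in y-direction
            for (int x = 0; x < w; ++x) {
                float s = 0.0f;
                for (int t = -r; t <= r; ++t)
                    s += k1d[t + r] * tmp[(y + t) * w + x];
                im[y * w + x] = s;
            }
    }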

  29. Challenge: Separable Recursive Filtering • Separable filters: • 1 x 2D becomes 2 x 1D • Drastically reduces sequential computation time • Recursive filtering: • result of each filter step (a pixel value) stored back into input image • So: a recursive filter uses (part of) its output as input • For parallelization: • In each step, newly calculated/stored data may be located on another node • In each step, horizontal OR vertical data dependencies with ‘on the fly’ updates of the data
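  The recursive half of the challenge in its simplest form (a generic first-order IIR filter, not the actual Horus Gauss filter): each output is computed from an already-written output, so the iterations of the loop cannot simply be divided over processors.

    #include <vector>

    // Recursive (IIR) smoothing along one image row: row[x-1] is already
    // a *filtered* value, i.e. the output is fed back as input.
    void recursiveSmoothRow(std::vector<float>& row, float alpha) {
        for (size_t x = 1; x < row.size(); ++x)
            row[x] = alpha * row[x] + (1.0f - alpha) * row[x - 1];
    }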

  30. Parallel Recursive Filtering: Solution 1 • (SCATTER) → (FILTER X-dir) → (TRANSPOSE) → (FILTER Y-dir) → (GATHER) • Drawback: transpose operation is very expensive (esp. when nr. of CPUs is large)

  31. Parallel Recursive Filtering: Solution 2 • Loop carrying dependence at final stage (sub-image level): • minimal communication overhead • full serialization • Loop carrying dependence at innermost stage (pixel-column level): • high communication overhead • fine-grained wave-front parallelism • Tiled loop carrying dependence at intermediate stage (image-tile level): • moderate communication overhead • coarse-grained wave-front parallelism

  32. Wavefront parallelism • Drawback: • partial serialization • non-optimal use of available CPUs

  33. Parallel Recursive Filtering: Solution 3 • Multipartitioning: • Skewed cyclic block partitioning • Each CPU owns at least one tile in each of the distributed dimensions • All neighboring tiles in a particular direction are owned by the same CPU

  34. Parallel Recursive Filtering: Solution 3 • Full parallelism: • First in one direction… • And then in the other… • Border exchange at end of each sweep • Communication at end of sweep always with the same node (see the ownership sketch below)
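  A tiny sketch of one possible skewed cyclic tile assignment with these properties (an illustrative choice, not necessarily the exact scheme used): with owner(i, j) = (j − i) mod P, every CPU owns one tile per tile-row and per tile-column, and all right-hand neighbors of a CPU’s tiles belong to one single other CPU, so each sweep always communicates with the same node.

    #include <cstdio>

    // Skewed cyclic block (multipartitioning) ownership for a P x P tile grid.
    int owner(int i, int j, int P) { return (j - i + P) % P; }

    int main() {
        const int P = 4;                 // 4 CPUs, 4 x 4 tiles
        for (int i = 0; i < P; ++i) {
            for (int j = 0; j < P; ++j)
                std::printf("CPU%d ", owner(i, j, P));
            std::printf("\n");
        }
        return 0;
    }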

  35. Part 3 (b) • Software Platform: Parallel-Horus (platform design)

  36. Parallel-Horus: Parallelizable Patterns • Layered design: Horus Sequential API → Parallelizable Patterns → Parallel Extensions → MPI • Minimal intrusion: • Re-use as much as possible the original sequential Horus library codes • Parallelization localized • Easy to implement extensions

  37. Pattern implementations (old vs. new)

  Old (sequential only):

  template<class …, class …, class …>
  inline DstArrayT* CxPatUnaryPixOp(… dst, … src, … upo)
  {
    if (dst == 0) dst = CxArrayClone<DstArrayT>(src);
    CxFuncUpoDispatch(dst, src, upo);
    return dst;
  }

  New (sequential or parallel):

  template<class …, class …, class …>
  inline DstArrayT* CxPatUnaryPixOp(… dst, … src, … upo)
  {
    if (dst == 0) dst = CxArrayClone<DstArrayT>(src);
    if (!PxRunParallel()) { // run sequential
      CxFuncUpoDispatch(dst, src, upo);
    } else { // run parallel
      PxArrayPreStateTransition(src, …, …);
      PxArrayPreStateTransition(dst, …, …);
      CxFuncUpoDispatch(dst, src, upo);
      PxArrayPostStateTransition(dst);
    }
    return dst;
  }

  38. Inter-Operation Optimization • Lazy Parallelization (on the fly!): • Don’t do this: Scatter → ImageOp → Gather → Scatter → ImageOp → Gather • Do this: Scatter → ImageOp → ImageOp → Gather (avoid communication)

  39. Finite State Machine • Communication operations serve as state transition functions between distributed data structure states • State transitions performed only when absolutely necessary • State transition functions allow correct conversion of legal sequential code to legal parallel code at all times • Nice features: • Requires no a priori knowledge of loops and branches • Can be done on the fly at run-time (with no measurable overhead)
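  A condensed sketch of the idea (names are illustrative, not the Parallel-Horus API): every distributed array carries a state, and a communication step is issued only when the next operation requires a state the array is not already in.

    // Possible distribution states of an image/array on the cluster.
    enum class DistState { None, Scattered, Replicated, Gathered };

    struct DistArray {
        DistState state = DistState::None;
    };

    // State transition function: communicate only when strictly necessary.
    void requireState(DistArray& a, DistState needed) {
        if (a.state == needed) return;      // the lazy case: no communication
        switch (needed) {
            case DistState::Scattered:  /* scatter over the nodes  */ break;
            case DistState::Replicated: /* broadcast to all nodes  */ break;
            case DistState::Gathered:   /* gather on the root node */ break;
            default: break;
        }
        a.state = needed;
    }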

  40. Part 4 • Example – Parallel Image Processing on Clusters

  41. Example: Curvilinear Structure Detection • Apply anisotropic Gaussian filter bank to input image • Maximum response when filter is tuned to line direction • Here: 3 different implementations • fixed filters applied to a rotating image • rotating filters applied to fixed input image: • separable (UV) • non-separable (2D) • Depending on parameter space: few minutes to several hours

  42. Sequential = Parallel (1)

  for all orientations theta {
    geometricOp( inputIm, &rotatIm, -theta, LINEAR, 0, p, “rotate” );
    for all smoothing scales sy {
      for all differentiation scales sx {
        genConvolution( filtIm1, mirrorBorder, “gauss”, sx, sy, 2, 0 );
        genConvolution( filtIm2, mirrorBorder, “gauss”, sx, sy, 0, 0 );
        binaryPixOpI( filtIm1, filtIm2, “negdiv” );
        binaryPixOpC( filtIm1, sx*sy, “mul” );
        binaryPixOpI( contrIm, filtIm1, “max” );
      }
    }
    geometricOp( contrIm, &backIm, theta, LINEAR, 0, p, “rotate” );
    binaryPixOpI( resltIm, backIm, “max” );
  }

  IMPLEMENTATION 1

  43. Sequential = Parallel (2 & 3)

  for all orientations theta {
    for all smoothing scales sy {
      for all differentiation scales sx {
        genConvolution( filtIm1, mirrorBorder, “func”, sx, sy, 2, 0 );
        genConvolution( filtIm2, mirrorBorder, “func”, sx, sy, 0, 0 );
        binaryPixOpI( filtIm1, filtIm2, “negdiv” );
        binaryPixOpC( filtIm1, sx*sy, “mul” );
        binaryPixOpI( resltIm, filtIm1, “max” );
      }
    }
  }

  IMPLEMENTATIONS 2 and 3

  44. Measurements (DAS-1) • 512x512 image • 36 orientations • 8 anisotropic filters • => Part of the efficiency of parallel execution always remains in the hands of the application programmer!

  45. Measurements (DAS-2) • 512x512 image • 36 orientations • 8 anisotropic filters • So: lazy parallelization (or: optimization across library calls) is very important for high efficiency!

  46. Part 5 • ‘Grids’ and their Specific Problems

  47. The ‘Promise of The Grid’ • 1997 and beyond: • Efficient and transparent (i.e. easy-to-use) wall-socket computing over a distributed set of resources • Compare: the electrical power grid

  48. Grid Problems (1) • Getting an account on remote compute clusters is hard! • Find the right person to contact… • Hope he/she does not completely ignore your request… • Provide proof of (among others) relevance, ethics, ‘trusted’ nationality… • Fill in and sign NDAs, Foreign National Information sheets, official usage documents, etc… • Wait for the account to be created, and for the username to be sent to you… • Hope to obtain an initial password as well… • Getting access to an existing international Grid-testbed is easier • But only marginally so…

  49. Grid Problems (2) • Getting your C++/MPI code to compile and run is hard! • Copying your code to the remote cluster (‘scp’ often not allowed)… • Setting up your environment & finding the right MPI compiler (mpicc, mpiCC, … ???)… • Making the necessary include libraries available… • Finding out how to use the cluster reservation system… • Finding the correct way to start your program (mpiexec, mpirun, … and on which nodes ???)… • Getting your compute nodes to communicate with other machines (generally not allowed)… • So: • Nothing is standardized yet (not even Globus) • A working application in one Grid domain will generally fail in all others

  50. Grid Problems (3) • Keeping an application running (efficiently) is hard! • Grids are inherently dynamic: • Networks and CPUs are shared with others, causing fluctuations in resource availability • Grids are inherently faulty: • compute nodes & clusters may crash at any time • Grids are inherently heterogeneous: • optimization for run-time execution efficiency is by-and-large unknown territory • So: • An application that runs (efficiently) at one moment should be expected to fail a moment later
