Meandering Based Parallel 3DRS Algorithm for the Multicore Era

Ghiath Al-kadi‡, Jan Hoogerbrugge‡, Surendra Guntur‡, Andrei Terechko*, Marc Duranton‡ and Onno Eerenberg‡

‡NXP Semiconductors, Eindhoven, the Netherlands. *Vector Fabrics, Eindhoven, the Netherlands.

This paper appears in: 2010 Digest of Technical Papers, International Conference on Consumer Electronics (ICCE)


  • I. INTRODUCTION

    • True motion estimation

    • 3DRS

  • II. THE SCALABLE MEANDERING BASED 3DRS

  • III. EVALUATION, RESULTS AND CONCLUSION


INTRODUCTION

  • True motion estimation is a method for finding the motion of objects.

    • The motion vectors should represent the true motion of the objects in the video sequence.


For video compression applications, it is enough to find the motion vector corresponding to the best match, since this results in lower residual energy and better compression.

With traditional motion estimation (ME), true motion vectors can only be estimated for blocks containing enough texture.


Difficulty of true motion estimation

  • When the video sequence is complex, especially when it contains small or fast-moving objects, motion vectors are not easy to estimate.

  • Blocking artifacts

    • Object movement results in covering/uncovering conditions.


Main application of true motion estimation

  • Frame rate up-conversion (FRC)

    • Raising the frame rate to 120 frames per second is becoming increasingly necessary with the advent of advanced high-resolution display technologies such as LCD and plasma.

    • Motion estimation is an integral part of FRC.

      • The quality of the interpolated frames depends on the motion vectors used.

      • True motion vectors are therefore needed.


How to find true motion vectors

  • The 3-Dimensional Recursive Search (3DRS) algorithm is one of the most widely used methods for finding true motion.

  • The 3DRS algorithm is based on block matching. To find true motion, it makes two assumptions:

    • (i) Objects are larger than a block of pixels.

    • (ii) Objects have inertia.


3DRS

  • For all other blocks (those without enough texture), we have to rely on motion vectors that have already been estimated.

    • Construct a small set of candidate vectors based on spatial relations.

    • Motion vectors are refined gradually, pass by pass, according to the motion of neighboring blocks, so that true motion can be found from the spatial correlation of the motion vectors.


  • However, since the picture is processed block by block according to a specified scanning order, motion information for the current field is only available for the blocks that have already been processed in that scan order. Vectors taken from these blocks are spatial candidates.

  • Vectors taken from blocks processed in a previous field are called temporal candidates.


  • Each block has the following candidate motion vectors:

    • <1> Spatial prediction candidate set, defined relative to the position of the current block x in the current frame n.

    • <2> Temporal candidate set (estimated from the previous frame).


  • <3> Update candidate set: generated by adding small random vectors (u) to the vectors in the spatial candidate set.

    • The update vectors are relatively small.

    • In theory, an update vector can be a random variable drawn from, e.g., a Gaussian or uniform probability distribution.

    • These (random) update vectors are essential for the convergence of the motion field and for correctly tracking variable object motion.
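The three candidate sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_candidates`, the one-pixel `UPDATE_SET`, and the choice of one update per spatial vector are hypothetical and are not taken from the slides.

```python
import random

def build_candidates(spatial, temporal, update_set, rng):
    """Assemble the candidate vectors for one block.

    spatial, temporal: lists of (vx, vy) vectors from neighboring blocks.
    update_set: small (ux, uy) offsets; one is drawn at random per spatial vector.
    """
    candidates = list(spatial) + list(temporal)
    for vx, vy in spatial:
        ux, uy = rng.choice(update_set)      # small random update vector u
        candidates.append((vx + ux, vy + uy))
    return candidates

# Hypothetical update set: steps of exactly one pixel along one axis.
UPDATE_SET = [(1, 0), (-1, 0), (0, 1), (0, -1)]

rng = random.Random(0)                       # seeded only to make the demo repeatable
cands = build_candidates([(2, 0), (3, 1)], [(2, 1)], UPDATE_SET, rng)
```

The spatial and temporal vectors are passed through unchanged, and each update candidate stays within one pixel of the spatial vector it was derived from, which matches the "relatively small" requirement above.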


General recursive process

"True-Motion Estimation with 3-D Recursive Search Block Matching"

Gerard de Haan, Paul W. A. C. Biezen, Henk Huijgen, and Olukayode A. Ojo


Relative position of spatial and temporal predictor

r = 2 has been experimentally found to be best for a block size of 8×8 pixels.


  • Each pass of 3DRS motion estimation works as follows:

    • The candidate vectors are taken from the candidate vector set of pass i-1.

    • Update vectors are randomly selected from the update set, US.
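Within a pass, block matching picks one vector from the candidate set for each block. The sketch below assumes the sum of absolute differences (SAD) as the match criterion; the slides say only "block matching", so SAD, the function names, and the convention that a vector points from the current block to its match in the reference frame are all assumptions.

```python
def sad(cur, ref, bx, by, vx, vy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (vx, vy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + vy][bx + x + vx])
        for y in range(size)
        for x in range(size)
    )

def best_vector(cur, ref, bx, by, candidates, size):
    """Pick the candidate with the lowest SAD, skipping out-of-frame ones."""
    h, w = len(ref), len(ref[0])
    inside = [
        (vx, vy) for vx, vy in candidates
        if 0 <= bx + vx and bx + vx + size <= w
        and 0 <= by + vy and by + vy + size <= h
    ]
    return min(inside, key=lambda v: sad(cur, ref, bx, by, v[0], v[1], size))

# Synthetic frames: `cur` is `ref` displaced by (2, 1).
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[y + 1][x + 2] for x in range(12)] for y in range(12)]
best = best_vector(cur, ref, 4, 4, [(0, 0), (2, 1), (-1, 0)], 4)
```

Because only a handful of candidates are evaluated per block, the cost per block is small compared with a full search over every displacement.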


Convergence



THE SCALABLE MEANDERING BASED 3DRS

  • The scan order of 3DRS algorithms can follow either a “raster” or a “meandering” pattern, as shown in Fig. 1. One possible method involves processing macroblocks (MBs) in scan order.

    • While the raster scanning pattern is easily parallelizable, it has inferior convergence properties compared to the meandering scan pattern.

    • The meandering scan is quite challenging to parallelize because the scan direction changes frequently, as shown in Fig. 1(A).

  • This paper addresses the above problem and presents a scalable, multi-(co)processor-friendly method to parallelize the meandering-based 3DRS motion estimation algorithm without compromising picture quality.
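The difference between the two scan patterns can be made concrete with a short sketch. The function names and the (row, column) coordinate convention are assumptions made for illustration; the first row is assumed to run left to right.

```python
def raster_order(rows, cols):
    """Every row is visited left to right."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def meander_order(rows, cols):
    """The scan direction flips on every row, so the scan snakes through the frame."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order

raster = raster_order(2, 3)
meander = meander_order(2, 3)
```

In the meandering order, each block is visited right after one of its direct neighbors, including at row boundaries, whereas the raster order jumps from the end of one row back to the start of the next.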


  • An analysis of this algorithm leads to the following observations:

    • (i) each meandering scan is composed of two raster scans, one operating on the odd rows and one on the even rows, as depicted in Fig. 1(B);

    • (ii) the two raster scans depend on each other;

    • (iii) the relative position and the temporal or spatial nature of the candidates constantly change with the current direction of the scan in progress.
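Observation (i) can be checked with a small sketch. It assumes 0-indexed rows, with even rows scanned left to right and odd rows right to left; the slides do not say which parity runs in which direction, and the function names are hypothetical.

```python
def meander_order(rows, cols):
    """Meandering scan: the direction flips on every row."""
    return [
        (r, c)
        for r in range(rows)
        for c in (range(cols) if r % 2 == 0 else range(cols - 1, -1, -1))
    ]

def split_meander(rows, cols):
    """Return the two raster scans hidden inside one meandering scan."""
    even = [(r, c) for r in range(0, rows, 2) for c in range(cols)]
    odd = [(r, c) for r in range(1, rows, 2) for c in range(cols - 1, -1, -1)]
    return even, odd

even_scan, odd_scan = split_meander(4, 3)
full_scan = meander_order(4, 3)
```

Filtering `full_scan` by row parity reproduces `even_scan` and `odd_scan` exactly, so each half is an ordinary raster scan with a fixed direction. This sketch covers the visiting order only; it does not model the dependency between the two scans noted in (ii).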


If MB(i,j) is the current MB under consideration, then the spatial and temporal MBs available for candidate selection in the traditional 3DRS algorithm are S1ij and T1ij, respectively.


[Figure: the spatial candidate sets S1, S2, S3 and the temporal candidate sets T1, T2, T3.]


The variables α and β are shown for a left-to-right scan order; changing the scan direction implies swapping the contents of these variables.


The motion information in the neighboring blocks that are processed in the same iteration (i.e. spatial candidates) is more accurate than that available from the previous scan iteration.

With reference to the two raster scans shown in Fig. 1(B), the currently processed block MB(i,j), denoted as B, has only the MB denoted as A as a direct neighboring spatial candidate. All other direct neighboring candidates are temporal.


  • In order to maintain motion detection accuracy:

    • the spatial candidates are selected from the set S2ij instead of S1ij;

    • the temporal candidates are unchanged.

  • Thus, the parallel 3DRS algorithm constructs its candidate set from S2ij and T2ij.


Parallelization of Raster Scan: “The 2D Wave”

In Fig. 2, MB(i,j) can be processed as soon as MB(i-2,j+1) completes. This results in processing MBs in a diagonal wavefront manner, which is referred to as the “2D-Wave”.
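The wavefront that follows from this dependency can be sketched as a schedule of earliest start steps. Several things here are assumptions rather than statements from the slides: i is read as the row index and j as the column index, the rows of one raster scan are renumbered k = 0, 1, 2, ... (so frame row i-2 becomes k-1), each MB also waits for its left neighbor in the same row, every MB takes one step, and the dependency is clamped at the right edge of the frame.

```python
def wave_schedule(num_rows, num_cols):
    """Earliest step at which each MB of one left-to-right raster scan can start."""
    step = [[0] * num_cols for _ in range(num_rows)]
    for k in range(num_rows):
        for j in range(num_cols):
            deps = []
            if j > 0:
                deps.append(step[k][j - 1])      # left neighbor in the same row
            if k > 0:
                jj = min(j + 1, num_cols - 1)    # clamp at the right edge
                deps.append(step[k - 1][jj])     # MB(i-2, j+1) in frame coordinates
            step[k][j] = max(deps) + 1 if deps else 0
    return step

schedule = wave_schedule(3, 4)
```

Under these assumptions each row starts two steps after the row before it, and MBs that share a step value lie on a diagonal and could be handled by different processors at the same time.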


The runtime execution of the parallel 3DRS algorithm

However, the quality of the motion detection can be compromised because neighboring MBs other than α or β are not used as spatial candidates.

To prevent this quality loss while still being able to find small objects, both raster scans can be executed simultaneously, as shown in Fig. 1(B). This is done by assigning a Motion Estimator (ME) (co)processor to each row.


Fig. 3, for example, depicts a system in which the parallel 3DRS algorithm is mapped to four cores.

The simultaneous execution of the 2D-wave processing of the two raster scans can be viewed as two distinct phases:


Phase One: execution from the start position of each row to around the middle of the row.

Each raster scan executes the 2D wave, with ME1 using the (S1ij, T1ij) candidate set for block matching while the other processors use the (S2ij, T2ij) set (α and β are swapped according to the scan direction).


Phase Two: execution from around the middle of the row to the end of the row (see Fig. 3, right).

By this point the processors executing the two raster scans have overlapped, so the eight neighboring MBs are spatial.

Thus, ME1, ME2 and ME3 use the (S3ij, T3ij) candidate set while ME4 uses the (S1ij, T1ij) set for block matching.



EVALUATION

The proposed parallel 3DRS algorithm is evaluated on various video streams by performing simulations on the NeXVP architecture.

The underlying architecture consists of 2 homogeneous 4-issue-slot TriMedia cores with subset static interleaved multithreading (two foreground and two background threads).


3DRS motion estimation performs 125 scans/second for a Full HD 1920x1080 stream, compared to 29 scans/second on a single core running the parallel 3DRS code.

For Quad HD 4096x2160 video, a rate of 100 scans/second was obtained on a similar architecture with 3 additional cores.
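The reported rates can be put side by side with a little arithmetic. The speedup follows directly from the two Full HD figures; the pixel throughput is a derived metric that does not appear in the slides and assumes that one scan covers every pixel of the frame exactly once.

```python
full_hd_scans, single_core_scans = 125, 29
speedup = full_hd_scans / single_core_scans        # about 4.31

# Pixels covered per second, assuming one scan touches each pixel once.
full_hd_pixels = 1920 * 1080 * 125                 # 259,200,000
quad_hd_pixels = 4096 * 2160 * 100                 # 884,736,000
throughput_ratio = quad_hd_pixels / full_hd_pixels # about 3.41
```

On this rough measure, the configuration with 3 additional cores sustains about 3.4 times the pixel rate of the Full HD run.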


RESULT

Qualitative evaluation of the picture quality indicates that the parallel implementation of the algorithm performs as well as the traditional 3DRS algorithm with no visible degradation in picture quality.



CONCLUSION

This paper presents a method to parallelize the meandering-based 3D recursive search (3DRS) motion estimation algorithm used in scan-rate up-conversion.

The proposed algorithm is scalable and can easily be mapped to multiple processing units such as multithreaded processors, multicores and/or co-processors in order to cope with the increasingly hard-to-meet real-time requirements of next-generation video devices.


Experiments show that the picture quality of the proposed parallel 3DRS algorithm is as good as that of the original non-parallelized algorithm for most video sequences.

