Meandering Based Parallel 3DRS Algorithm for The Multicore Era
Ghiath Al-kadi‡, Jan Hoogerbrugge‡, Surendra Guntur‡, Andrei Terechko*, Marc Duranton‡ and Onno Eerenberg‡
‡NXP Semiconductors, Eindhoven, the Netherlands. *Vector Fabrics, Eindhoven, the Netherlands.
This paper appears in: 2010 Digest of Technical Papers, International Conference on Consumer Electronics (ICCE).
For video compression applications, it is sufficient to find the motion vector corresponding to the best match, which in turn results in lower residual energy and better compression.
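To make the "best match" criterion concrete, here is a minimal full-search block-matching sketch in Python. It assumes the sum of absolute differences (SAD) as the matching error and frames stored as nested lists; the names `sad` and `full_search` are illustrative. This exhaustive search is the conventional compression-oriented approach, not 3DRS.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (motion_vector, cost) for the lowest SAD. Displacements that would read
    outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


# Usage: the current frame is the reference shifted by (2, 1), so the
# block at (4, 4) is matched exactly at displacement (2, 1).
ref = [[y * 16 + x for x in range(12)] for y in range(12)]
cur = [[(y + 1) * 16 + (x + 2) for x in range(12)] for y in range(12)]
print(full_search(cur, ref, bx=4, by=4, n=4, search_range=2))  # ((2, 1), 0)
```

Because the search only minimizes SAD, it returns whichever displacement matches best, which need not be the true motion in blocks with little texture.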
With traditional ME, however, true motion vectors can only be estimated for blocks containing enough texture.
"True-Motion Estimation with 3-D Recursive Search Block Matching"
Gerard de Haan, Paul W. A. C. Biezen, Henk Huijgen, and Olukayode A. Ojo
A value of r = 2 was found experimentally to be best for a block size of 8×8 pixels.
If MB(i,j) is the current MB under consideration, the spatial and temporal MBs available for candidate selection in the traditional 3DRS algorithm are S1ij and T1ij, respectively.
The variables α and β are given for a left-to-right scan order; changing the scan order implies swapping their contents.
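The effect of swapping α and β can be illustrated with a small Python sketch. It is not the paper's definition of S1ij and T1ij: it assumes that α and β are horizontal macroblock offsets, with α pointing to the side of the row already processed and β to the side not yet processed, and the candidate offsets in the hypothetical `candidate_positions` function are chosen only to show that reversing the scan mirrors them.

```python
def alpha_beta(left_to_right):
    """Return (alpha, beta) as horizontal MB offsets. ASSUMPTION: alpha points
    to the side already visited in the current row (-1 for a left-to-right
    scan) and beta to the side not yet visited (+1)."""
    alpha, beta = -1, +1
    if not left_to_right:
        # Reversing the scan direction swaps the contents of alpha and beta.
        alpha, beta = beta, alpha
    return alpha, beta


def candidate_positions(i, j, left_to_right):
    """Hypothetical candidate MB positions for MB(i, j), with i the row and
    j the column. Illustrative offsets only, not the paper's S1ij/T1ij.
    Spatial: the previous MB in the same row and the MB above, on the side
    ahead of the scan. Temporal: the MB below, on the side ahead."""
    alpha, beta = alpha_beta(left_to_right)
    spatial = [(i, j + alpha), (i - 1, j + beta)]
    temporal = [(i + 1, j + beta)]
    return spatial, temporal


print(candidate_positions(5, 5, left_to_right=True))   # ([(5, 4), (4, 6)], [(6, 6)])
print(candidate_positions(5, 5, left_to_right=False))  # ([(5, 6), (4, 4)], [(6, 4)])
```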
The motion information in neighboring blocks processed in the same iteration (i.e., spatial candidates) is more accurate than that available from the previous scan iteration (temporal candidates).
With reference to the two raster scans shown in Fig. 1(B), the currently processed block MB(i,j), denoted B, has only the MB denoted A as a direct neighboring spatial candidate; all other direct neighboring candidates are temporal.
As shown in Fig. 2, MB(i,j) can be processed as soon as MB(i-2,j+1) completes. MBs are therefore processed in a diagonal wavefront, referred to as the “2D-wave”.
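The 2D-wave ordering can be checked with a short Python sketch that computes the earliest step at which each MB can start. It assumes unit processing time per MB, that each MB also waits for its predecessor MB(i, j-1) in the same row, and that at the right edge the dependency on MB(i-2, j+1) is clamped to the last column; `wave_schedule` is a hypothetical name.

```python
def wave_schedule(rows, cols):
    """Earliest step at which each MB(i, j) can start, with i the row and j
    the column. Each MB takes one step. Dependencies (partly assumed): the
    previous MB in the same row, MB(i, j-1), and MB(i-2, j+1) as stated in
    the text, with j+1 clamped to the last column at the right edge."""
    step = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            deps = []
            if j > 0:
                deps.append(step[i][j - 1])
            if i >= 2:
                deps.append(step[i - 2][min(j + 1, cols - 1)])
            # A dependency started at step s completes at s + 1.
            step[i][j] = max(deps) + 1 if deps else 0
    return step


for row in wave_schedule(4, 4):
    print(row)
```

For a 4×4 grid this prints [0, 1, 2, 3], [0, 1, 2, 3], [2, 3, 4, 5] and [2, 3, 4, 5]. Under this rule rows of the same parity form one chain, so even and odd rows advance in parallel, and (for grids at least two MBs wide) MB(i, j) starts at step j + 2·⌊i/2⌋.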
However, the quality of motion detection can be compromised because neighboring MBs (other than α or β) are not used as spatial candidates.
To prevent this quality loss while still being able to find small objects, both raster scans can be executed simultaneously, as shown in Fig. 1(B), by assigning a motion estimator (ME) (co)processor to each row.
Fig. 3, for example, depicts a system in which the parallel 3DRS algorithm is mapped to four cores.
The simultaneous execution of the 2D-wave processing of the two raster scans can be viewed as two distinct phases:
Phase One: The execution from the start position of each row to around the middle of the row.
Each raster scan executes the 2D-wave, with ME1 using the (S1ij, T1ij) candidate set for block matching while the other processors use the (S2ij, T2ij) set (α and β are swapped according to the scan direction).
Phase Two: The execution from around the middle of the row to the end of the row (see Fig. 3-right).
By this point the executing processors have overlapped, so the eight neighboring MBs are spatial candidates.
Thus, ME1, ME2 and ME3 use the (S3ij, T3ij) candidate set, while ME4 uses the (S1ij, T1ij) set for block matching.
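The phase-dependent choice of candidate sets for the four-core mapping amounts to a lookup, sketched below in Python. The mapping follows the text; the exact phase boundary is an assumption, since the text only says "around the middle of the row", so the hypothetical `phase_of` splits at half the row length measured from the row's start position. Both function names are illustrative.

```python
def candidate_set(me, phase):
    """Candidate set (spatial, temporal) used by processor ME<me>, me in 1..4,
    in the four-core mapping described in the text."""
    if me not in (1, 2, 3, 4):
        raise ValueError("me must be 1, 2, 3 or 4")
    if phase == 1:
        # Phase One: ME1 keeps the traditional set; the others use set 2.
        return ("S1", "T1") if me == 1 else ("S2", "T2")
    if phase == 2:
        # Phase Two: ME1-ME3 switch to set 3; ME4 uses the traditional set.
        return ("S1", "T1") if me == 4 else ("S3", "T3")
    raise ValueError("phase must be 1 or 2")


def phase_of(j, cols, left_to_right):
    """ASSUMPTION: Phase One covers the first cols // 2 MBs measured from the
    row's start position; the remaining MBs belong to Phase Two."""
    distance = j if left_to_right else cols - 1 - j
    return 1 if distance < cols // 2 else 2


# Usage: sets used by each ME at column 6 of an 8-MB row scanned left to right.
for me in (1, 2, 3, 4):
    print(me, candidate_set(me, phase_of(6, 8, left_to_right=True)))
```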
The proposed parallel 3DRS algorithm was evaluated on various video streams by performing simulations of the NeXVP architecture.
The underlying architecture consists of two homogeneous 4-issue-slot TriMedia cores with a subset of static interleaved multithreading (two foreground and two background threads).
3DRS motion estimation achieves 125 scans/second for a Full HD (1920×1080) stream, compared to 29 scans/second on a single core running the parallel 3DRS code.
For Quad HD (4096×2160) video, a rate of 100 scans/second was obtained on a similar architecture with three additional cores.
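The reported rates can be put in perspective with a little arithmetic, sketched in Python below using only the figures quoted above. `pixel_rate` is an illustrative helper, and pixel throughput is used instead of block throughput so that no block size has to be assumed.

```python
def pixel_rate(width, height, scans_per_second):
    """Pixels covered per second by motion estimation at the given scan rate."""
    return width * height * scans_per_second


full_hd_multicore = pixel_rate(1920, 1080, 125)  # 259,200,000 pixels/s
full_hd_single = pixel_rate(1920, 1080, 29)      # 60,134,400 pixels/s
quad_hd = pixel_rate(4096, 2160, 100)            # 884,736,000 pixels/s

# Speedup of the multicore system over a single core running the same code.
speedup = full_hd_multicore / full_hd_single     # 125 / 29, about 4.31

# Pixel throughput of the Quad HD configuration relative to Full HD.
quad_vs_full = quad_hd / full_hd_multicore       # about 3.41

print(f"{speedup:.2f}x speedup, {quad_vs_full:.2f}x pixel throughput")
# prints "4.31x speedup, 3.41x pixel throughput"
```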
Qualitative evaluation indicates that the parallel implementation performs as well as the traditional 3DRS algorithm, with no visible degradation in picture quality.
This paper presents a method to parallelize the meandering based 3D recursive search (3DRS) motion estimation algorithm used in scan-rate up-conversion.
The proposed algorithm is scalable and can easily be mapped to multiple processing units, such as multithreaded processors, multicores and/or co-processors, to cope with the increasingly hard-to-meet real-time requirements of next-generation video devices.
Experiments show that the picture quality of the proposed parallel 3DRS algorithm is as good as that of the original non-parallelized algorithm for most video sequences.