Presentation Transcript

  1. Speeding up VirtualDub • Presented by: Shmuel Habari • Advisor: Zvika Guz • Software Systems Lab, Technion

  2. What is VirtualDub? • VirtualDub is a very popular open source video processing tool. • It can merge videos, cut scenes, add subtitles and apply a wide variety of filters. It also supports third-party video compression (e.g. DivX). • VirtualDub is constantly being refined, expanded and adapted by its original creator, Avery Lee. http://www.virtualdub.org/

  3. VirtualDub’s Benchmark • The benchmark applies the Resize filter, a frequently used, CPU-heavy filter. • The input is a vibrantly colored animation video, so that any flaw in the output will be visible. • The output video was produced without audio filtering and without third-party compression utilities.

  4. VTune Performance Analyzer • The benchmark was analyzed using VTune. • First step: the function VDFastMemcpyPartialMMX2.

  5. Fast Memory Copy • This function copies large quantities of data from a source memory address to a destination memory address. • Each iteration loads the data into registers and then stores it to the destination address. • The loop then moves on to the next 64 bytes and continues until all the data has been copied. • In the observed runs, every call to the function copied 2048 bytes.
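
The loop on this slide can be sketched in C++ roughly as follows. This is an illustration only, not VirtualDub's actual MMX assembly: the function name `copy_in_64_byte_blocks`, the local buffer standing in for the registers, and the use of `memcpy` for each load and store are all assumptions made for readability.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the copy loop described on the slide.
// Copies `size` bytes from src to dst, 64 bytes per iteration.
void copy_in_64_byte_blocks(void* dst, const void* src, std::size_t size) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    constexpr std::size_t kBlock = 64;

    while (size >= kBlock) {
        unsigned char regs[kBlock];     // plays the role of the registers
        std::memcpy(regs, s, kBlock);   // load 64 bytes from the source
        std::memcpy(d, regs, kBlock);   // store them to the destination
        s += kBlock;
        d += kBlock;
        size -= kBlock;
    }
    std::memcpy(d, s, size);            // copy any tail shorter than 64 bytes
}
```

With the 2048-byte calls observed in the benchmark, this loop would run 32 iterations per call.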

  6. Clockticks Samples • VTune also showed that, predictably, most clockticks were spent reading from memory.

  7. Dummy Loop • Given that, the solution was to fill the cache before beginning to copy the data. • I added a dummy loop, labeled @mainloop, that reads 1024 bytes ahead before blastloop runs. • When the pre-read data has been consumed and the end of the source data has not been reached, another 1024 bytes are read. • Using the dummy loop, a speedup of 4.21% was gained.
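
A minimal C++ sketch of the dummy-loop idea, under stated assumptions: the real change was made in the assembly routine, while here the function name `copy_with_pretouch`, the 64-byte cache line size, and the use of `memcpy` in place of blastloop are hypothetical. The 1024-byte chunk size is the one given on the slide.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: touch the next chunk of the source so that it is
// in cache before the copy loop reads it. Returns the number of passes.
std::size_t copy_with_pretouch(unsigned char* dst, const unsigned char* src,
                               std::size_t size) {
    constexpr std::size_t kChunk = 1024;    // bytes pre-read per pass (from the slide)
    constexpr std::size_t kCacheLine = 64;  // assumed cache line size
    volatile unsigned char sink = 0;        // volatile keeps the dummy reads from being optimized away
    std::size_t passes = 0;

    for (std::size_t off = 0; off < size; off += kChunk) {
        const std::size_t n = (size - off < kChunk) ? (size - off) : kChunk;

        // Dummy loop: read one byte per cache line of the upcoming chunk.
        for (std::size_t i = 0; i < n; i += kCacheLine) {
            sink = src[off + i];
        }

        // Copy loop (the role blastloop plays in the original routine).
        std::memcpy(dst + off, src + off, n);
        ++passes;
    }
    (void)sink;
    return passes;
}
```

For a 2048-byte call this performs two passes, each pre-reading 1024 bytes and then copying them.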

  8. Threads • As stated before, VirtualDub is a project under ongoing development. • Its creator had access to code optimization tools, VTune included, which allowed him to improve the code himself and remove many pitfalls and errors common to non-optimized code. • VirtualDub also proved to be multithreaded, to a point:

  9. Threads • The first thread is the processing thread. The second thread is the audio thread; since audio was specifically disabled, it showed almost no activity. • Therefore, in theory, multithreading the processing thread was still possible.

  10. Threads • At first I had high hopes for multithreading VirtualDub. Studying the code, I concluded that it processes the video frame by frame and scans each frame line by line. • The two approaches I decided to try were: • Processing two frames in parallel. • Cutting a frame in half and processing the top and bottom halves in parallel.
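
The second approach can be sketched in C++ as follows. This is not VirtualDub code: the names `process_rows` and `process_frame_in_halves`, the flat 8-bit frame layout, and the byte-inverting "filter" are hypothetical placeholders. The sketch only works because each row is independent and the two halves share no mutable state.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-line filter: inverts every byte in rows [row_begin, row_end).
void process_rows(std::vector<std::uint8_t>& frame, std::size_t width,
                  std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::uint8_t& px = frame[y * width + x];
            px = static_cast<std::uint8_t>(255 - px);
        }
    }
}

// Splits the frame into a top and a bottom half and processes them in parallel.
void process_frame_in_halves(std::vector<std::uint8_t>& frame,
                             std::size_t width, std::size_t height) {
    const std::size_t mid = height / 2;

    // Top half runs on a second thread...
    std::thread top([&frame, width, mid] { process_rows(frame, width, 0, mid); });
    // ...while the calling thread handles the bottom half.
    process_rows(frame, width, mid, height);

    top.join();  // wait until both halves are done
}
```

The two threads write to disjoint rows, so no locking is needed here. A filter whose state lives in shared member variables would not split this cleanly.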

  12. Threads • However, all my attempts at multithreading VirtualDub’s processing failed. • At first I believed I had run into global variables being accessed; they turned out to be private variables of a much higher-level class. • Attempts to duplicate that class in order to split the workload failed.

  13. Threads • Lastly, I turned to OpenMP, hoping to use its built-in ability to duplicate variables into each thread. • VirtualDub’s complexity made it impossible for me to convert it to the Intel Compiler: every change resulted in a staggering number of errors, each requiring many small code changes, and still more that could not be solved. • Limiting the use of the Intel Compiler to only the necessary projects did not show an improvement.
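
The OpenMP capability referred to here is its data-sharing clauses, such as `private` and `firstprivate`, which give each thread its own copy of a variable. A minimal sketch, assuming a hypothetical row filter `mirror_rows` with a scratch buffer that must not be shared between threads (none of these names come from VirtualDub):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical filter: mirrors every row of an 8-bit frame horizontally,
// using a scratch buffer as working storage.
void mirror_rows(std::vector<std::uint8_t>& frame, int width, int height) {
    std::vector<std::uint8_t> scratch(static_cast<std::size_t>(width));

    // firstprivate gives every thread its own copy of `scratch`,
    // so rows processed in parallel do not overwrite each other's data.
    #pragma omp parallel for firstprivate(scratch)
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = frame.data() + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x) scratch[x] = row[x];
        for (int x = 0; x < width; ++x) row[x] = scratch[width - 1 - x];
    }
}
```

The pragma takes effect only when OpenMP is enabled in the compiler (for example `-fopenmp` with GCC); otherwise it is ignored and the loop runs serially with the same result. This works for a small, self-contained loop; it does not address state held in members of a larger class.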

  14. Conclusion • A lot of time and effort was put into this project. • To my dismay, this is not evident in the percentage of speedup, but rather in error messages and various versions of the code, each a bit closer to a working version but never quite there. • The bottom line is that, despite the promise VirtualDub initially showed, too much optimization had already been done on it, leaving it well optimized and too big and intricate for my optimization attempts.