
Optimizing Ogg Vorbis performance using architectural considerations
Adir Abraham and Tal Abir

slide2

Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel. This places Vorbis in the same competitive class as audio representations such as MPEG-4 (AAC), and similar to, but higher performance than MPEG-1/2 audio layer 3, MPEG-4 audio (TwinVQ), WMA and PAC.

slide3

Strategies used to increase Ogg Vorbis’ performance
* We looked for architectural pitfalls and wrote alternative, optimized code to replace them.
* We used threading to take advantage of the processor's Hyper-Threading capabilities.
* We used SSE programming to make calculations faster and parallelized.

slide4

Cleaning architectural pitfalls
Serialized instructions
After analyzing the results with VTune, we found that every conversion from float to int (masking) uses “_ftol”. _ftol uses “fldcw”, which causes serialization and memory stalls. We avoided _ftol by writing alternative code for the masking.
We also found _ctrlfp, which is used as part of the C function rint. _ctrlfp also uses “fldcw”, so we avoided it as well by writing alternative code for rint.
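The slides do not show the replacement code, so the following is only a minimal sketch of one well-known way to convert a floating-point value to an integer without touching the FPU control word. The name round_to_int32 is hypothetical and this is not the authors' FLT2INT. It assumes IEEE 754 doubles, the default round-to-nearest mode, arithmetic carried out in double precision, and inputs smaller than 2^31 in magnitude. Note that it rounds to nearest (like rint) rather than truncating as a plain C cast does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the authors' FLT2INT.
 * Adding 2^52 + 2^51 pushes x into a range where one unit in the last
 * place equals 1.0, so the hardware rounds x to an integer as part of
 * the addition. The rounded value then sits in the low 32 bits of the
 * mantissa (in two's complement for negative x). No fldcw is needed
 * because the rounding mode is never changed. */
static int32_t round_to_int32(double x)
{
    const double magic = 6755399441055744.0;   /* 2^52 + 2^51 */
    double biased = x + magic;
    uint64_t bits;

    memcpy(&bits, &biased, sizeof bits);        /* reinterpret without aliasing issues */
    return (int32_t)(uint32_t)(bits & 0xFFFFFFFFu);
}
```

Ties are rounded to the nearest even integer, so 2.5 becomes 2 and -1.5 becomes -2, which matches rint under the default rounding mode.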

slide5

64K Aliasing
64K aliasing happens when a procedure works on two data segments that are placed on cache lines whose addresses are exactly a multiple of 64K apart. The problem is that memory addresses with the same lower 16 bits are mapped to the same place in the cache. Since both pieces of memory cannot occupy the same cache line simultaneously, the cache thrashes.
We found that some data that is accessed many times in Ogg Vorbis was not congruent.
Ogg Vorbis had a serious problem with 64K aliasing. We mapped the data correctly (using different banks) and got better results.
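As an illustration of the condition described above, the sketch below tests whether two addresses fall on distinct cache lines that agree in their lower 16 address bits, and shows one simple way to stagger a buffer offset so they do not. The function names, the 64-byte line size, and the padding strategy are assumptions made for this example; the slides do not say how the data was actually remapped.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIAS_SPAN 65536u   /* 64K: addresses this far apart collide */
#define CACHE_LINE 64u      /* assumed cache line size in bytes */

/* Nonzero when p and q lie on different cache lines whose addresses
 * share the same lower 16 bits (compared at cache-line granularity). */
static int aliases_64k(const void *p, const void *q)
{
    uintptr_t a = (uintptr_t)p / CACHE_LINE;
    uintptr_t b = (uintptr_t)q / CACHE_LINE;

    return a != b && ((a ^ b) % (ALIAS_SPAN / CACHE_LINE)) == 0;
}

/* Given a desired distance between two buffers (a multiple of the cache
 * line size), returns a distance that is not a multiple of 64K by
 * padding with one extra cache line when necessary. */
static size_t stagger_offset(size_t want)
{
    if ((want / CACHE_LINE) % (ALIAS_SPAN / CACHE_LINE) == 0)
        want += CACHE_LINE;
    return want;
}
```

Placing the second buffer at base + stagger_offset(65536) instead of base + 65536 moves it one cache line further along, so the two hot buffers no longer compete for the same cache location.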

slide6

Threading
Hyper-Threading Technology enables multi-threaded software applications to execute threads in parallel.
We looked at the two most time-consuming functions and found that they can be parallelized.
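A minimal sketch of this kind of functional decomposition, using POSIX threads, might look as follows. The two stage functions are placeholders standing in for two independent, expensive functions; they are not Ogg Vorbis code, and the slides do not say which threading API was used.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task descriptor: each stage reads a shared input and
 * writes only to its own output, so no locking is required. */
typedef struct {
    const float *in;
    float *out;
    size_t n;
} stage_args;

/* Placeholder for the first independent function. */
static void *stage_a(void *p)
{
    stage_args *a = p;
    for (size_t i = 0; i < a->n; i++)
        a->out[i] = a->in[i] * 2.0f;
    return NULL;
}

/* Placeholder for the second independent function. */
static void *stage_b(void *p)
{
    stage_args *b = p;
    for (size_t i = 0; i < b->n; i++)
        b->out[i] = b->in[i] + 1.0f;
    return NULL;
}

/* Runs stage_a on a worker thread while the calling thread runs
 * stage_b, then waits for the worker. Returns 0 on success or the
 * pthread error code. */
static int run_stages_parallel(stage_args *a, stage_args *b)
{
    pthread_t worker;
    int err = pthread_create(&worker, NULL, stage_a, a);

    if (err != 0)
        return err;
    stage_b(b);
    return pthread_join(worker, NULL);
}
```

Running one stage on the calling thread avoids creating a second worker, which keeps thread start-up overhead low when the stages are short.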

slide7

SIMD
The Single Instruction Multiple Data (SIMD) method enables the programmer to develop algorithms that can mix packed single-precision floating-point and integer operations, using SSE and MMX instructions respectively.
We looked for loop sequences that contain linear calculations with arrays within the hottest functions.
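To show what such a loop looks like once vectorized, here is a small sketch using SSE intrinsics. The loop itself (y[i] = a * x[i] + y[i]) is a made-up example of a linear array calculation and is not taken from Ogg Vorbis. The scalar tail also serves as a fallback when SSE is unavailable.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Hypothetical example loop, not Ogg Vorbis code. */
static void scale_add(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);           /* broadcast a to 4 lanes */

    /* Process four floats per iteration; unaligned loads keep the
     * sketch safe for arbitrary pointers. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif

    /* Scalar tail for the remaining elements (or the whole array
     * when SSE is not available). */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned load and store intrinsics (_mm_load_ps, _mm_store_ps) could be used instead.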

slide8

Yield gained from each strategy
Removing architectural pitfalls
By writing alternative code for _ftol, called FLT2INT, we gained 4% in performance.
By writing alternative code for rint, we gained 4% in performance.
By eliminating the 64K aliasing, we gained 6% in performance.
That makes a total performance gain of 14% for the pitfall strategy.

slide9

Threading
We parallelized the noise masker and the tone masker, which have no dependency on each other (functional decomposition).
This optimization brought no significant benefit; its total speedup was 2%.
SSE
Tuning is still in progress. No gain has been seen yet.

slide10

Main achievements
Architectural pitfalls: By writing the alternative code, we removed most of the architectural pitfalls that we found.
Threading: We parallelized two functions that were not dependent on each other.
SIMD: We translated the loops from instructions that work with architectural registers into instructions that work with SIMD registers.

slide11

Performance boost
The total performance gain from all three strategies was 16%. A 100 MB sample file that took 50 seconds to encode before the optimization took 42 seconds afterwards.
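As a quick check of the figures above, the 16% corresponds to the reduction in encoding time for the sample file, which can equivalently be read as a speedup factor of roughly 1.19:

```latex
\[
\frac{50\,\mathrm{s} - 42\,\mathrm{s}}{50\,\mathrm{s}} = 0.16 = 16\%,
\qquad
\frac{50\,\mathrm{s}}{42\,\mathrm{s}} \approx 1.19
\]
```

The same total also follows from the per-strategy gains reported earlier: 14% from removing pitfalls plus 2% from threading.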