Optimizing Ogg Vorbis performance using architectural considerations Adir Abraham and Tal Abir


Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid- to high-quality (8 kHz to 48.0 kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps per channel. This places Vorbis in the same competitive class as audio representations such as MPEG-4 (AAC), and similar to, but higher performance than, MPEG-1/2 Audio Layer 3, MPEG-4 audio (TwinVQ), WMA and PAC.

Strategies used to increase Ogg Vorbis’ performance
* We looked for architectural pitfalls and wrote optimized replacement code for them.
* We used threading to take advantage of the processor's Hyper-Threading capability.
* We used SSE programming to perform faster, parallelized calculations.

Cleaning architectural pitfalls

Serialized instructions
After analyzing the encoder with VTune, we found that every float-to-int conversion in the masking code calls “_ftol”. _ftol executes “fldcw” (load x87 FPU control word) to switch the rounding mode to truncation, which serializes execution and causes stalls. We avoided _ftol by writing alternative conversion code for the masking.
We also found _ctrlfp, which the C function rint uses internally. _ctrlfp also executes “fldcw”, so we avoided it by writing alternative code for rint as well.
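The slides do not show FLT2INT itself. As a hedged illustration, one well-known way to convert a float to an int without touching the x87 control word is the "magic number" trick: adding 1.5 × 2^23 pushes the integer part into the low mantissa bits, where it can be read back directly. The names `flt2int_nearest` and `fast_rintf` are hypothetical. The sketch assumes IEEE-754 single precision, the default round-to-nearest mode, and |f| < 2^22. Unlike a C cast (which truncates, and is what pulls in _ftol), it rounds to nearest, so it is a drop-in replacement only where that rounding is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FLT2INT-style helper (not the authors' code).
 * Assumes IEEE-754 floats, round-to-nearest-even mode, and |f| < 2^22.
 * No fldcw is executed, so the conversion does not serialize. */
static inline int32_t flt2int_nearest(float f)
{
    union { float f; int32_t i; } u;

    /* Range check; compiled out when NDEBUG is defined. */
    assert(f > -4194304.0f && f < 4194304.0f);

    /* 1.5 * 2^23: the sum lies in [2^23, 2^24), where the spacing
     * between floats is exactly 1, so the addition rounds f to an
     * integer and leaves it in the low mantissa bits. */
    u.f = f + 12582912.0f;

    /* Subtract the bit pattern of 1.5 * 2^23 (0x4B400000) to recover
     * the rounded integer, including its sign. */
    return u.i - 0x4B400000;
}

/* rint replacement built on the same trick (same range assumption).
 * Note: returns +0.0f where rint would return -0.0f. */
static inline float fast_rintf(float f)
{
    return (float)flt2int_nearest(f);
}
```

In the masking code, a hypothetical call site such as `int k = (int)x;` would become `int k = flt2int_nearest(x);`, and the switch from truncation to rounding would need to be checked against the encoder's output.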

64K aliasing
64K aliasing occurs when a procedure works on two data segments whose addresses differ by an exact multiple of 64 KB. Memory addresses with the same lower 16 bits map to the same place in the cache, and since both pieces of memory cannot occupy the same cache line simultaneously, the cache thrashes.
We found that some data that Ogg Vorbis accesses many times was laid out at such aliasing addresses, and Ogg Vorbis suffered heavily from 64K aliasing. We remapped the data (placing it in different banks) so that the addresses no longer alias, and got better results.
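The aliasing condition can be made concrete in a few lines of C. The sketch below is a hedged illustration, not the authors' fix: it checks whether two buffers share their low 16 address bits, and carves two float arrays out of one allocation with a gap chosen so that the distance between them is 32 KB modulo 64 KB, as far as possible from a multiple of 64 KB. The names `aliases_64k`, `buffer_pair`, `buffer_pair_alloc` and `buffer_pair_free` are hypothetical, and the 64 KB window assumes the processor behavior described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIAS_WINDOW ((uintptr_t)0x10000)   /* 64 KB */

/* Nonzero when a and b have identical low 16 address bits, i.e. when
 * they are a multiple of 64 KB apart and compete for the same cache lines. */
static int aliases_64k(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    return ((x ^ y) & (ALIAS_WINDOW - 1)) == 0;
}

typedef struct {
    void  *block;   /* single allocation backing both arrays */
    float *a;
    float *b;
} buffer_pair;

/* Allocate two n-float arrays from one block, inserting padding so that
 * (b - a) mod 64 KB == 32 KB. Returns 0 on success, -1 on failure. */
static int buffer_pair_alloc(buffer_pair *p, size_t n)
{
    size_t window = (size_t)ALIAS_WINDOW;
    size_t bytes  = n * sizeof(float);
    size_t pad    = (window / 2 + window - bytes % window) % window;

    p->block = malloc(2 * bytes + pad);
    if (p->block == NULL)
        return -1;
    p->a = (float *)p->block;
    p->b = (float *)((char *)p->block + bytes + pad);

    /* Postcondition: the two arrays never start on aliasing addresses. */
    assert(!aliases_64k(p->a, p->b));
    return 0;
}

static void buffer_pair_free(buffer_pair *p)
{
    free(p->block);
    p->block = NULL;
    p->a = p->b = NULL;
}
```

With 16384 floats (exactly 64 KB per array), placing `b` directly after `a` would make every `a[i]`/`b[i]` pair alias; the padding moves `b` 32 KB off that boundary at the cost of less than 64 KB of extra memory.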

Threading
Hyper-Threading Technology enables multi-threaded software applications to execute threads in parallel.
We examined the two most time-consuming functions and found that they can be parallelized.
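Splitting two independent functions across threads is the simplest form of functional decomposition, and a Hyper-Threading processor can run the two threads on its two logical CPUs. The sketch below uses POSIX threads. `noise_mask`, `tone_mask`, `mask_job` and `run_maskers` are hypothetical stand-ins (the slides do not give the real maskers' signatures), and the loop bodies are placeholders; the one assumption that matters is that neither masker reads the other's output.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job descriptor: each masker reads the shared spectrum
 * and writes only to its own output array. */
typedef struct {
    const float *spectrum;
    float       *out;
    size_t       n;
} mask_job;

/* Placeholder bodies standing in for the real maskers. */
static void *noise_mask(void *arg)
{
    mask_job *job = arg;
    for (size_t i = 0; i < job->n; ++i)
        job->out[i] = 0.5f * job->spectrum[i];
    return NULL;
}

static void *tone_mask(void *arg)
{
    mask_job *job = arg;
    for (size_t i = 0; i < job->n; ++i)
        job->out[i] = 2.0f * job->spectrum[i];
    return NULL;
}

/* Run the tone masker on a second thread while the calling thread runs
 * the noise masker, then wait for both. Returns 0, or the error code
 * from pthread_join. */
static int run_maskers(const float *spectrum, float *noise, float *tone, size_t n)
{
    mask_job noise_job = { spectrum, noise, n };
    mask_job tone_job  = { spectrum, tone,  n };
    pthread_t worker;

    /* The decomposition is only valid if the outputs are distinct. */
    assert(n == 0 || noise != tone);

    if (pthread_create(&worker, NULL, tone_mask, &tone_job) != 0) {
        /* No thread available: fall back to serial execution. */
        tone_mask(&tone_job);
        noise_mask(&noise_job);
        return 0;
    }
    noise_mask(&noise_job);              /* overlaps with tone_mask */
    return pthread_join(worker, NULL);
}
```

Creating a thread on every call has a cost; an encoder processing many frames would more likely keep a persistent worker thread and hand it one job per frame.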

SIMD
The Single Instruction, Multiple Data (SIMD) method lets the programmer develop algorithms that mix packed single-precision floating-point and packed integer operations, using SSE and MMX instructions respectively.
We looked for loops within the hottest functions that perform linear calculations over arrays.
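As a hedged illustration of that kind of loop, the sketch below computes y[i] = a·x[i] + y[i] four floats at a time with SSE intrinsics from `<xmmintrin.h>`, finishing any leftover elements with scalar code. `scale_add` is a hypothetical example, not a specific Vorbis function, and the code falls back to the plain scalar loop when the compiler does not target SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define USE_SSE 1
#endif

/* y[i] = a * x[i] + y[i], processed four floats per SSE instruction. */
static void scale_add(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));
#ifdef USE_SSE
    const __m128 va = _mm_set1_ps(a);        /* broadcast a into all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* unaligned loads: no alignment assumed */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

If both arrays are known to be 16-byte aligned, the aligned `_mm_load_ps`/`_mm_store_ps` forms can replace the unaligned ones.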

Yield gained from each strategy

Removing architectural pitfalls
By writing an alternative to _ftol, called FLT2INT, we gained 4% in performance.
By writing an alternative to rint, we gained 4%.
By eliminating the 64K aliasing, we gained 6%.
That makes a total gain of 14% for the pitfall strategy.

Threading
We parallelized the noise masker and the tone masker, which have no dependency on each other (functional decomposition). This optimization brought only a small gain: its total speedup was 2%.

SSE
Tuning is still in progress; no gain has been seen yet.

Main achievements
Architectural pitfalls: By writing the alternative code, we removed most of the architectural pitfalls that we found.
Threading: We parallelized two functions that were not dependent on each other.
SIMD: We translated the loops from instructions that operate on architectural registers to instructions that operate on SIMD registers.

Performance boost
The total performance gain from all three strategies was 16%. A sample 100 MB file that took 50 seconds to encode before the optimization was encoded in 42 seconds afterwards.
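The 16% figure corresponds to the reduction in encoding time for the sample file; expressed as a throughput speedup, the same measurement is roughly 1.19×:

```latex
\frac{50\,\mathrm{s} - 42\,\mathrm{s}}{50\,\mathrm{s}} = \frac{8}{50} = 0.16,
\qquad
\frac{50\,\mathrm{s}}{42\,\mathrm{s}} \approx 1.19
```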
