Entropy Slices for Parallel Entropy Coding
K. Misra, J. Zhao and A. Segall
Entropy Slices • Introduction: Entropy Slice • Introduce partitioning of slices into smaller “entropy” slices • An entropy slice • Resets the context models • Restricts the neighborhood definition used for parsing • Is processed identically to a current slice by the entropy decoder • Key difference: reconstruction still uses information from neighboring entropy slices (see the sketch below)
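A minimal conceptual sketch of this distinction follows; it is not TMuC code, and the names (Block, availableForParsing, availableForReconstruction) are illustrative assumptions. The point is that the parser treats anything outside the current entropy slice as unavailable, while reconstruction only respects regular slice boundaries.

```cpp
struct Block {
    int sliceId;         // regular slice id
    int entropySliceId;  // hypothetical entropy-slice id
};

// Neighbor availability as seen by the entropy (parsing) stage:
// context selection never crosses an entropy-slice boundary.
bool availableForParsing(const Block& cur, const Block& neighbor) {
    return cur.sliceId == neighbor.sliceId &&
           cur.entropySliceId == neighbor.entropySliceId;
}

// Neighbor availability as seen by reconstruction (intra prediction,
// deblocking, ...): only regular slice boundaries matter, so prediction
// across entropy-slice boundaries is unchanged.
bool availableForReconstruction(const Block& cur, const Block& neighbor) {
    return cur.sliceId == neighbor.sliceId;
}
```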
Entropy Slices • We now describe the major advantages of the entropy slice concept • Advantage #1 – Parallelization: • Entropy slices do not depend on information outside the entropy slice and can be decoded independently • Allows parallelization of the entire entropy decoding loop – including context adaptation and bin coding (see the sketch below) • Advantage #2 – Generalization: • Entropy slices can be used with all entropy coding engines currently under study in the TMuC and TMuC software (V2V, CABAC, PIPE, UVLC) • Moreover, we have software available for PIPE and CABAC
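The sketch below illustrates Advantage #1 under stated assumptions; it is not the TMuC implementation, and EntropySlice, SymbolBuffer, decodeEntropySlice and decodePictureEntropy are invented names. Because each entropy slice is self-contained for parsing, its bitstream segment can be handed to a worker thread and entropy-decoded concurrently, with reconstruction following afterwards over the whole picture.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct EntropySlice { const std::uint8_t* data; std::size_t size; };
struct SymbolBuffer { /* decoded syntax elements kept for reconstruction */ };

// Stub: the real routine would run the full entropy decoding loop
// (context adaptation + bin decoding) over the bytes of 'es' only.
SymbolBuffer decodeEntropySlice(const EntropySlice& es) {
    (void)es;
    return SymbolBuffer{};
}

std::vector<SymbolBuffer> decodePictureEntropy(const std::vector<EntropySlice>& slices) {
    std::vector<SymbolBuffer> out(slices.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < slices.size(); ++i)
        workers.emplace_back([&out, &slices, i] { out[i] = decodeEntropySlice(slices[i]); });
    for (auto& t : workers) t.join();
    return out;  // reconstruction then proceeds over the whole picture with full neighbor access
}
```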
Entropy Slices • Advantage #3 – No impact on single thread/core: • Parallelization capability does not come at the expense of single thread/core applications • A single thread/core process may • Decode all entropy slices prior to reconstruction, OR • Decode an entropy slice and then reconstruct, without the neighborhood reset • This is friendly to any architecture
Entropy Slices • Advantage #4 – Easy Adaptation to Decoder Design: • Bit-stream can be partitioned into a large number of entropy slices with little overhead • For example, we will show the performance of 32 entropy slices for 1080p on the next slide – this would translate to ~128 slices for 4k • Decoder can schedule N entropy decoders easily, where N is arbitrary (see the scheduling sketch below) • One example: for 32 slices, an architecture with a parallelization factor of 4 (N=4) would assign 8 slices per decoder • Another example: for 32 slices, an architecture with N=8 would assign 4 slices per decoder • Additionally, for large resolutions (4k, 8k) it is possible to scale to 100s of decoders for GPU implementations
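As an illustration of how simple this scheduling can be, the sketch below statically assigns the entropy slices of a picture to N decoders in round-robin fashion; assignSlices is a hypothetical helper, not part of any decoder API. assignSlices(32, 4) yields 8 slices per decoder and assignSlices(32, 8) yields 4, matching the examples above.

```cpp
#include <vector>

// Returns, for each of the N decoders, the list of entropy-slice indices it parses.
std::vector<std::vector<int>> assignSlices(int numSlices, int numDecoders) {
    std::vector<std::vector<int>> schedule(numDecoders);
    for (int s = 0; s < numSlices; ++s)
        schedule[s % numDecoders].push_back(s);  // decoder (s mod N) handles slice s
    return schedule;
}
```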
Entropy Slices • Advantage #5 – Coding Efficiency: • Insertion of entropy slices results in negligible impact on coding efficiency • For example, if we configure the encoder for a parallelization factor of 32, the coding efficiency loss remains negligible (<0.2%, as summarized in the Conclusions)
Entropy Slices • Advantage #6 – Specification: • Entropy slices allow simple and direct specification of parallelization at the Profile and Level stage • This is accomplished by: • Specifying the maximum number of bins in an Entropy Slice • Specifying the maximum number of Entropy Slices per picture • Allows additional specification of PIPE/V2V configurations • Maximum number of bins per bin coder in an Entropy Slice • Additional advantage: it is straightforward to determine conformance at the encoder (see the sketch below)
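A rough sketch of such an encoder-side conformance check follows; the limit names (maxBinsPerEntropySlice, maxEntropySlicesPerPicture) are illustrative placeholders, not drafted syntax.

```cpp
#include <vector>

struct EntropySliceStats { long bins; };  // bins emitted by the bin coder for one entropy slice

bool conformsToLevel(const std::vector<EntropySliceStats>& slices,
                     long maxBinsPerEntropySlice,
                     int  maxEntropySlicesPerPicture) {
    if (static_cast<int>(slices.size()) > maxEntropySlicesPerPicture)
        return false;                           // too many entropy slices in the picture
    for (const auto& es : slices)
        if (es.bins > maxBinsPerEntropySlice)
            return false;                       // one entropy slice carries too many bins
    return true;
}
```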
Entropy Slices • Syntax • Slice header • Indicate that the slice is an “entropy slice” • Send only the information necessary for entropy decoding (see the parsing sketch below)
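Below is a hedged sketch of how such a header could be parsed; the field and reader names (entropySliceFlag, firstLctbInSlice, cabacInitIdc, BitReader) are assumptions rather than drafted syntax. It conveys the idea that an entropy slice inherits reconstruction parameters from the enclosing regular slice and signals only what the entropy decoder needs.

```cpp
struct BitReader {                      // stand-in for the real bitstream reader
    bool readFlag() { return false; }   // stub
    int  readUvlc() { return 0; }       // stub
};

struct SliceHeader {
    bool entropySliceFlag = false;
    int  firstLctbInSlice = 0;   // address of the first LCU covered by this (entropy) slice
    int  cabacInitIdc     = 0;   // entropy coder initialization
    // ... remaining parameters of a regular slice header
};

SliceHeader parseSliceHeader(BitReader& br, const SliceHeader& enclosingRegularSlice) {
    SliceHeader sh = enclosingRegularSlice;    // inherit reconstruction-related parameters
    sh.entropySliceFlag = br.readFlag();
    sh.firstLctbInSlice = br.readUvlc();
    if (sh.entropySliceFlag) {
        sh.cabacInitIdc = br.readUvlc();       // only entropy-decoding fields follow
        return sh;
    }
    // ... otherwise parse the complete regular slice header
    return sh;
}
```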
Conclusions • We have presented the concept of an “entropy slice” for the HEVC system • Advantages include: • Parallel entropy decoding (both context adaptation and bin coding) • Generalization to any entropy coding system under study • No impact on serial implementations • Easy adaptation to different parallelization factors at the decoder • Negligible impact on coding efficiency (<0.2%) • Direct path for specifying parallelization at the profile/level stage • Software is available
Entropy Slices • In the last meeting, two topics were discussed • Size of entropy slice headers • Extension to potential architectures that do not decouple parsing and reconstruction • We address these in the next slides…
Entropy Slices • Header Size • Very small (as asserted previously) • Quantitatively: 2 bytes + NALU (1 byte) for 1080p • Scales with resolution due to first_lctb_in_slice
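As a back-of-the-envelope check using the numbers above: at roughly 3 bytes per entropy slice (the 2-byte header plus the 1-byte NALU), the 32-slice configuration shown earlier adds on the order of 32 × 3 = 96 bytes per 1080p picture, which is negligible relative to a typical coded picture.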
Entropy Slices • Extension to additional architectures • At the previous meeting there was interest in extending the method to architectures that do not buffer symbols between parsing and reconstruction • This anticipates “joint-wavefront” processing of both the parsing and reconstruction loops • We investigated this issue and concluded the following: • In the current TMuC design, we observe that it is not possible to do wavefront processing of the parsing stage • If we configure the TMuC to support wavefront parsing, the extension of entropy slices is straightforward
Entropy Slices • Our approach: provide additional entry points without the neighbor restriction • “Entropy slice” entry points, each with an entropy coder initialization (EC Init) • Use cabac_init_idc to initialize the entropy coder at each entry point
Entropy Slices • [Figure: entropy + reconstruction processing steps]
Entropy Slices • Syntax • Signal that the bin coding engine will be reset at the start of each LCU row • Allow signaling of cabac_init_idc for the reset (see the sketch below)
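A minimal sketch of the signaled behavior follows, using assumed names (BinDecoder, Picture, decodeLcu, decodeEntropySliceByRows) rather than TMuC code: when the per-row reset is enabled, the bin coding engine is re-initialized with cabac_init_idc at the start of every LCU row, creating one entry point per row while the parsing/reconstruction neighborhood is left untouched.

```cpp
struct BinDecoder {
    void init(int cabacInitIdc, int sliceQp) {
        (void)cabacInitIdc; (void)sliceQp;    // reset contexts + arithmetic engine here
    }
};
struct Picture { int lcuRows = 0; int lcuCols = 0; int sliceQp = 0; };

void decodeLcu(BinDecoder& binDec, Picture& pic, int row, int col) {
    (void)binDec; (void)pic; (void)row; (void)col;  // normal LCU parsing
}

void decodeEntropySliceByRows(BinDecoder& binDec, Picture& pic,
                              bool resetAtLcuRowStart, int cabacInitIdc) {
    for (int row = 0; row < pic.lcuRows; ++row) {
        if (resetAtLcuRowStart)
            binDec.init(cabacInitIdc, pic.sliceQp);  // fresh contexts at each LCU-row entry point
        for (int col = 0; col < pic.lcuCols; ++col)
            decodeLcu(binDec, pic, row, col);
    }
}
```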
Entropy Slices • Performance • 4x parallelism: • Maintain the initial 32x parallelism • Additionally: four entry points in the entropy slice (aligned with LCU rows), resulting in a 4x speedup • RD performance impact: 0.3% • Max parallelism: • Maintain the initial 32x parallelization • Additionally: one entry point for every LCU row (17x for 1080p) • RD performance impact: 0.5–1%
Entropy Slices • Conclusion • Entropy slices are well tested and flexible • Demonstrated in multiple environments (JM, JMKTA, TMuC) • Demonstrated with CABAC and CAV2V • Friendly to serial and parallel architectures (including both decoupled and coupled parsing/reconstruction architectures) • From the last meeting: “The basic concept of desiring enhanced high-level parallelism of the entropy coding stage to be in the HEVC design is agreed.” • We propose • Adoption of the entropy slice technology into the TM • Evaluation of the “joint-wavefront” extension in a CE