Entropy Slices for Parallel Entropy Coding
K. Misra, J. Zhao and A. Segall
Entropy Slices • Introduction: Entropy Slice • Introduce partitioning of slices into smaller “entropy” slices • An entropy slice • Resets the context models • Restricts the neighborhood definition used for parsing • Is processed identically to a current slice by the entropy decoder • Key difference: reconstruction still uses information from neighboring entropy slices (see the sketch below)
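A minimal conceptual sketch of this distinction follows; it is not TMuC code, and the names (Block, availableForParsing, availableForReconstruction) are illustrative assumptions. The point is that the parser treats anything outside the current entropy slice as unavailable, while reconstruction only respects regular slice boundaries.

```cpp
struct Block {
    int sliceId;         // regular slice id
    int entropySliceId;  // hypothetical entropy-slice id
};

// Neighbor availability as seen by the entropy (parsing) stage:
// context selection never crosses an entropy-slice boundary.
bool availableForParsing(const Block& cur, const Block& neighbor) {
    return cur.sliceId == neighbor.sliceId &&
           cur.entropySliceId == neighbor.entropySliceId;
}

// Neighbor availability as seen by reconstruction (intra prediction,
// deblocking, ...): only regular slice boundaries matter, so prediction
// across entropy-slice boundaries is unchanged.
bool availableForReconstruction(const Block& cur, const Block& neighbor) {
    return cur.sliceId == neighbor.sliceId;
}
```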
Entropy Slices • We now describe the major advantages of the entropy slice concept • Advantage #1 – Parallelization: • Entropy slices do not depend on information outside the entropy slice and can be decoded independently • Allows parallelization of the entire entropy decoding loop – including context adaptation and bin coding (see the sketch below) • Advantage #2 – Generalization: • Entropy slices can be used with all entropy coding engines currently under study in the TMuC and TMuC software (V2V, CABAC, PIPE, UVLC) • Moreover, we have software available for PIPE and CABAC
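The sketch below illustrates Advantage #1 under stated assumptions; it is not the TMuC implementation, and EntropySlice, SymbolBuffer, decodeEntropySlice and decodePictureEntropy are invented names. Because each entropy slice is self-contained for parsing, its bitstream segment can be handed to a worker thread and entropy-decoded concurrently, with reconstruction following afterwards over the whole picture.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct EntropySlice { const std::uint8_t* data; std::size_t size; };
struct SymbolBuffer { /* decoded syntax elements kept for reconstruction */ };

// Stub: the real routine would run the full entropy decoding loop
// (context adaptation + bin decoding) over the bytes of 'es' only.
SymbolBuffer decodeEntropySlice(const EntropySlice& es) {
    (void)es;
    return SymbolBuffer{};
}

std::vector<SymbolBuffer> decodePictureEntropy(const std::vector<EntropySlice>& slices) {
    std::vector<SymbolBuffer> out(slices.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < slices.size(); ++i)
        workers.emplace_back([&out, &slices, i] { out[i] = decodeEntropySlice(slices[i]); });
    for (auto& t : workers) t.join();
    return out;  // reconstruction then proceeds over the whole picture with full neighbor access
}
```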
Entropy Slices • Advantage #3 – No impact on single thread/core: • Parallelization capability does not come at the expense of single thread/core applications • A single thread/core process may • Decode all entropy slices prior to reconstruction, OR • Decode an entropy slice and then reconstruct, without the neighborhood reset • This is friendly to any architecture
Entropy Slices • Advantage #4 – Easy Adaptation to Decoder Design: • Bit-stream can be partitioned into a large number of entropy slices with little overhead • For example, we will show the performance of 32 entropy slices for 1080p on the next slide – this would translate to ~128 slices for 4k • Decoder can schedule N entropy decoders easily, where N is arbitrary (see the scheduling sketch below) • One example: for 32 slices, an architecture with a parallelization factor of 4 (N=4) would assign 8 slices per decoder • Another example: for 32 slices, an architecture with N=8 would assign 4 slices per decoder • Additionally, for large resolutions (4k, 8k) it is possible to scale to 100s of decoders for GPU implementations
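As an illustration of how simple this scheduling can be, the sketch below statically assigns the entropy slices of a picture to N decoders in round-robin fashion; assignSlices is a hypothetical helper, not part of any decoder API. assignSlices(32, 4) yields 8 slices per decoder and assignSlices(32, 8) yields 4, matching the examples above.

```cpp
#include <vector>

// Returns, for each of the N decoders, the list of entropy-slice indices it parses.
std::vector<std::vector<int>> assignSlices(int numSlices, int numDecoders) {
    std::vector<std::vector<int>> schedule(numDecoders);
    for (int s = 0; s < numSlices; ++s)
        schedule[s % numDecoders].push_back(s);  // decoder (s mod N) handles slice s
    return schedule;
}
```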
Entropy Slices • Advantage #5 – Coding Efficiency: • Insertion of entropy slices results in negligible impact on coding efficiency • For example, if we configure the encoder for a parallelization factor of 32, the coding efficiency loss remains negligible (<0.2%, as summarized in the Conclusions)
Entropy Slices • Advantage #6 – Specification: • Entropy slices allow simple and direct specification of parallelization at the Profile and Level stage • This is accomplished by: • Specifying the maximum number of bins in an Entropy Slice • Specifying the maximum number of Entropy Slices per picture • Allows additional specification of PIPE/V2V configurations • Maximum number of bins per bin coder in an Entropy Slice • Additional advantage: it is straightforward to determine conformance at the encoder (see the sketch below)
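A rough sketch of such an encoder-side conformance check follows; the limit names (maxBinsPerEntropySlice, maxEntropySlicesPerPicture) are illustrative placeholders, not drafted syntax.

```cpp
#include <vector>

struct EntropySliceStats { long bins; };  // bins emitted by the bin coder for one entropy slice

bool conformsToLevel(const std::vector<EntropySliceStats>& slices,
                     long maxBinsPerEntropySlice,
                     int  maxEntropySlicesPerPicture) {
    if (static_cast<int>(slices.size()) > maxEntropySlicesPerPicture)
        return false;                           // too many entropy slices in the picture
    for (const auto& es : slices)
        if (es.bins > maxBinsPerEntropySlice)
            return false;                       // one entropy slice carries too many bins
    return true;
}
```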
Entropy Slices • Syntax • Slice header • Indicate that the slice is an “entropy slice” • Send only the information necessary for entropy decoding (see the parsing sketch below)
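Below is a hedged sketch of how such a header could be parsed; the field and reader names (entropySliceFlag, firstLctbInSlice, cabacInitIdc, BitReader) are assumptions rather than drafted syntax. It conveys the idea that an entropy slice inherits reconstruction parameters from the enclosing regular slice and signals only what the entropy decoder needs.

```cpp
struct BitReader {                      // stand-in for the real bitstream reader
    bool readFlag() { return false; }   // stub
    int  readUvlc() { return 0; }       // stub
};

struct SliceHeader {
    bool entropySliceFlag = false;
    int  firstLctbInSlice = 0;   // address of the first LCU covered by this (entropy) slice
    int  cabacInitIdc     = 0;   // entropy coder initialization
    // ... remaining parameters of a regular slice header
};

SliceHeader parseSliceHeader(BitReader& br, const SliceHeader& enclosingRegularSlice) {
    SliceHeader sh = enclosingRegularSlice;    // inherit reconstruction-related parameters
    sh.entropySliceFlag = br.readFlag();
    sh.firstLctbInSlice = br.readUvlc();
    if (sh.entropySliceFlag) {
        sh.cabacInitIdc = br.readUvlc();       // only entropy-decoding fields follow
        return sh;
    }
    // ... otherwise parse the complete regular slice header
    return sh;
}
```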
Conclusions • We have presented the concept of an “entropy slice” for the HEVC system • Advantages include: • Parallel entropy decoding (both context adaptation and bin coding) • Generalization to any entropy coding system under study • No impact on serial implementations • Easy adaptation to different parallelization factors at the decoder • Negligible impact on coding efficiency (<0.2%) • Direct path for specifying parallelization at the profile/level stage • Software is available
Entropy Slices • In the last meeting, two topics were discussed • Size of entropy slice headers • Extension to potential architectures that do not decouple parsing and reconstruction • We address these in the next slides…
Entropy Slices • Header Size • Very small (as asserted previously) • Quantitatively: 2 bytes + NALU (1 byte) for 1080p • Scales with resolution due to first_lctb_in_slice
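As a back-of-the-envelope check using the numbers above: at roughly 3 bytes per entropy slice (the 2-byte header plus the 1-byte NALU), the 32-slice configuration shown earlier adds on the order of 32 × 3 = 96 bytes per 1080p picture, which is negligible relative to a typical coded picture.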
Entropy Slices • Extension to additional architectures • At the previous meeting there was interest in extending the method to architectures that do not buffer symbols between parsing and reconstruction • This anticipates “joint-wavefront” processing of both the parsing and reconstruction loops • We investigated this issue and concluded the following: • In the current TMuC design, we observe that it is not possible to do wavefront processing of the parsing stage • If we configure the TMuC to support wavefront parsing, the extension of entropy slices is straightforward
Entropy Slices • Our approach: provide additional entry points without the neighbor restriction • “Entropy slice” entry points, each with an entropy coder initialization (EC Init) • Use cabac_init_idc to initialize the entropy coder at each entry point
Entropy Slices • [Figure: entropy + reconstruction processing steps]
Entropy Slices • Syntax • Signal that the bin coding engine will be reset at the start of each LCU row • Allow signaling of cabac_init_idc for the reset (see the sketch below)
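A minimal sketch of the signaled behavior follows, using assumed names (BinDecoder, Picture, decodeLcu, decodeEntropySliceByRows) rather than TMuC code: when the per-row reset is enabled, the bin coding engine is re-initialized with cabac_init_idc at the start of every LCU row, creating one entry point per row while the parsing/reconstruction neighborhood is left untouched.

```cpp
struct BinDecoder {
    void init(int cabacInitIdc, int sliceQp) {
        (void)cabacInitIdc; (void)sliceQp;    // reset contexts + arithmetic engine here
    }
};
struct Picture { int lcuRows = 0; int lcuCols = 0; int sliceQp = 0; };

void decodeLcu(BinDecoder& binDec, Picture& pic, int row, int col) {
    (void)binDec; (void)pic; (void)row; (void)col;  // normal LCU parsing
}

void decodeEntropySliceByRows(BinDecoder& binDec, Picture& pic,
                              bool resetAtLcuRowStart, int cabacInitIdc) {
    for (int row = 0; row < pic.lcuRows; ++row) {
        if (resetAtLcuRowStart)
            binDec.init(cabacInitIdc, pic.sliceQp);  // fresh contexts at each LCU-row entry point
        for (int col = 0; col < pic.lcuCols; ++col)
            decodeLcu(binDec, pic, row, col);
    }
}
```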
Entropy Slices • Performance • 4x parallelism: • Maintain the initial 32x parallelism • Additionally: four entry points in the entropy slice (aligned with LCU rows), resulting in a 4x speedup • RD performance impact: 0.3% • Max parallelism: • Maintain the initial 32x parallelization • Additionally: one entry point for every LCU row (17x for 1080p) • RD performance impact: 0.5–1%
Entropy Slices • Conclusion • Entropy slices are well tested and flexible • Demonstrated in multiple environments (JM, JMKTA, TMuC) • Demonstrated with CABAC and CAV2V • Friendly to serial and parallel architectures (including both decoupled and coupled parsing/reconstruction architectures) • From the last meeting: “The basic concept of desiring enhanced high-level parallelism of the entropy coding stage to be in the HEVC design is agreed.” • We propose • Adoption of the entropy slice technology into the TM • Evaluation of the “joint-wavefront” extension in a CE