Looking Under the Hood


Objectives
After completing this module, you will be able to:
• Understand what is happening behind the scenes and how your decisions can impact the quality of the result
• Use good design habits to achieve the best solution

Outline
• Quantization and Overflow
• The Hardware Costs of System-Level Abstraction
• Bit Picking
• Tips for Good Designs Using System Generator

Quantization and Overflow

Quantization
• Occurs if the number of fractional bits is insufficient to represent the fractional portion of a value
• Users can choose to:
  • Truncate: discard bits to the right of the output's least significant bit
  • Round: round to the nearest representable value, or to the value farther from zero if two representable values are equally near
(Figure: the signed full-precision value 101.10111101010000 (-2.2607421875) reduced to FIX_12_9. Truncate gives 101.101111010 (-2.26171875). Round also gives -2.26171875: the value lies exactly halfway between two FIX_12_9 values, so it rounds away from zero.)

Quantization
• A signed full-precision number produces a different output depending on whether truncation or rounding is used
(Figure: full precision 001.10111101010000 (1.7392578125) reduced to FIX_12_9. Truncate gives 001.101111010 (1.73828125); Round gives 001.101111011 (1.740234375).)

Quantization
• An unsigned full-precision number likewise produces a different output depending on whether truncation or rounding is used
(Figure: full precision 101.10111101010000 (5.7392578125) reduced to UFIX_12_9. Truncate gives 101.101111010 (5.73828125); Round gives 101.101111011 (5.740234375).)
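The truncate and round rules can be checked against the three examples with a short Python model. This is a sketch, not System Generator code: the helper names `quantize` and `to_bits` are hypothetical, values are held exactly as `fractions.Fraction`, truncation is modeled as discarding two's complement bits (which rounds toward minus infinity), and rounding sends ties away from zero as the slide states.

```python
from fractions import Fraction
import math


def quantize(value: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Reduce `value` to `frac_bits` fractional bits (hypothetical helper)."""
    scaled = Fraction(value) * 2**frac_bits  # exact; output LSB now has weight 1
    if mode == "truncate":
        # Dropping two's complement bits below the LSB is a floor operation.
        raw = math.floor(scaled)
    elif mode == "round":
        # Nearest value; a tie goes to the value farther from zero.
        raw = math.floor(abs(scaled) + Fraction(1, 2))
        raw = raw if scaled >= 0 else -raw
    else:
        raise ValueError(f"unknown quantization mode: {mode}")
    return Fraction(raw, 2**frac_bits)


def to_bits(value: Fraction, n: int, b: int) -> str:
    """Show `value` as the n-bit pattern of a format with b fractional bits."""
    raw = Fraction(value) * 2**b
    if raw.denominator != 1:
        raise ValueError("value has more than b fractional bits")
    pattern = format(int(raw) & ((1 << n) - 1), f"0{n}b")  # two's complement view
    return pattern[: n - b] + "." + pattern[n - b :]


for full in ("-2.2607421875", "1.7392578125", "5.7392578125"):
    x = Fraction(full)
    t = quantize(x, 9, "truncate")
    r = quantize(x, 9, "round")
    print(full, "truncate:", to_bits(t, 12, 9), float(t),
          "round:", to_bits(r, 12, 9), float(r))
```

The printed values match the figures: -2.26171875 for both modes in the first case, and 1.73828125/1.740234375 and 5.73828125/5.740234375 for the other two. Note that the signed and unsigned truncated results share the bit pattern 101.101111010; only the interpretation differs.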

Overflow
• Occurs if a value lies outside the representable range
• Users can choose to:
  • Saturate to the largest positive (or most negative) value
  • Wrap the value, i.e., discard any significant bits above the most significant bit of the fixed-point format
  • Flag the overflow as a Simulink error during simulation
(Figure: a full-precision output of 13.6875, 01101.1011, converted to FIX_7_4. Saturate gives 011.1111 (3.9375); Wrap keeps the low seven bits, 101.1011 (-2.3125).)
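The three overflow choices can be sketched the same way. This Python model is illustrative only: the function name `fit` and the mode strings are hypothetical, and the input is assumed to be already quantized to b fractional bits, so only the bits above the MSB are in question.

```python
from fractions import Fraction


def fit(value: Fraction, n: int, b: int, signed: bool, mode: str) -> Fraction:
    """Fit an already-quantized value into an n-bit format with b fractional bits."""
    raw = Fraction(value) * 2**b
    if raw.denominator != 1:
        raise ValueError("quantize to b fractional bits first")
    raw = int(raw)
    # Representable range of the raw integer for this format.
    if signed:
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    else:
        lo, hi = 0, (1 << n) - 1
    if lo <= raw <= hi:
        return Fraction(value)  # no overflow: nothing changes
    if mode == "saturate":
        raw = hi if raw > hi else lo
    elif mode == "wrap":
        raw &= (1 << n) - 1  # keep only the n least significant bits
        if signed and raw > hi:  # reread the top bit as the sign bit
            raw -= 1 << n
    elif mode == "error":
        # Stands in for the Simulink error raised during simulation.
        raise OverflowError(f"{float(value)} does not fit in {n} bits")
    else:
        raise ValueError(f"unknown overflow mode: {mode}")
    return Fraction(raw, 1 << b)


x = Fraction("13.6875")  # 01101.1011 at full precision
print(float(fit(x, 7, 4, True, "saturate")))
print(float(fit(x, 7, 4, True, "wrap")))
```

This prints 3.9375 and then -2.3125, the two FIX_7_4 results in the figure.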

Quantization and Overflow
• Whatever option is selected, the generated HDL and the Simulink model behave identically
• Consequently, rounding and saturation are not free: the hardware must implement them, so they use FPGA resources

IP Wrappers
Every IP core has a wrapper to interface between Simulink and hardware
• Each SysGen block has an RTL VHDL wrapper to:
  • Extend IP core functionality
  • Simplify the IP core interface
  • Support fixed-point arithmetic
    • Number of bits, binary point
    • Overflow and quantization
  • Provide valid-bit control on some cores
• Note: a Simulink parameter may not be identical to the corresponding COREGen parameter
• The wrapper file has xl<core-name> in its filename, without the core keyword
(Figure labels: xlMult.vhd; SysGen VHDL IP Core Wrapper (xmult_x_0.vhd); COREGen IP Core (xmult_x_0_core.vhd); MultGen v6.0)

Implications of IP Wrappers
• Saturation arithmetic and rounding require hardware (a full adder)
• Excess latency is implemented with a shift register (SRL16) on the core output
• Some SysGen blocks perform implicit conversion of inputs:
  • Unsigned to signed
  • Sign extension
  • Zero padding
• SysGen mask parameters may not be identical to COREGen parameters
• A valid-bit pipeline runs parallel to the data path, typically implemented using SRL16s
System-level abstraction is very expressive and powerful, but it comes at some expense in hardware. Be aware!
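To picture excess latency and the parallel valid bit, here is a cycle-level Python model of a fixed-length shift register that carries (data, valid) pairs. It is a behavioral sketch only: the class name `DelayLine` is hypothetical, and this is not the HDL that System Generator generates.

```python
from collections import deque


class DelayLine:
    """Fixed-latency shift register (SRL16-style) carrying data plus a valid bit."""

    def __init__(self, latency: int):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        # Start full of invalid entries: nothing meaningful has arrived yet.
        self.stages = deque([(0, False)] * latency, maxlen=latency)

    def clock(self, data: int, valid: bool) -> tuple:
        """Advance one clock cycle; return the (data, valid) pair leaving the register."""
        out = self.stages[0]
        self.stages.append((data, valid))  # maxlen drops the oldest entry
        return out


line = DelayLine(3)
outputs = [line.clock(d, True) for d in (10, 11, 12, 13, 14)]
print(outputs)
```

The first three outputs are invalid placeholders and the fourth is the first real sample (10, True). Because data and valid shift together, downstream logic can tell real results from pipeline fill; in hardware, each delayed bit (data or valid) occupies its own shift register.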

Lab 6: Looking Under the Hood
You are given a simple adder design. In this lab you will:
• Change the saturation arithmetic and rounding options in a System Generator model to see the effect in hardware
• Use the Resource Estimator block to estimate resource usage in each case
• Use an RTL viewer to see the changes
• Use the Xilinx implementation results to determine the cost of these system-level decisions

Picking Bits: Why We Do It
• To combine two data buses into a new bus
• To force a conversion of data type, including the number of bits and the binary point position
• To reinterpret unsigned data as signed, or the converse
• To extract certain bits of data, especially when there is bit growth

The Xilinx Blocks
• Convert: available in the basic elements, data types, math, and index libraries
• Concat: available in the basic elements, data types, and index libraries
• Slice: available in the basic elements, control logic, data types, and index libraries
• Reinterpret: available in the basic elements, math, and index libraries

The Concat Block
• Performs a concatenation of two bit vectors
• Both inputs must be unsigned integers, i.e., unsigned numbers with the binary point at position zero
• The Reinterpret block provides signed-to-unsigned conversion, which extends the range of inputs the Concat block can handle
• Uses no Xilinx LogiCORE and no hardware resources
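At the bit level, concatenation is just a shift and an OR of two unsigned integers, which is why it costs only wiring. A Python sketch; `concat` is a hypothetical helper, and in this sketch the `hi` input supplies the upper bits.

```python
def concat(hi: int, hi_width: int, lo: int, lo_width: int) -> tuple:
    """Concatenate two unsigned integers; return (value, total width)."""
    for v, w in ((hi, hi_width), (lo, lo_width)):
        if not 0 <= v < (1 << w):
            raise ValueError("inputs must be unsigned and fit their widths")
    return (hi << lo_width) | lo, hi_width + lo_width


value, width = concat(0b101, 3, 0b0011, 4)
print(format(value, f"0{width}b"))
```

This prints 1010011. A signed input would first need a reinterpret step (same bits, unsigned type, binary point at zero) before it could be concatenated this way.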

The Convert Block
• The Xilinx Convert block converts each input sample to a number of a desired arithmetic type
• A number can be converted to a signed (two's complement) or unsigned value
• The user specifies the total number of bits and the binary point
• Overflow and quantization options apply to the output value
• Uses no Xilinx LogiCORE, but may use additional hardware depending on the overflow and quantization options

The Convert Block: What Is It Doing?
• The user specifies the total number of bits, the binary point position, and the arithmetic type (signed or unsigned)
• First, the block lines up the binary points of the input and output port types
• Next, it fits the value into the specified total bits and binary point: with wrap and truncate the excess bits are simply dropped, while with saturate or round the output value may change instead

The Convert Block
• Passing the following value through the Convert block yields the same value with a different number of bits and binary point
(Figure: +1.5 as FIX_10_8, 01.10000000, becomes 001.1000 as FIX_7_4, still +1.5.)

The Convert Block
• Saturating on overflow may change the fractional part in order to reach the saturated value
• Rounding may also change the value to the left of the binary point (the whole-number part)

The Convert Block
• When we convert to FIX_6_0, how do we get two different values?
(Figure: a FIX_10_8 input converted to FIX_6_0. Overflow options: Wrap, Saturate, Flag Error. Quantization options: Truncate, Round. Round adds '1' to round and gives +2; Truncate drops the fractional bits and gives +1.)
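The Convert steps (line up the binary points, quantize on the LSB side, then handle overflow on the MSB side) can be sketched in Python. The function `convert` and its mode strings are hypothetical; the sketch follows the Truncate/Round and Wrap/Saturate/Flag Error rules from the quantization and overflow slides, with ties rounded away from zero, and uses +1.5 as an example input.

```python
from fractions import Fraction
import math


def convert(value, n, b, signed, quantization="truncate", overflow="wrap"):
    """Model a Convert to an n-bit format with b fractional bits (hypothetical)."""
    # Step 1: line up the binary points by scaling to the output LSB weight.
    scaled = Fraction(value) * 2**b
    # Step 2: quantization decides what happens to bits below the new LSB.
    if quantization == "truncate":
        raw = math.floor(scaled)  # drop the bits
    elif quantization == "round":
        raw = math.floor(abs(scaled) + Fraction(1, 2))  # ties away from zero
        raw = raw if scaled >= 0 else -raw
    else:
        raise ValueError(f"unknown quantization: {quantization}")
    # Step 3: overflow decides what happens to bits above the new MSB.
    if signed:
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    else:
        lo, hi = 0, (1 << n) - 1
    if not lo <= raw <= hi:
        if overflow == "saturate":
            raw = min(max(raw, lo), hi)
        elif overflow == "wrap":
            raw &= (1 << n) - 1
            if signed and raw > hi:
                raw -= 1 << n
        elif overflow == "error":
            raise OverflowError(f"{float(value)} does not fit in {n} bits")
        else:
            raise ValueError(f"unknown overflow: {overflow}")
    return Fraction(raw, 1 << b)


x = Fraction(3, 2)  # +1.5, exactly representable in FIX_10_8
print(float(convert(x, 7, 4, True)))                           # to FIX_7_4
print(float(convert(x, 6, 0, True, quantization="round")))     # to FIX_6_0, Round
print(float(convert(x, 6, 0, True, quantization="truncate")))  # to FIX_6_0, Truncate
```

This prints 1.5, 2.0, and 1.0: the FIX_7_4 conversion keeps the value, while rounding to FIX_6_0 changes the whole-number part. Saturating 13.6875 into FIX_7_4 with this function gives 3.9375, whose fractional bits are all ones, illustrating how saturation alters the fraction too.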

The Reinterpret Block
• Forces its output to a new type without any regard for retaining the numerical value represented by the input
• Total number of bits in = total number of bits out
• Allows unsigned data to be reinterpreted as signed data, and the converse
• Also allows scaling of the data by repositioning the binary point
• Uses no Xilinx LogiCORE and no hardware resources

The Reinterpret Block
• Reinterpret the FIX_10_8 number and force the binary point to position 5
(Figure: +1.5 as FIX_10_8, 01.10000000, is reinterpreted as FIX_10_5, 01100.00000, which is +12.)
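Reinterpret changes only how the stored bits are read, never the bits themselves. A Python sketch; `reinterpret` is a hypothetical helper, and the input value is assumed to fit its n-bit source format.

```python
from fractions import Fraction


def reinterpret(value, n, b_in, b_out, signed_out):
    """Keep the n-bit pattern of `value` (b_in fractional bits) and read it
    back with b_out fractional bits and the requested signedness."""
    raw = Fraction(value) * 2**b_in
    if raw.denominator != 1:
        raise ValueError("value is not representable with b_in fractional bits")
    bits = int(raw) & ((1 << n) - 1)  # the stored bit pattern, unchanged
    if signed_out and bits >= 1 << (n - 1):
        bits -= 1 << n  # the top bit now reads as a sign bit
    return Fraction(bits, 1 << b_out)


print(float(reinterpret(Fraction(3, 2), 10, 8, 5, True)))  # FIX_10_8 -> FIX_10_5
print(reinterpret(200, 8, 0, 0, True))  # UFIX_8_0 200 -> FIX_8_0
print(reinterpret(-1, 4, 0, 0, False))  # FIX_4_0 -1 -> UFIX_4_0
```

The first line reproduces the slide (+1.5 becomes 12.0, a scaling by 2^3). The other two show the unsigned/signed swap: 200 reads as -56, and -1 reads as 15.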

The Slice Block
• The Xilinx slice block allows you to slice off a sequence of bits from your input data and create a new data value
• The output data type is unsigned with its binary point at zero

The Slice Block
• Take a 4-bit slice of the FIX_10_8 number +1.5 (bit pattern 0110000000)
• Upper Bit Location + Width: offset of top bit from MSB = 0 and width = 4 selects bits 0110, giving 6
• Two Bit Locations: offset of top bit from MSB of input = -1 and offset of bottom bit from LSB of input = 5 selects bits 1100 (the bottom bit of the slice offset by 5 bits), giving 12
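Both ways of specifying a slice reduce to picking a top and bottom bit index, then shifting and masking. A Python sketch; `slice_bits` and `top_from_msb` are hypothetical helpers, and the sketch assumes, as the -1 in the example implies, that an offset from the MSB of 0 means the MSB itself and negative offsets count downward.

```python
def slice_bits(pattern: int, n: int, top: int, bottom: int) -> int:
    """Return bits top..bottom (inclusive; bit 0 is the LSB) of an n-bit pattern
    as an unsigned integer with its binary point at zero."""
    if not 0 <= bottom <= top < n:
        raise ValueError("slice must lie inside the input")
    width = top - bottom + 1
    return (pattern >> bottom) & ((1 << width) - 1)


def top_from_msb(n: int, offset: int) -> int:
    """Convert an 'offset of top bit from MSB' (0 or negative) to a bit index."""
    return (n - 1) + offset


x = 0b0110000000  # +1.5 as FIX_10_8
# Upper Bit Location + Width: top offset 0 from MSB, width 4.
top = top_from_msb(10, 0)
print(slice_bits(x, 10, top, top - 4 + 1))
# Two Bit Locations: top offset -1 from MSB, bottom offset 5 from LSB.
print(slice_bits(x, 10, top_from_msb(10, -1), 5))
```

This prints 6 and then 12, the two results in the figure. Only the bit positions matter; the input's binary point plays no part in a slice.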

What Values Do You Expect?
(Exercise figure labels: Signed Data; Truncate and Wrap; Signed Data; Output Binary Point of 3; Total Number of Bits 3; Bottom Slice offset by 5 from the LSB)

SysGen Design Tips
• Remember that saturation arithmetic and rounding have area and performance costs; use them only as necessary
• Register inputs and outputs, and register (rather than ignore) invalid data when possible
