A Case for Globally Shared-Medium On-Chip Interconnect Enhancing Effective Throughput for Transmission Line-Based Bus. Aaron Carpenter, Jianyun Hu, Jie Xu, Michael Huang, Hui Wu, University of Rochester. Motivation: e.g. a 5x5 mesh. Worst case: 4+4 = 8 hops
Per hop = pipeline delay + queue delay
Example: 5 + 10 = 15 clock cycles/hop
Worst case: 15 * 8 = 120 clock cycles
At a 1 GHz clock = 120 ns
Much slower than DRAM access
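The arithmetic above can be reproduced with a short sketch. The function name and parameters are illustrative only; the numbers are the ones from the slide (5x5 mesh, 5 pipeline cycles and 10 queuing cycles per hop, 1 GHz clock).

```python
def worst_case_mesh_latency(rows, cols, pipeline_cycles, queue_cycles, clock_hz):
    """Return (hops, cycles, nanoseconds) for a corner-to-corner route in a 2D mesh."""
    # Corner to corner: traverse every column gap, then every row gap.
    hops = (rows - 1) + (cols - 1)
    # Per hop = pipeline delay + queue delay.
    cycles_per_hop = pipeline_cycles + queue_cycles
    total_cycles = hops * cycles_per_hop
    # Convert cycles to nanoseconds at the given clock frequency.
    latency_ns = total_cycles * 1e9 / clock_hz
    return hops, total_cycles, latency_ns


hops, cycles, ns = worst_case_mesh_latency(5, 5, 5, 10, 1e9)
print(f"{hops} hops, {cycles} cycles, {ns:.0f} ns")
```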
Serpentine routing through every hub
1. A setup step is performed to “wake up” transmitter i.
2. In the background, the arbiter passes the grant on to node j.
3. Time is needed to drain the signal (waiting until the last bit has been transmitted).
4. The arbiter can then process the next task.
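To see why setup and drain time matter for utilization, the steps above can be turned into a rough occupancy model. This is only a sketch under stated assumptions: setup and drain are counted as non-overlapped overhead, the grant hand-off in step 2 is treated as fully hidden in the background, and all cycle counts are hypothetical values, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class BusTransfer:
    setup_cycles: int    # step 1: wake up the transmitter (assumed value)
    payload_cycles: int  # cycles spent actually sending bits (assumed value)
    drain_cycles: int    # step 3: wait for the last bit to leave the line (assumed value)

    def occupancy(self) -> int:
        # Total cycles the bus is held for this transfer.
        return self.setup_cycles + self.payload_cycles + self.drain_cycles

    def utilization(self) -> float:
        # Fraction of the occupied time that carries payload.
        return self.payload_cycles / self.occupancy()


transfer = BusTransfer(setup_cycles=1, payload_cycles=4, drain_cycles=1)
print(transfer.occupancy(), round(transfer.utilization(), 3))
```

In this model, shortening the drain time or lengthening the payload both raise utilization, which is the motivation for the optimizations that follow.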
Put the arbiter in the middle?
1. Increasing raw link throughput.
2. Increasing the utilization efficiency.
3. Optimizing the use of buses.
First, we turn to 4-level pulse amplitude modulation (4-PAM), which doubles the data rate compared to on-off keying (OOK). The additional circuitry consists of a DAC at the transmitter and an ADC at the receiver. Because these elements increase energy and latency, we use 4-PAM only for the data packet bus to minimize the latency impact.
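The doubling follows from each 4-PAM symbol carrying two bits where an OOK symbol carries one. A minimal sketch of the idea is below; the Gray-coded level mapping and the function names are assumptions made for illustration, not the encoding used by the authors.

```python
import math


def bits_per_symbol(levels: int) -> int:
    """Bits carried by one symbol with the given number of amplitude levels."""
    return int(math.log2(levels))


# Hypothetical Gray-coded mapping from bit pairs to four amplitude levels.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def pam4_encode(bits):
    """Pack two bits into each 4-PAM symbol."""
    if len(bits) % 2 != 0:
        raise ValueError("4-PAM needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def ook_encode(bits):
    """OOK sends one symbol (on or off) per bit."""
    return list(bits)


data = [1, 0, 0, 1, 1, 1, 0, 0]
print(len(ook_encode(data)), "OOK symbols vs", len(pam4_encode(data)), "4-PAM symbols")
```

At the same symbol rate, half as many symbols means the same data is sent in half the time.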
Then we use frequency-division multiplexing (FDM), which allows us to use higher frequency bands. Attenuation in these bands increases with frequency and can be high, so when FDM is used on a global bus, the higher bands become lossy. The higher-frequency channels are therefore intended for shorter-distance communication rather than for long transmission lines.
The supporting circuitry includes a mixer on both the transmitter and receiver sides and a filter at the receiver end.
It is challenging, however, to estimate the power cost of this support circuitry. We use a simplified analysis to estimate the minimum power cost of supporting frequency-division, multi-band transmission.
1. Long lines mean it takes a long time for a signal to drain from the transmission line.
2. Packets destined for near neighbors are a poor match for the global line structure.
Partitioning, wave-based arbitration, segmentation
1: A pass gate is a passive, bi-directional connection. It adds a small amount of attenuation and signal distortion, but this is acceptable.
2: Two separate uni-directional amplifiers. The cost of this approach is the power consumed by the amplifiers. With these amplifiers, however, the source transmitter power can be lower, since a signal travels at most the length of one segment.
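One way to picture the benefit of segmentation is that two transfers confined to different segments need not wait for each other. The sketch below assumes a bus split into two segments at a hypothetical node index; the node numbering, the split point, and the function names are illustrative and not taken from the talk.

```python
def segment_of(node: int, split: int) -> int:
    """Return 0 for nodes before the split point, 1 for nodes at or after it."""
    return 0 if node < split else 1


def can_run_concurrently(t1, t2, split: int) -> bool:
    """True if both (src, dst) transfers are local and sit in different segments."""
    (src1, dst1), (src2, dst2) = t1, t2
    segs1 = {segment_of(src1, split), segment_of(dst1, split)}
    segs2 = {segment_of(src2, split), segment_of(dst2, split)}
    # A transfer is local when its source and destination share a segment.
    both_local = len(segs1) == 1 and len(segs2) == 1
    return both_local and segs1 != segs2


SPLIT = 8  # hypothetical: nodes 0-7 on one segment, 8-15 on the other
print(can_run_concurrently((0, 3), (9, 12), SPLIT))   # two local transfers, different segments
print(can_run_concurrently((0, 9), (10, 12), SPLIT))  # first transfer crosses the split
```

A transfer that crosses the split needs the segments connected (through the pass gate or amplifiers) and so occupies the whole line.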
With a packet-switched network, protocols rely on explicit invalidation acknowledgements to signal completion.
The explicit acknowledgement can be avoided if the interconnect offers a capability to infer delivery.
Transmission lines can support multicast operation. It is easy to support a small number of receivers operating simultaneously, and the resulting attenuation is acceptable. Even though multicast may not reduce traffic dramatically, it cuts latency and queuing delay.
The transmission lines have a total pitch of 45 μm and a line width of 10 μm.
They are serpentine in shape and measure about 7.5 cm in total length.
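The 7.5 cm length gives a feel for end-to-end propagation (and hence drain) time. The sketch below uses the standard relation v = c / sqrt(eps_eff); the effective relative permittivity of 4.0 is an assumed illustrative value, not a figure from the talk, so the result is only an order-of-magnitude estimate.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def propagation_delay_ns(length_m: float, eps_eff: float) -> float:
    """End-to-end delay of a line, given its length and effective permittivity."""
    velocity = SPEED_OF_LIGHT_M_PER_S / math.sqrt(eps_eff)
    return length_m / velocity * 1e9


LINE_LENGTH_M = 0.075  # about 7.5 cm, from the slide
ASSUMED_EPS_EFF = 4.0  # assumption for illustration only

delay_ns = propagation_delay_ns(LINE_LENGTH_M, ASSUMED_EPS_EFF)
print(f"approx. {delay_ns:.2f} ns end to end")
```

Under this assumption the end-to-end delay is on the order of half a nanosecond, small next to the 120 ns worst case computed for the mesh.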
Traffic and Performance Analysis
The L1 miss rate of these applications ranges up to 61 misses per thousand instructions (MPKI).
a. Percentage of L2 accesses that are remote
b. Speedup due to clustering
The left bar is for 1 core per node; the right bar is for 2 cores per node. The baseline in this case is a 16-core mesh.