
National Sun Yat-sen University Embedded System Laboratory
Efficient Network Interface Architecture for Network-on-Chips

Presenter: Cheng-Ta Wu

Masoumeh Ebrahimi, Masoud Daneshtalab, N P Sreejesh, Pasi Liljeberg, Hannu Tenhunen

Department of Information Technology, University of Turku, Turku, Finland




  • Abstract

  • What’s the problem

  • Related works

  • The proposed method

  • Experimental Results



Abstract

In this paper, we present a novel network interface architecture for on-chip networks to increase memory parallelism and improve resource utilization. The proposed architecture exploits the AXI transaction-based protocol to be compatible with existing IP cores. Experimental results with synthetic test cases demonstrate that the proposed architecture outperforms the conventional architecture in terms of latency.


What’s the problem

  • According to our observation, the utilization of the reorder buffer in NIs is significantly low. Therefore, traditional buffer management is not efficient enough for NIs.


Related works

  • Supporting shared memory abstraction and flexible network configuration

  • NISAR (network interface architecture supporting adaptive routing)

    • Transaction ID renaming, which increases latency

    • Low buffer utilization, and no support for burst transactions

  • Moving the reorder buffer resources from the NI into network routers

    • With global synchronization the performance might be degraded, and the hardware overhead is too high


The proposed method

  • Master-side NI architecture

  • Slave-side NI architecture


Master-side NI architecture

  • Both NIs are partitioned into two paths:

    • Forward path: transferring the requests to the network

      • AXI-Queue, Packetizer unit, Reorder unit

    • Reverse path: receiving the responses from the network

      • Packet-Queue, Depacketizer unit, Reorder unit
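The two-path partition above can be sketched as a pair of queues feeding the corresponding units. This is a minimal structural sketch in Python; the class and method names are illustrative, not from the paper, and the real units are of course hardware blocks:

```python
from collections import deque

class MasterNI:
    """Minimal structural sketch of the master-side NI: one queue per path."""
    def __init__(self):
        # Forward path: AXI-Queue -> Packetizer (admission controlled by the Reorder unit)
        self.axi_queue = deque()
        # Reverse path: Packet-Queue -> Reorder unit -> Depacketizer
        self.packet_queue = deque()

    def accept_request(self, request):
        """Enqueue an AXI request on the forward path."""
        self.axi_queue.append(request)

    def accept_response(self, packet):
        """Enqueue a response packet from the router on the reverse path."""
        self.packet_queue.append(packet)
```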


Introduction to the AXI-Queue and Packetizer units

  • AXI-Queue:

    • Performs arbitration between the write and read transaction channels and stores requests into the write or read request buffers.

    • If admitted by the reorder unit, the request message is sent to the Packetizer unit.

  • Packetizer:

    • Converts incoming messages from the AXI-Queue into header and data flits.
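The behaviour just described can be sketched as follows. The round-robin arbitration policy and the message/flit field names are assumptions; the slide does not specify either:

```python
from collections import deque

class AXIQueue:
    """Arbitrates between the write and read transaction channels and stores
    requests into separate write/read request buffers. A simple round-robin
    policy is assumed here; the slide does not name the actual policy."""
    def __init__(self):
        self.write_buf = deque()
        self.read_buf = deque()
        self._turn = 0  # 0 = write channel next, 1 = read channel next

    def push(self, channel, req):
        (self.write_buf if channel == "write" else self.read_buf).append(req)

    def arbitrate(self):
        """Pop the next request, alternating channels when both have work."""
        order = [self.write_buf, self.read_buf]
        if self._turn == 1:
            order.reverse()
        for buf in order:
            if buf:
                self._turn ^= 1
                return buf.popleft()
        return None

def packetize(msg, flit_bytes=4):
    """Convert an admitted message into a header flit plus data flits
    (field names are illustrative, not the paper's packet format)."""
    header = {"dest": msg["dest"], "t_id": msg["t_id"], "seq": msg["seq"]}
    data = msg["data"]
    data_flits = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)]
    return [header] + data_flits
```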


Introduction to the Packet-Queue and Depacketizer units

  • Packet-Queue:

    • Receives packets from the router.

    • If the packet is out of order (according to the sequence number), it is transmitted to the reorder buffer; otherwise it is delivered to the Depacketizer unit directly.

  • Depacketizer:

    • Restores packets coming from either the reorder buffer or the Packet-Queue into the original data format of the AXI master core.
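The Packet-Queue's routing decision can be sketched as below; `expected_seq` is an illustrative stand-in for the Reorder unit's expected-sequence bookkeeping:

```python
def route_response(packet, expected_seq, reorder_buffer):
    """Sketch of the slide's rule: an in-order packet goes straight to the
    Depacketizer, an out-of-order one is parked in the reorder buffer."""
    if packet["seq"] == expected_seq:
        return "depacketizer"
    reorder_buffer.append(packet)
    return "reorder_buffer"
```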


Introduction to the Reorder unit

  • Includes a Status-Register, a Status-Table, a Reorder buffer, and a Reorder-Table

  • In the forward path:

    • Prepares the sequence number for the corresponding transaction ID and avoids overflow of the reorder buffer through the admittance mechanism.

  • In the reverse path:

    • Determines where outstanding packets from the Packet-Queue should be transmitted (reorder buffer or Depacketizer), and when packets in the reorder buffer can be released to the Depacketizer.
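The forward-path bookkeeping (preparing a per-transaction-ID sequence number) can be sketched as follows; a plain dict stands in for the hardware state that the unit actually keeps:

```python
class SequenceAllocator:
    """Hands out consecutive sequence numbers per AXI transaction ID, so the
    reverse path can later detect out-of-order responses."""
    def __init__(self):
        self._next_seq = {}  # transaction ID -> next sequence number

    def assign(self, t_id):
        seq = self._next_seq.get(t_id, 0)
        self._next_seq[t_id] = seq + 1
        return seq
```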


Introduction to the Reorder unit (cont.)

  • Status-Register and Status-Table:

    • Status-Register:

      • An n-bit register in which each bit corresponds to one of the AXI transaction IDs. A bit records whether one or more messages with the same transaction ID have been issued.

    • Status-Table:

      • Each entry of this table is considered for messages with the same transaction ID, and includes valid tag (v), Transaction ID (T-ID), Number of outstanding Transactions (N-T), and the Expecting Sequence number (E-S).

Size_nm: size of the new message

Size_AOM: size of all outstanding messages
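The Status-Table fields and the legend above suggest the following sketch. Note that the admittance condition (the new message plus all outstanding messages must fit in the reorder buffer) is an assumed reading of the overflow-avoidance mechanism, not stated explicitly on the slide:

```python
from dataclasses import dataclass

@dataclass
class StatusTableEntry:
    """One Status-Table row, with the fields listed on the slide."""
    v: bool     # valid tag
    t_id: int   # transaction ID
    n_t: int    # number of outstanding transactions
    e_s: int    # expected sequence number

def admit(size_nm, size_aom, buffer_capacity):
    """Admittance check using the slide's legend: Size_nm (size of the new
    message) and Size_AOM (size of all outstanding messages). Assumed rule:
    admit only if the worst case still fits in the reorder buffer."""
    return size_nm + size_aom <= buffer_capacity
```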


Introduction to the Reorder unit (cont.)

  • Reorder-table and reorder-buffer

    • Each row of the reorder table corresponds to an out-of-order packet stored in the reorder buffer.

    • The Reorder-Table includes the valid tag (v), the transaction ID (T-ID), the sequence number (S-N), and the head pointer (P).

    • Whenever an in-order packet is delivered to the depacketizer unit, the depacketizer controller checks the reorder table for a valid stored packet with the same transaction ID and the next sequence number. If one exists, the stored packet is released from the reorder unit to the depacketizer unit.
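The release rule just described can be sketched as below; the reorder table is modelled as a list of dicts whose keys follow the slide's abbreviations:

```python
def release_chain(t_id, delivered_seq, reorder_table):
    """After an in-order packet is delivered, repeatedly look up a valid entry
    with the same T-ID and the next sequence number, releasing each match to
    the depacketizer until the chain breaks."""
    released = []
    next_seq = delivered_seq + 1
    while True:
        match = next((e for e in reorder_table
                      if e["v"] and e["t_id"] == t_id and e["s_n"] == next_seq),
                     None)
        if match is None:
            return released
        match["v"] = False  # free the reorder-buffer slot
        released.append(match)
        next_seq += 1
```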


Slave-side NI architecture

  • To avoid losing the order of the header information carried by arriving requests, a FIFO is used


Experimental results

  • In the first configuration (A), out of 25 nodes, ten nodes are assumed to be processors (master cores with master NIs) and the other fifteen nodes are memories (slave cores with slave NIs).

  • In the second configuration (B), each node is considered to have both a processor and a memory (a master core with a master-NI, and a slave core with a slave-NI).

  • Latency is defined as the number of cycles between the initiation of a request operation issued by a master and the time when the response is completely delivered to the master from the memory.

  • The request rate is defined as the ratio of successful read/write request injections into the NI over the total number of injection attempts.
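The two metrics defined above are straightforward to compute; a minimal sketch:

```python
def latency(request_issue_cycle, response_complete_cycle):
    """Cycles from the master issuing a request until the response is
    completely delivered back to it."""
    return response_complete_cycle - request_issue_cycle

def request_rate(successful_injections, total_attempts):
    """Successful read/write injections into the NI over total attempts."""
    return successful_injections / total_attempts
```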

The baseline architecture is based on references [5][6]


Research tree

Efficient Network Interface Architecture for Network-on-chip

Protocol Transducer Synthesis using Divide and Conquer approach

Automatic Interface Synthesis based on the Classification of Interface Protocols of IPs
