ECE 526 – Network Processing Systems Design

IXP XScale and Microengines

Chapter 18 & 19: D. E. Comer


Overview

  • Recap

    • Packet processing functions (forwarding, queuing…)

    • Traditional network processing systems (CPU + NICs)

    • General network processor architecture and tradeoffs

    • Intel IXP network processors overall architecture

  • Focus on individual components of the Intel IXP chip

    • Control processor (slow path): XScale core

      • Overall architecture

      • Typical functions

      • Processor features

    • Packet processing processor (fast path): Microengines

      • Architecture and features

      • Differences from conventional processors

      • Pipelining and multi-threading

ECE 526


Purpose of Control Processor

  • Functions typically executed by the embedded control processor:

    • Bootstrapping

    • Exception handling

    • Higher-layer protocol processing

    • Interactive debugging

    • Diagnostics and logging

    • Memory allocation

    • Application programs (if needed)

    • User interface and/or interface to the GPP

    • Control of packet processors

    • Other administrative functions


XScale Memory Architecture

  • Memory architecture

    • Uses 32-bit linear address space

    • Configurable endian mode

    • Byte addressable

  • Memory Mapping

    • Allocation of address space (2^32) to different system components

    • An access to a memory address is translated into an access to the corresponding component

    • Needs to be carefully crafted

  • XScale assumes byte addressable memory

    • Underlying memory (e.g., SDRAM) uses a different access size

    • How does this work?

  • Support for Virtual Memory

    • For demand paging to secondary storage
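
The memory-mapping and byte-addressing bullets above can be sketched in a few lines of Python. This is only an illustration: the address ranges and component names are made-up placeholders (not the actual IXP2400 memory map), the 8-byte word size is an assumption, and `decode` and `read_byte` are hypothetical helper names. One plausible answer to "How does this work?" is that the memory interface fetches the whole underlying word and then selects the requested byte.

```python
# Hypothetical memory map: (start, end_exclusive, component).
# These ranges are placeholders, NOT the real IXP2400 layout.
MEMORY_MAP = [
    (0x0000_0000, 0x8000_0000, "SDRAM"),
    (0x8000_0000, 0xC000_0000, "SRAM"),
    (0xC000_0000, 0x1_0000_0000, "DEVICES"),
]

WORD_BYTES = 8  # assumed width of one underlying memory word


def decode(addr):
    """Translate a 32-bit linear address into (component, offset)."""
    for start, end, name in MEMORY_MAP:
        if start <= addr < end:
            return name, addr - start
    raise ValueError(f"unmapped address {addr:#x}")


def read_byte(words, byte_addr):
    """Byte read on top of a word-organized memory.

    Fetch the whole word that contains the byte, then shift and mask
    to select it (little-endian byte order is assumed here).
    """
    word = words[byte_addr // WORD_BYTES]
    shift = 8 * (byte_addr % WORD_BYTES)
    return (word >> shift) & 0xFF
```

For example, `decode(0x8000_0010)` returns `("SRAM", 0x10)` under this placeholder map.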


Shared Memory Address Issues

  • Memory is shared between XScale and Microengines

  • Same data, but different addresses

  • What impact does this have?

    • Pointers need to be translated

    • Data structures that contain pointers cannot be shared directly
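
A minimal sketch of the pointer problem, assuming the same SRAM region appears at two different base addresses. Both base values are invented for illustration, and `xscale_to_ue`, `ue_to_xscale`, and `walk` are hypothetical names. One common workaround, shown here, is to store region-relative offsets instead of absolute pointers, so that each processor adds its own base.

```python
# Assumed base addresses of one shared SRAM region as seen by each
# processor. Both values are made up for illustration.
XSCALE_SRAM_BASE = 0x8000_0000
UE_SRAM_BASE = 0x0000_0000


def xscale_to_ue(addr):
    """Translate an XScale-side address into the uE-side address."""
    return addr - XSCALE_SRAM_BASE + UE_SRAM_BASE


def ue_to_xscale(addr):
    """Translate a uE-side address into the XScale-side address."""
    return addr - UE_SRAM_BASE + XSCALE_SRAM_BASE


# A linked list stored with offsets rather than absolute pointers.
# Each node is (value, next_offset); next_offset == -1 ends the list.
nodes = {0: (10, 16), 16: (20, 32), 32: (30, -1)}


def walk(base, read_node):
    """Walk the list from either processor.

    `read_node` takes an absolute address in the caller's own view.
    """
    values, off = [], 0
    while off != -1:
        value, off = read_node(base + off)
        values.append(value)
    return values
```

Either side can traverse the same list, for example `walk(XSCALE_SRAM_BASE, lambda a: nodes[a - XSCALE_SRAM_BASE])`.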


Microengines

  • Microengines are the data-path packet processors of the IXP

  • The IXP2400 has 8 Microengines

  • Simpler than the XScale

  • Low-level device that acts as a micro-sequencer

  • Optimized for packet processing

  • More complex to use

  • Often abbreviated as uE


uE Functions

  • uEs handle ingress and egress packet processing:

    • Packet ingress from physical layer hardware

    • Checksum verification

    • Header processing and classification

    • Packet buffering in memory

    • Table lookup and forwarding

    • Header modification

    • Checksum computation

    • Packet egress to physical layer hardware
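
Two of the steps above, checksum verification and header modification followed by checksum computation, can be illustrated with the standard Internet checksum. This is a Python sketch of the arithmetic only, not uE microcode. The function names are made up for this example, and `decrement_ttl` assumes a TTL greater than zero.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify_ipv4_header(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero."""
    return internet_checksum(header) == 0


def decrement_ttl(header: bytes) -> bytes:
    """Header modification: TTL - 1, then recompute the checksum."""
    h = bytearray(header)
    h[8] -= 1                               # TTL is byte 8 (assumes TTL > 0)
    h[10:12] = b"\x00\x00"                  # clear checksum field (bytes 10-11)
    h[10:12] = struct.pack("!H", internet_checksum(bytes(h)))
    return bytes(h)
```

Changing the TTL without updating the checksum makes `verify_ipv4_header` fail, which is why the two steps appear together on the forwarding path.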


uE Architecture

  • uE characteristics:

    • Programmable microcontroller

    • RISC design

    • 256 general-purpose registers

    • 512 transfer registers

    • 128 next neighbor registers

    • Hardware support for 8 threads and context switching

    • 640 words of local memory

    • Control of an Arithmetic and Logic Unit

    • Direct access to various functional units

    • A unit to compute a Cyclic Redundancy Check (CRC)
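
To see why a dedicated CRC unit is worthwhile, consider what the same computation costs in software. The sketch below is a plain bit-at-a-time CRC-32 in Python. The choice of the Ethernet CRC-32 polynomial is an assumption made for illustration, and the slides do not say which polynomials the uE CRC unit supports.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 using the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):                  # eight shift/xor steps per byte
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Cross-check against the standard library implementation.
assert crc32_bitwise(b"hello") == zlib.crc32(b"hello")
```

The inner loop runs eight times per input byte, which is the kind of work that hardware support takes off the instruction stream.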


uE as Micro-sequencer

  • A micro-sequencer does not contain native instructions for every possible operation

    • Instead of using instructions, uE invokes functional units to perform operations

    • Control unit is much “simpler”

  • Example 1:

    • uE does not have ADD R2,R3 instruction

    • Instead: ALU ADD R2, R3

    • “ALU” indicates that ALU should be used

    • “ADD” is a parameter to ALU

  • Example 2:

    • Memory access not by simple LOAD R2, 0xdeadbeef

    • Instead: SRAM LOAD R2, 0xdeadbeef

  • Altogether similar to normal processor, but more basic
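
The two examples above can be modeled with a toy interpreter in which every step names a functional unit and passes the operation as a parameter. This is a Python sketch, not real uE behavior. The class and method names are hypothetical, the register count is arbitrary, and the semantics of `ALU ADD R2, R3` (R2 = R2 + R3) are an assumption.

```python
def alu(op, a, b):
    """Functional unit: the operation is a parameter, not an opcode."""
    ops = {
        "ADD": lambda x, y: (x + y) & 0xFFFFFFFF,
        "AND": lambda x, y: x & y,
    }
    return ops[op](a, b)


class MicroSequencer:
    """Toy model: the control unit only routes work to functional units."""

    def __init__(self):
        self.regs = {f"R{i}": 0 for i in range(8)}
        self.sram = {}                      # address -> word

    def step(self, unit, op, dst, src):
        if unit == "ALU":
            # Assumed semantics: dst = dst <op> src
            self.regs[dst] = alu(op, self.regs[dst], self.regs[src])
        elif unit == "SRAM" and op == "LOAD":
            self.regs[dst] = self.sram.get(src, 0)
        else:
            raise ValueError(f"unsupported: {unit} {op}")


seq = MicroSequencer()
seq.regs["R2"], seq.regs["R3"] = 5, 7
seq.step("ALU", "ADD", "R2", "R3")          # ALU ADD R2, R3
seq.sram[0xDEADBEEF] = 42
seq.step("SRAM", "LOAD", "R3", 0xDEADBEEF)  # SRAM LOAD R3, 0xdeadbeef
```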


uE Instruction Set

  • General

    • ALU, etc.

  • Branch and Jump

    • BR: branch unconditionally

  • CAM

    • CAM_CLEAR: clear all entries in the CAM

  • I/O and context swap

    • SCRATCH (read and write)

  • For details, see Figures 19.1 and 19.2 in Comer.


uE Memories

  • uEs view memories differently than the XScale does

    • Do not map memories and I/O devices into a linear address space

    • Do not view memories as a seamless, uniform repository

  • The uE ISA requires a separate instruction for each type of memory and I/O device

    • SRAM[read, $$x, address1, address2…]

  • The programmer must permanently bind each data item to a specific type of memory.
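
A small Python model of this view, in which each memory is a separate object with its own address space and the program names the memory explicitly on every access. The class name, sizes, and constants are all hypothetical; the point is only that address 0 in one memory is unrelated to address 0 in another.

```python
class MemoryUnit:
    """One physical memory with its own, independent address space."""

    def __init__(self, name, size_words):
        self.name = name
        self.words = [0] * size_words

    def read(self, addr, count=1):
        return self.words[addr:addr + count]

    def write(self, addr, values):
        self.words[addr:addr + len(values)] = values


# Sizes are placeholders, not the real IXP2400 capacities.
sram = MemoryUnit("SRAM", 1024)
dram = MemoryUnit("DRAM", 4096)
scratch = MemoryUnit("SCRATCH", 256)

# The programmer binds each data item to one memory type up front.
ROUTE_TABLE_BASE = 0    # lives in SRAM
PACKET_BUF_BASE = 0     # lives in DRAM: same number, different memory

sram.write(ROUTE_TABLE_BASE, [1, 2])
dram.write(PACKET_BUF_BASE, [9])
```

Moving the route table to a different memory would mean changing every access to it, which is the cost of the permanent binding.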


Execution Pipeline

  • What is a pipeline?

  • Why is a pipeline employed?

    • One instruction completes per cycle if the pipeline is properly designed

  • uEs use a five-stage or six-stage pipeline


Pipelining
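
As a rough illustration of why pipelining pays off, the cycle counts can be worked out directly. The sketch assumes every stage takes exactly one cycle and that stalls are supplied as a total count. Both are simplifications, and the function names are made up for this example.

```python
def cycles_unpipelined(n_instr, stages):
    """Each instruction passes through all stages before the next starts."""
    return n_instr * stages


def cycles_pipelined(n_instr, stages, stall_cycles=0):
    """The first instruction fills the pipeline; then one completes per cycle."""
    return stages + (n_instr - 1) + stall_cycles


# 100 instructions on a five-stage pipeline:
#   unpipelined: 100 * 5 = 500 cycles
#   pipelined:   5 + 99  = 104 cycles (with no stalls)
```

Every stall cycle adds directly to the pipelined total, so the one-instruction-per-cycle rate holds only when stalls are avoided or hidden.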


Pipelining Problems

  • Possible sources of pipelining problems

    • Data dependencies

    • Control dependencies

    • Resource dependencies

    • Memory accesses

  • How pipelining problems impact system performance

  • How these impacts can be removed or reduced

    • Remove the sources so that no stall occurs

    • Hide the impact of pipeline stalls


Pipeline Stalls

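  • K: ALU ADD R2, R1, R2

  • K+1: ALU ADD R3, R2, R3

  • Instruction K+1 reads R2, which instruction K writes: a data dependency that can stall the pipeline

  • Control dependencies and memory accesses have an even bigger impact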
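
The data dependency between instructions K and K+1 can be detected mechanically. The Python sketch below checks adjacent instructions for a read-after-write hazard. It assumes the first operand is the destination and the rest are sources, and `parse` and `raw_hazards` are hypothetical helper names.

```python
def parse(instr):
    """'ALU ADD R2, R1, R2' -> (dest, [sources]).

    Assumption: the first operand is the destination register.
    """
    _unit, _op, operands = instr.split(maxsplit=2)
    dest, *sources = [r.strip() for r in operands.split(",")]
    return dest, sources


def raw_hazards(program):
    """Return index pairs (k, k+1) where instruction k+1 reads what k writes."""
    hazards = []
    for k in range(len(program) - 1):
        dest, _ = parse(program[k])
        _, sources = parse(program[k + 1])
        if dest in sources:
            hazards.append((k, k + 1))
    return hazards


program = ["ALU ADD R2, R1, R2", "ALU ADD R3, R2, R3"]
hazards = raw_hazards(program)   # the pair (0, 1) is flagged
```

Placing an independent instruction between the two, where one is available, is one way to remove the source of the stall.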


Threading Illustration


Hardware Threads

  • uEs support 8 hardware thread contexts

    • Only one thread can execute at any given time

    • When a stall occurs, the uE can switch to another thread (if that thread is not stalled)

  • Very low overhead for context switch

    • “Zero-cycle context switch”

    • In practice, a switch can take around three cycles due to the pipeline flush

  • Switching rules

    • If a thread stalls, check whether the next thread is ready for processing

    • Keep trying until a ready thread is found

    • If none is available, stall the uE and wait for any thread to unblock

  • Improves overall throughput

  • Questions:

    • Why not 16 or 32 threads?

    • Why not have 48 uEs with 1 thread each?
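
The switching rules above can be sketched as a round-robin search, plus a toy simulation that shows how extra threads hide memory latency. This is a simplified Python model, not the actual uE arbiter. The compute burst and memory latency values are invented, context switches are treated as free, and `next_ready` and `utilization` are hypothetical names.

```python
def next_ready(current, ready, n_threads=8):
    """Round-robin search starting after `current`.

    Returns the id of the first ready thread, or None if all are stalled.
    """
    for step in range(1, n_threads + 1):
        candidate = (current + step) % n_threads
        if ready[candidate]:
            return candidate
    return None


def utilization(n_threads, compute=10, mem_latency=60, total_cycles=10_000):
    """Fraction of cycles spent on useful work.

    Each thread alternates a compute burst with a memory wait.
    Context switches are assumed to cost zero cycles.
    """
    wake = [0] * n_threads          # cycle at which each thread is ready again
    busy, cycle, current = 0, 0, n_threads - 1
    while cycle < total_cycles:
        ready = [w <= cycle for w in wake]
        thread = next_ready(current, ready, n_threads)
        if thread is None:
            cycle = min(wake)       # all blocked: stall until one unblocks
            continue
        busy += compute
        cycle += compute
        wake[thread] = cycle + mem_latency
        current = thread
    return busy / cycle
```

With these made-up numbers, one thread keeps the uE busy about one cycle in seven, while eight threads cover the whole memory wait. Beyond the point where the latency is fully hidden, more threads add register and state cost without adding throughput, which bears on the questions above.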


Summary

  • Control processor (slow path): XScale core

    • Overall architecture

    • Typical functions

    • Processor features

  • Packet processing processor (fast path): Microengines

    • Architecture and features

    • Differences from conventional processors

    • Pipelining and multi-threading


    Lab3 Brief

    • Intel Reference Systems

    • SDK Tutorial

    • Lab 3


    Intel Reference Systems

    • Hardware Testbed

      • IXP2400 network processors

      • QDRM-SRAM, Flash ROM and other memories

      • 1 Gbps optical Ethernet ports

      • 100 Mbps Ethernet management port

      • Serial interface

      • PCI interfaces

    • SDK (software development kit)

      • Compiler

      • Assembler, linker

      • Simulator

      • Reference code


    Lab3: Forwarding, Counting & Classification

    • Goal: explore the basic functionality of the IXP2400 software development kit and Microengines.

    • 3 parts:

      • Part I: collect a number of workload statistics from the IXP SDK simulator, following the steps in the lab instructions.

      • Part II: add a counting block to count the number of packets.

      • Part III: implement a simple packet classification mechanism.

    • Tools: all three parts require access to a machine that has the Intel SDK installed. You can also request an installation CD for your own machine; check with the TA.


    Part I: Forwarding Simulation

    • Run an implementation of IP forwarding on the IXP2400 simulator. All the code is provided to you.

    • Collect a set of workload statistics that are reported by the simulator.


    Part II: Forwarding and Counting

    • Modify the above application by adding a counter block.

    • Store how many packets are received.


    Part III: Classification and Counting

    • Classify packets based on packet header information. Four types of traffic are considered in this lab:

      • Web traffic over TCP over IPv4

      • Non-Web traffic over TCP over IPv4

      • UDP over IPv4

      • IPv6

    • Modify the code to report the number of packets of each type.
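
The classification logic can be prototyped in Python before writing it for the Microengines. The sketch below is not the lab's SDK code. It assumes each packet starts at the IP header, treats "Web" as TCP port 80 on either end (an assumption), and uses hypothetical category labels and function names.

```python
import struct
from collections import Counter


def classify(packet: bytes) -> str:
    """Classify a raw IP packet into one of the four lab categories."""
    version = packet[0] >> 4
    if version == 6:
        return "ipv6"
    if version != 4:
        return "other"
    ihl_bytes = (packet[0] & 0x0F) * 4      # IPv4 header length in bytes
    protocol = packet[9]                    # IPv4 protocol field
    if protocol == 17:                      # UDP
        return "udp_ipv4"
    if protocol == 6:                       # TCP
        src_port, dst_port = struct.unpack(
            "!HH", packet[ihl_bytes:ihl_bytes + 4])
        # Assumption: "Web" means TCP port 80 on either side.
        if 80 in (src_port, dst_port):
            return "web_tcp_ipv4"
        return "nonweb_tcp_ipv4"
    return "other"


def count_by_type(packets):
    """Per-category packet counters, as Part III asks to report."""
    return Counter(classify(p) for p in packets)
```

On the Microengines, the counters would have to live in a memory that all threads can update, which this single-threaded sketch does not model.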


    How to do Lab3

    • Windows machine with SDK installed

    • Download the lab instructions and source code from Blackboard

    • Start early.

    • Very exciting lab.

    • Due dates

      • Part I and Part II: 10/13

      • Part III: 10/20
