andes embedded processor andescore tm n1213 s n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Andes Embedded Processor AndesCore TM N1213-S PowerPoint Presentation
Download Presentation
Andes Embedded Processor AndesCore TM N1213-S

Loading in 2 Seconds...

play fullscreen
1 / 64

Andes Embedded Processor AndesCore TM N1213-S - PowerPoint PPT Presentation


  • 134 Views
  • Uploaded on

Andes Embedded Processor AndesCore TM N1213-S. Agenda. Computer architecture AndesCore TM Pipeline Cache MMU DMA BIU Interruption AICE. Computer architecture taxonomy. von Neumann architecture. Computer architecture taxonomy (1/3). von Neumann architecture Features of each:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Andes Embedded Processor AndesCore TM N1213-S' - melissa-becker


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
agenda
Agenda
  • Computer architecture
  • AndesCoreTM
    • Pipeline
    • Cache
    • MMU
    • DMA
    • BIU
    • Interruption
    • AICE
computer architecture taxonomy
Computer architecture taxonomy
  • von Neumann architecture
computer architecture taxonomy 1 3
Computer architecture taxonomy (1/3)
  • von Neumann architecture
    • Features of each:
      •  Execution in multiple cycles
    • Serial fetch instructions & data
    • Single memory structure
      • Can get data/program mixed
      • Data/instructions same size 
    • Examples, von Neumann: PCs (Intel 80x86/Pentium, Motorola 68000, Mot 68xx uC families
computer architecture taxonomy 2 3
Computer architecture taxonomy (2/3)
  • Harvard architecture

address

CPU

data memory

PC

data

address

program memory

data

computer architecture taxonomy 3 3
Computer architecture taxonomy (3/3)
  • Harvard architecture
    • Features of each: Execution in 1 cycle                     
    • Parallel fetch instructions & data   
    • More Complex H/W                          
      • Instructions and data always separate      
      • Different code/data path widths      
    • Harvard: 8051, Microchip PIC families, Atmel AVR, AndeScore
architectures cisc vs risc 1 2
Architectures: CISC vs. RISC (1/2)
  • CISC - Complex Instruction Set Computers
    • Emphasis on hardware
    • Includes multi-clock complex instructions
    • Memory-to-memory
    • Sophisticated arithmetic (multiply, divide, trigonometry etc.).
    • Special instructions are added to optimize performance with particular compilers.
architectures cisc vs risc 2 2
Architectures: CISC vs. RISC (2/2)
  • RISC - Reduced Instruction Set Computers
    • A very small set of primitive instructions
    • Fixed instruction format
    • Emphasis on software
    • All instructions execute in one cycle (Fast!).
    • Register to register (except Load/Store instructions)
    • Pipeline architecture
single dual multi many cores
Single-, Dual-, Multi-, Many- Cores
  • Single-core:
    • Most popular today.
  • Dual-core, multi-core, many-core:
    • Forms of multiprocessors in a single chip
  • Small-scale multiprocessors (2-4 cores):
    • Utilize task-level parallelism.
    • Task example: audio decode, video decode, display control, network packet handling.
  • Large-scale multiprocessors (>32 cores):
    • nVidia’s graphics chip: >128 core
    • Sun’s server chips: 64 threads
andescore n1213 s
AndesCore™ N1213-S
  • CPU Core
    • 32bit CPU
    • Single issue with 8-stage pipeline
    • Andestar™ ISA with 16-/32-bit intermixable instructions to reduce code size
    • Dynamic branch prediction to reduce branch penalties
      • 32/64/128/256 BTB
  • Configurability for customers
    • Configuration options for power, performance and area requirements
andescore n1213 s1
AndesCore™ N1213-S
  • MMU
    • fully-associative iTLB/dTLB: 4 or 8 entries
    • 4-way set-associative main TLB: 32/64/128 entries
    • Two groups of pages size support: (4K,1M) and (8K,1M)
    • Locking support for TLB
  • I & D cache
    • Virtual index and physical tag (for faster context switching)
    • Cache size: 8KB/16KB/32KB/64KB
    • Cache line size: 16B/32B
    • 2/4-way set associative
    • I Cache locking support
andescore n1213 s2
AndesCore™ N1213-S
  • I&D Localmemory
    • wide range support for internal /external local memory
      • 4KB~1024KB
    • Provide fixed access latencies for internal local memory
    • Double buffer mode for D local memory
    • Optional external local memory interface
  • Bus
    • Synchronous/Asynchronous AHB
      • 1 or 2 port configuration
    • Synchronous HSMP
      • AXI like
      • 1 or 2 port configuration
andescore n1213 s3
AndesCore™ N1213-S
  • For performance
    • Improved memory accesses:
      • 1D/2D DMA, load/store multiple
    • Efficient synchronization without locking the whole bus
      • Load lock, store conditional instructions
    • Vectored interrupt to improve real-time performance
      • 6 interrupt signals
    • MMU
      • Optional HW page table walker
      • TLB management instructions
  • For flexibility
    • Memory-mapped IO space
    • PC-relative jumps for position independent code
    • JTAG-based debug support
    • Optional embedded program trace interface
    • Performance monitors for performance tuning
    • Bi-endian modes to support flexible data input
instruction fetch stage
Instruction Fetch Stage
  • F1 – Instruction Fetch First
    • Instruction Tag/Data Arrays
    • ITLB Address Translation
    • Branch Target Buffer Prediction
  • F2 – Instruction Fetch Second
    • Instruction Cache Hit Detection
    • Cache Way Selection
    • Instruction Alignment

IF1

IF2

ID

RF

AG

DA1

DA2

WB

EX

MAC1

MAC2

instruction issue stage
Instruction Issue Stage
  • I1 – Instruction Issue First / Instruction Decode
    • 32/16-Bit Instruction Decode
    • Return Address Stack prediction
  • I2 – Instruction Issue Second / Register File Access
    • Instruction Issue Logic
    • Register File Access

IF1

IF2

ID

RF

AG

DA1

DA2

WB

EX

MAC1

MAC2

execution stage
Execution Stage
  • E1 – Instruction Execute First / Address Generation / MAC First
    • Data Access Address Generation
    • Multiply Operation (if MAC presents)
  • E2 –Instruction Execute Second / Data Access First / MAC Second / ALU Execute
    • ALU
    • Branch/Jump/Return Resolution
    • Data Tag/Data arrays
    • DTLB address translation
    • Accumulation Operation (if MAC presents)
  • E3 –Instruction Execute Third / Data Access Second
    • Data Cache Hit Detection
    • Cache Way Selection
    • Data Alignment

IF1

IF2

ID

RF

AG

DA1

DA2

WB

EX

MAC1

MAC2

write back stage
Write Back Stage
  • E4 –Instruction Execute Fourth / Write Back
    • Interruption Resolution
    • Instruction Retire
    • Register File Write Back

IF1

IF2

ID

RF

AG

DA1

DA2

WB

EX

MAC1

MAC2

branch prediction overview
Branch Prediction Overview
  • Why is branch prediction required?
    • A deep pipeline is required for high speed
  • Why dynamic branch prediction?
    • Static branch prediction
    • Dynamic branch prediction
branch prediction unit
Branch Prediction Unit
  • Branch Target Buffer (BTB)
    • 128 entries of 2-bit saturating counters
    • 128 entries, 32-bit predicted PC and 26-bit address tag
  • Return Address Stack (RAS)
    • Four entries
  • BTB and RAS updated by committing branches/jumps
btb instruction prediction
BTB Instruction Prediction
  • BTB predictions are performed based on the previous PC instead of the actual instruction decoding information, BTB may make the following two mistakes
    • Wrongly predicts the non-branch/jump instructions as branch/jump instructions
    • Wrongly predicts the instruction boundary (32-bit -> 16-bit)
  • If these cases are detected, IFU will trigger a BTB instruction misprediction in the I1 stage and re-start the program sequence from the recovered PC. There will be a 2-cycle penalty introduced here
ras prediction
RAS Prediction
  • When return instructions present in the instruction sequence, RAS predictions are performed and the fetch sequence is changed to the predicted PC.
  • Since the RAS prediction is performed in the I1 stage. There will be a 2-cycle penalty in the case of return instructions since the sequential fetches in between will not be used.
branch miss prediction
Branch Miss-Prediction
  • In N12 processor core, the resolution of the branch/return instructions is performed by the ALU in the E2 stage and will be used by the IFU in the next (F1) stage. In this case, the misprediction penalty will be 5 cycles.
cache and cpu
Cache and CPU

address

data

cache

main

memory

CPU

cache

controller

address

data

data

multiple levels of cache
Multiple levels of cache

L2 cache

CPU

L1 cache

slide30

Cache data flow

I-Cache

I Cache refill

I Fetches

Uncached Instruction/data

CPU

Ext Memory

Uncached write/write-through

Write back

Load & Store

D-Cache

D-Cache refill

cache operation
Cache operation
  • Many main memory locations are mapped onto one cache entry.
  • May have caches for:
    • instructions;
    • data;
    • data + instructions (unified).
replacement policy
Replacement policy
  • Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
  • Two popular strategies:
    • Random.
    • Least-recently used (LRU).
write operations
Write operations
  • Write-through: immediately copy write to main memory.
  • Write-back: write to main memory only when location is removed from cache.
improving cache performance
Improving Cache Performance
  • Goal: reduce the Average Memory Access Time (AMAT)
    • AMAT = Hit Time + Miss Rate * Miss Penalty
  • Approaches
    • Reduce Hit Time
    • Reduce or Miss Penalty
    • Reduce Miss Rate
  • Notes
    • There may be conflicting goals
    • Keep track of clock cycle time, area, and power consumption
tuning cache parameters
Tuning Cache Parameters
  • Size:
    • Must be large enough to fit working set (temporal locality)
    • If too big, then hit time degrades
  • Associativity
    • Need large to avoid conflicts, but 4-8 way is as good a FA
    • If too big, then hit time degrades
  • Block
    • Need large to exploit spatial locality & reduce tag overhead
    • If too large, few blocks ⇒ higher misses & miss penalty

Configurable architecture allows designers to makethe best performance/cost trade-offs

mmu functionality
MMU Functionality
  • Memory management unit (MMU) translates addresses

logical

address

memory

management

unit

physical

address

CPU

mmu architecture

M-TLB Tag

M-TLB data

M-TLB Tag

M-TLB data

MMU Architecture

M-TLB entry index

IFU

LSU

N(=32) sets k(=4) ways =128-entry

4/8 I-uTLB

4/8 D-uTLB

6

4

Set number

0

5

Way number

Log2(N*K)-1

Log2(N)

Log2(N)-1

0

M-TLB arbiter

32x4 M-TLB

HPTWK

Bus interface unit

mmu functionality1
MMU Functionality
  • Virtual memory addressing
    • Better memory allocation, less fragmentation
    • Allows shared memory
    • Dynamic loading
  • Memory protection (read/write/execute)
    • Different permission flags for kernel/user mode
    • OS typically runs in kernel mode
    • Applications run in user mode
  • Cache control (cached/uncached)
    • Accesses to peripherals and other processors needs to be uncached.
dma overview
DMA overview
  • Two channels
  • One active channel
  • Programmed using physical addressing
  • For both instruction and data local memory
  • External address can be incremented with stride
  • Optional 2-D Element Transfer (2DET) feature which provides an easy way to transfer two-dimensional blocks from external memory.

Local Memory

DMA

Controller

Ext. Memory

lmdma double buffer mode

Local MemoryBank 0

Local MemoryBank 1

LMDMA Double Buffer Mode

CorePipeline

ExternalMemory

DMA Engine

Computation

Data Movement

Bank Switch between core and DMA engine

Width byte stride (in DMA Setup register)=1

n1213 s bus
N1213-S BUS
  • AMBA 2.0 AHB bus
    • 1 port
    • 2 port
      • ICU/MMU (read only) for port 1
      • LSU/DMA/EDM (read/write) for port 2
  • HSMP
    • High speed memory port
    • Same frequency with CPU core
    • AMBA 3.0 (AXI) protocol compliant, but with reduced I/O requirements
    • 1 and 2 port configuration
biu introduction
BIU introduction
  • Bus Interface unit is responsible for off-CPU memory access which includes
    • System memory access
    • Instruction/data local memory access
    • Memory-mapped register access in devices.
slide49

Bus Interface

  • Compliance to AHB/AHB-Lite/APB
  • High Speed Memory Port
  • Andes Memory Interface
  • External LM Interface
hsmp high speed memory port
HSMP – High speed memory port
  • N12 also provides a high speed memory port interface which has higher bus protocol efficiency and can run at a higher frequency to connect to a memory controller.
  • The high speed memory port will be AMBA3.0 (AXI) protocol compliant, but with reduced I/O requirements.
types of interruptions
Types of Interruptions
  • Reset/NMI
    • Cold Reset
    • Non-Maskable Interrupt (NMI)
  • Interrupt
    • External interrupt
    • Software interrupt
  • Debug exception
    • Instruction breakpoint
    • Data address & value break
    • BREAK exception
  • TLB fill exception (Instruction/Data)
  • PTE not present exception (Instruction/Data)
    • Non-Leaf PTE not present
    • Leaf PTE not present
  • TLB miscellaneous exception
  • System Call exception
  • General exception
  • Machine error exception (Instruction/Data)
support inter ext vector interrupt
Support Inter./ext. vector interrupt
  • Internal vector interrupt
    • where interrupts are prioritized inside an AndeScore™
    • Hw0 has highest priority
  • External Vectored Interrupt
    • where interrupts are prioritized outside AndeScore using an external interrupt controller.
  • The size of the vectored entry point can be from 4 bytes to 16/64/256 bytes.
internal and external vector interrupt
Internal and external vector interrupt
  • Internal vector interrupt
    • where interrupts are prioritized inside an AndeScore using the internal interrupt controller logic.
  • External Vectored Interrupt
    • where interrupts are prioritized outside AndeScore using an external interrupt controller.
  • Interruption vector base register
    • EVIC=1, external vector interrupt controller mode
    • EVIC=0, internal vector interrupt controller mode
    • ESZ (Entry size) =4B/16B/64B/256B
    • IVBASE : base address of the interruption vector table (64KB aligned)
entry points for internal vic mode
16 entry points

9 exception

7 interrupt

The size of the vectored entry point can be from 4 bytes to 16/64/256 bytes.

Entry Points for Internal VIC Mode

High

Low

n1213 s debug environment
N1213-S Debug environment

AICE

N1213-S

External ICE

CPU core

EDM

USB

In circuit emulator

edm embedded debug module block diagram
EDM (Embedded Debug Module) block diagram

N1213-S

BCU: Breakpoint compare unit

TAP:JTAG style interface

DIMU:Store debug program

clock ratio
Clock ratio
  • The clock bus ratio between CPU core and AMBA bus clock are 1/1,2/1,3/1,4/1,5/1,6/1,3/2,5/2,8/1,10/1,12/1,14/1,15/1,18/1,20/1.
    • Clock divider is not part of AndeScore
  • While the high speed memory bus clock is the same with CPU core clock.
eda tools
EDA tools
  • Synthesizer
    • Synopsys Design Compiler
  • Simulator
    • Cadence Incisive
  • Formal verification
    • Cadence Formality
  • STA
    • Synopsys PrimeTime
  • FPGA
    • Synplicity +Xilink
n1213 on umc 0 13hs process
N1213 on UMC 0.13HS process

* 1.08V 125C slow silicon

thank you

Thank You

www.andestech.com