11. Multicore Processors

Dezső Sima

Fall 2006

© D. Sima, 2006


Overview

1 Overview of MCPs

2 Attaching L2 caches

3 Attaching L3 caches

4 Connecting memory and I/O

5 Case examples


1. Overview of MCPs (1)

Figure 1.1: Processor power density trends

Source: D. Yen: Chip Multithreading Processors Enable Reliable High Throughput Computing

http://www.irps.org/05-43rd/IRPS_Keynote_Yen.pdf


1. Overview of MCPs (2)

Figure 1.2: Single-stream performance vs. cost

Source: Marr D.T. et al.: „Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Vol. 06, Issue 01, Feb. 14, 2002, pp. 4-16.


1. Overview of MCPs (3)

Figure 1.3: Dual/multi-core processors (1)


1. Overview of MCPs (4)

Figure 1.4: Dual/multi-core processors (2)


1. Overview of MCPs (5)

Macro architecture of dual/multi-core processors (MCPs)

Layout of the cores

Layout of the I/O and memory architecture

Attaching of L2 caches

Attaching of L3 caches (if available)


2. Attaching L2 caches

2.1 Main aspects of attaching L2 caches to MCPs (1)

Attaching L2 caches to MCPs

Allocation to the cores

Use by instructions/data

Integration of L2 caches to the proc. chip

Inclusion policy

Banking policy


Allocation of L2 caches to the cores

Shared L2 cache for all cores:

UltraSPARC T1 (2005)

Yonah (2006)

Core Duo (2006)

POWER4 (2001)

POWER5 (2005)

Private L2 cache for each core:

UltraSPARC IV (2004)

Smithfield (2005)

Athlon 64 X2 (2005)

Montecito (2006?)

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (2)

Attaching L2 caches to MCPs

Allocation to the cores

Use by instructions/data

Integration of L2 caches to the proc. chip

Inclusion policy

Banking policy


Inclusion policy of L2 caches

Exclusive L2 / Inclusive L2

[Diagrams: L1, L2 and memory under the exclusive and the inclusive policy]

With an exclusive L2:

Lines replaced (victimized) in the L1 are written into the L2.

References to data in the L2 initiate reloading of that cache line into the L1.

The L2 usually operates as a write-back cache (only modified data that is replaced in the L2 is written back to the memory); unmodified data that is replaced in the L2 is deleted.


Figure 1.1: Implementation of exclusive L2 caches

Source: Zheng, Y., Davis, B.T., Jordan, M.: “Performance evaluation of exclusive cache hierarchies”, 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004, pp. 89-96.
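The contrast between the two policies can be made concrete with a small sketch. The C fragment below is a minimal illustration written for these notes, not taken from the slides; the single-entry "caches" and the structure names are assumptions chosen only to keep it short. Under the exclusive policy an L1 victim moves into the L2 and a later L2 hit moves it back, deleting it from the L2; under the inclusive policy the L2 already holds a copy, so an L1 eviction only has to update that copy if the line is dirty.

#include <stdbool.h>
#include <stdio.h>

/* Toy single-entry "caches", just enough to contrast the two policies. */
typedef struct { long tag; bool valid, dirty; } Line;

static Line l1, l2;   /* one line per level, for illustration only */

/* Exclusive policy: a victim leaving the L1 is written into the L2. */
static void l1_evict_exclusive(void) {
    if (!l1.valid) return;
    l2 = l1;              /* victimized L1 line moves into the L2 */
    l1.valid = false;
}

/* Exclusive policy: an L2 hit reloads the line into the L1 and
   deletes it from the L2, so a line lives in only one level.    */
static void l2_hit_exclusive(void) {
    l1 = l2;
    l2.valid = false;
}

/* Inclusive policy: the L2 already holds a copy of every L1 line,
   so an L1 eviction only updates that copy if the line is dirty.  */
static void l1_evict_inclusive(void) {
    if (l1.valid && l1.dirty) l2.dirty = true;   /* write back into the L2 */
    l1.valid = false;
}

int main(void) {
    /* Exclusive: the victim migrates down, a later hit migrates it back. */
    l1 = (Line){ .tag = 0x40, .valid = true, .dirty = true };
    l1_evict_exclusive();
    printf("exclusive: after L1 eviction, L2 holds tag 0x%lx\n", l2.tag);
    l2_hit_exclusive();
    printf("exclusive: after L2 hit, L2 valid = %d\n", l2.valid);

    /* Inclusive: the L2 copy stays put, only its dirty bit may change. */
    l1 = (Line){ .tag = 0x40, .valid = true, .dirty = true };
    l2 = (Line){ .tag = 0x40, .valid = true, .dirty = false };
    l1_evict_inclusive();
    printf("inclusive: after L1 eviction, L2 copy dirty = %d\n", l2.dirty);
    return 0;
}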


Inclusion policy of L2 caches

Exclusive L2:

Athlon 64 X2 (2005)

Inclusive L2:

Most implementations

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (3)

Attaching L2 caches to MCPs

Allocation to the cores

Use by instructions/data

Integration of L2 caches to the proc. chip

Inclusion policy

Banking policy


Use by instructions/data

Unified instr./data cache(s):

UltraSPARC IV (2004)

UltraSPARC T1 (2005)

POWER4 (2001)

POWER5 (2005)

Smithfield (2005)

Yonah (2006)

Core Duo (2006)

Athlon 64 X2 (2005)

Split instr./data caches:

Montecito (2006?)

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (4)

Attaching L2 caches to MCPs

Allocation to the cores

Use by instructions/data

Integration of L2 caches to the proc. chip

Inclusion policy

Banking policy


Banking policy

Single-banked implementation

Multi-banked implementation


2.1 Main aspects of attaching L2 caches to MCPs (5)

Attaching L2 caches to MCPs

Allocation to the cores

Use by instructions/data

Integration of L2 caches to the proc. chip

Inclusion policy

Banking policy


Integration to the processor chip

On-chip L2 tags/contr., off-chip data:

UltraSPARC IV (2004)

Entire L2 on chip:

UltraSPARC V (2005)

POWER4 (2001)

POWER5 (2005)

Smithfield (2005)

Presler (2005)

Athlon 64 X2 (2005)

Expected trend


2.2 Examples of attaching L2 caches to MCPs (1)

Private L2 caches for each core

Unified instruction/data caches:

Entire L2 on-chip: Smithfield (2005), Presler (2005), Athlon 64 X2 (2005) (exclusive L2)

On-chip L2 tags/contr., off-chip data: UltraSPARC IV (2004)

Split instruction/data caches:

Entire L2 on-chip: Montecito (2006?)

[Block diagrams of the example processors: cores with their private L2 caches (on-chip L2, split L2 I / L2 D plus L3, or on-chip L2 tags/contr. with off-chip L2 data), attached to the memory controller and system interface via the Fire Plane bus, FSB, or System Request Queue / Xbar and HT-bus]

2.2 Examples of attaching L2 caches to MCPs (2)

Shared L2 caches for all cores

Dual core / single-banked L2: Yonah Duo (2006), Core (2006)

Dual core / multi-banked L2: POWER4 (2001), POWER5 (2005)

Multi core / multi-banked L2: UltraSPARC T1 (Niagara) (2005), 8 cores / 4 L2 banks

[Block diagrams: cores connected through a crossbar to the shared L2 (one or more banks), with the system interface (FSB or Fabric Bus Contr. / GX bus), the memory controller(s) and, for POWER4/POWER5, the on-chip L3 tags/contr.; plus an address-mapping sketch of the modulo-3 bank selection]

Mapping of addresses to the banks:

POWER4/POWER5: the 128-byte long L2 cache lines are hashed across the 3 L2 modules; hashing is performed by modulo-3 arithmetic applied to a large number of real address bits.

UltraSPARC T1: the four L2 banks are interleaved at 64-byte blocks.
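The two bank-mapping schemes described above can be sketched in a few lines of C. The exact real-address bits used by the actual designs are not given on the slide, so the bit selections below are simplifying assumptions made only for illustration: modulo-3 hashing spreads 128-byte lines over three banks, while the interleaved scheme picks one of four banks from the two address bits just above the 64-byte block offset.

#include <stdint.h>
#include <stdio.h>

/* POWER4/POWER5 style: 128-byte L2 lines hashed over 3 banks by
   modulo-3 arithmetic on the line address (bit selection assumed). */
static unsigned bank_modulo3(uint64_t paddr) {
    return (unsigned)((paddr >> 7) % 3);     /* drop the 128-byte line offset */
}

/* UltraSPARC T1 style: four banks interleaved at 64-byte blocks,
   i.e. the two bits above the block offset select the bank.       */
static unsigned bank_interleaved4(uint64_t paddr) {
    return (unsigned)((paddr >> 6) & 0x3);   /* drop the 64-byte block offset */
}

int main(void) {
    uint64_t addrs[] = { 0, 64, 128, 192, 256 };   /* example addresses */
    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        printf("addr %4llu -> modulo-3 bank %u, interleaved bank %u\n",
               (unsigned long long)addrs[i],
               bank_modulo3(addrs[i]), bank_interleaved4(addrs[i]));
    return 0;
}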


3. Attaching L3 caches

Macro architecture of dual/multi-core processors (MCPs)

Layout of the cores

Layout of the I/O and memory architecture

Attaching of L2 caches

Attaching of L3 caches (if available)


3.1 Main aspects of attaching L3 caches to MCPs (1)

Attaching L3 caches to MCPs

Allocation to the L2 cache(s)

Use by instructions/data

Integration of L3 caches to the proc. chip

Inclusion policy

Banking policy


Allocation of L3 caches to the L2 caches

Shared L3 cache for all L2s:

POWER4 (2001)

POWER5 (2005)

UltraSPARC IV+ (2005)

Private L3 cache for each L2:

Montecito (2006?)


3.1 Main aspects of attaching L3 caches to MCPs (2)

Attaching L3 caches to MCPs

Allocation to the L2 cache(s)

Use by instructions/data

Integration of L3 caches to the proc. chip

Inclusion policy

Banking policy


Inclusion policy of L3 caches

Exclusive L3 / Inclusive L3

[Diagrams: L2, L3 and memory under the exclusive and the inclusive policy]

With an exclusive L3:

Lines replaced (victimized) in the L2 are written into the L3.

References to data in the L3 initiate reloading of that cache line into the L2.

The L3 usually operates as a write-back cache (only modified data that is replaced in the L3 is written back to the memory); unmodified data that is replaced in the L3 is deleted.


Inclusion policy of L3 caches

Exclusive L3:

POWER5 (2005)

UltraSPARC IV+ (2005)

Inclusive L3:

POWER4 (2001)

Montecito (2006?)

Expected trend


3.1 Main aspects of attaching L3 caches to MCPs (3)

Attaching L3 caches to MCPs

Allocation to the L2 cache(s)

Use by instructions/data

Integration of L3 caches to the proc. chip

Inclusion policy

Banking policy


Use by instructions/data

Unified instr./data cache(s)

Split instr./data caches

All multicore processors unveiled until now use unified L3 caches, holding both instructions and data.


3.1 Main aspects of attaching L3 caches to MCPs (4)

Attaching L3 caches to MCPs

Allocation to the L2 cache(s)

Use by instructions/data

Integration of L3 caches to the proc. chip

Inclusion policy

Banking policy


Banking policy

Single-banked implementation

Multi-banked implementation


3.1 Main aspects of attaching L3 caches to MCPs (5)

Attaching L3 caches to MCPs

Allocation to the L2 cache(s)

Use by instructions/data

Integration of L3 caches to the proc. chip

Inclusion policy

Banking policy


Integration to the processor chip

On-chip L3 tags/contr., off-chip data:

UltraSPARC IV+ (2005)

POWER4 (2001)

POWER5 (2005)

Entire L3 on chip:

Montecito (2006?)

Expected trend


3.2 Examples of attaching L3 caches to MCPs (1)

Inclusive L3 cache

Private L3 caches for each L2 cache bank, entire L3 on-chip: Montecito (2006?)

Shared L3 cache for all cache banks, on-chip L3 tags/contr. with off-chip data: POWER4 (2001)

[Block diagrams: Montecito with split L2 I / L2 D caches per core, per-core on-chip L3, an arbiter and the FSB system interface; POWER4 with its banked shared L2, on-chip L3 tags/contr., the Fabric Bus Controller, and off-chip L3 data with memory behind it]


3.2 Examples of attaching L3 caches to MCPs (2)

Exclusive L3 cache

Shared L3 cache for all cache banks, on-chip L3 tags/contr. with off-chip L3 data:

POWER5 (2005)

UltraSPARC IV+ (2005)

[Block diagrams: POWER5 with its banked shared L2, Fabric Bus Controller, on-chip L3 tags/contr., off-chip L3 data and memory controller; UltraSPARC IV+ with two cores, shared on-chip L2, on-chip L3 tags/contr., off-chip L3 data, an on-chip memory controller and the Fire Plane bus system interface]


4. Connecting memory and I/O

Macro architecture of dual/multi-core processors (MCPs)

Layout of the cores

Layout of the I/O and memory architecture

Attaching of L2 caches

Attaching of L3 caches (if available)


4.1 Overview

Layout of the I/O and memory architecture in dual/multi-core processors

Integration of the memory controller to the processor chip

Connection policy of I/O and memory


4.2 Connection policy (1)

Connection policy of I/O and memory

Dedicated connection of I/O and memory

Symmetric connection of I/O and memory:

POWER5 (2005)

UltraSPARC IV (2004)

UltraSPARC IV+ (2005)

Athlon 64 X2 (2005)

Asymmetric connection of I/O and memory:

POWER4 (2001)

UltraSPARC T1 (2005)

Connecting both I/O and memory via the system bus:

PA-8800 (2004)

PA-8900 (2005)

Smithfield (2005)

Presler (2005)

Yonah Duo (2006)

Core (2006)

Montecito (2006?)


4.2 Connection policy (2)

Connecting both I/O and memory via the system bus

Examples:

Smithfield / Presler (2005/2005)

Yonah Duo / Core (2006/2006)

Montecito (2006)

PA-8800 (2004)

PA-8900 (2005)

[Block diagrams: two cores with their L2 caches (private L2s, a shared L2, or split L2 I / L2 D plus L3, or an L2 with its controller) attached to the system bus interface; both memory and I/O are reached over the FSB]


4.2 Connection policy (3)

Connection policy of I/O and memory

Dedicated connection of I/O and memory

Symmetric connection of I/O and memory (connecting both I/O and memory via the internal interconnection network):

POWER5 (2005)

UltraSPARC IV (2004)

UltraSPARC IV+ (2005)

Athlon 64 X2 (2005)

Asymmetric connection of I/O and memory (connecting I/O via the internal interconnection network, and memory via the L2/L3 cache):

POWER4 (2001)

UltraSPARC T1 (2005)

Connecting both I/O and memory via the system bus:

PA-8800 (2004)

PA-8900 (2005)

Smithfield (2005)

Presler (2005)

Yonah Duo (2006)

Core (2006)

Montecito (2006?)


4.2 Connection policy (4)

Asymmetric connection of I/O and memory

Examples: POWER4 (2001), UltraSPARC T1 (2005)

[Block diagrams: POWER4 with its banked shared L2, memory reached through the L3 dir./contr. and L3 data, I/O attached via the GX controller and GX bus, and the Fabric Bus Controller providing the chip-to-chip / mem.-to-mem. interconnect; UltraSPARC T1 with Core 0 ... Core 7 connected through a crossbar to the L2 banks, each bank with its own memory controller, and I/O attached via the bus interface (JBus)]


4.2 Connection policy (5)

Connection policy of I/O and memory

Dedicated connection of I/O and memory

Symmetric connection of I/O and memory (connecting both I/O and memory via the internal interconnection network):

POWER5 (2005)

UltraSPARC IV (2004)

UltraSPARC IV+ (2005)

Athlon 64 X2 (2005)

Asymmetric connection of I/O and memory (connecting I/O via the internal interconnection network, and memory via the L2/L3 cache):

POWER4 (2001)

UltraSPARC T1 (2005)

Connecting both I/O and memory via the system bus:

PA-8800 (2004)

PA-8900 (2005)

Smithfield (2005)

Presler (2005)

Yonah Duo (2006)

Core (2006)

Montecito (2006?)


4.2 Connection policy (6)

Symmetric connection of I/O and memory (1)

Examples: POWER5 (2005), UltraSPARC IV (2004)

[Block diagrams: POWER5 with its banked shared L2 and L3 behind the Fabric Bus Controller, which attaches both the memory controller and the GX controller / GX bus (chip-chip / mem.-mem. interconnect); UltraSPARC IV with two cores, on-chip L2 tags/contr. and off-chip L2 data, whose interconnect attaches both the memory controller and the Fire Plane bus system interface]


4.2 Connection policy (7)

Symmetric connection of I/O and memory (2)

Examples: Athlon 64 X2 (2005), UltraSPARC IV+ (2005)

[Block diagrams: Athlon 64 X2 with two cores and private L2s attached to the System Request Queue and crossbar (Xbar), which connects both the memory controller and the HT-bus; UltraSPARC IV+ with two cores, shared on-chip L2, on-chip L3 tags/contr. and off-chip L3 data, whose interconnection network attaches both the memory controller and the Fire Plane bus system interface]


4.3 Integration of the memory controller to the processor chip

Integration of the memory controller to the processor chip

On-chip memory controller:

POWER5 (2005)

UltraSPARC IV (2004)

UltraSPARC IV+ (2005)

UltraSPARC T1 (2005)

Athlon 64 X2 (2005)

Off-chip memory controller:

POWER4 (2001)

PA-8800 (2004)

PA-8900 (2005)

Smithfield (2005)

Presler (2005)

Yonah Duo (2006)

Core (2006)

Montecito (2006?)

Expected trend


5. Case examples

5.1 Intel MCPs (1)

Figure 5.1: The move to Intel multi-core

Source: A. Loktu: Itanium 2 for Enterprise Computing http://h40132.www4.hp.com/upload/se/sv/Itanium2forenterprisecomputing.pps


5.1 Intel MCPs (2)

Figure 5.2: Processor specifications of Intel’s Pentium D family (90 nm)

Source: http://www.intel.com/products/processor/index.htm


5.1 Intel MCPs (3)

ED: Execute Disable Bit

Malicious buffer overflow attacks pose a significant security threat. In a typical attack, a malicious worm creates a flood of code that overwhelms the processor, allowing the worm to propagate itself to the network and to other computers.

The Execute Disable Bit can help prevent certain classes of malicious buffer overflow attacks when combined with a supporting operating system. It allows the processor to classify areas in memory by where application code can execute and where it cannot. When a malicious worm attempts to insert code in the buffer, the processor disables code execution, preventing damage and worm propagation.
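The same mechanism is visible from user space. The sketch below is a hedged illustration only (it assumes Linux/POSIX and an x86 machine; none of it comes from the slides): it maps a data buffer without execute permission, so on an NX-capable processor with a supporting operating system any attempt to run code placed into that buffer faults instead of executing.

/* Mark a data buffer non-executable; the Execute Disable / NX bit is what
   lets the OS enforce the PROT_READ | PROT_WRITE (no PROT_EXEC) request. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned char ret_insn = 0xC3;   /* x86 "ret" opcode              */
    memcpy(buf, &ret_insn, 1);       /* "injected" code lands in data */

    /* ((void (*)(void))buf)();      <- would be stopped by NX protection */
    printf("buffer at %p is mapped non-executable\n", buf);

    munmap(buf, 4096);
    return 0;
}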

VT: Virtualization Technology

It is a set of hardware enhancements to Intel’s server and client platforms that can improve the performance and robustness of traditional software-based virtualization solutions. Virtualization solutions allow a platform to run multiple operating systems and applications in independent partitions. Using virtualization capabilities, one computer system can function as multiple "virtual" systems.
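Whether a given processor reports this capability can be probed with CPUID. The sketch below assumes GCC or Clang on an x86 system (it is not part of the slides) and checks the VMX feature flag, CPUID leaf 1, ECX bit 5, which indicates hardware support for Intel Virtualization Technology.

/* Probe CPUID leaf 1 for the VMX bit (ECX bit 5). */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not supported");
        return 1;
    }
    puts(ecx & (1u << 5) ? "VT-x (VMX) reported by this processor"
                         : "VT-x (VMX) not reported");
    return 0;
}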

EIST: Enhanced Intel SpeedStep Technology

First delivered in Intel’s mobile and server platforms, it allows the system to dynamically adjust processor voltage and core frequency, which can result in decreased average power consumption and decreased average heat production.
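On a running system the effect of this scaling can be observed through the operating system. The sketch below is an illustration only, assuming the Linux cpufreq sysfs interface (not mentioned on the slides): it reads the current core frequency and the active scaling governor for cpu0.

/* Read the current frequency and scaling governor of cpu0 via sysfs. */
#include <stdio.h>

static void print_file(const char *path, const char *label) {
    char buf[128];
    FILE *f = fopen(path, "r");
    if (f && fgets(buf, sizeof buf, f))
        printf("%s: %s", label, buf);   /* sysfs values already end in '\n' */
    if (f) fclose(f);
}

int main(void) {
    print_file("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
               "current frequency (kHz)");
    print_file("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
               "governor");
    return 0;
}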


5.1 Intel MCPs (4)

Figure 5.3: Processor specifications of Intel’s Pentium D family (65 nm)

Source: http://www.intel.com/products/processor/index.htm


5.1 Intel MCPs (5)

Figure 5.4: Specifications of Intel’s Pentium Processor Extreme Edition models 840/955/965

Source: http://www.intel.com/products/processor/index.htm


5.1 Intel MCPs (6)

Figure 5.5: Processor specifications of Intel’s Yonah Duo (Core Duo) family

Source: http://www.intel.com/products/processor/index.htm


5.1 Intel MCPs (7)

Figure 5.6: Specifications of Intel’s Core processors

Source: http://www.intel.com/products/processor_number/chart/core2duo.htm


5.1 Intel MCPs (8)

Figure 5.7: Future 65 nm processors (overview)

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered


5.1 Intel MCPs (9)

Figure 5.8: Future 45 nm processors (overview)

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered


5.2 Athlon 64 X2

Figure 5.9: AMD Athlon 64 X2 dual-core processor architecture

Source: AMD Athlon 64 X2 Dual-Core Processor for Desktop – Key Architecture Features,

http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_13041.00.html


5.3 Sun’s UltraSPARC IV/IV+ (1)

ARB: Arbiter

Figure 5.10: UltraSPARC IV (Jaguar)

Source: C. Boussard: Architecture des processeurs

http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf


5.3 Sun’s UltraSPARC IV/IV+ (2)

Figure 5.11: UltraSPARC IV+ (Panther)

Source: C. Boussard: Architecture des processeurs

http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf


5.4 POWER4/POWER5 (1)

Core Interface Unit (crossbar)

Service Processor

Power-On Reset

Built-In Self-Test

Non-Cacheable Unit

Multi-Chip Module

Figure 5.12: POWER4 chip logical view

Source: J.M. Tendler, S. Dodson, S. Fields, H. Le, B. Sinharoy: Power4 System Microarchitecture, IBM Server, Technical White Paper, October 2001

http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf


5.4 POWER4/POWER5 (2)

Figure 5.13: POWER4 chip

Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor, 2003

http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf


5.4 POWER4/POWER5 (3)

Fabric

Controller

Figure 5.14: POWER4 and POWER5 system structures

Source: R. Kalla, B. Sinharoy, J.M. Tendler: IBM POWER5 Chip: A Dual-Core Multithreaded Processor, IEEE Micro, Vol. 24, No. 2, March-April 2004, pp. 40-47.


5.5 Cell (1)

SPE: Synergistic Processing Element

EIB: Element Interconnect Bus

MFC: Memory Flow Controller

PPE: Power Processing Element

AUC: Atomic Update Cache

Figure 5.15: Cell (BE) microarchitecture

Source: IBM: „Cell Broadband Engine™ processor-based systems”, IBM Corp., 2006


5.5 Cell (2)

Figure 5.16: Cell SPE architecture

Source: Blachford N.: „Cell Architecture Explained Version 2”, http://www.blachford.info/computer/Cell/Cell1_v2.html


5.5 Cell (3)

Figure 5.17: Cell floorplan

Source: Blachford N.: „Cell Architecture Explained Version 2”, http://www.blachford.info/computer/Cell/Cell1_v2.html

