CS 230: Computer Organization and Assembly Language

CS 230: Computer Organization and Assembly Language. Aviral Shrivastava. Department of Computer Science and Engineering School of Computing and Informatics Arizona State University. Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB. Announcements.

Presentation Transcript


  1. CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics Arizona State University Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB

  2. Announcements
     • This Lecture: Caches
     • Next Lecture: More Caches, Virtual Memory
     • Finals
       • Tuesday, Dec 08, 2009
       • Please come on time (You’ll need all the time)
       • Open book, notes, and internet
       • No communication with any other human

  3. Time, Time, Time
     • Making a Single Cycle Implementation is very easy
     • Difficulty and excitement is in making it fast
     • Two fundamental methods to make Computers fast
       • Pipelining
       • Caches
     [Figure: single-cycle datapath with PC, Instruction Memory, Register File, ALU, and Data Memory]

  4. Kinds of Memory (faster at the top, larger at the bottom)
     Technology                   Capacity      Access time   Cost
     Flipflops (CPU Registers)    100s Bytes    <10s ns
     SRAM                         K Bytes       10-20 ns      $.00003/bit
     DRAM                         M Bytes       50-100 ns     $.00001/bit
     Disk                         G Bytes       ms            10^-6 cents
     Tape                         infinite      sec-min

  5. Memory Hierarchy: Insights
     • Temporal Locality (Locality in Time): => Keep the most recently accessed data items closer to the processor
     • Spatial Locality (Locality in Space): => Move blocks consisting of contiguous words to the upper levels
     [Figure: the processor exchanges data with the Upper Level Memory (Blk X), which exchanges blocks with the Lower Level Memory (Blk Y)]

  6. Memory Hierarchy: Terminology
     [Figure: the processor exchanges data with the Upper Level Memory (Blk X), which exchanges blocks with the Lower Level Memory (Blk Y)]
     • Hit: data appears in some block in the upper level (Block X)
       • Hit Rate: fraction of memory accesses found in the upper level
       • Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
     • Miss: data needs to be retrieved from a block in the lower level (Block Y)
       • Miss Rate = 1 - (Hit Rate)
       • Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
     • Hit Time << Miss Penalty

  7. Memory Hierarchy: Show me numbers
     • Consider an application in which 30% of instructions are loads/stores
     • Suppose memory latency = 100 cycles
       • Time to execute 100 instructions = 70*1 + 30*100 = 3070 cycles
     • Add a cache with a latency of 2 cycles
     • Suppose hit rate is 90%
       • Time to execute 100 instructions = 70*1 + 27*2 + 3*100 = 70+54+300 = 424 cycles
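The arithmetic on slide 7 can be checked with a short Python sketch. The function names are illustrative and not from the slides, and the cost model is the slide's simplified one: each non-memory instruction takes 1 cycle, a hit costs the cache latency, and a miss costs only the memory latency.

```python
def cycles_without_cache(n_other, n_mem, mem_latency):
    """Cycles when every load/store goes to memory (hypothetical helper)."""
    return n_other * 1 + n_mem * mem_latency


def cycles_with_cache(n_other, n_hits, n_misses, cache_latency, mem_latency):
    """Slide's model: hits pay the cache latency, misses pay the memory latency."""
    return n_other * 1 + n_hits * cache_latency + n_misses * mem_latency


# 100 instructions: 70 non-memory, 30 loads/stores.
# A 90% hit rate on 30 accesses gives 27 hits and 3 misses.
no_cache = cycles_without_cache(70, 30, 100)
with_cache = cycles_with_cache(70, 27, 3, 2, 100)

print(no_cache, with_cache)             # 3070 424
print(round(no_cache / with_cache, 2))  # 7.24 (speedup from adding the cache)
```

Under these assumptions the cache makes the 100-instruction run roughly seven times faster, even though only 30% of the instructions touch memory.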

  8. Direct-Mapped Cache (1/2)
     • In a direct-mapped cache, each memory address is associated with one possible block within the cache
       • Therefore, we only need to look in a single location in the cache for the data if it exists in the cache
       • Block is the unit of transfer between cache and memory

  9. Direct-Mapped Cache (2/2)
     [Figure: a 4 Byte Direct Mapped Cache with Cache Index 0-3 next to a Memory with Memory Addresses 0-F]
     • Cache Location 0 can be occupied by data from:
       • Memory location 0, 4, 8, ...
       • With 4 blocks => any memory location that is a multiple of 4
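The mapping on slide 9 amounts to taking the memory address modulo the number of cache blocks. The sketch below assumes the slide's toy setup (a 4-block cache with 1-byte blocks and a 16-location memory); the names are illustrative only.

```python
NUM_CACHE_BLOCKS = 4   # the 4-byte cache from the slide, 1-byte blocks
MEMORY_SIZE = 16       # memory addresses 0x0 through 0xF


def cache_index(address):
    """Return the only cache location this address may occupy."""
    return address % NUM_CACHE_BLOCKS


# Group every memory address by the cache location it maps to.
mapping = {index: [] for index in range(NUM_CACHE_BLOCKS)}
for address in range(MEMORY_SIZE):
    mapping[cache_index(address)].append(address)

print(mapping[0])  # [0, 4, 8, 12]
print(mapping[1])  # [1, 5, 9, 13]
```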

  10. Addressing in Direct-Mapped Cache
      • Since multiple memory addresses map to same cache index, how do we tell which one is in there?
      • What if we have a block size > 1 byte?
      • Answer: divide memory address into three fields (from most significant to least significant bits)
        • tag: to check if we have the correct block
        • index: to select the block
        • byte offset: to select the byte within the block

  11. Direct-Mapped Cache Terminology
      • All fields are read as unsigned integers.
      • Index: specifies the cache index (which “row” of the cache we should look in)
      • Offset: once we’ve found correct block, specifies which byte within the block we want
      • Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

  12. Direct-Mapped Cache Example (1/3)
      • Suppose we have 16 KB of data in a direct-mapped cache with 4 word blocks
      • Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture
      • Offset
        • need to specify correct byte within a block
        • block contains 4 words = 16 bytes = 2^4 bytes
        • need 4 bits to specify correct byte

  13. Direct-Mapped Cache Example (2/3)
      • Index: (~index into an “array of blocks”)
        • need to specify correct row in cache
        • cache contains 16 KB = 2^14 bytes
        • block contains 2^4 bytes (4 words)
        • # blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
        • need 10 bits to specify this many rows

  14. Direct-Mapped Cache Example (3/3)
      • Tag: use remaining bits as tag
        • tag length = addr length - offset - index = 32 - 4 - 10 bits = 18 bits
        • so tag is leftmost 18 bits of memory address
      • Why not full 32 bit address as tag?
        • All bytes within block need same address (4b)
        • Index must be same for every address within a block, so it’s redundant in tag check, thus can leave off to save memory (here 10 bits)
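The three-slide example can be reproduced with a small helper. This is a sketch that assumes the cache size and block size are powers of two; `field_widths` is a hypothetical name, not something defined in the lecture.

```python
def field_widths(cache_bytes, block_bytes, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two, so that
    int.bit_length() - 1 equals log2 of the value.
    """
    offset_bits = block_bytes.bit_length() - 1
    num_blocks = cache_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses
tag_bits, index_bits, offset_bits = field_widths(16 * 1024, 16)
print(tag_bits, index_bits, offset_bits)  # 18 10 4
```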

  15. TIO
      • Address fields: Tag | Index | Offset
      • AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)
      • 2^(H+W) = 2^H * 2^W
      [Figure: rectangle with HEIGHT (# of blocks), WIDTH (size of one block, B/block), and AREA (cache size, B)]

  16. Accessing data in a direct mapped cache
      • Ex.: 16KB of data, direct-mapped, 4 word blocks
      • Read 4 addresses
        • 0x00000014
        • 0x0000001C
        • 0x00000034
        • 0x00008014
      • Memory values below: only cache/memory level of hierarchy
      Address (hex)   Value of Word
      00000010        a
      00000014        b
      00000018        c
      0000001C        d
      ...             ...
      00000030        e
      00000034        f
      00000038        g
      0000003C        h
      ...             ...
      00008010        i
      00008014        j
      00008018        k
      0000801C        l
      ...             ...

  17. Accessing data in a direct mapped cache
      • 4 Addresses:
        • 0x00000014, 0x0000001C, 0x00000034, 0x00008014
      • 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields
      Address      Tag                 Index       Offset
      0x00000014   000000000000000000  0000000001  0100
      0x0000001C   000000000000000000  0000000001  1100
      0x00000034   000000000000000000  0000000011  0100
      0x00008014   000000000000000010  0000000001  0100
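The field split on slide 17 can be reproduced with shifts and masks. The sketch below assumes the 18/10/4-bit layout derived in the earlier example; `split_address` is an illustrative name.

```python
OFFSET_BITS = 4    # 16-byte blocks
INDEX_BITS = 10    # 1024 rows


def split_address(addr):
    """Split a 32-bit address into (tag, index, offset) as unsigned integers."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for addr in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
    tag, index, offset = split_address(addr)
    # Print the fields in binary, padded to their widths (18, 10, 4 bits).
    print(f"0x{addr:08X}  {tag:018b}  {index:010b}  {offset:04b}")
```

Note that 0x00000014 and 0x00008014 share index 1 but differ in tag (0 versus 2), which is exactly the case the tag exists to distinguish.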

  18. 16 KB Direct Mapped Cache, 16B blocks
      • Valid bit: determines whether anything is stored in that row (when computer initially turned on, all entries invalid)
      Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
      0       0
      1       0
      2       0
      3       0
      ...     ...
      1022    0
      1023    0

  19. 1. Read 0x00000014
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 0100
      • Cache state: all rows invalid

  20. So we read block 1 (0000000001)
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 0100
      • Cache state: all rows invalid

  21. No valid data
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 0100
      • Cache state: all rows invalid, including row 1

  22. Load data into cache, setting tag & valid
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 0100
      Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
      0       0
      1       1       0     a       b       c       d
      2       0
      3       0
      ...     ...
      1022    0
      1023    0

  23. Read from cache at offset, return word b
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 0100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  24. 2. Read 0x0000001C = 0…00 0..001 1100
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 1100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  25. Index is Valid
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 1100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  26. Index valid, Tag Matches
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 1100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  27. Index Valid, Tag Matches, return d
      • Address fields: Tag 000000000000000000 | Index 0000000001 | Offset 1100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  28. 3. Read 0x00000034 = 0…00 0..011 0100
      • Address fields: Tag 000000000000000000 | Index 0000000011 | Offset 0100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  29. So read block 3
      • Address fields: Tag 000000000000000000 | Index 0000000011 | Offset 0100
      • Cache state: row 1 valid with tag 0, holding a, b, c, d; all other rows invalid

  30. No valid data
      • Address fields: Tag 000000000000000000 | Index 0000000011 | Offset 0100
      • Cache state: row 3 is invalid; row 1 valid with tag 0, holding a, b, c, d

  31. Load that cache block, return word f
      • Address fields: Tag 000000000000000000 | Index 0000000011 | Offset 0100
      Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
      0       0
      1       1       0     a       b       c       d
      2       0
      3       1       0     e       f       g       h
      ...     ...
      1022    0
      1023    0
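The three reads walked through on slides 19-31 can be replayed with a small simulation. This is a minimal sketch, not the lecture's code: the class and constant names are hypothetical, the memory contents are the word values listed on slide 16, and only reads are modeled.

```python
BLOCK_BYTES = 16   # 4 words of 4 bytes each
NUM_ROWS = 1024    # 16 KB / 16 B per block
WORD_BYTES = 4

# Word values from slide 16, keyed by byte address.
MEMORY = {
    0x10: "a", 0x14: "b", 0x18: "c", 0x1C: "d",
    0x30: "e", 0x34: "f", 0x38: "g", 0x3C: "h",
    0x8010: "i", 0x8014: "j", 0x8018: "k", 0x801C: "l",
}


class DirectMappedCache:
    def __init__(self):
        # Every row starts invalid, as when the computer is first turned on.
        self.rows = [{"valid": False, "tag": None, "data": None}
                     for _ in range(NUM_ROWS)]

    def read(self, addr):
        """Return (hit, word) for a word read at byte address addr."""
        offset = addr % BLOCK_BYTES
        index = (addr // BLOCK_BYTES) % NUM_ROWS
        tag = addr // (BLOCK_BYTES * NUM_ROWS)
        row = self.rows[index]
        hit = row["valid"] and row["tag"] == tag
        if not hit:
            # Miss: load the whole block, then set the tag and valid bit.
            base = addr - offset
            row["data"] = [MEMORY.get(base + WORD_BYTES * i)
                           for i in range(BLOCK_BYTES // WORD_BYTES)]
            row["tag"] = tag
            row["valid"] = True
        return hit, row["data"][offset // WORD_BYTES]


cache = DirectMappedCache()
results = [cache.read(addr) for addr in (0x00000014, 0x0000001C, 0x00000034)]
print(results)  # [(False, 'b'), (True, 'd'), (False, 'f')]
```

The first and third reads miss because their rows are still invalid; the second hits because row 1 was filled by the first read and its tag matches.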

  32. A Direct Mapped Cache for MIPS
      • What is the block size?
        • 32 bits, 4 bytes
      • How many blocks in cache?
        • 1024
      • How long is index?
        • 10 bits
      • How many blocks in memory?
        • 2^32/4 = 2^30
      • How many memory blocks map to the same cache block?
        • 2^30/1024 = 2^20
      • How long is tag?
        • 20 bits
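The answers on slide 32 follow from three parameters. The sketch below assumes 32-bit byte addresses, 4-byte blocks, and 1024 cache blocks, as stated on the slide; the variable names are illustrative.

```python
ADDR_BITS = 32
BLOCK_BYTES = 4
NUM_CACHE_BLOCKS = 1024

# Number of block-sized chunks in a 2^32-byte address space.
memory_blocks = 2**ADDR_BITS // BLOCK_BYTES

# Memory blocks that compete for each cache block.
blocks_per_cache_block = memory_blocks // NUM_CACHE_BLOCKS

# For powers of two, bit_length() - 1 is log2.
index_bits = NUM_CACHE_BLOCKS.bit_length() - 1
tag_bits = blocks_per_cache_block.bit_length() - 1

print(memory_blocks == 2**30)           # True
print(blocks_per_cache_block == 2**20)  # True
print(index_bits, tag_bits)             # 10 20
```

The tag needs exactly enough bits to tell apart the 2^20 memory blocks that share one cache block, which is why it comes out to 20 bits.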

  33. Yoda says… You will find only what you bring in
