Flash Translation Layer (FTL)

March 28, 2011

Sungjoo Yoo

Embedded System Architecture Lab.

ESA, POSTECH, 2011


Agenda

  • Introduction to FTL

  • LAST



Typical Flash Storage

[Source: J. Lee, 2007]

  • Both the number of Flash I/O ports and the controller technology determine Flash performance

[Figure: Host (PC) ↔ I/O interface (USB, IDE, PCMCIA) ↔ controller ↔ NAND Flash; the FTL runs on the controller (e.g., in an Intel SSD)]



[Source: J. Lee, 2007]

Flash Translation Layer (FTL)

  • A software layer that emulates the standard block device interface (read/write) on top of Flash

  • Features

    • Sector mapping

    • Garbage collection

    • Power-off recovery

    • Bad block management

    • Wear-leveling

    • Error correction code (ECC)



Single Page Write Case

  • Remember “erase-before-write” means “no overwrite”!

(tR + tRC + tWC + tPROG )*(# pages/block) + tERASE

= (25us + 105.6us*2 + 300us)*64 + 2ms

= 36.32ms for a single-page (2KB) write operation
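The arithmetic above can be checked with a tiny cost model (a sketch using the slide's timing parameters; the variable and function names are mine):

```python
# Cost model for the slide's "erase-before-write" arithmetic: updating one
# 2KB page in place forces reading out, erasing, and reprogramming the whole
# 64-page block. Timing values are taken from the slide.
T_R = 25.0        # us, tR: page read (cell array -> page register)
T_RC = 105.6      # us, tRC: read cycle (page register -> controller)
T_WC = 105.6      # us, tWC: write cycle (controller -> page register)
T_PROG = 300.0    # us, tPROG: page program
T_ERASE = 2000.0  # us, tERASE: block erase
PAGES_PER_BLOCK = 64

def single_page_write_cost_us():
    """Copy all 64 pages out and back, plus one block erase."""
    return (T_R + T_RC + T_WC + T_PROG) * PAGES_PER_BLOCK + T_ERASE

print(f"{single_page_write_cost_us() / 1000:.2f} ms")  # 36.32 ms
```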



Replacement Block Scheme [Ban, 1995]

  • In-place scheme

    • Keep the same page index in data and update blocks

[Figure: a data (D) block paired with an update (U) block, also called a log block]

Previously, two single-page write operations took
2 × 36.32ms = 72.63ms

With an update block, two single-page write operations take
2 × (tWC + tPROG)
= 2 × (105.6us + 300us)
= 0.81ms

→ ~90× reduction!
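The comparison can be verified numerically (a sketch; timing values are from the slides, names are mine):

```python
# Compare the slide's two costs: two in-place (erase-before-write) updates
# vs. two out-of-place writes appended to an update (log) block.
T_WC, T_PROG = 105.6, 300.0   # us, from the slide
IN_PLACE_WRITE_US = 36316.8   # us, one erase-before-write page update

in_place = 2 * IN_PLACE_WRITE_US   # ~72.63 ms
log_block = 2 * (T_WC + T_PROG)    # ~0.81 ms
print(f"~{in_place / log_block:.0f}x reduction")  # ~90x reduction
```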



Replacement Block Scheme [Ban, 1995]

  • In-place scheme

    • Keep the same page index in data and log blocks

[Figure: one D block with two U blocks]

Advantage
  • Simple

Disadvantages
  • Low block utilization
  • Violates the sequential write constraint


Log Buffer-based Scheme [Kim, 2002]

  • In-place (linear mapping) vs. out-of-place (remapping) schemes

[Figure: D block with U blocks under the in-place scheme (left) and the out-of-place scheme (right)]

In-place scheme
  + No need to manage complex mapping info
  - Low U block utilization
  - Violates the sequential write constraint

Out-of-place scheme
  + High U block utilization
  + Sequential writes
  - Mapping information must be maintained


Garbage Collection (GC)

[Figure: D block with U blocks 1 and 2 fully consumed]

No more U blocks!
→ Perform garbage collection to reclaim U block(s) by erasing blocks with many invalid pages


Three Types of Garbage Collection

[Kang, 2006]

[Figure: the three garbage collection (merge) types]

  • Which one will be the most efficient?



Garbage Collection Overhead

Full merge cost calculation. Assumptions:
  • 64 pages/block
  • tRC = tWC = 100us
  • tPROG = 300us
  • tERASE = 2ms
  • Max # of valid page copies = 64
  • # of block erases = 3

Full merge operation: copy the valid pages of the D block and the U blocks into a free block, then erase the three old blocks.

Runtime cost
= 64 × (tRC + tWC + tPROG) + 3 × tERASE
= 64 × (100us × 2 + 300us) + 3 × 2ms
= 38ms

Valid page copies may dominate the runtime cost
→ minimize the # of valid page copies
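The full-merge arithmetic can be sketched the same way (values from this slide's assumptions; names are mine):

```python
# Full-merge cost from the slide's assumptions: copy up to 64 valid pages
# into a free block, then erase three blocks (the D block and two U blocks).
T_RC, T_WC, T_PROG, T_ERASE = 100.0, 100.0, 300.0, 2000.0  # us

def full_merge_cost_us(valid_copies=64, erases=3):
    return valid_copies * (T_RC + T_WC + T_PROG) + erases * T_ERASE

copy_us = 64 * (T_RC + T_WC + T_PROG)
print(f"total {full_merge_cost_us() / 1000:.0f} ms, copies {copy_us / 1000:.0f} ms")
# total 38 ms, copies 32 ms -> the valid page copies dominate
```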



Three Representative Methods of Flash Translation Layer

  • FAST [Lee, 2007]

    • Two types of log block

      • A sequential write log block to maximize switch merges

      • Random write log blocks cover the other write accesses

  • Superblock [Kang, 2006]

    • A group of blocks is managed as a superblock

    • Linear address mapping is broken within a superblock to reduce # of valid page copies in GC

  • LAST [Lee, 2008]

    • Two partitions in random write log blocks

      • Hot partition → more dead blocks → reduction in full merges



[Lee, 2008]

LAST

  • Observations

    • Typical SSD access patterns contain both random and sequential traffic

    • Random traffic can be classified into hot and cold




[Lee, 2008]

LAST Scheme



[Lee, 2008]

Locality Detection: Random vs. Sequential

  • Observations

    • Short requests are very frequent (a)

    • Short requests tend to access random locations (b)

    • Long requests tend to access sequential locations (b)

    • Threshold of randomness

      • 4KB from experiments
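The size-based detector these bullets describe reduces to a single threshold test (a minimal sketch; the function name is mine, the 4KB threshold is from the slide):

```python
# Route a request by size: short (< 4KB) requests are treated as random,
# long (>= 4KB) requests as sequential.
RANDOMNESS_THRESHOLD = 4 * 1024  # bytes, from the slide's experiments

def classify_request(size_bytes):
    return "random" if size_bytes < RANDOMNESS_THRESHOLD else "sequential"

print(classify_request(2048))   # random
print(classify_request(65536))  # sequential
```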



[Lee, 2008]

LAST Scheme

[Figure: requests < 4KB go to the random log buffer; requests >= 4KB go to the sequential log buffer]



[Lee, 2008]

Why Hot and Cold?

  • Observation

    • Invalid pages occupy a large fraction (>50%) of the random log buffer space

    • They are mostly caused by hot pages

  • Problem

    • Invalid pages are spread across the random log buffer space, which causes full merges (expensive!)


Aggregating Invalid Pages due to Hot Pages

  • An example trace

    • 1, 4, 3, 1, 2, 7, 8, 2, 1, …

  • A single random buffer partition suffers from invalid pages distributed across its blocks

  • In LAST, the Hot partition aggregates invalid pages → full merges can be reduced; in addition, full merges are delayed
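Counting repeated writes in the example trace shows where the invalid pages come from (a sketch: each out-of-place rewrite invalidates the page's previous copy):

```python
from collections import Counter

# Pages written more than once are the source of the invalid pages in the
# log buffer; one invalid copy is left behind per rewrite.
trace = [1, 4, 3, 1, 2, 7, 8, 2, 1]
invalid = {p: n - 1 for p, n in Counter(trace).items() if n > 1}
print(invalid)  # {1: 2, 2: 1} -> hot pages 1 and 2 cause every invalid page
```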



Temporal Locality: Hot or Cold?

  • Update interval (calculated for each page access)

    = Current page access time – last page access time

  • If update interval < k (threshold)

    • The page is hot (i.e., frequently written); otherwise it is cold
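The update-interval rule above can be sketched as a small detector (the class name and the value of k are mine, not from the paper):

```python
# Track each page's last write time; a write whose interval since the
# previous write is below k is classified hot, otherwise cold.
class HotColdDetector:
    def __init__(self, k):
        self.k = k            # update-interval threshold
        self.last_write = {}  # logical page -> time of last write

    def classify(self, page, now):
        prev = self.last_write.get(page)
        self.last_write[page] = now
        if prev is not None and now - prev < self.k:
            return "hot"      # rewritten quickly -> frequent writes
        return "cold"

d = HotColdDetector(k=10)
print(d.classify(1, now=0), d.classify(1, now=5), d.classify(1, now=50))
# cold hot cold
```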





Garbage Collection in LAST: Step 1, Victim Partition Selection

  • Basic rule

    • If there is a dead block in Hot partition, we select Hot partition as the victim partition

    • Else, we select Cold partition as the victim

  • Demotion from Hot to Cold

    • If the Hot partition has a log block whose last update time is older than the age threshold (i.e., old enough), we select the Hot partition as the victim


Garbage Collection in LAST: Step 2, Victim Block Selection

  • Case A: Victim partition = Hot partition

    • If there is a dead block, select it

    • Else, select a least recently updated block

  • Case B: Victim partition = Cold partition

    • Choose the block with the lowest (full) merge cost (in the merge cost table)

      • Na: associativity degree, Np: # valid page copies

      • Cc: page copy cost, Ce: erase cost
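A plausible reading of that cost, given the variables listed (the exact formula lives in the paper's cost table, so treat this as an assumption): copy Np valid pages at cost Cc each, then erase the victim log block plus its Na associated data blocks at cost Ce each.

```python
# Hedged sketch of the merge cost implied by the slide's variables:
# Np valid page copies at Cc each, plus Na+1 block erases at Ce each.
def merge_cost(Na, Np, Cc, Ce):
    return Np * Cc + (Na + 1) * Ce

# With the earlier timings (Cc = tRC + tWC + tPROG = 500us, Ce = 2ms),
# a log block with Na = 2 and Np = 64 matches the 38ms full merge.
print(merge_cost(Na=2, Np=64, Cc=500, Ce=2000))  # 38000 (us)
```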



Adaptiveness in LAST

  • The Hot/Cold partition sizes (Sh, Sc), the temporal locality threshold (k), the age threshold, etc. are adjusted at runtime depending on the observed traffic

    • Nd = # dead blocks in Hot partition

    • Uh = utilization of Hot partition (# valid pages / # total pages)

  • One example of runtime policy

    • If Nd is increasing, then reduce Sh since too many log blocks are assigned to Hot partition

  • There are several more policy examples in the paper

    • Comment: they do not seem exhaustive, so they could be improved further
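The one policy the slide spells out can be sketched as follows (the function name and the adjustment step are mine):

```python
# If the number of dead blocks in the Hot partition (Nd) is increasing,
# the Hot partition has more log blocks than it needs, so shrink Sh.
def adjust_hot_partition(Sh, Nd_prev, Nd_now, min_Sh=1):
    if Nd_now > Nd_prev:
        return max(min_Sh, Sh - 1)  # hand one log block back to Cold
    return Sh

print(adjust_hot_partition(Sh=8, Nd_prev=3, Nd_now=5))  # 7
```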



Experimental Results

  • Full merge cost is significantly reduced by LAST

  • Many dead blocks are created → GC with the lowest cost (only an erase is needed)


References

  • [Ban, 1995] A. Ban, Flash File System, US Patent, no. 5,404,485, April 1995.

  • [Kim, 2002] J. Kim, et al., “A Space-Efficient Flash Translation Layer for compactflash Systems”, IEEE Transactions on Consumer Electronics, May 2002.

  • [Kang, 2006] J. Kang, et al., “A Superblock-based Flash Translation Layer for NAND Flash Memory”, Proc. EMSOFT, Oct. 2006.

  • [Lee, 2007] S. Lee, et al., “A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation”, ACM TECS, 2007.

  • [Lee, 2008] S. Lee, et al., “LAST: locality-aware sector translation for NAND flash memory-based storage systems”, ACM SIGOPS Operating Systems Review archive, Volume 42 ,  Issue 6, October 2008.
