Flash Translation Layer (FTL)

March 28, 2011

Sungjoo Yoo

Embedded System Architecture Lab.

ESA, POSTECH, 2011


Agenda

  • Introduction to FTL

  • LAST


[Source: J. Lee, 2007]

Typical Flash Storage

  • Both the number of Flash I/O ports and the controller technology determine Flash performance

[Figure: a host (PC) connects through an I/O interface (USB, IDE, PCMCIA) to the SSD controller, which drives the NAND Flash; the FTL runs on the controller. Example: Intel SSD.]


[Source: J. Lee, 2007]

Flash Translation Layer (FTL)

  • A software layer that emulates the standard block-device interface (read/write)

  • Features

    • Sector mapping

    • Garbage collection

    • Power-off recovery

    • Bad block management

    • Wear-leveling

    • Error correction code (ECC)
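To make the sector-mapping and out-of-place-update ideas concrete, here is a minimal sketch of a page-mapped FTL. All names (`SimpleFTL`, `lpn`, `ppn`) are illustrative, not taken from any scheme in these slides:

```python
# Minimal sketch of an FTL's sector (page) mapping with
# out-of-place updates. Illustrative only.

class SimpleFTL:
    def __init__(self, num_phys_pages):
        self.mapping = {}                     # logical page -> physical page
        self.free = list(range(num_phys_pages))
        self.invalid = set()                  # physical pages awaiting GC

    def write(self, lpn, data_store, data):
        # Flash forbids overwrite: write to a fresh physical page,
        # remap, and mark the old physical page invalid.
        ppn = self.free.pop(0)
        data_store[ppn] = data
        old = self.mapping.get(lpn)
        if old is not None:
            self.invalid.add(old)
        self.mapping[lpn] = ppn

    def read(self, lpn, data_store):
        return data_store[self.mapping[lpn]]

store = {}
ftl = SimpleFTL(8)
ftl.write(0, store, "v1")
ftl.write(0, store, "v2")   # the overwrite lands on a new physical page
assert ftl.read(0, store) == "v2"
assert len(ftl.invalid) == 1
```

Garbage collection, wear-leveling, and the other features in the list above would sit on top of exactly this mapping structure.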


Single Page Write Case

  • Remember “erase-before-write” means “no overwrite”!

(tR + tRC + tWC + tPROG) × (# pages/block) + tERASE

= (25 µs + 105.6 µs × 2 + 300 µs) × 64 + 2 ms

= 36.32 ms for a single-page (2 KB) write operation
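The arithmetic above can be reproduced directly. Timing values are the slide's; under "erase-before-write", every page of the block is read, transferred out and back in, and reprogrammed, then the old block is erased:

```python
# Single-page write cost under "erase-before-write",
# using the slide's timing parameters.
tR, tRC, tWC, tPROG = 25e-6, 105.6e-6, 105.6e-6, 300e-6  # seconds
tERASE = 2e-3
pages_per_block = 64

cost = (tR + tRC + tWC + tPROG) * pages_per_block + tERASE
print(f"{cost * 1e3:.2f} ms")  # → 36.32 ms
```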


Replacement Block Scheme [Ban, 1995]

  • In-place scheme

    • Keep the same page index in data and update blocks

The original block is called the data (D) block; updates are written to an update (U) block (also called a log block).

Previously, two single-page write operations took

2 × 36.32 ms = 72.63 ms

With an update block, two single-page write operations take

2 × (tWC + tPROG) = 2 × (105.6 µs + 300 µs) = 0.81 ms

That is a ~90× reduction!


Replacement Block Scheme [Ban, 1995]

  • In-place scheme

    • Keep the same page index in data and log blocks

  • Advantage: simple

  • Disadvantages: U-block utilization is low; violates the sequential-write constraint


Log Buffer-based Scheme [Kim, 2002]

  • In-place (linear mapping) vs. out-of-place (remapping) schemes

In-place scheme

  + No need to manage complex mapping information

  − Low U-block utilization

  − Violates the sequential-write constraint

Out-of-place scheme

  + High U-block utilization

  + Sequential writes

  − Mapping information must be maintained


Garbage Collection (GC)

No more U blocks! → Perform garbage collection to reclaim U block(s) by erasing blocks with many invalid pages


[Kang, 2006]

Three Types of Garbage Collection

  • Of the three (switch, partial, and full merge), which will be the most efficient?


Garbage Collection Overhead

Full-merge cost calculation

Assumptions

  • 64 pages per block

  • tRC = tWC = 100 µs

  • tPROG = 300 µs

  • tERASE = 2 ms

  • Max # of valid page copies = 64

  • # of block erases = 3
[Figure: a full merge copies the valid pages of the D block, U block 1, and U block 2 into a free block, then erases the three old blocks.]

Runtime cost

= 64 × (tRC + tWC + tPROG) + 3 × tERASE

= 64 × (100 µs × 2 + 300 µs) + 3 × 2 ms

= 38 ms


Valid page copies may dominate the runtime cost → minimize the # of valid page copies

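The full-merge cost above can be checked with the slide's assumptions (64 valid page copies, each needing a read transfer, a write transfer, and a program, plus 3 block erases):

```python
# Full-merge runtime cost from the slide's assumptions:
# copy 64 valid pages, then erase 3 blocks (D, U1, U2).
tRC = tWC = 100e-6     # seconds
tPROG = 300e-6
tERASE = 2e-3

valid_copies, erases = 64, 3
cost = valid_copies * (tRC + tWC + tPROG) + erases * tERASE
print(f"{cost * 1e3:.0f} ms")  # → 38 ms
```

At ~38 ms per full merge versus ~0.5 µs-scale page operations, it is clear why the schemes that follow focus on avoiding valid page copies.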

Three Representative Methods of Flash Translation Layer

  • FAST [Lee, 2007]

    • Two types of log block

      • A sequential write log block to maximize switch merges

      • Random write log blocks cover the other write accesses

  • Superblock [Kang, 2006]

    • A group of blocks is managed as a superblock

    • Linear address mapping is broken within a superblock to reduce # of valid page copies in GC

  • LAST [Lee, 2008]

    • Two partitions in random write log blocks

      • Hot partition → more dead blocks → reduction in full merges


[Lee, 2008]

LAST

  • Observations

    • Typical SSD accesses include both random and sequential traffic

    • Random traffic can be classified as hot or cold


[Lee, 2008]

LAST Scheme


[Lee, 2008]

Locality Detection: Random vs. Sequential

  • Observations

    • Short requests are very frequent (a)

    • Short requests tend to access random locations (b)

    • Long requests tend to access sequential locations (b)

    • Threshold of randomness

      • 4KB from experiments
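The size-based detector described above is trivial to state in code: requests below the experimentally chosen 4 KB threshold are treated as random, longer ones as sequential:

```python
# Size-based locality detector from the slides.
RANDOM_THRESHOLD = 4 * 1024  # bytes, chosen from experiments

def classify(request_size_bytes):
    """Classify a request as random (short) or sequential (long)."""
    return "random" if request_size_bytes < RANDOM_THRESHOLD else "sequential"

assert classify(512) == "random"
assert classify(64 * 1024) == "sequential"
```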


[Lee, 2008]

LAST Scheme

[Figure: requests < 4 KB go to the random log buffer; requests ≥ 4 KB go to the sequential log buffer.]


[Lee, 2008]

Why Hot and Cold?

  • Observation

    • A large number of invalid pages (>50%) occupies the random log buffer space

    • They are mostly caused by hot pages

  • Problem

    • Invalid pages are scattered across the random log buffer space, which causes expensive full merges


Aggregating Invalid Pages due to Hot Pages

  • An example trace

    • 1,4,3,1,2,7, 8, 2, 1, …

  • A single random-buffer partition suffers from scattered invalid pages

  • In LAST, the hot partition aggregates invalid pages → full merges are reduced; in addition, full merges are delayed


Temporal Locality: Hot or Cold?

  • Update interval (calculated for each page access)

    = Current page access time – last page access time

  • If update interval < k (threshold)

    • Hot (means frequent writes)
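The update-interval test can be sketched as follows, reusing the example trace from the earlier slide. The unit time steps and the value k = 5 are illustrative, not from the paper:

```python
# Hot/cold classification by update interval: a write is hot when
# the time since the page's previous write is below a threshold k.
# Timestamps (unit steps) and k = 5 are illustrative.
last_write = {}

def classify_write(page, now, k):
    prev = last_write.get(page)
    last_write[page] = now
    if prev is None:
        return "cold"            # first access: no interval yet
    return "hot" if now - prev < k else "cold"

trace = [1, 4, 3, 1, 2, 7, 8, 2, 1]   # example trace from the slides
labels = [classify_write(p, t, k=5) for t, p in enumerate(trace)]
```

With these assumptions the repeated writes to pages 1 and 2 at positions 3 and 7 come out hot, matching the intuition that frequently rewritten pages generate the invalid pages.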


[Lee, 2008]

LAST Scheme

[Figure: requests < 4 KB go to the random log buffer; requests ≥ 4 KB go to the sequential log buffer.]


Garbage Collection in LAST:Step 1 Victim Partition Selection

  • Basic rule

    • If there is a dead block in Hot partition, we select Hot partition as the victim partition

    • Else, we select Cold partition as the victim

  • Demotion from hot to cold

    • If there is a log block whose last update is older than a certain age threshold (i.e., it is old enough), we select the Hot partition as the victim


Garbage Collection in LAST:Step 2 Victim Block Selection

  • Case A: Victim partition = Hot partition

    • If there is a dead block, select it

    • Else, select a least recently updated block

  • Case B: Victim partition = Cold partition

    • Choose the block with the lowest (full) merge cost (in the merge cost table)

      • Na: associativity degree, Np: # valid page copies

      • Cc: page copy cost, Ce: erase cost
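The slide lists the symbols of the merge cost table but not the formula itself. One plausible combination, consistent with those symbols (copy every valid page, then erase the victim log block plus its Na associated data blocks), is Np·Cc + (Na+1)·Ce. This is an assumption for illustration, not the paper's exact expression:

```python
# HYPOTHETICAL merge-cost formula for cold-partition victim
# selection; the slide gives only the symbols Na, Np, Cc, Ce.
def merge_cost(Na, Np, Cc, Ce):
    # copy Np valid pages, erase the log block and its Na data blocks
    return Np * Cc + (Na + 1) * Ce

def pick_victim(blocks, Cc=500e-6, Ce=2e-3):
    # blocks: list of (block_id, Na, Np); choose the cheapest to merge
    return min(blocks, key=lambda b: merge_cost(b[1], b[2], Cc, Ce))

blocks = [("B0", 2, 40), ("B1", 1, 10), ("B2", 4, 60)]
assert pick_victim(blocks)[0] == "B1"   # fewest copies and erases wins
```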


Adaptiveness in LAST

  • Hot/cold partition size (Sh, Sc), temporal locality threshold (k), age threshold, etc. are adjusted at runtime depending on the observed traffic

    • Nd = # dead blocks in Hot partition

    • Uh = utilization of Hot partition (# valid pages / # total pages)

  • One example of runtime policy

    • If Nd is increasing, then reduce Sh since too many log blocks are assigned to Hot partition

  • There are several more policy examples in the paper

    • Comment: the policies do not seem exhaustive, so they could likely be improved further
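The one runtime policy the slide states can be sketched directly; the step size and the lower bound on Sh are illustrative assumptions:

```python
# Sketch of the stated runtime policy: if the number of dead blocks
# in the hot partition (Nd) keeps growing, shrink the hot partition
# size Sh. Step size and floor are illustrative, not from the paper.
def adjust_hot_partition(Sh, Nd_prev, Nd_now, step=1, min_Sh=1):
    if Nd_now > Nd_prev:
        # too many log blocks are assigned to the hot partition
        return max(min_Sh, Sh - step)
    return Sh

assert adjust_hot_partition(8, Nd_prev=3, Nd_now=5) == 7   # shrink
assert adjust_hot_partition(8, Nd_prev=5, Nd_now=5) == 8   # unchanged
```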


Experimental Results

  • Full merge cost is significantly reduced by LAST

  • Many dead blocks are created → GC at the lowest cost (only an erase is needed)



Reference

  • [Ban, 1995] A. Ban, "Flash File System", US Patent no. 5,404,485, April 1995.

  • [Kim, 2002] J. Kim, et al., "A Space-Efficient Flash Translation Layer for CompactFlash Systems", IEEE Transactions on Consumer Electronics, May 2002.

  • [Kang, 2006] J. Kang, et al., "A Superblock-based Flash Translation Layer for NAND Flash Memory", Proc. EMSOFT, Oct. 2006.

  • [Lee, 2007] S. Lee, et al., "A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation", ACM TECS, 2007.

  • [Lee, 2008] S. Lee, et al., "LAST: Locality-Aware Sector Translation for NAND Flash Memory-Based Storage Systems", ACM SIGOPS Operating Systems Review, vol. 42, no. 6, October 2008.
