Iris: A Scalable Cloud File System with Efficient Integrity Checks

Cloud Storage

Examples span consumer and enterprise services: Dropbox, Amazon S3 and EBS, Windows Azure Storage, SkyDrive, EMC Atmos, Mozy, iCloud, and Google Storage.

Can you trust the cloud? Users face:
  • Infrastructure bugs
  • Malware
  • Disgruntled employees

Iris File System
  • Integrity verification (on the fly)
    • value read == value written (integrity)
    • value read == last value written (freshness)
    • data & metadata
  • Proof of Retrievability (PoR/PDP)
    • Verifies: ALL of the data is on the cloud or recoverable
    • More on this later
  • High performance (low overhead)
    • Hundreds of MB/s data rates
    • Designed for enterprises
Iris Deployment Scenario

Within the enterprise, clients connect to 1 to 5 lightweight portal appliances (possibly distributed); the portals mediate all access to heavyweight cloud storage holding TBs to PBs of data.

Overview: File System Tree
  • Most file systems maintain a file-system tree.
  • Contains:
    • Directory structure
    • File names
    • Timestamps
    • Permissions
    • Other attributes
  • Efficiently laid out on disk (e.g., using B-tree)
Overview: Merkle Trees
  • Parents contain the hash of their children.
  • To verify that an element (e.g., “y”) is in the tree, only the nodes on the root-to-leaf path and their siblings are accessed: O(log n) of them. A verification sketch follows below.

[Figure: a Merkle tree with internal nodes A-E and leaf elements x and y.]

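To make this concrete, here is a minimal sketch of Merkle path verification (illustrative code, not the Iris implementation; the choice of SHA-1 is an assumption, matching the 20-byte digests mentioned on a later slide):

```python
import hashlib

def node_hash(data: bytes) -> bytes:
    # SHA-1 is assumed here; it yields 20-byte digests.
    return hashlib.sha1(data).digest()

def verify_path(root: bytes, leaf: bytes, siblings: list[tuple[bytes, str]]) -> bool:
    """Recompute hashes from the leaf up, combining with each level's sibling.

    `siblings` holds (sibling_hash, side) pairs from the leaf level to the
    root, where side == "L" means the sibling is the left child.
    """
    node = node_hash(leaf)
    for sibling, side in siblings:
        node = node_hash(sibling + node) if side == "L" else node_hash(node + sibling)
    return node == root  # matches the trusted root iff the leaf is in the tree
```
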
Iris: Unified File System + Merkle Tree
  • The file system tree is also a Merkle tree (a structural sketch follows below).
  • Binary
    • Balancing nodes
  • Directory Tree
    • Root node: directory attributes
    • Leaves: subdirectories and files
  • File Version Tree
    • Root node: file attributes
    • Leaves: file block version numbers
  • Free List: stores deleted subtrees

[Figure: a directory tree rooted at /u/ (entries a-g, subdirectory v/), with file version trees beneath the files, file blocks beneath those, and a Free List holding deleted subtrees.]

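As a rough structural sketch (hypothetical types; the slides do not show Iris's actual node layout), the same tree can carry both file-system content and Merkle hashes:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Every node hashes its children, so directory structure, attributes,
    and block version numbers are all protected by one Merkle tree."""
    hash: bytes = b""
    children: list["Node"] = field(default_factory=list)

@dataclass
class DirNode(Node):
    """Directory tree: the root carries directory attributes; leaves are
    subdirectories and files."""
    name: str = ""
    attrs: dict = field(default_factory=dict)  # permissions, timestamps, ...

@dataclass
class VersionNode(Node):
    """File version tree: the root carries file attributes; leaves carry
    version numbers for a range of file blocks."""
    block_range: tuple[int, int] = (0, 0)
    version: int = 0
```
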
File Version Tree
  • Each file has a version tree.
  • Version numbers increase when blocks are modified.
  • Version numbers propagate upward to the version-tree root.

[Figure: a version tree over blocks 0-7 with internal nodes covering ranges 0:7, 0:3, 4:7, 0:1, 4:5, and 6:7; a write bumps the affected leaves and all their ancestors from v0 to v1.]

File Version Tree (continued)
  • The process repeats for every write.
  • Version numbers are unique after each write.
    • Helps ensure freshness.
  • The propagation rule is sketched below.

[Figure: the same tree after further writes; the affected leaves and their ancestors advance from v1 to v2.]

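A minimal sketch of the propagation rule, assuming parent links between version-tree nodes (`VersionNode` and `write_block` are illustrative names, not Iris's API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VersionNode:
    version: int = 0
    parent: Optional["VersionNode"] = None

def write_block(leaf: VersionNode, new_version: int) -> None:
    """After a block write, bump the leaf's version number and propagate it
    up to the version-tree root, so the root always reflects the most recent
    write; freshness checks rely on these unique, increasing versions."""
    node: Optional[VersionNode] = leaf
    while node is not None:
        node.version = max(node.version, new_version)
        node = node.parent
```
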
Integrity Verification: MACs
  • For each file, Iris generates a MAC file.
    • Later used to verify the integrity of data blocks.
    • 4 KB data blocks, 20-byte MACs.
  • Each MAC is computed over: file id, block index, version number, block data.
    • mi = MAC(fid, i, vi, bi)
  • A sketch of the computation follows below.

[Figure: 4 KB data blocks b1 … bi alongside the corresponding 20-byte MAC file entries m1 … mi.]

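A minimal sketch of the per-block MAC (the slide fixes the inputs and the 20-byte tag length; the HMAC-SHA1 choice and the field encoding here are assumptions):

```python
import hashlib
import hmac
import struct

def block_mac(key: bytes, fid: int, index: int, version: int, block: bytes) -> bytes:
    """m_i = MAC(fid, i, v_i, b_i): the tag binds the 4 KB block's contents
    to its file, position, and version, so a stale or relocated block fails
    verification. HMAC-SHA1 produces a 20-byte tag."""
    header = struct.pack(">QQQ", fid, index, version)  # fid, block index, version
    return hmac.new(key, header + block, hashlib.sha1).digest()
```
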
Merkle Tree Efficiency
  • Many FS operations access paths in the tree
  • Inefficient to access one path at a time
    • Paths share ancestor nodes
      • Accessing same nodes over and over
        • Unnecessary I/O
        • Redundant Merkle tree crypto
    • Latency bound
  • Accessing paths in parallel?
    • Naïve techniques can lead to corruption
      • Same ancestor node accessed in separate threads
    • Need a Merkle tree cache
      • Very important part of our system
Merkle Tree Cache Challenges
  • Nodes depend on each other
    • Parents contain hashes of children
    • Cannot evict parent before child
  • Asynchronous
    • Inefficient: one thread per node/path
  • Avoid unnecessary hashing
    • Nodes near the root of the tree often reused
  • Efficient sequential file operations
    • Inefficient: accessing a full path per block → log-factor overhead
    • Adjacent nodes must stay “long enough” in cache.
Merkle Tree Cache

Nodes are read into the tree in parallel.

[State diagram: reading → To Verify → (verifying) → Pinned → Unpinned, then eviction: Compacting → Updating Hash → Ready to Write → (writing). The following slides walk through each state.]

Reading a Path

Example: reading the path “/u/v/b” walks the directory tree (/u/, then v/), then b's file version tree, the data file, and the MAC file.

[Figure: the path /u/ → v/ → b highlighted across the directory tree, file version tree, data file, and MAC file.]

Merkle Tree Cache

When both siblings arrive, they are verified. Verification is top-down: a parent is verified before its children.

Verification

[Figure: node A's stored hashes verify its children B and C; their hashes in turn verify D and E, and so on down the tree.]

Merkle Tree Cache

Verified nodes enter the “pinned” state. Pinned nodes cannot be evicted; they are used by asynchronous file system operations and remain pinned while at least one operation is using them.

Merkle Tree Cache

When a node is no longer used, it becomes “unpinned”. Unpinned nodes are eligible for eviction; eviction begins when the cache is 75% full.

Merkle Tree Cache

Eviction step #1: adjacent nodes with identical version numbers are compacted.

Compacting
  • Keep a node only:
    • if its version ≠ the parent's version
    • for balancing
  • Redundant information is stripped out.
  • Files are often written sequentially, so their version trees compact to a single node (see the sketch below).

[Figure: a version tree over blocks 0:15 with mixed v1/v2 subtrees compacts down to a few nodes; once every block is at v2, only the root node 0:15 remains.]

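A sketch of the compaction rule (illustrative structure; the bookkeeping for balancing nodes is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class VNode:
    version: int
    lo: int          # first block covered
    hi: int          # last block covered
    children: list["VNode"] = field(default_factory=list)

def compact(node: VNode) -> VNode:
    """Recursively drop child subtrees that add nothing beyond their parent:
    same version number and nothing distinguishing below them. A file written
    sequentially ends up with all blocks at one version, so its entire
    version tree collapses to the single root node."""
    node.children = [compact(child) for child in node.children]
    node.children = [child for child in node.children
                     if child.version != node.version or child.children]
    return node
```
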
Merkle Tree Cache

Eviction step #2: hashes are then updated in bottom-up order.

Merkle Tree Cache

Eviction step #3: nodes are written to cloud storage.

Merkle Tree Cache

Note: a node can be pinned again at any time during eviction; the whole path to that node then becomes “pinned”. A summary of the full lifecycle follows below.

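Putting the preceding frames together, a cache node's lifecycle can be summarized as a small state machine (the states come from the slide diagram; the enum itself is only an illustration):

```python
from enum import Enum, auto

class CacheState(Enum):
    READING = auto()         # being fetched from cloud storage, in parallel
    TO_VERIFY = auto()       # fetched; waits until its sibling arrives
    VERIFYING = auto()       # checked top-down (parent before children)
    PINNED = auto()          # in use by >= 1 async operation; not evictable
    UNPINNED = auto()        # idle; evictable once the cache is 75% full
    COMPACTING = auto()      # eviction step #1: merge same-version neighbors
    UPDATING_HASH = auto()   # eviction step #2: recompute hashes bottom-up
    READY_TO_WRITE = auto()  # eviction step #3: queued for write-back
    WRITING = auto()         # being written to cloud; can still be re-pinned
```
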
Merkle Tree Cache: Crucial for Real-World Workloads
  • Iris benefits from locality
  • Very small cache required to achieve high throughput
    • Cache size: 5 MB to 10 MB
Sequential Workloads
  • Results
    • 250 to 300 MB/s
    • 100+ clients
  • Cache
    • Minimal cache size ( < 1 MB ) to achieve high throughput
    • Reason: Nodes get compacted
    • Usually network bound
Random Workloads
  • Results
    • Bound by disk seeks
  • Cache
    • Minimal cache size ( < 1 MB ) to achieve seek-bound throughput
    • Cache only used to achieve parallelism to combat latency.
    • Reason: Very little locality.
Other Workloads
  • Very workload dependent
  • Specifically
    • Depends on number of seeks
  • Iris is designed to reduce Merkle tree seek overhead via:
    • Compacting
    • Merkle tree cache
Proofs of Retrievability
  • How can we be sure our data is still there?
  • Iris continuously verifies that the cloud possesses all data.
  • First sublinear solution to the open problem of dynamic Proofs of Retrievability.

Proofs of Retrievability (continued)
  • Iris verifies that the cloud possesses 99.9% of the data (with high probability); the sampling arithmetic is illustrated below.
  • The remaining 0.1% can be recovered using Iris's parity data structure.
  • Custom-designed error-correcting code (ECC) and parity data structure.
    • High throughput (300-550 MB/s).
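The “with high probability” guarantee is standard spot-checking arithmetic: if the cloud corrupts an ε fraction of blocks, auditing t uniformly random blocks misses every corrupted block with probability (1 - ε)^t. A quick illustration (the audit size t below is made up for the example):

```python
# Probability that an audit of t random blocks misses corruption of an
# eps fraction of the data: (1 - eps) ** t
eps = 0.001      # cloud corrupts 0.1% of the blocks
t = 4600         # illustrative audit size
p_miss = (1 - eps) ** t
print(f"miss probability ~ {p_miss:.2e}")  # about 1e-2; shrinks as t grows
```
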
ECC Challenges
  • Update efficiency
    • Want high-throughput file system
    • On-the-fly
    • ECC should not be a bottleneck
    • Reed–Solomon codes are too slow.
  • Hiding code structure
    • Adversary should not know which blocks to corrupt to make ECC fail.
    • Adversarially-secure ECC
  • Variable-length encoding
    • Handles: blocks, file attributes, Merkle tree nodes, etc.
Iris Error-Correcting Code

Blocks on the file system are mapped to parity stripes by a pseudo-random error-correcting code. The mapping from a file-system position to its corresponding (stripe, offset) parities is keyed: the cloud does not know the key, so it cannot determine which 0.1% subset of data to corrupt to make the ECC fail. A sketch of such a keyed mapping follows below.

[Figure: file-system blocks mapped to (stripe, offset) positions within the ECC parity stripes.]

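A minimal sketch of such a keyed mapping (the PRF construction, stripe count, and stripe length are assumptions for illustration, not Iris's parameters):

```python
import hashlib
import hmac
import struct

NUM_STRIPES = 1 << 20    # illustrative number of parity stripes
STRIPE_LEN = 64          # illustrative number of parity slots per stripe

def parity_position(key: bytes, fs_position: int) -> tuple[int, int]:
    """Map a file-system position to its (stripe, offset) parity location
    with a keyed PRF. Without the key, the cloud cannot tell which parities
    cover which blocks, so it cannot pick a 0.1% subset of data whose
    corruption would defeat the ECC."""
    prf = hmac.new(key, struct.pack(">Q", fs_position), hashlib.sha256).digest()
    stripe = int.from_bytes(prf[:8], "big") % NUM_STRIPES
    offset = int.from_bytes(prf[8:16], "big") % STRIPE_LEN
    return stripe, offset
```
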
Iris Error-Correcting Code: Costs
  • Memory:
  • Update time:
  • Verification time:
    • Amortized cost

[Figure: the file-system-to-parity-stripe mapping again, annotated with asymptotic memory, update, and verification costs; the formulas are not preserved in this transcript.]

ECC Update Efficiency
  • Very fast
    • 300-550 MB/s
  • Not a bottleneck in Iris
Conclusion
  • Presented the Iris file system
    • Integrity
    • Proofs of retrievability / data possession
    • On the fly
  • Very practical
    • Overall system throughput
      • 250-300 MB/s per Portal
    • Scales to enterprises