
S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," in Proc. of the 19th ACM Symposium on Operating Systems Principles (SOSP), Oct. 2003.

Presenter: John Otto

The Google File System
Outline
  • Overview
  • Motivation
  • Assumptions and Optimizations
  • Design Considerations
  • Structure
    • Physical
    • Data
  • File System Operations
  • Application Requirements
  • Write Procedure
  • Master Operations
  • Related Work
  • Discussion
Overview
  • Distributed tiered system
  • Terabytes of data, thousands of machines
  • Handles component failures
  • Optimized for small random reads, large sequential reads, and record append operations
  • Supports many concurrent clients by making mutations such as record append atomic
Motivation
  • Need a robust storage mechanism for very large files
  • Manage large volumes of data being read/written
  • Transparently provide replication mechanisms to prevent data loss and handle component failure
Assumptions and Optimizations
  • Assume that components will fail
  • Optimize for large files; support small ones
  • Optimize for long sequential reads, small random reads
  • Optimize for long sequential writes, possibly from multiple clients
  • Optimize for high throughput, not low latency
Design Considerations
  • More important to implement these optimizations than the POSIX API
  • Flexibility to implement custom operations
    • e.g. snapshot, record append
Data Structure
  • Chunks
    • 64 MB each, uniquely identified by an immutable 64-bit chunk handle
  • Single Master
    • maintains file system metadata
    • logs all metadata operations, committing the log to disk on itself and on replicas before reporting changes to clients
    • keeps current chunk locations in memory, refreshed by polling chunkservers rather than persisted
    • assigns version numbers to chunks to detect stale replicas
  • “Shadow” Master replicas
    • maintain logs of master operations
    • bear read-only load from clients
  • Many Chunkservers
    • maintain local authoritative chunk list
    • serve read and write data operations directly to clients
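The metadata split above can be sketched as a toy model. Everything here is illustrative (names like MasterMetadata and file_chunks are invented, not from the paper): it shows the in-memory maps the master keeps and how a client turns a byte offset into a chunk handle.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the paper

class MasterMetadata:
    """Toy model of the master's in-memory state (illustrative names)."""

    def __init__(self):
        self._handles = itertools.count(1)   # source of unique chunk handles
        self.file_chunks = {}      # path -> ordered list of chunk handles
        self.chunk_locations = {}  # handle -> set of chunkserver ids (soft state)
        self.chunk_version = {}    # handle -> version, for stale-replica detection

    def create_file(self, path):
        self.file_chunks[path] = []

    def add_chunk(self, path, servers):
        """Allocate a new chunk for path, replicated on the given servers."""
        handle = next(self._handles)
        self.file_chunks[path].append(handle)
        self.chunk_locations[handle] = set(servers)
        self.chunk_version[handle] = 1
        return handle

    def handle_for_offset(self, path, offset):
        """Clients translate (path, byte offset) into a chunk index, then a handle."""
        return self.file_chunks[path][offset // CHUNK_SIZE]

md = MasterMetadata()
md.create_file("/logs/web.log")
h0 = md.add_chunk("/logs/web.log", ["cs1", "cs2", "cs3"])
h1 = md.add_chunk("/logs/web.log", ["cs2", "cs3", "cs4"])
# A read at byte 100 MB falls in the second 64 MB chunk.
assert md.handle_for_offset("/logs/web.log", 100 * 1024 * 1024) == h1
```

Note how chunk locations are kept separate from the file-to-chunk mapping: only the latter is logged and persisted, while locations are soft state the master can always rebuild by asking the chunkservers.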
File System Operations
  • Read
  • Mutation
    • Write
    • Record Append
    • Delete
    • Rename
  • Snapshot
    • Lease Revocation; “Copy on Write”
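The copy-on-write snapshot above can be sketched with reference counts: a snapshot only bumps counts and copies no data, and the first later write to a shared chunk triggers the copy. This is a hypothetical simplification (class and method names are invented); the real system also revokes the chunk's lease first, so no client keeps writing through a stale handle.

```python
class ChunkRefs:
    """Toy copy-on-write bookkeeping for snapshots (illustrative only)."""

    def __init__(self):
        self.refcount = {}   # chunk handle -> number of files referencing it
        self._next = 1

    def new_chunk(self):
        handle, self._next = self._next, self._next + 1
        self.refcount[handle] = 1
        return handle

    def snapshot(self, chunks):
        # Snapshot is cheap: bump reference counts, copy no data.
        for h in chunks:
            self.refcount[h] += 1
        return list(chunks)

    def write(self, handle):
        """Called before mutating a chunk: copy it first if it is shared."""
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            return self.new_chunk()  # writer gets a private copy (copy on write)
        return handle                # sole owner: mutate in place

refs = ChunkRefs()
h = refs.new_chunk()
snapshot = refs.snapshot([h])  # file and snapshot now share chunk h
w = refs.write(h)              # first write after the snapshot forces a copy
assert w != h and refs.write(w) == w
```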
Application Requirements
  • Prefer append operations rather than overwriting data
  • Should be able to handle duplicate records/padding
  • Must tolerate inconsistent or undefined data (regions of the file written by multiple concurrent clients)
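One way applications meet these requirements, sketched under invented conventions (this record layout is not GFS's): writers make each record self-describing with an id, a length, and a checksum, and readers skip padding, drop corrupt regions, and deduplicate records left by retried appends.

```python
import hashlib

HEADER = 16  # 8-byte record id + 4-byte length + 4-byte checksum (invented layout)

def pack_record(rec_id: int, payload: bytes) -> bytes:
    """Writers emit self-describing records: id, length, checksum, payload."""
    csum = hashlib.sha256(payload).digest()[:4]
    return rec_id.to_bytes(8, "big") + len(payload).to_bytes(4, "big") + csum + payload

def read_records(data: bytes):
    """Yield each valid record once: skip padding, corrupt regions, duplicates."""
    seen, pos = set(), 0
    while pos + HEADER <= len(data):
        if data[pos:pos + HEADER] == b"\x00" * HEADER:
            pos += HEADER          # zero filler left by a failed or padded append
            continue
        rec_id = int.from_bytes(data[pos:pos + 8], "big")
        length = int.from_bytes(data[pos + 8:pos + 12], "big")
        csum = data[pos + 12:pos + HEADER]
        payload = data[pos + HEADER:pos + HEADER + length]
        pos += HEADER + length
        if hashlib.sha256(payload).digest()[:4] != csum:
            continue               # corrupt region: ignore it
        if rec_id in seen:
            continue               # duplicate left by a retried record append
        seen.add(rec_id)
        yield rec_id, payload

# A file region containing one record, padding, a retried duplicate, and a new record:
region = (pack_record(1, b"alpha") + b"\x00" * 16
          + pack_record(1, b"alpha") + pack_record(2, b"beta"))
assert list(read_records(region)) == [(1, b"alpha"), (2, b"beta")]
```

This is the "self-contained self-verifying record" idea raised in the discussion questions: the burden of filtering duplicates and junk sits in a reader library, not in the file system.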
Master Operations
  • Locking
    • Read/Write
    • Creation; a directory doesn't maintain a list of its files, so concurrent creates in it don't conflict
  • Replica Placing/Modification
  • Garbage Collection/Deletion
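The locking scheme above (read locks on every ancestor path, a read or write lock on the leaf) can be sketched as a function that computes the lock set for one operation. The function name and operation labels here are invented for illustration.

```python
def locks_for(path: str, op: str):
    """Return (read_locks, write_locks) on namespace paths for one operation.

    Every ancestor directory name is read-locked; the full path gets a
    write lock for mutations or a read lock otherwise. Since directories
    keep no list of their files, creation never write-locks the parent,
    so concurrent creates in one directory do not conflict.
    """
    parts = path.strip("/").split("/")
    read_locks = {"/" + "/".join(parts[:i]) for i in range(1, len(parts))}
    if op in ("create", "write", "delete", "snapshot"):
        return read_locks, {path}
    read_locks.add(path)
    return read_locks, set()

# Two concurrent creates in /home/user: disjoint write locks, so both proceed.
r1, w1 = locks_for("/home/user/a", "create")
r2, w2 = locks_for("/home/user/b", "create")
assert not (w1 & w2) and not (w1 & r2) and not (w2 & r1)

# Snapshotting /home/user write-locks it, conflicting with either create's
# read lock on /home/user -- so creates wait until the snapshot finishes.
rs, ws = locks_for("/home/user", "snapshot")
assert ws & r1
```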
Fault Tolerance
  • Chunkservers restart within seconds
  • Master recovers and functions within 30–60 seconds
    • Must re-collect current chunk locations from the chunkservers
  • Replication
  • Checksums
  • Logging for Diagnostics
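The checksum bullet can be made concrete: chunkservers checksum each 64 KB block of a chunk and verify every block overlapping a read before returning data, so corruption is caught before it propagates to clients or other replicas. A minimal sketch, with CRC-32 standing in for whatever 32-bit checksum the real system uses:

```python
import zlib

BLOCK = 64 * 1024  # each 64 KB block of a chunk gets its own 32-bit checksum

def checksum_blocks(chunk: bytes) -> list:
    """Compute one CRC-32 per block, stored alongside the chunk data."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verified_read(chunk: bytes, sums: list, offset: int, length: int) -> bytes:
    """Verify every block overlapping the requested range, then return the data."""
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    for b in range(first, last + 1):
        if zlib.crc32(chunk[b * BLOCK:(b + 1) * BLOCK]) != sums[b]:
            # A real chunkserver would return an error, report the bad replica
            # to the master, and the master would re-replicate the chunk.
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]

chunk = bytes(range(256)) * 512  # a 128 KB chunk: two checksummed blocks
sums = checksum_blocks(chunk)
assert verified_read(chunk, sums, 70_000, 100) == chunk[70_000:70_100]
```

Checksumming per 64 KB block rather than per chunk keeps verification cheap for small reads: only the blocks a read actually touches are re-hashed.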
Related Work
  • AFS doesn't spread a file's data across multiple servers
  • xFS and Swift use RAID-style striping, making more efficient use of disks than whole-chunk replication
  • Frangipani has no centralized server
  • NASD stores variable-sized objects on the server, vs. GFS's fixed-size chunks
Discussion / Questions
  • How much responsibility is/should be pushed to the application? Is this a good thing?
  • Should there be better write/record append monitoring to keep track of consistency and versioning?
  • What would a “self-contained self-verifying record” look like?
  • Why aren't files treated as abstractions, with records being explicitly tracked, essentially making a big “database table” or set of rows?
  • Who maintains the list of record offsets and locations? The client or application?