
The Google File System



Presentation Transcript


  1. The Google File System • Presenter: Kim, Youngjin

  2. Introduction • Component failures are the norm rather than the exception • Multi-GB files are common • Most files are mutated by appending new data rather than by overwriting existing data

  3. Interface • Create, delete, open, close, read, and write • Snapshot • Record append

  4. Architecture

  5. Single Master • Simplifies the design • Enables chunk placement and replication decisions using global knowledge • Potential bottleneck -> minimize its involvement in reads and writes
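To illustrate how a single master can stay off the data path, here is a minimal Python sketch of the client side of a read. It assumes a 64 MB chunk size and uses hypothetical `Master` and `Client` classes (not GFS's real API): the client turns a byte offset into a chunk index, asks the master once for the chunk handle and replica locations, caches the answer, and would then read from a chunkserver directly. A real client would also expire cached entries after a limited time; this sketch omits that.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size named in the slides


@dataclass
class Master:
    """Hypothetical master: maps (path, chunk index) -> (chunk handle, replicas)."""
    chunks: dict
    lookups: int = 0  # counts how often clients had to contact the master

    def find_chunk(self, path, chunk_index):
        self.lookups += 1
        return self.chunks[(path, chunk_index)]


@dataclass
class Client:
    """Hypothetical client that caches chunk locations to spare the master."""
    master: Master
    cache: dict = field(default_factory=dict)  # (path, index) -> (handle, replicas)

    def locate(self, path, offset):
        """Return (chunk handle, replica list, offset within chunk) for a byte offset."""
        chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
        key = (path, chunk_index)
        if key not in self.cache:
            # Only a cache miss involves the master; the data itself would
            # then be read straight from one of the listed chunkservers.
            self.cache[key] = self.master.find_chunk(path, chunk_index)
        handle, replicas = self.cache[key]
        return handle, replicas, chunk_offset
```

Repeated reads within the same chunk hit the cache, so the master sees one request per chunk rather than one per read.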

  6. Chunk Size • One of the key design parameters • 64 MB • Advantages vs. disadvantages
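One advantage of large chunks is a small metadata footprint on the master. The GFS paper reports that the master keeps less than 64 bytes of metadata per 64 MB chunk; the sketch below takes that as an upper bound and, as an assumption for comparison only, applies the same per-chunk bound to a smaller hypothetical chunk size.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks
METADATA_PER_CHUNK = 64         # upper bound in bytes per chunk, per the GFS paper


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks a file occupies (ceiling division; the last chunk may be partial)."""
    return (file_size + chunk_size - 1) // chunk_size


def metadata_upper_bound(file_size, chunk_size=CHUNK_SIZE):
    """Rough upper bound on the master's per-chunk metadata for one file."""
    return chunk_count(file_size, chunk_size) * METADATA_PER_CHUNK
```

For a 1 TiB file (2^40 bytes) this gives 2^14 = 16,384 chunks and at most 1 MiB of chunk metadata; with 1 MB chunks it would be 2^20 chunks and up to 64 MiB, which is why a large chunk size helps keep all metadata in the master's memory.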

  7. Metadata • Three major types of metadata • The file and chunk namespaces • The mapping from files to chunks • The locations of each chunk's replicas

  8. Metadata (cont'd) • In-memory data structures • Chunk locations • Operation log
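The three kinds of metadata are treated differently: in GFS the namespace and file-to-chunk mapping are made persistent through the operation log, while chunk locations are not stored persistently and are instead learned from chunkservers at startup and through heartbeats. The hypothetical `MasterMetadata` class below sketches that split; it keeps the log in a Python list and omits checkpoints and remote log replication.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Hypothetical in-memory master state, modeled loosely on the slides."""
    # File namespace + file -> ordered list of chunk handles (persisted via the log).
    files: dict = field(default_factory=dict)
    # Chunk handle -> set of chunkserver names (not persisted; rebuilt from reports).
    locations: dict = field(default_factory=dict)
    # Append-only operation log (stand-in for the on-disk, replicated log).
    oplog: list = field(default_factory=list)

    def create_file(self, path):
        self.oplog.append(("create", path))  # record the mutation, then apply it
        self.files[path] = []

    def add_chunk(self, path, handle):
        self.oplog.append(("add_chunk", path, handle))
        self.files[path].append(handle)

    def report_chunks(self, server, handles):
        """Chunkserver startup/heartbeat report: the source of truth for locations."""
        for h in handles:
            self.locations.setdefault(h, set()).add(server)

    @classmethod
    def replay(cls, oplog):
        """Rebuild the file mapping after a restart; locations start out empty."""
        m = cls()
        for entry in oplog:
            if entry[0] == "create":
                m.create_file(entry[1])
            elif entry[0] == "add_chunk":
                m.add_chunk(entry[1], entry[2])
        return m
```

After a replay the master knows which chunks make up each file but must wait for chunkserver reports to learn where the replicas live.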

  9. Consistency Model • GFS has a relaxed consistency model • Write: data is written at an application-specified file offset • Record append: data is appended atomically at least once, at an offset of GFS's choosing
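Because record append is "at least once", a retried append can leave duplicate records (and padding) in a file. The GFS paper notes that applications can filter duplicates using unique identifiers in their records. The reader below is a minimal sketch of that idea, assuming a hypothetical record format of `(record_id, payload)` tuples with `None` marking padding.

```python
def read_records(raw_records):
    """Yield each record payload once, skipping padding and duplicates.

    raw_records: iterable of (record_id, payload) tuples; record_id is None
    for padding. This layout is hypothetical: GFS leaves it to the application.
    """
    seen = set()
    for record_id, payload in raw_records:
        if record_id is None:    # padding or an unusable fragment
            continue
        if record_id in seen:    # duplicate left behind by a retried append
            continue
        seen.add(record_id)
        yield payload
```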

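  10. System Interaction • Minimize the master's involvement in all operations • Leases and mutation order • The master grants a chunk lease to one replica, the primary, which picks a serial order for all mutations to the chunk; all replicas follow this order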

  11. System Interaction (cont'd)
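The lease mechanism can be sketched as follows: the primary assigns consecutive serial numbers to mutations, applies them locally, and forwards them so every secondary applies them in the same order. The `Replica` and `Primary` classes are hypothetical; the sketch ignores lease expiry (the paper gives an initial lease timeout of 60 seconds), the separate data push, and failure handling.

```python
class Replica:
    """Hypothetical chunk replica that applies mutations in serial-number order."""

    def __init__(self, name):
        self.name = name
        self.applied = []      # mutations in the order chosen by the primary
        self.next_serial = 0

    def apply(self, serial, mutation):
        # Every replica must apply mutations in the same serial order.
        if serial != self.next_serial:
            raise ValueError(f"{self.name}: expected serial {self.next_serial}, got {serial}")
        self.applied.append(mutation)
        self.next_serial += 1


class Primary(Replica):
    """The replica holding the chunk lease; it picks the order for all mutations."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries

    def mutate(self, mutation):
        serial = self.next_serial     # assign the next serial number
        self.apply(serial, mutation)  # apply locally first
        for s in self.secondaries:    # then forward with the same serial number
            s.apply(serial, mutation)
        return serial
```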

  12. Data Flow • Goal: to fully utilize each machine's network bandwidth, avoid network bottlenecks and high-latency links, and minimize the latency to push through all the data
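In GFS, data is pushed linearly along a chain of chunkservers, each forwarding to the closest machine that has not yet received it, and the transfer is pipelined. The GFS paper gives the ideal elapsed time for pushing B bytes to R replicas as B/T + R*L, where T is the network throughput and L the latency between two machines; with 100 Mbps links and L far below 1 ms, 1 MB takes about 80 ms. The sketch below computes that formula and builds a greedy forwarding chain; the `distance` function is a hypothetical stand-in (the paper estimates distance from IP addresses).

```python
def ideal_push_time(num_bytes, replicas, throughput_bytes_per_s, latency_s):
    """Ideal pipelined push time B/T + R*L (seconds), ignoring congestion."""
    return num_bytes / throughput_bytes_per_s + replicas * latency_s


def forwarding_chain(start, targets, distance):
    """Order targets so each machine forwards to the closest one not yet reached."""
    chain, current, remaining = [], start, sorted(targets)  # sorted for determinism
    while remaining:
        nxt = min(remaining, key=lambda t: distance(current, t))
        remaining.remove(nxt)
        chain.append(nxt)
        current = nxt
    return chain
```

Here machines are modeled as positions on a line, so `distance` can simply be `abs(a - b)`; note that the greedy chain can reverse direction.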

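  13. Atomic Record Appends • Traditional write vs. record append • Traditional write: the client specifies the offset at which data is written • Record append: the client specifies only the data; GFS appends it atomically at least once at an offset of its choosing and returns that offset to the client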
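On the primary, a record append works roughly like this (per the GFS paper): if the record would overflow the current chunk, the primary pads the chunk to its maximum size, has the secondaries do the same, and tells the client to retry on the next chunk; otherwise it appends at the current end. Record appends are limited to at most one fourth of the chunk size. The `record_append` helper below is a hypothetical sketch of only that decision.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB maximum chunk size


def record_append(chunk_used, record_len, chunk_size=CHUNK_SIZE):
    """Primary-side decision for a record append (hypothetical helper).

    Returns ("ok", offset) when the record fits, with offset chosen by GFS,
    or ("retry_next_chunk", padding_bytes) when the chunk must be padded.
    """
    if record_len > chunk_size // 4:
        raise ValueError("record larger than 1/4 of the chunk size")
    if chunk_used + record_len > chunk_size:
        return "retry_next_chunk", chunk_size - chunk_used
    return "ok", chunk_used
```

The size limit bounds the padding waste: at worst a quarter of a chunk is padded when a record does not fit.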

  14. Snapshot • Makes a copy of a file or a directory tree almost instantaneously • Used to checkpoint the current state before experimenting with changes that can later be committed or rolled back
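GFS implements snapshots with copy-on-write: the snapshot only duplicates metadata, so both files point at the same chunks, and a chunk is copied the first time someone writes to it while its reference count is above one. The hypothetical `SnapshotMaster` below sketches that bookkeeping; in real GFS the copy of chunk C to a new chunk C' is made locally on each chunkserver holding C, which this sketch only represents by allocating a new handle.

```python
import itertools


class SnapshotMaster:
    """Hypothetical master state showing copy-on-write snapshots of files."""

    def __init__(self):
        self.files = {}      # path -> list of chunk handles
        self.refcount = {}   # chunk handle -> number of files referencing it
        self._handles = itertools.count(1)

    def create(self, path, num_chunks):
        handles = [next(self._handles) for _ in range(num_chunks)]
        self.files[path] = handles
        for h in handles:
            self.refcount[h] = 1

    def snapshot(self, src, dst):
        # Only metadata is duplicated; both files now share the same chunks.
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def chunk_for_write(self, path, index):
        """Return the handle a client may write, copying the chunk first if shared."""
        h = self.files[path][index]
        if self.refcount[h] > 1:
            new_h = next(self._handles)  # stands in for creating chunk C'
            self.refcount[h] -= 1
            self.refcount[new_h] = 1
            self.files[path][index] = new_h
            h = new_h
        return h
```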

  15. Master Operation • Goals • Keep chunks fully replicated • Balance load across all the chunkservers • Reclaim unused storage

  16. Namespace Management and Locking • Use locks over regions of the namespace to ensure proper serialization • A read-write lock per namespace node • Allows concurrent mutations in the same directory
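In the GFS paper, an operation on /d1/d2/.../dn/leaf takes read locks on /d1, /d1/d2, ..., /d1/d2/.../dn and a read or write lock on the full pathname. File creation therefore takes only a read lock on the parent directory, which is what lets several files be created in the same directory concurrently. Locks are acquired in a consistent total order (by level in the namespace tree, then lexicographically) to prevent deadlock. The helpers `lock_plan`, `conflicts`, and `ordered` are hypothetical names for this sketch.

```python
def lock_plan(path, write):
    """Read locks on every ancestor directory, plus a read or write lock on path."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    plan = [(p, "r") for p in ancestors]
    plan.append((path, "w" if write else "r"))
    return plan


def conflicts(plan_a, plan_b):
    """Two plans conflict if they lock the same name and at least one lock is a write."""
    modes_a = dict(plan_a)
    return any(name in modes_a and "w" in (modes_a[name], mode) for name, mode in plan_b)


def ordered(plan):
    """Acquisition order: by depth in the namespace tree, then lexicographically."""
    return sorted(plan, key=lambda item: (item[0].count("/"), item[0]))
```

Creating /home/user/foo and /home/user/bar only read-locks /home/user, so they can proceed together, while snapshotting /home/user to /save/user write-locks /home/user and is serialized against both creations.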

  17. Replica Placement • Maximize data reliability and availability, and maximize network bandwidth utilization • Spread chunk replicas across racks
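A rack-aware placement policy can be sketched as below. The GFS paper lists below-average disk utilization and spreading replicas across racks among the criteria for placing new replicas; the exact policy here (lowest utilization first, at most one replica per rack) and the `place_replicas` helper are assumptions for illustration, and the paper's limit on recent creations per chunkserver is omitted.

```python
def place_replicas(servers, n=3):
    """Pick n chunkservers for a new chunk (hypothetical policy sketch).

    Prefers low disk utilization and never puts two replicas on the same rack.
    servers: list of (name, rack, disk_utilization) tuples.
    """
    chosen, used_racks = [], set()
    for name, rack, _util in sorted(servers, key=lambda s: (s[2], s[0])):
        if rack not in used_racks:
            chosen.append(name)
            used_racks.add(rack)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough racks to spread the replicas")
```

Spreading across racks means a whole-rack failure (for example a shared switch or power circuit) cannot take out every replica of a chunk.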

  18. Creation, Re-replication, Rebalancing • Chunk replicas are created for three reasons • Creation • Re-replication • Rebalancing
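For re-replication, the GFS paper prioritizes chunks that are further from their replication goal (a chunk missing two replicas before one missing one), chunks of live files over those of recently deleted files, and chunks that are blocking client progress. How these factors are combined is an assumption here: this sketch simply orders them lexicographically, and `ChunkState` and `rereplication_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    handle: int
    goal: int              # replication goal (GFS stores three replicas by default)
    live_replicas: int
    file_deleted: bool     # chunk belongs to a recently deleted file
    blocking_client: bool  # a client is waiting on this chunk


def rereplication_order(chunks):
    """Handles of under-replicated chunks, most urgent first (assumed ordering)."""
    needy = [c for c in chunks if c.live_replicas < c.goal]
    return [c.handle for c in sorted(
        needy,
        key=lambda c: (
            -(c.goal - c.live_replicas),  # more missing replicas first
            not c.blocking_client,        # then chunks blocking clients
            c.file_deleted,               # then live files before deleted ones
            c.handle,                     # deterministic tie-break
        ),
    )]
```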

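  19. Garbage Collection • On deletion, the file is renamed to a hidden name that includes the deletion timestamp • Hidden files older than 3 days (configurable) are removed during the master's regular namespace scan • Orphaned chunks (not reachable from any file) become garbage • Advantages vs. disadvantages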
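The lazy deletion scheme can be sketched as follows: delete only renames, a periodic namespace scan drops hidden files past the grace period, and when a chunkserver reports the chunks it holds, the master answers with those no longer referenced by any file. The `GCMaster` class and the `.deleted.<timestamp>` naming are hypothetical; GFS only specifies that the hidden name includes the deletion timestamp.

```python
THREE_DAYS = 3 * 24 * 3600  # default grace period in seconds (configurable)


class GCMaster:
    """Hypothetical master showing lazy deletion and orphaned-chunk detection."""

    def __init__(self):
        self.files = {}  # path -> list of chunk handles

    def delete(self, path, now):
        # Deletion only renames the file to a hidden name carrying the timestamp.
        hidden = f"{path}.deleted.{int(now)}"
        self.files[hidden] = self.files.pop(path)
        return hidden

    def namespace_scan(self, now, grace=THREE_DAYS):
        """Remove hidden files older than the grace period; their chunks become orphans."""
        for name in list(self.files):
            if ".deleted." in name:
                deleted_at = int(name.rsplit(".deleted.", 1)[1])
                if now - deleted_at > grace:
                    del self.files[name]

    def orphaned(self, reported_handles):
        """Heartbeat reply: which of a chunkserver's reported chunks it may delete."""
        live = {h for handles in self.files.values() for h in handles}
        return sorted(set(reported_handles) - live)
```

Until the scan removes it, a hidden file can still be read under its special name, which makes accidental deletions recoverable.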

  20. Fault Tolerance and Diagnosis • High availability • Fast recovery • Replication • Chunk replication • Master replication

  21. Data Integrity • Each chunkserver uses checksums • To detect corruption of stored data • Checksums are kept in memory -> fast lookup / comparison • Checksum computation is optimized for record appends
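In GFS each chunk is split into 64 KB blocks, each with a 32-bit checksum; reads verify every block they overlap, and appends incrementally update the checksum of the last partial block. The sketch below uses CRC32 from Python's `zlib` as the 32-bit checksum, which is an assumption (the slides do not name the algorithm), and the `ChecksummedChunk` class is hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # each chunk is split into 64 KB blocks


class ChecksummedChunk:
    """Hypothetical chunk replica with one 32-bit checksum (CRC32, assumed) per block."""

    def __init__(self):
        self.data = bytearray()
        self.checksums = []  # kept in memory, one per (possibly partial) block

    def append(self, payload):
        """Append data, extending the last partial block's checksum incrementally."""
        pos = 0
        while pos < len(payload):
            used = len(self.data) % BLOCK_SIZE
            take = min(BLOCK_SIZE - used, len(payload) - pos)
            piece = payload[pos:pos + take]
            if used == 0:
                self.checksums.append(zlib.crc32(piece))  # start a new block
            else:
                # Running CRC: no need to re-read the bytes already in the block.
                self.checksums[-1] = zlib.crc32(piece, self.checksums[-1])
            self.data += piece
            pos += take

    def read(self, offset, length):
        """Verify every block overlapping the range before returning data."""
        if offset < 0 or offset + length > len(self.data):
            raise ValueError("read outside the chunk")
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            block = bytes(self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
            if zlib.crc32(block) != self.checksums[b]:
                raise IOError(f"checksum mismatch in block {b}")
        return bytes(self.data[offset:offset + length])
```

The incremental update is what makes checksumming cheap for append-heavy workloads: only the new bytes are fed to the checksum function.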
