The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung


정학수, 최주영

Outline
  • Introduction
  • Design Overview
  • System Interactions
  • Master Operation
  • Fault Tolerance and Diagnosis
  • Conclusions
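Introduction
  • GFS was designed to meet the demands of Google’s data processing needs.
  • Design emphasis
    • Component failures are the norm rather than the exception
    • Files are huge by traditional standards
    • Most files are mutated by appending new data rather than overwriting existing data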
Design Overview
  • Built from many inexpensive commodity components that often fail
  • Stores large files, typically 100 MB or larger
  • Reads are mostly large streaming reads and small random reads
  • Writes are mostly large, sequential writes that append data to files
  • Atomicity with minimal synchronization overhead is essential when many clients append to the same file concurrently
  • High sustained bandwidth is more important than low latency
  • Files are organized hierarchically in directories and identified by pathnames
  • GFS is designed for system-to-system interaction, not user-to-system interaction
Chunk Size
  • Large chunk size – 64MB
    • Advantages
      • Reduce client-master interaction
      • Reduce network overhead
      • Reduce the size of metadata
    • Disadvantages
      • Hot spots: chunkservers can become overloaded when many clients access the same small file
Metadata
  • All metadata is kept in the master’s memory
  • Less than 64 bytes of metadata per chunk
  • Types
    • File and chunk namespace
    • File to chunk mapping
    • Location of each chunk’s replicas
Metadata (Cont’d)
  • In-memory data structures
    • Master operations are fast
    • Periodic scans of the entire state are easy and efficient
  • Operation log
    • Contains a historical record of critical metadata changes
    • Replicated on multiple remote machines
    • The master responds to a client only after the log record has been flushed to disk both locally and remotely
    • Recovery works by replaying the operation log
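The log-then-apply discipline and recovery by replay can be sketched as follows. This is a minimal illustration, not GFS code: the `Master` class, the record format, and the three operation names are assumptions made for the example, and an in-memory list stands in for the durable, replicated log.

```python
import json


class Master:
    """Toy master whose metadata can be rebuilt from an operation log."""

    def __init__(self):
        self.files = {}  # pathname -> list of chunk handles
        self.log = []    # serialized records (stand-in for durable, replicated storage)

    def _apply(self, record):
        # Apply one metadata change to the in-memory state.
        if record["op"] == "create":
            self.files[record["path"]] = []
        elif record["op"] == "add_chunk":
            self.files[record["path"]].append(record["handle"])
        elif record["op"] == "delete":
            self.files.pop(record["path"], None)

    def mutate(self, record):
        # Append to the log first, then apply, and only then answer the client.
        self.log.append(json.dumps(record))
        self._apply(record)
        return "ok"

    @classmethod
    def recover(cls, log):
        # Rebuild the in-memory state by replaying every logged record in order.
        master = cls()
        for line in log:
            master._apply(json.loads(line))
        master.log = list(log)
        return master
```

Replaying the log of one master into a fresh instance should yield the same `files` mapping, which is the property recovery relies on.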
Consistency Model
  • Consistent
    • All clients always see the same data, regardless of which replica they read from
  • Defined
    • The region is consistent, and clients see what the mutation wrote in its entirety
  • Inconsistent
    • Different clients may see different data at different times
Leases and Mutation Order
  • Leases
    • Used to maintain a consistent mutation order across replicas while minimizing management overhead at the master
    • The master grants a chunk lease to one of the replicas, which becomes the primary
    • The primary picks a serial order for all mutations to the chunk
    • All replicas follow this order when applying mutations
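A minimal sketch of how a primary can impose one mutation order on every replica. The `Replica` and `Primary` classes and the serial-number check are illustrative assumptions; the sketch ignores lease expiry, failures, and networking.

```python
class Replica:
    """Applies mutations strictly in the serial order chosen by the primary."""

    def __init__(self):
        self.data = []

    def apply(self, serial, mutation):
        # Reject anything that is not the next mutation in sequence.
        if serial != len(self.data):
            raise ValueError(f"expected serial {len(self.data)}, got {serial}")
        self.data.append(mutation)


class Primary(Replica):
    """The lease holder: assigns serial numbers and forwards them to secondaries."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, mutation):
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, mutation)          # apply locally first
        for secondary in self.secondaries:    # then forward in the same order
            secondary.apply(serial, mutation)
        return serial
```

Because every replica applies mutations in the serial order the primary picked, all replicas end up with identical contents even when requests arrive from many clients.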
Data Flow
  • Fully utilize each machine’s network bandwidth
    • Decouple control flow from data flow
  • Avoid network bottlenecks and high-latency links
    • Each machine forwards the data to the closest machine that has not yet received it
  • Minimize latency
    • Pipeline the data transfer
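The benefit of pipelining can be made concrete with the ideal transfer time given in the GFS paper, B/T + RL, where B is the number of bytes, T the network throughput, R the number of replicas, and L the latency between two machines. The function name and the 0.5 ms latency below are assumptions chosen for illustration; the paper states only that L is far below 1 ms on 100 Mbps links.

```python
def ideal_transfer_time(num_bytes, replicas, throughput_bps, latency_s):
    """Ideal pipelined transfer time in seconds: B/T + R*L."""
    return num_bytes * 8 / throughput_bps + replicas * latency_s


# 1 MB pushed to 3 replicas over 100 Mbps links, assuming 0.5 ms per hop.
seconds = ideal_transfer_time(1_000_000, 3, 100e6, 0.0005)
print(f"{seconds * 1000:.1f} ms")
```

The transfer term (80 ms here) dominates, so adding replicas to the chain costs only one extra hop of latency each.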
Atomic Record Appends
  • Record append: an atomic append operation
    • The client specifies only the data
    • GFS appends the data at an offset of its own choosing and returns that offset to the client
    • Many clients can append to the same file concurrently
      • Such files often serve as multiple-producer/single-consumer queues
      • Or contain merged results from many different clients
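A sketch of how the system, rather than the client, chooses the append offset. Following the GFS paper, a record that does not fit in the current chunk causes the rest of that chunk to be padded and the record to start at the next chunk boundary. The `AppendOnlyFile` class is hypothetical and tracks offsets only; it stores no data and ignores replication and retries.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in GFS


class AppendOnlyFile:
    """Tracks where each appended record lands; the caller never picks the offset."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.length = 0  # current end of file, in bytes

    def record_append(self, record: bytes) -> int:
        if len(record) > self.chunk_size:
            raise ValueError("record larger than a chunk")
        used = self.length % self.chunk_size
        if used + len(record) > self.chunk_size:
            # Pad out the current chunk so the record never straddles a boundary.
            self.length += self.chunk_size - used
        offset = self.length
        self.length += len(record)
        return offset
```

With a toy chunk size of 10 bytes, two 6-byte records land at offsets 0 and 10: the second one skips the 4 bytes left in the first chunk.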


Snapshot
  • Makes a copy of a file or a directory tree
  • Implemented with standard copy-on-write techniques

Namespace Management and Locking
  • Namespace
    • Lookup table mapping full pathname to metadata
  • Locking
    • Multiple operations can be active at once; locks over regions of the namespace ensure proper serialization
    • Concurrent mutations in the same directory are allowed
    • Locks are acquired in a consistent total order to prevent deadlock
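The locking scheme can be sketched as below. As described in the GFS paper, an operation takes read locks on every ancestor directory name and a read or write lock on the full pathname, and locks are acquired ordered by level in the namespace tree and lexicographically within a level. The function names `locks_for` and `acquisition_order` are assumptions for this example, and no real locking is performed.

```python
def locks_for(path, write=True):
    """Return the (pathname, mode) locks one operation on `path` needs."""
    parts = path.strip("/").split("/")
    # Read locks on every ancestor: /d1, /d1/d2, ...
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = [(p, "read") for p in ancestors]
    # Read or write lock on the full pathname itself.
    locks.append(("/" + "/".join(parts), "write" if write else "read"))
    return locks


def acquisition_order(lock_sets):
    """Merge lock sets and sort by tree level, then lexicographically."""
    merged = {}
    for locks in lock_sets:
        for path, mode in locks:
            # A write lock supersedes a read lock on the same name.
            if mode == "write" or path not in merged:
                merged[path] = mode
    return sorted(merged.items(), key=lambda kv: (kv[0].count("/"), kv[0]))
```

Creating `/home/user/foo` needs only a read lock on `/home/user`, which is why several files can be created in the same directory at once, while a snapshot of `/home/user` takes a write lock on that name and is therefore serialized against those creations.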
Replica Placement
  • Maximize data reliability and availability
  • Maximize network bandwidth utilization
    • Spread replicas across machines
    • Spread chunk replicas across racks as well, so a chunk survives the failure of an entire rack
Creation, Re-replication, Rebalancing
  • Creation
    • Triggered by demand from writers
  • Re-replication
    • Triggered when the number of available replicas falls below a user-specified goal
  • Rebalancing
    • Replicas are moved periodically for better disk space utilization and load balancing
Garbage Collection
  • Lazy reclamation
    • The master logs the deletion immediately
    • The file is renamed to a hidden name that includes the deletion timestamp
      • Hidden files are removed 3 days later
      • Until then, a file can be undeleted by renaming it back to its normal name
  • Regular scan
    • Heartbeat messages are exchanged with each chunkserver
    • Orphaned chunks are identified and their metadata is erased
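A toy version of lazy deletion and orphan detection. The `Namespace` class, the `.deleted.` naming scheme, and the explicit `now` arguments are assumptions made for the example; only the three-day grace period comes from the slides.

```python
GRACE_SECONDS = 3 * 24 * 3600  # hidden files are kept for three days


class Namespace:
    def __init__(self):
        self.files = {}   # name -> list of chunk handles
        self.hidden = {}  # hidden name -> (original name, deletion time)

    def delete(self, name, now):
        # Deletion is just a rename to a hidden name carrying the timestamp.
        hidden_name = f".deleted.{name}.{int(now)}"
        self.files[hidden_name] = self.files.pop(name)
        self.hidden[hidden_name] = (name, now)
        return hidden_name

    def undelete(self, hidden_name):
        # Renaming back restores the file as long as it has not been reclaimed.
        name, _ = self.hidden.pop(hidden_name)
        self.files[name] = self.files.pop(hidden_name)

    def scan(self, now):
        # Regular scan: drop hidden files older than the grace period.
        for hidden_name, (_, deleted_at) in list(self.hidden.items()):
            if now - deleted_at > GRACE_SECONDS:
                del self.files[hidden_name]
                del self.hidden[hidden_name]

    def orphaned(self, reported_handles):
        # Chunks a chunkserver reports that no file references any more.
        live = {h for handles in self.files.values() for h in handles}
        return set(reported_handles) - live
```

A chunk becomes an orphan only after the scan has dropped its hidden file, so nothing is reclaimed on the chunkserver during the grace period.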
Stale Replica Detection
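  • The master maintains a version number for each chunk
    • Distinguishes up-to-date replicas from stale ones that missed mutations while their chunkserver was down
  • Stale replicas are removed during regular garbage collection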
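A minimal sketch of version-based staleness detection. In the GFS paper the master increases the chunk version whenever it grants a new lease and informs the up-to-date replicas; a replica whose chunkserver is down misses the increase. The `ChunkVersions` class and its method names are assumptions for this example.

```python
class ChunkVersions:
    """Tracks the master's version of one chunk and the version each replica holds."""

    def __init__(self, servers):
        self.master_version = 1
        self.replica_version = {s: 1 for s in servers}

    def grant_lease(self, up_servers):
        # Bump the version and record it on every replica that is reachable.
        self.master_version += 1
        for server in up_servers:
            self.replica_version[server] = self.master_version

    def stale(self):
        # Any replica behind the master's version missed at least one lease grant.
        return sorted(
            s for s, v in self.replica_version.items() if v < self.master_version
        )
```

If `cs3` is down while a lease is granted to the other replicas, it reports an old version when it comes back and is listed as stale.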
High Availability
  • Fast recovery
    • The master and chunkservers restore their state and start in seconds
  • Chunk replication
    • Different replication levels can be set for different parts of the file namespace
    • The master clones existing replicas as chunkservers go offline or as corrupted replicas are detected through checksum verification
High Availability
  • Master replication
    • The operation log and checkpoints are replicated on multiple machines
    • If the master machine or its disk fails
      • Monitoring infrastructure outside GFS starts a new master process
    • Shadow masters
      • Provide read-only access when the primary master is down
Data Integrity
  • Checksum
    • Used to detect corruption of stored data
    • One checksum for every 64 KB block in each chunk
    • Kept in memory and stored persistently with logging
  • Read
    • The chunkserver verifies the checksum before returning any data
  • Write
    • Append
      • Incrementally update the checksum for the last partial block
      • Compute new checksums for any new blocks filled by the append
Data Integrity (Cont’d)
  • Write
    • Overwrite
      • Read and verify the first and last blocks of the range being overwritten, then write
      • Compute and record the new checksums
  • During idle periods
    • Chunkservers scan and verify inactive chunks
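Per-block checksumming and verification on read can be sketched as follows. The slides do not name the checksum function, so the use of `zlib.crc32` here is an assumption, as are the function names; the 64 KB block size comes from the slides.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as on the slide


def block_checksums(chunk: bytes, block_size=BLOCK_SIZE):
    """One checksum per block of the chunk (CRC-32 is an assumed choice)."""
    return [
        zlib.crc32(chunk[i:i + block_size])
        for i in range(0, len(chunk), block_size)
    ]


def read_verified(chunk, checksums, offset, length, block_size=BLOCK_SIZE):
    """Verify every block overlapping the requested range before returning data."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    for b in range(first, last + 1):
        block = chunk[b * block_size:(b + 1) * block_size]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

Only the blocks that overlap the requested range are checked, so a corrupted block elsewhere in the chunk does not fail an unrelated read; the idle-time scan is what catches corruption in rarely read chunks.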
Micro-benchmarks
  • GFS cluster
    • 1 master
    • 2 master replicas
    • 16 chunkservers
    • 16 clients
  • All 19 server machines are connected to one switch
  • All 16 client machines are connected to the other switch
  • The two switches are connected by a 1 Gbps link
Micro-benchmarks (Cont’d)

Figure 3: Aggregate Throughputs. Top curves show theoretical limits imposed by our network topology. Bottom curves show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in some cases because of low variance in measurements.

Real World Clusters

Table 2: Characteristics of Two GFS Clusters


Real World Clusters

Table 3: Performance Metrics for Two GFS Clusters

Real World Clusters
  • In cluster B
    • Killed a single chunkserver containing 15,000 chunks (600 GB of data)
      • All chunks were restored in 23.2 minutes
      • Effective replication rate of 440 MB/s
    • Killed two chunkservers, each holding roughly 16,000 chunks (660 GB of data)
      • 266 chunks were left with only a single replica
      • These chunks were cloned at a higher priority
      • All were restored to at least 2x replication within 2 minutes
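The reported replication rate in the single-chunkserver experiment can be checked with a short calculation. The helper name is hypothetical, and the conversion assumes 1 GB = 1024 MB.

```python
def replication_rate_mb_per_s(gigabytes, minutes):
    """Average restore rate in MB/s, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / (minutes * 60)


# 600 GB restored in 23.2 minutes.
rate = replication_rate_mb_per_s(600, 23.2)
print(f"{rate:.0f} MB/s")
```

The result is consistent with the roughly 440 MB/s quoted on the slide.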
Conclusions
  • GFS demonstrates the qualities essential for supporting large-scale data processing workloads
    • Treat component failure as the norm
    • Optimize for huge files
    • Extend and relax the standard file system interface
  • Fault tolerance is provided by
    • Constant monitoring
    • Replicating crucial data
    • Fast and automatic recovery
    • Checksums to detect data corruption
  • High aggregate throughput