
The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Google*

Presented by 정학수, 최주영


Outline

  • Introduction

  • Design Overview

  • System Interactions

  • Master Operation

  • Fault Tolerance and Diagnosis

  • Conclusions


Introduction

  • GFS was designed to meet the demands of Google’s data processing needs.

  • Key observations that shaped the design

    • Component failures are the norm rather than the exception

    • Files are huge by traditional standards

    • Most files are mutated by appending new data rather than overwriting existing data



Assumptions

  • Built from many inexpensive commodity components that often fail

  • Stores a modest number of large files, typically 100 MB or larger

  • Large streaming reads, small random reads

  • Large, sequential writes that append data to files.

  • Many clients append concurrently to the same file, so atomicity with minimal synchronization overhead is essential

  • High sustained bandwidth is more important than low latency


Interface

  • Files are organized hierarchically in directories and identified by pathnames

  • Supports the usual operations (create, delete, open, close, read, write) plus two special operations: snapshot and record append


Architecture

  • A GFS cluster consists of a single master and multiple chunkservers, accessed by many clients

  • Designed for system-to-system interaction, not for user-to-system interaction



Chunk Size

  • Large chunk size – 64MB

    • Advantages

      • Reduce client-master interaction (see the sketch after this list)

      • Reduce network overhead

      • Reduce the size of metadata

    • Disadvantages

      • Hot spots: a small file occupying only a few chunks can be overwhelmed when many clients access it
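
To make the first advantage concrete, here is a minimal Python sketch (illustrative names, not GFS code) of how a client can map a byte range onto chunk indices; with 64 MB chunks, even a very large sequential read needs only a handful of master lookups, which can be batched and cached.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunk_index_for_offset(byte_offset: int) -> int:
    """Return the index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE

def chunks_for_range(offset: int, length: int) -> range:
    """All chunk indices a read/write of `length` bytes starting at `offset` touches."""
    first = chunk_index_for_offset(offset)
    last = chunk_index_for_offset(offset + length - 1)
    return range(first, last + 1)

# A 1 GB sequential read spans only 16 chunks -> at most 16 master lookups,
# which the client can batch into one request and cache afterwards.
print(list(chunks_for_range(0, 1024 * 1024 * 1024)))  # [0, 1, ..., 15]
```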


Metadata

  • All metadata is kept in master’s memory

  • Less than 64 bytes of metadata per 64 MB chunk

  • Types

    • File and chunk namespace

    • File to chunk mapping

    • Location of each chunk’s replicas
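
A minimal Python sketch of the three kinds of in-memory metadata listed above; all class and field names are illustrative, not taken from any GFS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkInfo:
    version: int = 0                                             # used later for stale-replica detection
    replica_locations: List[str] = field(default_factory=list)   # chunkserver addresses

@dataclass
class MasterMetadata:
    # 1. File and chunk namespace (full pathname -> file attributes, simplified here)
    namespace: Dict[str, dict] = field(default_factory=dict)
    # 2. Mapping from each file to the ordered list of its chunk handles
    file_to_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # 3. Location of each chunk's replicas (not persisted; polled from chunkservers)
    chunks: Dict[int, ChunkInfo] = field(default_factory=dict)

meta = MasterMetadata()
meta.namespace["/logs/web.0"] = {"owner": "crawler"}
meta.file_to_chunks["/logs/web.0"] = [1001, 1002]
meta.chunks[1001] = ChunkInfo(version=3, replica_locations=["cs01", "cs07", "cs19"])
```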


Metadata (Cont’d)

  • In-Memory data structure

    • Master operations are fast

    • Easy and efficient periodic scans of the entire state

  • Operation log

    • Contains a historical record of critical metadata changes

    • Replicated on multiple remote machines

    • The master responds to a client only after the log record has been flushed locally and remotely

    • Recovery by replaying the operation log
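
A minimal sketch of the operation-log discipline described above, assuming a simple JSON-lines log file and a placeholder replication callback; the real master additionally batches log records and checkpoints its state.

```python
import json
import os

class OperationLog:
    def __init__(self, path: str, remote_replicas=()):
        self.file = open(path, "a+", encoding="utf-8")
        self.remote_replicas = remote_replicas    # callables that persist a record remotely

    def append(self, record: dict) -> None:
        line = json.dumps(record)
        self.file.write(line + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())              # force the record to local disk
        for replicate in self.remote_replicas:
            replicate(line)                       # replicate to remote machines
        # Only after this point does the master respond to the client.

    def replay(self, apply) -> None:
        """Recover master state by replaying every logged mutation in order."""
        self.file.seek(0)
        for line in self.file:
            apply(json.loads(line))

log = OperationLog("oplog.jsonl")
log.append({"op": "create", "path": "/logs/web.0"})
log.replay(print)
```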


Consistency Model

  • Consistent

    • All clients will always see the same data, regardless of which replicas they read from

  • Defined

    • Consistent, and clients see what the mutation has written in its entirety

  • Inconsistent

    • Different clients may see different data at different times



Leases and Mutation Order

  • Leases

    • To maintain a consistent mutation order across replicas and minimize management overhead

    • The master grants a chunk lease to one of the replicas, which becomes the primary

    • The primary picks a serial order for all mutations to the chunk

    • All replicas follow this order when applying mutations
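
A minimal sketch of the ordering role of the primary replica: it assigns consecutive serial numbers and every replica applies mutations in that order. Class names and the print-based apply step are illustrative only; lease grant and expiry are omitted.

```python
import itertools

class PrimaryReplica:
    def __init__(self, secondaries):
        self.serial = itertools.count(1)   # serial numbers chosen by the primary
        self.secondaries = secondaries     # replica objects reachable over the network

    def mutate(self, mutation) -> int:
        n = next(self.serial)              # pick a position in the serial order
        self.apply(n, mutation)            # apply locally first
        for replica in self.secondaries:   # forward the order, not the data:
            replica.apply(n, mutation)     # the data was already pushed separately
        return n

    def apply(self, n, mutation):
        print(f"primary applies mutation #{n}: {mutation}")

class SecondaryReplica:
    def apply(self, n, mutation):
        print(f"secondary applies mutation #{n}: {mutation}")

primary = PrimaryReplica([SecondaryReplica(), SecondaryReplica()])
primary.mutate("write 4 KB at offset 0")
primary.mutate("write 4 KB at offset 4096")
```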


Leases and Mutation Order (Cont’d)


Data Flow

  • Fully utilize network bandwidth

    • Decouple control flow and data flow

  • Avoid network bottlenecks and high-latency links

    • Each machine forwards the data to the closest machine that has not yet received it

  • Minimize latency

    • Pipelining the data transfer
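
A rough sketch of the chained, piece-by-piece data push; in the real system each hop forwards a piece as soon as it starts arriving, so transfers on all links overlap. The chain, machine names, and piece size here are illustrative.

```python
def push_data(data: bytes, chain: list, piece_size: int = 64 * 1024) -> None:
    """Send `data` along a pre-computed chain of chunkservers, piece by piece."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    for piece in pieces:
        # In a real deployment each hop forwards the piece while still receiving
        # the next one, so transfers on every link proceed concurrently (pipelining).
        for sender, receiver in zip(chain, chain[1:]):
            print(f"{sender} -> {receiver}: {len(piece)} bytes")

# The client pushes to the closest replica first; each replica then forwards
# to whichever remaining replica is closest to it.
push_data(b"x" * (256 * 1024), ["client", "cs-rack1-a", "cs-rack1-b", "cs-rack2-c"])
```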


Atomic Record Appends

  • Record append: an atomic append operation

    • Client specifies only the data

    • GFS appends the data at an offset of GFS’s choosing and returns that offset to the client

    • Many clients append to the same file concurrently

      • Such files often serve as multiple-producer/single-consumer queues

      • or contain merged results from many different clients
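
A minimal sketch of the record-append contract: the caller supplies only the data, and the system returns the offset it chose. Padding at chunk boundaries and retry-induced duplicates, which the paper discusses, are omitted here.

```python
class AppendOnlyChunk:
    CHUNK_SIZE = 64 * 1024 * 1024

    def __init__(self):
        self.data = bytearray()

    def record_append(self, record: bytes) -> int:
        if len(self.data) + len(record) > self.CHUNK_SIZE:
            raise ValueError("chunk full: the client retries on the next chunk")
        offset = len(self.data)        # offset chosen by the file system, not the client
        self.data += record
        return offset                  # returned so the client can later read its record

chunk = AppendOnlyChunk()
# Many producers can append concurrently; each learns where its record landed.
offsets = [chunk.record_append(f"result-{i}".encode()) for i in range(3)]
print(offsets)
```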


Snapshot

  • Makes a copy of a file or a directory tree almost instantaneously

  • Implemented with standard copy-on-write techniques
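
A minimal sketch of chunk-level copy-on-write with reference counts, as used for snapshots; the structures and names are illustrative.

```python
class CopyOnWriteStore:
    def __init__(self):
        self.chunks = {}      # chunk handle -> bytes
        self.refcount = {}    # chunk handle -> number of files referencing it
        self.next_handle = 0

    def create_chunk(self, data: bytes) -> int:
        h = self.next_handle
        self.next_handle += 1
        self.chunks[h], self.refcount[h] = data, 1
        return h

    def snapshot(self, handles):
        """Snapshotting a file just bumps reference counts; no chunk data is copied."""
        for h in handles:
            self.refcount[h] += 1
        return list(handles)

    def write(self, h: int, new_data: bytes) -> int:
        """Copy the chunk lazily, only when another file still references it."""
        if self.refcount[h] > 1:
            self.refcount[h] -= 1
            h = self.create_chunk(self.chunks[h])   # the deferred (copy-on-write) copy
        self.chunks[h] = new_data                   # mutate the now-private chunk
        return h

store = CopyOnWriteStore()
h = store.create_chunk(b"original")
snap = store.snapshot([h])            # the snapshot shares the chunk
new_h = store.write(h, b"modified")   # triggers the copy; the snapshot still sees b"original"
```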



Namespace Management and Locking

  • Namespace

    • Lookup table mapping full pathname to metadata

  • Locking

    • Locks over regions of the namespace ensure proper serialization when multiple operations are active

    • Allow concurrent mutations in the same directory

    • Deadlock is prevented by acquiring locks in a consistent total order
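
A minimal sketch of pathname locking, assuming per-path locks acquired in a consistent (sorted) order; the read/write lock distinction from the paper is elided for brevity.

```python
import threading

class NamespaceLocks:
    def __init__(self):
        self.locks = {}   # pathname -> lock (read/write distinction elided)

    def _lock_for(self, path: str) -> threading.Lock:
        return self.locks.setdefault(path, threading.Lock())

    def paths_to_lock(self, path: str):
        parts = path.strip("/").split("/")
        ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
        return sorted(ancestors + [path])   # consistent total order prevents deadlock

    def acquire(self, path: str):
        acquired = [self._lock_for(p) for p in self.paths_to_lock(path)]
        for lock in acquired:
            lock.acquire()
        return acquired

ns = NamespaceLocks()
# In GFS, creating /home/user/foo takes read locks on /home and /home/user and a
# write lock on the leaf, so clients can create different files in one directory concurrently.
print(ns.paths_to_lock("/home/user/foo"))
```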


Replica Placement

  • Maximize data reliability and availability

  • Maximize network bandwidth utilization

    • Spread replicas across machines

    • Spread chunk replicas across the racks


Creation, Re-replication, Rebalancing

  • Creation

    • Chunks are created on demand by writers

  • Re-replication

    • Triggered when the number of available replicas falls below a user-specified goal; chunks furthest below their goal are repaired first (see the sketch after this list)

  • Rebalancing

    • For better disk space and load balancing
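
One way to realize the re-replication priority mentioned above (chunks furthest below their replication goal first), sketched with a heap; the function name and goal value are illustrative.

```python
import heapq

def rereplication_queue(chunks, replication_goal: int = 3):
    """chunks: dict of chunk handle -> number of currently available replicas."""
    heap = []
    for handle, live in chunks.items():
        missing = replication_goal - live
        if missing > 0:
            heapq.heappush(heap, (-missing, handle))   # most-missing chunks first
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# A chunk down to a single replica is repaired before one that is missing only one copy.
print(rereplication_queue({101: 2, 102: 1, 103: 3}))   # [102, 101]
```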


Garbage Collection

  • Lazy reclaim

    • Log deletion immediately

    • Rename to a hidden name with deletion timestamp

      • Removed during a later namespace scan if it has been hidden for more than three days (the interval is configurable)

      • Until then, it can be undeleted by renaming it back to a normal name

  • Regular scan

    • Heartbeat message exchange with each chunkserver

    • The master identifies orphaned chunks (those not reachable from any file) and erases their metadata; chunkservers then delete their replicas of such chunks
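
A minimal sketch of lazy reclamation, assuming a hidden-name convention that embeds the deletion timestamp; the exact naming scheme and structures are illustrations, not GFS's own.

```python
import time

HIDDEN_PREFIX = ".deleted."
RETENTION_SECONDS = 3 * 24 * 3600          # three days, assumed configurable

def delete(namespace: dict, path: str) -> None:
    """Deletion is just a rename to a hidden, timestamped name; data stays put."""
    hidden = f"{HIDDEN_PREFIX}{int(time.time())}.{path.strip('/').replace('/', '_')}"
    namespace[hidden] = namespace.pop(path)

def scan(namespace: dict, now: float) -> None:
    """Regular scan: drop hidden entries older than the retention window."""
    for name in list(namespace):
        if name.startswith(HIDDEN_PREFIX):
            stamp = int(name.split(".")[2])
            if now - stamp > RETENTION_SECONDS:
                del namespace[name]        # metadata gone; its chunks become orphans

def orphaned_chunks(reported_by_chunkserver, known_to_master):
    """Chunks a chunkserver reports in a heartbeat but the master no longer knows about."""
    return set(reported_by_chunkserver) - set(known_to_master)
```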


Stale Replica Detection

  • The master maintains a chunk version number for each chunk, increased whenever it grants a new lease

    • Detect stale replicas

  • Remove stale replicas in regular garbage collection
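
A minimal sketch of version-based staleness checking; the server names and report format are illustrative.

```python
def is_stale(master_version: int, replica_version: int) -> bool:
    """A replica reporting an older version missed a mutation while its server was down."""
    return replica_version < master_version

master_version = 7
reports = {"cs01": 7, "cs07": 7, "cs19": 6}     # cs19 was down during a mutation

stale = [cs for cs, v in reports.items() if is_stale(master_version, v)]
print(stale)   # ['cs19']; removed in the next regular garbage-collection pass
```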


FAULT TOLERANCE AND DIAGNOSIS


High Availability

  • Fast recovery

    • Restore state and start in seconds

  • Chunk replication

    • Different replication levels for different parts of the file namespace

    • The master clones existing replicas as needed to keep each chunk fully replicated as chunkservers go offline or detect corrupted replicas through checksum verification


High Availability (Cont’d)

  • Master replication

    • The operation log and checkpoints are replicated on multiple machines

    • If the master machine or its disk fails

      • Monitoring infrastructure outside GFS starts a new master process elsewhere

    • Shadow master

      • Read-only access when primary master is down


Data Integrity

  • Checksum

    • To detect corruption

    • One checksum for every 64 KB block in each chunk

    • Kept in memory and stored persistently with logging, separate from user data

  • Read

    • The chunkserver verifies the checksums of blocks covering the read range before returning any data

  • Write

    • Append

      • Incrementally update the checksum for the last partial block

      • Compute new checksums for any new blocks filled by the append


Data Integrity (Cont’d)

  • Write

    • Overwrite

      • Read and verify the first and last blocks of the range being overwritten, then perform the write

      • Compute and record new checksums

  • During idle periods

    • Chunkservers scan and verify inactive chunks
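
A minimal sketch of per-64 KB-block checksumming, using CRC-32 as a stand-in checksum function: checksums are verified on read, and only the trailing block(s) are recomputed on append. The class and its layout are illustrative, not the chunkserver's actual on-disk format.

```python
import zlib

BLOCK_SIZE = 64 * 1024

class ChecksummedChunk:
    def __init__(self):
        self.data = bytearray()
        self.checksums = []                       # one entry per 64 KB block

    def _recompute(self, block_index: int) -> None:
        start = block_index * BLOCK_SIZE
        block = bytes(self.data[start:start + BLOCK_SIZE])
        if block_index == len(self.checksums):
            self.checksums.append(zlib.crc32(block))
        else:
            self.checksums[block_index] = zlib.crc32(block)

    def append(self, record: bytes) -> None:
        self.data += record
        first_dirty = (len(self.data) - len(record)) // BLOCK_SIZE
        last = (len(self.data) - 1) // BLOCK_SIZE
        for i in range(first_dirty, last + 1):    # only the last block(s) are touched
            self._recompute(i)

    def read_block(self, block_index: int) -> bytes:
        start = block_index * BLOCK_SIZE
        block = bytes(self.data[start:start + BLOCK_SIZE])
        if zlib.crc32(block) != self.checksums[block_index]:
            raise IOError("checksum mismatch: report to the master, read another replica")
        return block
```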



Micro-benchmarks

  • GFS cluster

    • 1 master

    • 2 master replicas

    • 16 chunkservers

    • 16 clients

  • Server machines connected to one switch

  • Client machines connected to the other switch

  • The two switches are connected by a 1 Gbps link


Micro-benchmarks

Figure 3: Aggregate Throughputs. Top curves show theoretical limits imposed by our network topology. Bottom curves show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in some cases because of low variance in measurements.


Real World Clusters

Table 2: Characteristics of two GFS clusters


Real World Clusters

Table 3: Performance metrics for two GFS clusters


Real World Clusters

  • In cluster B

    • Killed a single chunkserver containing about 15,000 chunks (600 GB of data)

      • All chunks were restored within 23.2 minutes

      • Effective replication rate of about 440 MB/s

    • Killed two chunkservers, each with roughly 16,000 chunks (660 GB of data)

      • 266 chunks were left with only a single replica

      • These were re-replicated at a higher priority

      • All restored to at least 2x replication within 2 minutes
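
A quick arithmetic check of the single-chunkserver figures above (assuming 1 GB = 1024 MB):

```python
# 600 GB re-replicated in 23.2 minutes works out to roughly 440 MB/s in aggregate.
restored_mb = 600 * 1024                # 600 GB expressed in MB
elapsed_s = 23.2 * 60                   # 23.2 minutes in seconds
print(round(restored_mb / elapsed_s))   # ~441 MB/s
```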


Conclusions

  • Demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware

    • Treat component failure as the norm

    • Optimize for huge files

    • Extend and relax the standard file system interface

  • Fault tolerance is provided by

    • Constant monitoring

    • Replicating crucial data

    • Fast and automatic recovery

    • Checksumming to detect data corruption

  • High aggregate throughput

