Lecture 24: GFS

Lecture 24: GFS

Google File System • Why yet another file system? • Who are the “clients”? • Design objectives • Architecture • Cache consistency • Location transparent/independent? • Stateful or stateless?

Google File System • Key background: • New workload => new filesystem (why?) • Extreme scale: 100 TB over 1000s of disks on >1000 machines • New API • Architecture? • Who are the clients? • Four problems to solve: • Fault tolerance: this many components ensures regular failures. Must have automatic recovery. • Huge files (are they?) • Most common operation is append, not random writes • Most files are write once, and read-only after that • web snapshots, intermediate files in a pipeline, archival files • implies that streaming is much more important than block caching (and LRU would be a bad choice) • Customize the API to enable optimization and flexibility

GFS: New Workload • Few million large files, rather than billions of small files • Large streaming reads, random reads • Large streaming writes, very rare random writes • Concurrency for append is critical • also producer/consumer concurrency • Focus on throughput (read/write bandwidth) not latency • Why?

GFS Architecture

GFS Architecture • single master, multiple chunkservers, multiple clients • fixed-size chunks (giant blocks) (64MB) • 64-bit ids for each chunk • clients read/write chunks directly from chunkservers • unit of replication? • master maintains all metadata • namespace and access control • map from filenames to chunk ids • current locations for each chunk • no caching for chunks • metadata is cached at clients

GFS Master • Single master: • Claim: simple, but good enough • Enables good chuck placement (centralized decision) • Scalability is a concern. • What is the GFS solution? • Clients cache (file name -> chunk id, replica locations), this expires eventually • Large chunk size reduces master RPC interaction and space overhead for metadata • All metadata is in memory (why?) • metadata is 64B per chunk

Question: Does GFS implement Location Transparency? What about Location Independence?

Cache Consistency in GFS • Caching? • Of data? • Of metadata? • Metadata changes? • Namespace changes are atomic and serializable (easy since they go through one place: the master) • Replicas: • “defined” if it reflects a mutation and “consistent” • “consistent”: all replicas have the same value • Concurrent writes will leave region consistent, but not necessarily defined; some updates may be lost • “region”? • A failed write leaves the region inconsistent • To ensure definition, GFS must apply writes in the same order at all replicas: How?

Writes and replica updates in GFS

Consistency in GFS (cont) • consistency ensured via version numbers • stale versions not used and GC’d • client chunk caching may contain stale data! • this is OK for append only files, but not for random writes. • primary replica decides the write order (it has lease for this right from the master) • lease extensions piggybacked on the heartbeat messages • master can revoke a lease but only with the replicas agreement; otherwise it has to wait for expiration • client pushes data out to all replicas using a data flow tree • after all data received, client notifies the primary to issue the write, which then gets ordered and can now be executed • client retries if the write fails • writes across chunks: transactional? Consequences?

GFS: statefull or stateless?

Lecture 24: GFS

Lecture 24: GFS

Presentation Transcript

LECTURE No.2

Euthanasia*

Function-Oriented Software Design (lecture 5)

STERILIZATION AND ASEPSIS

LECTURE 7: OTHER TYPES OF ADVERTISING

AAPG Distinguished Lecture Series Presents the J. Ben Carsey Lecture

Mobile Programming Lecture 10

Lecture #1 Passives

Lesson 1 – Lecture of a Lifetime

NEUROANATOMY Lecture : 10

EEE 464 Wireless Communications Lecture 9

Sequence Alignment and Dynamic Programming

Statistical met h od s for boson s

EEE 464 Wireless Communications Lecture 7

Statistical met h od s for boson s

H o w C a n I K n o w I f I a m R e a l l y S a v e d ? (lecture #3)

LECTURE 5: LAWS IN THE SOCIAL SCIENCES?

Formal Specification (continued) Lecture 3:

ENT Undergraduate Lecture

Lecture 2 Stress and Its Management