
The Google File System



Presentation Transcript


  1. The Google File System • Presenters: Rezan Amiri, Sahar Delroshan • Azad University of Kurdistan

  2. Outline • Distributed File Systems Overview • GFS (Google File System) • Motivations • Assumptions • Architecture • Algorithms • Conclusions

  3. Distributed File Systems • Sneakernet is the practice of sharing files by copying them onto floppy disks, physically carrying the disks to another computer, and copying the files again (1980s) • A distributed file system joins together the file systems of individual machines in a network • Files are stored (distributed) on different machines in a computer network but are accessible from all machines • Also called network file systems

  4. Examples of Distributed File Systems • Distributed file systems: Andrew File System (AFS), Network File System (NFS), Apple Filing Protocol (AFP) • Distributed fault-tolerant file systems: Coda, Distributed File System (Microsoft) (DFS)

  5. Motivation • Who doesn’t know about Google? • Search accuracy depends on how the algorithm is designed • Larry Page & Sergey Brin, 1998 • Google goes beyond search • Google Video • Gmail • Google Maps • Google Earth, … • Google Operations

  6. Motivation • Redundant storage of massive amounts of data on cheap and unreliable computers • Problems caused by OS bugs, human errors, … • Why not use an existing file system? • Google’s problems are different from anyone else’s • Different workload and design priorities

  7. Assumptions • High component failure rates • Inexpensive commodity components fail all the time • Workloads • Large streaming reads, small random reads, writes • Must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file • High sustained bandwidth is more important than low latency

  8. GFS Design Decisions • Files are stored as chunks • Fixed size (64 MB) • Reliability through replication • Each chunk is replicated across 3+ chunk servers • A single master coordinates access and keeps metadata • Adds snapshot and record append operations

  9. GFS Architecture • 1 master • Many chunk servers • Files are divided into large chunks • Chunks are replicated across the system

  10. Chunks and Chunk Servers • A chunk is similar to a block in an ordinary file system • Chunk size is always 64 MB • Lazy space allocation avoids fragmentation • Large chunks reduce the need to contact the master • Reduce network overhead • The client keeps a persistent TCP connection to the chunk server • Reduce the size of the metadata • This allows the master to keep metadata in memory • Problem: "hot spots" with small files
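
Because every chunk has the same fixed size, a byte range in a file maps to chunk indexes by simple division. The sketch below illustrates that arithmetic only; the function name `chunk_spans` and its return shape are assumptions made for illustration, not part of GFS.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size given on the slide


def chunk_spans(offset, length, chunk_size=CHUNK_SIZE):
    """Split a byte range into (chunk_index, offset_in_chunk, n_bytes) pieces.

    Hypothetical helper: shows how a fixed chunk size turns a file offset
    into a chunk index plus an offset inside that chunk.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    spans = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, chunk_size)
        n_bytes = min(chunk_size - within, end - offset)
        spans.append((index, within, n_bytes))
        offset += n_bytes
    return spans


# A 10-byte range that starts 4 bytes before a chunk boundary touches two chunks.
print(chunk_spans(CHUNK_SIZE - 4, 10))
```

A range that stays inside one chunk yields a single tuple, which is the common case and one reason a large chunk size means fewer lookups at the master.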

  11. Metadata • Global metadata is stored on the master • File and chunk namespaces • Mapping from files to chunks • Locations of each chunk’s replicas • All in memory (under 64 bytes per chunk) • Fast • Easily accessible • Operation log • Persistent on local disk • Replicated • Checkpoints for faster recovery
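
A rough back-of-the-envelope calculation shows why per-chunk metadata fits in the master's memory. It treats the slide's figure of 64 bytes per chunk as an upper bound and ignores namespace metadata, so the result is an estimate under those assumptions; the function name is hypothetical.

```python
CHUNK_SIZE = 64 * 2**20        # 64 MB per chunk
METADATA_PER_CHUNK = 64        # bytes per chunk, taken as an upper bound


def master_metadata_bytes(total_data_bytes):
    """Estimate master memory needed for chunk metadata (assumption-laden)."""
    chunks = -(-total_data_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * METADATA_PER_CHUNK


# 1 PiB of file data is 2**24 chunks, which needs about 1 GiB of chunk metadata.
print(master_metadata_bytes(2**50) / 2**30)  # 1.0
```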

  12. Master’s Responsibilities • The master is a single process, running on a separate machine, that stores all metadata • Periodic communication with chunk servers • Namespace management and locking • Allows concurrent mutations in the same directory • Chunk creation, re-replication, rebalancing

  13. Master’s Responsibilities • The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal, for example when a chunk server becomes unavailable • Chunks that need re-replication are prioritized • By how far each chunk is from its replication goal • Chunks of live files are re-replicated first
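
The prioritization above can be sketched as a sort. This is a minimal illustration, assuming that the replica deficit is compared first and file liveness breaks ties; the slide names both factors but does not say how they are combined, and `ChunkState` and `rereplication_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    handle: str
    live_replicas: int
    goal: int = 3              # default replication goal of 3 replicas
    file_is_live: bool = True  # False if the owning file was recently deleted


def rereplication_order(chunks):
    """Return under-replicated chunks, most urgent first (assumed ordering)."""
    needy = [c for c in chunks if c.live_replicas < c.goal]
    # Larger deficit first; among equal deficits, live files before deleted ones.
    return sorted(needy, key=lambda c: (-(c.goal - c.live_replicas), not c.file_is_live))


chunks = [
    ChunkState("a", live_replicas=2),
    ChunkState("b", live_replicas=1),
    ChunkState("c", live_replicas=1, file_is_live=False),
    ChunkState("d", live_replicas=3),
]
print([c.handle for c in rereplication_order(chunks)])  # ['b', 'c', 'a']
```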

  14. Master’s Responsibilities • Replica placement • Improves reliability and availability • Improves network bandwidth utilization • Garbage collection • Simpler and more reliable than traditional file deletion • The master logs the deletion and renames the file to a hidden name • Lease mechanism • Helps reduce overhead at the master
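
The lazy deletion described under garbage collection can be sketched as follows. This is a toy in-memory model, not the GFS implementation: the class name, the hidden-name format, and the method names are hypothetical, and the three-day default grace period follows the GFS paper rather than the slide.

```python
class Namespace:
    """Toy model of the master's namespace with lazy file deletion."""

    def __init__(self, grace_seconds=3 * 24 * 3600):
        self.files = {}    # path -> list of chunk handles
        self.hidden = {}   # hidden name -> (deletion time, chunk handles)
        self.log = []      # stand-in for the operation log
        self.grace = grace_seconds

    def delete(self, path, now):
        """Log the deletion and rename the file to a hidden name."""
        chunk_handles = self.files.pop(path)
        hidden_name = f"{path}.deleted.{int(now)}"  # hypothetical naming scheme
        self.log.append(("delete", path, hidden_name))
        self.hidden[hidden_name] = (now, chunk_handles)
        return hidden_name

    def scan(self, now):
        """Drop hidden files past the grace period; return orphaned chunks."""
        expired = [name for name, (t, _) in self.hidden.items()
                   if now - t >= self.grace]
        orphaned = []
        for name in expired:
            orphaned.extend(self.hidden.pop(name)[1])
        return orphaned


ns = Namespace(grace_seconds=10)
ns.files["/logs/a"] = ["c1", "c2"]
ns.delete("/logs/a", now=100)
print(ns.scan(now=105))  # [] : still within the grace period
print(ns.scan(now=110))  # ['c1', 'c2'] : chunks can now be reclaimed
```

Until the scan removes it, the hidden file still exists, so a deletion can be undone by renaming it back.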

  15. Read Algorithm 1. Application originates the read request 2. GFS client translates the request and sends it to the master 3. Master responds with the chunk handle and replica locations

  16. Read Algorithm 4. Client picks a location and sends the request 5. Chunk server sends the requested data to the client 6. Client forwards the data to the application
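
The six read steps can be traced in a small in-memory simulation. Everything here is a sketch under stated assumptions: the `Master` and `ChunkServer` classes, their methods, and the choice of the first listed replica are hypothetical, and the read is assumed to stay inside one chunk.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB


class Master:
    def __init__(self):
        # (filename, chunk index) -> (chunk handle, list of chunk server names)
        self.chunks = {}

    def lookup(self, filename, chunk_index):
        return self.chunks[(filename, chunk_index)]


class ChunkServer:
    def __init__(self):
        self.data = {}  # chunk handle -> bytes

    def read(self, handle, start, n_bytes):
        return self.data[handle][start:start + n_bytes]


def gfs_read(master, servers, filename, offset, length, chunk_size=CHUNK_SIZE):
    # Step 2: translate (filename, byte offset) into (filename, chunk index).
    index, within = divmod(offset, chunk_size)
    # Step 3: the master answers with the chunk handle and replica locations.
    handle, locations = master.lookup(filename, index)
    # Step 4: pick a replica (here simply the first) and send the request.
    server = servers[locations[0]]
    # Steps 5 and 6: the chunk server returns the data, passed on to the caller.
    return server.read(handle, within, length)


# Tiny 8-byte chunks keep the example readable.
master = Master()
servers = {"cs-a": ChunkServer(), "cs-b": ChunkServer()}
master.chunks[("/f", 1)] = ("h1", ["cs-a", "cs-b"])
for s in servers.values():
    s.data["h1"] = b"ABCDEFGH"
print(gfs_read(master, servers, "/f", offset=10, length=3, chunk_size=8))  # b'CDE'
```

Note that the master only hands out locations; the file data itself flows directly between client and chunk server.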

  17. Write Algorithm Steps 1–3 are the same as in the read algorithm 4. Client pushes the write data to all locations; the data is stored in the chunk servers' internal buffers

  18. Write Algorithm 5. Client sends the write command to the primary 6. Primary determines a serial order for the data instances in its buffer and writes the instances to the chunk in that order 7. Primary sends the serial order to the secondaries and tells them to perform the write

  19. Write Algorithm 8. Secondaries respond to the primary 9. Primary responds to the client
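
Steps 4 to 9 of the write path can be sketched with in-memory replicas. The point of the sketch is that data is buffered first and applied later in one serial order chosen by the primary, so all replicas end up identical. The class and method names are hypothetical, and batching several writes into a single `commit` call is a simplification.

```python
class Replica:
    def __init__(self):
        self.buffer = {}          # data id -> bytes pushed by clients (step 4)
        self.chunk = bytearray()  # the chunk contents on this replica

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, order):
        """Apply buffered writes in the given serial order."""
        for data_id, offset in order:
            data = self.buffer.pop(data_id)
            end = offset + len(data)
            if len(self.chunk) < end:
                # Zero-fill so the slice assignment lands at the right offset.
                self.chunk.extend(bytes(end - len(self.chunk)))
            self.chunk[offset:end] = data
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.pending = []  # write commands in arrival order

    def write(self, data_id, offset):
        # Step 5: the client's write command names data it pushed earlier.
        if data_id not in self.buffer:
            raise KeyError(f"data {data_id!r} was not pushed to the primary")
        self.pending.append((data_id, offset))

    def commit(self):
        order, self.pending = self.pending, []
        self.apply(order)                                   # step 6
        acks = [s.apply(order) for s in self.secondaries]   # steps 7 and 8
        return all(acks)                                    # step 9


s1, s2 = Replica(), Replica()
primary = Primary([s1, s2])
for replica in (primary, s1, s2):
    replica.push("x", b"AAAA")
    replica.push("y", b"BB")
primary.write("x", 0)
primary.write("y", 2)
print(primary.commit(), bytes(primary.chunk), s1.chunk == s2.chunk == primary.chunk)
```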

  20. Leases and mutation order

  21. Append Algorithm • GFS appends the record to the file atomically at least once • The client supplies only the data • GFS picks the offset • Steps 1–4 are the same as in the write algorithm 5. Primary checks whether the record fits in the specified chunk 6. If the record does not fit, the primary: (a) pads the chunk (b) tells the secondaries to do the same (c) informs the client (d) the client then retries the append on the next chunk

  22. Append Algorithm 7. If the record fits, the primary: (a) appends the record (b) tells the secondaries to do the same (c) receives responses from the secondaries (d) sends the final response to the client • Snapshot: makes a fast copy of a file or directory tree
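
The fit check, padding, and retry in the append steps can be sketched on a single replica. This is a simplified model under stated assumptions: secondaries and the master are left out, and `Chunk`, `try_append`, and `record_append` are hypothetical names.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB


class Chunk:
    def __init__(self, size=CHUNK_SIZE):
        self.size = size
        self.data = bytearray()

    def try_append(self, record):
        """Return the offset chosen for the record, or None after padding."""
        if len(self.data) + len(record) > self.size:
            # Step 6: the record does not fit, so pad the chunk to full size.
            self.data.extend(bytes(self.size - len(self.data)))
            return None
        # Step 7: the record fits; the system, not the client, picks the offset.
        offset = len(self.data)
        self.data += record
        return offset


def record_append(chunks, record, size=CHUNK_SIZE):
    """Append to the last chunk, retrying on a fresh chunk after padding."""
    if len(record) > size:
        raise ValueError("record is larger than a chunk")
    if not chunks:
        chunks.append(Chunk(size))
    while True:
        offset = chunks[-1].try_append(record)
        if offset is not None:
            return len(chunks) - 1, offset
        chunks.append(Chunk(size))  # the client retries on the next chunk


chunks = []
print(record_append(chunks, b"abcde", size=8))  # (0, 0)
print(record_append(chunks, b"fghij", size=8))  # (1, 0) after chunk 0 is padded
print(record_append(chunks, b"klm", size=8))    # (1, 5)
```

Padding wastes the tail of a chunk, which is part of the price paid for letting many clients append concurrently without coordinating offsets.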

  23. Fault Tolerance • High availability • Fast recovery • Chunk replication • Default: 3 replicas • Master replication • Shadow masters • Data integrity

  24. Performance Test

  25. Conclusion • GFS demonstrates how to support large-scale processing workloads on commodity hardware • Performance • Scalability • Fault tolerance • Minimize master involvement

  26. Strong points • Single master • Simplifies the design • Fast • Simple failure handling

  27. Weaknesses • Replication • Replicas can be out of sync • Space overhead • Consistency among replicas (what if replicas have different lengths?)

  28. References • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, ACM Symposium on Operating Systems Principles, 2003 • Naushad UzZaman, Survey on Google File System, CSC 456 (Operating Systems), 2007 • Wikipedia contributors, Google File System, Wikipedia, The Free Encyclopedia, 2010

  29. Questions?
