
Google File System Simulator


Presentation Transcript


  1. Google File System Simulator. Pratima Kolan, Vinod Ramachandran

  2. Google File System • Master Manages Metadata • Data Transfer Happens directly between client and chunk server • Files broken into 64 MB chunks • Chunks replicated across three machines for safety

  3. Event Based Simulation [Diagram: components place events in a priority queue; the simulator pulls the next highest-priority event from the queue, processes it, and returns the output of the simulated event to the components]
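A minimal Python sketch of this loop, assuming each event object exposes a handle() method (a hypothetical name) through which components process the event and schedule follow-up events:

```python
import heapq

class Simulator:
    """Event-based simulation: components place events in a priority queue and
    the simulator repeatedly dispatches the next highest-priority event."""

    def __init__(self):
        self._queue = []   # entries are (time, seq, event)
        self._seq = 0      # tie-breaker so heapq never compares event objects

    def schedule(self, time, event):
        """Place an event in the priority queue, keyed by its simulated time."""
        heapq.heappush(self._queue, (time, self._seq, event))
        self._seq += 1

    def run(self):
        """Get the next high-priority event from the queue and hand it to its
        component; handlers may schedule further events as simulated output."""
        while self._queue:
            time, _, event = heapq.heappop(self._queue)
            event.handle(time, self)
```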

  4. Simplified GFS Architecture [Diagram: a client and a master server connected through a switch with infinite bandwidth; network queues sit in front of five network disks (Disk 1 through Disk 5)]

  5. Data Flow • The client queries the master server for the Chunk ID it wants to read • The master server returns the set of disk IDs that contain the chunk • The client requests the chunk from one of these disks • The disk transfers the data to the client
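A sketch of this four-step read path in Python; lookup(), choose_disk(), and transfer() are hypothetical method names standing in for the master, client, and disk components of the simulator:

```python
def read_chunk(client, master, chunk_id):
    """Simulate one client read following the data flow above."""
    disk_ids = master.lookup(chunk_id)       # 1-2: master returns the disks holding the chunk
    disk = client.choose_disk(disk_ids)      # 3: client picks one replica to request
    return disk.transfer(chunk_id, client)   # 4: the disk transfers the data to the client
```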

  6. Experiment Setup • We have a client whose bandwidth can be varied from 0 to 1000 Mbps • We have 5 disks, each with a per-disk bandwidth of 40 Mbps • We use 3 chunk replicas per chunk of data as a baseline • Each client request is for 1 chunk of data from a disk
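The same setup expressed as simulation parameters; the names and the sweep step size are illustrative, the values come from the slide:

```python
CLIENT_BANDWIDTH_MBPS = range(0, 1001, 100)  # client bandwidth varied from 0 to 1000 Mbps
NUM_DISKS = 5
DISK_BANDWIDTH_MBPS = 40                     # per-disk bandwidth
REPLICAS_PER_CHUNK = 3                       # baseline chunk replication
CHUNK_SIZE_MB = 64                           # each client request reads one chunk
```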

  7. Simplified GFS Architecture [Diagram: the same client/master/switch/five-disk topology, annotated with the experiment parameters: client bandwidth varied from 0 to 1000 Mbps, switch with infinite bandwidth, per-disk bandwidth of 40 Mbps, and chunk ID ranges (0-1000, 0-1000, 0-2000, 1001-2000, 1001-2000) distributed across the five network disks]

  8. Experiment 1 • Disk Requests Served Without Load Balancing • In this case we pick the first chunk server from the list of available chunk servers that contain the disk block • Disk Requests Served With Load Balancing • In this case we apply a greedy algorithm and balance the load of incoming requests across the 5 disks (see the sketch below)
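A sketch of the two policies, assuming load is a dictionary mapping each disk to its number of outstanding requests:

```python
def pick_disk_first(replica_disks, load):
    """Without load balancing: always take the first chunk server in the list."""
    return replica_disks[0]

def pick_disk_greedy(replica_disks, load):
    """Greedy load balancing: send the request to the least-loaded replica."""
    return min(replica_disks, key=lambda d: load[d])
```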

  9. Expectation • In the non-load-balancing case we expect the effective request/data rate to peak at the bandwidth of 2 disks (80 Mbps) • In the load-balancing case we expect the effective request/data rate to peak at the bandwidth of 5 disks (200 Mbps)

  10. Load Balancing Graph This graph plots the data rate at the client vs. the client bandwidth

  11. Experiment 2 • Disk Requests Served With No Dynamic Replication • In this case we have a fixed number of replicas (3 in our case) and the server does not create more replicas based on read-request statistics • Disk Requests Served With Dynamic Replication • In this case the server replicates certain chunks based on the frequency of requests for them • We define a replication factor, which is a fraction < 1 • Number of replicas for a chunk = (replication factor) × (number of requests for the chunk) • We cap the maximum number of replicas at the number of disks (see the sketch below)
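A sketch of that rule; treating the 3-replica baseline as a floor is an assumption, while the cap at the number of disks is from the slide:

```python
def target_replicas(requests_for_chunk, replication_factor, num_disks, base_replicas=3):
    """Number of replicas = replication_factor * number of requests for the chunk,
    kept at or above the baseline (assumed) and capped at the number of disks."""
    wanted = int(replication_factor * requests_for_chunk)
    return min(max(base_replicas, wanted), num_disks)
```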

  12. Expectation • Our requests are all aimed at the chunks placed on disk 0, disk 1, and disk 2 • In the non-replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 3 disks (120 Mbps) • In the replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 5 disks (200 Mbps)

  13. Replication Graph This graph plots the data rate at the client vs. the client bandwidth

  14. Experiment 3 • Disk Requests Served with No Rebalancing • In this case we do not implement any rebalancing of read requests based on the frequency of chunk requests • Disk Requests Served with Rebalancing • In this case we rebalance read requests by picking the request with the highest frequency and transferring it to a disk with a lower load (see the sketch below)
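A sketch of one rebalancing step, assuming requests maps each chunk to its pending request count, replicas maps each chunk to the disks holding it, and load maps each disk to its current load:

```python
def rebalance(requests, replicas, load):
    """Pick the most frequently requested chunk and reassign its reads to the
    least-loaded disk that holds a replica of it."""
    hottest = max(requests, key=requests.get)               # highest-frequency chunk
    target = min(replicas[hottest], key=lambda d: load[d])  # least-loaded replica disk
    return hottest, target
```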

  15. Graph 3

  16. Request Distribution Graph

  17. Conclusion and Future Work • GFS is a simple file system for large, data-intensive applications • We studied the behavior of certain read workloads on this file system • In the future we would like to come up with optimizations that could further fine-tune GFS
