Distributed File System By Manshu Zhang
Outline • Basic Concepts • Current project • Hadoop Distributed File System • Future work • Reference
DFS A distributed implementation of the classical time-sharing model of a file system, in which multiple users share files and storage resources.
Key Characteristics of DFS • Dispersion of clients and files • Multiplicity of clients and files
Primary issues of DFS Naming and Transparency Fault Tolerance
Naming Naming – mapping between logical and physical objects. Multilevel mapping. Transparent replicas and location
Naming Schemes: Three Main Approaches • Host name + local name: guarantees a unique system-wide name. • Mount remote directories onto local directories: once mounted, files can be referenced in a location-transparent manner. • Total integration of the component file systems: a single global name structure spans all files; if a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
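The second approach (mounting remote directories) can be illustrated with a small lookup sketch. This is only a toy model under stated assumptions: the mount table, the host names `serverA` and `serverB`, and the function `resolve` are all hypothetical and do not come from any particular file system.

```python
from typing import Dict, Tuple

# Hypothetical mount table: local mount point -> (server host, exported directory).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/home": ("serverA", "/export/home"),
    "/home/shared": ("serverB", "/export/shared"),
}


def resolve(path: str) -> Tuple[str, str]:
    """Map a local path to (host, path on that host) using the longest matching mount point."""
    best = None
    for mount_point in MOUNT_TABLE:
        # Match whole path components only, so "/homework" does not match "/home".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("localhost", path)  # not under any mount point: a local file
    host, remote_root = MOUNT_TABLE[best]
    return (host, remote_root + path[len(best):])
```

The caller only ever uses the local path, so the name itself does not reveal which server holds the file; that is the location-transparent behaviour the slide describes.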
Transparency (1) • Login Transparency: A user can log in at any host with a uniform login procedure and perceive a uniform view of the file system. • Access Transparency: A client process on a host has a uniform mechanism to access all files in the system, regardless of whether the files are on the local host or a remote host. • Location Transparency: The names of the files do not reveal their physical location.
Transparency (2) • Concurrency Transparency: An update to a file should not affect the correct execution of other processes that are concurrently sharing the file. • Replication Transparency: Files may be replicated to provide redundancy for availability and to permit concurrent access for efficiency.
Fault Tolerance • Stateful Vs. Stateless • Maintain information on client • File Replication
Distinctions Between Stateful and Stateless Service • Failure recovery: • A stateful server loses all its volatile state in a crash. • With a stateless server, the effects of server failure and recovery are almost unnoticeable.
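The difference in failure recovery can be sketched with two toy in-memory servers. This is a minimal illustration, not a real protocol: the class names, the `crash_and_restart` method, and the in-memory `files` dictionary are all assumptions made for the example.

```python
class StatefulServer:
    """Keeps per-client open-file state (handle -> [name, offset]) in volatile memory."""

    def __init__(self, files):
        self.files = files        # stands in for data on disk
        self.open_table = {}      # volatile state, lost in a crash
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_table[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        name, offset = self.open_table[handle]  # raises KeyError if the state was lost
        data = self.files[name][offset:offset + count]
        self.open_table[handle][1] = offset + len(data)
        return data

    def crash_and_restart(self):
        self.open_table = {}      # files survive, open-file state does not


class StatelessServer:
    """Every request carries the file name and offset, so nothing is remembered between calls."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]

    def crash_and_restart(self):
        pass                      # there is no volatile state to lose
```

After `crash_and_restart()`, a client of the stateful server holds a handle the server no longer recognises and has to reopen the file, while a client of the stateless server can simply repeat its self-contained request.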
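File Replication • Several copies of a file's contents at different locations enable multiple servers to share the load of providing the service. • The naming scheme maps a replicated file name to a particular replica. • Updates: replicas denote the same logical entity, so an update to one replica must be reflected in all the others.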
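A minimal sketch of these two ideas, mapping one file name to a particular replica and keeping replicas in step on update, might look as follows. The class `ReplicatedFile`, the round-robin selection policy, and the eager write-to-all update are assumptions chosen for illustration; real systems use a variety of selection and update-propagation strategies.

```python
class ReplicatedFile:
    """Toy model of one logical file stored as several replicas on different hosts."""

    def __init__(self, name, replica_hosts):
        self.name = name
        # host -> contents of the replica held on that host
        self.replicas = {host: b"" for host in replica_hosts}
        self._next = 0

    def pick_replica(self):
        """Map the file name to one replica, rotating through hosts to spread the read load."""
        hosts = list(self.replicas)
        host = hosts[self._next % len(hosts)]
        self._next += 1
        return host

    def read(self):
        host = self.pick_replica()
        return host, self.replicas[host]

    def write(self, data):
        """Eagerly apply the update to every replica so they remain identical."""
        for host in self.replicas:
            self.replicas[host] = data
```

The client only names the file; which replica serves a given read is decided inside `pick_replica`, which is what replication transparency asks for.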
Current Project HDFS: Hadoop Distributed File System • A distributed, parallel, fault-tolerant file system. • Designed to reliably store very large files across machines in a large cluster. • Efficient, reliable, and open source.
• Naming: central metadata server • Synchronization: write-once-read-many; locks on objects are given to clients using leases • Consistency and replication: server-side replication, asynchronous replication, checksums • Fault tolerance: failure treated as the norm • Security: no dedicated security mechanism
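The combination of replication and checksums listed above can be sketched as follows: a checksum is stored with each block, and a reader falls back to another replica when verification fails. This is an illustrative sketch only and does not use the HDFS API; `store_block`, `read_block`, and the dictionary layout are hypothetical, and CRC32 from Python's `zlib` module stands in for whatever checksum a real system uses.

```python
import zlib


def store_block(data: bytes) -> dict:
    """Store a block together with a checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def read_block(replicas: list) -> bytes:
    """Return the data of the first replica whose checksum still matches its contents."""
    for replica in replicas:
        if zlib.crc32(replica["data"]) == replica["crc"]:
            return replica["data"]
        # Checksum mismatch: treat this replica as corrupt and try the next one.
    raise IOError("all replicas failed checksum verification")
```

Because corrupted replicas are skipped rather than treated as fatal, a single bad copy does not stop the read, which fits the "failure as the norm" stance on the slide.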
Future Work • Robustness of the data-sharing model • Architecture, naming, synchronization, availability, heterogeneity, and support for databases • Security
Reference • Thanh, T.D.; Mohan, S.; Choi, E.; SangBum Kim; Pilsung Kim. 2008. "A Taxonomy and Survey on Distributed File Systems". Networked Computing and Advanced Information Management. • Randy Chow. 1997. Distributed Operating Systems & Algorithms. • Eliezer Levy, Abraham Silberschatz. December 1990. "Distributed file systems: concepts and examples". Computing Surveys (CSUR), Volume 22, Issue 4. • http://hadoop.apache.org/common/docs/current/hdfs_design.html#Introduction • http://www.snia.org/events/wintersymp2009/cloud/dhruba_hadoop_snia.pdf
• http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems • http://en.wikipedia.org/wiki/Hadoop#Hadoop_Distributed_File_System • http://www.cs.gsu.edu/~cscyqz/courses/aos/slides08/ch6.1-Fall08.pptx