File Access Services, Directory Services, & The World Wide Web (WWW)

File Access Services, Directory Services, & The World Wide Web (WWW) 635.413.31 – Summer 2007

File Access Services Definition of File Access Services • The ability to transparently access information stored in files on other systems across a network File Access vs. File Transfer • File Transfer Characteristics • File transfer makes independent ‘copies’ of files on different hosts • Copies the entire file and access is generally not available until the copy is completely executed • File access methods & privileges are not the same for local and remote resources (so it’s not ‘transparent’) • Allows for storage of intermediate results, distributed management & control, and data protection

File Access Services File Access Characteristics • User access is only to a selected portion of a file; the complete ‘master’ copy of the file is stored on a single system • ‘Images’ of file are not independent; there is only one real copy stored on a single system • File access methods and permissions are typically the same for local and remote resources • Typically provides quick up-to-date access to files and prevents different ‘versions’ of a file from getting out of synchronization • Multi-user access to files must be carefully managed!!

Examples of Major File Access Services • The Networked File System (NFS) • The goal of NFS is to provide transparent file access for clients to files and filesystems on NFS servers • From both a user & a technical perspective uses a client-server architecture • Developed by Sun Microsystems to allow filesystems to be distributed across multiple UNIX/Linux systems • Current specification (version 4) is in RFC 3530. still a number of version 3 implementations (RFC 1813) • NFS clients available for Windows and Mac Operating Systems • Theoretically any application on the client that works with local file access should also work with NFS file access • Technical details of NFS will be examined later

Examples of Major File Access Services • Microsoft File Services • File access and sharing has been around as a separate product for a decade (LANManager); support for networked file access now an integral part of Windows 95/98/NT and later versions • Originally proprietary but Microsoft has transitioned to TCP/IP (actually UDP) and published the interface specifications in RFC 1001 and 1002 • Major components of Microsoft file access services: • The Server Message Block API • The NetBIOS API • Browser service • The Universal Naming Convention (UNC)

Examples of Major File Access Services • Microsoft File Services (Continued) • The client typically maps a networked file or filesystem by binding the resource (via the UNC) to a local drive letter – at that point access is the same as a local resource • Also Resources can be mapped directly with the UNC • Clients available for Windows, OS\2, Mac, and many desktop UNIX/Linux systems (via SAMBA, etc.) • Microsoft has introduced a new component called DFS (Distributed File Services) • Provides a unified virtual namespace that makes distributed resources look like local resources • Provides unified access across multiple ‘real’ file services like SMB, Netware, and NFS • Still proprietary to Microsoft

Examples of Major File Access Services • Apple (Appleshare and the Appletalk Filing Protocol) • Created as part of a proprietary network architecture for Macs • Created for Apple File sharing: client-server or peer-to-peer • Current versions run on TCP/IP • Novell Netware • Created to support Novell File Servers & Clients • Proprietary client-server architecture though Novell does do 3rd party licensing • Access to remote resources via ‘drive mapping’ where a remote resource is bound to a local drive letter • Current versions run on TCP/IP • Clients for DOS, Win3.1, Win95/98, NT, OS/2, and Mac

The Remote Procedure Call (RPC) The foundation of NFS – The Remote Procedure Call (RPC) • NFS is based on the Remote Procedure Call, a generic API that allows procedures called on one system to execute on another • If a client is RPC ‘aware’ nothing needs to be done by a user to access resources via NFS • There are two different RPC varieties; Sun RPC and the OSF Distributed Computing Environment (DCE) • Sun RPC specifications published in RFC 1057; version 2 ‘open’ specifications published in RFC 1831 (version 2) • Sun RPC, the foundation of most NFS implementations, uses the Sockets API to access TCP and UDP transport services • The RPC specifications consist of two main pieces: • The transfer protocol (message format & exchange) • Data representation & encoding

The Remote Procedure Call (RPC) General Diagram of RPC Operation

The Remote Procedure Call (RPC) RPC Message format • Two messages used – a Request and a Reply • Fields in an RPC Request • Transaction ID (XID) field [4 bytes]: a field initialized by the client w/ a unique number used to match Replies w/Requests • Call field [4 bytes]: set to zero for a Request & one for a Reply • RPC version field [4 bytes]: specifies what version of RPC is in use; typically either two (version 2) or three (for version 3) • Program numbers, version, and procedure fields [4 bytes each]: used by the server to determine what specific procedure the client wants to access remotely

The Remote Procedure Call (RPC) RPC Message Format (Continued) • Credentials field [up to 408 bytes]: • Used to identify the client to the server • Use is optional; typically the User ID and Group ID of the client process is included • Can be used for the server to enforce access restrictions on the RPC • Actual length of field encoded at the beginning of the field value • Verifier field [up to 408 bytes]: used with secure RPC to pass encryption related information • Parameters field [variable length]: holds all parameter information necessary for the procedure call to be executed by the server

The Remote Procedure Call (RPC) RPC Data Representation • RPC uses a special encoding system call External Data Representation (XDR) for encoding the data values in RPC fields • Allows heterogeneous systems to communicate using a common data representation • Field structure including bit and byte order specified for each field in the RPC messages • As an example, in XDR all integers are encoded as four byte values • XDR specifications published in RFC 4506

The Remote Procedure Call (RPC) The RPC Port Mapper (‘411’ for RPC) • RPC server programs use ephemeral ports instead of well-known ports, so some kind of ‘registrar’ is necessary for client stub processes to determine how to communicate with server processes • The port mapper is an RPC server process all other RPC server processes on that particular server registers with when they initialize • The port mapper listens for client calls on well-known port 111 (both UDP and TCP)

The Technical Details of NFS General NFS Technical information • NFS uses a subset of defined RPCs to provide distributed file access (RPC can do much more than that) • Files and filesystems are referenced through the use of file ‘handles’ – objects or structures used to represent the files or filesystems • NFS servers are stateless (version 3 and earlier); servers do not keep track of which clients are accessing its resources or which resources are being accessed • Version 4 introduces file locking, which introduces stateful operation and requires more complex operational procedures

The Technical Details of NFS An Example of NFS Operation

The Technical Details of NFS NFS Procedures • NFS Version 4 has 39 procedures for file & directory manipulation and access as well as an extensible security framework • Built on RPCSEC_GSS (RPC 2203) • Provides integrity, privacy, and authentication • Allows for negotiation of security options • NFS Version 3 has 20 procedures for file/directory operations • Important NFS procedures (common to v3 and v4) • OPEN & CLOSE: compound commands for file manipulation • ACCESS – check file access permissions • READ & WRITE – reads and writes to/from files; options for either synchronous or asynchronous writes • GETATTR & SETATTR – get and set attributes on a file • MKDIR & RMDIR – make directory & delete directory

The Technical Details of NFS Mounting Drives • In order for a client to access remote files via NFS it must use the NFS mount procedure • The mount procedure returns a ‘handle’ which the client uses to reference the filesystem • The client then uses the handle to integrate the remote filesystem into its local filesystem structure (analogous to mapping a networked drive) • As with other NFS procedures the port mapper must be called first to find the UDP or TCP port being used on the NFS server • Note: this procedure works differently ‘under the hood’ in v4, as no portmapper is used and the mount protocol has been incorporated into the core NFS specification

The Technical Details of NFS NFS over TCP • Originally NFS was strictly a LAN technology rarely implemented over wide area connections, so UDP was used to provide efficient throughput • With the globalization of companies NFS is being used more often over wide geographic areas • Using TCP provides NFS with more robust implementations over WAN connections • The default transport for NFS version 4 is TCP

The Technical Details of NFS An NFS Example – Reading the file testplan.txt from mountpoint/smiley/testplan.exe

Directory Services

Directory Services • Definition • A means of searching a database of information easily & rapidly using keywords to match attributes stored in the database • With the explosion of networking and the Internet very important for finding people, information, & services • Directory services must be: • Easy to use • Contain correct data • Provide value • Directory services are categorized as White or Yellow Pages • White Pages: Directory Services oriented toward information associated with individuals • Yellow Pages: Directory Services oriented toward finding resources (networked printers, servers, etc) or services

Directory Service Protocols There are 4 Major Directory Service Protocols used in the Internet: • DNS, • NIS, • X.500, and • LDAP • And yes, NDS, WINS, and AD could be included too… • The Domain Name Service • Yes, this can be considered a very limited Directory Service! • Can be adapted for other uses such as spam blacklists • Provides only name resolution services but some information can be gleaned from the name structure • Has almost global accessibility • Already covered earlier in exhaustive detail

Directory Service Protocols • The Network Information Service (NIS) • NIS was developed by Sun Microsystems as a companion to NFS to ease the burden on administrators of distributed computing systems • Uses a client-server architecture for distribution of Yellow Pages information • NIS allows the use of a single administrative ‘repository’ for important network-wide information such as password files. • Enables the use of a single username & password across multiple systems • Also includes information search and retrieval commands that can be used in the development of non-administrative distributed directories and databases • Sun is trying to migrate from NIS to the LDAP-enabled Sun Directory Services (SDS); there’s a large installed base of users so this will likely take a while

Directory Service Protocols • Technical Details of NIS • Like NFS, NIS is built upon the Remote Procedure Call (RPC) and NIS procedures operate very much like NFS procedures • Built around map files, which are the data or directory information repositories and maps, which are unique views or indexes for the data files • Map files are stored in individual directories on the server (example: the password file is typically stored on a server at /var/yp/passwd • Servers come in two varieties: master and slave • The master server is the single administration point for the map files; analogous to a primary DNS server • There is only one master NIS server for a set of map files • Slaves read all their map file information from the Master • Clients can be serviced equally well by either a Master or a Slave

Directory Service Protocols • Technical Details of NIS (continued) • Organizations that want multiple access policies for directory or distributed database information can segregate NIS servers into Domains • A domain defined as a set of servers that access and distribute information from a unique set of maps • Each domain has a single Master server and as many slaves as necessary to adequately service client requests • Domains can overlap (an NIS server could be a Master for multiple domains) and clients can easily switch between domains • NIS Database Operations come in two flavors: distributed filesystems and explicit commands • With the distributed filesystem mode of operation NIS clients read data for administrative functions from the NIS server instead of a local file • This mode is used for such functions as password or license files – it allows a single file administered from the Master to be used across multiple NIS client systems

Directory Service Protocols • Technical Details of NIS (continued) • NIS clients can also be set up so they will check their local file first and if a match is not found do an NIS lookup • NIS services (maps) may also be accessed using explicit commands • This is very useful for building distributed database or directory applications that have nothing to do with system administration (i.e. – systemwide personnel directory, etc.) • The most important NIS user command is ypmatch; this allows the search of a map using a keyword • Three groups of internal NIS procedure calls perform the work for either mode of operation • Client-Lookups: key-driven procedures (match, get-first, get-next, get-all) • Maintenance-calls: checking server names & status (get-master, get-order) • Internal NIS calls: commands between servers such as a map transfer from Master to Slave

Directory Service Protocols • The X.500 Directory Service (aka DAP) • ITU standard for storing, accessing & distributing directory information • What X.500 provides: • A standards-based directory • A structured information framework • A single global namespace • Powerful search capabilities • Decentralized maintenance • Encompasses seven recommendations (X.501, X.509, X.511, X.518, X.519, X.520, and X.521) besides X.500 – the standards cover: • Directory Service Quality: access rules, authentication, filters, etc. • Directory Queries: read, compare, list, search operations, etc. • Directory Modification: add, rename, modify operations, etc. • Error Reporting: definition of of error conditions and responses • Referrals: relationships between Directory servers & other external objects • Originally designed for OSI protocols but later modified for TCP/IP

Directory Service Protocols • X.500 Technology • X.500 Directory Structure and Terminology • Information is held in a Directory Information Base (DIB) • A DIB consists of individual entries organized in a tree structure called the Directory Information Tree (DIT) • Can also have relational characteristics • Each entry is composed of attributes and has a Distinguished Name (DN) that uniquely identifies it (User friendly Distinguished Names are very important to X.500) • Each attribute composed of a type and one or more associated values • The type specifies a particular syntax and data type for the value (boolean, integer, etc.) • DIT entries have some object-oriented characteristics like inheritance • The DIT also has a schema; in X.500 the schema is a rule-set that ensures the DIT maintains its logical structure during modifications

Directory Service Protocols • An X.500 DIT example

Directory Service Protocols • X.500 Technology • The X.500 communication protocols • X.500 client (Directory User Agent or DUA) communicates with an X.500 server (Directory Server Agent or DSA) using the Directory Access Protocol (DAP) • The X.500 DAP uses the standard OSI presentation layer Remote Operations Service Element (ROSE) and Association Control Service Element (ASCE) for communications • RFC 1249 defines a standard for interface the X.500 service to the UDP and TCP transport layers for use over the Internet • OSI networking is ridiculously complex and X.500 over TCP/IP isn’t much better so that’s all we’re going to cover on this subject

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • Even with the adaptation of X.500 to TCP/IP very few organizations actually adopted it • Required too much computing and OSI engineering expertise • Most implementations were kludgey and complex • Engineers & researchers at the University of Michigan decided a completely new protocol for accessing X.500 services from Internet clients would hasten deployment of global directory services • The Lightweight Directory Access Protocol (LDAP) was originally designed to be a ‘gateway’ service; an X.500 ‘back-end’ was still necessary • Eventually the designers saw the value of LDAP as a completely independent directory services access protocol • LDAP version 3 specifications published as RFC 2252 • Recent update in RFC 4510 – provide new features & protocol extensibility

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • What the current version of LDAP provides: • Centralized administrative tasks within an organization • Storage of sensitive information in a centrally secure repository • Authentication and security services • Centralized procedure for determining the status and role of an individual in an organization • Standardizes the search and retrieval of white & yellow page data • Multi-platform, multi-vendor open standard with published APIs for software development • URL format specifications for accessing LDAP data via browsers • Gateway services to other directory services like NIS and NDS • Vendors with current LDAP enabled products: Redhat, Innosoft, OpenLDAP, Sun, Microsoft, and Novell (list not exhaustive)

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • LDAP Technology • LDAP standard addresses 3 areas: API, data format, and access protocol • There are several APIs published and in use: • The base API developed by the U. of Michigan published as RFC 1823 • Other C and Perl based APIs readily available • LDAP data format • LDAP has the same object format and definitions as X.500 • Directory Information Base (DIB) & Directory Information Tree (DIT) • Entries and Attributes • LDAP DIB structure includes both hierarchal & relational structure • Four LDAP object types are currently defined: Person, Organizational Unit, Group, and Domain

Directory Service Protocols Example LDAP directory structure

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • LDAP access protocol • Conforms to a client-server architecture (clients performing operations against servers) • Does not define how data is stored in server; just how to access it in a standard manner • Uses TCP; client can issue multiple queries over a TCP connection • Basic LDAP Operations • Binding an LDAP client to a server • Required operation before a client can search information on a LDAP server • Authentication and access controls are parts of this operation • Unbinding and rebinding can occur over a single TCP connection allowing a client to change access privileges

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • Basic LDAP Operations • Server search • Single or multiple keyword • Wildcards are allowed • Compare entries • Add entry or entries • Modify existing entry or entries • Delete existing entry or entries

Directory Service Protocols • The Lightweight Directory Access Protocol (LDAP) • Other LDAP Operations • Referral – allows one server to pass searches to another server • Replication – allows multiple servers to handle requests for a DIT; enhances availability via load balancing and redundancy • Encryption & Security – supports Kerberos, SASL, and Secure Sockets Layer (SSL) • DIT discovery – allows a client to query a server for its structure and entity relationships

Introduction to the World Wide Web

Introduction to the World Wide Web Goals • Developed to provide an easy way to distribute info to members of a geographically dispersed group • Especially well-suited for multimedia information • Meant to allow software, hardware, & system independent information sharing! • Built on the concept of hypertext and hypermedia – cross-linked groups of documents or objects • Plenty of good Web related information available at http://www.w3c.org

Introduction to the World Wide Web History • The WWW concept was developed in 1989 by Tim Berners-Lee at CERN • He wanted a way to easy distribute high energy physics data to researchers located around the world • Concept was developed and culminated in the release of the first WWW browser in 1993 (Mosaic) • With the incorporation of Netscape commercial development of the WWW gained momentum • WWW is actually based on an earlier text-based system called gopher that was developed in the mid-80s

Introduction to the World Wide Web General Operation & Concepts • WWW has client-server architecture, with browsers (clients) pulling data from Web servers • Browsers request information (web pages) using a unique 'address' for each page called a Uniform Resource Locator (URL) • Servers deliver either an error indication or the requested web page information to browser • The WWW uses TCP for transport (i.e. one-to-one communications) • WWW is mostly a 'pull' technology – only client-requested info is sent • Web pages are linked together in a hierarchical 'mesh' by embedding URLs in web pages – hyperlinks • Caching is an integral part of the WWW & can take place in many locations • Server-side in web farms (load balancers & caching appliances) • In the network at firewalls and proxy servers • Client-side in web browsers

World Wide Web Addressing There are three basic parts to the WWW: Addressing, Transport, & Presentation Addressing (URLs) • One of most important aspects of the WWW is it’s global addressing scheme called the Uniform Resource Locator (URL) • URLs provide a unique ‘address’ allowing direct access to web pages • URLs consist of three parts: protocol, host name, and document name • The URL Protocol part • Web browsers can handle multiple protocols to provide backwards compatibility & flexibility • Depending on the browser these additional protocols can be handled internally in the browser or through the use of 'helper' applications • Additional protocols handled: telnet, ftp, gopher, news (NNTP), mail (SMTP), and local file access

World Wide Web Addressing • The URL Host Name part • Second part is the DNS name or IP address of the server where the web page resides • Use of IP addresses is valid and works but is discouraged • If no TCP port number is specified with the name or address, port 80 is implied • If you wish to run web services on another port the port is specified after the DNS name (example: http://www.noname.org:7777/admin/system.html) • The URL document name • Final part of URL specifies the location of the web page on the server • Usually consists of directory path followed by web page filename • Path is typically relative to a default directory on the web server (operating system dependent) • Path structure and filenames are operating system specific

Transport: The Hypertext Transfer Protocol (HTTP) Introduction • The Hypertext Transfer Protocol is the application layer protocol responsible for delivery of WWW data between browsers & servers • Current HTTP release is version 1.1 (RFC 2616) though a significant number of systems run the first standard 'production' release (v1.0) • HTTP is a very simple text based request-response protocol like SMTP; MIME like header fields and encoding conventions are used • RFC-822/2822 format headers and MIME extensions are used to define the web page content, negotiate page parameters, & support conditional requests • HTTP uses TCP for reliable transport layer services • HTTP v1.0 works in a stop and wait fashion - only one HTTP command can be outstanding & requires a separate TCP connection for each object • For each object (file) on a page a separate TCP connection is established • HTTP v1.1 allows persistent connections and ‘pipelining’

Transport: The Hypertext Transfer Protocol (HTTP) • Important HTTP Characteristics • Application Level • Request/Response • Stateless • Bi-directional Transfer • Capability Negotiation • Support for Caching • Support for Intermediaries/Proxies

Transport: The Hypertext Transfer Protocol (HTTP) • Important HTTP commands • GET: transfers a file from server to browser • HEAD: transfers the HTML header information for a web page from server to browser • PUT: uploads a file from browser to server (used a lot with web-based mailers) • POST: append information to a URL (used with forms) • LINK: create a link between files on a web server • UNLINK: delete a link between files on a web server • DELETE: delete a file on the server • ALL of these commands are subject to access restrictions enforced by both the web server software and the server operating system!

Transport: The Hypertext Transfer Protocol (HTTP) HTTP in action

Transport: The Hypertext Transfer Protocol (HTTP) • HTTP v1.1 Technical Details • Persistent TCP Connections • Allow multiple HTTP commands (and responses) to be sent over a single TCP connection; can greatly increase response times and interactivity • Reduces load opening & closing TCP connections • Fall back to v1.0 if necessary • Application-layer compression • Does not resend HTTP headers on sequential requests to the same server • HTTP Command Pipelining • Multiple HTTP commands can be issued without a response • The server must respond to each request in the order it was received • Expanded and improved directives for more intelligent Web caching • Enhanced support for Web content lifetimes and expiration dates • Validation algorithms & commands to allow a cache to ‘check’ on web page expiration • Better support for transfer of dynamically generated web pages • ‘Chunked’ transfers; files sent as a sequential stream of chunks with a defined size • HTTP 1.1 has explicit header fields for defining ‘chunk’ mode transfers and expressing the size of the chunks

Presentation: The Hypertext Markup Language (HTML) Introduction • HTML is concerned with the structure and formatting of web pages • HTML was developed from an ISO standard called SGML; a ‘metalanguage’ developed in 1974 (you may ask why SGML isn’t used for the WWW instead of writing a new language – it’s because SGML is very general and terribly complex) • Current version is 4.01 (December 1999) • HTML is designed to be backwards compatible to previous versions • It was the original objective of HTML to allow the browser to control formatting & presentation of page • This was instituted to allow better data sharing in a heterogeneous environment • Content providers want more control over the presentation of their information so the original object of browser control over presentation is being gradually eroded in each new release of HTML

Presentation: The Hypertext Markup Language (HTML) HTML page structure • An HTML page consists of ASCII text and is divided into two main parts: a header and a body • The header contains information about the document • The body contains the actual information to display • HTML depends on embedded commands in the web page called TAGS • Tags define both a command action and a scope • Tags are enclosed in start and end symbols (‘<’ and ‘>’) to distinguish them from normal text that is for client display • Tags, depending on their specific actions, can have optional attributes (size, align, action, etc.)

File Access Services, Directory Services, & The World Wide Web (WWW)