
File Server Performance

  1. File Server Performance: AFS vs YFS

  2. Accepted AFS Limitations Alter Deployments
  • Use large numbers of small file servers
  • Use many small partitions per file server
  • Restrict the number of processors to 1 or 2
  • Limit the network bandwidth to 1 Gbit
  • Avoid workloads requiring:
    • Multiple clients creating / removing entries in a single directory
    • Multiple clients writing to or reading from a single file
    • More clients than file server worker threads accessing a single volume
  • Avoid applications requiring features that AFS does not offer:
    • Byte-range locking, extended attributes, per-file ACLs, etc.

  3. Instead of fixing the core problems, organizations have …
  • Deployed isolation file servers and complex monitoring to detect hot volumes and quarantine them
  • Developed complex workarounds including vicep-access, OSD, and OOB
  • Segregated RW and RO access into separate cells and constructed their own volume management systems to “vos release” volumes from the RW cell to the RO cells
  • Used the AFS name space for some tasks and other “high performance” file systems for others
    • NFSv3, NFSv4, Lustre, GPFS, Panasas, and others

  4. At what cost?
  • Additional servers cost money
    • US$6,800 per year according to Cornell University
    • Including hardware depreciation, support contracts, maintenance, power and cooling, and staff time
  • Increased complexity for end users
  • Multiple backup strategies

  5. The YFS Premise
  • Maintain the data and the name space
  • Fix the performance problems
  • Enhance the functionality to match Apple/Microsoft first-class file systems
  • Improve security
  • Save money

  6. Talk Outline
  • What are the bottlenecks in AFS and why do they exist?
  • What can be done to maximize the performance of an AFS file server?
  • How scalable is a YFS file server?

  7. AFS RX
  • File server throughput is bound by the amount of data the listener thread can read from the network during any time period
  • As Simon Wilkinson likes to say:
    • “There are only two things wrong with AFS RX, the protocol and the implementation.”

  8. AFS RX: The Protocol Issues
  • Incorrect Round Trip Time calculations
  • Incorrect Retransmission Timeout implementation
  • Window size vs congested networks
    • Broken window management makes congested networks worse
  • Soft ACKs and Hard ACKs
    • Twice as many ACKs as necessary
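  For reference, the estimator that RX's round-trip and retransmission timing is usually measured against is the standard one from RFC 6298. The sketch below is illustrative only (it is not the RX code, and the struct and function names are invented for this example):

```c
/* Illustrative sketch of the standard RTT/RTO estimator (RFC 6298),
 * shown as a reference point; this is NOT the RX implementation. */
#include <math.h>

struct rtt_state {
    double srtt;    /* smoothed round-trip time (seconds)   */
    double rttvar;  /* round-trip time variance             */
    double rto;     /* retransmission timeout               */
    int    first;   /* initialize to 1: no sample taken yet */
};

static void rtt_sample(struct rtt_state *s, double r /* measured RTT */)
{
    if (s->first) {
        s->srtt   = r;
        s->rttvar = r / 2.0;
        s->first  = 0;
    } else {
        /* alpha = 1/8, beta = 1/4, as recommended by RFC 6298 */
        s->rttvar = 0.75 * s->rttvar + 0.25 * fabs(s->srtt - r);
        s->srtt   = 0.875 * s->srtt + 0.125 * r;
    }
    s->rto = s->srtt + 4.0 * s->rttvar;
    if (s->rto < 1.0)        /* RFC 6298 lower bound of one second */
        s->rto = 1.0;
}
```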

  9. AFS RX: Implementation Issues
  • Lock contention
    • 20% of runtime spent waiting for locks
  • UDP context switching
    • Every packet processed on a different CPU
    • Cache line invalidation

  10. Simon’s RX Performance Talk
  • For the full details, see http://tinyurl.com/p8c8yqs

  11. The Legacy of LWP
  • Light Weight Processes (LWP) is the cooperative threading model used for the original AFS implementation
  • Only one thread can execute at a time
  • Threads yield voluntarily or when blocking for I/O
  • Data access is implicitly protected by single execution
  • All state changes between yields are effectively atomic. In other words:
    • Acquire + Release + Yield == Never Acquired
    • Acquire A + Acquire B == Acquire B + Acquire A
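  A minimal sketch of why this matters, using a hypothetical shared counter (not AFS code): under LWP the unlocked update is safe because no other thread can run between the read and the write, while under pthreads the identical code is a data race and needs the lock the conversion had to add.

```c
/* Hypothetical example, not AFS code: a counter shared by worker threads. */
#include <pthread.h>

static long active_calls;

/* Under LWP this needs no lock: the thread cannot be preempted, so the
 * read-modify-write is effectively atomic as long as it does not yield
 * or block for I/O in the middle. */
void call_started_lwp(void)
{
    active_calls++;                       /* no other thread can run here */
}

/* Under pthreads the same statement is a data race: another thread can
 * run between the load and the store, so a lock (or atomic op) is needed. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void call_started_pthreads(void)
{
    pthread_mutex_lock(&counter_lock);
    active_calls++;
    pthread_mutex_unlock(&counter_lock);
}
```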

  12. The pthreads Conversion
  • When converting a cooperatively threaded application to pthreads, it is faster to add global locks protecting the data structures that are accessed across I/O than to redesign the data structures and the work flow
  • AFS 3.4 added pthread file servers by adding a minimal number of global locks to each package
  • AFS 3.6 added finer-grained, but still global, locks

  13. The Many Locks
  • AFS file servers must acquire many mutexes during the processing of each RPC (* = global)
  • RX
    • peer_hash*, conn_hash*, peer, conn_call, conn_data, stats*, free_packet_queue*, free_call_queue*, event_queue*, and more
  • viced
    • H* [host table, callbacks]
    • FS* [stats]
    • VOL* [volume metadata]
    • VNODE [file/dir]

  14. Lock Contention
  • Threads are scheduled onto a processor and must give up their time slice whenever a required lock is unavailable
  • With multiple processors, a thread may run on a different processor each time it is scheduled
  • Any data not in that processor’s cache, or that has been invalidated, must be fetched; locks are data in memory whose state changes on every acquire and release
  • Two side effects of global locks:
    • Only one thread at a time can make progress
    • Multiple processor cores hurt performance
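  A hypothetical illustration of the cache effect (not AFS code): every acquire/release writes the mutex's cache line, so a single global lock forces that line to bounce between cores, while per-thread state padded to its own cache line keeps the writes local.

```c
/* Hypothetical illustration of why one global lock hurts on many cores. */
#include <pthread.h>

/* Shared version: the mutex and the counter it guards live on cache
 * lines that every core repeatedly steals from the others. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long total_ops;

void op_shared(void)
{
    pthread_mutex_lock(&glock);
    total_ops++;
    pthread_mutex_unlock(&glock);
}

/* Per-thread version: each thread writes only its own cache line
 * (padded to 64 bytes), so cores do not invalidate each other. */
#define MAX_THREADS 256
struct per_thread_counter {
    unsigned long ops;
    char pad[64 - sizeof(unsigned long)];   /* one counter per cache line */
};
static struct per_thread_counter per_thread[MAX_THREADS];

void op_per_thread(int my_index)
{
    per_thread[my_index].ops++;             /* no shared writes, no lock */
}
```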

  15. AFS Cache Coherency via Callbacks
  • An AFS file server promises its clients that, for a fixed period of time, it will notify them if the metadata or data state of an accessed object changes
  • For read-write volumes, one callback promise per file object
  • For read-only volumes, one callback promise per volume regardless of how many file objects are accessed
  • Today, many file servers are deployed with callback tables containing millions of entries
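  A simplified sketch of the bookkeeping this implies; the types and field names below are hypothetical and far simpler than the real viced callback package, but each promise has to record roughly this: which object it covers, which client to notify, and when it expires.

```c
/* Illustrative sketch of a callback promise record; not the viced code. */
#include <time.h>

struct afs_fid {                    /* which object the promise covers     */
    unsigned int volume;
    unsigned int vnode;             /* per-file for RW volumes; for RO     */
    unsigned int unique;            /* volumes one promise covers the      */
};                                  /* whole volume                        */

struct callback_promise {
    struct afs_fid fid;             /* object (or volume) being promised   */
    unsigned int   host;            /* client to notify on change          */
    time_t         expires;         /* promise is only valid until then    */
    struct callback_promise *next;  /* hash-chain link in the table        */
};
```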

  16. Host Table Contention
  • A host table, and hash tables for looking up host entries by IP address and UUID, are protected by a single global lock
  • Host entries have their own locks; to avoid hard deadlocks, locking an entry requires dropping the global lock, obtaining the entry lock, and then re-obtaining the global lock
  • Soft deadlocks occur when multiple threads are blocked on an entry lock while the thread holding it is blocked waiting for the global lock
  • Lock contention occurs multiple times for each new rx connection and each time a call is scheduled
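  A rough sketch of that locking dance (hypothetical code, not the actual viced host package): the global table lock is dropped, the entry lock taken, then the global lock reacquired, which is exactly where threads pile up when the entry's holder is itself waiting for the global lock.

```c
/* Hypothetical sketch of the drop-global / take-entry / retake-global
 * ordering described above; names are illustrative, not viced symbols. */
#include <pthread.h>

struct host {
    pthread_mutex_t lock;          /* per-entry lock                  */
    int refcount;
    /* ... address, UUID, callback state ... */
};

static pthread_mutex_t host_glock = PTHREAD_MUTEX_INITIALIZER;  /* table + hashes */

void lock_host(struct host *h)
{
    /* caller holds host_glock and has found h in the hash tables */
    h->refcount++;                        /* keep h alive while unlocked   */
    pthread_mutex_unlock(&host_glock);    /* drop the global lock first... */
    pthread_mutex_lock(&h->lock);         /* ...then take the entry lock   */
    pthread_mutex_lock(&host_glock);      /* ...then retake the global one */
}
```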

  17. Callback Table Contention
  • The callback table is protected by the same global lock as the host table
  • Each new or updated callback promise requires exclusive access to the table
  • Notifying registered clients of state changes (breaking callbacks) requires exclusive access
  • Garbage collection of expired callbacks (5-minute intervals) requires exclusive access
  • Exceeding the callback table limit requires exclusive access for immediate garbage collection and premature callback notification

  18. Impact of Host and Callback Table Contention
  • The larger the callback table, the longer exclusive access is held for garbage collection and callback breaks
  • While exclusive access is held, no new calls can be scheduled and no existing calls can complete

  19. AFS Worker Thread Pool
  • Increasing the worker thread pool permits additional calls to be scheduled instead of blocking in the rx wait queue
  • The primary benefit of scheduling is that locks provide a filtering mechanism to decide which calls can make progress; calls on the rx wait queue can never make progress if the thread pool is exhausted
  • The downside of a larger thread pool is increased lock contention and more CPU time wasted on thread scheduling

  20. Worker Thread Pool
  • Start with the “large” configuration
    • -L
  • Make the thread pool as large as possible
    • For 1.4, -p 128
    • For 1.6, -p 256
  • Set the number of directory buffers to twice the thread count
    • -b 512

  21. Volume and Vnode Caches
  • Volume cache larger than the total volume count
    • -vc <number of volumes plus some>
  • Small vnode cache (files)
    • -s <10 x volume count>
  • Large vnode cache (directories)
    • -l <3 x volume count>
  • If volumes are very large, higher multiples may be required

  22. Callback Tables and Thrashing
  • The callback table must be large enough to avoid thrashing
    • -cb <volume-count * 13 * vnode-count>
    • That value * 72 bytes should not exceed 10% of the machine’s physical memory
  • Use “xstat_fs_test -collID 3 -onceonly” to monitor the “GetSomeSpaces” value; if it is non-zero, increase the -cb value

  23. UDP Tuning
  • UDP receive buffer
    • Must be large enough to receive all packets for in-process calls
    • <thread-count * winsize (32) * packet-size>
    • -udpsize 16777216
    • Won’t take effect unless the OS is configured to match
  • UDP send buffer
    • -sendsize 2097152
    • (2^21) unless the client chunk size is larger
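  Putting slides 20–23 together, an invocation along the following lines is what the recommendations add up to. The numbers are illustrative only (they assume roughly 1,000 volumes on the server and enough RAM that the callback table stays well under the 10% rule); derive real values from the formulas above.

```
# Illustrative sketch only -- sizes assume ~1,000 volumes; use the formulas above
fileserver -L -p 256 -b 512 \
           -vc 1200 -s 10000 -l 3000 \
           -cb 1500000 \
           -udpsize 16777216 -sendsize 2097152
# 1,500,000 callback entries * 72 bytes is roughly 103 MB of table

# On Linux, -udpsize only takes effect if the kernel limit is at least as large
sysctl -w net.core.rmem_max=16777216
```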

  24. Mount vicep* with noatime
  • The AFS protocol does not expose the last access time to clients
  • Nor does the AFS file server make use of it
  • Turn off last access time updates to avoid large amounts of unnecessary disk I/O unrelated to serving the needs of clients
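  For example, on a Linux file server the vice partitions can be remounted with noatime; the device name and filesystem type below are placeholders for whatever the partition actually uses.

```
# Remount an existing vice partition without access-time updates
mount -o remount,noatime /vicepa

# Or make it permanent in /etc/fstab (device and fs type are examples)
/dev/sdb1  /vicepa  ext4  defaults,noatime  0  2
```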

  25. Syncing Data to Disk
  • Syncing data to disk is very expensive. If you trust your UPS and have a good battery-backed caching storage adapter, we recommend reducing the frequency of sync operations
  • For 1.6.5, a new option:
    • -sync onclose

  26. YFS File Servers Scale Far Beyond AFS
  • YFS file servers experience much less contention between threads
    • RPCs take less time to complete
    • Store operations do not block simultaneous Fetch requests
  • One YFS file server can replace at least 30 AFS file servers
    • Max in-flight RPCs per AFS server = 240
    • Max in-flight RPCs per YFS server = 16,000 (dynamic)
    • 240 * 30 = 7,200

  27. How Fast Can RX/UDP Go?
  • Up to 8.2 Gbit/second per listener thread

  28. SLAC Testing
  • SLAC has experienced file server meltdowns for years; a large number of file servers are deployed to distribute load and isolate the volumes accessed by users
  • One YFS file server satisfied 500 client nodes for nearly 24 hours without noticeable delays
    • 1 Gbit NIC, 8 processor cores, 6 Gbit/sec local RAID disk
    • 800 operations per second
    • 55 MB/sec FetchData
    • 5 MB/sec StoreData

  29. Other Benefits
  • 2038-safe
  • 100 ns time resolution
  • 2^64 volumes
  • 2^96 vnodes per volume
  • 2^64 max quota / volume / partition size
  • Per-file ACLs
  • Volume security policies
  • Maximum ACL / wire privacy
  • Servers do not run as “root”
  • Linux O_DIRECT
  • Mandatory locking
  • IPv6 network stack

  30. Security, Security, Security
  • RXGK
    • GSS-API authentication
    • AES-256/SHA-1 wire privacy
  • File server wire security policies
    • File servers cannot serve volumes with stronger required policies
  • Combined identity tokens
  • Keyed cache managers / machine IDs
  • Maximum volume ACL prevents data leaks
