
6.894: Distributed Operating System Engineering


Presentation Transcript


  1. 6.894: Distributed Operating System Engineering Lecturers: Frans Kaashoek (kaashoek@mit.edu) Robert Morris (rtm@lcs.mit.edu) TA: Jinyang Li (jinyang@lcs.mit.edu) www.pdos.lcs.mit.edu/6.894

  2. Operating System • Software that turns silicon into something useful • Provides applications with a programming interface • Manages hardware resources on behalf of applications

  3. Distributed Operating System • The holy grail: transparency • Provide applications with a virtual machine consisting of many processors distributed around the network • Distributed OS engineering is difficult: • Failures • High degree of concurrency • Long latencies • New classes of security attacks

  4. Client/Server Architecture • A modular architecture for structuring distributed systems • Clients request services from servers • Clients and servers communicate with messages • Servers are typically trusted • Other architectures • Peer-to-peer (decentralized) • Single address space

  5. 6.894 topics • Client-server components • Remote procedure call, threads, address spaces, etc. • Storage • File systems, transactions • Security • Confidentiality, authentication, etc. • Scalable servers

  6. 6.894 is an advanced 6.033 • Perform actual systems research • Do a research project • Study recent research papers • Design systems for real workloads • New abstractions, protocols, data structures, algorithms, etc. • Build a real system (lab) • Real enough that you can use it

  7. Internet video-on-demand server • An example to study the issues and preview 6.894 • Requirements: • Low- and high-quality video • Many users, spread around the Internet • Last-mile bandwidth may be low • Access control

  8. Client and server structure

     Client() {
       fd = connect("server");       /* open a connection to the server */
       write(fd, "video.mpg");       /* request a video by name */
       while (!eof(fd)) {
         read(fd, buf);              /* receive the next block */
         display(buf);
       }
     }

     Server() {
       while (1) {
         cfd = accept();             /* wait for a client connection */
         read(cfd, name);            /* read the requested file name */
         fd = open(name);
         while (!eof(fd)) {
           read(fd, block);          /* read a block from disk... */
           write(cfd, block);        /* ...and send it to the client */
         }
         close(cfd);
         close(fd);
       }
     }
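
For comparison, here is a minimal runnable sketch of this single-threaded server in real POSIX C. The port number, buffer sizes, and newline-terminated request format are illustrative assumptions, and error handling is mostly omitted:

     /* Sketch of the slide's single-threaded server with POSIX sockets.
      * Port 8080 and the request format are assumptions for illustration. */
     #include <string.h>
     #include <unistd.h>
     #include <fcntl.h>
     #include <netinet/in.h>
     #include <sys/socket.h>

     int main(void) {
         int lfd = socket(AF_INET, SOCK_STREAM, 0);
         struct sockaddr_in addr = {0};
         addr.sin_family = AF_INET;
         addr.sin_addr.s_addr = htonl(INADDR_ANY);
         addr.sin_port = htons(8080);              /* assumed port */
         bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
         listen(lfd, 16);

         for (;;) {
             int cfd = accept(lfd, NULL, NULL);    /* wait for one client */
             char name[256], block[4096];
             ssize_t n = read(cfd, name, sizeof(name) - 1);  /* file name */
             if (n <= 0) { close(cfd); continue; }
             name[n] = '\0';
             name[strcspn(name, "\r\n")] = '\0';
             int fd = open(name, O_RDONLY);
             if (fd >= 0) {
                 while ((n = read(fd, block, sizeof(block))) > 0)
                     write(cfd, block, n);         /* stream blocks to client */
                 close(fd);
             }
             close(cfd);                           /* one request at a time: */
         }                                         /* the next client waits  */
     }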

  9. Performance "analysis" • Server capacity: • Network (100 Mbit/s) • Disk (20 Mbyte/s) • Obtained performance: one client stream • The server is limited by its software structure • If a video stream is 200 Kbit/s, the server should be able to support far more than one client (see the estimate below)
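
A rough capacity estimate from the slide's numbers (ignoring protocol overhead and disk seeks):

     network: 100 Mbit/s ÷ 200 Kbit/s per stream ≈ 500 concurrent streams
     disk:    20 MByte/s ≈ 160 Mbit/s; 160 Mbit/s ÷ 200 Kbit/s ≈ 800 streams

The network is the tighter limit, so the hardware should sustain on the order of 500 clients; delivering only one stream means the software structure, not the hardware, is the bottleneck.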

  10. Better single-server performance • Goal: run at server’s hardware speed • Disk or network should be bottleneck • Method: • Pipeline blocks of each request • Multiplex requests from multiple clients • Two implementation approaches: • Multithreaded server • Asynchronous I/O

  11. Multithreaded server

     server() {
       while (1) {
         cfd = accept();
         read(cfd, name);
         fd = open(name);
         while (!eof(fd)) {
           read(fd, block);
           write(cfd, block);
         }
         close(cfd);
         close(fd);
       }
     }

     for (i = 0; i < 10; i++)
       fork(server);                 /* run ten server threads */

     • When a thread waits for I/O, the thread scheduler runs another thread
     • All shared data must be protected by locks
     • Release locks when blocking
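
Below is a minimal runnable sketch of the ten-thread pattern with POSIX threads. serve_one_client() is a stand-in for the slide's accept/read/write loop, and the shared counter merely illustrates the locking rule; both are assumptions, not course code:

     /* Sketch: N identical server threads, shared state guarded by a mutex.
      * serve_one_client() and the counter are illustrative stand-ins. */
     #include <pthread.h>
     #include <unistd.h>

     #define NTHREADS 10

     static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
     static int requests_served;       /* example of shared data */

     static void serve_one_client(void) {
         sleep(1);                     /* stand-in for blocking accept/read/write */
     }

     static void *server(void *arg) {
         (void)arg;
         for (;;) {
             serve_one_client();       /* while blocked, the scheduler runs peers */
             pthread_mutex_lock(&lock);   /* hold the lock only around shared   */
             requests_served++;           /* data, never across a blocking call */
             pthread_mutex_unlock(&lock);
         }
     }

     int main(void) {
         pthread_t tid[NTHREADS];
         for (int i = 0; i < NTHREADS; i++)
             pthread_create(&tid[i], NULL, server, NULL);
         for (int i = 0; i < NTHREADS; i++)
             pthread_join(tid[i], NULL);  /* threads loop forever in this sketch */
         return 0;
     }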

  12. Asynchronous I/O

     struct callback {
       bool (*is_ready)();           /* has the awaited event occurred? */
       void (*cb)(void *arg);        /* handler to run when it has */
       void *arg;
     };

     main() {
       while (1) {
         for (c = each callback) {
           if (c->is_ready())
             c->cb(c->arg);          /* run the handler for the ready event */
         }
       }
     }

     • Code is structured as a collection of handlers
     • Handlers are nonblocking
     • Create a new callback for each blocking operation
     • When the operation completes, the loop calls its handler
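
For concreteness, here is a minimal runnable version of this callback loop in POSIX C, using select() to play the role of is_ready(). The fixed-size callback table and the stdin-echo handler are illustrative assumptions, not the course's library:

     /* Runnable sketch of the event loop: callbacks fire when their
      * file descriptor becomes readable, as reported by select(). */
     #include <stdlib.h>
     #include <unistd.h>
     #include <sys/select.h>

     struct callback {
         int fd;                     /* event source: fd becomes readable */
         void (*cb)(void *arg);
         void *arg;
     };

     static struct callback cbs[16];
     static int ncbs;

     static void add_callback(int fd, void (*cb)(void *), void *arg) {
         cbs[ncbs].fd = fd; cbs[ncbs].cb = cb; cbs[ncbs].arg = arg; ncbs++;
     }

     static void echo_cb(void *arg) {   /* a nonblocking handler */
         char buf[512];
         ssize_t n = read(*(int *)arg, buf, sizeof(buf));
         if (n <= 0) exit(0);
         write(STDOUT_FILENO, buf, n);
     }

     int main(void) {
         static int in = STDIN_FILENO;
         add_callback(in, echo_cb, &in);
         for (;;) {                  /* the event loop from the slide */
             fd_set rfds; FD_ZERO(&rfds);
             int maxfd = -1;
             for (int i = 0; i < ncbs; i++) {
                 FD_SET(cbs[i].fd, &rfds);
                 if (cbs[i].fd > maxfd) maxfd = cbs[i].fd;
             }
             select(maxfd + 1, &rfds, NULL, NULL, NULL); /* which is ready? */
             for (int i = 0; i < ncbs; i++)
                 if (FD_ISSET(cbs[i].fd, &rfds))
                     cbs[i].cb(cbs[i].arg);              /* run ready handlers */
         }
     }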

  13. Asynchronous server

     init() {
       on_accept(accept_cb);
     }
     accept_cb(cfd) {
       on_readable(cfd, name_cb);    /* wait for the request name */
     }
     on_readable(fd, fn) {
       c = new callback(test_readable, fn, fd);
       add c to callback list;
     }
     name_cb(cfd) {
       read(cfd, name);
       fd = open(name);
       on_readable(fd, read_cb);     /* wait for the first disk block */
     }
     read_cb(cfd, fd) {
       read(fd, block);
       on_writable(cfd, write_cb);   /* wait until the client socket accepts data */
     }
     write_cb(cfd, fd) {
       write(cfd, block);
       on_readable(fd, read_cb);     /* then fetch the next block */
     }

  14. Multithreaded vs. Async

     Multithreaded                      Asynchronous
     -------------                      ------------
     Hard to program: locking code      Hard to program: callback code
     Need to know what blocks           Need to know what blocks
     Coordination explicit              Coordination implicit
     State stored on thread's stack     State passed around explicitly
     Memory allocation implicit         Memory allocation explicit
     Context switch may be expensive    Lightweight context switch
     Multiprocessors                    Uniprocessors

  15. Coordination example

     • Threaded server:
       • A thread for the network interface
       • An interrupt wakes up the network thread
       • A shared buffer, protected by locks and condition variables, between the server threads and the network thread
     • Asynchronous I/O:
       • Poll for packets (how often to poll?), or have the interrupt generate an event
       • Be careful: disable interrupts while manipulating the callback queue
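
Here is a sketch of the threaded side's protected shared buffer: the network thread is the producer, server threads are consumers, and a mutex plus condition variables do the coordination. The queue size and opaque packet type are assumptions:

     /* Shared buffer between the network thread (producer) and server
      * threads (consumers). QSIZE and void* packets are illustrative. */
     #include <pthread.h>

     #define QSIZE 64

     static void *queue[QSIZE];
     static int head, tail, count;
     static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
     static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
     static pthread_cond_t nonfull  = PTHREAD_COND_INITIALIZER;

     /* Called by the network thread when an interrupt delivers a packet. */
     void enqueue_packet(void *pkt) {
         pthread_mutex_lock(&lock);
         while (count == QSIZE)             /* wait releases the lock while */
             pthread_cond_wait(&nonfull, &lock);  /* the thread is blocked  */
         queue[tail] = pkt;
         tail = (tail + 1) % QSIZE;
         count++;
         pthread_cond_signal(&nonempty);    /* wake a waiting server thread */
         pthread_mutex_unlock(&lock);
     }

     /* Called by server threads to fetch the next packet. */
     void *dequeue_packet(void) {
         pthread_mutex_lock(&lock);
         while (count == 0)
             pthread_cond_wait(&nonempty, &lock);
         void *pkt = queue[head];
         head = (head + 1) % QSIZE;
         count--;
         pthread_cond_signal(&nonfull);
         pthread_mutex_unlock(&lock);
         return pkt;
     }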

  16. Scheduling: polling vs. interrupts • Goal: maintain peak performance under heavy load • The interrupt model can lead to livelock • Solution: • Use interrupts under low load (good latency) • Use polling under heavy load (good throughput) • Polling is typically more efficient than interrupts • Fits naturally into the asynchronous I/O model
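
A sketch of the hybrid policy, in the spirit of receive-livelock avoidance: take interrupts only when idle, and poll while there is work. Every device_* function here is a hypothetical stand-in for a driver operation, so this is an outline rather than runnable code:

     /* Hybrid scheduling: interrupt-driven when idle, polling under load.
      * All device_* functions are hypothetical driver stand-ins. */
     int  device_rx_pending(void);          /* any packets waiting? (assumed) */
     void device_rx_one(void);              /* process one packet (assumed)   */
     void device_enable_interrupts(void);   /* (assumed) */
     void device_disable_interrupts(void);  /* (assumed) */
     void wait_for_interrupt(void);         /* sleep until device fires (assumed) */

     void rx_loop(void) {
         for (;;) {
             device_disable_interrupts();   /* under load: poll, take no interrupts */
             while (device_rx_pending())
                 device_rx_one();           /* drain the queue: good throughput */
             device_enable_interrupts();    /* idle: re-arm for good latency */
             wait_for_interrupt();
         }
     }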

  17. Other design issues • Disk scheduling • Elevator algorithm (sketch below) • Memory management • File system buffer cache • Address spaces (VM management) • Fault-isolate different servers • Efficient local communication? • Efficient transfers between disk and network • Avoid copies
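
As an aside on the elevator algorithm mentioned above, here is a minimal runnable sketch of the SCAN policy: sort pending track requests, sweep upward from the head position, then sweep back. The request array and track numbers are only illustrative:

     /* Elevator (SCAN) disk scheduling: serve requests in track order in
      * the current direction, then reverse. Example data is illustrative. */
     #include <stdio.h>
     #include <stdlib.h>

     static int cmp_int(const void *a, const void *b) {
         return *(const int *)a - *(const int *)b;
     }

     /* Print the service order for n pending track requests, starting at
      * `head` and initially moving toward higher track numbers. */
     void elevator(int *tracks, int n, int head) {
         qsort(tracks, n, sizeof(int), cmp_int);
         int i = 0;
         while (i < n && tracks[i] < head) i++;  /* first track at/above head */
         for (int j = i; j < n; j++)             /* sweep up... */
             printf("service track %d\n", tracks[j]);
         for (int j = i - 1; j >= 0; j--)        /* ...then sweep back down */
             printf("service track %d\n", tracks[j]);
     }

     int main(void) {
         int pending[] = {98, 183, 37, 122, 14, 124, 65, 67};
         elevator(pending, 8, 53);               /* head starts at track 53 */
         return 0;
     }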

  18. More than one processor • Problem: single machine may not scale to enough clients • Solutions: • Multiprocessors • Helps when CPU is bottleneck • Server clusters • Helps when bandwidth between server and backbone is high • Distributed server clusters • Helps when bandwidth between client and distant server is low

  19. Clusters • Naming transparency • Server cluster transparent to client? • Server selection • Metrics: CPU load, presence of data • Consistency • Partition data • Availability • More processors can decrease reliability • Replicate data (makes consistency more difficult)
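
One simple way to combine "partition data" with data-aware server selection is to hash the requested name to a cluster node, so requests for the same video land where the file is likely cached. This sketch (the hash choice and modulo scheme are assumptions) shows the idea; note that plain modulo reshuffles almost everything when the number of servers changes:

     /* Data-aware server selection by hashing the file name: requests for
      * the same video go to the same node. Hash and modulo are illustrative. */
     #include <stdio.h>

     unsigned long hash_name(const char *s) {   /* djb2 string hash */
         unsigned long h = 5381;
         while (*s) h = h * 33 + (unsigned char)*s++;
         return h;
     }

     int pick_server(const char *name, int nservers) {
         return (int)(hash_name(name) % nservers);
     }

     int main(void) {
         printf("video.mpg -> server %d\n", pick_server("video.mpg", 4));
         return 0;
     }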

  20. Distributed clusters • Replication policies • Data distribution • Consistency • Network monitoring and modeling • Global load balancing • Tradeoff between accuracy, latency, and network load

  21. Making it secure: access control • Redo the design; don't add security on afterward • Firewalls: insecure and break many things • CPU cycles are an issue • A secure HTTP server can do about 10-20 connections per second • Pulls in other global issues • Name-to-key binding • Key management infrastructure

  22. Example summary • Pipelining of disk and network requests • Need a lot of sophisticated software infrastructure • Replication for reliability and performance • Need sophisticated protocols • Difficult: We did it for one application • What if data changes rapidly? • Lack of abstractions!

  23. 6.894 lab: real systems • Multi-finger (due next week) • Asynchronous I/O • HTTP proxy • High-performance proxy • Cache, consistency, etc. • Open-ended file system project • Research
