
Fast Communication



Presentation Transcript


  1. Fast Communication • Firefly RPC • Lightweight RPC • CS 614 • Tuesday March 13, 2001 • Jeff Hoy

  2. Why Remote Procedure Call? • Simplify building distributed systems and applications • Looks like local procedure call • Transparent to user • Balance between semantics and efficiency • Universal programming tool • Secure inter-process communication

  3. RPC Model • Diagram: a call passes from the Client Application through the Client Stub and Client Runtime, across the Network to the Server Runtime, Server Stub, and Server Application; the return retraces the same path in reverse
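
To make the diagram concrete, here is a minimal sketch of the stub pattern in C. It is illustrative only: the in-memory wire buffer stands in for the client runtime, network, and server runtime, and none of the names below are Firefly's.

```c
/* Sketch of the client-stub / server-stub pattern; all names are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static unsigned char wire[64];          /* stand-in for runtimes + network */

static void rpc_send(const void *buf, size_t len) { memcpy(wire, buf, len); }
static void rpc_recv(void *buf, size_t len)       { memcpy(buf, wire, len); }

/* The "real" procedure living in the server application. */
static int32_t add(int32_t a, int32_t b) { return a + b; }

/* Server stub: unmarshal the call packet, invoke the procedure, marshal the result. */
static void add_server_stub(void) {
    int32_t req[2], reply;
    rpc_recv(req, sizeof req);
    reply = add(req[0], req[1]);
    rpc_send(&reply, sizeof reply);
}

/* Client stub: looks like a local call to the application. */
static int32_t add_client_stub(int32_t a, int32_t b) {
    int32_t req[2] = { a, b }, reply;
    rpc_send(req, sizeof req);          /* call leg: marshal and transmit  */
    add_server_stub();                  /* stand-in for server-side dispatch */
    rpc_recv(&reply, sizeof reply);     /* return leg: wait and unmarshal  */
    return reply;
}

int main(void) {
    printf("add(2, 3) via stubs = %d\n", (int)add_client_stub(2, 3));
    return 0;
}
```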

  4. RPC In Modern Computing • CORBA and Internet Inter-ORB Protocol (IIOP) • Each CORBA server object exposes a set of methods • DCOM and Object RPC • Built on top of RPC • Java and Java Remote Method Protocol (JRMP) • Interface exposes a set of methods • XML-RPC, SOAP • RPC over HTTP and XML

  5. Goals • Firefly RPC • Inter-machine Communication • Maintain Security and Functionality • Speed • Lightweight RPC • Intra-machine Communication • Maintain Security and Functionality • Speed

  6. Firefly RPC • Hardware • DEC Firefly multiprocessor • 1 to 5 MicroVAX CPUs per node • Concurrency considerations • 10 megabit Ethernet • Takes advantage of 5 CPUs

  7. Fast Path in an RPC • Transport Mechanisms • IP / UDP • DECNet byte stream • Shared Memory (intra-machine only) • Determined at bind time • Fast path runs inside the transport procedures “Starter”, “Transporter”, and “Ender”, plus “Receiver” on the server

  8. Caller Stub • Gets control from the calling program • Calls “Starter” for a packet buffer • Copies arguments into the buffer • Calls “Transporter” and waits for the reply • Copies result data into the caller’s result variables • Calls “Ender” and frees the result packet
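
A hedged sketch of how a caller stub might string together the “Starter”, “Transporter”, and “Ender” steps above; the Packet layout, the single-packet pool, and the doubling “server” inside the mock Transporter are assumptions made for illustration.

```c
/* Caller-stub fast path, sketched around the procedures named on the slide. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { unsigned char data[64]; } Packet;

static Packet pool;                                   /* one reusable packet buffer */

static Packet *Starter(void)    { return &pool; }     /* get a call packet buffer  */
static void    Ender(Packet *p) { (void)p; }          /* free the result packet    */

/* Mock transport: pretend the remote procedure doubles a 32-bit argument. */
static Packet *Transporter(Packet *call) {
    int32_t arg, res;
    memcpy(&arg, call->data, sizeof arg);
    res = 2 * arg;
    memcpy(call->data, &res, sizeof res);             /* server reuses the call packet */
    return call;
}

/* Caller stub: marshal, transmit, wait, unmarshal, release. */
static int32_t double_stub(int32_t arg) {
    int32_t result;
    Packet *call = Starter();                         /* packet buffer         */
    memcpy(call->data, &arg, sizeof arg);             /* copy arguments in     */
    Packet *reply = Transporter(call);                /* send and block        */
    memcpy(&result, reply->data, sizeof result);      /* copy result out       */
    Ender(reply);                                     /* free result packet    */
    return result;
}

int main(void) { printf("double(21) = %d\n", (int)double_stub(21)); return 0; }
```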

  9. Server Stub • Receives the incoming packet • Copies arguments onto the stack, into a new data block, or leaves them in the packet • Calls the server procedure • Copies the result into the call packet and transmits it

  10. Transport Mechanism • “Transporter” procedure • Completes the RPC header • Calls “Sender” to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication) • Invokes the Ethernet driver via a kernel trap and queues the packet
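
A sketch of the layering described above: the “Transporter” completes the RPC header and hands the frame to “Sender”, which fills the UDP, IP, and Ethernet headers and traps into the kernel to queue the frame on the Ethernet driver. The Frame type and every helper name are assumptions; the mocks only log each step.

```c
/* Illustrative layering only; types and helpers below are assumptions. */
#include <stdio.h>

typedef struct { unsigned char bytes[1514]; } Frame;   /* max Ethernet frame size */

static void fill_rpc_header(Frame *f)      { (void)f; puts("RPC header");      }
static void fill_udp_header(Frame *f)      { (void)f; puts("UDP header");      }
static void fill_ip_header(Frame *f)       { (void)f; puts("IP header");       }
static void fill_ethernet_header(Frame *f) { (void)f; puts("Ethernet header"); }
static void kernel_queue_packet(Frame *f)  { (void)f; puts("kernel trap: queue on Ethernet driver"); }

/* "Sender": finish the transport headers, then trap into the kernel. */
static void Sender(Frame *f) {
    fill_udp_header(f);
    fill_ip_header(f);
    fill_ethernet_header(f);
    kernel_queue_packet(f);
}

/* "Transporter": complete the RPC header, then hand off to Sender. */
static void Transporter(Frame *f) {
    fill_rpc_header(f);
    Sender(f);
}

int main(void) { Frame f; Transporter(&f); return 0; }
```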

  11. Transport Mechanism • “Receiver” procedure • Server thread awakens in “Receiver” • “Receiver” calls the stub interface included in the received packet, and the interface stub calls the procedure stub • Reply is similar

  12. Threading • Client Application creates the RPC thread • Server Application creates the call thread • Threads operate in the server application’s address space • No need to spawn an entire process • Threads must consider locking of shared resources
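
A small pthreads sketch of the idea above: calls run on pre-created threads inside the server's address space rather than in freshly spawned processes, so shared server state needs locking. The thread-pool size and the shared counter are illustrative assumptions, not Firefly's scheduler.

```c
/* Threading sketch: call threads share the server's address space and lock shared state. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int calls_served = 0;            /* shared server state, hence the lock */

/* Body of one server call thread: handle a single incoming call. */
static void *call_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);          /* threads must lock shared resources */
    calls_served++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t threads[4];               /* call threads, no per-call process spawn */
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, call_thread, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    printf("calls served: %d\n", calls_served);
    return 0;
}
```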

  13. Threading

  14. Performance Enhancements • Over traditional RPC • Stubs marshal arguments directly, rather than passing them through library functions • RPC procedures are called through procedure variables rather than via a lookup table • Server retains the call packet for results • Buffers reside in shared memory • Sacrifices abstract structure

  15. Performance Analysis • Null() Procedure • No arguments or return value • Measures base latency of RPC mechanism • Multi-threaded caller and server

  16. Time for 10,000 RPCs • Base latency – 2.66 ms • MaxResult latency (1500 bytes) – 6.35 ms

  17. Send and Receive Latency

  18. Send and Receive Latency • With larger packets, transmission time dominates • Overhead becomes less of an issue • Good for Firefly RPC, assuming large transmission over network • Is overhead acceptable for intra-machine communication?

  19. Stub Latency • Significant overhead for small packets

  20. Fewer Processors • Seconds for 1,000 Null() calls

  21. Fewer Processors • Why the slowdown with one processor? • Fast path can be followed only in multiprocessor environment • Lock conflicts, scheduling problems • Why little speedup past two processors?

  22. Future Improvements • Hardware • A faster network will help larger packets • Tripling CPU speed would reduce Null() time by 52% and MaxResult time by 36% • Software • Omit IP and UDP headers for Ethernet datagrams: 2–4% gain • Redesign the RPC protocol: ~5% gain • Busy-wait in threads: 10–15% gain • Write more in assembler: 5–10% gain

  23. Other Improvements • Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication • Firefly RPC also has very high overhead for small packets • Does this matter?

  24. RPC Size Distribution • Majority of RPC transfers under 200 bytes

  25. Frequency of Remote Activity • Most calls are to the same machine

  26. Traditional RPC • Most calls are small messages that take place between domains of the same machine • Traditional RPC contains unnecessary overhead, like • Scheduling • Copying • Access validation

  27. Lightweight RPC (LRPC) • Also written for the DEC Firefly system • Mechanism for communication between different protection domains on the same system • Significant performance improvements over traditional RPC

  28. Overhead Analysis • Theoretical minimum to invoke Null() across domains: a kernel trap plus context change for the call, and a trap plus context change for the return • Theoretical minimum on the Firefly: 109 us • Actual cost: 464 us

  29. Sources of Overhead • 355 us of added overhead (464 us - 109 us) • Stub overhead • Message buffer overhead • Not so much in Firefly RPC • Message transfer and flow control • Scheduling and abstract threads • Context switch

  30. Implementation of LRPC • Similar to RPC • Call to server is done through kernel trap • Kernel validates the caller • Servers export interfaces • Clients bind to server interfaces before making a call

  31. Binding • Servers export interfaces through a clerk • The clerk registers the interface • Clients bind to the interface through a call to the kernel • Server replies with an entry address and size of its A-stack • Client gets a Binding Object from the kernel
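
A hedged sketch of this binding handshake: the server's clerk registers an interface, the client binds through a (mocked) kernel call, and gets back a Binding Object carrying the entry address and A-stack size. Every type and function name below is an assumption made for illustration.

```c
/* Illustrative binding handshake; all names and types are assumptions. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *interface_name;   /* registered by the server's clerk       */
    void      (*entry)(void *);   /* server entry address returned on bind  */
    size_t      astack_size;      /* size of the shared argument stack      */
} Interface;

typedef struct { const Interface *iface; } BindingObject;

/* Clerk side: register the interface in the (mock) kernel registry. */
static Interface registry[16];
static int registered = 0;

static void clerk_register(Interface iface) { registry[registered++] = iface; }

/* Client side: bind by name via a (mock) kernel call, receive a Binding Object. */
static BindingObject kernel_bind(const char *name) {
    for (int i = 0; i < registered; i++)
        if (registry[i].interface_name == name)   /* pointer compare suffices in this mock */
            return (BindingObject){ &registry[i] };
    return (BindingObject){ NULL };
}

static void example_entry(void *astack) { (void)astack; puts("server procedure entered"); }

int main(void) {
    const char *name = "example.v1";
    clerk_register((Interface){ name, example_entry, 256 });
    BindingObject b = kernel_bind(name);
    if (b.iface)
        printf("bound to %s: A-stack = %zu bytes\n",
               b.iface->interface_name, b.iface->astack_size);
    return 0;
}
```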

  32. Calling • Each procedure is represented by a stub • Client makes a call through the stub • Manages A-stacks • Traps to the kernel • Kernel switches context to the server • Server returns by its own stub • No verification needed
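
A sketch of that call path: the client stub places arguments on the shared A-stack and traps into the kernel, which switches context into the server domain and runs its entry stub; the result comes back on the same A-stack. The direct function call standing in for the kernel trap and context switch, and all names, are assumptions.

```c
/* Sketch of an LRPC-style call through a shared A-stack; names are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static unsigned char astack[256];        /* argument stack shared between domains */

/* Server side: entry stub reads arguments off the A-stack and writes the result back. */
static void server_entry(void) {
    int32_t a, b, sum;
    memcpy(&a, astack,            sizeof a);
    memcpy(&b, astack + sizeof a, sizeof b);
    sum = a + b;
    memcpy(astack, &sum, sizeof sum);    /* result returned in place */
}

/* Mock "kernel": validate the caller and switch context into the server domain. */
static void kernel_trap(void (*entry)(void)) { entry(); }

/* Client stub: marshal onto the A-stack, trap, read the result back. */
static int32_t add_stub(int32_t a, int32_t b) {
    int32_t result;
    memcpy(astack,            &a, sizeof a);   /* single copy of the arguments */
    memcpy(astack + sizeof a, &b, sizeof b);
    kernel_trap(server_entry);                 /* kernel trap + context switch (mocked) */
    memcpy(&result, astack, sizeof result);
    return result;
}

int main(void) { printf("add(4, 5) = %d\n", (int)add_stub(4, 5)); return 0; }
```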

  33. Stub Generation • Procedure representation • Call stub for the client • Entry stub for the server • LRPC merges protocol layers • The stub generator creates run-time stubs in assembly language • Portability sacrificed for performance • Falls back on Modula-2+ for complex calls

  34. Multiple Processors • LRPC caches domains on idle processors • Kernel checks for an idling processor in the server domain • If a processor is found, caller thread can execute on the idle processor without switching context

  35. Argument Copying • Traditional RPC copies arguments four times for intra-machine calls • Client stub to RPC message to kernel’s message to server’s message to server’s stack • In many cases, LRPC needs to copy the arguments only once • Client stub to A-stack

  36. Performance Analysis • LRPC is roughly three times faster than traditional RPC • Null() LRPC cost: 157 us, close to the 109 us theoretical minimum • Additional overhead from stub generation and kernel execution

  37. Single-Processor Null() LRPC

  38. Performance Comparison • LRPC versus traditional RPC (in us)

  39. Multiprocessor Speedup

  40. Inter-machine Communication • LRPC is best for messages between domains on the same machine • The first instruction of the LRPC stub checks whether the call is cross-machine • If so, the stub branches to conventional RPC • Larger messages are handled well; LRPC scales linearly with packet size, like traditional RPC
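
A sketch of that first-instruction check: the generated stub tests whether the binding is cross-machine and branches to conventional RPC if so. The BindingObject layout and both path functions are illustrative assumptions, not the actual LRPC stub code.

```c
/* Illustrative dispatch only; layout and both transport paths are assumptions. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool cross_machine; } BindingObject;

static int conventional_rpc(const BindingObject *b) { (void)b; puts("network RPC path"); return 0; }
static int lrpc_fast_path(const BindingObject *b)   { (void)b; puts("local LRPC path");  return 0; }

/* Stub entry: the very first test decides which transport to use. */
static int example_stub(const BindingObject *b) {
    if (b->cross_machine)              /* remote server: fall back to conventional RPC */
        return conventional_rpc(b);
    return lrpc_fast_path(b);          /* same machine: take the LRPC fast path */
}

int main(void) {
    BindingObject local = { false }, remote = { true };
    example_stub(&local);
    example_stub(&remote);
    return 0;
}
```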

  41. Cost • LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols • Abstraction is sacrificed for functionality • RPC is built into operating systems (Linux DCE RPC, MS RPC)

  42. Conclusion • Firefly RPC is fast compared to most RPC implementations. LRPC is even faster. Are they fast enough? • “The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate” (1990) • Is speed still an issue?
