Ethan Kao

CS 6410

Oct. 18th 2011

Networking

Papers
  • Active Messages: A Mechanism for Integrated Communication and Control, Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. In Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992.
  • U-Net: A User-Level Network Interface for Parallel and Distributed  Computing, Von Eicken, Basu, Buch and Werner Vogels. 15th SOSP, December 1995.
Parallel vs. Distributed Systems
  • Parallel System:
  • Multiple processors – one machine
  • Shared Memory
  • Supercomputing

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Parallel vs. Distributed Systems

Distributed System:

  • Multiple machines linked together
  • Distributed memory
  • Cloud computing

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Challenges
  • How to efficiently communicate?

  • Between processors: Active Messages
  • Between machines: U-Net

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Active Messages: Authors
  • Thorsten von Eicken
    • Berkeley Ph.D. -> Assistant professor at Cornell -> UCSB
    • Founded RightScale, Chief Architect at Expertcity.com
  • David E. Culler
    • Professor at Berkeley
  • Seth Copen Goldstein
    • Berkeley Ph.D. -> Associate professor at CMU
  • Klaus Erik Schauser
    • Berkeley Ph.D. -> Associate professor at UCSB
Active Messages: Motivation
  • Existing message passing multiprocessors had high communication costs
  • Message passing machines made inefficient use of underlying hardware capabilities
    • nCUBE/2
    • CM-5
    • Thousands of nodes interconnected
  • Poor overlap between computation and communication
Active Messages: Goals
  • Improve overlap between computation & communication
  • Aim for 100% utilization of resources
  • Low start-up costs for network usage
Active Messages: Takeaways
  • Asynchronous communication
  • Minimal buffering
  • Handler interface
  • Weaknesses:
    • Address of the message handler must be known
    • Design needs to be hardware specific?
Active Messages: Design
  • Asynchronous communication mechanism
  • Messages contain user-level handler address
  • Handler executed on message arrival
    • Takes message off network
    • Message body is argument
    • Does not block
Active Messages: Design
  • Sender blocks until messages can be injected into network
  • Receiver interrupted on message arrival - runs handler
  • User level program pre-allocates receiving structures
    • Eliminates buffering
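The dispatch model on these two slides can be sketched as a toy in-memory simulation. All names are illustrative, and a simple deque stands in for the interconnect; this is a sketch of the idea, not the paper's implementation:

```python
from collections import deque

network = deque()  # stands in for the interconnect (illustrative)

def am_send(handler, *body):
    # Sender blocks only until the message is injected into the network;
    # the message carries the address of its user-level handler.
    network.append((handler, body))

# Pre-allocated receiving structure: no buffering on message arrival.
inbox = []

def deposit(value):
    # Handler: runs on arrival with the message body as its argument,
    # takes the message off the network, and must not block.
    inbox.append(value)

def poll_network():
    # Receiver side: on "arrival", immediately execute the handler.
    while network:
        handler, body = network.popleft()
        handler(*body)

am_send(deposit, 42)
poll_network()
```

The key point the sketch mirrors: the handler only moves data out of the network into pre-allocated storage, so no intermediate buffering or scheduling decision happens on the receive path.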
Traditional Message Passing
  • Traditional send/receive models
Active Messages: Performance
  • Key optimization in AM vs. send/receive is the reduction of buffering.
  • AM achieves close to an order-of-magnitude reduction in overhead:
    • nCUBE/2 AM send/handle: 11 µs / 15 µs overhead
    • nCUBE/2 async send/receive: 160 µs overhead
    • CM-5 AM: < 2 µs overhead
    • CM-5 blocking send/receive: 86 µs overhead
    • Blocking send/receive prototyped on top of AM: 23 µs overhead
Active Messages: Split-C
  • Non-blocking implementations of PUT and GET
  • Implementations consist of a message formatter and a message handler
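A sketch of how a non-blocking GET could be layered on active messages, reusing the toy in-memory network idea. The formatter/handler split follows the slide; all names are invented for illustration and are not Split-C's actual API:

```python
from collections import deque

network = deque()          # toy interconnect (illustrative)
remote_memory = {"a0": 7}  # data owned by the remote node (invented name)
local = {}                 # requester's pre-allocated landing area
pending = [0]              # count of outstanding GETs

def am_send(handler, *body):
    network.append((handler, body))

def get_formatter(key):
    # Runs on the remote node: the message formatter builds the reply.
    am_send(get_handler, key, remote_memory[key])

def get_handler(key, value):
    # Runs on the requester: store the data and complete the GET.
    local[key] = value
    pending[0] -= 1

def split_c_get(key):
    # Non-blocking: issue the request and return immediately;
    # completion is signaled by the counter reaching zero.
    pending[0] += 1
    am_send(get_formatter, key)

def drain():
    while network:
        handler, body = network.popleft()
        handler(*body)

split_c_get("a0")  # returns before the data arrives
drain()            # deliver the request, then the reply
```

Because the GET returns immediately, the requester can keep computing and check `pending` later, which is the computation/communication overlap the paper is after.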
active messages matrix multiply
Active Messages: Matrix Multiply
  • Multiplication C = A × B: each processor GETs one column of A after another to perform a rank-1 update with its own columns of B.
  • Achieves 95% of peak performance
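The rank-1 formulation can be sketched in plain sequential Python (names illustrative). In Split-C, the column of A fetched at step k would arrive via a non-blocking GET, overlapping with the previous update:

```python
def matmul_rank1(A, B):
    # C = A x B computed as a sum of rank-1 updates:
    # C += (column k of A) outer-product (row k of B)
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):                      # GET column k of A from its owner
        col_a = [A[i][k] for i in range(n)]
        row_b = B[k]                        # this processor already owns B
        for i in range(n):
            for j in range(n):
                C[i][j] += col_a[i] * row_b[j]
    return C
```

For example, `matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]])` accumulates the rank-1 terms for k = 0 and k = 1 into the ordinary matrix product.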
Message Driven Architectures
  • Computation occurs in the message handler.
    • Specialized hardware -> Monsoon, J-Machine
    • Memory allocation and scheduling required upon message arrival
    • Tricky to implement in hardware
    • Expensive
  • In Active Messages, handler only removes messages from the network.
  • Threaded Abstract Machine (TAM)
    • Parallel execution model based on Active Messages
    • Typically no memory allocation upon message arrival
    • No test results
Active Messages: Recap
  • Good performance
  • Not a new parallel programming paradigm
    • “Evolutionary not Revolutionary”
  • AM systems?
  • Multiprocessor vs. Cluster
U-Net: Authors
  • Thorsten von Eicken
  • Anindya Basu
    • Advised by von Eicken
  • Vineet Buch
    • M.S. from Cornell
    • Co-founded Like.com -> Google
  • Werner Vogels
    • Research Scientist at Cornell -> CTO of Amazon
U-Net: Motivation
  • Bottleneck of local area communication at kernel
    • Several copies of messages made
    • Processing overhead dominates for small messages
  • Low round-trip latencies growing in importance
    • Especially for small messages
  • Traditional networking architecture inflexible
    • Cannot easily support new protocols or send/receive interfaces
U-Net: Goals
  • Remove kernel from critical path of communication
  • Provide low-latency communication in local area settings
  • Exploit full network bandwidth even with small messages
  • Facilitate the use of novel communication protocols
U-Net: Takeaways
  • Flexible
  • Low latency for smaller messages
  • Good performance on off-the-shelf hardware
  • Weaknesses:
    • Multiplexing resources between processes not in kernel
    • Specialized NI needed?
U-Net: Design
  • User-level communication architecture, independent of the underlying NI hardware
  • Virtualizes network devices
  • Kernel control of channel set-up and tear-down
U-Net: Design
  • Remove kernel from critical path: send/recv
U-Net: Control

U-Net:

  • Multiplexes NI among all processes accessing network
  • Enforces protection boundaries and resource limits

Process:

  • Contents of each message and management of send/recv resources (i.e. buffers)
U-Net: Architecture
  • Main building blocks of U-Net:
    • Endpoints
    • Communication Segments
    • Message Queues
  • Each process that wishes to access the network
    • Creates one or more endpoints
    • Associates a communication segment with each endpoint
    • Associates set of send, receive and free message queues with each endpoint
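These building blocks can be modeled minimally as follows. This is a sketch; the class and field names are illustrative, not U-Net's actual interface:

```python
from collections import deque

class Endpoint:
    """One process-owned handle onto the (virtualized) network."""
    def __init__(self, n_buffers=4, buf_size=64):
        # Communication segment: buffer memory shared with the NI.
        self.segment = [bytearray(buf_size) for _ in range(n_buffers)]
        self.send_q = deque()                   # descriptors to transmit
        self.recv_q = deque()                   # descriptors of arrivals
        self.free_q = deque(range(n_buffers))   # indices of free buffers

ep = Endpoint()
```

Each process creates one of these per network handle; the three queues are what the process and the NI use to hand descriptors back and forth without kernel involvement.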
U-Net: Send
  • Prepare packet -> place it in the comm segment
  • Place descriptor on the Send queue
  • U-Net takes descriptor from queue
  • Transfer packet from memory to network

[Figure: send path: packet moves from the comm segment through the U-Net NI onto the network (diagram from Itamar Sagi)]

U-Net: Receive
  • U-Net receives message and identifies Endpoint
  • Takes free space from free queue
  • Places message in communication segment
  • Places descriptor in receive queue
  • Process takes descriptor from receive queue and reads message

[Figure: receive path: packet arrives from the network through the U-Net NI into the comm segment (diagram from Itamar Sagi)]
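The send and receive steps on the last two slides can be combined into one toy simulation, with the NI modeled as a function that moves descriptors and bytes between two endpoints. All names are illustrative:

```python
from collections import deque

def make_endpoint(n=4, size=64):
    return {
        "seg":  [bytearray(size) for _ in range(n)],  # comm segment
        "send": deque(),            # send queue: (buffer index, length)
        "recv": deque(),            # receive queue: (buffer index, length)
        "free": deque(range(n)),    # free queue: available buffer indices
    }

src, dst = make_endpoint(), make_endpoint()

def user_send(ep, payload):
    # Process: place the packet in the comm segment,
    # then post a descriptor on the send queue.
    i = ep["free"].popleft()
    ep["seg"][i][:len(payload)] = payload
    ep["send"].append((i, len(payload)))

def ni_transfer(src_ep, dst_ep):
    # NI: take the descriptor off the send queue and move the bytes
    # "onto the wire"; on arrival, take free space from the destination's
    # free queue, copy the packet in, and post a receive descriptor.
    i, n = src_ep["send"].popleft()
    wire = bytes(src_ep["seg"][i][:n])
    src_ep["free"].append(i)            # sender's buffer is reusable
    j = dst_ep["free"].popleft()
    dst_ep["seg"][j][:n] = wire
    dst_ep["recv"].append((j, n))

def user_recv(ep):
    # Process: take the descriptor off the receive queue,
    # read the message, and recycle the buffer.
    j, n = ep["recv"].popleft()
    msg = bytes(ep["seg"][j][:n])
    ep["free"].append(j)
    return msg

user_send(src, b"hello")
ni_transfer(src, dst)
```

The kernel appears nowhere on this path; that is the point of the design. It is involved only when endpoints (channels) are created or torn down.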

U-Net: Protection Boundaries
  • Only owning process can access:
    • Endpoints
    • Communication Segments
    • Message queues
  • Outgoing messages tagged with the originating endpoint
  • Incoming messages demultiplexed by U-Net
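The tagging idea can be shown with a toy demultiplexer. Channel ids and all names here are invented for illustration:

```python
channels = {}  # channel tag -> owning endpoint's receive list

def register_endpoint(tag):
    # Kernel-mediated setup: a process claims a channel tag.
    channels[tag] = []
    return channels[tag]

def demux(tag, message):
    # U-Net routes each incoming message only to the endpoint that
    # owns its tag; unknown tags violate the protection boundary.
    if tag not in channels:
        raise PermissionError(f"no endpoint owns channel {tag!r}")
    channels[tag].append(message)

inbox = register_endpoint("chan-7")
demux("chan-7", b"data")
```

Because only the owning process registered the tag, no other process's messages can land in its queues, even though the kernel is off the data path.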
U-Net: “zero-copy”
  • Base-level: “zero-copy”
    • Comm segments are not regarded as general memory regions
    • One copy between the application data structure and a buffer in the comm segment
    • Small messages held entirely in queue
  • Direct-access: “true zero copy”
    • Comm segments can span entire process address space
    • Sender can specify an offset within the destination comm segment for the data
    • Difficult to implement on existing workstation hardware
U-Net: “zero-copy”
  • U-Net implementations support Base-level
    • Hardware for direct-access not available
    • Copy overhead not a dominant cost
  • Kernel emulated endpoints
U-Net: Implementation
  • Implemented on SPARCstations running SunOS 4.1.3
    • Fore SBA-100 interface
      • Lack of hardware CRC computation adds overhead
    • Fore SBA-200 interface
      • Uses custom firmware to implement base-level architecture
      • i960 processor reprogrammed to implement U-Net directly
  • Small messages: 65 µs RTT vs. 12 µs for the CM-5
  • Fiber saturated with packet sizes of 800 bytes
U-Net: TCP/IP and UDP/IP
  • Traditional UDP and TCP over ATM performance disappointing
    • < 55% max bandwidth for TCP
  • Better performance with UDP and TCP over U-Net
    • Not bounded by kernel resources
    • Greater state awareness enables a better application-to-network relationship

U-Net: Discussion
  • Main goals were to achieve low latency communication and flexibility
  • NetBump