
Reliable Transport and Code Distribution in Wireless Sensor Networks

Thanos Stathopoulos

CS 213 Winter 04

Reliability: Introduction
  • Not an issue on wired networks
    • TCP does a good job
    • Link error rates are usually ~10⁻¹⁵
    • No energy cost
  • However, WSNs have:
    • Low power radios
      • Error rates of up to 30% or more
      • Limited range
    • Energy constraints
      • Retransmissions reduce lifetime of network
    • Limited storage
      • Buffer size cannot be too large
    • Highly application-specific requirements
      • No ‘single’ TCP-like solution
  • Loss-tolerant algorithms
    • Leverage spatial and temporal redundancy
    • Good enough for some applications
      • But what about code updates?
  • Add retransmission mechanism
    • At the link layer (e.g. SMAC)
    • At the routing/transport layer
    • At the application layer
  • Hop-by-hop or end-to-end?
Relevant papers
  • PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks
  • RMST: Reliable Data Transport in Sensor Networks
  • ESRT: Event-to-Sink Reliable Transport in Wireless Sensor Networks
PSFQ: Overview
  • Key ideas
    • Slow data distribution (pump slowly)
    • Quick error recovery (fetch quickly)
    • NACK-based
    • Data caching guarantees ordered delivery
    • Assumption: no congestion, losses due only to poor link quality
  • Goals
    • Ensure data delivery with minimum support from transport infrastructure
    • Minimize signaling overhead for detection/recovery operations
    • Operate correctly in poor link quality environments
    • Provide loose delay bounds for data delivery to all intended receivers
  • Operations
    • Pump
    • Fetch
    • Report
End-to-end considered harmful?
  • Probability of reception degrades exponentially over multiple hops
    • Not an issue in the Internet
    • Serious problem if error rates are considerable
      • ACKs/NACKs are also affected
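The exponential degradation is easy to quantify. A short sketch, assuming independent per-hop losses (the 10% figure is illustrative):

```python
def end_to_end_success(per_hop_loss, hops):
    """Probability that a packet survives every hop, assuming independent losses."""
    return (1.0 - per_hop_loss) ** hops

# A 10% per-hop loss rate leaves only ~35% end-to-end delivery over 10 hops,
# while a wired link with a ~1e-15 error rate is effectively lossless.
print(round(end_to_end_success(0.10, 10), 3))   # 0.349
print(end_to_end_success(1e-15, 10))            # ~1.0
```

The same calculation applies to ACKs and NACKs traveling the reverse path, which is why pure end-to-end recovery degrades so quickly.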
Proposed solution: Hop-by-Hop error recovery
  • Intermediate nodes now responsible for error detection and recovery
    • NACK-based loss detection probability is now constant
      • Not affected by network size (scalability)
      • Unlike end-to-end, where it decreases exponentially with hop count
  • Cost: Keeping state on each node
    • Potentially not as bad as it sounds!
      • Cluster/group based communication
      • Intermediates are usually receivers as well
Pump operation
  • Node broadcasts a packet to its neighbors every Tmin
    • Data cache used for duplicate suppression
  • Receiver checks for gaps in sequence numbers
  • If all is fine, it decrements TTL and schedules a transmission
    • Tmin < Ttransmit < Tmax
    • By delaying transmission, quick fetch operations are possible
    • Reduce redundant transmissions (don’t transmit if 4 or more nodes have forwarded the packet already)
    • Tmax can provide a loose delay bound for the last hop
      • D(n)=Tmax * (# of fragments) * (# of hops)
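The loose delay bound above can be computed directly; a sketch using the slide's formula (the parameter values are illustrative):

```python
def loose_delay_bound(t_max, fragments, hops):
    """PSFQ's loose delay bound for the last hop: D(n) = Tmax * fragments * hops."""
    return t_max * fragments * hops

# With Tmax = 0.3 s, a 100-fragment file, and 5 hops, delivery to the
# last hop is loosely bounded by about 150 seconds.
print(loose_delay_bound(0.3, 100, 5))
```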
Fetch operation
  • Sequence number gap is detected
    • Node will send a NACK message upstream
      • ‘Window’ specifies range of sequence numbers missing
      • NACK receivers will randomize their transmissions to reduce redundancy
    • It will NOT forward any packets downstream
    • NACK scope is 1 hop
    • NACKs are generated every Tr if there are still gaps
      • Tr < Tmax
        • This is the pump/fetch ratio
      • NACKs can be cancelled if neighbors have sent similar NACKs
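Gap detection and the NACK window can be sketched as follows (a hypothetical helper, not PSFQ's actual code):

```python
def missing_window(received, highest_seen):
    """Return the (lo, hi) window of missing sequence numbers, or None.

    Mirrors PSFQ's fetch: one NACK carries a window covering the whole
    missing range, so a single control message can request several fragments.
    """
    missing = [s for s in range(1, highest_seen + 1) if s not in received]
    if not missing:
        return None
    return (missing[0], missing[-1])

print(missing_window({1, 2, 5, 6}, 6))  # (3, 4)
print(missing_window({1, 2, 3}, 3))     # None
```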
Proactive Fetch
  • Last segments of a file can get lost
    • Loss detection impossible; no ‘next’ segment exists!
  • Solution: timeouts (again)
    • Node enters ‘proactive fetch’ mode if last segment hasn’t been received and no packet has been delivered after Tpro
    • Timing must be right
      • Too early: wasted control messages
      • Too late: increased delivery latency for the entire file
    • Tpro = a * (Smax - Smin) * Tmax
      • A node will wait long enough until all upstream nodes have received all segments
    • If data cache isn’t infinite
      • Tpro = a * k * Tmax (Tpro is proportional to cache size)
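The two timeout rules can be written out explicitly (a, Smax, Smin, and k as defined above; a sketch, not PSFQ's implementation):

```python
def proactive_fetch_timeout(a, s_max, s_min, t_max, cache_segments=None):
    """Tpro after which a node enters proactive fetch mode.

    Unbounded cache: Tpro = a * (Smax - Smin) * Tmax -- long enough for all
    upstream nodes to have received the outstanding segments.
    Cache of k segments: Tpro = a * k * Tmax.
    """
    if cache_segments is not None:
        return a * cache_segments * t_max
    return a * (s_max - s_min) * t_max

print(proactive_fetch_timeout(1, 100, 80, 0.3))                     # ~6 s
print(proactive_fetch_timeout(1, 100, 80, 0.3, cache_segments=10))  # ~3 s
```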
Report Operation
  • Used as a feedback/monitoring mechanism
  • Only the last hop will respond immediately (create a new packet)
    • Other nodes will piggyback their state info when they receive the report reply
    • If there is no space left in the message, a new one will be created
Experimental results
  • Tmax= 0.3s, Tr = 0.1s
  • 100 30-byte packets sent
  • Exponential increase in delay happens at 11% loss rate or higher
PSFQ: Conclusion
  • Slow data dissemination, fast data recovery
    • All transmissions are broadcast
  • NACK-based, hop-by-hop recovery
    • End-to-end behaves poorly in lossy environments
    • NACKs are superior to ACKs in terms of energy savings
  • No out-of-order delivery allowed
    • Uses data caching extensively
  • Several timers and duplicate suppression mechanisms
    • Implementing any of those on motes is challenging (non-preemptive FIFO scheduler)
RMST: Overview
  • A transport layer protocol
    • Uses diffusion for routing
    • Selective NACK-based
  • Provides
    • Guaranteed delivery of all fragments
      • In-order delivery not guaranteed
    • Fragmentation/reassembly
Placement of reliability for data transport
  • RMST considers 3 layers
    • MAC
    • Transport
    • Application
  • Focus is on MAC and Transport
MAC Layer Choices
  • No ARQ
    • All transmissions are broadcast
    • No RTS/CTS or ACK
    • Reliability deferred to upper layers
    • Benefits: no control overhead, no erroneous path selection
  • ARQ always
    • All transmissions are unicast
    • RTS/CTS and ACKs used
    • One-to-many communication done via multiple unicasts
    • Benefits: packets traveling on established paths have high probability of delivery
  • Selective ARQ
    • Use broadcast for one-to-many and unicast for one-to-one
    • Data and control packets traveling on established paths are unicast
    • Route discovery uses broadcast
Transport Layer Choices
  • End-to-End Selective Request NACK
    • Loss detection happens only at sinks (endpoints)
    • Repair requests travel on reverse (multihop) path from sinks to sources
  • Hop-by-Hop Selective Request NACK
    • Each node along the path caches data
    • Loss detection happens at each node along the path
    • Repair requests sent to immediate neighbors
    • If data isn’t found in the caches, NACKs are forwarded to next hop towards source
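The hop-by-hop repair logic amounts to a cache lookup at each node; a minimal sketch (the callback names are hypothetical):

```python
def handle_nack(missing_seq, local_cache, send_repair, forward_upstream):
    """Hop-by-hop selective NACK handling (sketch).

    If the missing fragment is cached locally, answer the repair from here;
    otherwise forward the NACK one hop closer to the source.
    """
    if missing_seq in local_cache:
        send_repair(local_cache[missing_seq])
    else:
        forward_upstream(missing_seq)

repairs, forwarded = [], []
handle_nack(3, {3: b"frag-3"}, repairs.append, forwarded.append)  # repaired locally
handle_nack(4, {3: b"frag-3"}, repairs.append, forwarded.append)  # forwarded upstream
print(repairs, forwarded)  # [b'frag-3'] [4]
```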
Application Layer Choices
  • End-to-End Positive ACK
    • Sink requests a large data entity
    • Source fragments data
    • Sink keeps sending interests until all fragments have been received
    • Used only as a baseline
RMST details
  • Implemented as a Diffusion Filter
    • Takes advantage of Diffusion mechanisms for
      • Routing
      • Path recovery and repair
  • Adds
    • Fragmentation/reassembly management
    • Guaranteed delivery
  • Receivers responsible for fragment retransmission
    • Receivers aren’t necessarily end points
    • Caching or non-caching mode determines classification of node
RMST Details (cont’d)
  • NACKs triggered by
    • Sequence number gaps
      • Watchdog timer inspects fragment map periodically for holes that have aged for too long
    • Transmission timeouts
      • ‘Last fragment’ problem
  • NACKs propagate from sinks to sources
    • Unicast transmission
    • NACK is forwarded only if segment not found in local cache
    • Back-channel required to deliver NACKs to upstream neighbors
  • NS-2 simulation
    • 802.11 MAC
  • 21 nodes
    • single sink, single source
  • 6 hops
  • MAC ARQ set to 4 retries
  • Image size: 5 KB
    • 50 100-byte fragments
  • Total cost of sending the entire file: 87,818 bytes
    • Includes diffusion control message overhead
    • All results normalized to this value
Results: Baseline (no RMST)
  • ARQ and S-ARQ have high overhead when error rates are low
  • S-ARQ is better in terms of efficiency
    • Also helps with route selection
  • No ARQ results drop considerably as error rates increase
    • Exponential decay of end-to-end reliability mechanisms
Results: RMST with H-b-H Recovery and Caching
  • Slight improvement for ARQ and S-ARQ results over baseline
  • No ARQ is better even in the 10% error rate case
    • But, many more exploratory packets were sent before the route was established
Results: RMST with E-2-E Recovery
  • No ARQ doesn’t work for the 10% error rate case
    • Numerous holes that required NACKs couldn’t make it from source to sink without link-layer retransmissions
  • ARQ and S-ARQ results are statistically indistinguishable from H-b-H results
    • NACKs were very rare when any form of ARQ was used
Results: Performance under High Error Rates
  • No ARQ doesn’t work for the 30% error rate case
    • Diffusion control messages could not establish routes most of the time
    • In the 20% case, it took several minutes to establish routes
RMST: Conclusion
  • ARQ helps with unicast control and data packets
    • In high error-rate environments, routes cannot be established without ARQ
  • Route discovery packets shouldn’t use ARQ
    • Erroneous path selection can occur
  • RMST combines a NACK-based transport layer protocol with S-ARQ to achieve the best results
Congestion Control
  • Sensor networks are usually idle…
  • …Until an event occurs
    • High probability of channel overload
    • Information must reach users
  • Solution: congestion control
ESRT: Overview
  • Places interest on events, not individual pieces of data
  • Application-driven
    • Application defines what its desired event reporting rate should be
  • Includes a congestion-control element
  • Runs mainly on the sink
  • Main goal: Adjust reporting rate of sources to achieve optimal reliability requirements
Problem Definition
  • Assumption:
    • Detection of an event is related to number of packets received during a specific interval
  • Observed event reliability ri:
    • # of packets received in decision interval I
  • Desired event reliability R:
    • # of packets required for reliable event detection
    • Application-specific
  • Goal: configure the reporting rate of nodes
    • Achieve required event detection
    • Minimize energy consumption
Reliability vs Reporting frequency
  • Initially, reliability increases linearly with reporting frequency
  • There is an optimal reporting frequency (fmax), after which congestion occurs
  • fmax decreases as the number of nodes increases
Characteristic Regions
  • n: normalized reliability indicator
  • (NC,LR): No congestion, Low reliability
    • f < fmax, n < 1-ε
  • (NC,HR): No congestion, High reliability
    • f <= fmax, n > 1+ε
  • (C,HR): Congestion, High reliability
    • f > fmax, n > 1
  • (C,LR): Congestion, Low reliability
    • f > fmax, n <= 1
  • OOR: Optimal Operating Region
    • f < fmax, 1-ε <= n <= 1+ε
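These conditions translate directly into a classifier. A sketch (ε is the application's tolerance; the 10% default and the sample values are illustrative, and congestion is taken as f > fmax):

```python
def esrt_region(f, f_max, n, eps=0.1):
    """Map reporting frequency f and normalized reliability n to an ESRT region."""
    congested = f > f_max
    if not congested and 1 - eps <= n <= 1 + eps:
        return "OOR"                                   # optimal operating region
    if not congested:
        return "(NC,LR)" if n < 1 - eps else "(NC,HR)"
    return "(C,HR)" if n > 1 else "(C,LR)"

print(esrt_region(5, 10, 1.0))   # OOR
print(esrt_region(12, 10, 0.8))  # (C,LR)
```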
ESRT Requirements
  • Sink is powerful enough to reach all source nodes (i.e. single-hop)
  • Nodes must listen to the sink broadcast at the end of each decision interval and update their reporting rates
  • A congestion-detection mechanism is required
Congestion Detection and Reliability Level
  • Both done at the sink
  • Congestion:
    • Nodes monitor their buffer queues and inform the sink if overflow occurs
  • Reliability Level
    • Calculated by the sink at the end of each interval based on packets received
ESRT Protocol Operation
  • (NC, LR): increase the reporting frequency aggressively to reach the desired reliability
  • (NC, HR): decrease the reporting frequency cautiously, saving energy without losing reliability
  • (C, HR): decrease the reporting frequency to relieve congestion while staying reliable
  • (C, LR): decrease the reporting frequency aggressively (exponentially)
ESRT: Conclusion
  • Reliability notion is application-based
    • No delivery guarantees for individual packets
  • Reliability and congestion control achieved by changing the reporting rate of nodes
  • Pushes all complexity to the sink
  • Single-hop operation only
Code Distribution: Introduction
  • Nature of sensor networks
    • Expected to operate for long periods of time
    • Human intervention impractical or detrimental to sensing process
  • Nevertheless, code needs to be updated
    • Add new functionality
      • Incomplete knowledge of environment
      • Predicting right set of actions is not always feasible
    • Fix bugs
    • Maintenance
  • Transfer the entire binary to the motes
    • Advantage
      • Maximum flexibility
    • Disadvantage
      • High energy cost due to large volume of data
  • Use a VM and transfer capsules
    • Advantage
      • Low energy cost
    • Disadvantages
      • Not as flexible as full binary update
      • VM required
  • Reliability is required regardless of approach
  • A Remote Code Update Mechanism for Wireless Sensor Networks
  • Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks
MOAP: Overview
  • Code distribution mechanism specifically targeted for Mica2 motes
  • Full binary updates
  • Multi-hop operation achieved through recursive single-hop broadcasts
  • Energy and memory efficient
Requirements and Properties of Code Distribution
  • The complete image must reach all nodes
    • Reliability mechanism required
  • If the image doesn’t fit in a single packet, it must be placed in stable storage until transfer is complete
  • Network lifetime shouldn’t be significantly reduced by the update operation
  • Memory and storage requirements should be moderate
Resource Prioritization
  • Energy: Most important resource
    • Radio operations are expensive
      • TX: 12 mA
      • RX: 4 mA
    • Stable storage (EEPROM)
      • Everything must be stored and Write()s are expensive
  • Memory usage
    • Static RAM
      • Only 4K available on current generation of motes
      • Code update mechanism should leave ample space for the real application
    • Program memory
      • MOAP must transfer itself
      • Large image size means more packets transmitted!
  • Latency
    • Updates don’t respond to real-time phenomena
    • Update rate is infrequent
    • Can be traded off for reduced energy usage
Design Choices
  • Dissemination protocol: How is data propagated?
    • All at once (flooding)
      • Fast
      • Low energy efficiency
    • Neighborhood-by-neighborhood (ripple)
      • Energy efficient
      • Slow
  • Reliability mechanism
    • Repair scope: local vs global
    • ACKs vs NACKs
  • Segment management
    • Indexing segments and gap detection: Memory hierarchy vs sliding window
Ripple Dissemination
  • Transfer data neighborhood-by-neighborhood
  • Single-hop
    • Recursively extended to multi-hop
  • Very few sources at each neighborhood
    • Preferably, only one
  • Receivers attempt to become sources when they have the entire image
    • Publish-subscribe interface prevents nodes from becoming sources if another source is present
    • Leverage the broadcast medium
  • If data transmission is in progress, a source will always be one hop away!
    • Allows local repairs
  • Increased latency
Reliability Mechanism
  • Loss responsibility lies on receiver
    • Only one node to keep track of (sender)
  • NACK-based
    • In line with IP multicast and WSN reliability schemes
  • Local scope
    • No need to route NACKs
      • Energy and complexity savings
    • All nodes will eventually have the same image
Retransmission Policies
  • Broadcast RREQ, no suppression
    • Simple
    • High probability of successful reception
    • Highly inefficient
    • Zero latency
  • Broadcast RREQ, suppression based on randomized timers
    • Quite efficient
    • Complex
    • Latency and successful reception based on randomization interval
Retransmission Policies (cont’d)
  • Broadcast RREQ, fixed reply probability
    • Simple
    • Good probability of successful reception
    • Latency depends on probability of reply
    • Average efficiency
  • Broadcast RREQ, adaptive reply probability
    • More complex than the static case
    • Similar latency/reception behavior
  • Unicast RREQ, single reply
    • Smallest probability of successful reception
    • Highest efficiency
    • Simple
      • Complexity increases if source fails
    • Zero latency
      • High latency if source fails
Segment Management: Discovering if a segment is present
  • No indexing
    • Nothing kept in RAM
    • Need to read from EEPROM to find if segment i is missing
  • Full indexing
    • Entire segment (bit)map is kept in RAM
    • Look at entry i (in RAM) to find if segment is missing
  • Partial indexing
    • Map kept in RAM
    • Each entry represents k consecutive segments
    • Combination of RAM and EEPROM lookup needed to find if segment i is missing
Segment Management (cont’d)
  • Hierarchical full indexing
    • First-level map kept in RAM
      • Each entry points to a second-level map stored in EEPROM
    • Combination of RAM and EEPROM lookup needed to find if segment i is missing
  • Sliding window
    • Bitmap of up to w segments kept in RAM
    • Starting point: last segment received in order
    • RAM lookup
    • Limited out-of-order tolerance!
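The sliding-window variant can be sketched as a small class (the window size and the example sequence are illustrative, not MOAP's code):

```python
class SlidingWindow:
    """MOAP-style sliding-window segment tracking (sketch).

    Keeps a bitmap of up to w segments in RAM, starting at the last segment
    received in order. Segments beyond the window are rejected: limited
    out-of-order tolerance, but constant, small RAM usage.
    """
    def __init__(self, w):
        self.w = w
        self.base = 0             # next in-order segment expected
        self.bitmap = [False] * w

    def receive(self, seq):
        """Record segment seq; return True if accepted, False if outside window."""
        if seq < self.base:
            return True           # duplicate of an already-completed segment
        if seq >= self.base + self.w:
            return False          # too far ahead of the in-order point
        self.bitmap[seq - self.base] = True
        while self.bitmap[0]:     # slide the window past the completed prefix
            self.bitmap.pop(0)
            self.bitmap.append(False)
            self.base += 1
        return True

win = SlidingWindow(4)
print(win.receive(0), win.receive(2))  # True True
print(win.receive(7))                  # False (beyond base=1 plus w=4)
print(win.base)                        # 1 (segment 1 still missing)
```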
Results: Energy efficiency
  • Significant reduction in traffic when using Ripple
    • Up to 90% for dense networks
  • Full Indexing performs 5-15% better than Sliding Window
    • Reason: better out-of-order tolerance
    • Differences diminish as network density grows
Results: Latency
  • Flooding is ~ 5 times faster than Ripple
  • Full indexing is 20-30% faster than Sliding window
    • Again, reason is out-of-order tolerance
Results: Retransmission Policies
  • Order-of-magnitude reduction when using unicasts
Current Mote implementation
  • Using Ripple-sliding window with unicast retransmission policy
  • User builds code on the PC
    • Packetizer creates segments out of binary
  • Mote attached to PC becomes original source and sends PUBLISH message
    • Receivers 1 hop away will subscribe, if version number is greater than their own
  • When a receiver gets the full image, it will send a PUBLISH message
    • If it doesn’t receive any subscriptions for some time, it will COMMIT the new code and invoke the bootloader
    • If a subscription is received, node becomes a source
  • Eventually, sources will also commit
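The version check that drives subscription can be sketched as follows (hypothetical names, not MOAP's actual API):

```python
def on_publish(my_version, advertised_version, subscribe):
    """MOAP subscription decision (sketch): a receiver subscribes to an
    advertised image only if its version number exceeds the installed one."""
    if advertised_version > my_version:
        subscribe()
        return True
    return False

subs = []
print(on_publish(3, 4, lambda: subs.append(4)))  # True: newer image, subscribe
print(on_publish(3, 3, lambda: subs.append(3)))  # False: same version, ignore
```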
Current Mote Implementation (cont’d)
  • Retransmissions have higher priority than data packets
    • Duplicate requests are suppressed
  • Nodes keep track of their sources’ activity with a keepalive timer
    • Solves NACK ‘last packet’ problem
    • If the source dies, the keepalive expiration will trigger a broadcast repair request
  • Late joiner mechanism allows motes that have just recovered from failure to participate in code transfer
    • Requires all nodes to periodically advertise their version
  • Footprint
    • 700 bytes RAM
    • 4.5K bytes ROM
MOAP: Conclusion
  • Full binary updates over multiple hops
  • Ripple dissemination reduces energy consumption significantly
  • Sliding window method and unicast retransmission policy also reduce energy consumption and complexity
  • Successful updates of images up to 30K in size
  • Next steps
    • Larger experiments
    • Better Late Joiner mechanism
    • Verification phase
    • Sending diffs instead of the full image
Trickle: Overview
  • State synchronization/code propagation mechanism
    • Suitable for VM environments, where transmitted code is small
  • Uses ‘polite gossip’ dissemination
    • Periodic broadcasts of state summary
    • Nodes overhear transmissions and stay quiet unless they need to update
  • Goals
    • Propagation: install new code
    • Maintenance: detect propagation need
Basic Mechanism
  • A node will periodically transmit information
    • Only if fewer than a threshold number of neighbors have already sent the same data
  • ‘Cells’ (neighborhoods) can be in two states
    • All nodes up to date
    • Update needed
      • Node learns about new code
      • Node detects neighbor with old code
  • Since communication can be transmission or reception, ideally only one node per cell needs to transmit
    • Similar to MOAP’s ideal single-source scenario
  • Time is split in periods
  • Nodes pick a random slot from [0..T]
  • Transmission occurs if a node has heard less than K other identical transmissions
    • Otherwise, node stays quiet
    • K is small (1 or 2 usually)
  • If a node detects a neighbor that is out of date, it transmits the newer code
  • If a node detects it is out of date, it transmits its state
    • Update is triggered when other nodes receive this transmission
  • Nodes transmit at most once per period
  • In the presence of losses, the scaling property is O(log n)
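The per-period decision can be sketched in a few lines (k and T are illustrative; this is a simplification of Trickle's timer):

```python
import random

def trickle_decision(t, k, heard_identical):
    """One Trickle period (sketch): pick a random transmission slot in [0, T]
    and transmit only if fewer than k identical summaries were overheard
    before the slot fires ('polite gossip')."""
    slot = random.uniform(0.0, t)
    transmit = heard_identical < k
    return slot, transmit

slot, tx = trickle_decision(1.0, k=1, heard_identical=0)
print(tx)  # True: no neighbor sent the same summary, so this node transmits
slot, tx = trickle_decision(1.0, k=1, heard_identical=2)
print(tx)  # False: suppressed, the data is already in the air
```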
Maintenance and timesync
  • When nodes are synchronized, everything works fine
  • If nodes are out of sync, some might transmit before others had a chance to listen
    • Short-listen problem
    • O(sqrt(n)) scaling
  • Solution: Enforce a listen-only period
    • Pick a slot from [T/2..T] for transmission
  • Large T
    • Low communication overhead (less probable to pick same slot)
    • High latency
  • Small T: the opposite (higher overhead, lower latency)
  • Solution: dynamic scaling of T
    • Use two bounds TL and TH
    • When T expires, it doubles until it reaches TH
    • When newer state is overheard, T= TL
    • When older state is overheard, immediately send updates
    • When new code is installed, T= TL
      • Helps spread new code quickly
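The interval scaling boils down to a tiny state machine; a sketch with illustrative bounds:

```python
class TrickleTimer:
    """Trickle's dynamic interval scaling (sketch).

    T doubles each time it expires, up to T_high; hearing newer state or
    installing new code resets T to T_low so updates spread quickly.
    Transmission slots are drawn from [T/2, T] to avoid the short-listen problem.
    """
    def __init__(self, t_low, t_high):
        self.t_low, self.t_high = t_low, t_high
        self.t = t_low

    def expire(self):
        self.t = min(self.t * 2, self.t_high)

    def reset(self):
        self.t = self.t_low

timer = TrickleTimer(t_low=1.0, t_high=16.0)
for _ in range(5):
    timer.expire()
print(timer.t)  # 16.0 (capped at T_high)
timer.reset()
print(timer.t)  # 1.0 (newer state heard: spread quickly)
```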
Trickle: Conclusion
  • Efficient state synchronization protocol
    • Fast
    • Limits transmissions (localized flood)
    • Scales well with network size
  • Does not propagate code per se
    • Instead, it notifies the system that update is needed
    • In many cases, determining when to propagate can be more expensive than propagation
    • Contrast with MOAP’s simple late-joiner algorithm