1 / 51

Multi-Terminal Information Theory Problems in Sensor Networks

Multi-Terminal Information Theory Problems in Sensor Networks. Gregory J Pottie Professor, Electrical Engineering Department Associate Dean, Research and Physical Resources Deputy Director, NSF Center for Embedded Networked Sensing (CENS)

Antony
Download Presentation

Multi-Terminal Information Theory Problems in Sensor Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multi-Terminal Information Theory Problems in Sensor Networks Gregory J Pottie Professor, Electrical Engineering Department Associate Dean, Research and Physical Resources Deputy Director, NSF Center for Embedded Networked Sensing (CENS) UCLA Henry Samueli School of Engineering and Applied Science pottie@icsl.ucla.edu

  2. Context and general issues Basic tools of information theory Multi-terminal information theory Research domains Data fusion Cooperative communication Sensor network scalability Network synchronization Distributed large-scale systems Outline

  3. Sensor Network Operation Cooperative communication Data fusion Routing Basic goal: detection/identification of point or distributed sources subject to distortion constraints, and timely notification of end user

  4. Basic Information Theoretic Concepts • Typical Sets (of sufficiently long sequences of i.i.d. variables): • Has probability nearly 1 • The elements are equally probable • The number of elements is nearly 2nH Xn Yn W source decoder channel p(y|x) channel encoder • Aim of communications system: • Minimize errors due to noise in channel • Maximize data rate • Minimize bandwidth and power (the resources) • Shannon Capacity establishes the fundamental limits

  5. Jointly Typical Sequences Xn Yn X1n X2n Output set in general larger due to additive noise; Output images of inputs may overlap due to noise

  6. Basic Information Theoretic Concepts Xn Yn W source decoder channel p(y|x) channel encoder • Capacity C is the max mutual information I(X;Y) wrt p(x); that is, choose the set X leading to largest mutual information. • Capacity C is the largest rate at which information can be transmitted without error • Jointly typical set: from among the typical input and output sequences, choose the ones for which 1/n log p(xn,yn) close to H(X,Y) • Size of jointly typical set is about 2nI(X,Y), thus there are about this number of distinguishable signals (codewords) in Xn • These codewords necessarily contain redundancy--size of set is smaller than the alphabet would imply; sequences provide better performance than isolated symbols if properly chosen.

  7. Gaussian Channel Capacity • Discrete inputs to channel, and channel adds noise with Gaussian distribution (zero mean, variance N) • Input sequence (codeword) power set to P • Capacity is maximum I(X;Y) over p(x) such that EX2 satisfies power constraint • C = 1/2 log(1+P/N) bits per transmission. • The more usual form is to consider a channel of bandwidth W and noise power spectral density No. Then C = W log(1+P/NoW) bits per second.

  8. Rate Distortion: Lossy Source Coding • Rate distortion function R(D) can be interpreted as • The minimum rate at which a source can be represented subject to a distortion D=d(X,Y) • The minimum distortion that can be achieved given a maximum rate constraint R • Interesting dual results to Capacity • Spend coding effort on distortion-typical set; rest are don’t cares • Applies to compression of real-valued sequences Achievable region R D

  9. Universal Source Coding • Divide sequence into distortion-typical (interesting) and distortion-atypical (uninteresting) sets • Index for distortion typical set of small length--consumes our coding effort; atypical set is large, but coding scheme not critical • Require systematic means of classifying sequences as typical (promotion mechanism and distance measure) • Gold washing algorithm: typical set, plus candidates Distortion typical set Atypical set

  10. Source/Channel Coding Separation • For single link, separately performing source and channel coding achieves optimal rates • Separate optimization greatly reduces theoretical complexity • Classes of codes have been identified that get very close to respective Shannon limits • Joint source/channel coding can reduce latency or overall complexity, but infrequently used since application-specific

  11. Multi-Terminal Information Theory • The preceding discussion assumed a single transmitter and receiver • Multi-terminal information theory considers maximization of mutual information for the following possibilities: • Multiple senders and one receiver (the multiple access channel) • One sender and multiple receivers (the broadcast channel) • One sender and one receiver, but intervening transducers that can assist (the relay channel) • Composite combinations of these basic types • Bayes estimation also aims to maximize mutual information, except the senders do not cooperate and usually there is a fidelity constraint: • One sender and multiple receivers (the data fusion problem) • Multiple senders and receivers (the source separation problem) • Delay and resource usage may also be included

  12. Gaussian Multiple Access Channel • m transmitters with power P sharing the same noisy channel • C(P/N)=1/2 log(1+P/N) bits per channel use for isolated sender • then the achievable rate region is • The last inequality dominates when rates are the same • Capacity increases with more users (there is more power) • Result is dual to Slepian-Wolf encoding of correlated sources

  13. Gaussian Broadcast Channel • One sender of power P and two receivers, one with noise N1 and one with noise N2, N1 < N2 • The two codebooks are coordinated to exploit commonality of information transmitted, otherwise capacity does not exceed simple multiplexing

  14. Relay Channel • One sender, one relay, and one receiver; relay transmits X1 based only on its observations Y1 Y1:X1 Y X • Combines a broadcast channel and a multiple access channel • Networks are comprised of multiple relay channels that may further induce delay

  15. General Multi-Terminal Networks • m nodes, with node j with associated transmission variable X(j), and receive variable Y(j) • Node 1 transmits to node m; what is the maximum achievable rate? (X1,Y1) (Xm,Ym) • Bounds derived from information flow across multiple cut sets • generally not achievable

  16. Costs of Source-Channel Separation • Source-channel coding separation theorem fails because capacity of multiple access channels increases with correlation, while source encoding eliminates correlation • Greatly complicates search for optimal codes; raises question of whether joint coding would be worth it • Gastpar has considered asymptotic cost of separate rate-distortion and channel coding • Compare: • Network rate-distortion coding, followed by cooperative transmission • Joint rate-distortion and channel coding • Potentially exponentially better performance for joint source and channel coding, in limit the number of nodes n observing a Gaussian source with comparable SNR goes to infinity. • Bound, not a prescription for how to do this!

  17. Now let it move… • Nodes move within bounded region according to some random distribution; what is capacity subject to energy constraint on messages? Node m Node 1 Time 2 Time 1 • Answer depends on delay constraint; eventually they will collide implying near-zero path loss and thus unbounded capacity • Other questions: • Probability the nodes have connecting path of required rate • Probability of message arriving in required delay

  18. Some Recent Research for Sensor Networks • Data fusion in sensor networks • N-helper problem • Cooperative communications in sensor networks • Scalability of sensor networks • Sensing for distributed sources • Network synchronization and rate distortion • Systems design

  19. General Assumptions • Objective of network is to solve some (multi-) hypothesis problem, subject to a set of fidelity criteria, and convey the result to some end-user, subject to resource constraints • Consequence: fidelity criteria and resource constraints allow meaningful optimization questions to be posed • Communications is more costly than signal processing • Consequence: long distance communication is to be avoided, if possible • Justification: Shannon capacity and Maxwell’s equations are fundamental; SP power cost follows Moore’s Law • Signals decay with distance of propagation • Consequence: local distributed algorithms become feasible • Justification: true for all natural propagation media

  20. Rate Distortion and Data Fusion • Can identify resource use (energy/number of bits transmitted) with rate,decision reliability (false alarm rate, missed detection prob) with distortion • Operate at different points on rate distortion curve depending on valuesof cost function • Location of fusion center, numerical resolution, number of sensors,length of records, routing, distribution of processing all affect R(D)

  21. A Simple Algorithm • Nodes activated to send requests for information from other nodes based on SNR • If above threshold T, decision is reliable, and suppress activity by neighbors • Otherwise, increase likelihood of requesting help based on proximity to T • In likelihood, higher SNR nodes form the cluster • Bits of resolution related to SNR (e.g., for use in maximal ratio combining) 1: high SNR; initiates 2: activated, and requests further information 3: SNR too low to respond 3 2 1 3

  22. Optimal Fusion and Information Theory • Bayes estimator maximizes the likelihood FX(x|z) where x is the state of nature and z is the set of observables. • Define Zr={z(1),z(2),…,z(r)}=set of observations to time r, then recursive form of the estimator is • A variety of classical estimators then maximize the likelihoods based on particular assumptions regarding the priors • Fusion: typically weighted combinations of likelihoods to produce decision; as sensors may be very different, question of optimal weighting scheme

  23. Likelihood Opinion Pool Sensor 1 F(Z1r|x) Sensor 2 F(Z2r|x) P F(x|Zr) . . . F(x) Prior information Sensor n F(Znr|x) The hard part: determination of the various likelihoods

  24. Likelihood Opinion Pool • Combine using the recursive rule: • Taking logarithms on each side, followed by expectations one obtains: • Which can be interpreted as posterior information=prior information+observation information; thus can deal in summations of mutual information obtained from different sensor types (e.g., video plus audio).

  25. Designing for Detection • In digital communications, choose modulation for ease of estimation of decision variables and subsequent selection of most likely signal (hypothesis); we design signals for separability • In sensor networks, have no control over nature, but we can control: • Density and locations of sensors • Sensor types • These can be manipulated in same way, given a fusion strategy, to ease signal separability or achieve Nyquist sampling of source features. • This can also be done adaptively as we learn more about the sources and the propagation environment (in general, reduce model uncertainty): • Add sensors, and/or change types (e.g., new deployment) • Move sensors • Articulate directional elements

  26. Networked Info-Mechanical Systems

  27. The n-helper Gaussian Scenario X Y1 Y2 • Multiple sensors observe event and generate correlated Gaussian data. One data node (X) is the main data source (e.g. closest to phenomenon), and the n additional nodes (Y1 - Yn) are the ‘helpers’. • The Problem: What codes and data rates so that gateway/data-fusion center can reproduce the data from the main node using the remaining nodes as sources of partial side information, subject to some distortion criterion. Gateway/Fusion center Y3 … Yn

  28. Main Result • We do not care about reproduction of the Y variables; rather they act as helpers to reproduce X • This problem was previously solved for the 2-node case • Key to extension: treat Yk|Yk-1,..X as single new helper Pk. • Our solution: for an admissable rate (Rx,R1,…,Rn), and for some Di’s>0, the n-helper system data rates can be fused to yield an effective data rate (wrt source X) satisfying the following rate distortion bound: • where s2 is the variance and r is the correlation (straightforward but tedious to calculate as n increases).

  29. Comments • Other source distributions analytically difficult, but many are likely to be convex optimizations • Generalization would consider instances of relay/broadcast channels in conveying information to fusion center with minimum energy • Many sensor network detection problems are inherently local: even though expression may be complicated, the number of helpers will usually be small due to decay of signals as power of distance • Numerical results for Gaussian sources indicate a small number of helpers lead to significant improvement; rapidly diminishing returns after four or so for typical propagation conditions. • Suggests that source/channel coding separation might in fact be good enough for many practical situations (especially above the local interaction)

  30. Many low-power and low-cost wireless sensors cooperate with each other to achieve more reliable and higher rate communications The dominant constraint is the peak power, the bandwidth is not the main concern Multiplexing (FDMA, TDMA, CDMA, OFDM) is the standard approach. Each sensor has an unique channel We focus on schemes where multiple sensors occupy the same channel Problem Definition of Cooperative Communication

  31. Example: Space-Time Coding • N transmit antennas and N receive antennas • Channel transition matrix displays independent Rayleigh (complex Gaussian) fading in each component • With properly designed codes, capacity is N times that of single Rayleigh channel • Note this implicitly assumes synchronization among Tx and Rx array elements--requires special effort in sensor networks • A coordinated transmission, not a multiple access situation.

  32. Context • Cooperative reception problem very similar to multi-node fusionproblem; same initiation procedure required to create the cluster, however we can choose channel code. • Cooperative transmission and reception similar to multi-target multi-node fusion, but more can be done: beacons, space-time coding • Use to overcome gaps in network, communicate with devicesoutside of sensor network (e.g. UAV)

  33. Channel state information: known at transmitter side, and at both sides If channel state information is known at the transmit side, RF synchronization can be achieved Channels: AWGN and fading channels with unequal path loss General formula Channel Capacity

  34. Receive diversity: Transmit diversity: Combined transmit-receive diversity RF synchronization Channel Capacity(cont’d)

  35. Comments • Capacity is much higher if phase synchronization within transmitter and receiver clusters can be achieved • Have investigated practical methods for satellite/ground sensors synchronization • Beacons (e.g. GPS) can greatly simplify the synchronization problem for ground/ground cooperative communications • Recent network capacity results do not take into account possibilities for cooperation by nodes as transmitter/receiver clusters

  36. Received signal power decays with distance, and transmission power is limited Frequency re-use is possible; sophisticated antenna/MIMO systems improve the constant Nodes generate traffic, and can relay traffic from other nodes If did not generate traffic, then higher node density implies greater network capability (improved re-use) All nodes alike We will also relax this later Capacity in Ad Hoc Networks

  37. n Nodes within some fixed region A, with max radio range R, bandwidth W, generating data. Source-destination pairs random; per node transport capacity is then Transport Capacity of Wireless Networks

  38. Note this is achieved by using simple relay strategy: one link at a time without cooperation in transmission or reception (Gupta-Kumar); but bad news continues even with optimal cooperation (Gastpar-Vetterli) The inverse square root of n behavior can be roughly explained by average number of links increasing in a path of a given length, each of which must deal with more traffic to be carried, with the same bandwidth. Transport Capacity of Wireless Networks II

  39. The only solution when everyone generates traffic is to add more resources as n increases Traditional approach: communication hierarchy where we add new resources at each layer Each level is limited in numbers Traffic is aggregated and carried on set of trunks of increasing bandwidth and thus capacity Higher levels are longer distance, also limiting latency Scaling in Ad Hoc Networks

  40. Elements not only generate traffic but can process data Do not necessarily want or need to send raw information to distant users with same probability as near neighbors Key to scalability is to change the source-destination pair distribution to local communication (in limit, most nodes in fact send nothing) Key to proof is to separately consider densities of sources, sensors and communication relays, and pose problem as extraction of information to within particular fidelity (rate distortion) Scaling in Sensor Networks

  41. Cooperative rate distortion coding results in most communication being local; more nodes do not necessarily result in more traffic under distortion criterion More relays reduce frequency re-use distance and thus interference; capacity can increase without bound Thus more nodes increase likelihood of extracting information at desired fidelity Scalability for Point Sources in Sensor Networks

  42. Number of bits a sensor reports is a complicated function of density Low density: report nothing if SNR too low High density: may need to report only decision Moderate density: many nodes may need to locally cooperate with mix of raw data and decision likelihoods Far away powerful sources will activate many nodes with similar SNR, but a small subset of nodes will be sufficient to make decisions Design objective will be to minimize resources required to suppress node activity Comments

  43. To estimate parameters of a field (e.g., to get isotherm map) information increases until achieve desired spatial sampling After this extra nodes contribute no additional information, but can increase communication resource Image processing analogy: specify pixel size Parameters to describe local field can be compact compared to raw data, for given level of distortion Scalability for Distributed Sources

  44. Dense network; in neighborhood have mix of nodes with different ranges, operating in separate bands Locally route towards the longer range links; they act as traffic attractors, causing number of hops at any given layer to be small Cooperative communication among nodes would serve mainly to assure reliability of paths towards next level of hierarchy Result is a (largely) standard overlay hierarchical network Any cross-layer optimization (e.g., joint source-channel coding) is confined to the local neighborhood, since this is where most of the resources are consumed in any scalable solution. Practical Implementation

  45. Synchronism is needed for wide set of purposes in sensor networks: Coordination of power down/up for energy savings Time stamping of data Coherent combining in communication or sensing (cooperative comm., fusion, position location) Traditional approaches assume receivers/processors always on, and provide same precision everywhere by locking oscillators Sensor networks are different Do not need same level of synchronism at all times and everywhere Do need to save energy Network Synchronization

  46. Clocks are not explicitly locked; rather record differences of time scales to allow explicit conversion. References are passed either on a schedule or on demand for post facto synchronization Frequency and precision of updates (the rate) depends on local accuracy requirement Dtj (the time distortion) Would like to bound rate subject to accuracy requirements and acceptable delays in achieving synchronism Very similar issues for position localization Synchronization and Rate Distortion

  47. Implications of Signal Locality • Severe decay of signals with distance (second to fourth power) • Mutual information to source dominated by small set of nodes • Cooperative communication clusters for ground to ground transmission will likely be small • Implications: • Local processing is good enough for many situations; do not need to convey raw data over long distances very frequently • Consequently, lowest layers of processing/network formation, etc. are the most important, since most frequently invoked (“typical”) • Practical example: • Specialized local transmission schemes (e.g., for forming ad hoc clusters), but long range might use conventional methods such as TCP/IP

  48. Hierarchy in Sensor Networks • For dealing with the network as a whole, number of variations of topology are immense • Distributed algorithms exploiting locality of events • Use of ensembles for deriving bounds • In between, considers layers of hierarchy, each of which may be amenable to a conventional optimization technique

  49. cue Information Processing Hierarchy Note difficulty of fully separating networking, database and signal processing problems transmit decision human observer beamforming base stationhigh resolutionprocessing query for more information high powerlow false alarm ratelow duty cycle low powerhigh false alarm ratehigh duty cycle

  50. Some Research Challenges • Minimal energy to obtain reliable decision in a distributed network • Minimal (average) delay in conveying information through network • Density and source separability trades • Model uncertainty and methods for reducing its effects • how do we know that we don’t know? • Role of hierarchy; how much leads to what kinds of changes in information theoretic optimal behavior • At small scale can use brute force, at large scale can use ensembles; what can we do in between? • Exploitation of signal locality: what is the spatial domain over which cross-layer optimization is useful

More Related