
Presenter: Wayne Hsiao; Advisor: Frank, Yeong-Sung Lin


Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach

Yongle Wu, Beibei Wang, and K. J. Ray Liu


- Introduction
- Related Works
- System Model
- Optimal Strategy with Perfect Knowledge
  - Markov Models
  - Markov Decision Process
- Learning the Parameters
- Simulation Results

Introduction

- Cognitive radio technology has been receiving growing attention
- In a cognitive radio network, there are
- Unlicensed users (secondary users)
- Spectrum holders (primary users)

- Secondary users usually compete for limited spectrum resources
- Game theory has been widely applied as a flexible and appropriate tool to model and analyze their behavior in the network

- Cognitive radio networks are vulnerable to malicious attacks
- Security countermeasures
- Crucial to the successful deployment of cognitive radio networks

- We mainly focus on the jamming attack
- One of the major threats to cognitive radio networks
- Several malicious attackers intend to interrupt the communications of secondary users by injecting interference

- A secondary user can hop across multiple bands to reduce the probability of being jammed
- Optimal defense strategy
- Markov decision process (MDP)

- The optimal strategy strikes a balance between the cost associated with hopping and the damage caused by attackers

- In order to determine the optimal strategy, the secondary user needs to know some information
- the number of attackers

- Maximum Likelihood Estimation (MLE)
- A learning process in which the secondary user estimates the required parameters from past observations

Related Works

- The problem becomes more complicated in a cognitive radio network
- Primary users’ access has to be taken into consideration

- We consider the scenario
- A single-radio secondary user
- The defense strategy is to hop across different bands

System Model

- A secondary user opportunistically accesses one of the predefined M licensed bands
- Each licensed band is time-slotted
- The access pattern of primary users can be characterized by an ON-OFF model

- Assume all bands share the same channel model and parameters
- But different bands are used by independent primary users
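The ON-OFF model above can be made concrete as a two-state Markov chain for each band. The text does not define β and γ, so this sketch assumes β is the per-slot probability that an idle (OFF) band becomes occupied (ON) and γ the probability that an occupied band becomes idle; under that assumption the long-run occupancy is β/(β + γ). The function names are hypothetical.

```python
import random


def steady_state_on(beta: float, gamma: float) -> float:
    """Long-run fraction of slots in which the primary user is ON.

    Assumes beta = P(OFF -> ON) and gamma = P(ON -> OFF) per time slot.
    """
    return beta / (beta + gamma)


def simulate_band(beta: float, gamma: float, slots: int, seed: int = 0) -> list[bool]:
    """Simulate one band's ON (True) / OFF (False) pattern, starting OFF."""
    rng = random.Random(seed)
    on, history = False, []
    for _ in range(slots):
        if on:
            on = rng.random() >= gamma   # leaves ON with probability gamma
        else:
            on = rng.random() < beta     # turns ON with probability beta
        history.append(on)
    return history


if __name__ == "__main__":
    beta, gamma = 0.01, 0.1              # values from the simulation settings
    trace = simulate_band(beta, gamma, 200_000)
    print("analytic :", steady_state_on(beta, gamma))
    print("empirical:", sum(trace) / len(trace))
```

With β = 0.01 and γ = 0.1 the assumed occupancy is 1/11 ≈ 0.091, and the simulated fraction should land close to it. Because bands are used by independent primary users, each band can be simulated with its own chain.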

- The secondary user has to detect the presence of the primary user at the beginning of each time slot

- Communication gain R
- When the primary user is absent in that band

- The cost associated with hopping is C
- We assume there are m (m ≥ 1) malicious single-radio attackers
- Attackers do not want to interfere with primary users
- Because primary users’ usage of spectrum is enforced by their ownership of bands

- On finding the secondary user
- An attacker will immediately inject jamming power, which makes the secondary user fail to decode data packets

- We assume that the secondary user suffers from a significant loss L when jammed
- When all the attackers coordinate to maximize the damage
- they detect m channels in a time slot

- The longer the secondary user stays in a band, the higher the risk of being exposed to attackers
- At the end of each time slot the secondary user decides
- to stay
- to hop

- The secondary user receives an immediate payoff U(n) in the nth time slot

- 1(.) is an indicator function
- Returning 1 when the statement in the parenthesis holds true
- 0 otherwise
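The formula for U(n) did not survive in the text. Using only the quantities defined above (gain R, jamming loss L, hopping cost C), one plausible reading, assumed here and not taken from the paper, is U(n) = R·1(S(n) ∉ {P, J}) − L·1(S(n) = J) − C·1(a(n) = h). The helper name `immediate_payoff` is hypothetical.

```python
from typing import Union

State = Union[str, int]  # "P", "J", or a positive integer K


def immediate_payoff(state: State, action: str, R: float, L: float, C: float) -> float:
    """Assumed per-slot payoff: gain R on success, loss L when jammed,
    and cost C whenever the chosen action is to hop ("h")."""
    payoff = 0.0
    if state == "J":
        payoff -= L        # jammed: packets cannot be decoded
    elif state != "P":
        payoff += R        # state K: successful transmission in this slot
    if action == "h":
        payoff -= C        # cost of tuning the radio to a new band
    return payoff
```

A slot spent on a primary-occupied band earns nothing, so its payoff is just −C when the user hops away.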

- Average Payoff Ū
- The secondary user wants to maximize
- Malicious attackers want to minimize

- The discount factor δ (0 < δ < 1) measures how much the secondary user values a future payoff relative to the current one
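The expression for the average payoff Ū is also missing. A common normalized form, assumed here, is Ū = (1 − δ) Σ_{n≥1} δ^(n−1) U(n), which makes a constant payoff u average to about u. This sketch truncates the infinite sum at the length of a finite payoff sequence; `discounted_average` is a hypothetical name.

```python
from typing import Sequence


def discounted_average(payoffs: Sequence[float], delta: float) -> float:
    """(1 - delta) * sum_{n>=1} delta**(n-1) * U(n), truncated to len(payoffs)."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    total, weight = 0.0, 1.0
    for u in payoffs:
        total += weight * u   # slot n is weighted by delta**(n-1)
        weight *= delta
    return (1 - delta) * total
```

With δ = 0.95, a payoff 20 slots ahead carries a weight of roughly 0.36 relative to the current one, which is why a user with a large δ cares about being jammed later.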

Optimal Strategy with Perfect Knowledge: Markov Models

- Attack strategy
- In each time slot, the attackers coordinately tune their radios at random to m bands that have not yet been detected
- They continue until either all bands have been sensed or the secondary user has been found and jammed
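The attack strategy can be checked with a small Monte Carlo simulation, assuming no primary-user activity and a secondary user that never leaves its band. Since the attackers pick m not-yet-sensed bands uniformly at random in every slot, their sensing order is a uniformly random permutation of the M bands, sensed m at a time. The estimated probability of being found in slot K + 1, given survival through K slots, can then be compared with m/(M − Km). The function names and m = 2 are illustrative assumptions.

```python
import random


def slot_found(M: int, m: int, rng: random.Random) -> int:
    """1-based slot in which the attackers first sense the user's band."""
    order = list(range(M))
    rng.shuffle(order)          # random sensing order; m new bands per slot
    position = order.index(0)   # by symmetry, the user is assumed to sit on band 0
    return position // m + 1


def empirical_hazard(M: int, m: int, K: int, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate P(found in slot K+1 | not found in slots 1..K)."""
    rng = random.Random(seed)
    slots = [slot_found(M, m, rng) for _ in range(trials)]
    survivors = [s for s in slots if s > K]
    return sum(1 for s in survivors if s == K + 1) / len(survivors)


if __name__ == "__main__":
    M, m = 60, 2                # m = 2 is an illustrative choice
    for K in (0, 10, 20):
        print(K, round(empirical_hazard(M, m, K), 4), round(m / (M - K * m), 4))
```

The restart after all bands have been sensed never matters here, because a user who stays put is always found within ⌈M/m⌉ slots.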

- The jamming game can be reduced to a Markov decision process
- We first show how to model the scenario as an MDP
- Then solve it using standard approaches

- At the end of the nth time slot
- The secondary user observes the state of the current time slot S(n)
- And chooses an action a(n)
- Whether to tune the radio to a new band or not, which takes effect at the beginning of the next time slot

- S(n) = P
- The primary user occupied the band in the nth time slot

- S(n) = J
- The secondary user was jammed in the nth time slot

- a(n) = h
- The secondary user has to hop to a new band

- If the secondary user has transmitted a packet successfully in the time slot, it can choose either
- ‘to hop’ (a(n) = h)
- or ‘to stay’ (a(n) = s)

- S(n) = K
- This is the Kth consecutive slot with successful transmission in the same band

- The immediate payoff depends on both the state and the action
- p(S’|S, h)
- The transition probability from an old state S to a new state S’ when taking the action h

- p(S’|S, s)
- The transition probability from an old state S to a new state S’ when taking the action s

- If the secondary user hops to a new band, transition probabilities do not depend on the old state
- The only possible new states are
- P (the new band is occupied by the primary user)
- J (transmission in the new band is detected by an attacker)
- 1 (successful transmission begins in the new band)

- When the total number of bands M is large
- M ≫ 1

- Neglecting the case that the secondary user hops back to some band within a very short time, assume that the probability of the primary user’s presence in the new band equals the steady-state probability of the ON-OFF model

- The secondary user will be jammed with probability m/M
- Each attacker detects one band without overlapping

- Transition probabilities are

- Note that s is not a feasible action when the state is J or P
- At state K, only max(M − Km, 0) bands have not been detected by attackers
- But another m bands will be detected in the upcoming time slot
- The probability of jamming conditioned on the absence of the primary user is therefore m/(M − Km) when M − Km > m, and 1 otherwise

- To sum up, transition probabilities associated with the action s are as follows: ∀K ∈ {1,2,3,...}
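The transition equations are missing from the text, so this sketch rebuilds them from the verbal description under two labelled assumptions: β is the per-slot OFF→ON probability and γ the ON→OFF probability of a band (so a newly chosen band is occupied with steady-state probability β/(β + γ)), and jamming can only happen when the primary user is absent. States are "P", "J", or an integer K; the function names are hypothetical.

```python
def hop_transitions(M: int, m: int, beta: float, gamma: float) -> dict:
    """p(S'|S, h): independent of the old state S (large-M approximation)."""
    p_on = beta / (beta + gamma)       # assumed steady-state occupancy
    rho = m / M                        # jamming probability in a new idle band
    return {"P": p_on, "J": (1 - p_on) * rho, 1: (1 - p_on) * (1 - rho)}


def stay_transitions(K: int, M: int, m: int, beta: float) -> dict:
    """p(S'|K, s) for K >= 1; staying is infeasible in states P and J."""
    remaining = M - K * m              # bands not yet sensed by the attackers
    q = 1.0 if remaining <= m else m / remaining  # jam prob. given no primary user
    probs = {"P": beta, "J": (1 - beta) * q}      # assumed: idle band turns ON w.p. beta
    if q < 1.0:
        probs[K + 1] = (1 - beta) * (1 - q)
    return probs


if __name__ == "__main__":
    print(hop_transitions(60, 2, 0.01, 0.1))
    print(stay_transitions(5, 60, 2, 0.01))
```

Each returned dictionary is one row of the transition matrix and sums to 1; once M − Km ≤ m, the row has no entry for K + 1, matching the observation that a user who stays too long is eventually found.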

Optimal Strategy with Perfect Knowledge: Markov Decision Process

- If the secondary user stays in the same band for too long, he/she will eventually be found by an attacker
- p(K + 1|K,s) = 0 if K > M/m − 1

- Therefore, we can limit the state S to a finite set 𝒮, where 𝒮 = {P, J, 1, 2, ..., ⌊M/m⌋}

- An MDP consists of four important components
- a finite set of states
- a finite set of actions
- transition probabilities
- immediate payoffs

- The optimal defense strategy can be obtained by solving the MDP

- A policy is defined as a mapping from a state to an action
- π : S(n) → a(n)

- A policy π specifies an action π(S) to take whenever the user is in state S
- Among all possible policies, the optimal policy is the one that maximizes the expected discounted payoff

- The value of a state S is defined as the highest expected payoff given the MDP starts from state S
- The optimal policy is the optimal defense strategy that the secondary user should adopt since it maximizes the expected payoff

- After the first move, the remaining part of an optimal policy should still be optimal (the principle of optimality)
- The first move should maximize the sum of the immediate payoff and the expected discounted future payoff conditioned on the current action
- Bellman equation
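Written out for this MDP, the Bellman equation takes the standard discounted form below, where V(S) is the value of state S, U(S, a) the immediate payoff, and 𝒜(S) the feasible actions. This is the textbook form, assumed rather than copied from the slides; a (1 − δ)-normalized average payoff would only rescale V and leave the optimal policy unchanged.

```latex
V(S) = \max_{a \in \mathcal{A}(S)} \Big\{ U(S,a) + \delta \sum_{S'} p(S' \mid S, a)\, V(S') \Big\},
\qquad \mathcal{A}(P) = \mathcal{A}(J) = \{h\}, \quad \mathcal{A}(K) = \{s, h\}.
```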

- Critical state K∗ (K∗ ≤ ⌊M/m⌋)
- K∗ can be obtained by solving the MDP, and the optimal strategy becomes: stay (a = s) in states 1, 2, ..., K∗ − 1, and hop (a = h) in states P, J, and every K ≥ K∗
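One standard way to solve the Bellman equation is value iteration. The sketch below uses the simulation settings R = 5, C = 1, M = 60, δ = 0.95, β = 0.01, γ = 0.1; the jamming loss L = 50 and the number of attackers m = 2 are illustrative assumptions, as are the transition probabilities (β = P(OFF→ON), γ = P(ON→OFF), jamming only when the primary user is absent). K∗ is read off as the smallest K at which hopping is optimal. Function names are hypothetical.

```python
def solve_mdp(R, C, L, M, m, delta, beta, gamma, tol=1e-9):
    """Value iteration for the anti-jamming MDP; returns (values, policy)."""
    k_max = M // m                                  # states P, J, 1..floor(M/m)
    states = ["P", "J"] + list(range(1, k_max + 1))
    p_on = beta / (beta + gamma)                    # assumed steady-state occupancy
    rho = m / M
    hop = {"P": p_on, "J": (1 - p_on) * rho, 1: (1 - p_on) * (1 - rho)}

    def stay(K):
        remaining = M - K * m
        q = 1.0 if remaining <= m else m / remaining
        probs = {"P": beta, "J": (1 - beta) * q}
        if q < 1.0:
            probs[K + 1] = (1 - beta) * (1 - q)
        return probs

    def gain(S):                                    # payoff before any hopping cost
        return -L if S == "J" else (0.0 if S == "P" else R)

    V = {S: 0.0 for S in states}
    while True:
        hop_future = sum(p * V[S2] for S2, p in hop.items())
        new_V, policy = {}, {}
        for S in states:
            best, act = gain(S) - C + delta * hop_future, "h"
            if S not in ("P", "J"):                 # staying is feasible only in state K
                stay_val = gain(S) + delta * sum(p * V[S2] for S2, p in stay(S).items())
                if stay_val >= best:
                    best, act = stay_val, "s"
            new_V[S], policy[S] = best, act
        converged = max(abs(new_V[S] - V[S]) for S in states) < tol
        V = new_V
        if converged:
            return V, policy


def critical_state(policy, k_max):
    """Smallest K at which hopping is optimal (None if the user always stays)."""
    return next((K for K in range(1, k_max + 1) if policy[K] == "h"), None)


if __name__ == "__main__":
    M, m = 60, 2
    V, policy = solve_mdp(R=5, C=1, L=50, M=M, m=m, delta=0.95, beta=0.01, gamma=0.1)
    print("K* =", critical_state(policy, M // m))
```

Because δ < 1 the update is a contraction, so the loop terminates; rerunning with a larger m is a direct way to see how the critical state responds to a stronger threat.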

Learning the Parameters

- A learning scheme
- Maximum Likelihood Estimation (MLE)

- The secondary user simply sets a value as an initial guess of the optimal critical state K∗
- And follows the strategy (10) with the estimate during the whole learning period

- This guess need not be accurate
- After the learning period, the secondary user updates the critical state K∗ accordingly
- F: the total number of transitions from S to S’ with the action h taken

- T
- T
- t

- The likelihood that such a sequence has occurred
- A product over all feasible transition tuples
- (S,a,S’) ∈ {P,J,1,2,3,...,KL + 1}×{s,h}×{P,J,1,2,3,...,KL +1}
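Because the likelihood is a product of transition probabilities, each raised to the number of times that transition was observed, its logarithm is a count-weighted sum of log-probabilities. A minimal sketch, assuming the learning history is stored as a list of (S, a, S′) tuples and the model is supplied as a hypothetical function `prob(S, a, S2)`:

```python
import math
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple

Transition = Tuple[Hashable, str, Hashable]   # (S, a, S')


def count_transitions(history: Iterable[Transition]) -> Counter:
    """Number of times each (S, a, S') tuple occurred during learning."""
    return Counter(history)


def log_likelihood(counts: Counter, prob: Callable[[Hashable, str, Hashable], float]) -> float:
    """ln of prod p(S'|S, a)**count over all observed tuples."""
    total = 0.0
    for (S, a, S2), n in counts.items():
        p = prob(S, a, S2)
        if p <= 0.0:
            return -math.inf          # the model forbids an observed transition
        total += n * math.log(p)
    return total
```

Working with ln Λ instead of Λ avoids numerical underflow over a long learning period and turns the product into a sum that is easy to differentiate.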

- Define
- The following proposition gives the MLE of the parameters β, γ, and ρ

- Proposition 1: Given the transition counts accumulated from the history of transitions, the MLEs of the primary users’ parameters are

- The MLE of the attackers’ parameter, ρML, is the unique root within the interval (0, 1/(KL + 1)) of the following polynomial of order KL + 1
- Proof

- With transition probabilities specified in (4) – (7)
- The likelihood of observed transitions (11) can be decoupled into a product of three terms Λ = ΛβΛγΛρ

- By differentiating ln Λβ, ln Λγ, and ln Λρ and setting each derivative to 0
- We obtain the MLEs (12), (13), and (14)

- To ensure that the likelihood is positive, ρ has to lie in the interval (0, 1/(KL + 1))
- The left-hand side of equation (14) decreases monotonically and approaches positive infinity as ρ goes to 0
- The right-hand side increases monotonically and approaches positive infinity as ρ goes to 1/(KL + 1)
- Hence the two sides intersect exactly once in the interval, which gives the unique root

- After the learning period, the secondary user rounds M · ρML to the nearest integer as an estimate of m
- It then calculates the optimal strategy using the MDP approach described in the previous section
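Equation (14) itself is missing, but its root can be found numerically. This sketch assumes that a hop into an idle band is jammed with probability ρ = m/M and that staying in an idle band at state K is jammed with probability m/(M − Km) = ρ/(1 − Kρ), and it solves d ln Λρ/dρ = 0 by bisection on (0, 1/(K_top + 1)) instead of expanding it into polynomial form. The count arguments and function names are hypothetical.

```python
def dlog_lik_rho(rho, hop_jam, hop_ok, stay_jam, stay_ok):
    """Derivative of ln(Lambda_rho) under the assumed jamming probabilities.

    hop_jam / hop_ok: hops into idle bands that were / were not jammed.
    stay_jam[K] / stay_ok[K]: stays from state K (primary user absent)
    that were / were not jammed.
    """
    g = hop_jam / rho - hop_ok / (1 - rho)
    for K, n in stay_jam.items():            # ln(rho / (1 - K rho))
        g += n * (1 / rho + K / (1 - K * rho))
    for K, n in stay_ok.items():             # ln((1 - (K+1) rho) / (1 - K rho))
        g += n * (K / (1 - K * rho) - (K + 1) / (1 - (K + 1) * rho))
    return g


def mle_rho(hop_jam, hop_ok, stay_jam, stay_ok, iters=200):
    """Bisection for the root of the score on (0, 1/(K_top + 1))."""
    k_top = max(list(stay_jam) + list(stay_ok), default=0)
    lo, hi = 0.0, 1.0 / (k_top + 1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if not lo < mid < hi:
            break                            # interval can no longer be split
        if dlog_lik_rho(mid, hop_jam, hop_ok, stay_jam, stay_ok) > 0:
            lo = mid                         # score still positive: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def estimate_attackers(rho_ml, M):
    """Round M * rho_ML to the nearest integer as the estimate of m."""
    return round(M * rho_ml)


if __name__ == "__main__":
    rho = mle_rho(5, 100, {2: 3}, {1: 40, 2: 30})
    print(rho, estimate_attackers(rho, 60))
```

Bisection only needs the sign change described above (score positive near 0, negative near the upper end), so it fits the monotonicity argument without requiring the polynomial's coefficients.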

Simulation Results

- Communication gain R = 5
- Hopping cost C = 1
- Total number of bands M = 60
- Discount factor δ = 0.95
- Primary users’ access pattern
- β = 0.01, γ = 0.1

- When the threat from attackers is stronger, the secondary user should proactively hop more frequently
- To avoid being jammed

- Always hopping: the secondary user will hop every time slot
- Staying whenever possible: the secondary user will always stay in the band unless the primary user reclaims the band or the band is jammed by attackers
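Both baselines are threshold policies: always hopping corresponds to a critical state of 1, and staying whenever possible means never hopping voluntarily (a threshold above ⌊M/m⌋). A minimal sketch evaluates any threshold policy by iterating its fixed-policy Bellman equation and scores it by the expected discounted payoff right after a hop. It uses R = 5, C = 1, M = 60, δ = 0.95, β = 0.01, γ = 0.1 from the settings above, with the assumed values L = 50 and m = 2, and assumes β = P(OFF→ON), γ = P(ON→OFF). `evaluate_threshold` is a hypothetical name.

```python
def evaluate_threshold(k_hop, R=5.0, C=1.0, L=50.0, M=60, m=2,
                       delta=0.95, beta=0.01, gamma=0.1, tol=1e-10):
    """Expected discounted payoff of 'stay in states 1..k_hop-1, hop otherwise',
    measured from the distribution of states right after a hop."""
    k_max = M // m
    p_on = beta / (beta + gamma)          # assumed steady-state occupancy
    rho = m / M
    hop = {"P": p_on, "J": (1 - p_on) * rho, 1: (1 - p_on) * (1 - rho)}

    def stay(K):
        remaining = M - K * m
        q = 1.0 if remaining <= m else m / remaining
        probs = {"P": beta, "J": (1 - beta) * q}
        if q < 1.0:
            probs[K + 1] = (1 - beta) * (1 - q)
        return probs

    states = ["P", "J"] + list(range(1, k_max + 1))
    gain = {S: (-L if S == "J" else 0.0 if S == "P" else R) for S in states}
    V = {S: 0.0 for S in states}
    while True:
        hop_future = sum(p * V[S2] for S2, p in hop.items())
        new_V = {}
        for S in states:
            if S in ("P", "J") or S >= k_hop:      # forced or voluntary hop
                new_V[S] = gain[S] - C + delta * hop_future
            else:                                  # stay in the current band
                new_V[S] = gain[S] + delta * sum(p * V[S2] for S2, p in stay(S).items())
        done = max(abs(new_V[S] - V[S]) for S in states) < tol
        V = new_V
        if done:
            return sum(p * V[S2] for S2, p in hop.items())


if __name__ == "__main__":
    k_max = 60 // 2
    always_hop = evaluate_threshold(1)
    stay_possible = evaluate_threshold(k_max + 1)
    best = max(range(1, k_max + 2), key=evaluate_threshold)
    print("always hopping:", always_hop)
    print("staying whenever possible:", stay_possible)
    print("best threshold:", best, evaluate_threshold(best))
```

Sweeping every threshold is a brute-force alternative to value iteration that works here because the state space has only ⌊M/m⌋ + 2 states.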