Optimal Defense Against Jamming Attacks in Cognitive Radio Networks

Presenter: Wayne Hsiao. Advisor: Frank, Yeong-Sung Lin



Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach. Yongle Wu, Beibei Wang, and K. J. Ray Liu. Presenter: Wayne Hsiao. Advisor: Frank, Yeong-Sung Lin.

Presentation Transcript

Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach

Yongle Wu, Beibei Wang, and K. J. Ray Liu

Presenter: Wayne Hsiao

Advisor: Frank, Yeong-Sung Lin


Agenda

  • Introduction

  • Related Works

  • System Model

  • Optimal Strategy with Perfect Knowledge

    • Markov Models

    • Markov Decision Process

  • Learning the Parameters

  • Simulation Results


Introduction

  • Cognitive radio technology has been receiving growing attention

  • A cognitive radio network has two kinds of users

    • Unlicensed users (secondary users)

    • Spectrum holders (primary users)

  • Secondary users usually compete for limited spectrum resources

    • Game theory has been widely applied as a flexible and proper tool to model and analyze their behavior in the network


Introduction

  • Cognitive radio networks are vulnerable to malicious attacks

  • Security countermeasures

    • Crucial to the successful deployment of cognitive radio networks

  • We mainly focus on the jamming attack

    • One of the major threats to cognitive radio networks

    • Several malicious attackers intend to interrupt the communications of secondary users by injecting interference


Introduction

  • A secondary user can hop across multiple bands to reduce the probability of being jammed

    • The optimal defense strategy is derived with a Markov decision process (MDP)

  • The optimal strategy strikes a balance between the cost associated with hopping and the damage caused by attackers


Introduction

  • To determine the optimal strategy, the secondary user needs some information

    • For example, the number of attackers

  • Maximum Likelihood Estimation (MLE)

    • A learning process in which the secondary user estimates the useful parameters from past observations


Related works

  • The anti-jamming problem becomes more complicated in a cognitive radio network

    • Primary users’ access has to be taken into consideration

  • We consider the following scenario

    • A single-radio secondary user

    • The defense strategy is to hop across different bands


System model

  • A secondary user opportunistically accesses one of the predefined M licensed bands

  • Each licensed band is time-slotted

  • The access pattern of primary users can be characterized by an ON-OFF model
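The ON-OFF model in the last bullet can be illustrated with a two-state Markov chain. The sketch below is only an illustration: the names `p_on_to_off` and `p_off_to_on` are hypothetical, since the transcript does not say which symbol governs which transition, and the numbers in the usage lines are arbitrary.

```python
import random


def steady_state_on(p_on_to_off: float, p_off_to_on: float) -> float:
    """Long-run probability that the primary user occupies the band (ON)."""
    return p_off_to_on / (p_on_to_off + p_off_to_on)


def simulate_on_off(p_on_to_off: float, p_off_to_on: float,
                    n_slots: int, seed: int = 0) -> float:
    """Simulate one band slot by slot and return the fraction of ON slots."""
    rng = random.Random(seed)
    on = False  # assumption: the band starts idle
    on_slots = 0
    for _ in range(n_slots):
        if on:
            on = rng.random() >= p_on_to_off   # stay ON unless the user leaves
        else:
            on = rng.random() < p_off_to_on    # turn ON when the user returns
        on_slots += on
    return on_slots / n_slots


if __name__ == "__main__":
    print(steady_state_on(0.5, 0.5))           # analytic value
    print(simulate_on_off(0.5, 0.5, 20000))    # empirical estimate
```

The empirical fraction should approach the analytic steady-state value as the number of slots grows.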


System model

  • Assume all bands share the same channel model and parameters

  • But different bands are used by independent primary users


System model

  • The secondary user has to detect the presence of the primary user at the beginning of each time slot


System model

  • Communication gain R

    • Earned in a time slot when the primary user is absent from the band

  • The cost associated with hopping is C

  • We assume there are m (m ≥ 1) malicious single-radio attackers

  • Attackers do not want to interfere with primary users

    • Because primary users’ usage of spectrum is enforced by their ownership of bands


System model

  • On finding the secondary user

    • An attacker will immediately inject jamming power, which makes the secondary user fail to decode data packets

  • We assume that the secondary user suffers a significant loss L when jammed

  • When all the attackers coordinate to maximize the damage

    • They detect m channels in a time slot


System model

  • The longer the secondary user stays in a band, the higher the risk of being exposed to attackers

  • At the end of each time slot the secondary user decides

    • to stay

    • to hop

  • The secondary user receives an immediate payoff U(n) in the nth time slot


System model

  • 1(.) is an indicator function

    • Returning 1 when the statement in the parenthesis holds true

    • 0 otherwise
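The payoff equation itself did not survive in the transcript. As a hedged reading of the surrounding bullets, the sketch below assumes that the payoff adds the gain R for a successful slot, subtracts the loss L when jammed, and subtracts the cost C when the user hops, each gated by an indicator. The state encoding ('P', 'J', or a positive integer) and the function names are hypothetical.

```python
def indicator(condition: bool) -> int:
    """1(.) from the slide: 1 when the statement holds, 0 otherwise."""
    return 1 if condition else 0


def immediate_payoff(state, action: str, R: float, L: float, C: float) -> float:
    """Assumed payoff: R*1(success) - L*1(jammed) - C*1(hop).

    state is 'P' (primary user present), 'J' (jammed), or an int K >= 1
    (Kth consecutive successful slot); action is 's' (stay) or 'h' (hop).
    """
    success = isinstance(state, int)
    return (R * indicator(success)
            - L * indicator(state == "J")
            - C * indicator(action == "h"))
```

For example, with R = 5, C = 1, and a hypothetical L = 20, a successful slot followed by a hop yields 4, while a jammed slot followed by the forced hop yields -21.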


System model

  • Average Payoff Ū

    • The secondary user wants to maximize

    • Malicious attackers want to minimize

  • The discount factor δ (0 < δ < 1) measures how much the secondary user values a future payoff over the current one
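The expression for Ū is missing from the transcript. A common form for a discounted payoff, assumed here, is the sum of δ^n U(n) over time slots. Whether the paper starts the exponent at 0 or 1 is not recoverable from the slides, so the `start` argument below is a hypothetical knob.

```python
def discounted_payoff(payoffs, delta: float, start: int = 0) -> float:
    """Sum of delta**n * U(n), with n counted from `start` (an assumption)."""
    if not 0 < delta < 1:
        raise ValueError("delta must satisfy 0 < delta < 1")
    return sum(delta ** n * u for n, u in enumerate(payoffs, start=start))
```

With δ close to 1 the user weighs future slots almost as much as the current one; with δ close to 0 only the first few slots matter.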


Optimal strategy with perfect knowledge

  • Attack strategy

    • Attackers coordinately tune their radios randomly to m undetected bands in each time slot

    • They continue until either all bands have been sensed or the secondary user has been found and jammed

  • The jamming game can be reduced to a Markov decision process

    • We first show how to model the scenario as an MDP

    • Then solve it using standard approaches


Optimal strategy with perfect knowledge

  • At the end of the nth time slot

    • The secondary user observes the state of the current time slot S(n)

    • And chooses an action a(n)

      • Whether to tune the radio to a new band or not, which takes effect at the beginning of the next time slot

  • S(n) = P

    • The primary user occupied the band in the nth time slot

  • S(n) = J

    • The secondary user was jammed in the nth time slot


Optimal strategy with perfect knowledge

  • In state P or J, the only feasible action is a(n) = h

    • The secondary user has to hop to a new band

  • If the secondary user has transmitted a packet successfully in the time slot, it can choose

    • ‘to hop’ (a(n) = h)

    • ‘to stay’ (a(n) = s)

  • S(n) = K

    • This is the Kth consecutive slot with successful transmission in the same band


Optimal strategy with perfect knowledge

  • The immediate payoff depends on both the state and the action

  • p(S’|S, h)

    • The transition probability from an old state S to a new state S’ when taking the action h

  • p(S’|S, s)

    • The transition probability from an old state S to a new state S’ when taking the action s


Optimal strategy with perfect knowledge

  • If the secondary user hops to a new band, transition probabilities do not depend on the old state

  • The only possible new states are

    • P (the new band is occupied by the primary user)

    • J (transmission in the new band is detected by an attacker)

    • 1 (successful transmission begins in the new band)


Optimal strategy with perfect knowledge

  • When the total number of bands M is large

    • M ≫ 1

  • Assume that the probability of the primary user’s presence in the new band equals the steady-state probability of the ON-OFF model

    • This neglects the case where the secondary user hops back to some band within a very short time


Optimal strategy with perfect knowledge

  • The secondary user will be jammed with the probability m/M

    • Each attacker detects one band without overlapping

  • Transition probabilities are
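The transition probabilities for a hop are missing from the transcript. The sketch below combines the two facts the slides do state: the new band is busy with the steady-state ON probability, and an idle new band is jammed with probability m/M. How these combine is an assumption, and `pi_on` (the steady-state ON probability) and the function name are hypothetical.

```python
def hop_transition_probs(M: int, m: int, pi_on: float) -> dict:
    """Assumed p(S'|S, h); independent of the old state S."""
    if not 1 <= m < M:
        raise ValueError("expected 1 <= m < M")
    rho = m / M  # probability that the new band is one of the m sensed bands
    return {
        "P": pi_on,                      # new band occupied by the primary user
        "J": (1 - pi_on) * rho,          # band idle, but sensed by an attacker
        1: (1 - pi_on) * (1 - rho),      # first successful slot in the new band
    }
```

The three probabilities sum to 1, matching the slide's statement that P, J, and 1 are the only possible new states after a hop.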


Optimal strategy with perfect knowledge

  • Note that s is not a feasible action when the state is J or P

  • At state K, only max(M−Km,0) bands have not been detected by attackers

    • But another m bands will be detected in the upcoming time slot

    • The probability of jamming conditioned on the absence of primary user
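The conditional probability referred to in the last bullet is missing from the transcript. Under the counting argument in these bullets (the secondary user sits in one of the M − Km undetected bands, and m of them are sensed in the next slot), one consistent reconstruction, offered as an assumption rather than the authors' exact equation, is:

```latex
\Pr\bigl(\text{jammed in slot } n+1 \,\big|\, S(n) = K,\ \text{primary user absent}\bigr)
  = \begin{cases}
      \dfrac{m}{M - Km}, & K < \dfrac{M}{m} - 1, \\[2ex]
      1, & \text{otherwise.}
    \end{cases}
```

At K = M/m − 1 the first expression would also equal 1, so the two branches agree at the boundary.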


Optimal strategy with perfect knowledge

  • To sum up, transition probabilities associated with the action s are as follows: ∀K ∈ {1,2,3,...}
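The summary equations for the action s are not in the transcript. The following sketch assembles them from the bullets on staying, assuming the band is reclaimed by the primary user with a per-slot probability `p_appear` (a hypothetical name for the OFF-to-ON probability) and that an idle band is jammed with probability m/(M − Km), capped at 1.

```python
def stay_transition_probs(K: int, M: int, m: int, p_appear: float) -> dict:
    """Assumed p(S'|K, s) for a user with K consecutive successful slots."""
    undetected = max(M - K * m, 0)           # bands the attackers have not sensed yet
    jam = 1.0 if undetected <= m else m / undetected
    return {
        "P": p_appear,                        # primary user reclaims the band
        "J": (1 - p_appear) * jam,            # band idle, attackers find the user
        K + 1: (1 - p_appear) * (1 - jam),    # one more successful slot
    }
```

With M = 60 and m = 6, for instance, the jamming probability after one successful slot is 6/54, and it reaches 1 at K = 9, after which state K + 1 is unreachable.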


Markov decision process

  • If the secondary user stays in the same band for too long, he/she will eventually be found by an attacker

    • p(K + 1|K,s) = 0 if K > M/m − 1

  • Therefore, we can limit the state S to a finite set


Markov decision process

  • An MDP consists of four important components

    • a finite set of states

    • a finite set of actions

    • transition probabilities

    • immediate payoffs

  • The optimal defense strategy can be obtained by solving the MDP


Markov decision process

  • A policy is defined as a mapping from a state to an action

    • π : S(n) → a(n)

  • A policy π specifies an action π(S) to take whenever the user is in state S

  • Among all possible policies, the optimal policy is the one that maximizes the expected discounted payoff


Markov decision process

  • The value of a state S is defined as the highest expected payoff given the MDP starts from state S

  • The optimal policy is the optimal defense strategy that the secondary user should adopt since it maximizes the expected payoff


Markov decision process

  • After a first move, the remaining part of an optimal policy should still be optimal

  • The first move should maximize the sum of the immediate payoff and the expected payoff conditioned on the current action

    • Bellman equation
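The Bellman equation on this slide is not in the transcript. In its standard form, V(S) = max over feasible a of [U(S, a) + δ Σ p(S'|S, a) V(S')], it can be solved by value iteration. The sketch below is one way to do that and is not the authors' code: the payoffs and transition probabilities are the assumed forms (R for success, −L when jammed, −C per hop; jamming probability m/M after a hop and m/(M − Km) when staying), and `pi_on`, `p_appear`, and the function name are hypothetical.

```python
def solve_jamming_mdp(M, m, R, L, C, delta, pi_on, p_appear, tol=1e-9):
    """Value iteration on states P, J, 1..k_max; returns (values, policy)."""
    rho = m / M
    k_max = max(1, -(-M // m) - 1)            # ceil(M/m) - 1: last reachable K
    states = ["P", "J"] + list(range(1, k_max + 1))
    # Assumed state reward: R for a successful slot, -L when jammed, 0 in P.
    reward = {s: (R if isinstance(s, int) else (-L if s == "J" else 0.0))
              for s in states}
    V = {s: 0.0 for s in states}
    while True:
        # Hopping leads to the same distribution from every state.
        hop_future = delta * (pi_on * V["P"]
                              + (1 - pi_on) * (rho * V["J"] + (1 - rho) * V[1]))
        new_V, policy = {}, {}
        for s in states:
            best, act = reward[s] - C + hop_future, "h"
            if isinstance(s, int):             # staying is feasible only after success
                undetected = max(M - s * m, 0)
                jam = 1.0 if undetected <= m else m / undetected
                v_next = V[s + 1] if s < k_max else 0.0   # unused when jam == 1
                q_stay = reward[s] + delta * (
                    p_appear * V["P"]
                    + (1 - p_appear) * (jam * V["J"] + (1 - jam) * v_next))
                if q_stay > best:
                    best, act = q_stay, "s"
            new_V[s], policy[s] = best, act
        gap = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if gap < tol:
            return V, policy
```

Because δ < 1, each sweep shrinks the gap between successive value estimates, so the loop terminates. The returned policy maps every state to 'h' or 's'.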


Markov decision process

  • Critical state K∗

  • K∗ can be obtained from solving the MDP, and the optimal strategy becomes
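The strategy equation is missing from the transcript. A threshold rule of the kind the bullets describe can be sketched as follows, assuming the user stays while K < K∗ and hops otherwise. Whether the boundary is inclusive cannot be recovered from the slides, and both function names are hypothetical.

```python
def critical_state(policy: dict) -> int:
    """Smallest K whose action is 'h'; one past the last K if the user never hops."""
    ks = sorted(s for s in policy if isinstance(s, int))
    for k in ks:
        if policy[k] == "h":
            return k
    return ks[-1] + 1


def threshold_action(state, k_star: int) -> str:
    """Assumed threshold rule: stay while K < k_star, hop otherwise."""
    if state in ("P", "J"):
        return "h"              # staying is not feasible in these states
    return "s" if state < k_star else "h"
```

Here `policy` is any mapping from states ('P', 'J', 1, 2, ...) to actions, such as the output of an MDP solver.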


Learning the parameters

  • A learning scheme

    • Maximum Likelihood Estimation (MLE)

  • The secondary user simply sets a value as an initial guess of the optimal critical state K∗

  • And follows the strategy (10) with the estimate during the whole learning period


Learning the parameters

  • This guess need not be accurate

  • After the learning period, the secondary user updates the critical state K∗ accordingly

  • F

    • The total number of transitions from S to S’ with the action h taken

  • T

  • T

  • t


Learning the parameters

  • The likelihood that such a sequence has occurred

    • A product over all feasible transition tuples

    • (S,a,S’) ∈ {P,J,1,2,3,...,KL + 1}×{s,h}×{P,J,1,2,3,...,KL +1}

  • Define

  • The following proposition gives the MLE of the parameters β, γ, and ρ


Learning the parameters

  • Proposition 1: Given the transition counts obtained from the history of transitions, the MLEs of the primary users’ parameters are
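The closed-form estimates of Proposition 1 are missing from the transcript, and the sketch below is not the paper's formula. It shows one count-based estimate under an assumed model: a hop lands on a busy band with the steady-state ON probability, and a stay is interrupted by the primary user with the OFF-to-ON probability. Under that assumption, the two fractions maximize their respective likelihood factors. All names are hypothetical.

```python
def estimate_primary_user_params(n_hop_to_P, n_hop_total,
                                 n_stay_to_P, n_stay_total):
    """Count-based estimates under the assumed transition model.

    Returns (pi_on_hat, p_appear_hat, p_vacate_hat), where p_vacate_hat is the
    ON-to-OFF probability implied by the two-state steady-state relation.
    """
    pi_on_hat = n_hop_to_P / n_hop_total        # hops that land on a busy band
    p_appear_hat = n_stay_to_P / n_stay_total   # stays interrupted by the primary user
    # Steady state: pi_on = p_appear / (p_appear + p_vacate); solve for p_vacate.
    p_vacate_hat = (p_appear_hat * (1 - pi_on_hat) / pi_on_hat
                    if pi_on_hat > 0 else float("nan"))
    return pi_on_hat, p_appear_hat, p_vacate_hat
```

For example, 10 busy landings out of 100 hops and 5 interruptions out of 500 stays give estimates of 0.1, 0.01, and 0.09.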


Learning the parameters

  • The MLE of the attackers’ parameter, ρML, is the unique root within the interval (0, 1/(KL + 1)) of the following polynomial of order KL + 1

  • Proof


Learning the parameters

  • With transition probabilities specified in (4) – (7)

  • The likelihood of observed transitions (11) can be decoupled into a product of three terms Λ = ΛβΛγΛρ


Learning the parameters

  • By differentiating ln Λβ, ln Λγ, and ln Λρ and equating them to 0

    • Obtain the MLE (12), (13), and (14)

  • To ensure that the likelihood is positive, ρ has to lie in the interval (0, 1/(KL + 1))

    • The left-hand side of equation (14) decreases monotonically and approaches positive infinity as ρ goes to 0

    • The right-hand side increases monotonically and approaches positive infinity as ρ goes to 1/(KL + 1)


Learning the parameters

  • After the learning period, the secondary user rounds M · ρML to the nearest integer as an estimate of m

  • Calculate the optimal strategy using the MDP approach described in the previous section
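The polynomial whose root gives ρML is missing from the transcript, so the sketch below swaps in a plain grid search over the log-likelihood instead of root finding. It assumes ρ = m/M, a jamming probability of ρ/(1 − Kρ) after K consecutive successes (with K = 0 standing for a hop), and uses the largest observed K in place of KL. The count dictionaries and function names are hypothetical.

```python
import math


def rho_log_likelihood(rho, jam_counts, ok_counts):
    """Log-likelihood factor that depends on rho = m/M.

    jam_counts[K] and ok_counts[K] count transitions taken after K consecutive
    successes (K = 0 stands for a hop) that ended jammed or successful.
    """
    total = 0.0
    for K, n in jam_counts.items():
        total += n * (math.log(rho) - math.log(1 - K * rho))
    for K, n in ok_counts.items():
        total += n * (math.log(1 - (K + 1) * rho) - math.log(1 - K * rho))
    return total


def estimate_attackers(M, jam_counts, ok_counts, grid=9999):
    """Grid-search rho on (0, 1/(K_max + 1)) and round M * rho to estimate m."""
    k_max = max(list(jam_counts) + list(ok_counts))
    upper = 1.0 / (k_max + 1)
    candidates = [upper * i / (grid + 1) for i in range(1, grid + 1)]
    rho_ml = max(candidates,
                 key=lambda r: rho_log_likelihood(r, jam_counts, ok_counts))
    m_hat = max(1, math.floor(M * rho_ml + 0.5))   # the slides assume m >= 1
    return rho_ml, m_hat
```

The rounding uses `math.floor(x + 0.5)` rather than Python's built-in `round`, which rounds halves to the nearest even integer.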


Simulation result

  • Communication gain R = 5

  • Hopping cost C = 1

  • Total number of bands M = 60

  • Discount factor δ = 0.95

  • Primary users’ access pattern

    • β = 0.01, γ = 0.1


Simulation result

  • When the threat from attackers is stronger, the secondary user should proactively hop more frequently

    • To avoid being jammed


Simulation result

  • Always hopping: the secondary user hops every time slot

  • Staying whenever possible: the secondary user always stays in the band unless the primary user reclaims the band or the band is jammed by attackers
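Both baselines can be viewed as limiting cases of a threshold rule, assuming the convention "stay while K < K∗": K∗ = 1 never stays, and K∗ = ∞ never hops voluntarily. The Monte Carlo sketch below compares them on the state-level model and does not reproduce the paper's figures. M, R, C, and δ follow the simulation slide, while m = 6, L = 20, `pi_on`, `p_appear`, and the transition and payoff forms are assumptions.

```python
import random


def simulate_policy(k_star, M, m, R, L, C, delta, pi_on, p_appear,
                    n_slots=500, n_runs=200, seed=0):
    """Average discounted payoff of 'stay while K < k_star, hop otherwise'."""
    rng = random.Random(seed)
    rho = m / M

    def after_hop():
        if rng.random() < pi_on:
            return "P"
        return "J" if rng.random() < rho else 1

    def after_stay(K):
        if rng.random() < p_appear:
            return "P"
        undetected = max(M - K * m, 0)
        jam = 1.0 if undetected <= m else m / undetected
        return "J" if rng.random() < jam else K + 1

    total = 0.0
    for _ in range(n_runs):
        state, weight = after_hop(), 1.0
        for _ in range(n_slots):
            success = isinstance(state, int)
            hop = not (success and state < k_star)   # P and J always force a hop
            payoff = R * success - L * (state == "J") - C * hop
            total += weight * payoff
            weight *= delta
            state = after_hop() if hop else after_stay(state)
    return total / n_runs


if __name__ == "__main__":
    common = dict(M=60, m=6, R=5, L=20, C=1, delta=0.95, pi_on=0.1, p_appear=0.01)
    print("always hopping:", simulate_policy(1, **common))
    print("staying whenever possible:", simulate_policy(float("inf"), **common))
```

Intermediate values of `k_star` can be passed in the same way to see how a threshold strategy compares with the two extremes under these assumed parameters.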

