
A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks


Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris

BroadNets 2004 proceedings

Presented by Zhanxiang

February 7, 2005

- Goal:
- Provide fairness control and service differentiation in a WDM grooming network while maximizing the overall utilization.

- Contributions:
- An optimal CAC policy providing fairness control by using a Markov Decision Process approach;
- A heuristic decomposition algorithm for multi-link and multi-wavelength networks.

- DTMC
- DTMDP
- We focus on the DTMDP because a CTMDP is usually solved by discretization.

Originate from Professor Malathi Veeraraghavan’s slides.


- Two states $i$ and $j$ communicate if, for some $n$ and $n'$, $p_{ij}(n) > 0$ and $p_{ji}(n') > 0$.
- An MC is irreducible if all of its states communicate.
- A state of an MC is periodic if there exists some integer $m > 0$ such that $p_{ii}(m) > 0$ and some integer $d > 1$ such that $p_{ii}(n) > 0$ only if $d \mid n$.
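The irreducibility definition above can be checked mechanically for a small chain. The following is a minimal sketch, assuming the DTMC is given as a row-stochastic list of lists; the helper names `reachable` and `is_irreducible` and the two example matrices are illustrative and not taken from the slides.

```python
from collections import deque

def reachable(P, start):
    """Return the set of states reachable from `start` in zero or more steps."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(P[i]):
            # p_ij > 0 means j can be reached from i in one step
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    """A chain is irreducible if every state can reach every other state."""
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

# Hypothetical examples: a 3-state ring and a chain with an absorbing state.
P_ring = [[0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
          [1.0, 0.0, 0.0]]
P_absorbing = [[1.0, 0.0],
               [0.5, 0.5]]

print(is_irreducible(P_ring), is_irreducible(P_absorbing))
```

The ring is also a periodic chain with $d = 3$, since a state can only return to itself in a multiple of three steps.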

Originate from Professor Malathi Veeraraghavan’s slides.


Probability Theory + Utility Theory = Decision Theory

- Probability theory describes what an agent should believe based on evidence.
- Utility theory describes what an agent wants.
- Decision theory describes what an agent should do.

Originate from David W. Kirsch’s slides

- An MDP is defined by:
State space: $S$

Action space: $A$

Reward function: $R: S \to \mathbb{R}$

Transition function: $T: S \times A \to S$ (deterministic)

$T: S \times A \to \mathrm{Power}(S)$ (stochastic)

The transition function describes the effect of an action in state $s$. In the stochastic case, the transition function has a probability distribution $P(s' \mid s, a)$ over its range.

Originate from David W. Kirsch’s slides and modified by Zhanxiang

- MDP is like a DTMC, except the transition matrix depends on the action taken by the decision maker (a.k.a. agent) at each time step.
$P_{s,a,s'} = P[S(t+1) = s' \mid S(t) = s, A(t) = a]$

[Figure: MDP vs. DTMC. Labels: current state s, action a, next state s’.]

- Stochastic actions:
- $T: S \times A \to \mathrm{PowerSet}(S)$
For each state and action we specify a probability distribution over next states, $P(s' \mid s, a)$.

- Deterministic actions:
- $T: S \times A \to S$
For each state and action we specify a new state. Hence the transition probabilities will be 1 or 0.

- Assume we assign reward U(s) to each state s
- The expected utility of an action a in state s is
$EU(a \mid s) = \sum_{s'} P(s' \mid s, a)\, U(s')$

- MEU principle: an agent should choose an action that maximizes the agent’s EU.
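A small numerical illustration of the expected-utility formula and the MEU principle may help. This is a sketch under assumed data: the states, actions, probabilities and utilities below are hypothetical, and the function names are chosen only for this example.

```python
def expected_utility(P, U, s, a):
    """EU(a|s) = sum over s' of P(s'|s,a) * U(s')."""
    return sum(p * U[s_next] for s_next, p in P[(s, a)].items())

def meu_action(P, U, s, actions):
    """Pick the action with maximum expected utility in state s."""
    return max(actions, key=lambda a: expected_utility(P, U, s, a))

# Hypothetical model: from s0, "safe" always reaches s1,
# while "risky" reaches s1 or s2 with equal probability.
P = {
    ("s0", "safe"): {"s1": 1.0},
    ("s0", "risky"): {"s1": 0.5, "s2": 0.5},
}
U = {"s1": 1.0, "s2": 4.0}

for a in ("safe", "risky"):
    print(a, expected_utility(P, U, "s0", a))
print("MEU choice:", meu_action(P, U, "s0", ["safe", "risky"]))
```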

Originate from David W. Kirsch’s slides and modified by Zhanxiang

- Policy: a mapping from S to A, $\pi: S \to A$
- Procedure for following a policy:
1. Determine the current state s

2. Execute action π(s)

3. Repeat 1-2

Originate from David W. Kirsch’s slides modified by Zhanxiang

- In deterministic processes, solution is a plan.
- In observable stochastic processes, solution is a policy
- A policy’s quality is measured by its EU

Notation:

π ≡ a policy

π(s) ≡ the recommended action in state s

π* ≡ the optimal policy (maximum expected utility)

Originate from David W. Kirsch’s slides and modified by Zhanxiang

- In the definition of an MDP we introduced R(s), which depends on specific properties of a state.
- Shall we let U(s) = R(s)?
- This is often very good for choosing single-action decisions.
- It is not feasible for choosing action sequences, which implies that R(s) alone is not enough to solve an MDP.

- How to add rewards?
- Simple sum

- Mean reward rate

Problem: an infinite horizon leads to infinite reward

- Discounted rewards

$R(s_0, s_1, s_2, \ldots) = R(s_0) + c R(s_1) + c^2 R(s_2) + \ldots$, where $0 < c \le 1$
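To see why discounting addresses the infinite-reward problem, the discounted sum can be evaluated on a truncated reward sequence. This is a minimal sketch; the function name `discounted_return` and the sample rewards are assumptions made for illustration.

```python
def discounted_return(rewards, c):
    """Compute R(s0) + c*R(s1) + c^2*R(s2) + ... for a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (c ** t) * r
    return total

# Hypothetical reward sequence: a constant reward of 1 per step.
short = discounted_return([1.0, 1.0, 1.0], c=0.5)
long_run = discounted_return([1.0] * 200, c=0.5)
undiscounted = discounted_return([1.0] * 200, c=1.0)

print(short, long_run, undiscounted)
```

With a constant reward r and c < 1 the sum stays below r / (1 - c) however long the sequence is, whereas with c = 1 it reduces to the simple sum and grows with the horizon.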

Originate from David W. Kirsch’s slides modified by Zhanxiang

- Define $U^{\pi}(s)$, which is specific to each policy $\pi$:
$U^{\pi}(s) = E\left[\sum_t R(s_t) \mid \pi, s_0 = s\right]$

- Define $U(s) = \max_{\pi} U^{\pi}(s) = U^{\pi^*}(s)$
- We can calculate $U(s)$ on the basis of $R(s)$:
$U(s) = R(s) + \max_{\pi} \sum_{s'} P(s' \mid s, \pi(s))\, U(s')$

This is the Bellman equation.

If we solve the Bellman equation for each state, we obtain the optimal policy π* for the given MDP from U(s).

Originate from David W. Kirsch’s slides and modified by Zhanxiang

- We have to solve |S| simultaneous Bellman equations
- Can’t solve directly, so use an iterative approach:
1. Begin with an arbitrary utility function U0

2. For each s, calculate U(s) from R(s) and U0

3. Use these new utility values to update U0

4. Repeat steps 2-3 until U0 converges

This equilibrium is a unique solution! (see R&N for proof)
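The iterative procedure above can be sketched in a few lines. This is not the presenter's implementation: the two-state MDP is hypothetical, and the discount factor `c` (taken strictly below 1 so that the iteration converges) is an assumption added to the Bellman update shown earlier.

```python
def value_iteration(states, actions, P, R, c=0.9, tol=1e-9, max_iter=10_000):
    """Iterate U(s) <- R(s) + c * max_a sum_s' P(s'|s,a) U(s') until U stops changing."""
    U = {s: 0.0 for s in states}                      # step 1: arbitrary start
    for _ in range(max_iter):
        U_new = {}
        for s in states:                              # step 2: recompute each U(s)
            best = max(
                sum(p * U[s2] for s2, p in P[(s, a)].items()) for a in actions
            )
            U_new[s] = R[s] + c * best
        delta = max(abs(U_new[s] - U[s]) for s in states)
        U = U_new                                     # step 3: update
        if delta < tol:                               # step 4: stop on convergence
            break
    return U

def greedy_policy(states, actions, P, U):
    """Read the policy off the converged utilities."""
    return {
        s: max(actions, key=lambda a: sum(p * U[s2] for s2, p in P[(s, a)].items()))
        for s in states
    }

# Hypothetical two-state MDP with deterministic actions.
states = ["low", "high"]
actions = ["stay", "move"]
R = {"low": 0.0, "high": 1.0}
P = {
    ("low", "stay"): {"low": 1.0},
    ("low", "move"): {"high": 1.0},
    ("high", "stay"): {"high": 1.0},
    ("high", "move"): {"low": 1.0},
}

U = value_iteration(states, actions, P, R)
policy = greedy_policy(states, actions, P, U)
print(U, policy)
```

For this toy model the fixed point can be verified by hand: U(high) = 1 + 0.9 U(high) gives 10, and U(low) = 0.9 U(high) gives 9.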

Originate from David W. Kirsch’s slides

- The author's idea of using an MDP is great, but I am not comfortable with the state space definition and the policy definition.
- If I were the author, I would define the system state space and policy as follows:
- $S' = S \times E$
where $S = \{(n_1, n_2, \ldots, n_K) \mid \sum_k t_k n_k \le T\}$ and

$E = \{\text{class } c_k \text{ call arrivals}\} \cup \{\text{class } c_k \text{ call departures}\} \cup \{\text{dummy events}\}$

- Policy $\pi: S \to A$

OADM: Optical Add/Drop Multiplexer

WC: wavelength converter

TSI: time-slot interchanger

L: # of links a WDM grooming network contains

M: # of origin-destination pairs the network includes

W: # of wavelengths in a fiber in each link

T: # of time slots each wavelength includes

K: # of classes of traffic streams

ck: traffic stream classes differ by their b/w requirements

tk: # of time slots required by class ck traffic to be established

nk: # of class ck calls currently in the system

- For each o-d pair, class ck arrivals are distributed according to a Poisson process with rate λk.
- The call holding time of class ck is exponentially distributed with mean 1/μk . Unless otherwise stated, we assume 1/μk = 1.
- Any arriving call from any class is blocked when no wavelength has tk available time slots.
- Blocked calls do not interfere with the system.
- The switching nodes are non-blocking

No preemption

- There is no significant difference between the blocking probabilities experienced by different classes of users;

- Complete Sharing (CS): not fair
- No resources are reserved for any class of calls;
- Calls with lower b/w requirements and higher arrival rates may starve calls with higher b/w requirements and lower arrival rates;

- Complete Partitioning: fair, but
- A portion of resources is dedicated to each class of calls;
- May not maximize the overall utilization of available resources.

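- System state space $S$:
$S = \{(n_1, n_2, \ldots, n_K) \mid \sum_k t_k n_k \le T\}$

- Operators:
- $A_k s = (n_1, n_2, \ldots, n_k + 1, \ldots, n_K)$
- $D_k s = (n_1, n_2, \ldots, n_k - 1, \ldots, n_K)$
- $A_k P_a s = (n_1, n_2, \ldots, n_k + a, \ldots, n_K)$

- Sampling rate:
$v = \sum_k \left(\lfloor T/t_k \rfloor \mu_k + \lambda_k\right)$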

- Only a single transition can occur during each time slot.

- A transition can correspond to an event of
- 1) Class ck call arrival
- 2) Class ck call departure
- 3) Fictitious or dummy event
(caused by high sampling rate)
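The state space and sampling rate can be made concrete for a single wavelength. The sketch below assumes the bracket in the sampling-rate formula is a floor, and it assumes the usual uniformization convention for the per-slot event probabilities (arrival λk/v, departure nk·μk/v, dummy event for the remainder); the slides do not state these probabilities, and all numeric parameters are hypothetical.

```python
from itertools import product

def state_space(t, T):
    """All (n_1, ..., n_K) with sum_k t_k * n_k <= T."""
    ranges = [range(T // tk + 1) for tk in t]
    return [n for n in product(*ranges)
            if sum(tk * nk for tk, nk in zip(t, n)) <= T]

def sampling_rate(t, T, lam, mu):
    """v = sum_k (floor(T / t_k) * mu_k + lambda_k)."""
    return sum((T // tk) * mk + lk for tk, mk, lk in zip(t, mu, lam))

def event_probabilities(s, lam, mu, v):
    """Assumed per-slot probabilities of arrival, departure and dummy events in state s."""
    arrivals = [lk / v for lk in lam]
    departures = [nk * mk / v for nk, mk in zip(s, mu)]
    dummy = 1.0 - sum(arrivals) - sum(departures)
    return arrivals, departures, dummy

# Hypothetical parameters: two classes needing 1 and 2 slots, T = 4 slots.
t, T = (1, 2), 4
lam, mu = (2.0, 1.0), (1.0, 1.0)

S = state_space(t, T)
v = sampling_rate(t, T, lam, mu)
arrivals, departures, dummy = event_probabilities((1, 1), lam, mu, v)
print(len(S), v, arrivals, departures, dummy)
```

Because v is at least as large as the total event rate in any feasible state, the dummy probability is never negative, which is why at most one real transition needs to be considered per slot.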

Reward function R:

Value function

Optimal value function:

Optimal Policy:

Value iteration to compute Vn(s)

- Action decision, based on the value function $V_n$:
If $V_n(A_k P_1 s) \ge V_n(A_k P_0 s)$

then a = 1 (accept the class $c_k$ call);

else a = 0 (reject it).
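The decision rule can be written as a direct comparison of two value-function entries. This is a sketch only: the value table `V` below is made up to illustrate the rule and is not computed from the paper's reward function, and treating an infeasible post-arrival state as a rejection is an assumption.

```python
def arrival_state(s, k, a):
    """A_k P_a s: the state after a class-k arrival when action a (0 or 1) is taken."""
    return tuple(nk + a if i == k else nk for i, nk in enumerate(s))

def admission_action(V, s, k):
    """Return 1 (accept) if V(A_k P_1 s) >= V(A_k P_0 s), else 0 (reject)."""
    accepted = arrival_state(s, k, 1)
    rejected = arrival_state(s, k, 0)
    if accepted not in V:
        # Assumption: a state outside the feasible space means the call is blocked.
        return 0
    return 1 if V[accepted] >= V[rejected] else 0

# Hypothetical value table for t = (1, 2), T = 2 (feasible states only).
V = {(0, 0): 2.0, (1, 0): 1.5, (2, 0): 1.0, (0, 1): 3.0}

print(admission_action(V, (0, 0), 0))  # class 0 arrival in the empty system
print(admission_action(V, (0, 0), 1))  # class 1 arrival in the empty system
print(admission_action(V, (1, 0), 1))  # class 1 arrival with one class 0 call present
```

In this made-up table a narrow call is rejected in the empty system because admitting it lowers the value, which is the kind of behaviour a fairness-oriented policy uses to keep room for wider calls.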


- Step 1: For each hop i, partition the set of available wavelengths into subsets, dedicated to each of o-d pairs using hop i.
- Step 2: Assume class ck arrivals are uniformly distributed among the Wm wavelengths; thus, the arrival rate of class ck for each of the Wm wavelengths is λk/Wm.

- Step 3: Compute the CAC policy with respect to λk/Wm.
- Step 4: Using the CAC policy computed in Step 3, we determine the optimal action for each of the Wm wavelengths, individually.
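Steps 2 and 4 can be sketched as follows. The per-wavelength rate is taken directly from Step 2; scanning the wavelengths in order and using the first one whose policy accepts the call is an assumption, since the slides only say that the action is determined for each wavelength individually. The capacity-only `toy_policy` is a hypothetical stand-in for the CAC policy computed in Step 3.

```python
def per_wavelength_rates(lam, W_m):
    """Step 2: split each class arrival rate evenly over the W_m wavelengths."""
    if W_m <= 0:
        raise ValueError("W_m must be positive")
    return [lk / W_m for lk in lam]

def admit_on_some_wavelength(policy, wavelength_states, k):
    """Step 4 (assumed first-fit scan): index of the first accepting wavelength, or None."""
    for w, s in enumerate(wavelength_states):
        if policy(s, k) == 1:
            return w
    return None

# Hypothetical setup: two classes needing 1 and 2 slots, T = 2 slots per wavelength.
t, T = (1, 2), 2

def toy_policy(s, k):
    """Stand-in policy: accept whenever the call still fits on the wavelength."""
    used = sum(tk * nk for tk, nk in zip(t, s))
    return 1 if used + t[k] <= T else 0

rates = per_wavelength_rates((4.0, 2.0), W_m=2)
chosen = admit_on_some_wavelength(toy_policy, [(2, 0), (0, 0)], k=1)
blocked = admit_on_some_wavelength(toy_policy, [(2, 0), (1, 0)], k=1)
print(rates, chosen, blocked)
```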

- We can use an MDP to model the bandwidth allocation problem in call admission control and achieve fairness;
- However, in a heterogeneous network the bandwidth granularity problem remains;

- Under some conditions the optimal policy of an MDP exists.

- Other MDP representations

Markov Assumption:

The next state’s conditional probability depends only on a finite history of previous states (R&N)

kth order Markov Process

- Andrei Markov (1913)

- Markov Assumption:
The next state’s conditional probability depends only on its immediately previous state (J&B)

1st order Markov Process

The definitions are equivalent!!!

Any algorithm that makes the 1st order Markov Assumption can be applied to any Markov Process
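The equivalence claim rests on a standard construction: a k-th order process becomes first-order once the state is redefined as the tuple of the last k states. The sketch below shows this for k = 2; the transition table is hypothetical and the function name `to_first_order` is chosen for this example.

```python
def to_first_order(P2):
    """Convert a 2nd-order chain, (prev, cur) -> {next: prob}, into a
    1st-order chain whose states are the pairs (prev, cur)."""
    P1 = {}
    for (prev, cur), dist in P2.items():
        # From pair (prev, cur), moving to `nxt` lands in pair (cur, nxt).
        P1[(prev, cur)] = {(cur, nxt): p for nxt, p in dist.items()}
    return P1

# Hypothetical 2nd-order chain over states "a" and "b".
P2 = {
    ("a", "a"): {"a": 0.9, "b": 0.1},
    ("a", "b"): {"a": 0.5, "b": 0.5},
    ("b", "a"): {"a": 0.2, "b": 0.8},
    ("b", "b"): {"a": 0.7, "b": 0.3},
}

P1 = to_first_order(P2)
print(P1[("a", "b")])
```

The price of the conversion is a larger state space: with |S| original states the pair chain has up to |S|^2 states.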

Originate from David W. Kirsch’s slides

- A Markov Decision Process (MDP) model contains:
- A set of possible world states S
- A set of possible actions A
- A real valued reward function R(s,a)
- A description T(s,a) of each action’s effects in each state.

- A Markov Decision Process (MDP) is just like a Markov Chain, except the transition matrix depends on the action taken by the decision maker (agent) at each time step.
$P_{s,a,s'} = P[S(t+1) = s' \mid S(t) = s, A(t) = a]$

- The agent receives a reward R(s,a), which depends on the action and the state.
- The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some function of the sequence of rewards (e.g., the mean or expected discounted sum).


- A policy π is a mapping from S to A:
$\pi: S \to A$

- Assumes full observability: the new state resulting from executing an action will be known to the system

- How good is a policy π in terms of the sequence of actions it produces?
- For deterministic actions, just total the rewards obtained... but the result may be infinite.
- For stochastic actions, use the expected total reward instead… again, this typically yields an infinite value.

- How do we compare policies of infinite value?

- A value function, $V^{\pi}: S \to \mathbb{R}$, represents the expected objective value obtained by following policy π from each state in S.
- Bellman equations relate the value function to itself via the problem dynamics.

Can’t solve directly, so use an iterative approach:

1. Begin with an arbitrary utility vector V;

2. For each s, calculate V*(s) from R(s,π) and V;

3. Use these new utility values V*(s) to update V;

4. Repeat steps 2-3 until V converges;

This equilibrium is a unique solution!