Litmus: Robust Assessment of Changes in Cellular Networks



Litmus: Robust Assessment of Changes in Cellular Networks. Ajay Mahimkar, Zihui Ge, Jennifer Yates, Chris Hristov*, Vincent Cordaro*, Shane Smith*, Jing Xu*, Mark Stockert*. AT&T Labs – Research, *AT&T Mobility Services. ACM CoNEXT 2013, Santa Barbara, CA.


Litmus: Robust Assessment of Changes in Cellular Networks

  • Ajay Mahimkar, Zihui Ge, Jennifer Yates, Chris Hristov*,

  • Vincent Cordaro*, Shane Smith*, Jing Xu*, Mark Stockert*

  • AT&T Labs – Research, *AT&T Mobility Services

  • ACM CoNEXT 2013, Santa Barbara, CA


Cellular network changes

[Figure: cellular network architecture with a packet switched core and a circuit switched core]

  • Network changes and assessment

    • Software upgrades, configuration changes, …

    • How do these changes impact user perception of service quality?

      • Voice and data connection attempts (Accessibility)

      • Successful termination of ongoing calls (Retainability)

      • Data throughput, Voice Erlangs, …

    • Extensive testing in labs before deployment in the field

    • However, no lab can fully replicate the scale, complexity and diversity of large-scale operational networks

  • First Field Application (FFA)

    • Before rolling out the change network-wide, conduct small-scale testing in the operational network

[Figure labels: Data, Voice, Radio Network Controller, Cell Tower, FFA]


Impact Assessment of FFA

[Figure: an FFA change leads to one of three performance impacts: Improvement, No change, or Degradation]

Analyze the performance impact of FFA

FFA pre/post impact analysis of service performance

  • Compare service performance after the FFA with performance before it

  • If FFA is successfully trialed and shows expected performance impacts, then it can be rolled out network-wide

  • Go/no-go decision is crucial

  • Challenges: external factors can make assessment difficult

Go/no-go decision for a wide-scale roll-out


Dependency on external factors

Service performance in cellular networks is influenced by several external factors

  • Weather (heavy rainfall obstructs radio signals)

  • Terrain (Mountains/flat surfaces/tall buildings have different propagation properties)

  • User population densities and mobility patterns

  • Seasonal changes (foliage or leaves budding)

  • Traffic pattern changes (holidays, major events or trade shows)

  • Other network events (outages or maintenance activities in other parts of network)

Example: a configuration change coincides with strong winds that negatively impact service performance

Without knowledge of the impact of the strong winds, the change is rolled back unnecessarily


Dependency on external factors

Yearly seasonality in Voice Retainability for UMTS cell towers due to foliage

Assessment of changes would be difficult because of seasonal changes

Degradations in Voice Accessibility across multiple RNCs due to severe storms and damaging hail during a tornado

Assessment of changes at RNCs would be difficult because of weather impact


Dependency on external factors

Dramatic traffic pattern change during holidays induces significant changes in Voice Retainability

Assessment of changes would be difficult because of traffic pattern changes

Pre/post impact analysis of FFA changes needs to account for the overshadowing effects of external factors

Improvements in Voice Retainability across a majority of cell towers due to software upgrade at an upstream RNC

Assessment of changes at cell towers would be difficult because of upstream RNC changes


Litmus idea – study/control comparison

  • Compare performance between study and control group

    • Study group – network elements where change is implemented

    • Control group – network elements without the change

  • Intuition

    • Performance at geographically nearby elements is correlated

    • An external factor influences performance at both the study and the control group

    • A performance-impacting change at the study group will change the dependency between study and control

  • Challenges

    • Unrelated performance changes in a small number of control group members

    • Poor selection of control group

  • Litmus Solution

    • Robust spatial regression algorithm

    • Domain knowledge guided control group selection

[Figure: network diagram with circuit switched core (Voice), packet switched core (Data), Radio Network Controller (RNC), Cell Tower and User Equipments; elements are marked as Study (where the FFA is applied) or Control]


Litmus comparison to related work

  • Study-group only analysis

    • Mercury [SIGCOMM’10], PRISM [CoNEXT’11], Spectroscope [NSDI’11], …

    • Does not account for impact of unrelated external factors

  • A/B testing – also known as split testing, control/treatment

    • Popular in web domains for data driven decision making [KDD’07,’12]

    • Web users randomly exposed to the two variants of experiment

    • Why doesn’t it apply in our context?

      • Tight coupling between experiment and assessment

      • Control group might be subject to other network events such as changes or unplanned outages

  • Difference in Differences (DiD)

    • Compare mean/median difference between study and control before and after the change

    • Why doesn’t it apply in our context?

      • Contamination of forecast due to poor selection of control group

      • Sensitivity to performance changes in a small number of control group members
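
The Difference in Differences comparison described above can be sketched as follows. This is a minimal illustration rather than code from the talk: the function name `difference_in_differences`, the pooled-sample inputs and all numbers are hypothetical, and the median is used as the summary statistic (the slide allows mean or median).

```python
from statistics import median

def difference_in_differences(study_before, study_after,
                              control_before, control_after, center=median):
    """Shift observed at the study group minus shift observed at the control group."""
    study_shift = center(study_after) - center(study_before)
    control_shift = center(control_after) - center(control_before)
    return study_shift - control_shift

# Hypothetical retainability samples (%): a holiday lifts both groups by 2 points.
study_before = [96.0, 96.2, 95.8]
study_after = [98.0, 98.2, 97.8]
control_before = [95.0, 95.1, 94.9]
control_after = [97.0, 97.1, 96.9]

holiday_effect = difference_in_differences(
    study_before, study_after, control_before, control_after)
print(holiday_effect)  # close to 0: the change itself shows no effect
```

Because the control samples are pooled into one summary statistic, a poorly chosen control group or a few misbehaving control members shift `control_shift` directly, which is the weakness listed above.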

[Figure labels: Control (A), Study (B), Feedback; Serving users at lake; Serving users at business location]


Robust spatial regression

[Figure: robust spatial regression workflow. Two panels, Before change and After change, each showing Study, Sampled Control, f (regression coefficients), Forecast study and Delta. The two Deltas feed robust rank-order tests.]

Repeat the procedure for multiple iterations

Multiple iterations of forecast difference comparison increase the robustness to a few bad members in the control group

Output: Degradation/Improvement/No change
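
The workflow on this slide can be sketched in a few lines of Python. This is a simplified illustration under several assumptions, not the Litmus implementation: the regression uses a single predictor (the per-time-step mean of the sampled control members) instead of one coefficient per member; a Mann-Whitney z-score stands in for the robust rank-order tests; the names `assess`, `z_crit` and `min_effect`, the majority vote across iterations, and the synthetic data are all hypothetical; and a higher metric value is assumed to be better.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (one predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b

def rank_sum_z(after, before):
    """Mann-Whitney U z-score (normal approximation, no tie correction)."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in after for b in before)
    n, m = len(after), len(before)
    return (u - n * m / 2) / sqrt(n * m * (n + m + 1) / 12)

def assess(study, controls, change_idx, iterations=101, sample_size=3,
           z_crit=1.96, min_effect=0.5, seed=0):
    """Vote over many sampled control groups; return the majority outcome."""
    rng = random.Random(seed)
    votes = []
    for _ in range(iterations):
        sampled = rng.sample(controls, sample_size)
        x = [mean(step) for step in zip(*sampled)]       # sampled control signal
        a, b = fit_line(x[:change_idx], study[:change_idx])  # learn before change
        delta = [s - (a + b * xi) for s, xi in zip(study, x)]  # observed - forecast
        before, after = delta[:change_idx], delta[change_idx:]
        effect = median(after) - median(before)
        z = rank_sum_z(after, before)
        if abs(z) > z_crit and abs(effect) > min_effect:
            votes.append(1 if effect > 0 else -1)
        else:
            votes.append(0)
    winner = Counter(votes).most_common(1)[0][0]
    return {1: "Improvement", 0: "No change", -1: "Degradation"}[winner]

# Synthetic data: a shared pattern plus a "storm" dip that hits every tower.
T, CHANGE = 40, 20
shared = [0.2 * (t % 5) - (1.0 if t >= 30 else 0.0) for t in range(T)]
controls = [[94.0 + 0.1 * i + shared[t] + 0.01 * ((t + i) % 3) for t in range(T)]
            for i in range(12)]
# One bad control member with an unrelated jump after the change.
controls[0] = [v + (2.0 if t >= CHANGE else 0.0) for t, v in enumerate(controls[0])]
unchanged = [95.0 + shared[t] for t in range(T)]
degraded = [v - (3.0 if t >= CHANGE else 0.0) for t, v in enumerate(unchanged)]

print(assess(unchanged, controls, CHANGE))  # No change
print(assess(degraded, controls, CHANGE))   # Degradation
```

In this toy data the storm dip appears in both the study and the control series, so it cancels in the forecast difference, while the single bad control member only influences the minority of iterations that happen to sample it.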


Domain knowledge guided control group selection

  • Guidelines for control group selection

    • Subject to same external factors as the study group

    • Share similar properties with study group such as geographical proximity or configuration

  • Control group size

    • Not too large: difficult to capture similar impact due to external factors

    • Not too small: lose the benefits of robustness in spatial regression analysis

  • Attributes for selection

    • Geographical distance using latitude/longitude and zip-code

    • Topological structure of the cellular network

    • Configuration settings such as software version, or equipment model

  • Predicates to select control group

    • Uni-variate – single attribute (e.g., LTE cell towers within the same zip-code)

    • Multi-variate – combination of attributes (e.g., UMTS cell towers with same RNC and same OS)
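
Predicate-based selection along these lines could look like the sketch below. The `Tower` record, its field names, the helper `select_control` and the inventory entries are hypothetical illustrations; the talk does not show how Litmus represents network elements or predicates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tower:
    name: str
    zip_code: str
    rnc: str
    os_version: str

def select_control(candidates, study, predicate):
    """Keep non-study candidates that match at least one study element."""
    study_names = {s.name for s in study}
    return [c for c in candidates
            if c.name not in study_names and any(predicate(s, c) for s in study)]

def same_zip(s, c):
    """Uni-variate predicate: a single attribute."""
    return s.zip_code == c.zip_code

def same_rnc_and_os(s, c):
    """Multi-variate predicate: a combination of attributes."""
    return s.rnc == c.rnc and s.os_version == c.os_version

# Hypothetical inventory of UMTS cell towers.
towers = [
    Tower("T1", "93101", "RNC-A", "5.1"),
    Tower("T2", "93101", "RNC-A", "5.1"),
    Tower("T3", "93101", "RNC-B", "5.1"),
    Tower("T4", "93110", "RNC-A", "5.0"),
]
study = [towers[0]]

by_zip = select_control(towers, study, same_zip)
by_rnc_os = select_control(towers, study, same_rnc_and_os)
print([t.name for t in by_zip])     # ['T2', 'T3']
print([t.name for t in by_rnc_os])  # ['T2']
```

Stricter predicates yield a smaller control group, which ties back to the size trade-off above.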


Litmus evaluation

  • Evaluation conducted using data collected from operational cellular networks

    • Lack of complete ground truth makes evaluation extremely challenging

    • Two-step methodology

      • A-priori known changes with assessments by Engineering & Ops, conducted manually beforehand through visual inspection & analysis

      • Synthetic injection of changes into performance time-series at cell towers

    • Compare Litmus with Difference in Differences (DiD) and study-group only analysis

  • Accuracy computation

  • Result summary

    • Litmus outperformed study-group only analysis because of robustness to external factors

    • Litmus outperformed DiD because of robustness to a small number of bad members in control group

[Slide annotations: Real examples; Thorough and exhaustive evaluation; Algorithm outcome]


Evaluation results

  • Evaluation using known assessments

  • Evaluation using synthetic injection

Litmus outperforms DiD due to zero false negatives

  • Precision = TP / (TP + FP)

  • Recall = TP / (TP + FN)

  • True Negative Rate = TN / (TN + FP)

  • Accuracy = (TP + TN) / (TP + TN + FP + FN)
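
The four measures above translate directly into code. The helper name `assessment_metrics` and the counts in the example are hypothetical and are not results from the evaluation.

```python
def assessment_metrics(tp, fp, tn, fn):
    """Compute the four measures from confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "true_negative_rate": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }

# Hypothetical counts for illustration only.
m = assessment_metrics(tp=8, fp=2, tn=9, fn=1)
print(m["precision"], m["accuracy"])
```

With zero false negatives, recall equals 1 regardless of the other counts.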

Study group only analysis has poor accuracy due to high FP and FN

Compared to study group only analysis and DiD, Litmus is robust to external factors and accurately conducts the impact assessment


Litmus operational experiences

  • Litmus is being heavily used for FFA impact assessment in production cellular networks

    • Pre/post impact analysis across a wide variety of performance metrics

    • Outcome is used for a go or no-go decision for wide-scale deployment of FFA change

Impact of SON during Hurricane Sandy

SON (Self Optimizing Network) features were being trialed on some cell towers

  • SON Capabilities: automated load balancing, neighbor discovery and self-configuration

  • Key question: How did SON perform during Hurricane Sandy?

    • This question cannot be answered without comparison to control group

    • Control group has to be within the Sandy-impacted region

  • Both study and control group were impacted due to Sandy; however study group did better than control

    • The recovery on study group was also faster than on control group

    • SON did a good job! SON features were rolled out network-wide


FFA to improve cell change success rate

FFA change applied at a few RNCs

  • Expectation: Improvement in data retainability

  • Study-group *only* analysis would have inferred an improvement and led to a recommendation for nation-wide roll-out

  • After comparing to control, Litmus identified that improvement was really due to holidays

    • Traffic pattern changes induced improvements in data retainability across both study & control

    • FFA change thus was not inducing performance improvements

    • Decision was made not to roll out, based on Litmus results


Conclusions and Future Work

  • Litmus – an automated tool for robust assessment of changes in cellular networks

    • Carefully accounts for external factors such as foliage, weather, holidays, or network events

    • New spatial regression algorithm for robust performance comparison of study versus control

    • Domain knowledge guided control group selection

    • Outperforms study-group only analysis and Difference in Differences (DiD)

  • Operational Experiences

    • Litmus is being used successfully in go/no-go decisions for wide-scale deployment of changes

    • Considerably improved the assessment accuracy and analysis time

  • Future Work

    • Continue to improve methodology for control group selection

    • Apply to other networks and services such as clouds, data centers

    • Extend Litmus to device specific monitoring – e.g., Apple iPhone, Samsung Galaxy or Nokia Lumia


Thank You! Questions?

