1. Internet Traffic Classification Using Bayesian Analysis Techniques Presentation by Umamaheswararao K
2. Overview Statistical Method
Uses supervised machine learning
Uses only flow records
Based on discriminators of the flows
- port, inter-packet gap, etc.
Applies Naïve Bayesian techniques
Reasonably high accuracy
3. Machine Learned Classification Deterministic Approach
Assigns each data point to exactly one of a set of mutually exclusive classes
Probabilistic Approach
Assigns each flow a probability of belonging to each class
- The technique presented here falls into this category
4. Probabilistic Approach: Can identify similar characteristics of flows after their probabilistic class assignment
Robust to measurement error
Provides a mechanism for quantifying class assignment probabilities
Available in many implementations
5. Terminology Objects:
Entities to be classified; here, traffic flows, each identified by a tuple of src/dst IP, protocol, src/dst port
Discriminators:
Characteristics parameterizing the flow behaviour, e.g. flow duration, TCP port, etc.
- Here only complete TCP connections are considered
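The terminology above can be sketched as a small data structure. This is only an illustration: the class names `FlowKey` and `Flow`, the discriminator names, and the example addresses are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class FlowKey:
    """The tuple that identifies a traffic flow (the object to classify)."""
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g. 6 for TCP
    src_port: int
    dst_port: int


@dataclass
class Flow:
    """A flow plus the discriminators measured for it."""
    key: FlowKey
    # Hypothetical discriminator names mapped to numeric values.
    discriminators: Dict[str, float] = field(default_factory=dict)
    # Class label; known only for flows in the training set.
    label: Optional[str] = None


flow = Flow(
    key=FlowKey("10.0.0.1", "192.0.2.7", 6, 51514, 80),
    discriminators={"duration_s": 1.2, "server_port": 80.0},
)
```

Making the key immutable lets it serve as a dictionary key when per-flow statistics are accumulated.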
6. Discriminators/Categories
7. Analysis Tools Naïve Bayesian Classifier
8. Bayes Tech: Contd.. Assumptions Discriminators are independent
- In practice they may be correlated, e.g. TCP header length is proportional to packet length, or vice versa
Discriminator distribution is assumed to be normal (Gaussian)
- In practice the distribution can be multimodal
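A minimal sketch of a Naïve Bayes classifier under the two assumptions above (independent discriminators, each Gaussian within a class). The function names, the toy flows, the class labels, and the small variance floor of 1e-9 are assumptions made for illustration; they are not taken from the presentation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a prior and per-discriminator (mean, variance) for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for cls, rows in by_class.items():
        n = len(rows)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / n
            # Variance floor (an assumption) avoids division by zero.
            var = sum((v - mean) ** 2 for v in column) / n + 1e-9
            stats.append((mean, var))
        model[cls] = (n / len(samples), stats)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def posterior(model, x):
    """Class membership probabilities for one flow."""
    # Independence assumption: per-discriminator log densities simply add up.
    log_scores = {
        cls: math.log(prior)
        + sum(log_gaussian(v, m, s) for v, (m, s) in zip(x, stats))
        for cls, (prior, stats) in model.items()
    }
    top = max(log_scores.values())
    unnorm = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(unnorm.values())
    return {cls: u / total for cls, u in unnorm.items()}


# Toy training flows: (mean packet size in bytes, duration in seconds).
train_x = [(300, 1.0), (350, 1.5), (320, 0.8),
           (1400, 30.0), (1450, 45.0), (1380, 38.0)]
train_y = ["WWW", "WWW", "WWW", "BULK", "BULK", "BULK"]
model = fit_gaussian_nb(train_x, train_y)
probs = posterior(model, (330, 1.2))
```

The result is a probability per class rather than a single hard label, which is what distinguishes the probabilistic approach from the deterministic one.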
9. Example
10. Example: contd
11. Naïve Bayes: Kernel Estimation For discriminators whose distribution is not Gaussian
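A hedged sketch of why kernel estimation helps when a discriminator is not Gaussian. It compares a single fitted Gaussian with a Gaussian-kernel density estimate on a made-up bimodal sample. The bandwidth of 0.5, the sample values, and the function names are illustrative assumptions; the slides do not specify a kernel or bandwidth choice.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_density(x, training_values, bandwidth):
    """Kernel density estimate: the average of kernels centred on each training value."""
    n = len(training_values)
    return sum(gaussian_kernel((x - xi) / bandwidth)
               for xi in training_values) / (n * bandwidth)


def single_gaussian_density(x, training_values):
    """Density at x of one Gaussian fitted to all training values."""
    n = len(training_values)
    mean = sum(training_values) / n
    var = sum((v - mean) ** 2 for v in training_values) / n
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


# Made-up bimodal discriminator values for one class (two clusters).
values = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]

at_mode_kde = kernel_density(1.0, values, bandwidth=0.5)
at_mode_gauss = single_gaussian_density(1.0, values)
in_gap_kde = kernel_density(5.5, values, bandwidth=0.5)
in_gap_gauss = single_gaussian_density(5.5, values)
```

On this sample the single Gaussian puts its peak in the empty gap between the two clusters, while the kernel estimate follows both modes. In a Naïve Bayes classifier the kernel estimate would replace the per-class Gaussian density for each discriminator.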
12. Naïve Bayes vs Kernel
13. Discriminator selection Remove irrelevant discriminators
Cannot differentiate the class
Same distribution for all classes
Remove redundant discriminators
Highly correlated with another discriminator
14. Discriminator reduction: Filter
- Uses characteristics of the training data to judge how relevant each discriminator is to the class
- Degree of correlation between discriminator and class
Wrapper
- Uses the results of a classifier to build an optimal set
15. FCBF Fast Correlation-Based Filter for discriminator filtering
Two stage process
Identifying the relevance of a discriminator
Identifying the redundancy of a discriminator with respect to other discriminators
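The two stages can be sketched as follows, using symmetrical uncertainty as the correlation measure, as FCBF does. This is a simplified illustration rather than the presenter's implementation: it assumes discriminators have already been discretized, and the function names, the threshold of 0.05, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), in the range [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, labels, threshold):
    """Return the names of the discriminators kept after both stages."""
    # Stage 1: relevance of each discriminator to the class.
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    # Stage 2: drop a discriminator if an already-kept one is at least
    # as correlated with it as it is with the class.
    selected = []
    for name in ranked:
        if all(symmetrical_uncertainty(features[kept], features[name])
               < relevance[name] for kept in selected):
            selected.append(name)
    return selected


labels = ["WWW", "WWW", "BULK", "BULK", "MAIL", "MAIL"]
features = {
    "port": ["80", "80", "20", "20", "25", "25"],       # relevant
    "port_copy": ["80", "80", "20", "20", "25", "25"],  # redundant
    "noise": ["a", "b", "a", "b", "a", "b"],            # irrelevant
}
kept = fcbf(features, labels, threshold=0.05)
```

In this toy data, `noise` fails the relevance stage and `port_copy` fails the redundancy stage, leaving only `port`.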
16. Results
17. Results: contd.. Accuracy: Correctly classified flows/Total number of flows
Trust: Probability that a flow classified into some class is in fact from that class
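Both metrics can be computed directly from true and predicted labels. A minimal sketch, with hypothetical function names and made-up labels:

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Correctly classified flows divided by the total number of flows."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def trust(true_labels, predicted_labels):
    """Per class: of the flows assigned to the class, the fraction truly from it."""
    assigned = Counter(predicted_labels)
    correct = Counter(p for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / assigned[cls] for cls in assigned}


true_labels = ["WWW", "WWW", "MAIL", "BULK"]
predicted = ["WWW", "MAIL", "MAIL", "BULK"]

acc = accuracy(true_labels, predicted)      # 3 of 4 flows correct
per_class = trust(true_labels, predicted)   # MAIL: 1 of 2 assignments correct
```

Accuracy is a single number over all flows, while trust is reported per class, so a classifier can have high accuracy and still have poor trust for a rarely seen class.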
18. Naïve Bayes: Trust
19. Trust: Kernel est.
20. Results for new data set
21. Identification of discriminators
22. Strengths Payload access not needed
Some mentioned in earlier slides
High accuracy and Trust with FCBF
Easily implementable
Single flow based (a strength and a weakness)
Allows any categorization
23. Weaknesses Bunch of them but then ?
Accuracy/Trust depends mainly on how good the training set is
Trust of some classes is really poor
Works on a single-flow basis, but characterizing some traffic requires seeing many flows (e.g. attacks)
Temporal stability is not really good
Discriminators are dependent on network dynamics
24. Weaknesses: Contd Training is not automatic
Assumes discriminator independence
Gaussian distribution assumption inaccurate
25. Future Work A significantly new approach, hence it can lead to many ideas
Spatial independence of traffic classification
See the weaknesses section for further directions