
### Packet Classification # 3

Ozgur Ozturk

CSE 581: Internet Technology

Winter 2002

Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

Introduction

- Importance
  - Identify the context of packets
    - Apply necessary actions
    - Differentiated services
- Memory and Time Efficiency
  - Must handle thousands of rules
  - Must run at wire speed (no queuing)


Paper List

- T. Lakshman, D. Stiliadis, "High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching" [Bit-Parallelism]
  - http://www.bell-labs.com/user/stiliadi/filter/paper.html
- F. Baboescu, G. Varghese, "Scalable Packet Classification" [ABV: Aggregated Bit Vector]
- M. Buddhikot, S. Suri, M. Waldvogel, "Space Decomposition Techniques for Fast Layer-4 Switching" [Space Decomposition]
- V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, "Fast and Scalable Layer Four Switching" [Paper4]


Bit-Parallelism Paper - Introduction

- Presents packet classification schemes
  - traffic-independent, with worst-case performance as the metric
  - classify against a few thousand rules at rates of millions of packets per second, using range matches on more than 4 packet header fields


Bit-Parallelism Paper - Requirement for Real-Time Operation

- Traditional router architectures
  - Flow-cache architectures to classify packets
  - Packets of identified flows are expected to arrive in the near future
- Current backbone routers
  - The number of active flows is extremely high
    - OC-3 links, 256K flows
  - Caches implemented as hash tables
    - scales well to that size


Bit-Parallelism Paper - Requirement for Real-Time Operation 2: Hash-Table Problems

- A good hash function is non-trivial
  - 100 to 200 bits of header must be randomly distributed over no more than 20 to 24 bits of hash index
  - The header value distribution is unknown
- Performance of cache-based schemes is heavily traffic dependent
  - Malicious users
  - Limitations of hashing algorithms and caching techniques
- Packet queuing delays are acceptable after classification


Bit-Parallelism Paper - Packet Classification Constraints

- Scale to large routers with Gigabit links
- Process at wire speed
  - 75% of packets are smaller than the typical TCP packet size (552 bytes)
  - Nearly half are 40 to 44 bytes (TCP ACKs)
- Rules on several fields, specifying ranges, exact matches and prefixes
  - Two prefix fields in some cases
- Allow arbitrary priorities for policies, to distinguish among multiple matches
- Optimize for lookups, sacrifice update performance
  - The ratio of lookup rate to update rate is on the order of 10^7


Bit-Parallelism Paper - Packet Classification Constraints 2

- Memory access time is the dominant factor in worst-case lookup execution time
- Amenable to hardware implementation
- Time vs. Space


Bit-Parallelism Paper - General Packet Classification

- Decomposable search to perform multi-dimensional search for packet filtering
  - A k-dimensional query becomes a set of 1-dimensional queries on 1-dimensional intervals
  - Exploit parallelism where possible
  - Seek a poly-logarithmic solution
- Packet header fields correspond to the k dimensions
- Filters correspond to overlapping regions in the k-dimensional space


Bit-Parallelism Paper - Efficiency of Proposed Algorithms

- 1st algorithm
  - Memory: O(n^2) bits per dimension (on the order of k*n^2 bits over k dimensions)
  - Time: log(2n)+1
  - Memory accesses: n/w, where w is the memory width in bits
- 2nd algorithm
  - Memory reduced to O(n log n) bits
  - Time increases by only a constant
  - Can be optimized for a given time and memory budget
  - Exploits on-chip memory in a traffic-independent manner to speed up the worst case


Notation

- Rule r_m in k dimensions
  - r_m = (e_1,m, e_2,m, ..., e_k,m)
  - each e_i,m is a range in dimension i


Bit-Parallelism Paper - Algorithm Demo on 2-D: Preprocessing 1


Bit-Parallelism Paper - Algorithm Demo on 2-D: Preprocessing 2

Max 2n+1 intervals for n rules


Bit-Parallelism Paper - Algorithm Demo on 2-D: Preprocessing 3

Sets of rules formed corresponding to each region


Bit-Parallelism Paper - Algorithm Demo on 2-D: Online 1

- P1 (x*, y*) is to be classified
  - Find the intervals that x* and y* belong to
    - binary search: log(2n+1)+1 comparisons per dimension
  - Create the intersection of all sets
    - conjunction (bitwise AND) of the corresponding bit vectors
  - Take the highest-priority entry in the resultant bit vector
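The preprocessing and online steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: it assumes integer field values starting at 0, inclusive ranges, rules listed from highest to lowest priority, and Python integers standing in for the n-bit vectors. The names `preprocess`, `classify`, `rules` and `tables` are made up for the example.

```python
from bisect import bisect_right

def preprocess(rules, dim):
    """Project all rules onto one dimension.

    rules: list of k-tuples of inclusive integer ranges (lo, hi), highest priority first.
    Returns (starts, bitmaps): sorted interval start points and, for each interval,
    an int whose bit m is set when rule m overlaps that interval.
    """
    points = {0}
    for rule in rules:
        lo, hi = rule[dim]
        points.add(lo)
        points.add(hi + 1)
    starts = sorted(points)  # at most 2n+1 intervals for n rules
    bitmaps = []
    for s in starts:
        bits = 0
        for m, rule in enumerate(rules):
            lo, hi = rule[dim]
            if lo <= s <= hi:  # every rule boundary is a start point, so testing s is enough
                bits |= 1 << m
        bitmaps.append(bits)
    return starts, bitmaps

def classify(point, tables):
    """Return the index of the highest-priority matching rule, or None."""
    result = ~0  # all ones
    for x, (starts, bitmaps) in zip(point, tables):
        i = bisect_right(starts, x) - 1  # binary search for the interval containing x
        result &= bitmaps[i]             # intersection = bitwise AND
    if result == 0:
        return None
    return (result & -result).bit_length() - 1  # lowest set bit = highest priority

# Three hypothetical 2-D rules: (x range, y range)
rules = [((2, 5), (1, 3)), ((0, 7), (4, 6)), ((4, 9), (0, 9))]
tables = [preprocess(rules, d) for d in range(2)]
print(classify((4, 2), tables), classify((6, 5), tables), classify((10, 0), tables))
```

The per-dimension work is one binary search plus one read of an n-bit vector, which is where the n/w memory-access figure comes from when the vector is read w bits at a time.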


Bit-Parallelism Paper - Algorithm Demo on 2-D: Online 2

- Max set cardinality = O(n)
  - The intersection step examines every rule at least once, so time complexity = O(n)
- With bit-level parallelism
  - The bitmaps representing the sets are stored in a (2n+1)*n array Bj[i,1..n] (the set Ri,j is stored for each dimension j)
  - k*n/w memory accesses
- Different processing elements for each dimension in a hardware implementation
  - Prototype


Bit-Parallelism Paper - Algorithm 2: Packet Classification Based on Incremental Reads

- The algorithm uses incremental reads to reduce the required memory
- Allows time-space optimization and increases locality for off-chip SDRAM and wide on-chip memory implementations
- Consider a specific dimension j
  - Assume at most 2n+1 non-overlapping intervals
  - Each interval corresponds to an n-bit bitmap, with the positions of the 1s indicating the filter rules that overlap the interval
  - The bitmaps of adjacent intervals differ in only one bit
  - A single bitmap and 2n pointers of size log n to the differing bits can be used to reconstruct any bitmap
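A minimal sketch of the incremental-read idea, assuming (as the slide does) that adjacent bitmaps differ in exactly one bit. The function names and the sample bitmaps are hypothetical; Python integers stand in for the n-bit bitmaps.

```python
def build_incremental(bitmaps):
    """Keep only the first bitmap plus one pointer (bit position) per interval boundary."""
    base = bitmaps[0]
    pointers = []
    for prev, cur in zip(bitmaps, bitmaps[1:]):
        diff = prev ^ cur
        if diff == 0 or diff & (diff - 1):
            raise ValueError("adjacent bitmaps must differ in exactly one bit")
        pointers.append(diff.bit_length() - 1)  # index of the differing bit
    return base, pointers

def reconstruct(base, pointers, i):
    """Rebuild the bitmap of interval i by flipping the bits named by the first i pointers."""
    bitmap = base
    for p in pointers[:i]:
        bitmap ^= 1 << p
    return bitmap

# n = 3 rules, 2n+1 = 7 intervals in one dimension (made-up data)
bitmaps = [0b000, 0b001, 0b011, 0b010, 0b110, 0b100, 0b000]
base, pointers = build_incremental(bitmaps)
print(pointers)
print([bin(reconstruct(base, pointers, i)) for i in range(len(bitmaps))])
```

Storage drops from (2n+1) bitmaps of n bits to one bitmap plus 2n pointers of about log n bits each, at the price of reading up to 2n pointers during a lookup.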


Bit-Parallelism Paper - Algorithm 2: Packet Classification Based on Incremental Reads 2

Further generalization: the trade-off is decided according to the on-chip/off-chip memory ratio.

- Reduces the space requirement from O(n^2) to O(n log n)
- (2n+1)/l bitmaps instead of 1
  - (2n+1)/2l pointers needed
  - Choose l according to need
- With a single stored bitmap, memory is reduced to O(n log n)
  - Memory accesses increase from n/w to 2n log n / w


Bit-Parallelism Paper - Special Case: 2-D Classification

- Necessary for best-effort traffic aggregation in the Internet backbone
  - Determine next hop and resource allocations based on destination and source addresses only
- Longest prefix match lookups
  - Restrict source prefix ranges to powers of 2 in order to reduce space
  - Space requirement is O(n) with a trie implementation
- Virtual intervals
  - Map intervals of prefix lengths to both dimensions, sorted by length
  - "Virtual intervals" allow a worst-case lookup time of O(ls + log n), where ls is the number of possible prefix lengths
- Multicast group identification requires only two additional memory accesses


Bit-Parallelism Paper - Conclusions

- Packet classification, or filtering, is a useful primitive in connectionless networks to provide differentiated service and policy-based routing
- More recently, security and active processing
- Two multi-dimensional range matching algorithms allow millions of packets per second to be processed on a set of thousands of filter rules
- Robust and predictable worst-case performance

- Efficient 2-D algorithm for backbone routers with hundreds of thousands of routing entries
- Algorithms demonstrate that there may be no need to restrict filtering to edge routers


Paper4 - Layer Four Switching

- A traditional router performs lookups based on the destination address
- Layer four switching provides increased flexibility: it gives a router the capability to distinguish between kinds of traffic and handle them differently:
  - Block traffic from a dangerous site
  - Provide QoS for certain traffic
  - Give preferential treatment to certain traffic (say, database flows)
- Difficulties: it needs layer four header information, which may not always be available
  - Any modification of the layer four header may cause problems
  - It is unclear how to get header information when packets are encrypted
- Some variants of layer four switching (L4S):
  - Firewalls
  - Reservation protocols such as RSVP
  - Routing based on traffic type, say web traffic

Paper4 - The Best Matching Filter Problem

- A packet P has k distinct header fields for lookup: H[1], … , H[k]
- The filter database of a Layer 4 router consists of a finite set of filters F1, F2, …, FN; each filter Fi has an associated directive act_i
- Match: each field of P matches the corresponding field of F
- Cost: used to determine an unambiguous match (say, the order of the filters)
- An address range can always be converted into a sequence of prefixes, so prefix matching can be used
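The last point can be made concrete with a short sketch of the usual range-to-prefix split. The function name `range_to_prefixes` and the string representation of prefixes (bits followed by `*`) are choices made for this example only.

```python
def range_to_prefixes(lo, hi, width):
    """Split the inclusive integer range [lo, hi] of width-bit values into prefixes.

    Each prefix is returned as a bit string followed by '*'; a full-length
    (exact) value is returned without the '*'.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        length = width - (size.bit_length() - 1)
        bits = format(lo, f"0{width}b")[:length]
        prefixes.append(bits + "*" if length < width else bits)
        lo += size
    return prefixes

print(range_to_prefixes(1, 6, 3))
print(range_to_prefixes(1024, 65535, 16))
```

For a W-bit field the split never needs more than 2W-2 prefixes, so a range rule can grow into several prefix rules; this is one source of filter-database expansion when ranges are expressed as prefixes.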

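A filter database

| Dest | Src | DP | SP | Flags |
|------|-----|----|----|-------|
| M | * | 25 | * | * |
| M | * | 53 | * | UDP |
| M | S | 53 | * | * |
| M | * | 23 | * | * |
| T1 | T0 | 123 | 123 | UDP |
| * | Net | * | * | * |
| Net | * | * | * | TCP-ACK |
| * | * | * | * | * |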

A packet example:

(M, S, UDP, 53, 125)
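As a baseline, the best matching filter can be found by a linear scan of the database in cost order. The sketch below makes several assumptions that are not stated on the slide: the rows are named F1 to F8 in table order, cost is the row position, the packet example is read as (Dest, Src, Flags, DP, SP), and symbolic names such as M or Net are compared by simple equality rather than by real prefix matching.

```python
WILDCARD = "*"

# (name, dest, src, dest_port, src_port, flags), lowest cost first
FILTERS = [
    ("F1", "M", "*", 25, "*", "*"),
    ("F2", "M", "*", 53, "*", "UDP"),
    ("F3", "M", "S", 53, "*", "*"),
    ("F4", "M", "*", 23, "*", "*"),
    ("F5", "T1", "T0", 123, 123, "UDP"),
    ("F6", "*", "Net", "*", "*", "*"),
    ("F7", "Net", "*", "*", "*", "TCP-ACK"),
    ("F8", "*", "*", "*", "*", "*"),
]

def best_match(packet):
    """Return the first (least-cost) filter whose every field matches the packet."""
    for name, *fields in FILTERS:
        if all(f == WILDCARD or f == p for f, p in zip(fields, packet)):
            return name
    return None

# The packet example, reordered to (dest, src, dest_port, src_port, flags)
dest, src, flags, dport, sport = ("M", "S", "UDP", 53, 125)
print(best_match((dest, src, dport, sport, flags)))
```

Both the second and third rows match this packet; the order of the filters resolves the ambiguity. The scan costs O(N) per packet, which is what the trie-based schemes that follow try to avoid.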

Paper4 - Set Pruning Trees (1)

- Build a trie on the destination prefixes in the database
- Each valid prefix in the destination trie points to a trie containing some source prefixes
- A single filter may fall under multiple destination prefixes, and thus has copies in multiple source tries
- Memory space: O(N^2)
- Time complexity: O(N)
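The copying behind the memory blowup can be seen in a small sketch. It is not the paper's data structure: plain dictionaries keyed by prefix strings stand in for the tries, and the seven two-field filters below are a hypothetical database (prefixes written without the trailing `*`, the empty string being the wildcard), with cost equal to list position.

```python
# (name, dest_prefix, src_prefix); hypothetical example database
FILTERS = [
    ("F1", "0", "10"),
    ("F2", "0", "01"),
    ("F3", "0", "1"),
    ("F4", "00", "1"),
    ("F5", "00", "11"),
    ("F6", "10", "1"),
    ("F7", "", "00"),
]

def build_set_pruning(filters):
    """For each destination prefix D, copy every filter whose dest is a prefix of D."""
    table = {}
    for _, d, _ in filters:
        table[d] = [(cost, name, s)
                    for cost, (name, fd, s) in enumerate(filters)
                    if d.startswith(fd)]
    return table

def lookup(table, dest, src):
    """Longest destination match first, then a single search of its pruned set."""
    for length in range(len(dest), -1, -1):
        pruned = table.get(dest[:length])
        if pruned is not None:
            matches = [(cost, name) for cost, name, s in pruned if src.startswith(s)]
            return min(matches)[1] if matches else None
    return None

table = build_set_pruning(FILTERS)
copies = sum(len(entries) for entries in table.values())
print(lookup(table, "001", "001"), copies)
```

Seven filters turn into thirteen stored entries here, because filters with short destination prefixes are copied under every longer destination prefix they cover.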

Set Pruning Trees (2)

[Figure: Dest-Trie whose prefixes point to Src-Tries; filters F1-F7 appear in the source tries, several of them (F1, F2, F3, F7) more than once. E.g.: looking for (001, 001).]

Avoid the Memory Blowup (1)

- Avoid the copying by having each destination prefix D point to a source trie that stores only the filters whose destination field is exactly D
- When searching, we may need to go back to the destination trie multiple times
- Time complexity: O(W^2)
- Space complexity: O(NW)
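A sketch of the backtracking search, under the same simplifications as before: dictionaries keyed by prefix strings stand in for the tries, and the filter list is a hypothetical example with cost equal to list position. Each filter is stored exactly once, under its own destination prefix.

```python
# (name, dest_prefix, src_prefix); hypothetical example database, "" is the wildcard
FILTERS = [
    ("F1", "0", "10"),
    ("F2", "0", "01"),
    ("F3", "0", "1"),
    ("F4", "00", "1"),
    ("F5", "00", "11"),
    ("F6", "10", "1"),
    ("F7", "", "00"),
]

def build_exact(filters):
    """Store each filter only under the destination prefix it names."""
    table = {}
    for cost, (name, d, s) in enumerate(filters):
        table.setdefault(d, []).append((cost, name, s))
    return table

def lookup_backtracking(table, dest, src):
    """Visit every matching destination prefix, longest first, and search its source set."""
    best = None
    for length in range(len(dest), -1, -1):  # up to W+1 destination prefixes
        for cost, name, s in table.get(dest[:length], []):
            if src.startswith(s) and (best is None or cost < best[0]):
                best = (cost, name)
    return best[1] if best else None

exact = build_exact(FILTERS)
stored = sum(len(entries) for entries in exact.values())
print(lookup_backtracking(exact, "001", "001"), stored)
```

With real tries, each of the up to W ancestor destination prefixes triggers a source-trie walk of up to W steps, which is where the O(W^2) lookup bound comes from.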

Avoid the Memory Blowup (2)

[Figure: Dest-Trie with one Src-Trie per destination prefix; each of the filters F1-F7 is stored once. E.g.: looking for (001, 001).]

Memory requirement = O(NW)

Lookup worst case = O(W^2)

Improving Search Time: Basic Grid-of-Tries (1)

- Basic idea:
  - Use pre-computation and switch pointers (in the lower-level tries) to speed up the search in a later source trie based on the search in an earlier source trie (i.e., remember the previous search result)
- Role of the switch pointer
  - Allows us to increase the length of the matching source prefix without having to restart at the root of the next ancestor source trie
  - Stored filter: node (D,S) stores the least-cost filter whose dest field is a prefix of D and whose src field is a prefix of S
- Time complexity: 2W
- Space complexity: O(NW)

Improving Search Time: Basic Grid-of-Tries (2)

[Figure: Dest-Trie and Src-Tries storing filters F1-F7, with two nodes labeled x and y. E.g.: looking for (001, 001).]

Further Improvement & Extension

- Use a faster scheme for destination address matching
  - Time complexity goes from O(W) to O(log W)
- Use multi-bit tries for source address matching
  - Time complexity goes from O(W) to O(W/k)
- Extend grid-of-tries to handle protocol and port fields
  - 3 GOT copies, for TCP, UDP and OTHER respectively
  - 4 hash tables for the 4 port combinations:
    - both unspecified, destination only, source only, both specified

Cross-Producting (1)

- How-to
  - Slice the filter database into columns, with the i-th column storing all distinct prefixes in field i
  - Make a cross-product table of all k columns
  - Pre-compute the least-cost filter that matches each cross-product entry
  - When a packet comes in, do best prefix matching for each field separately
  - With the matching results, look up the corresponding entry in the cross-product table
- Discussion
  - Very fast (for matching)
  - Problem: memory explosion: N^k
  - Solution: On Demand Cross-Producting
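The steps above can be sketched on the five-field example database. Assumptions made for this sketch: the filters are named F1 to F8 in table order with cost equal to position, fields are ordered (Dest, Src, DP, SP, Flags), and per-field "best prefix matching" is reduced to exact equality with a fallback to the default `*`, since the example uses symbolic values rather than real prefixes.

```python
from itertools import product

# (name, dest, src, dest_port, src_port, flags), lowest cost first
FILTERS = [
    ("F1", "M", "*", 25, "*", "*"),
    ("F2", "M", "*", 53, "*", "UDP"),
    ("F3", "M", "S", 53, "*", "*"),
    ("F4", "M", "*", 23, "*", "*"),
    ("F5", "T1", "T0", 123, 123, "UDP"),
    ("F6", "*", "Net", "*", "*", "*"),
    ("F7", "Net", "*", "*", "*", "TCP-ACK"),
    ("F8", "*", "*", "*", "*", "*"),
]

def build_cross_product(filters):
    """Return (columns, table): distinct values per field and the precomputed table."""
    k = len(filters[0]) - 1
    columns = []
    for i in range(k):
        column = []
        for _, *fields in filters:
            if fields[i] not in column:
                column.append(fields[i])
        if "*" not in column:
            column.append("*")  # the default entry
        columns.append(column)
    table = {}
    for combo in product(*columns):
        for name, *fields in filters:  # cost order, so the first hit is the least cost
            if all(f == "*" or f == c for f, c in zip(fields, combo)):
                table[combo] = name
                break
    return columns, table

def classify(columns, table, packet):
    """Match each field on its own, then do one lookup in the cross-product table."""
    key = tuple(v if v in column else "*" for v, column in zip(packet, columns))
    return table.get(key)

columns, table = build_cross_product(FILTERS)
print(len(table), classify(columns, table, ("M", "S", 25, 57, "UDP")))
```

The column sizes are 4, 4, 5, 2 and 3, so the table has 4 x 4 x 5 x 2 x 3 = 480 entries for only eight filters, which illustrates the memory explosion that on-demand cross-producting is meant to contain.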

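Cross-Producting (2)

Filter database:

| Dest | Src | DP | SP | Flags |
|------|-----|----|----|-------|
| M | * | 25 | * | * |
| M | * | 53 | * | UDP |
| M | S | 53 | * | * |
| M | * | 23 | * | * |
| T1 | T0 | 123 | 123 | UDP |
| * | Net | * | * | * |
| Net | * | * | * | TCP-ACK |
| * | * | * | * | * |

Distinct prefixes per field:

| Dest Prefix | Src Prefix | DestPort Prefix | SrcPort Prefix | Flags Prefixes |
|-------------|------------|-----------------|----------------|----------------|
| M | S | 25 | 123 | UDP |
| T1 | T0 | 53 | Default | TCP-ACK |
| Net | Net | 23 | | Default |
| Default | Default | 123 | | |
| | | Default | | |

Cross-product table:

| Num | CrossProduct | Matching Filter |
|-----|--------------|-----------------|
| 1 | M, S, 25, 123, UDP | F1 |
| 2 | M, S, 25, 123, TCP-ACK | F1 |
| 3 | M, S, 25, 123, default | F1 |
| 4 | M, S, 25, default, UDP | F1 |
| 5 | M, S, 25, default, TCP-ACK | F1 |
| 6 | M, S, 25, default, default | F1 |
| … | … … | … |
| 479 | default, default, default, default, TCP-ACK | F8 |
| 480 | default, default, default, default, default | F8 |

E.g. looking for: (M,S,UDP,25,57)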

Conclusions

- The grid-of-tries (GOT) solution gives scalable (linear) storage and fast lookups for destination-source (D-S) filters
  - More general filters incur a high lookup cost
- The cross-producting solution has higher variance (because of its need for caching), but is faster on average (for lookup)
- A hybrid scheme combines flexibility with efficiency


ABV: "Scalable Packet Classification", F. Baboescu, G. Varghese

- Goal
  - Packet classification
    - scalable (in rules, up to 100,000)
    - wire speed
- Past work
  - Linear time search
  - Linear amount of TCAMs
  - Lucent scheme
    - worst case doesn't scale


Solution

- Aggregated Bit Vector
  - improvement on the Lucent bit vector scheme
  - rule aggregation
  - rule rearrangement
- Rule aggregation
  - bit vectors are sparse
    - i.e., few rules match
  - some compression scheme
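A minimal sketch of rule aggregation, assuming a chunk size of 4 bits and plain Python lists of 0/1 as bit vectors (index 0 is the highest-priority rule). The function names, the chunk size and the `reads` counter are illustrative choices, not taken from the paper.

```python
def aggregate(bits, chunk):
    """One aggregate bit per chunk: 1 if any bit in the chunk is set."""
    return [int(any(bits[i:i + chunk])) for i in range(0, len(bits), chunk)]

def first_match(vectors, chunk):
    """AND the aggregates first and fetch only chunks whose aggregate AND is 1.

    vectors: one bit vector per field. Returns (rule index or None, chunk reads).
    """
    aggregates = [aggregate(v, chunk) for v in vectors]
    reads = 0
    for c in range(len(aggregates[0])):
        if all(a[c] for a in aggregates):
            reads += len(vectors)  # fetch this chunk from every field's vector
            start = c * chunk
            for j in range(start, min(start + chunk, len(vectors[0]))):
                if all(v[j] for v in vectors):
                    return j, reads
            # Falling through here is a false match: the aggregate bits were all 1,
            # but no single rule in the chunk matched in every field.
    return None, reads

n, chunk = 16, 4
field1 = [0] * n
field2 = [0] * n
for j in (1, 9, 14):
    field1[j] = 1
for j in (2, 9):
    field2[j] = 1
print(first_match([field1, field2], chunk))
```

In this example the first chunk is a false match (both aggregates are set, but by different rules), the second chunk is skipped without being read, and rule 9 is found in the third. Reducing such false matches is one motivation for rearranging rules so that rules with common values sit together.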


Solution (continued)

- Rule rearrangement
  - overlap is rare
  - place rules with common values together
  - sort out rule ordering later


Comparing ABV with the Lucent BV


Results

- At least an order of magnitude faster than BV
- Scales well in terms of memory accesses


Paper # 3: "Space Decomposition Techniques for Fast Layer-4 Switching", M. Buddhikot, S. Suri, M. Waldvogel

- A new scheme, based on space decomposition, whose search time is comparable to the best existing schemes, but which also offers fast worst-case filter update time
- Three key ideas
  - an innovative data structure based on quadtrees for a hierarchical representation of the recursively decomposed search space
  - fractional cascading and precomputation to improve packet classification time
  - prefix partitioning to improve update time


Space Decomposition Evaluation

- Depending on the actual requirements of the system in which this algorithm is deployed, a single parameter can be used to trade off search time against update time
- Amenable to fast software and hardware implementation
- For N two-dimensional filters specified using prefixes of up to W bits in length, the Area-based Quadtrees (AQT) data structure requires O(N) space, O(αW) search time, and O(α N^(1/α)) update time, where α is a tunable integer parameter
- Both the average and worst-case search times and memory consumption are comparable to or better than other schemes known in the literature
