A scalable multithreaded L7-filter design for multi-core servers

Authors: Danhua Guo, Guangdeng Liao, Laxmi N. Bhuyan, Bin Liu, Jianxun Jason Ding
Conf.: The 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '08)
Presenter: JHAO-YAN JIAN

Presentation Transcript


  1. Title slide (Date: 2010/11/10)

  2. Introduction • Traditional packet classification makes decisions based on packet header information, but many applications, such as P2P and HTTP, hide their application characteristics in the payload. • The original L7-filter is a sequential DPI (Deep Packet Inspection) program that identifies the application protocol of a given connection. • A traditional single-core server cannot deliver DPI at the line rates of high-speed networks such as 10 Gigabit Ethernet. • Although multi-core architectures offer more processing power, utilizing their cores efficiently remains a challenge.

  3. Introduction • In the original L7-filter, network traffic is captured by Netfilter, a set of hooks inside the Linux kernel that allow kernel modules to register callback functions with the network stack. • Inside the kernel network stack, a series of operations builds a connection buffer keyed by the 5-tuple connection information in the packet header (source IP, destination IP, source port, destination port, protocol). • These operations include TCP/IP checksum verification, TCP/IP reassembly, IP defragmentation, etc. • After this preprocessing stage, L7-filter matches all the application-layer data of the arriving packets of the same connection against the protocol database sequentially.
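
As a rough illustration of this flow, the Python sketch below groups payloads into per-connection buffers keyed by the 5-tuple and then tests each buffer against a protocol database one pattern at a time. The `Packet` type, the two-entry `PROTOCOL_DB`, and its regular expressions are hypothetical stand-ins chosen for illustration; real L7-filter pattern files are far more numerous and complex, and checksum verification and reassembly are omitted here.

```python
import re
from collections import defaultdict
from typing import DefaultDict, NamedTuple, Optional


class Packet(NamedTuple):
    # Hypothetical packet record: the 5-tuple plus the application payload.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


# Hypothetical protocol database: (name, compiled pattern) pairs tried in order.
PROTOCOL_DB = [
    ("http", re.compile(rb"^(GET|POST|HEAD) [^ ]+ HTTP/1\.[01]")),
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
]

# One buffer per connection, keyed by the 5-tuple taken from the packet header.
buffers: DefaultDict[tuple, bytes] = defaultdict(bytes)


def five_tuple(pkt: Packet) -> tuple:
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)


def on_packet(pkt: Packet) -> Optional[str]:
    """Append the payload to its connection buffer, then scan the database in order."""
    key = five_tuple(pkt)
    buffers[key] += pkt.payload
    for name, pattern in PROTOCOL_DB:  # sequential scan, one pattern at a time
        if pattern.search(buffers[key]):
            return name
    return None
```

Because matching runs on the accumulated buffer rather than on single packets, an HTTP request split across two packets on a non-standard port (say 8080) is still recognized once the second packet arrives, which a header-only classifier would miss.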

  4. Decoupling Linux L7-filter operations • Previous research from both academia and industry has shown that L7-filter performance is bounded by the cost of pattern matching. • The authors therefore developed a decoupled model that separates packet arrival handling from pattern matching, so that optimization can focus on the application-layer pattern matching operations. • The L7-filter operations are parallelized on top of a user-space version of L7-filter.
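
A minimal sketch of the decoupled model, assuming Python's standard `threading` and `queue` modules: a preprocessing thread (PT) pushes (connection, payload) items into a bounded queue, and a separate matching thread (MT) consumes them and grows per-connection buffers. The function names and the one-line `GET ` matcher are hypothetical placeholders; the sketch only shows the structure of the split. In CPython the GIL prevents pure-Python matching from running truly in parallel, so this illustrates the design, not its performance.

```python
import queue
import threading
from typing import Dict, List, Tuple

_DONE = object()  # sentinel telling the matching thread that input has ended


def preprocessing_thread(packets: List[Tuple[tuple, bytes]], out_q: queue.Queue) -> None:
    """Stand-in for libnids-style preprocessing: forward (connection, payload) items."""
    for conn_key, payload in packets:
        out_q.put((conn_key, payload))
    out_q.put(_DONE)


def matching_thread(in_q: queue.Queue, results: Dict[tuple, str]) -> None:
    """Consume items, grow per-connection buffers, and run a placeholder matcher."""
    buffers: Dict[tuple, bytes] = {}
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        conn_key, payload = item
        buffers[conn_key] = buffers.get(conn_key, b"") + payload
        # Placeholder: a real matcher would scan the whole protocol database.
        results[conn_key] = "http" if buffers[conn_key].startswith(b"GET ") else "unknown"


def run(packets: List[Tuple[tuple, bytes]]) -> Dict[tuple, str]:
    # Bounded queue: the PT blocks instead of growing memory if the MT falls behind.
    q: queue.Queue = queue.Queue(maxsize=1024)
    results: Dict[tuple, str] = {}
    pt = threading.Thread(target=preprocessing_thread, args=(packets, q))
    mt = threading.Thread(target=matching_thread, args=(q, results))
    pt.start()
    mt.start()
    pt.join()
    mt.join()
    return results
```

The queue is the only point of contact between the two threads, which is what lets the matching side be optimized, or later replicated, independently of packet arrival handling.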

  5. Modeling Single-Threaded L7-filter • The authors chose libnids as the user-space preprocessing module. • Libnids reads tcpdump trace files and emulates the behavior of the kernel network stack in user space. • Libnids offers IP defragmentation, TCP stream assembly, and TCP port scan detection. • The original online L7-filter is replaced by a combination of a Preprocessing Thread (PT) and a Matching Thread (MT). • At any point during processing, a connection has exactly one of three statuses: 1) MATCHED, 2) NO_MATCH, or 3) NO_MATCH_YET.
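
The three statuses can be modeled as a small per-connection state machine. The sketch below assumes a give-up threshold, `MAX_PACKETS`, which is a hypothetical value chosen for illustration: a connection becomes MATCHED as soon as any pattern matches its buffer, NO_MATCH once the packet budget is exhausted without a match, and stays NO_MATCH_YET otherwise. MATCHED and NO_MATCH are treated as terminal, so later packets of that connection skip matching entirely.

```python
import re
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    MATCHED = "MATCHED"
    NO_MATCH = "NO_MATCH"
    NO_MATCH_YET = "NO_MATCH_YET"


MAX_PACKETS = 10  # hypothetical give-up threshold, not taken from the paper


class Connection:
    def __init__(self) -> None:
        self.buffer = b""
        self.packets = 0
        self.status = Status.NO_MATCH_YET
        self.protocol: Optional[str] = None

    def feed(self, payload: bytes, patterns: List[Tuple[str, re.Pattern]]) -> Status:
        # Terminal states: no further matching work for this connection.
        if self.status is not Status.NO_MATCH_YET:
            return self.status
        self.buffer += payload
        self.packets += 1
        for name, pattern in patterns:
            if pattern.search(self.buffer):
                self.status = Status.MATCHED
                self.protocol = name
                return self.status
        # Still unmatched: give up once the packet budget is spent.
        if self.packets >= MAX_PACKETS:
            self.status = Status.NO_MATCH
        return self.status
```

Making both outcomes terminal bounds the matching work spent on any single connection, which matters when matching dominates the cost.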

  6. Modeling Single-Threaded L7-filter [Figure: diagram of the single-threaded model, not recoverable from the transcript]

  7. Parallelizing L7-filter at Connection Level • When multiple MTs are created, each MT works at the granularity of a connection buffer. When a new packet is reassembled for a connection, randomly selecting a non-empty thread runqueue for it introduces additional cache overhead, because packets of the same connection end up copied to different cores. • Random selection also wastes thread resources. • The authors believe that dedicating each independent thread to its own core avoids scheduling overhead and reduces the cache misses caused by thread migrations under unbalanced workloads.
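
One simple way to keep every packet of a connection on the same MT, and therefore the same core, is to hash the connection's 5-tuple to an MT index. The sketch below does this with `zlib.crc32` over a direction-independent key, so both directions of a TCP connection land in the same runqueue. This hashing scheme and the helper names are illustrative assumptions, not necessarily the dispatch policy used in the paper; pinning each MT to its core (for example with `sched_setaffinity` on Linux) is left out.

```python
import zlib
from typing import List, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto)
FiveTuple = Tuple[str, int, str, int, str]


def canonical_key(ft: FiveTuple) -> bytes:
    """Order the two endpoints so that A->B and B->A produce the same key."""
    src_ip, src_port, dst_ip, dst_port, proto = ft
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()


def pick_mt(ft: FiveTuple, num_mts: int) -> int:
    """Map a connection to a fixed MT index in [0, num_mts)."""
    return zlib.crc32(canonical_key(ft)) % num_mts


def dispatch(packets: List[Tuple[FiveTuple, bytes]], num_mts: int) -> List[List[bytes]]:
    """Append each payload to the runqueue of the MT that owns its connection."""
    runqueues: List[List[bytes]] = [[] for _ in range(num_mts)]
    for ft, payload in packets:
        runqueues[pick_mt(ft, num_mts)].append(payload)
    return runqueues
```

A deterministic hash (rather than Python's per-process randomized `hash`) keeps the connection-to-MT mapping stable across runs. Static hashing does not by itself fix load imbalance between MTs.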

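  8. Parallelizing L7-filter at Connection Level [Figure: connection-level parallelization diagram, not recoverable from the transcript]

  9. Parallelizing L7-filter at Connection Level [Figure: connection-level parallelization diagram, not recoverable from the transcript]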

  10. Parallelizing L7-filter at Connection Level

  11. Experiment Platform • The server has two CPU sockets, each holding a quad-core Xeon X5355 2.66 GHz processor, and 16 GB of 667 MHz DDR2 SDRAM. Each socket has two 4 MB L2 caches, each shared by two cores. • The OS is Linux kernel 2.6.18.

  12. Throughput and Core Utilization • With 7 concurrent threads, system throughput is 51% higher than with naive OS scheduling. Throughput scales nearly linearly with the number of MTs, reaching a 6.5X speedup with 7 threads.
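
The 6.5X-with-7-threads result can be turned into two derived figures using standard formulas: parallel efficiency (speedup divided by thread count) and the Karp-Flatt metric, which estimates the serial fraction implied by a measured speedup. These are computed here from the slide's numbers for illustration; they are not figures reported by the authors.

```python
def efficiency(speedup: float, threads: int) -> float:
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / threads


def karp_flatt(speedup: float, threads: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / threads) / (1 - 1 / threads)


print(f"efficiency = {efficiency(6.5, 7):.3f}")        # efficiency = 0.929
print(f"serial fraction = {karp_flatt(6.5, 7):.4f}")   # serial fraction = 0.0128
```

Roughly 93% efficiency corresponds to an implied serial fraction of about 1.3% (exactly 1/78 for these inputs), which matches the slide's "nearly linear" description.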

  13. Cache Performance

  14. A Life-of-Packet Analysis
