
Supercomputing Center

Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0. Supercomputing Center of KISTI, Kookhan Kim, August 28, 2003.


Presentation Transcript


  1. Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0 • Supercomputing Center of KISTI • Kookhan Kim • August 28, 2003

  2. Contents • Introduction • FlowScan • FlowScan+ 2.0 • Traffic Measurement & Analysis • Others

  3. Introduction • We operate various types of supercomputers • NEC, IBM, Compaq, PC cluster • Supercomputing traffic • All traffic generated between the supercomputers and their users for computing many kinds of data • Each user has an authenticated and authorized ID • Until now, we had not tried to measure and analyze supercomputing traffic • We want to know the characteristics of supercomputing traffic • Who uses it? • Which applications and protocols are used? • How much traffic is generated? • To meet these demands, we improved FlowScan

  4. What is FlowScan? • FlowScan is a passive measurement tool that draws traffic graphs by analyzing network flows exported by routers and switches • NetFlow is exported by Cisco routers and switches • It was developed by Dave Plonka and is managed by CAIDA (http://www.caida.org) • Main modules (Perl scripts) • cflowd (a flow collection engine) • flowscan (the central process in the system) • Our improvement focuses on this module • RRDtool (a visualization tool) • Definition: Flow • An IP flow is a unidirectional series of IP packets of a given protocol, traveling between a source and a destination within a certain period of time.
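
To make the flow definition above concrete, the following Python sketch groups packets into unidirectional flows keyed by (source IP, destination IP, source port, destination port, protocol). It is not FlowScan or router code; the packet tuple layout and the 15-second inactive timeout are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed packet layout: (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, size_bytes)
Packet = Tuple[float, str, str, int, int, int, int]
FlowKey = Tuple[str, str, int, int, int]

@dataclass
class Flow:
    key: FlowKey
    first: float
    last: float
    packets: int = 0
    octets: int = 0

def build_flows(packets: List[Packet], inactive_timeout: float = 15.0) -> List[Flow]:
    """Group packets into unidirectional flows keyed by the 5-tuple.

    A new flow starts when no packet with the same key has been seen for
    `inactive_timeout` seconds (an assumed value, not one from the slides)."""
    active: Dict[FlowKey, Flow] = {}
    finished: List[Flow] = []
    for ts, src, dst, sport, dport, proto, size in sorted(packets):
        key = (src, dst, sport, dport, proto)  # direction matters: A->B != B->A
        flow = active.get(key)
        if flow is not None and ts - flow.last > inactive_timeout:
            finished.append(flow)  # expire the idle flow
            flow = None
        if flow is None:
            flow = Flow(key=key, first=ts, last=ts)
            active[key] = flow
        flow.last = ts
        flow.packets += 1
        flow.octets += size
    finished.extend(active.values())  # flush flows still active at the end
    return sorted(finished, key=lambda f: (f.first, f.key))
```

A reply stream from the server becomes its own flow, since the key is direction-sensitive.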

  5. Enhanced FlowScan+ • The goal • Make a good passive measurement tool • The motivations • Lack of a traffic measurement tool that supports real-time visualization and detailed traffic analysis on demand • To make a user-friendly tool that everyone can use easily • Why FlowScan? • It is an open source program • It has good graphing functions on the web • But it does not yet support a query interface • Who is involved? • Supercomputing Center of KISTI • System Architecture Lab., Dept. of Computer Science, KAIST

  6. FlowScan+ 2.0 [Figure: system architecture. Components shown: NetFlow v7; Flow-tools; FlowScan original module (RRD, static graph); analysis module, FlowScan+ 1.0 (FlowScan link, query, parsed data, 15-min aggregation, DB); visualization module, FlowScan+ 2.0 (dynamic graph)]

  7. FlowScan+ Main Points • FlowScan+ 1.0 • Uses MySQL • Stores NetFlow information in the DB: raw flows and aggregated data • Query interface • Access to the DB via the web • Easy to use • FlowScan+ 2.0 • Flow-tools • Addresses the NetFlow version problem • User group editing • Small groups and large groups • Divided by IP class • Visualization of DB query results • Java servlet, JFreeChart
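
A minimal sketch of the storage idea, keeping raw flows and aggregated data side by side and querying them. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere; the table names, columns, and the 900-second (15-minute) bucket are hypothetical, not the actual FlowScan+ schema.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: one table for raw flows, one for aggregated data.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS rawflows (
            start_ts INTEGER, src_ip TEXT, dst_ip TEXT,
            src_port INTEGER, dst_port INTEGER, protocol INTEGER,
            packets INTEGER, octets INTEGER);
        CREATE TABLE IF NOT EXISTS aggregated (
            bucket_ts INTEGER, src_ip TEXT, dst_ip TEXT,
            flows INTEGER, packets INTEGER, octets INTEGER,
            PRIMARY KEY (bucket_ts, src_ip, dst_ip));
    """)

def aggregate(conn: sqlite3.Connection, bucket_s: int = 900) -> None:
    # Roll raw flows up into fixed time buckets per host pair.
    # Integer division in SQLite truncates, so start_ts / 900 * 900 is the bucket start.
    conn.execute("""
        INSERT OR REPLACE INTO aggregated
        SELECT bucket, src_ip, dst_ip, COUNT(*), SUM(packets), SUM(octets)
        FROM (SELECT (start_ts / ?) * ? AS bucket, src_ip, dst_ip, packets, octets
              FROM rawflows)
        GROUP BY bucket, src_ip, dst_ip
    """, (bucket_s, bucket_s))
    conn.commit()

def top_sources(conn: sqlite3.Connection, limit: int = 5):
    # Example query-interface request: heaviest sources by total bytes.
    return conn.execute("""
        SELECT src_ip, SUM(octets) AS total FROM aggregated
        GROUP BY src_ip ORDER BY total DESC LIMIT ?""", (limit,)).fetchall()
```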

  8. FlowScan+ 2.0 : NetFlow Versions

  9. FlowScan+ 2.0 : Flow-tools • NetFlow v5 and v7 have different PDU formats and do not carry the same information • cflowd, the main NetFlow collection module in FlowScan, cannot collect NetFlow v7 • We had to change the NetFlow capture module • Flow-tools replaces cflowd as the NetFlow v7 collection module
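
Because v5 and v7 datagrams differ, a collector must check the version field before parsing records. The sketch below only inspects the header: both versions start with a 24-byte header whose first two 16-bit fields are the version and the record count. It assumes record sizes of 48 bytes (v5) and 52 bytes (v7) as given in Cisco's export format documentation. It illustrates the version problem and is not how flow-tools is implemented.

```python
import struct

HEADER_LEN = 24              # v5 and v7 headers are both 24 bytes
RECORD_LEN = {5: 48, 7: 52}  # assumed per-record sizes for v5 and v7

def inspect_datagram(data: bytes) -> tuple:
    """Return (version, record_count) after checking the datagram length."""
    if len(data) < HEADER_LEN:
        raise ValueError("datagram shorter than a NetFlow header")
    # Network byte order: unsigned 16-bit version, then unsigned 16-bit count.
    version, count = struct.unpack_from("!HH", data, 0)
    if version not in RECORD_LEN:
        raise ValueError(f"unsupported NetFlow version {version}")
    expected = HEADER_LEN + count * RECORD_LEN[version]
    if len(data) != expected:
        raise ValueError(f"v{version}: expected {expected} bytes, got {len(data)}")
    return version, count
```

A v5-only parser would misread every v7 record after the first, because the record boundaries no longer line up.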

  10. FlowScan+ 2.0 : User Grouping • There is no way to verify the user ID on the supercomputer • The only user-related information in NetFlow is the IP address • From this information, we can infer which user is generating the traffic • If users always connect to the supercomputer from the same system, they have the same source/destination IP, so there is no problem • But they can also log in from other systems in the same office or building • So we adopted a user grouping concept • If a user logs in from a completely different place, it is impossible to identify the user ID from NetFlow • Except in this situation, we can identify supercomputing users by the network IP in NetFlow

  11. FlowScan+ 2.0 : User Grouping • Group name → group number • Group ID → user ID or related information • We have classified only class C IP addresses • If one person has many user IDs, or when we compare the traffic of a number of institutes with each other, we should aggregate their total traffic • Large grouping
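
A minimal sketch of small and large grouping, assuming small groups are keyed by class C (/24) network and large groups map several small groups onto one institute. The networks (documentation ranges), group names, and institute names are hypothetical.

```python
import ipaddress
from collections import defaultdict

# Hypothetical small groups: class C (/24) network -> user group.
SMALL_GROUPS = {
    "192.0.2.0/24": "group-A",
    "198.51.100.0/24": "group-B",
    "203.0.113.0/24": "group-C",
}
# Hypothetical large groups: several small groups -> one institute.
LARGE_GROUPS = {"group-A": "Institute-1", "group-B": "Institute-1", "group-C": "Institute-2"}

def small_group(ip: str) -> str:
    # Mask the address to its /24 network, e.g. 192.0.2.57 -> 192.0.2.0/24.
    net = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return SMALL_GROUPS.get(net, "unknown")

def traffic_by_large_group(flows):
    """flows: iterable of (ip, octets); returns total octets per institute."""
    totals = defaultdict(int)
    for ip, octets in flows:
        totals[LARGE_GROUPS.get(small_group(ip), "unknown")] += octets
    return dict(totals)
```

Any host in the same /24 lands in the same group, which is what lets a user moving between machines in one office still be counted together.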

  12. FlowScan+ 2.0 : Visualization • FlowScan+, improved by adding MySQL (a free DBMS), provides a query interface for retrieving flow information • But query results are text-based • They are difficult to understand intuitively • Results cannot be plotted as a time series • To support this, FlowScan+ 2.0 adds a visualization servlet

  13. FlowScan+ 2.0 : Visualization • Until now, text output was the only way to see the results of the query interface • To see the results as a graphical plot over time, FlowScan+ 2.0 makes one more query to the DB • Visualization process & graph
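
Before a query result can be drawn as a time series, it has to be placed on an evenly spaced time axis. This sketch assumes query rows of (timestamp, octets) and a 900-second step, and fills empty buckets with zero; the actual servlet uses Java and JFreeChart, which are not shown here.

```python
from typing import Dict, List, Tuple

def to_time_series(rows: List[Tuple[int, int]], start: int, end: int,
                   step: int = 900) -> List[Tuple[int, int]]:
    """Turn (timestamp, octets) query rows into evenly spaced points.

    Rows outside [start, end) are ignored. Buckets with no rows get 0, so the
    plot shows idle periods as zero traffic instead of drawing a line across them."""
    totals: Dict[int, int] = {}
    for ts, octets in rows:
        if start <= ts < end:
            bucket = start + (ts - start) // step * step
            totals[bucket] = totals.get(bucket, 0) + octets
    return [(t, totals.get(t, 0)) for t in range(start, end, step)]
```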

  14. Traffic Measurement Topology [Figure: measurement topology. Labeled nodes include Lion, Kfddi2, Kordic, Tiger, Baram, Ruby-8/80, Catalyst 6506, Cisco 7513, H-Ruby, H-Opal, H-NFS, the NEC, Compaq, IBM and PC cluster supercomputers, and the FlowScan+ 2.0 host receiving the NetFlow v7 export] • Our supercomputers are linked in a mesh with two Catalyst 6500 series switches • NetFlow v7 export • Graphs are drawn every 5 min • Aggregated data and raw flows are stored in the DB every 15 min

  15. FlowScan+ 2.0 – Traffic Analysis: Top Users by Institute (2003/July/21 14:00 ~ July/28 14:00) • One week of measured traffic • Analyzed by large group • The pie chart was redrawn from the Excel sheets
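
The pie chart reduces to each institute's share of total bytes. A small sketch, assuming per-group byte totals such as those produced by large grouping, keeps the top n groups and folds the rest into "others"; the function name and the default n are illustrative.

```python
def top_shares(totals: dict, n: int = 3) -> list:
    """Return [(name, percent)] for the n largest entries, plus 'others'."""
    grand = sum(totals.values())
    if grand == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(name, 100.0 * value / grand) for name, value in ranked[:n]]
    rest = sum(value for _, value in ranked[n:])
    if rest:
        shares.append(("others", 100.0 * rest / grand))
    return shares
```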

  16. FlowScan+ 2.0 – Traffic Analysis: Applications (2003/July/21 14:00 ~ July/28 14:00) • It shows an unexpected result • We wanted to know the share of traffic occupied by various applications • in bio, physics, aerospace, chemistry and so on • But those applications run on the supercomputers • They are installed on the supercomputers themselves • Users log in to the supercomputers via telnet and FTP • They transfer their data and operate the applications from remote sites
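
The telnet/FTP-heavy result follows from classifying flows by port: the scientific applications run on the supercomputers, so on the network they appear only as login and file-transfer sessions. The sketch shows port-based classification with a small illustrative table of IANA well-known ports. The `extra` parameter is a hypothetical hook for registering the ports of newly identified applications, which shrinks the "unknown" share.

```python
# Illustrative subset of IANA well-known ports.
WELL_KNOWN = {20: "ftp-data", 21: "ftp", 22: "ssh", 23: "telnet", 25: "smtp", 80: "http"}

def classify(src_port: int, dst_port: int, extra: dict = None) -> str:
    table = {**WELL_KNOWN, **(extra or {})}
    # Check the destination port first, then the source port,
    # so server-to-client reply flows are classified too.
    for port in (dst_port, src_port):
        if port in table:
            return table[port]
    return "unknown"

def app_breakdown(flows, extra: dict = None) -> dict:
    """flows: iterable of (src_port, dst_port, octets); returns octets per application."""
    totals = {}
    for sport, dport, octets in flows:
        app = classify(sport, dport, extra)
        totals[app] = totals.get(app, 0) + octets
    return totals
```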

  17. Other Uses of FlowScan+ 2.0 • Detection of network abnormalities • Port scanning • Code Red virus • Nimda virus • Mass-mailing worm component • DDoS attack • Typical relation between flow count and traffic amount • Bytes: normal volume • Flows: explosive increase • Detection of newly emerging applications • GRID applications, P2P applications and so on • If we match newly emerging applications with their port numbers • The unknown traffic portion decreases
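
Two of the signals above can be sketched as simple checks on flow summaries: a flow count that explodes while byte volume stays normal, and a source that touches many distinct destination ports. The median baseline, the factor of 5, and the threshold of 100 are assumptions for illustration, not values used by FlowScan+.

```python
from collections import defaultdict
from statistics import median

def flow_spikes(intervals, factor: float = 5.0) -> list:
    """intervals: list of (flows, octets) per measurement interval.

    Flags intervals whose flow count exceeds `factor` times the median
    while the byte count stays within `factor` times its median."""
    base_flows = median(f for f, _ in intervals)
    base_octets = median(o for _, o in intervals)
    return [i for i, (flows, octets) in enumerate(intervals)
            if flows > factor * base_flows and octets <= factor * base_octets]

def port_scanners(flows, threshold: int = 100) -> list:
    """flows: iterable of (src_ip, dst_ip, dst_port).

    Returns sources that contacted at least `threshold` distinct (dst, port) pairs."""
    seen = defaultdict(set)
    for src, dst, dst_port in flows:
        seen[src].add((dst, dst_port))
    return sorted(src for src, targets in seen.items() if len(targets) >= threshold)
```

Many small flows with little payload each is the pattern that worm scanning such as Code Red tends to produce, which is why flow count and byte count are compared rather than bytes alone.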

  18. FlowScan+ of KISTI

  19. Conclusions • FlowScan+ was developed by KISTI & KAIST • Characteristics of FlowScan+ 2.0 • Flow-tools • Addresses the NetFlow version problem • Group editing • Traffic can be measured and analyzed per user • Visualization of results • Query results are plotted as a time series • Future work • DB optimization for speed • Installation packaging • More stability for flowscan • Combine the merits of each version

  20. Thank you for your attention. Questions?
