What Data Do We Need and Why Do We Need It?

Jim Pepin, Chief Technology Officer, University of Southern California


Presentation Transcript


  1. What Data Do We Need and Why Do We Need It?
     Jim Pepin, Chief Technology Officer, University of Southern California

  2. Network Data: Research Depends on It
     • Solutions depend on understanding the problem…
     • Advances in many areas depend on analysis of real data
        • Network Management: traffic engineering, network design
        • Network Control: improving routing protocols
        • High Performance: better transport protocols
        • Security: tracking/stopping DoS and worm attacks
     • Over 30% of papers in the top networking conference (SIGCOMM’04) depended on data collected by others
     • Most common providers:
        • ISPs (e.g., AT&T, Sprint, I2)
        • Service providers (e.g., Akamai)
        • Individual campuses (e.g., UNC, UOregon, USC; some campuses give data only to local researchers)

  3. Network Data: More than Just Packet Traces
     • Some data are more sensitive than others
        • Dynamic routing information: routing protocol advertisements
        • Static design information: router configuration files, peering arrangements, policies
        • Operational events: alarms, trouble tickets (very few sources of this important info!)
        • Traffic logs: NetFlow records, packet header traces
        • Application data: URLs, p2p filenames, DNS queries
     • Tension: how much correlation to permit?
        • Data that can be correlated across multiple sites is most valuable for measuring network-wide events, e.g., worms
        • Privacy techniques anonymize and blur identity
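The correlation tension on this slide can be made concrete with a small sketch. It assumes keyed-hash (HMAC) pseudonyms for IP addresses, which is one common anonymization approach and not necessarily what any provider named in these slides uses. The helper names, keys, and flow record layout are hypothetical.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Replace an IP address with a keyed-hash pseudonym (hypothetical helper)."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only to keep the example readable


def scrub_flow(flow: dict, key: bytes) -> dict:
    """Return a copy of a flow record with both endpoints pseudonymized."""
    scrubbed = dict(flow)
    scrubbed["src"] = pseudonymize_ip(flow["src"], key)
    scrubbed["dst"] = pseudonymize_ip(flow["dst"], key)
    return scrubbed


# The same flow as seen at three collection sites (documentation-range addresses).
flow = {"src": "192.0.2.10", "dst": "198.51.100.7", "dport": 80, "bytes": 1420}

shared_key = b"key-shared-by-sites-a-and-b"   # assumption: two sites agree on a key
private_key = b"key-known-only-to-site-c"     # assumption: a third site keeps its own

site_a = scrub_flow(flow, shared_key)
site_b = scrub_flow(flow, shared_key)
site_c = scrub_flow(flow, private_key)

# Sites A and B can correlate the same host; site C's data cannot be joined to theirs.
print(site_a["src"] == site_b["src"])
print(site_a["src"] == site_c["src"])
```

Sharing a key makes cross-site events such as worm spread measurable, but it also means one leaked key exposes every participating site's mapping. That trade-off is the tension the slide describes.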

  4. Example of Data Provider
     • DHS PREDICT
        • DHS support for network research
        • Not for operational use by DHS
        • Major players
        • Peer review ground rules
        • Generic sources for legitimate research
     • LANDER Project
        • Example of PREDICT supplier
        • Joint project of USC-ISI networking division and USC/ISD Center for High Performance Computing and Communications
        • USC-HPCC is manager of the WAN for USC/CIT/JPL
        • ISI provides networking research background
        • HPCC provides data storage and computational resources
        • We work together on ground rules and MOUs
        • LANDER funds collection systems, support staff, and disk/tape space

  5. What is hard and easy
     • LANDER ground rules
        • Scrambled headers are the primary product today
        • Requires MOU with researcher
        • No collection of data payloads
        • Working on a very strict MOU allowing very limited use of non-scrambled header data, for very select purposes, in a very controlled environment
        • Build a collection management system integrated with other PREDICT sites
     • How we do this
        • Very close co-operation between ISI, ISD, and university legal
        • MOUs will be very clear and understandable for the researcher
        • USC can reject any application
        • USC will review any publication based on unscrambled headers, and all work processing these headers will be done inside HPCC
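The slides do not say how LANDER scrambles headers. As a rough illustration of what "scrambled headers, no payloads" could look like, the sketch below assumes a prefix-preserving scheme in the style of Crypto-PAn, with HMAC-SHA256 standing in for the block cipher. Two addresses that share their first k bits still share exactly k bits after scrambling, so subnet structure survives while real addresses do not. All function names, the key, and the packet layout are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def scramble_ipv4(ip: str, key: bytes) -> str:
    """Prefix-preserving scramble of an IPv4 address (illustrative sketch only)."""
    original = int(ipaddress.IPv4Address(ip))
    scrambled = 0
    for i in range(32):
        # The i most significant bits that precede the current bit.
        prefix = original >> (32 - i)
        # Include the prefix length so prefixes "0" and "00" give different inputs.
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        scrambled = (scrambled << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(scrambled))


def scrub_packet(pkt: dict, key: bytes) -> dict:
    """Keep selected header fields with scrambled addresses; never copy the payload."""
    return {
        "src": scramble_ipv4(pkt["src"], key),
        "dst": scramble_ipv4(pkt["dst"], key),
        "sport": pkt["sport"],
        "dport": pkt["dport"],
        "proto": pkt["proto"],
        "length": pkt["length"],
    }


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"example-collection-key"  # assumption: a secret held by the data provider
pkt = {"src": "192.0.2.10", "dst": "192.0.2.77", "sport": 40000,
       "dport": 53, "proto": "udp", "length": 72, "payload": b"not retained"}
scrubbed = scrub_packet(pkt, key)

# Shared-prefix length is the same before and after scrambling.
print(common_prefix_len(pkt["src"], pkt["dst"]),
      common_prefix_len(scrubbed["src"], scrubbed["dst"]))
```

Under this assumption, the non-scrambled data mentioned in the ground rules would correspond to records that never pass through `scramble_ipv4`, which is why access to them is restricted to a controlled environment.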

  6. Why would we do this
     • The Internet needs to be studied and engineered
     • What is the modern equivalent of Bell Labs for the phone system?
     • How did we get to where we are today?
     • Co-operation between researchers and operators
     • We can’t allow ourselves to have a complete bunker mentality
     • We need to be selective in what we provide, but where need is demonstrated, provide what is needed, consistent with policies
     • If we don’t do this, no one will
     • The risks can be managed if we take the time and effort to work with campus management (legal, CIOs, etc.) to mitigate them
     • Researchers can be brought into these discussions if cast correctly
     • If we don’t study how the network works, our ability to manage it will degrade to zero over time
