phi PowerPoint Presentation



public health for the internet

joe hellerstein

intel research & uc berkeley

agenda
  • three visions driving φ
  • building block: the PIER query engine
  • challenges, synergies
vision 1: shift network security from medicine to public health
  • security tools focused on “medicine”
    • vaccines for viruses
    • improving the world one patient at a time
  • weakness/opportunity in the “public health” arena
    • public health: population-focused, community-oriented
    • epidemiology: incidence, distribution, and control in a population
  • φ: a new approach
    • enable population-wide measurement
    • engage end users: education and prevention
    • understand risky behaviors, at-risk populations.
a center for disease control?
  • [staniford/paxson/weaver 2002]
    • am I being targeted?
    • is this remote host a “bad guy”?
    • is there a new type of activity?
    • is there global-scale activity?
  • who owns the center? what do they control?
  • this will be unpopular at best
    • electronic privacy for individuals
      • the internet as “a broadly surveilled police state”?
        • dan geer, former cto of @Stake
    • provider disincentives
      • Transparency = maintenance cost
  • and hardly ubiquitous
    • can monitor the chokepoints (ISPs)
    • but inside intranets??
      • e.g. corporate IT
      • e.g. berkeley dorms
      • e.g. grassroots WiFi agglomerations?
energizing the end-users
  • endpoints are ubiquitous
    • internet, intranet, hotspot
    • toward a uniform architecture
  • end-users will help
    • populist appeal to home users is timely
    • enterprise IT can dictate endpoint software
    • differentiating incentives for endpoint vendors
  • the connection: peer-to-peer technology
    • harnessed to the good!
    • ease of use
    • built-in scaling
    • decentralization of trust and liability

p2p technology is ripe. a noble app here with significant uptake?

vision 2: shared network monitoring
  • endpoint monitoring becoming a trend
    • NETI@Home (GA Tech)
    • DIMES (TAU)
    • ForNet (Polytechnic)
    • DShield
    • DOMINO (Wisconsin)
  • we share the vision!
  • but all facing key challenges in getting uptake
    • what’s in it for the community members?
    • disincentives: privacy & security risks
a communal approach
  • enable multiple efforts with a single distributed infrastructure
    • extensible endpoint “sensors” and visualizations
    • shared engine connecting them up
  • a group bands together on the hard systems and crypto
    • cost-effective data processing and analysis
    • verifiable data and processing
    • distributed resource limiting
    • toolkit of privacy-preserving, distributed dataflow components
  • a theme: dissemination is as important as collection
    • attract end-users with visible community information
    • enable real-time swapping across research teams
    • there may be much more here (see next vision!)
  • intel research is prepared to invest in this community
    • as we did with planetlab
vision 3: the network oracle
  • imagine that you knew everything about the internet, at every moment
    • network maps
    • link loading
    • point-to-point latency and bandwidth
    • event detections (e.g., from firewalls)
    • naming (DNS, ASes, etc.)
    • end-system software configuration information
    • router configurations and routing tables
  • how would this change things?
    • the design of protocols
    • the design of networked applications
    • network and system management (performance and security)
    • the economy (and policy) of network clients and ISPs
    • etc.
a dirty (not-so) secret
  • we’re sneaking up on the oracle already
    • overlays are a subversive attempt to wrest control from ISPs
    • overlays compute and disseminate measurements
    • measurement and functionality appetite growing
      • everybody’s favorite planetlab exercise: all-pairs ping
      • detour routing a la RON
      • custom routing a la i3/ROSE
  • but this is not being done systematically
    • every overlay does its own thing, opaquely
    • granularity of aggregation in time and space not well explored
    • measurement & dissemination often secondary/implicit
    • algorithmic/architectural choices abound, little exploration
  • and the brass ring remains…
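
To make the all-pairs ping and detour routing bullets concrete, here is a minimal sketch of picking a one-hop detour from an all-pairs latency table. It is an illustration in the spirit of RON-style overlays, not RON's actual algorithm. The function name `best_one_hop` and the dictionary layout of the latency table are assumptions made for this example.

```python
def best_one_hop(latency, src, dst):
    """Pick the cheaper of the direct path and any single-intermediate detour.

    latency: dict mapping (a, b) -> measured one-way latency in ms
             (hypothetical layout; missing pairs count as unreachable).
    Returns (via, ms), where via is None if the direct path is best.
    """
    inf = float("inf")
    nodes = {a for a, _ in latency} | {b for _, b in latency}
    best_via, best_ms = None, latency.get((src, dst), inf)
    # Try every other overlay node as the single intermediate hop.
    for via in sorted(nodes - {src, dst}):
        ms = latency.get((src, via), inf) + latency.get((via, dst), inf)
        if ms < best_ms:
            best_via, best_ms = via, ms
    return best_via, best_ms
```

Note that the input is exactly the all-pairs measurement the slide mentions: the overlay has to compute and disseminate it before any node can route around a slow direct path.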
wrapping up: 3 visions
  • multiple rationales to pursue this agenda
  • commonalities
    • many networked sensors
    • many computational agents for data processing
    • many destinations for result dissemination
    • decentralized infrastructure:
      • organic scaling
      • no centralized maintenance
      • no single unified repository of raw data (privacy ramifications)
  • differences (invariably!)
    • desired data granularities, in time and space
    • “reach” of querying and dissemination
    • sensitivity to privacy issues
  • goal: a shared infrastructure
    • shared effort to develop and extend it, seeded by intel research
    • shared bootstrap deployment (planetlab and beyond)
agenda
  • three visions driving φ
  • building block: the PIER query engine
  • challenges, synergies
pier: p2p information exchange & retrieval
  • a wide-area distributed dataflow engine
    • designed to scale to thousands or millions of nodes
    • outfitted with “streaming” relational operators, recursive graph queries
    • fully extensible dataflow graphs, SQL-like interface for convenience
  • built on distributed hash table (DHT) overlays
    • a put()/get() hashtable interface for the Internet.
    • content-based routing, soft-state semantics
    • pier is DHT-agnostic (CAN, chord, bamboo)
  • a very different design point than DB2, Oracle, etc.
    • scale = # machines, not necessarily # bytes
    • relaxed consistency a requirement (not really a dataBASE at all)
    • organic scaling
    • data lives in its natural habitat
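
The put()/get() interface with content-based routing and soft-state semantics can be sketched in a few lines. This is a toy, single-process stand-in written for illustration; it is not PIER's code or any real DHT's API. The class name `SoftStateDHT`, the `ttl` parameter, and the injectable `clock` are assumptions made for this example.

```python
import hashlib
import time
from bisect import bisect_right


class SoftStateDHT:
    """Toy model of a DHT: keys are routed by hash, entries expire."""

    def __init__(self, node_names, clock=time.monotonic):
        self.clock = clock
        # Place each node on a hash ring (consistent hashing, Chord-like).
        self.ring = sorted((self._hash(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode()).hexdigest(), 16)

    def owner(self, key):
        """Content-based routing: the key's hash alone picks the node."""
        points = [h for h, _ in self.ring]
        idx = bisect_right(points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value, ttl):
        """Store value under key; it silently expires after ttl seconds."""
        expires = self.clock() + ttl
        self.store[self.owner(key)].setdefault(key, []).append((value, expires))

    def get(self, key):
        """Return the unexpired values for key, dropping stale ones."""
        now = self.clock()
        bucket = self.store[self.owner(key)]
        live = [(v, exp) for v, exp in bucket.get(key, []) if exp > now]
        bucket[key] = live
        return [v for v, _ in live]
```

Soft state means a publisher must re-put a value before its lifetime runs out. Data from departed nodes then ages out on its own, which is one reason relaxed consistency is a requirement rather than an option.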
initial pier applications
  • φintrusion app
    • real-time snort aggregation from ~300 planetlab nodes
    • identification of top-10 attackers (validating DOMINO)
    • real time joins: “who are my attackers attacking”
    • plausible end-user visualizations
  • transitive closures and other graph algorithms
    • distributed gnutella crawler
    • distributed web crawler
    • shortest paths queries (distance vector routing)
  • improved filesharing for rare items
    • deployed as hybrid gnutella ultrapeer on 50 planetlab nodes
    • intercepts gnutella queries, identifies “rare items” and publishes them
    • 18% decrease in number of unnecessarily empty query results
      • 66% possible with better “rare item” identification
  • upshot: reasons to believe the generality is real
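
The two intrusion queries above (top-10 attackers, and "who are my attackers attacking") can be illustrated on plain Python data. This is a centralized sketch of the query logic only, not PIER's distributed execution or its SQL-like interface. The alert tuple layout `(victim, attacker)` and all function names are assumptions made for this example.

```python
from collections import Counter


def local_counts(alerts):
    """Per-node partial aggregate: number of alerts per attacker address.

    alerts: list of (victim, attacker) tuples (hypothetical layout).
    """
    return Counter(attacker for _victim, attacker in alerts)


def top_attackers(partials, k=10):
    """Merge the per-node partial counts, then keep the k largest."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common(k)


def attackers_targets(alerts, me):
    """Self-join on attacker: who else is being hit by the hosts hitting me?"""
    mine = {attacker for victim, attacker in alerts if victim == me}
    targets = {}
    for victim, attacker in alerts:
        if attacker in mine and victim != me:
            targets.setdefault(attacker, set()).add(victim)
    return targets
```

Each node only needs to ship its small per-attacker counts, not its raw alert log. That fits the "no single unified repository of raw data" goal stated earlier in the deck.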
pier in the φ context
  • goal is for pier to serve as an information plane
    • gather data from “sensors”
    • perform basic filtering, aggregation, combination
      • though aggregation can be rather fancy (e.g. wavelet encoding)
    • disseminate the right “cooked” data to the right people
  • and do so in a “trusted” way
    • privacy and security
    • manageability
  • but … only a piece of the puzzle
    • active probing
    • mapping
    • backbone monitors
    • network forensics, tomography
    • honeypots
    • etc.
  • we won’t do all of this ourselves!
    • gathering playmates
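
The gather / filter / aggregate / disseminate pipeline relies on aggregates that can be merged piece by piece inside the network. Below is a minimal sketch of that idea, assuming a simple aggregation tree. The `Partial` type, the `aggregate` function, and the tree layout are hypothetical and do not reflect PIER's operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Mergeable partial state for count, mean and max of sensor readings."""
    count: int = 0
    total: float = 0.0
    peak: float = float("-inf")

    @staticmethod
    def of(reading):
        return Partial(1, reading, reading)

    def merge(self, other):
        return Partial(self.count + other.count,
                       self.total + other.total,
                       max(self.peak, other.peak))

    def mean(self):
        return self.total / self.count if self.count else None


def aggregate(tree, readings, node, keep=lambda r: True):
    """Filter local readings, then fold in each child's partial result.

    tree: node -> list of child nodes; readings: node -> list of floats.
    """
    acc = Partial()
    for r in readings.get(node, []):
        if keep(r):  # basic filtering at the sensor
            acc = acc.merge(Partial.of(r))
    for child in tree.get(node, []):
        acc = acc.merge(aggregate(tree, readings, child, keep))
    return acc
```

Only the fixed-size `Partial` travels up the tree, so the root sees "cooked" data without any node forwarding raw readings. Fancier aggregates (such as the wavelet encoding mentioned above) would need their own merge rule.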
agenda
  • three visions driving φ
  • building block: the PIER query engine
  • challenges, synergies





[diagram; recoverable labels: Overlay Network, Physical Network, Query Plan, Query Optimization, Multi-Query Optimization, Query Dissemination, Persistent Storage, Recursion on graphs, Quality of Service, Net-Embedded functions, Route Flapping]


current limitations of pier
  • query per client
    • no systematic sharing of computation/results across queries
  • locality control forfeited to dht
    • difficult to express local gossiping rules
  • queries, not triggers
    • alerts currently supported via polling
  • loose query semantics
    • network dynamics and timing make guarantees hard
  • active monitoring
    • we can do it, but it’s not systematic
  • security/privacy
  • we’re attacking many of these now
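
The "queries, not triggers" limitation means an alert has to be emulated by re-running a query. A minimal sketch of that polling pattern follows, assuming the query is exposed as a plain callable. The names `poll_alerts`, `run_query`, and `wait` are hypothetical and are not part of PIER.

```python
def poll_alerts(run_query, threshold, rounds, wait=lambda: None):
    """Emulate a trigger by polling.

    Re-runs run_query() once per round and yields the value each time it
    crosses the threshold from below. wait() is called between rounds;
    in practice it would sleep for the polling interval.
    """
    was_above = False
    for _ in range(rounds):
        value = run_query()
        is_above = value >= threshold
        if is_above and not was_above:
            yield value  # fire once per upward crossing, not every round
        was_above = is_above
        wait()
```

The cost of this pattern is visible in the loop: every round pays for a full query, and an event can go unnoticed for up to one polling interval. A real trigger mechanism would avoid both.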
so, is pier the “right” infrastructure?
  • not today
  • though many of the decisions seem sound
    • level of indirection between task specification and execution
    • non-hierarchical model provides flexibility and simplicity
      • vs. domain hierarchy (a la ip naming)
      • vs. data hierarchies (a la xml)
    • extensible aggregation + relational operators cover a lot of territory
      • monitoring
      • routing
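
As one illustration of how aggregation plus relational operators can cover routing, shortest paths can be written as a join of a link relation with a path relation, followed by a min aggregate, repeated until nothing changes. This is a centralized sketch of the distance-vector idea, not PIER's recursive query implementation. It assumes non-negative link costs, and the function name and tuple layout are made up for this example.

```python
def shortest_paths(links):
    """Distance-vector style fixpoint over a link relation.

    links: iterable of (src, dst, cost) tuples with non-negative costs,
           treated as directed edges.
    Returns a dict mapping (src, dst) -> cheapest known path cost.
    """
    links = list(links)
    inf = float("inf")
    # Base case: every link is a path; keep the cheapest parallel link.
    best = {}
    for src, dst, cost in links:
        if cost < best.get((src, dst), inf):
            best[(src, dst)] = cost
    changed = True
    while changed:
        changed = False
        # Join link(src, mid) with path(mid, dst); aggregate with min.
        for src, mid, c1 in links:
            for (m, dst), c2 in list(best.items()):
                if m == mid and src != dst and c1 + c2 < best.get((src, dst), inf):
                    best[(src, dst)] = c1 + c2
                    changed = True
    return best
```

The loop body is just a join and a grouped minimum, which is the sense in which the same operator set can serve both monitoring and routing.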
potential synergies
  • design of shared info plane
    • scenarios & requirements
    • architectural brickbats
    • built-in components
    • complementary components
      • and requirements for integration
  • understanding the opportunity
    • what if the network oracle existed?
  • fostering the community
    • leveraging each other’s efforts to get mindshare
  • resources
    • if the intel genie granted you a wish…
      • (think about building/leveraging community)
A Note on Structured Data on Networks
  • Industrial Revolution for Information
    • Mechanized data generation
      • Sensing the physical world
      • Monitoring software, networks, machines
      • Tracking objects, processes, behaviors
    • Uniformity of products
    • Mass Transport of Data and Computation
      • Data generators and consumers spread over the Internet and the Planet
  • Happening at both extremes
  • Compare to hand-generation of text