



public health for the internet

joe hellerstein

intel research & uc berkeley


agenda

  • three visions driving φ

  • building block: the PIER query engine

  • challenges, synergies

vision 1: shift network security from medicine to public health

  • security tools focused on “medicine”

    • vaccines for viruses

    • improving the world one patient at a time

  • weakness/opportunity in the “public health” arena

    • public health: population-focused, community-oriented

    • epidemiology: incidence, distribution, and control in a population

  • φ: a new approach

    • enable population-wide measurement

    • engage end users: education and prevention

    • understand risky behaviors, at-risk populations.

a center for disease control?

  • [staniford/paxson/weaver 2002]

    • am I being targeted?

    • is this remote host a “bad guy”?

    • is there a new type of activity?

    • is there global-scale activity?

  • who owns the center? what do they control?

  • this will be unpopular at best

    • electronic privacy for individuals

      • the internet as “a broadly surveilled police state”?

        • dan geer, former cto of @Stake

    • provider disincentives

      • Transparency = maintenance cost

  • and hardly ubiquitous

    • can monitor the chokepoints (ISPs)

    • but inside intranets??

      • e.g. corporate IT

      • e.g. berkeley dorms

      • e.g. grassroots WiFi agglomerations?

energizing the end-users

  • endpoints are ubiquitous

    • internet, intranet, hotspot

    • toward a uniform architecture

  • end-users will help

    • populist appeal to home users is timely

    • enterprise IT can dictate endpoint software

    • differentiating incentives for endpoint vendors

  • the connection: peer-to-peer technology

    • harnessed to the good!

    • ease of use

    • built-in scaling

    • decentralization of trust and liability

p2p technology is ripe. a noble app here with significant uptake?

demo time

vision 2: shared network monitoring

  • endpoint monitoring becoming a trend

    • NETI@home (GA Tech)

    • DIMES (TAU)

    • ForNet (Polytechnic)

    • DShield

    • DOMINO (Wisconsin)

  • we share the vision!

  • but all facing key challenges in getting uptake

    • what’s in it for the community members?

    • disincentives: privacy & security risks

a communal approach

  • enable multiple efforts with a single distributed infrastructure

    • extensible endpoint “sensors” and visualizations

    • shared engine connecting them up

  • a group bands together on the hard systems and crypto

    • cost-effective data processing and analysis

    • verifiable data and processing

    • distributed resource limiting

    • toolkit of privacy-preserving, distributed dataflow components

  • a theme: dissemination is as important as collection

    • attract end-users with visible community information

    • enable real-time swapping across research teams

    • there may be much more here (see next vision!)

  • intel research is prepared to invest in this community

    • as we did with planetlab

vision 3: the network oracle

  • imagine that you knew everything about the internet, at every moment

    • network maps

    • link loading

    • point-to-point latency and bandwidth

    • event detections (e.g., from firewalls)

    • naming (DNS, ASes, etc.)

    • end-system software configuration information

    • router configurations and routing tables

  • how would this change things?

    • the design of protocols

    • the design of networked applications

    • network and system management (performance and security)

    • the economy (and policy) of network clients and ISPs

    • etc.

a dirty (not-so) secret

  • we’re sneaking up on the oracle already

    • overlays are a subversive attempt to wrest control from ISPs

    • overlays compute and disseminate measurements

    • measurement and functionality appetite growing

      • everybody’s favorite planetlab exercise: all-pairs ping

      • detour routing a la RON

      • custom routing a la i3/ROSE

  • but this is not being done systematically

    • every overlay does its own thing, opaquely

    • granularity of aggregation in time and space not well explored

    • measurement & dissemination often secondary/implicit

    • algorithmic/architectural choices abound, little exploration

  • and the brass ring remains…
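
The all-pairs ping and RON-style detour routing bullets above fit together: once an overlay holds a latency measurement for every pair, it can check whether relaying through a third node beats the direct path. The following is a minimal sketch of that check, not RON's actual algorithm. The function name, the latency table layout, and the sample numbers are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# (src, dst) -> measured latency in ms; a hypothetical all-pairs ping table
Latency = Dict[Tuple[str, str], float]


def best_one_hop_detour(lat: Latency, src: str, dst: str) -> Tuple[Optional[str], float]:
    """Return (relay, latency) for the fastest path using at most one relay.

    relay is None when the direct path is at least as fast as every detour.
    Missing measurements are treated as unreachable (infinite latency).
    """
    inf = float("inf")
    best_relay: Optional[str] = None
    best = lat.get((src, dst), inf)
    nodes = {a for a, _ in lat} | {b for _, b in lat}
    # sorted() only makes tie-breaking deterministic
    for relay in sorted(nodes - {src, dst}):
        via = lat.get((src, relay), inf) + lat.get((relay, dst), inf)
        if via < best:
            best_relay, best = relay, via
    return best_relay, best


# made-up measurements: the direct a-b path is slow, a-c-b is faster
pings: Latency = {
    ("a", "b"): 90.0, ("b", "a"): 90.0,
    ("a", "c"): 20.0, ("c", "a"): 20.0,
    ("b", "c"): 30.0, ("c", "b"): 30.0,
}
detour_ab = best_one_hop_detour(pings, "a", "b")
detour_ac = best_one_hop_detour(pings, "a", "c")
```

The sketch assumes the whole table sits in one place, which is exactly the opaque, per-overlay measurement the slide says is not yet being shared systematically.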

wrapping up: 3 visions

  • multiple rationales to pursue this agenda

  • commonalities

    • many networked sensors

    • many computational agents for data processing

    • many destinations for result dissemination

    • decentralized infrastructure:

      • organic scaling

      • no centralized maintenance

      • no single unified repository of raw data (privacy ramifications)

  • differences (invariably!)

    • desired data granularities, in time and space

    • “reach” of querying and dissemination

    • sensitivity to privacy issues

  • goal: a shared infrastructure

    • shared effort to develop and extend it, seeded by intel research

    • shared bootstrap deployment (planetlab and beyond)

agenda

  • three visions driving φ

  • building block: the PIER query engine

  • challenges, synergies

pier: p2p information exchange & retrieval

  • a wide-area distributed dataflow engine

    • designed to scale to thousands or millions of nodes

    • outfitted with “streaming” relational operators, recursive graph queries

    • fully extensible dataflow graphs, SQL-like interface for convenience

  • built on distributed hash table (DHT) overlays

    • a put()/get() hashtable interface for the Internet.

    • content-based routing, soft-state semantics

    • pier is DHT-agnostic (CAN, Chord, Bamboo)

  • a very different design point than DB2, Oracle, etc.

    • scale = # machines, not necessarily # bytes

    • relaxed consistency a requirement (not really a dataBASE at all)

    • organic scaling

    • data lives in its natural habitat
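
To make the put()/get() interface, content-based routing, and soft-state semantics above concrete, here is a toy single-process sketch. It is not PIER's or any real DHT's API: the class name, the modulo placement rule (real DHTs such as CAN or Chord use their own routing geometry), and the explicit `now` clock argument are assumptions made for illustration.

```python
import hashlib
from typing import Any, Dict, List, Tuple


class SoftStateDHT:
    """Toy stand-in for a DHT: a key's hash picks the node, and entries expire."""

    def __init__(self, node_names: List[str]) -> None:
        self.ring = sorted(node_names)
        # node name -> key -> list of (value, expiry_time)
        self.nodes: Dict[str, Dict[str, List[Tuple[Any, float]]]] = {
            name: {} for name in self.ring
        }

    def _owner(self, key: str) -> str:
        # content-based routing: the key alone decides which node stores it
        digest = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)
        return self.ring[digest % len(self.ring)]

    def put(self, key: str, value: Any, ttl: float, now: float) -> None:
        # soft state: every entry carries an expiry and must be re-put to survive
        store = self.nodes[self._owner(key)]
        store.setdefault(key, []).append((value, now + ttl))

    def get(self, key: str, now: float) -> List[Any]:
        store = self.nodes[self._owner(key)]
        live = [(v, exp) for v, exp in store.get(key, []) if exp > now]
        if key in store:
            store[key] = live  # drop expired entries lazily
        return [v for v, _ in live]


dht = SoftStateDHT(["n1", "n2", "n3"])
dht.put("alerts:10.0.0.9", {"count": 3}, ttl=10.0, now=0.0)
fresh = dht.get("alerts:10.0.0.9", now=5.0)     # still within the TTL
expired = dht.get("alerts:10.0.0.9", now=10.0)  # TTL has elapsed
```

Passing the clock in explicitly keeps the sketch deterministic. Expiry is what makes the state "soft": a publisher that disappears simply stops refreshing, and its data ages out without any cleanup protocol.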

initial pier applications

  • φintrusion app

    • real-time snort aggregation from ~300 planetlab nodes

    • identification of top-10 attackers (validating DOMINO)

    • real time joins: “who are my attackers attacking”

    • plausible end-user visualizations

  • transitive closures and other graph algorithms

    • distributed gnutella crawler

    • distributed web crawler

    • shortest paths queries (distance vector routing)

  • improved filesharing for rare items

    • deployed as hybrid gnutella ultrapeer on 50 planetlab nodes

    • intercepts gnutella queries, identifies “rare items” and publishes them

    • 18% decrease in number of unnecessarily empty query results

      • 66% possible with better “rare item” identification

  • upshot: reasons to believe the generality is real
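
The intrusion-app bullets above describe two query shapes: a top-10 aggregate over alerts from many nodes, and a join asking "who are my attackers attacking". The sketch below illustrates both in plain Python on one machine. It is not PIER code, and the `(attacker, victim)` alert schema, function names, and sample addresses are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Alert = Tuple[str, str]  # (attacker_ip, victim) -- a hypothetical schema


def local_counts(alerts: Iterable[Alert]) -> Counter:
    """Partial aggregate computed at one node: alerts per attacker."""
    return Counter(src for src, _ in alerts)


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Combine partial aggregates; summing full counters keeps the result exact."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def top_k(counts: Counter, k: int) -> List[Tuple[str, int]]:
    # highest count first; ties broken by address so the output is deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def victims_of_my_attackers(me: str, alerts: Iterable[Alert]) -> Dict[str, Set[str]]:
    """Self-join on attacker: for each host attacking `me`, who else it attacks."""
    rows = list(alerts)
    my_attackers = {src for src, dst in rows if dst == me}
    result: Dict[str, Set[str]] = {}
    for src, dst in rows:
        if src in my_attackers and dst != me:
            result.setdefault(src, set()).add(dst)
    return result


# made-up alert logs from two monitoring nodes
node1 = [("10.0.0.9", "n1"), ("10.0.0.9", "n1"), ("10.0.0.7", "n1")]
node2 = [("10.0.0.9", "n2"), ("10.0.0.5", "n2")]

merged = merge_counts(local_counts(log) for log in (node1, node2))
top2 = top_k(merged, 2)
shared = victims_of_my_attackers("n1", node1 + node2)
```

Each node ships only its per-attacker counts rather than raw alerts. Truncating each partial to a local top-k before merging would save bandwidth but can give wrong global answers, so this sketch merges the full counters.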

pier in the φ context

  • goal is for pier to serve as an information plane

    • gather data from “sensors”

    • perform basic filtering, aggregation, combination

      • though aggregation can be rather fancy (e.g. wavelet encoding)

    • disseminate the right “cooked” data to the right people

  • and do so in a “trusted” way

    • privacy and security

    • manageability

  • but … only a piece of the puzzle

    • active probing

    • mapping

    • backbone monitors

    • network forensics, tomography

    • honeypots

    • etc.

  • we won’t do all of this ourselves!

    • gathering playmates
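
The slide above mentions wavelet encoding as an example of "fancy" aggregation. As one illustration of the idea, the sketch below uses an unnormalized Haar transform to turn a series of readings into an overall average plus detail coefficients, then drops small details to get a lossy summary. The choice of Haar and the threshold scheme are assumptions; the slides do not say which wavelet or encoding PIER uses.

```python
from typing import List


def haar_encode(values: List[float]) -> List[float]:
    """Unnormalized Haar transform: [overall average, details coarse-to-fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details_all: List[float] = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details_all = details + details_all  # coarser levels end up in front
        current = averages
    return current + details_all


def haar_decode(coeffs: List[float]) -> List[float]:
    """Invert haar_encode by expanding one level of details at a time."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded: List[float] = []
        for avg, det in zip(current, details):
            expanded.extend([avg + det, avg - det])
        current = expanded
    return current


def drop_small_details(coeffs: List[float], threshold: float) -> List[float]:
    """Lossy summary: keep the average, zero out details below the threshold."""
    return coeffs[:1] + [c if abs(c) >= threshold else 0.0 for c in coeffs[1:]]


readings = [4.0, 6.0, 10.0, 12.0]  # made-up sensor values
encoded = haar_encode(readings)
lossy = haar_decode(drop_small_details(encoded, threshold=2.0))
```

Zeroed coefficients need not be transmitted, so a node can send a coarse picture of its data at a fraction of the size of the raw series.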

agenda

  • three visions driving φ

  • building block: the PIER query engine

  • challenges, synergies


[figure: diagram with labels: declarative, quality of service, overlay network, query plan, query optimization, multi-query optimization, persistent storage, recursion on graphs, physical network, query dissemination, net-embedded functions, route flapping]

current limitations of pier

  • query per client

    • no systematic sharing of computation/results across queries

  • locality control forfeited to dht

    • difficult to express local gossiping rules

  • queries, not triggers

    • alerts currently supported via polling

  • loose query semantics

    • network dynamics and timing make guarantees hard

  • active monitoring

    • we can do it, but it’s not systematic

  • security/privacy

  • we’re attacking many of these now
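
The "queries, not triggers" limitation above means a client that wants an alert must re-run its query and compare results itself. The following is a minimal sketch of that polling pattern under stated assumptions: the function name and parameters are hypothetical, and the query is any zero-argument callable returning a number.

```python
import time
from typing import Callable, List, Tuple


def poll_alerts(
    query: Callable[[], int],
    threshold: int,
    rounds: int,
    interval_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, int]]:
    """Re-run `query` each round; record (round, value) on each upward crossing.

    An alert fires only when the value goes from below the threshold to at or
    above it, so a value that stays high does not re-alert every round.
    """
    fired: List[Tuple[int, int]] = []
    was_above = False
    for round_no in range(rounds):
        value = query()
        is_above = value >= threshold
        if is_above and not was_above:
            fired.append((round_no, value))
        was_above = is_above
        if interval_s > 0:
            sleep(interval_s)
    return fired


# stand-in for a repeated aggregate query, e.g. alerts seen in the last minute
samples = iter([3, 12, 15, 4, 20])
alerts = poll_alerts(lambda: next(samples), threshold=10, rounds=5)
```

Polling trades latency for load: a shorter interval detects events sooner but re-executes the query more often, which is part of why a systematic trigger mechanism would be preferable.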

so, is pier the “right” infrastructure?

  • not today

  • though many of the decisions seem sound

    • level of indirection between task specification and execution

    • non-hierarchical model provides flexibility and simplicity

      • vs. domain hierarchy (a la ip naming)

      • vs. data hierarchies (a la xml)

    • extensible aggregation + relational operators covers a lot of territory

      • monitoring

      • routing

potential synergies

  • design of shared info plane

    • scenarios & requirements

    • architectural brickbats

    • built-in components

    • complementary components

      • and requirements for integration

  • understanding the opportunity

    • what if the network oracle existed

  • fostering the community

    • leveraging each other’s efforts to get mindshare

  • resources

    • if the intel genie granted you a wish…

      • (think about building/leveraging community)

A Note on Structured Data on Networks

  • Industrial Revolution for Information

    • Mechanized data generation

      • Sensing the physical world

      • Monitoring software, networks, machines

      • Tracking objects, processes, behaviors

    • Uniformity of products

    • Mass Transport of Data and Computation

      • Data generators and consumers spread over the Internet and the Planet

  • Happening at both extremes

  • Compare to hand-generation of text