
TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Armando Fox, University of California, Berkeley

[email protected]


Vision: “The Content You Want”

What do the above apps have in common?

  • Adapt (collect, filter, transform) existing content…

    • according to client constraints

    • respecting network limitations

    • according to per-user preferences

  • But: Lack of unified framework for designing apps that exploit this observation


Contributions

  • TACC, a model for structuring services

    • Transformation, Aggregation, Caching, Customization of Internet content

  • Scalable TACC server

    • Based on clusters of commodity PC’s

    • Easy to author “industrial strength” services

    • Scalable Network Service (SNS) platform maps app semantics onto cluster-based availability mechanisms

  • Experience with real users

    • ~15,000 today at UCB


What’s TACC?

  • Transformation (“local”, “one-to-one”)

    • TranSend, Anonymizer

  • Aggregation (“nonlocal”, “many-to-one”)

    • Search engines, crawlers, newswatchers

  • Caching

    • Both original and locally-generated content

  • Customization

    • Per user: for content generation

    • Per device: data delivery, content “packaging”
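
The four TACC roles can be viewed as composable functions over content. A minimal sketch (all names are hypothetical illustrations, not the actual TACC API):

```python
# Sketch of the TACC decomposition: Transformation, Aggregation, Caching,
# and Customization as composable functions over content.
# All names here are hypothetical, not the real TACC interfaces.

cache = {}  # Caching: store originals and locally-generated results

def transform(page, prefs):
    # Transformation: "local", one-to-one (e.g. distill one page/image)
    return page.lower() if prefs.get("lowercase") else page

def aggregate(pages):
    # Aggregation: "nonlocal", many-to-one (e.g. merge search results)
    return " | ".join(pages)

def serve(key, fetch, prefs):
    # Customization drives both steps via the per-user prefs dict;
    # Caching short-circuits repeated work on the same key.
    if key not in cache:
        cache[key] = transform(aggregate(fetch()), prefs)
    return cache[key]

result = serve("q1", lambda: ["Hello", "World"], {"lowercase": True})
# result -> "hello | world"
```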


TACC Example: TranSend

  • Transparent HTTP proxy

  • On-the-fly, lossy compression of specific MIME types (GIF, JPG...)

  • Cache both original & transformed

  • User specifies aggressiveness and “refinement” UI

    • Parameters to HTML & image transformers
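
The TranSend flow above can be sketched as proxy logic: transform only the listed MIME types, and cache both the original and each distilled version keyed by the user's aggressiveness setting. This is an illustrative stand-in, not the real TranSend code; `distill` is a placeholder for the actual lossy image transformers.

```python
# Sketch of TranSend-style proxy logic (hypothetical, simplified).
LOSSY_TYPES = {"image/gif", "image/jpeg"}
originals, distilled = {}, {}

def distill(body: bytes, quality: int) -> bytes:
    # Placeholder for lossy compression: keep a quality-dependent fraction.
    keep = max(1, len(body) * quality // 100)
    return body[:keep]

def proxy(url, mime, fetch, quality=50):
    if url not in originals:
        originals[url] = fetch()          # cache the original
    if mime not in LOSSY_TYPES:
        return originals[url]             # pass through unchanged
    key = (url, quality)                  # per-user aggressiveness
    if key not in distilled:
        distilled[key] = distill(originals[url], quality)  # cache transformed
    return distilled[key]

small = proxy("http://example.com/a.jpg", "image/jpeg",
              lambda: b"x" * 100, quality=25)
```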



Top Gun Wingman

  • PalmPilot web browser

  • Intermediate-form page layout

  • Image scaling & transcoding

    • Controlled by layout engine

  • Device-specific ADU marshalling

    • Including client versioning

    • Originals and device-specific pages cached


Application Partitioning

  • Client competence

    • Styled text, images, widgets are fine

    • Bitmaps unnecessary

  • Client responsiveness

    • Scrolling, etc. shouldn’t require roundtrip to server

  • Client independence

    • Very late conversion to client-specific format


TACC Conceptual Data Flow

[Diagram: a user request enters a front end (FE); workers (W) for transformation (T) and aggregation (A), together with caches ($) and customization (C), process the data, with fetches going out to the Internet]

  • Front end accepts RPC-like user requests

  • User’s customization profile retrieved

  • Original data fetched from cache or Internet

  • Aggregation/transformation workers operate on data according to customization profile
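
The four steps above can be sketched as a front-end request pipeline (all components are hypothetical stand-ins, not the actual TACC implementation):

```python
# Sketch of the front-end data flow: accept request, fetch the user's
# customization profile, get original data from cache or the network,
# then run workers on it according to the profile.

profiles = {"alice": {"max_width": 160}}
cache = {}

def fetch_from_internet(url):
    return f"<contents of {url}>"

def worker_chain(data, profile):
    # Aggregation/transformation driven by the customization profile
    return data[: profile.get("max_width", 80)]

def front_end(user, url):
    profile = profiles.get(user, {})          # steps 1-2: request + profile
    if url not in cache:                      # step 3: cache or Internet
        cache[url] = fetch_from_internet(url)
    return worker_chain(cache[url], profile)  # step 4: workers apply profile

reply = front_end("alice", "http://example.com/")
```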


TACC Model Summary

  • Mostly stateless, composable workers

  • Unifies previously ad hoc applications under one framework

  • Encourages re-use through modularization

    • Composition enables both new services and new clients

  • TACC breakdown provides unified way to think about app structure


Services Should Be Easy To Write

  • Rapid prototyping

    • Insulate workers from “mundane” details

  • Easy to incorporate existing/legacy code

    • Few assumptions about code structure

    • Must support variety of languages

    • May be fragile

  • Composition to leverage existing code


Building a TACC Server

  • Challenge: Scalable Network Service (SNS) requirements

    • Scalability to 100K’s of users with high availability

    • Cost effective to deploy & administer

  • But, services should remain easy to write

    • Server provides some bug robustness

    • Server provides availability

    • Server handles load balancing and scaling

    • Preserve modularity (& componentwise upgradability) when deploying


Layered Model of Internet Services

[Diagram: layer stack — services such as httpd atop the TACC layer, which rests on the Scalable Network Service (SNS) layer]

  • TACC Layer

    • Programming model based on composable building blocks

  • SNS Layer: “large virtual server”

    • Implements SNS requirements

    • Cluster computing for hardware F/T and incremental scaling


  • Exploit TACC model semantics for software F/T

  • SNS layer is reusable and isolated from TACC

    • Application “content” orthogonal to SNS mechanisms

    • Key to making apps easy to write


    Why Use a Cluster?

    • Incremental scalability, low cost components

    • High availability through hardware redundancy

      Goals:

    • Demonstrate that clusters and TACC fit well together

    • Separate SNS from TACC


    Cluster-Based TACC Server

    • Component replication for scaling and availability

    • High-bandwidth, low-latency interconnect

    • Incremental scaling: commodity PC’s

    [Diagram: front ends, workers (T, A, W), caches ($), user profile database, load-balancing/fault-tolerance manager (LB/FT), and administration GUI, all connected by the interconnect]


    “Starfish” Availability: LB Death

    • FE detects via broken pipe/timeout, restarts LB



    “Starfish” Availability: LB Death

    • FE detects via broken pipe/timeout, restarts LB

    • New LB announces itself (multicast), contacted by workers, gradually rebuilds load tables

    • If partition heals, extra LB’s commit suicide

    • FE’s operate using cached LB info during failure
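
The recovery steps above rest on the LB's load tables being pure soft state: workers periodically announce (node, load) beacons, so a freshly restarted LB starts empty and simply rebuilds, while stale entries time out on their own. A minimal sketch of that mechanism (hypothetical names, not the TACC code):

```python
# Soft-state load table: reconstructable entirely from worker beacons,
# with stale entries expiring by timeout rather than explicit teardown.
import time

BEACON_TIMEOUT = 3.0  # seconds before an entry is considered dead

class LoadBalancer:
    def __init__(self):
        self.table = {}  # worker -> (load, last_heard); pure soft state

    def on_beacon(self, worker, load, now=None):
        self.table[worker] = (load, now if now is not None else time.time())

    def pick_worker(self, now=None):
        now = now if now is not None else time.time()
        live = {w: load for w, (load, t) in self.table.items()
                if now - t < BEACON_TIMEOUT}  # expire stale entries
        return min(live, key=live.get) if live else None

# A freshly restarted LB starts empty and rebuilds from incoming beacons.
lb = LoadBalancer()
lb.on_beacon("w1", load=5, now=100.0)
lb.on_beacon("w2", load=2, now=100.0)
choice = lb.pick_worker(now=101.0)  # least-loaded live worker
later = lb.pick_worker(now=200.0)   # all beacons stale by now
```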





    Fault Recovery Latency

    [Graph: task queue length vs. time during fault recovery]


    Behavior in the Large

    • TranSend: 160 image transformations/sec on 10 Ultra-1 servers

      • Peak seen during UCB traces on 700-modem bank: 15/sec

      • Amortized hardware cost <$0.35/user/month (one $5K PC serving ~15,000 subscribers)

    • Wingman: factor of 6-8 worse

    • Administration: one undergraduate part-time
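
The capacity and cost figures above are consistent with simple arithmetic; note that the amortization period behind the per-month figure is not stated on the slide, so the check below only divides machine cost by subscribers:

```python
# Quick check of the capacity and cost claims on this slide.
total_rate = 160          # image transformations/sec across the cluster
servers = 10              # Ultra-1 servers
peak_rate = 15            # peak transformations/sec in the UCB traces

per_server = total_rate / servers              # ~16/sec per machine
one_machine_suffices = per_server >= peak_rate # peak load fits on one node

pc_cost = 5000            # dollars for one commodity PC
subscribers = 15000
cost_per_user = pc_cost / subscribers          # ~$0.33 per user, < $0.35
```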


    Building a Big System

    • Restartable, atomic workers

      • Read-only data from other origin server(s)

    • Orthogonal separation of scalability/availability from application “content”

      • Multiple lines of defense

      • App modules agree to obey semantics compatible with these mechanisms

      • Common-case failure behavior compatible with users’ Internet experience

      • Enables reuse of whole workers, however diverse


    Availability & Scalability Summary

    • Pervasive strategy: timeout, retry, restart

      • Transient failures usually invisible to user

      • Process peers watch each other

      • Mostly stateless workers, xact support possible

    • Simplicity from exploiting soft state

      • Piggyback status info on multicast beacons

      • Use of stale LB info fine in practice

    • “Starfish” availability works in practice


    Service Authoring

    • Keyword highlighting: < 1 day

    • Wingman: 2-3 weeks

    • Various apps from graduate seminar projects

      • Safe worker upload

      • Annotate the Web

      • “Channel aggregators”


    New Services By Composition

    • Compose existing services to create a new one

      • ~2.5 hours to implement

      • Composes with TranSend or Wingman

    [Diagram: Metasearch worker composed with TranSend, both fronting the Internet]
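
Composition of this kind can be sketched as an aggregator (metasearch over several engines) whose output is fed through a TranSend-style transformer, yielding a new service without new worker code. Engine names and functions below are hypothetical:

```python
# Sketch of service composition: aggregation piped into transformation.
def engine_a(query):
    return [f"a:{query}/1", f"a:{query}/2"]

def engine_b(query):
    return [f"b:{query}/1"]

def metasearch(query):
    # Aggregation: many sources, one merged result list
    results = []
    for engine in (engine_a, engine_b):
        results.extend(engine(query))
    return results

def transend(page, limit=10):
    # Transformation: distill each result for a constrained client
    return page[:limit]

def composed_service(query):
    return [transend(r) for r in metasearch(query)]

hits = composed_service("tacc")
```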


    Experience With Real Users

    • Transparent enhancements

    • Minimal downtime

    • Low administration cost

      • Multicast-based administration GUI

    • Virtually no dedicated resources at UCB

      • “Overflow pool” of ~100 UltraSPARC servers

    • Users don’t mind relying on middleware proxy


    Why Now?

    • Internet’s critical mass

    • Commercial push for many device types (transistor curves)

    • Cluster computing economically viable

    • A good time for infrastructural services


    Related Work

    • Transformational proxy services: WBI, Strands

    • Application partitioning: Wit, InfoPad, PARC Ubiquitous Computing

    • Computing in the infrastructure: Active Networks

    • Soft state for simplicity and robustness: Microsoft Tiger, multicast routing protocols


    Summary of Contributions

    • TACC, a composition-based Internet services programming model

      • captures rich variety of apps

      • one view of customization

    • No-hassle deployment on a cluster

      • Automatic and robust partial-failure handling

      • Availability & scaling strategies work in practice

    • New apps are easy to write, deploy, debug

      • SNS behaviors are free

      • Compose existing services to enable new clients


    Non-Contributions (a/k/a Future Work)

    Accidental contributions:

    • Legacy code glue

    • Cheap test rig for the next project (prototyping path discovery; a bare-bones “cluster OS”)

      Non-contributions:

    • Fair resource allocation over cluster

    • Built-in security abstractions

    • Rich state management abstractions


    What We Really Learned

    • Design for failure

      • It will fail anyway

      • End-to-end argument applied to availability

    • Orthogonality is even better than layering

      • Narrow interface vs. no interface

      • A great way to manage system complexity

      • The price of orthogonality

      • Techniques: Refreshable soft state; watchdogs/timeouts; sandboxing
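
The watchdog/timeout technique listed above can be sketched as a peer that bounds retries, restarting a hung component on each timeout so transient faults stay invisible to the caller. A hypothetical illustration of the pattern, not the TACC code:

```python
# Sketch of the pervasive timeout-and-restart strategy: process peers
# watch each other; a timeout triggers a restart and a bounded retry.

class Worker:
    def __init__(self):
        self.healthy = False  # first incarnation is broken

    def handle(self, request):
        if not self.healthy:
            raise TimeoutError("worker hung")
        return f"ok:{request}"

def restart():
    w = Worker()
    w.healthy = True   # restarted instance comes up clean
    return w

def watchdog_call(worker, request, retries=2):
    for _ in range(retries + 1):
        try:
            return worker.handle(request), worker
        except TimeoutError:
            worker = restart()   # timeout -> restart the peer, retry
    return None, worker          # give up after bounded retries

reply, worker = watchdog_call(Worker(), "r1")
```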


    Future Work

    • TACC as test rig for Ninja

    • Taxonomy of app structure and platforms

      • What is the “big picture” of different types of Internet services, and where does TACC fit in?

      • Joint work with Dr. Murray Mazer at the Open Group Research Institute

    • Apply TACC lessons to building reliable distributed systems

    • Formalize programming model

