Verified computation with probabilities

Verified computation with probabilities

This presentation is animated. (Press F5 to display.)

Scott Ferson, scott@ramas.com

Applied Biomathematics

Scientific Computing, Computer Arithmetic and Verified Numerical Computations

El Paso, Texas, 3 October 2008


Perspective

  • Very elementary methods of interval analysis

    • Low-dimensional, static

    • Verified computing (but not roundoff error)

    • Huge uncertainties

  • Intervals combined with probability theory

    • Total probabilities (events)

    • Probability distributions (random variables)


Bounding probability is an old idea

  • Boole and de Morgan

  • Chebyshev and Markov

  • Borel and Fréchet

  • Kolmogorov and Keynes

  • Berger and Walley

  • Williamson and Downs


Terminology

  • Dependence = stochastic dependence

    • More general than repeated variables

  • Independence = stochastic independence

  • Best possible = tight (almost)

    • some elements in the set may not be possible


    Incertitude

    • Arises from incomplete knowledge

    • Incertitude arises from

      • limited sample size

      • measurement uncertainty or surrogate data

      • doubt about the model

    • Reducible with empirical effort


    Variability

    • Arises from natural stochasticity

    • Variability arises from

      • scatter and variation

      • spatial or temporal fluctuations

      • manufacturing or genetic differences

    • Not reducible by empirical effort


    They must be treated differently

    • Variability is randomness

      Needs probability theory

    • Incertitude is ignorance

      Needs interval analysis

    • Imprecise probabilities can do both at once


    Risk assessment applications

    • Environmental pollution

      heavy metals, pesticides, PMx, ozone, PCBs, EMF, RF, etc.

    • Engineered systems

      traffic safety, bridge design, airplanes, spacecraft, nuclear plants

    • Financial investments

      portfolio planning, consultation, instrument evaluation

    • Occupational hazards

      manufacturing and factory workers, farm workers, hospital staff

    • Food safety and consumer product safety

      benzene in Perrier, E. coli in beef, salmonella in tomato, children’s toys

    • Ecosystems and biological resources

      endangered species, fisheries and reserve management




    Total probabilities (events)

    • Fault or event trees

    • Logical expressions (Hailperin 1986)

    • Reliability analyses

      • Nuclear power plants

      • Aircraft safety system design

      • Gene technology release assessments

      • etc.


    Interval arithmetic for probabilities

    x + y = [x1 + y1, x2 + y2]

    x − y = [x1 − y2, x2 − y1]

    x × y = [x1 y1, x2 y2]

    x ÷ y = [x1 / y2, x2 / y1]

    min(x, y) = [min(x1, y1), min(x2, y2)]

    max(x, y) = [max(x1, y1), max(x2, y2)]

    Rules are simpler because the intervals are confined to [0,1]
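
    A minimal illustrative sketch of these rules in Python (the slides themselves use Risk Calc-style notation), assuming each probability interval is a (lo, hi) pair already confined to [0, 1]; because the endpoints are nonnegative, multiplication and division need no case analysis over signs:

    def p_add(x, y):   # x + y = [x1 + y1, x2 + y2]
        return (x[0] + y[0], x[1] + y[1])

    def p_sub(x, y):   # x - y = [x1 - y2, x2 - y1]
        return (x[0] - y[1], x[1] - y[0])

    def p_mul(x, y):   # x * y = [x1*y1, x2*y2], valid since endpoints >= 0
        return (x[0] * y[0], x[1] * y[1])

    def p_div(x, y):   # x / y = [x1/y2, x2/y1], requires y1 > 0
        return (x[0] / y[1], x[1] / y[0])

    def p_min(x, y):
        return (min(x[0], y[0]), min(x[1], y[1]))

    def p_max(x, y):
        return (max(x[0], y[0]), max(x[1], y[1]))

    # Sums and differences of probabilities may stray outside [0, 1];
    # in practice the result is intersected with [0, 1].
    print(p_mul((0.3, 0.5), (0.1, 0.2)))   # (0.03, 0.1), the independent conjunction used below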


    Probabilistic logic

    • Conjunctions (and)

    • Disjunctions (or)

    • Negations (not)

    • Exclusive disjunctions (xor)

    • Modus ponens (if-then)

    • etc.


    Conjunction (and)

    P(A |&| B) = P(A)  P(B)

    Example: P(A) = [0.3, 0.5]

    P(B) = [0.1, 0.2]

    A and B are independent

    P(A |&| B) = [0.03, 0.1]


    Stochastic dependence

    [Venn diagrams contrasting stochastic dependence with independence; probabilities are areas in the diagrams]


    Fréchet case

    P(A & B)=[max(0, P(A) + P(B)–1), min(P(A), P(B))]

    • Makes no assumption about the dependence

    • Rigorous (guaranteed to enclose true value)

    • Best possible (cannot be any tighter)


    Fréchet examples

    Examples: P(A) = [0.3, 0.5]   (uncertain, interval inputs)

    P(B) = [0.1, 0.2]

    P(A & B) = [0, 0.2]

    P(C) = 0.29   (certain, point inputs)

    P(D) = 0.22

    P(C & D) = [0, 0.22]
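
    A small illustrative check of these Fréchet examples in Python (not the presentation's own code); each probability is a (lo, hi) interval, and the bounds below apply the Fréchet conjunction to the interval endpoints:

    # Frechet bounds for P(A & B) with no assumption about dependence,
    # applied to interval probabilities represented as (lo, hi) pairs.
    def frechet_and(pa, pb):
        lo = max(0.0, pa[0] + pb[0] - 1.0)
        hi = min(pa[1], pb[1])
        return (lo, hi)

    print(frechet_and((0.3, 0.5), (0.1, 0.2)))      # (0.0, 0.2)
    print(frechet_and((0.29, 0.29), (0.22, 0.22)))  # (0.0, 0.22): precise inputs, interval answer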


    Example: pump system

    [Schematic of the pump system: pump, reservoir, pressure tank T, outlet valve, pressure switch S, timer relay R, relays K1 and K2, switch S1, and motor]

    What’s the risk the tank ruptures under pumping?


    Fault tree

    [Fault tree for the top event E1, built from AND and OR gates over intermediate events E2–E5 and basic events T, S, S1, K1, K2, and R]

    Boolean expression of “tank rupturing under pressure” (top event E1)

    Unsure about the probabilities and their dependencies


    Results

    [Figure: probability of the tank rupturing under pumping (axis roughly 10^-5 to 10^-3), computed four ways: point probabilities with independence, mixed dependencies, Fréchet (no dependence assumption), and intervals with Fréchet]


    Interval probabilities

    • Allow verified computing of reliabilities

    • Distinguish two forms of uncertainty

    • Rigorous bounds always easy to get

    • Best possible bounds may need mathematical programming because of repeated variables



    Typical problems

    • Sometimes little or even no actual data

      • Updating is rarely used

  • Very simple arithmetic calculations

    • Occasionally, finite element meshes or differential equations

  • Usually small in number of inputs

    • Nuclear power plants are the exception

  • Results are important and often high profile

    • But the approach is being used ever more widely


    Example: pesticides & farmworkers

    • Total dose is decomposed by pathway

      • Dermal exposure on hands

      • Exposures to rest of body

      • Inhalation

        (concentration in air, exposure duration, breathing rate, penetration factor, absorption efficiency)

    • Takes account of related factors

      • Acres, gallons per acre, mixing time

      • Body mass, frequency of hand washing, etc.


    Worst-case analysis

    • Traditional method

    • Mix of deterministic and extreme values

    • Actually a kind of interval analysis

    • Says how bad it could be, but not how unlikely that outcome is


    Probabilistic analysis

    • State-of-the-art method

    • Usually via Monte Carlo simulation

    • Requires the full joint distribution

      • All the distributions for every input variable

      • All their intervariable dependencies

    • Often we need to guess about a lot of it


    What’s needed

    • Reliable, conservative assessments of tail risks

    • Using available information but without forcing analysts to make unjustified assumptions

    • Neither computationally expensive nor intellectually taxing


    Probability bounds analysis

    • Marries intervals with probability theory

    • Distinguishes variability and incertitude

    • Solves many problems in uncertainty analysis

      • Input distributions unknown

      • Imperfectly known correlation and dependency

      • Large measurement error, censoring, small sample sizes

      • Model uncertainty


    Calculations

    • All standard mathematical operations

      • Arithmetic operations (+, −, ×, ÷, ^, min, max)

      • Logical operations (and, or, not, if, etc.)

      • Transformations (exp, ln, sin, tan, abs, sqrt, etc.)

      • Backcalculation (tolerance solutions, deconvolutions, updating)

      • Magnitude comparisons (<, ≤, >, ≥, etc.)

      • Other operations (envelope, mixture, etc.)

    • Faster than Monte Carlo

    • Guaranteed to bound the answer

    • Good solutions often easy to compute


    Probability box (p-box)

    Interval bounds on a cumulative distribution function (CDF)

    [Figure: a p-box plotted as cumulative probability (0 to 1) against X (0.0 to 3.0)]


    Duality

    • Bounds on the probability at a value

      Chance the value will be 15 or less is between 0 and 25%

    • Bounds on the value at a probability

      95th percentile is between 40 and 70

    [Figure: a p-box over X from 0 to 80 illustrating both readings of the bounds]


    Uncertain numbers

    [Three panels plotting cumulative probability (0 to 1) against X: a probability distribution (a single CDF), a probability box (bounds on a CDF), and an interval]


    Probability bounds arithmetic

    [Two p-boxes plotted as cumulative probability (0 to 1): A over 0 to 6, B over 0 to 14]

    What’s the sum of A+B?


    Cartesian product

    A = mix(1, [1,3], 1, [2,4], 1, [3,5])

    B = mix(1, [2,8], 1, [6,10], 1, [8,12])

    A+B under independence:

                     B ∈ [2,8]        B ∈ [6,10]       B ∈ [8,12]
                     q1 = 1/3         q2 = 1/3         q3 = 1/3

    A ∈ [1,3]        A+B ∈ [3,11]     A+B ∈ [7,13]     A+B ∈ [9,15]
    p1 = 1/3         prob = 1/9       prob = 1/9       prob = 1/9

    A ∈ [2,4]        A+B ∈ [4,12]     A+B ∈ [8,14]     A+B ∈ [10,16]
    p2 = 1/3         prob = 1/9       prob = 1/9       prob = 1/9

    A ∈ [3,5]        A+B ∈ [5,13]     A+B ∈ [9,15]     A+B ∈ [11,17]
    p3 = 1/3         prob = 1/9       prob = 1/9       prob = 1/9

    A |+| B

    ~(range=[3,17], mean=[7.3,14], var=[0,22])

    A + B

    ~(range=[3,17], mean=[7.3,14], var=[0,38])
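
    A sketch of this Cartesian product in Python (a reconstruction for illustration, not the authors' code): each p-box is discretized into (interval, mass) pairs, each cell gets the interval sum with the product of masses under independence, and the range and mean bounds can be read off the cells:

    A = [((1, 3), 1/3), ((2, 4), 1/3), ((3, 5), 1/3)]
    B = [((2, 8), 1/3), ((6, 10), 1/3), ((8, 12), 1/3)]

    # Cartesian product: the interval sum of each pair of focal elements,
    # carrying mass p*q under the independence assumption.
    cells = [((al + bl, ah + bh), p * q)
             for (al, ah), p in A
             for (bl, bh), q in B]

    rng = (min(lo for (lo, hi), m in cells), max(hi for (lo, hi), m in cells))
    mean_lo = sum(lo * m for (lo, hi), m in cells)
    mean_hi = sum(hi * m for (lo, hi), m in cells)
    print(rng, mean_lo, mean_hi)   # (3, 17) 7.33... 14.0 -- matches range=[3,17], mean=[7.3,14]

    # Bounds on P(A+B <= z): mass certainly at or below z versus mass
    # that could be at or below z.
    def cdf_bounds(z):
        return (sum(m for (lo, hi), m in cells if hi <= z),
                sum(m for (lo, hi), m in cells if lo <= z))

    print(cdf_bounds(9))           # (0.0, 0.777...)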


    A+B under independence

    [Figure: the p-box for A+B under independence, cumulative probability 0 to 1 over values 0 to 18]


    Generalization of methods

    • When inputs are distributions, the answers conform with probability theory

    • When inputs are intervals, the answers agree with interval analysis


    Where do we get p-boxes?

    • Assumption

    • Modeling

    • Robust Bayes analysis

    • Constraint propagation

    • Data with incertitude

      • Measurement error

      • Sampling error

      • Censoring


    A tale of two data sets

    [Two panels plotting the data sets Skinny (n = 6) and Puffy (n = 9) as intervals along an x-axis from 0 to 10]


    Empirical distributions

    [Two panels: empirical cumulative distributions (p-boxes) for Skinny and Puffy, cumulative probability 0 to 1 over x from 0 to 10]
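
    A brief illustrative sketch (with hypothetical data, not the Skinny or Puffy values) of how interval-valued measurements produce such empirical bounds: the left bound of the empirical CDF steps up at the data intervals' left endpoints, the right bound at their right endpoints:

    # Hypothetical (lo, hi) measurement intervals; NOT the slides' data.
    data = [(1.0, 2.5), (2.0, 6.0), (3.5, 4.0), (5.0, 9.5), (6.0, 8.0), (7.0, 9.0)]
    n = len(data)

    def empirical_cdf_bounds(x):
        upper = sum(1 for lo, hi in data if lo <= x) / n   # could be at or below x
        lower = sum(1 for lo, hi in data if hi <= x) / n   # certainly at or below x
        return (lower, upper)

    for x in (2, 4, 6, 8, 10):
        print(x, empirical_cdf_bounds(x))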


    Fitted distributions

    [Two panels: distributions fitted to the Skinny and Puffy data, cumulative probability 0 to 1 over x from 0 to 20]
    Statisticians often ignore the incertitude, or treat it as though it were (uniform) variability.


    Constraint propagation

    [Figure: p-box for a lognormal with interval parameters]


    Maximum entropy erases uncertainty

    [Nine panels showing maximum-entropy distributions for different kinds of partial information: min,max; mean,std; positive,quantile; min,max,mean; min,mean; sample data,range; integral,min,max; min,max,mean,std; and a p-box (lognormal with interval parameters)]

    Example: PCBs and duck hunters

    Location: Massachusetts and Connecticut

    Receptor: Adult human hunters of waterfowl

    Contaminant: PCBs (polychlorinated biphenyls)

    Exposure route: dietary consumption of contaminated waterfowl

    Based on the assessment for non-cancer risks from PCB to adult hunters who consume contaminated waterfowl described in Human Health Risk Assessment: GE/Housatonic River Site: Rest of River, Volume IV, DCN: GE-031203-ABMP, April 2003, Weston Solutions (West Chester, Pennsylvania), Avatar Environmental (Exton, Pennsylvania), and Applied Biomathematics (Setauket, New York).


    Hazard quotient

    EF = mmms(1, 52, 5.4, 10) meals per year // exposure frequency, censored data, n = 23

    IR = mmms(1.5, 675, 188, 113) grams per meal //poultry ingestion rate from EPA’s EFH

    C = [7.1, 9.73] mg per kg //exposure point (mean) concentration

    LOSS = 0 // loss due to cooking

    AT = 365.25 days per year //averaging time (not just units conversion)

    BW = mixture(BWfemale, BWmale) // Brainard and Burmaster (1992)

    BWmale = lognormal(171, 30) pounds // adult male n = 9,983

    BWfemale = lognormal(145, 30) pounds // adult female n = 10,339

    RfD = 0.00002 mg per kg per day // reference dose considered tolerable

    HQ = (EF |*| IR |*| C |*| (1|-| LOSS)) |/| (AT |*| BW |*| RfD)
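
    A rough interval-only sketch of the same expression in Python (illustrative, not the assessment's calculation), carrying just the [min, max] ranges of the interval-valued inputs and making the unit conversions explicit; the body-weight bounds are a hypothetical placeholder, since BW is really a mixture of lognormals:

    # Interval hazard quotient: all quantities are (lo, hi) pairs with
    # nonnegative endpoints, so * and / act endpoint-wise.
    def mul(x, y): return (x[0] * y[0], x[1] * y[1])
    def div(x, y): return (x[0] / y[1], x[1] / y[0])

    EF   = (1.0, 52.0)        # meals per year (range of the mmms p-box)
    IR   = (1.5, 675.0)       # grams per meal (range of the mmms p-box)
    C    = (7.1, 9.73)        # mg PCB per kg of waterfowl
    LOSS = (0.0, 0.0)         # loss due to cooking
    AT   = (365.25, 365.25)   # days per year
    BW   = (90.0, 260.0)      # pounds -- hypothetical bounds, not from the slides
    RfD  = (2e-5, 2e-5)       # mg per kg body weight per day

    G_PER_KG = 1e-3           # grams of food -> kilograms of food
    LB_TO_KG = 0.4536         # pounds -> kilograms

    intake = mul(mul(mul(EF, IR), C), (1 - LOSS[1], 1 - LOSS[0]))   # mg per year, food in grams
    intake = (intake[0] * G_PER_KG, intake[1] * G_PER_KG)
    bw_kg = (BW[0] * LB_TO_KG, BW[1] * LB_TO_KG)
    tolerable = mul(mul(AT, bw_kg), RfD)                            # mg per year deemed tolerable

    HQ = div(intake, tolerable)
    print(HQ)   # roughly (0.01, 1150) with these hypothetical body-weight bounds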


    Inputs

    [Four panels plotting exceedance risk (= 1 − CDF) for the inputs: EF (meals per year, 0 to 60), IR (grams per meal, 0 to 600), BW (pounds, 0 to 300, with separate curves for males and females), and C (mg per kg, 0 to 20)]


    Automatically verified results

    EF = mmms(1, 52, 5.4, 10) *units('meals per year') // exposure frequency, censored data, n = 23

    IR = mmms(1.5, 675, 188, 113) *units('grams per meal') // poultry ingestion rate from EPA’s EFH

    C = [7.1, 9.73] *units('mg per kg') // exposure point (mean) concentration

    LOSS = 0 // loss due to cooking

    AT = 365.25 *units('days per year') // averaging time (not just units conversion)

    BWmale = lognormal(171, 30) *units('pounds') // adult male n = 9,983

    BWfemale = lognormal(145, 30) *units('pounds') // adult female n = 10,339

    BW = mixture(BWfemale, BWmale) // Brainard and Burmaster (1992)

    RfD = 0.00002 *units('mg per kg per day') // reference dose considered tolerable

    HQ = (EF |*| IR |*| C |*| (1|-| LOSS)) / (AT |*| BW |*| RfD)

    HQ

    ~(range=[5.52769,557906], mean=[1726,13844], var=[0,7e+09]) meals grams meal-1 days-1 pounds-1 day

    HQ = HQ + 0

    HQ

    ~(range=[0.0121865,1229.97], mean=[3.8,30.6], var=[0,34557])

    mean(HQ)

    [ 3.8072899212, 30.520271393]

    sd(HQ)

    [ 0, 185.89511004]

    median(HQ)

    [ 0.66730580299, 54.338212132]

    cut(HQ,95%)

    [ 3.5745830321, 383.33656983]

    range(HQ)

    [ 0.012186461038, 1229.9710013]

    [Figure: exceedance risk curve for HQ (0 to 1000), annotated with mean [3.8, 31], standard deviation [0, 186], median [0.6, 55], 95th percentile [3.5, 384], range [0.01, 1230]]


    Uncertainty about distribution shape

    • PBA propagates uncertainty about distribution shape comprehensively

    • Neither sensitivity studies nor second-order Monte Carlo can really do this

    • Maximum entropy hides uncertainty



    Dependence

    • Not all variables are independent

      • Body size and skin surface area

      • Common-cause variables

      • Default risks for mortgages

    • Known dependencies should be modeled

    • What can we do when we don’t know them?


    Uncertainty about dependence

    • Sensitivity analyses usually used

      • Vary correlation coefficient between 1 and +1

    • But this underestimates the true uncertainty

      • Example: suppose X, Y ~ uniform(0,25) but we don’t know the dependence between X and Y


    Unknown dependence

    [Figure: Fréchet bounds on the distribution of X + Y (0 to 50) for X, Y ~ uniform(0,25), dependence unknown]


    Varying correlation between 1 and +1

    [Figure: distributions of X + Y (0 to 50) for X, Y ~ uniform(0,25) as the Pearson correlation varies between −1 and +1]


    Unknown but positive dependence

    [Figure: bounds on the distribution of X + Y (0 to 50) for X, Y ~ uniform(0,25) under unknown but positive dependence]


    Uncertainty about dependence

    • Can’t be studied with sensitivity analysis since it’s an infinite-dimensional problem

    • Fréchet bounding lets you be sure

    • Intermediate knowledge can be exploited

    • Dependence can have large or small effect
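
    A discretized sketch in Python (for illustration, not the authors' algorithm) of the no-assumption bounds on P(X + Y ≤ z) for the X, Y ~ uniform(0,25) example, using the classical pointwise formulas lower(z) = sup_x max(F(x) + F(z−x) − 1, 0) and upper(z) = inf_x min(F(x) + F(z−x), 1), evaluated by a crude grid scan:

    def F(x):                                  # CDF of uniform(0, 25)
        return min(max(x / 25.0, 0.0), 1.0)

    def frechet_sum_bounds(z, steps=5000):
        xs = [50.0 * i / steps for i in range(steps + 1)]
        lower = max(max(F(x) + F(z - x) - 1.0, 0.0) for x in xs)
        upper = min(min(F(x) + F(z - x), 1.0) for x in xs)
        return (lower, upper)

    for z in (10, 25, 40):
        print(z, frechet_sum_bounds(z))
    # Roughly (0, 0.4) at z = 10, (0, 1) at z = 25, (0.6, 1) at z = 40:
    # much wider than the envelope traced out by varying a correlation coefficient.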



    Backcalculation

    Generalization of tolerance solutions

    E.g., from p-boxes for A and C, find B such that A + B ⊆ C

    Needed for planning environmental cleanups, designing structures, etc.
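
    For plain intervals the tolerance solution is explicit and gives the flavor of the p-box case; a minimal sketch in Python (illustrative only, not from the presentation):

    # Interval backcalculation: given A = [a1, a2] and C = [c1, c2], the widest
    # B = [b1, b2] with A + B inside C is B = [c1 - a1, c2 - a2], which exists
    # only when C is at least as wide as A.
    def backcalc(A, C):
        b1, b2 = C[0] - A[0], C[1] - A[1]
        if b1 > b2:
            raise ValueError("no tolerance solution: C is narrower than A")
        return (b1, b2)

    A = (2.0, 5.0)
    C = (0.0, 10.0)
    B = backcalc(A, C)
    print(B)                                # (-2.0, 5.0)
    print((A[0] + B[0], A[1] + B[1]))       # (0.0, 10.0): A + B stays inside C
    # Naive interval subtraction C - A = (-5.0, 8.0) would NOT do:
    # A + (C - A) = (-3.0, 13.0), which spills outside C.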


    Backcalculation with p-boxes

    A = normal(5, 1)

    C = {0  C, median  1.5, 90th%ile  35, max  50}

    [Figures: A plotted over 2 to 8 and the constraint p-box C plotted over 0 to 60, cumulative probability 0 to 1]


    Getting the answer

    Basically reverses the forward convolution

    Any distribution totally inside B is sure to satisfy the constraint

    Many possible B’s

    [Figure: the backcalculated p-box B, plotted over −10 to 50]


    Check it by plugging it back in

    A + B = C*  C

    [Figure: C* superimposed on C (−10 to 60), with C* lying inside C]


    Backcalculation

    Backcalculation wider under independence (narrower under Fréchet)

    Monte Carlo methods don’t generally work except in a trial-and-error approach

    Precise distributions can’t express the target



    Probability vs. intervals

    • Probability theory

      • Handles likelihoods and dependence well

      • Has an inadequate model of ignorance

      • Lying: saying more than you really know

    • Interval analysis

      • Handles epistemic uncertainty (ignorance) well

      • Inadequately models frequency and dependence

      • Cowardice: saying less than you know


    Probability bounds analysis

    • Generalizes them to escape the limits of each

    • Makes verified calculations about probabilities

      • Using whatever knowledge is available

      • Without requiring unjustified assumptions

    • Well developed methodology

      • But plenty of interesting and important questions remain for study


    Diverse applications

    • Human health risk analyses

    • Conservation biology extinction/reintroduction

    • Wildlife contaminant exposure analyses

    • Chemostat dynamics

    • Global warming forecasts

    • Design of spacecraft

    • Safety of engineered systems (e.g., bridges)


    What p-boxes can’t do

    • Give best-possible bounds on non-tail risks

    • Conveniently get best-possible bounds when dependencies are subtle

    • Show what’s most likely within the box


    Acknowledgments

    Lev Ginzburg

    Rüdiger Kuhn

    Vladik Kreinovich

    David Myers

    Bill Oberkampf

    Janos Hajagos

    Dan Berleant

    Chris Paredis

    Mark Stadtherr

    NIH

    Sandia National Labs

    NASA

    Applied Biomathematics

    Electric Power Research Institute




    Research topics

    • Incorporation of ancillary information

    • Propagation through black boxes

    • Handling subtle dependencies

    • Computing non-tail risks

    • Combination with fuzzy numbers

    • Decision theory

    • Info-gap models

    • Finite-element models

    • ODEs and PDEs