Primer on Probability

Sushmita Roy

BMI/CS 576

Sep 25th, 2012

Definition of probability
  • frequentist interpretation: the probability of an event from a random experiment is the proportion of the time that events of the same kind will occur in the long run, when the experiment is repeated
  • examples
    • the probability my flight to Chicago will be on time
    • the probability this ticket will win the lottery
    • the probability it will rain tomorrow
  • always a number in the interval [0,1]
    • 0 means “never occurs”
    • 1 means “always occurs”
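
The frequentist reading above can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the event probability 0.3, and the fixed seed are assumptions chosen for illustration, not part of the original slides.

```python
import random

def relative_frequency(p_event, n_trials, seed=0):
    """Fraction of n_trials repetitions in which an event of probability p_event occurs."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    hits = sum(rng.random() < p_event for _ in range(n_trials))
    return hits / n_trials

# The observed proportion tends to settle near p_event as the experiment is repeated more often.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```

The result is always a proportion, so it lies in [0,1] by construction.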

Sample spaces
  • sample space: the set of possible outcomes of a random experiment
  • event: a subset of the sample space
  • examples
    • flight to Chicago: {on time, late}
    • lottery: {ticket 1 wins, ticket 2 wins,…,ticket n wins}
    • weather tomorrow: {rain, not rain} or {sun, rain, snow} or {sun, clouds, rain, snow, sleet} or…

Random variables
  • random variable: a function associating a value with an attribute of the outcome of an experiment
  • example
    • X represents the outcome of my flight to Chicago
    • we write the probability of my flight being on time as P(X = on-time)
    • or when it’s clear which variable we’re referring to, we may use the shorthand P(on-time)
  • uppercase letters and capitalized words denote random variables
  • lowercase letters and uncapitalized words denote values
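  • we’ll denote a particular value for a variable as follows: P(X = x), e.g. P(Fever = true)
  • we’ll also use the shorthand form P(x) for P(X = x)
  • for Boolean random variables, we’ll use the shorthand P(fever) for P(Fever = true) and P(¬fever) for P(Fever = false)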

Probability distributions
  • if X is a random variable, the function given by P(X = x) for each x is the probability distribution of X
  • requirements:
    • $P(x) \geq 0$ for every x
    • $\sum_x P(x) = 1$
Joint distributions
  • joint probability distribution: the function given by P(X = x, Y = y)
  • read “X equals x and Y equals y”
  • example: the probability that it’s sunny and my flight is on time

Marginal distributions
  • the marginal distribution of X is defined by
      $$P(x) = \sum_y P(x, y)$$
    “the distribution of X ignoring other variables”
  • this definition generalizes to more than two variables, e.g.
      $$P(x) = \sum_y \sum_z P(x, y, z)$$
Marginal distribution example

[tables not preserved in the transcript: joint distribution; marginal distribution for X]
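
Since the slide's tables are not available, here is a minimal sketch of marginalization on a made-up joint distribution. The variable names and every probability in `joint` are hypothetical; only the operation (summing the joint over the variable being ignored) comes from the definition.

```python
from collections import defaultdict

# Hypothetical joint distribution P(Weather, Flight); the numbers are invented for illustration.
joint = {
    ("sun", "on-time"): 0.40,
    ("sun", "late"): 0.10,
    ("rain", "on-time"): 0.25,
    ("rain", "late"): 0.25,
}

def marginal(joint, axis):
    """Marginal of the variable at position `axis`, summing the joint over the other variable."""
    result = defaultdict(float)
    for outcome, prob in joint.items():
        result[outcome[axis]] += prob
    return dict(result)

print(marginal(joint, 0))  # distribution of Weather, ignoring Flight
print(marginal(joint, 1))  # distribution of Flight, ignoring Weather
```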

Conditional distributions
  • the conditional distribution of X given Y is defined as:
      $$P(x \mid y) = \frac{P(x, y)}{P(y)}$$
    “the distribution of X given that we know the value of Y”

Conditional distribution example

[tables not preserved in the transcript: conditional distribution for X; joint distribution]
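
As a stand-in for the missing tables, the sketch below conditions a made-up joint distribution on one value of Y by dividing the matching joint entries by the marginal probability of that value. The numbers and the helper name `conditional` are assumptions for illustration only.

```python
# Hypothetical joint distribution P(Weather, Flight); the numbers are invented for illustration.
joint = {
    ("sun", "on-time"): 0.40,
    ("sun", "late"): 0.10,
    ("rain", "on-time"): 0.25,
    ("rain", "late"): 0.25,
}

def conditional(joint, y):
    """Return P(X | Y = y) from a joint given as {(x, y): probability}."""
    p_y = sum(p for (_, y_val), p in joint.items() if y_val == y)  # marginal P(Y = y)
    if p_y == 0:
        raise ValueError("cannot condition on a value with probability 0")
    return {x: p / p_y for (x, y_val), p in joint.items() if y_val == y}

print(conditional(joint, "late"))  # distribution of Weather given that the flight is late
```

Dividing by P(Y = y) rescales the selected entries so that they sum to 1 again.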

Independence
  • two random variables, X and Y, are independent if
      $$P(x, y) = P(x)\,P(y) \quad \text{for all } x \text{ and } y$$
Independence example #1

[tables not preserved in the transcript: joint distribution; marginal distributions]

Are X and Y independent here?


Independence example #2

[tables not preserved in the transcript: joint distribution; marginal distributions]

Are X and Y independent here?
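
The question on these two slides can be answered mechanically: compute both marginals and compare each joint entry with the product of the matching marginal entries. The sketch below does this for two made-up joint tables (they are not the tables from the slides); `is_independent` is a hypothetical helper name.

```python
import math

def is_independent(joint, tol=1e-9):
    """True if P(x, y) = P(x) * P(y) for every pair, for a joint given as {(x, y): probability}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        math.isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=tol)
        for x in xs
        for y in ys
    )

# Invented table in which every entry equals the product of its marginals (0.6/0.4 and 0.7/0.3).
independent_joint = {
    ("sun", "on-time"): 0.42, ("sun", "late"): 0.18,
    ("rain", "on-time"): 0.28, ("rain", "late"): 0.12,
}
# Invented table in which P(sun, on-time) = 0.40 differs from P(sun) * P(on-time) = 0.325.
dependent_joint = {
    ("sun", "on-time"): 0.40, ("sun", "late"): 0.10,
    ("rain", "on-time"): 0.25, ("rain", "late"): 0.25,
}

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
```

A single failing pair is enough to rule out independence, which is why the check covers every combination of values.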


Conditional independence
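  • two random variables X and Y are conditionally independent given Z if
      $$P(X \mid Y, Z) = P(X \mid Z)$$
    • “once you know the value of Z, knowing Y doesn’t tell you anything about X”
  • alternatively
      $$P(x, y \mid z) = P(x \mid z)\,P(y \mid z) \quad \text{for all } x, y, z$$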
Conditional independence example

Are Fever and Headache independent?


Conditional independence example

Are Fever and Vomit conditionally independent given Flu?
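
The data for this example did not survive in the transcript, so the sketch below builds a hypothetical joint over (Fever, Vomit, Flu) and checks the product form of conditional independence for each value of Flu. All probabilities are invented, and the joint is deliberately constructed so that the check succeeds.

```python
import math
from itertools import product

def is_conditionally_independent(joint, tol=1e-9):
    """Check P(x, y | z) = P(x | z) * P(y | z) for a joint given as {(x, y, z): probability}."""
    xs = {k[0] for k in joint}
    ys = {k[1] for k in joint}
    zs = {k[2] for k in joint}
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x, y in product(xs, ys))
        if p_z == 0:
            continue  # conditioning on a zero-probability value is undefined, so skip it
        for x, y in product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, y2, z), 0.0) for y2 in ys) / p_z
            p_y = sum(joint.get((x2, y, z), 0.0) for x2 in xs) / p_z
            if not math.isclose(p_xy, p_x * p_y, abs_tol=tol):
                return False
    return True

# Hypothetical numbers: P(Flu), P(Fever = true | Flu), P(Vomit = true | Flu).
p_flu = {True: 0.1, False: 0.9}
p_fever_given_flu = {True: 0.9, False: 0.1}
p_vomit_given_flu = {True: 0.6, False: 0.05}

def bernoulli(p_true, value):
    """Probability of `value` for a Boolean variable that is true with probability p_true."""
    return p_true if value else 1.0 - p_true

# Joint built as P(flu) * P(fever | flu) * P(vomit | flu), i.e. conditionally independent by design.
joint = {
    (fever, vomit, flu): p_flu[flu]
    * bernoulli(p_fever_given_flu[flu], fever)
    * bernoulli(p_vomit_given_flu[flu], vomit)
    for fever, vomit, flu in product([True, False], repeat=3)
}

print(is_conditionally_independent(joint))
```

With these invented numbers Fever and Vomit are still dependent when Flu is ignored, because both become more likely when Flu is true.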


Chain rule of probability
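  • for two variables
      $$P(X, Y) = P(X \mid Y)\,P(Y)$$
  • for three variables
      $$P(X, Y, Z) = P(X \mid Y, Z)\,P(Y \mid Z)\,P(Z)$$
  • etc.
  • to see that this is true, note that
      $$P(X \mid Y, Z)\,P(Y \mid Z)\,P(Z) = \frac{P(X, Y, Z)}{P(Y, Z)} \cdot \frac{P(Y, Z)}{P(Z)} \cdot P(Z) = P(X, Y, Z)$$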
Bayes’ theorem
      $$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{P(y \mid x)\,P(x)}{\sum_{x'} P(y \mid x')\,P(x')}$$
  • this theorem is extremely useful
  • there are many cases when it is hard to estimate P(x | y) directly, but it’s not too hard to estimate P(y | x) and P(x)
Bayes theorem example
  • MDs usually aren’t good at estimating P(Disorder| Symptom)
  • they’re usually better at estimating P(Symptom| Disorder)
  • if we can estimate P(Fever| Flu) and P(Flu) we can use Bayes’ Theorem to do diagnosis
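
A minimal sketch of that diagnosis step follows. The slide names P(Fever | Flu) and P(Flu); to obtain the denominator P(Fever) the sketch also assumes an estimate of P(Fever | no Flu) and expands P(Fever) by the law of total probability. All three numbers are invented.

```python
def posterior(p_symptom_given_disorder, p_disorder, p_symptom_given_no_disorder):
    """P(disorder | symptom) by Bayes' theorem for a Boolean disorder variable."""
    p_no_disorder = 1.0 - p_disorder
    # P(symptom), expanded over the two values of the disorder variable.
    p_symptom = (
        p_symptom_given_disorder * p_disorder
        + p_symptom_given_no_disorder * p_no_disorder
    )
    return p_symptom_given_disorder * p_disorder / p_symptom

# Hypothetical estimates: P(Fever | Flu) = 0.9, P(Flu) = 0.1, P(Fever | no Flu) = 0.1.
print(posterior(0.9, 0.1, 0.1))
```

Even with a symptom that is very likely under the disorder, a low prior P(Flu) keeps the posterior well below P(Fever | Flu).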
Expected values
  • the expected value of a random variable that takes on numerical values is defined as:
      $$E[X] = \sum_x x\,P(x)$$
    this is the same thing as the mean
  • we can also talk about the expected value of a function of a random variable
      $$E[g(X)] = \sum_x g(x)\,P(x)$$
Expected value examples
  • Suppose each lottery ticket costs $1 and the winning ticket pays out $100. The probability that a particular ticket is the winning ticket is 0.001.
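
The worked numbers for this example are missing from the transcript, so here is one way to compute the expected gain from a single ticket. It assumes the $100 payout is gross, so a winning ticket nets $99 and a losing ticket nets −$1; the helper `expected_value` is a hypothetical name.

```python
def expected_value(distribution, g=lambda v: v):
    """E[g(X)] for a distribution given as {value: probability}; g defaults to the identity."""
    return sum(g(value) * prob for value, prob in distribution.items())

ticket_cost = 1.0
payout = 100.0
p_win = 0.001

# Net gain from one ticket, assuming the payout does not include a refund of the ticket price.
gain = {payout - ticket_cost: p_win, -ticket_cost: 1.0 - p_win}

print(expected_value(gain))  # roughly -0.90 dollars per ticket
```

Under these assumptions a ticket loses about 90 cents on average.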
The binomial distribution
  • distribution over the number of successes in a fixed number n of independent trials (with the same probability of success p in each)
      $$P(x) = \binom{n}{x}\,p^x (1-p)^{n-x}$$
  • e.g. the probability of x heads in n coin flips
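
A short sketch of the binomial probability for the coin-flip case, using the standard formula C(n, x) · p^x · (1 − p)^(n − x). The function name and the choice of a fair coin with n = 10 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of each possible number of heads in 10 flips of a fair coin.
for heads in range(11):
    print(heads, round(binomial_pmf(heads, 10, 0.5), 4))
```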

The multinomial distribution
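  • k possible outcomes on each trial
  • probability $p_i$ for outcome $x_i$ in each trial
  • distribution over the number of occurrences $x_i$ for each outcome in a fixed number n of independent trials
      $$P(x_1, \ldots, x_k) = \frac{n!}{\prod_i x_i!} \prod_i p_i^{x_i}$$
  • e.g. with k=6 (a six-sided die) and n=30

[example expression not preserved in the transcript; it was labeled as the vector of outcome counts]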
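
Because the slide's worked expression is missing, the sketch below evaluates the multinomial probability for a made-up vector of counts from 30 rolls of a fair die. The counts and the function name are assumptions; the formula is the standard n! / (x_1! ⋯ x_k!) · p_1^{x_1} ⋯ p_k^{x_k}.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing counts[i] occurrences of outcome i in sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # each partial quotient is an integer, so // is exact
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Hypothetical vector of outcome counts for k = 6 and n = 30 with a fair die.
counts = [5, 5, 5, 5, 5, 5]
probs = [1 / 6] * 6
print(multinomial_pmf(counts, probs))
```

With k = 2 the same function reduces to the binomial distribution.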


Statistics of alignment scores

Q: How do we assess whether an alignment provides good evidence for homology?

A: determine how likely it is that such an alignment score would result from chance.

What is “chance”?

  • real but non-homologous sequences
  • real sequences shuffled to preserve compositional properties
  • sequences generated randomly based upon a DNA/protein sequence model
Model for unrelated sequences
  • we’ll assume that each position in the alignment is sampled randomly from some distribution of amino acids
  • let $q_a$ be the probability of amino acid a
  • the probability of an n-character alignment of x and y is given by
      $$P(x, y \mid U) = \prod_{i=1}^{n} q_{x_i} \prod_{i=1}^{n} q_{y_i}$$
Model for related sequences
  • we’ll assume that each pair of aligned amino acids evolved from a common ancestor
  • let $p_{ab}$ be the probability that evolution gave rise to amino acid a in one sequence and b in another sequence
  • the probability of an alignment of x and y is given by
      $$P(x, y \mid R) = \prod_{i=1}^{n} p_{x_i y_i}$$
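Probabilistic model of alignments
  • How can we decide which possibility (U or R) is more likely?
  • one principled way is to consider the relative likelihood of the two possibilities
      $$\frac{P(x, y \mid R)}{P(x, y \mid U)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i}\,q_{y_i}}$$
  • taking the log, we get
      $$\log \frac{P(x, y \mid R)}{P(x, y \mid U)} = \sum_i \log \frac{p_{x_i y_i}}{q_{x_i}\,q_{y_i}}$$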
Probabilistic model of alignments
  • the score for an alignment is thus given by:
      $$S = \sum_i s(x_i, y_i)$$
  • the substitution matrix score for the pair a, b should thus be given by:
      $$s(a, b) = \log \frac{p_{ab}}{q_a\,q_b}$$
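
To make the log-odds construction concrete, the sketch below uses a hypothetical two-letter alphabet with invented background frequencies `q` and pair probabilities `p`, and checks that summing per-position scores gives the same number as the log of the ratio of the two model probabilities. The natural log is an arbitrary choice here; the base only rescales the scores.

```python
import math

# Hypothetical two-letter alphabet; q and p are invented, with p's marginals matching q.
q = {"A": 0.6, "B": 0.4}
p = {("A", "A"): 0.5, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.3}

def substitution_score(a, b):
    """Log-odds score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

def alignment_score(x, y):
    """Score of an ungapped alignment of equal-length sequences: the sum of per-position scores."""
    return sum(substitution_score(a, b) for a, b in zip(x, y))

def log_likelihood_ratio(x, y):
    """log of P(x, y | R) / P(x, y | U), computed directly from the two models."""
    p_related = math.prod(p[(a, b)] for a, b in zip(x, y))
    p_unrelated = math.prod(q[a] for a in x) * math.prod(q[b] for b in y)
    return math.log(p_related / p_unrelated)

x, y = "AAB", "ABB"
print(alignment_score(x, y), log_likelihood_ratio(x, y))
```

Pairs that are more likely under the related model than under the unrelated model get positive scores, and the others get negative scores.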
Scores from random alignments
  • suppose we assume
    • sequence lengths m and n
    • a particular substitution matrix and amino-acid frequencies
  • and we consider generating random sequences of lengths m and n and finding the best alignment of these sequences
  • this will give us a distribution over alignment scores for random pairs of sequences
The extreme value distribution
  • but we’re picking the best alignments, so we want to know what the distribution of max scores for alignments against a random set of sequences looks like
  • this is given by an extreme value distribution
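
The distribution of best scores for random sequence pairs can be explored empirically. The sketch below is a simplification under stated assumptions: it replaces the substitution matrix with a plain match/mismatch score, uses a linear gap penalty, draws amino acids uniformly, and uses Smith-Waterman local alignment to find the best score. It tabulates the scores but does not fit an extreme value distribution.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def local_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(y) + 1)
    best = 0
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def random_scores(m, n, trials, alphabet=AMINO_ACIDS, seed=0):
    """Best local alignment scores for `trials` random pairs of sequences of lengths m and n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores = []
    for _ in range(trials):
        x = "".join(rng.choice(alphabet) for _ in range(m))
        y = "".join(rng.choice(alphabet) for _ in range(n))
        scores.append(local_alignment_score(x, y))
    return scores

# Text histogram of the best scores from random pairs.
for score, count in sorted(Counter(random_scores(50, 50, 200)).items()):
    print(score, "#" * count)
```

Because each value is a maximum over many candidate alignments, the histogram describes best scores rather than scores of arbitrary alignments.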
Distribution of scores
  • the expected number of alignments, E, with score at least S is given by:
      $$E(S) = K\,m\,n\,e^{-\lambda S}$$
  • S is a given score threshold
  • m and n are the lengths of the sequences under consideration
  • K and λ are constants that can be calculated from
    • the substitution matrix
    • the frequencies of the individual amino acids
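
A small sketch of this calculation, using the standard form E = K · m · n · e^(−λS). The values of K and λ below are placeholders chosen only to make the example run; real values depend on the substitution matrix and the amino-acid frequencies.

```python
import math

def expected_alignments(score, m, n, K, lam):
    """Expected number of chance alignments with score at least `score`: K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)

# Placeholder constants (not derived from any real substitution matrix).
K_EXAMPLE = 0.1
LAMBDA_EXAMPLE = 0.3

for s in (20, 40, 60):
    print(s, expected_alignments(s, m=300, n=1_000_000, K=K_EXAMPLE, lam=LAMBDA_EXAMPLE))
```

E falls exponentially as the score threshold rises and grows in proportion to the product of the two lengths.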
Statistics of alignment scores
  • to generalize this to searching a database, have n represent the summed length of the sequences in the DB (adjusting for edge effects)
  • the NCBI BLAST server does just this
  • the theory for gapped alignments is not as well developed
  • computational experiments suggest this analysis holds for gapped alignments (but K and λ must be estimated from data)