
SNLP Chapter 2 Mathematical Foundation


Presentation Transcript


  1. SNLP Chapter 2 Mathematical Foundation (Artificial Intelligence Laboratory, 정 성 원)

  2. Contents – Part 1
  1. Elementary Probability Theory
  • Conditional probability
  • Bayes’ theorem
  • Random variable
  • Joint and conditional distribution

  3. Probability spaces
  • Probability theory deals with predicting how likely it is that something will happen.
  • The collection of basic outcomes (or sample points) for our experiment is called the sample space Ω.
  • An event is a subset of the sample space; the collection of events forms a σ-field F.
  • Probabilities are numbers between 0 and 1, where 0 indicates impossibility and 1 certainty.
  • A probability function (or distribution) distributes a probability mass of 1 throughout the sample space.
  • A well-founded probability space consists of a sample space Ω, a σ-field of events F, and a probability function P.

  4. Conditional probability (1/2)
  • P(A): the probability of the event A
  • Conditional probability of A given B (for P(B) > 0): P(A|B) = P(A ∩ B) / P(B)
  • Ex1> A coin is tossed 3 times.
  Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  A = {HHT, HTH, THH} : exactly 2 heads, P(A) = 3/8
  B = {HHH, HHT, HTH, HTT} : first toss is a head, P(B) = 1/2
  P(A|B) = P(A ∩ B) / P(B) = (2/8) / (1/2) = 1/2
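A minimal Python sketch (not from the original slides) that checks Ex1 by enumerating the sample space:

    # Brute-force check of Ex1: three coin tosses, uniform sample space.
    from itertools import product
    from fractions import Fraction

    omega = list(product("HT", repeat=3))          # sample space of 3 tosses
    A = {w for w in omega if w.count("H") == 2}    # exactly 2 heads
    B = {w for w in omega if w[0] == "H"}          # first toss is a head

    def P(event):
        return Fraction(len(event), len(omega))    # uniform probability function

    print(P(A), P(B))          # 3/8, 1/2
    print(P(A & B) / P(B))     # P(A|B) = (2/8) / (1/2) = 1/2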

  5. Conditional probability (2/2)
  • Multiplication rule: P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
  • Chain rule: P(A1 ∩ … ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) … P(An|A1 ∩ … ∩ An-1)
  • Two events A, B are independent if P(A ∩ B) = P(A) P(B)
  • A and B are conditionally independent given C if P(A ∩ B | C) = P(A|C) P(B|C)
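A small numeric illustration of the chain rule; the probabilities are made up for illustration only:

    # Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) * P(A2|A1) * P(A3|A1 ∩ A2),
    # e.g. the events could be the successive words of a three-word phrase.
    p_a1 = 0.1              # P(A1)            (illustrative value)
    p_a2_given_a1 = 0.01    # P(A2 | A1)       (illustrative value)
    p_a3_given_a1a2 = 0.2   # P(A3 | A1, A2)   (illustrative value)
    print(p_a1 * p_a2_given_a1 * p_a3_given_a1a2)   # joint probability: 0.0002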

  6. Bayes’ theorem (1/2)
  • Bayes’ theorem: P(B|A) = P(A|B) P(B) / P(A)
  • The denominator can be expanded by the law of total probability: P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)

  7. Bayes’ theorem (2/2)
  • Ex2> G: the event of a sentence having a parasitic gap; T: the event of the test being positive.
  • Applying Bayes’ theorem, P(G|T) = P(T|G) P(G) / P(T), which turns out to be very small even when the test is positive.
  • This poor result comes about because the prior probability of a sentence containing a parasitic gap is so low.
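A Python sketch of the Bayes computation behind Ex2; the probabilities below are placeholders, not the slide's actual figures:

    # Posterior P(G|T) = P(T|G) P(G) / P(T), with P(T) expanded by total probability.
    p_G = 1e-5              # prior: sentence has a parasitic gap   (placeholder)
    p_T_given_G = 0.95      # test positive given a parasitic gap   (placeholder)
    p_T_given_notG = 0.005  # false-positive rate                   (placeholder)

    p_T = p_T_given_G * p_G + p_T_given_notG * (1 - p_G)
    p_G_given_T = p_T_given_G * p_G / p_T
    print(p_G_given_T)      # ~0.002: tiny, because the prior P(G) is so low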

  8. Random variable
  • A random variable X maps the outcomes of the sample space to values; its probability mass function (pmf) is p(x) = P(X = x), written X ~ p(x).
  • Expectation: E[X] = Σ_x x p(x)
  • Variance: Var(X) = E[(X - E[X])²] = E[X²] - (E[X])²
  • Ex3> Let X be the sum of two dice; X takes values in S = {2, …, 12}.
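A Python sketch (illustrative, not from the slides) computing the pmf, expectation, and variance for Ex3:

    # X = sum of two dice: pmf, E[X], Var(X).
    from itertools import product
    from fractions import Fraction
    from collections import Counter

    counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
    pmf = {x: Fraction(c, 36) for x, c in counts.items()}    # p(x) = P(X = x)

    E = sum(x * p for x, p in pmf.items())                   # E[X] = 7
    Var = sum((x - E) ** 2 * p for x, p in pmf.items())      # Var(X) = 35/6
    print(E, Var)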

  9. Joint and conditional distributions
  • The joint pmf for two discrete random variables X, Y: p(x, y) = P(X = x, Y = y)
  • Marginal pmfs, which total up the probability mass for the values of each variable separately: p_X(x) = Σ_y p(x, y), p_Y(y) = Σ_x p(x, y)
  • Conditional pmf: p_{X|Y}(x|y) = p(x, y) / p_Y(y), for y such that p_Y(y) > 0
  • Chain rule: p(x, y) = p(x) p(y|x), and more generally p(w, x, y, z) = p(w) p(x|w) p(y|w, x) p(z|w, x, y)
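A Python sketch with a made-up joint pmf, showing how the marginal and conditional pmfs are obtained:

    # Marginals and a conditional pmf from a small (made-up) joint pmf.
    from fractions import Fraction as F

    p_xy = {('x1', 'y1'): F(1, 8), ('x1', 'y2'): F(3, 8),
            ('x2', 'y1'): F(2, 8), ('x2', 'y2'): F(2, 8)}

    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():            # marginals: sum out the other variable
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p

    # conditional pmf p(x|y) = p(x, y) / p_Y(y), defined where p_Y(y) > 0
    p_x_given_y1 = {x: p_xy[(x, 'y1')] / p_y['y1'] for x in p_x}
    print(p_x, p_y, p_x_given_y1)             # e.g. p(x1|y1) = 1/3, p(x2|y1) = 2/3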

  10. Contents – Part 2
  2. Essential Information Theory
  • Entropy
  • Joint entropy and conditional entropy
  • Mutual information
  • The noisy channel model
  • Relative entropy or Kullback-Leibler divergence

  11. Shannon’s information theory
  • The goal is to maximize the amount of information that can be transmitted over an imperfect communication channel such as a noisy phone line.
  • Theoretical maximum for data compression: the entropy H
  • Theoretical maximum for the transmission rate: the channel capacity

  12. Entropy (1/4)
  • The entropy H (or self-information) is the average uncertainty of a single random variable X: H(X) = -Σ_x p(x) log2 p(x), where p(x) is the pmf of X.
  • Entropy is a measure of uncertainty: the more we know about something, the lower the entropy will be.
  • We can use entropy as a measure of the quality of our models.
  • Entropy measures the amount of information in a random variable, measured in bits.

  13. Entropy (2/4)
  • [Figure] The entropy of a weighted coin: the horizontal axis shows the probability p of the coin coming up heads; the vertical axis shows the entropy of tossing the coin once.
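The curve in this figure is the binary entropy function H(p) = -p log2 p - (1 - p) log2 (1 - p); a short Python sketch (not part of the slides) evaluating it at a few biases:

    # Binary entropy of a weighted coin with heads-probability p.
    from math import log2

    def binary_entropy(p):
        if p in (0.0, 1.0):          # 0 * log 0 is taken to be 0
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 1.0):
        print(p, round(binary_entropy(p), 3))   # maximum of 1 bit at p = 0.5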

  14. Entropy (3/4)
  • Ex7> The result of rolling an 8-sided die (uniform distribution): H(X) = -Σ_x (1/8) log2 (1/8) = log2 8 = 3 bits
  • Entropy is the average length of the message needed to transmit an outcome of that variable.
  • In terms of the expectation E: H(X) = E[log2 (1 / p(X))]

  15. Entropy (4/4)
  • Ex8> Simplified Polynesian
  • We can design a code that takes, on average, only as many bits per letter as the entropy of the letter distribution.
  • Entropy can be interpreted as a measure of the size of the ‘search space’ consisting of the possible values of a random variable.
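A Python sketch of the entropy formula, applied to the uniform 8-sided die of Ex7 and to a hypothetical non-uniform letter distribution in the spirit of Ex8 (the values are illustrative, not necessarily those on the slide):

    # H(X) = -sum_x p(x) * log2 p(x), for two example distributions.
    from math import log2

    def entropy(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    die = {face: 1 / 8 for face in range(1, 9)}     # Ex7: uniform 8-sided die
    print(entropy(die))                             # 3.0 bits

    # Hypothetical non-uniform distribution over six letters.
    letters = {'p': 1/8, 't': 1/4, 'k': 1/8, 'a': 1/4, 'i': 1/8, 'u': 1/8}
    print(entropy(letters))                         # 2.5 bits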

  16. Joint entropy and conditional entropy (1/2)
  • The joint entropy of a pair of discrete random variables X, Y ~ p(x, y): H(X, Y) = -Σ_x Σ_y p(x, y) log2 p(x, y)
  • The conditional entropy: H(Y|X) = -Σ_x Σ_y p(x, y) log2 p(y|x)
  • The chain rule for entropy: H(X, Y) = H(X) + H(Y|X)
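A Python sketch (with a made-up joint pmf) that computes the joint and conditional entropies and checks the chain rule numerically:

    # Chain rule check: H(X,Y) = H(X) + H(Y|X), for a made-up joint pmf.
    from math import log2

    p_xy = {('a', 0): 0.25, ('a', 1): 0.25, ('b', 0): 0.4, ('b', 1): 0.1}

    H_XY = -sum(p * log2(p) for p in p_xy.values())

    p_x = {}
    for (x, _), p in p_xy.items():                 # marginal of X
        p_x[x] = p_x.get(x, 0.0) + p
    H_X = -sum(p * log2(p) for p in p_x.values())

    # H(Y|X) = -sum_{x,y} p(x,y) * log2 p(y|x), with p(y|x) = p(x,y) / p(x)
    H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, _y), p in p_xy.items())
    print(H_XY, H_X + H_Y_given_X)                 # equal, as the chain rule requires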

  17. Joint entropy and conditional entropy (2/2)

  18. Joint entropy and conditional entropy (2/3)
  • Ex9> Simplified Polynesian revisited
  • All words consist of sequences of CV (consonant-vowel) syllables.
  • [Table] Joint and marginal probabilities of the consonants p, t, k and the vowels a, i, u on a per-syllable basis, together with the per-letter probabilities.

  19. Joint entropy and conditional entropy (3/3)
  • [Table] Joint distribution over the consonants p, t, k and the vowels a, i, u (continued).

  20. Mutual information (1/2)
  • By the chain rule for entropy, H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y), so H(X) - H(X|Y) = H(Y) - H(Y|X) = I(X; Y), the mutual information.
  • Mutual information between X and Y: I(X; Y) = Σ_x Σ_y p(x, y) log2 [p(x, y) / (p(x) p(y))]
  • It is the amount of information one random variable contains about another (symmetric, non-negative).
  • It is 0 only when the two variables are independent.
  • It grows not only with the degree of dependence, but also with the entropy of the variables.
  • It is actually better to think of it as a measure of independence.

  21. Mutual information (2/2)
  • Since I(X; X) = H(X) - H(X|X) = H(X), entropy is also called self-information.
  • Conditional MI: I(X; Y | Z) = H(X|Z) - H(X|Y, Z)
  • Chain rule: I(X1, …, Xn; Y) = Σ_i I(Xi; Y | X1, …, Xi-1)
  • Pointwise MI between two particular outcomes x and y: I(x, y) = log2 [p(x, y) / (p(x) p(y))]
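A Python sketch (made-up joint pmf) computing mutual information via I(X; Y) = H(X) + H(Y) - H(X, Y), plus one pointwise MI value:

    # Mutual information and pointwise MI from a made-up joint pmf.
    from math import log2

    def H(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    p_xy = {('a', 0): 0.3, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.4}
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p

    I = H(p_x) + H(p_y) - H(p_xy)
    print(I)                                     # 0 only if X and Y are independent
    # pointwise MI of one particular outcome pair (x, y):
    print(log2(p_xy[('a', 0)] / (p_x['a'] * p_y[0])))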

  22. Noisy channel model (1/2)
  • Channel capacity: the rate at which one can optimally transmit information through the channel, C = max_{p(X)} I(X; Y)
  • Binary symmetric channel: each bit is flipped with probability p, so C = 1 - H(p)
  • Since entropy is non-negative, C ≤ 1 bit per use.
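A Python sketch of the binary symmetric channel's capacity, C = 1 - H(p):

    # Capacity of a binary symmetric channel with crossover probability p.
    from math import log2

    def binary_entropy(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    for p in (0.0, 0.1, 0.5):
        print(p, 1 - binary_entropy(p))   # 1.0, ~0.531, 0.0 (useless channel at p = 0.5)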

  23. Noisy channel model (2/2)

  24. Relative entropy or Kullback-Leibler divergence
  • Relative entropy for two pmfs p(x), q(x): D(p || q) = Σ_x p(x) log2 [p(x) / q(x)]
  • A measure of how different two pmfs are (the average number of extra bits needed when encoding events from p with a code built for q).
  • Non-negative, and D(p || q) = 0 if and only if p = q.
  • Conditional relative entropy: D(p(y|x) || q(y|x)) = Σ_x p(x) Σ_y p(y|x) log2 [p(y|x) / q(y|x)]
  • Chain rule: D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x))
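A Python sketch of relative entropy for two made-up pmfs, illustrating non-negativity and asymmetry:

    # D(p || q) = sum_x p(x) * log2(p(x) / q(x)), for pmfs over the same outcomes.
    from math import log2

    def kl(p, q):
        # assumes q(x) > 0 wherever p(x) > 0
        return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

    p = {'a': 0.9, 'b': 0.1}
    q = {'a': 0.5, 'b': 0.5}
    print(kl(p, q), kl(q, p))   # both non-negative, and not equal: D is not symmetric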

  25. The relation to language: Cross entropy
  • We use entropy as a measure of the quality of our models.
  • Pointwise entropy (the information content of a single outcome x): -log2 p(x)
  • We want a model m that minimizes D(p || m), where p is the true distribution.
  • Cross entropy: H(p, m) = H(p) + D(p || m) = -Σ_x p(x) log2 m(x); since H(p) is fixed, minimizing D(p || m) is equivalent to minimizing the cross entropy.
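A Python sketch of cross entropy for a made-up "true" distribution p and model m, illustrating H(p, m) = H(p) + D(p || m) ≥ H(p):

    # Cross entropy H(p, m) = -sum_x p(x) * log2 m(x); equals H(p) only when m = p.
    from math import log2

    def entropy(p):
        return -sum(px * log2(px) for px in p.values() if px > 0)

    def cross_entropy(p, m):
        return -sum(px * log2(m[x]) for x, px in p.items() if px > 0)

    p = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # "true" distribution (made up)
    m = {'a': 0.4, 'b': 0.4, 'c': 0.2}   # model distribution (made up)
    print(entropy(p), cross_entropy(p, m))   # H(p, m) >= H(p)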
