
  1. Quiz 3: Mean: 9.2, Median: 9.75. Go over problem 1.

  2. Go over AdaBoost examples.

  3. Fix to C4.5 data formatting problem?

  4. Quiz 4

  5. Alternative simple (but effective) discretization method (Yang & Webb, 2001). Let n = number of training examples. For each attribute Ai, create approximately √n bins: sort the values of Ai in ascending order, and put approximately √n of them in each bin. Add-one smoothing of the probabilities is not needed. This gives a good balance between discretization bias and variance.

  6. Alternative simple (but effective) discretization method (Yang & Webb, 2001). Let n = number of training examples. For each attribute Ai, create approximately √n bins: sort the values of Ai in ascending order, and put approximately √n of them in each bin. Add-one smoothing of the probabilities is not needed. This gives a good balance between discretization bias and variance. Humidity: 25, 38, 50, 80, 93, 98, 98, 99.
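A minimal Python sketch of this binning scheme (the function name and the rounding of √n are my own choices, following the slide's description):

```python
import math

def pkid_bins(values):
    """Equal-frequency binning in the style of Yang & Webb (2001):
    sort the n values and place about sqrt(n) of them in each of about sqrt(n) bins."""
    n = len(values)
    size = max(1, int(round(math.sqrt(n))))        # roughly sqrt(n) values per bin
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, n, size)]

# The humidity values from the slide: n = 8, sqrt(8) ~ 3, so bins of about 3 values.
print(pkid_bins([25, 38, 50, 80, 93, 98, 98, 99]))
# [[25, 38, 50], [80, 93, 98], [98, 99]]
```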

  9. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier (P. Domingos and M. Pazzani). The naive Bayes classifier is called "naive" because it assumes the attributes are independent of one another given the class.

  10. This paper asks: why does the naive ("simple") Bayes classifier (SBC) do so well in domains with clearly dependent attributes?

  11. Experiments • Compare five classification methods on 30 data sets from the UCI ML repository:
      SBC = Simple Bayesian Classifier
      Default = choose the class with the most representatives in the data
      C4.5 = Quinlan's decision tree induction system
      PEBLS = an instance-based learning system
      CN2 = a rule-induction system

  12. For SBC, numeric values were discretized into ten equal-length intervals.

  13. [Table: for each classifier, (a) the number of domains in which SBC was more accurate versus less accurate than that classifier; (b) the same counts restricted to differences significant at 95% confidence; (c) the average rank over all domains (1 = best in each domain).]

  14. Measuring Attribute Dependence. They used a simple, pairwise mutual information measure: for attributes Am and An, dependence is defined as D(Am, An | C) = H(Am | C) + H(An | C) − H(AmAn | C), where H denotes entropy and AmAn is a "derived attribute" whose values consist of the possible combinations of values of Am and An. Note: if Am and An are independent given C, then D(Am, An | C) = 0.
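A small Python sketch of this measure, under the assumption (as reconstructed above) that D is the class-conditional mutual information H(Am | C) + H(An | C) − H(AmAn | C); estimating it from raw counts is my own illustration:

```python
from collections import Counter
import math

def cond_entropy(pairs):
    """H(X | C) estimated from a list of (x, c) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (x, c)
    class_counts = Counter(c for _, c in pairs)
    h = 0.0
    for (x, c), k in joint.items():
        p_xc = k / n                            # P(x, c)
        p_x_given_c = k / class_counts[c]       # P(x | c)
        h -= p_xc * math.log2(p_x_given_c)
    return h

def dependence(am, an, c):
    """D(Am, An | C) = H(Am|C) + H(An|C) - H(AmAn|C), as reconstructed above;
    it is 0 when Am and An are independent given C."""
    return (cond_entropy(list(zip(am, c)))
            + cond_entropy(list(zip(an, c)))
            - cond_entropy(list(zip(zip(am, an), c))))

am = [0, 0, 1, 1, 0, 1, 1, 0]                   # tiny made-up attribute values
c  = ['+', '+', '+', '+', '-', '-', '-', '-']   # tiny made-up class labels
print(dependence(am, am, c))        # > 0: an attribute is completely dependent on itself
print(dependence(am, [0] * 8, c))   # 0.0: a constant attribute adds no dependence
```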

  15. Results: (1) SBC is more successful than the more complex methods, even when there is substantial dependence among attributes. (2) There is no correlation between the degree of attribute dependence and SBC's rank. But why?

  16. An Example • Let the classes be {+, −} and the attributes be A, B, and C. • Let P(+) = P(−) = 1/2. • Suppose A and C are completely independent, and A and B are completely dependent (e.g., A = B). • The optimal classification procedure is shown on the next slide.

  17. This leads to the following conditions. • Optimal classifier: if P(A|+) P(C|+) > P(A|−) P(C|−) then class = +, else class = −. • SBC: if P(A|+)² P(C|+) > P(A|−)² P(C|−) then class = +, else class = −. (The SBC squares the P(A|·) term because it counts the duplicated attribute B = A twice.)

  18. In the paper, the authors use Bayes' theorem to rewrite these conditions and plot the "decision boundaries" for the optimal classifier and for the SBC. [Plot: the (p, q) plane, where p = P(+ | A) and q = P(+ | C), showing the + and − regions and the decision boundaries of the optimal classifier and the SBC.]
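The comparison can also be sketched numerically. Assuming P(+) = P(−) = 1/2, the two conditions above reduce to p·q > (1 − p)·(1 − q) for the optimal rule and p²·q > (1 − p)²·(1 − q) for the SBC (my rewriting, following the slide's p and q); the snippet below measures how much of the (p, q) square the two rules agree on:

```python
import numpy as np

# Sketch of the decision-boundary comparison described above (my own rewriting,
# assuming P(+) = P(-) = 1/2): with p = P(+|A) and q = P(+|C), the optimal rule
# predicts + when p*q > (1-p)*(1-q); the SBC, which counts the duplicated
# attribute B = A twice, predicts + when p**2 * q > (1-p)**2 * (1-q).
p, q = np.meshgrid(np.linspace(0.01, 0.99, 199), np.linspace(0.01, 0.99, 199))
optimal = p * q > (1 - p) * (1 - q)
sbc = p**2 * q > (1 - p)**2 * (1 - q)
agree = (optimal == sbc).mean()
print(f"SBC matches the optimal classification on {agree:.0%} of the (p, q) square")
```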

  19. Even though A and B are completely dependent, and the SBC assumes they are completely independent, the SBC gives the optimal classification in a very large part of the problem space! But why?

  20. Explanation: Suppose C = {+, −} are the possible classes. Let x be a new example with attributes <a1, a2, ..., an>. The naive Bayes classifier calculates two probabilities, one per class (each proportional to P(class) times the product of the P(ai | class) terms), and returns the class that has the maximum probability given x.
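A minimal Python sketch of that calculation for categorical attributes (the data and function names below are illustrative, not from the slides):

```python
from collections import Counter, defaultdict

def nb_train(examples, labels):
    """Estimate P(c) and P(a_i = v | c) by simple frequency counts."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: k / n for c, k in class_counts.items()}
    cond = defaultdict(Counter)          # cond[(i, c)][v] = count of attribute i == v in class c
    for x, c in zip(examples, labels):
        for i, v in enumerate(x):
            cond[(i, c)][v] += 1
    return priors, cond, class_counts

def nb_classify(x, priors, cond, class_counts):
    """Return the class maximizing P(c) * prod_i P(x_i | c)."""
    best, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= cond[(i, c)][v] / class_counts[c]
        if score > best_score:
            best, best_score = c, score
    return best

# Tiny made-up data set: attributes are (cough, fever).
X = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
y = ['flu', 'flu', 'flu', 'no-flu', 'no-flu', 'no-flu']
model = nb_train(X, y)
print(nb_classify((1, 1), *model))       # returns the class with the larger product
```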

  21. The probability calculations are correct only if the independence assumption is correct. • However, the classification is correct in all cases in which the relative ranking of the two probabilities, as calculated by the SBC, is correct! • The latter covers a lot more cases than the former. • Thus, the SBC is effective in many cases in which the independence assumption does not hold.

  22. More on Bias and Variance

  23. Bias (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt)

  24. Variance (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt)

  25. Noise (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt)

  26. Sources of Bias and Variance (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt) • Bias arises when the classifier cannot represent the true function – that is, the classifier underfits the data. • Variance arises when the classifier overfits the data. • There is often a tradeoff between bias and variance.
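A small simulation sketch of this tradeoff (entirely illustrative: the sine target, the noise level, and the polynomial degrees are my own choices). It fits many fresh noisy samples with models of increasing capacity and estimates the bias² and variance of the prediction at a single test point:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)               # underlying target function

x_test, n_sets, n_points, noise = 0.3, 200, 15, 0.3   # illustrative settings (my own choices)

def bias_variance(degree):
    """Fit a polynomial of the given degree to many fresh noisy samples and
    estimate bias^2 and variance of its prediction at x_test."""
    preds = []
    for _ in range(n_sets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    preds = np.array(preds)
    return (preds.mean() - true_f(x_test)) ** 2, preds.var()

for degree in (1, 3, 9):
    b2, var = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b2:.3f}, variance = {var:.3f}")
# Expect the degree-1 fit to show mostly bias (underfitting) and the degree-9
# fit to show mostly variance (overfitting).
```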

  27. Bias-Variance Tradeoff (from knight.cis.temple.edu/~yates/cis8538/.../intro-text-classification.ppt): As a general rule, the more biased a learning machine, the less variance it has, and the more variance it has, the less biased it is.

  28. [Figure; from http://www.ire.pw.edu.pl/~rsulej/NetMaker/index.php?pg=e06]

  29. Bias-Variance Tradeoff (from knight.cis.temple.edu/~yates/cis8538/.../intro-text-classification.ppt): As a general rule, the more biased a learning machine, the less variance it has, and the more variance it has, the less biased it is. Why?

  30. SVM Bias and Variance (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt) • The bias-variance tradeoff is controlled by s. • A biased classifier (the linear SVM) gives better results than a classifier that can represent the true decision boundary!

  31. Effect of Boosting (from http://eecs.oregonstate.edu/~tgd/talks/BV.ppt) • In the early iterations, boosting is primarily a bias-reducing method. • In later iterations, it appears to be primarily a variance-reducing method.

  32. Bayesian Networks. Reading: S. Wooldridge, Bayesian belief networks (linked from class website).

  33. A patient comes into a doctor's office with a fever and a bad cough. Hypothesis space H: h1: patient has flu; h2: patient does not have flu. Data D: coughing = true, fever = true, smokes = true.

  34. Naive Bayes. [Diagram: the cause node flu with arrows to the effect nodes smokes, cough, and fever.]

  35. Full joint probability distribution. [Table: the full joint distribution over the four variables, drawn as one block of boxes for smokes and one for ¬smokes; the sum of all boxes is 1.] In principle, the full joint distribution can be used to answer any question about probabilities over these variables. However, the size of the full joint distribution scales exponentially with the number of variables, so it is expensive to store and to compute with.
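A small Python sketch of this point (the probabilities below are made-up placeholders, not the slide's table): the full joint distribution is just a table with one entry per assignment, any query is answered by summing matching entries, and the table has 2^n of them.

```python
import itertools, random

# Made-up placeholder probabilities (not the slide's table), normalized to sum to 1.
random.seed(0)
vars_ = ("smokes", "flu", "cough", "fever")
weights = [random.random() for _ in range(2 ** len(vars_))]
total = sum(weights)
joint = {assign: w / total
         for assign, w in zip(itertools.product([False, True], repeat=len(vars_)), weights)}

def prob(**fixed):
    """Probability of the fixed assignment, summing the joint over the unfixed variables."""
    return sum(p for assign, p in joint.items()
               if all(assign[vars_.index(k)] == v for k, v in fixed.items()))

print(prob(flu=True))                                    # marginal P(flu)
print(prob(flu=True, fever=True) / prob(fever=True))     # conditional P(flu | fever)
print(len(joint))                                        # 2**4 = 16 entries; 2**n in general
```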

  36. Bayesian networks • The idea is to represent the dependencies (or causal relations) among all the variables so that space and computation-time requirements are minimized. [Diagram: a network over smokes, flu, cough, and fever.] Such networks are also called "graphical models."

  37. Conditional probability tables for each node. [Figure: each node (smoke, flu, cough, fever) shown with its conditional probability table given its parents.]

  38. Semantics of Bayesian networks • If the network is correct, the full joint probability distribution can be calculated from the network: P(x1, ..., xn) = ∏i P(xi | parents(Xi)), where parents(Xi) denotes the specific values of the parents of Xi.

  39. Example • Calculate an entry of the full joint distribution from the network, using the factorization above.
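A sketch of such a calculation. The network structure assumed here (smoke and flu as roots, cough depending on both, fever depending on flu) and all CPT numbers are illustrative placeholders, not the slide's:

```python
# Illustrative placeholder CPTs on an assumed structure; not the slide's values.
p_smoke = 0.2
p_flu = 0.1
p_cough = {(True, True): 0.95, (True, False): 0.8,     # P(cough | smoke, flu)
           (False, True): 0.6, (False, False): 0.05}
p_fever = {True: 0.9, False: 0.05}                      # P(fever | flu)

def joint(smoke, flu, cough, fever):
    """P(smoke, flu, cough, fever) = product of P(x_i | parents(X_i))."""
    p = p_smoke if smoke else 1 - p_smoke
    p *= p_flu if flu else 1 - p_flu
    pc = p_cough[(smoke, flu)]
    p *= pc if cough else 1 - pc
    pf = p_fever[flu]
    p *= pf if fever else 1 - pf
    return p

print(joint(smoke=True, flu=False, cough=True, fever=False))   # one entry of the full joint
total = sum(joint(s, f, c, v) for s in (False, True) for f in (False, True)
            for c in (False, True) for v in (False, True))
print(total)   # 1.0 (up to floating-point rounding): the 16 entries sum to 1
```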

  40. Another (famous, though weird) example. [Diagram: Rain → Wet grass.] Question: If you observe that the grass is wet, what is the probability that it rained?

  41. [Diagram: Sprinkler → Wet grass ← Rain.] Question: If you observe that the sprinkler is on, what is the probability that the grass is wet? (Predictive inference.)

  42. Question: If you observe that the grass is wet, what is the probability that the sprinkler is on? (Diagnostic inference.) Note that P(S) = 0.2, so knowing that the grass is wet increased the probability that the sprinkler is on.

  43. Now assume the grass is wet and it rained. What is the probability that the sprinkler was on? Knowing that it rained decreases the probability that the sprinkler was on, given that the grass is wet.

  44. [Diagram: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet grass ← Rain.] Question: Given that it is cloudy, what is the probability that the grass is wet?
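A sketch of these queries by brute-force enumeration over the full joint computed from the network. The CPT numbers are illustrative, textbook-style placeholders, not the slides' values (in particular they need not reproduce the P(S) = 0.2 noted earlier):

```python
import itertools

# Illustrative placeholder CPTs for the cloudy/sprinkler/rain/wet-grass network
# sketched above; not the values used on the slides.
p_c = 0.5
p_s = {True: 0.1, False: 0.5}                     # P(S | C)
p_r = {True: 0.8, False: 0.2}                     # P(R | C)
p_w = {(True, True): 0.99, (True, False): 0.9,    # P(W | S, R)
       (False, True): 0.9, (False, False): 0.01}

def joint(c, s, r, w):
    """One entry of the full joint, via the factorization P(C)P(S|C)P(R|C)P(W|S,R)."""
    p = p_c if c else 1 - p_c
    p *= p_s[c] if s else 1 - p_s[c]
    p *= p_r[c] if r else 1 - p_r[c]
    p *= p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return p

def prob(query, evidence):
    """P(query | evidence): sum the joint over assignments consistent with query and
    evidence, divided by the sum over assignments consistent with the evidence alone."""
    num = den = 0.0
    for c, s, r, w in itertools.product([False, True], repeat=4):
        assign = {"C": c, "S": s, "R": r, "W": w}
        if all(assign[k] == v for k, v in evidence.items()):
            den += joint(c, s, r, w)
            if all(assign[k] == v for k, v in query.items()):
                num += joint(c, s, r, w)
    return num / den

print(prob({"W": True}, {"S": True}))              # predictive: P(W | S)
print(prob({"S": True}, {"W": True}))              # diagnostic: P(S | W)
print(prob({"S": True}, {"W": True, "R": True}))   # adding rain lowers P(S | W): explaining away
print(prob({"W": True}, {"C": True}))              # slide 44's query: P(W | C)
```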

  45. In general... • If the network is correct, the full joint probability distribution can be calculated from the network: P(x1, ..., xn) = ∏i P(xi | parents(Xi)), where parents(Xi) denotes the specific values of the parents of Xi. But efficient algorithms are needed to do this (e.g., "belief propagation", "Markov chain Monte Carlo").

  46. Complexity of Bayesian Networks. For n random Boolean variables: • Full joint probability distribution: 2^n entries. • Bayesian network with at most k parents per node: each conditional probability table has at most 2^k entries, and the entire network has at most n·2^k entries.
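A quick numeric check of these counts (n = 20 and k = 3 are just example values, not from the slide):

```python
n, k = 20, 3           # example: 20 Boolean variables, at most 3 parents per node
print(2 ** n)          # full joint distribution: 1,048,576 entries
print(n * 2 ** k)      # Bayesian network: at most 160 CPT entries in total
```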
