
Data and its Distribution

Presentation Transcript


  1. Data and its Distribution

  2. The popular table
     • Table (relation)
       • also called a propositional or attribute-value representation
     • Example
       • also called record, row, instance, case
     • The table represents a sample from a larger population
       • examples are assumed independent and identically distributed (i.i.d.)
     • Attribute
       • also called variable, column, feature, item
     • Target attribute, class
     • Sometimes rows and columns are swapped
       • e.g., in bioinformatics
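
A minimal sketch of the table view in Python, assuming a list of tuples (one per example) plus a list of attribute names. The column names follow the play tennis example on the next slides; the "Windy" column and the row values are illustrative assumptions, and `column` and `transpose` are hypothetical helper names.

```python
# Column names follow the play tennis example; "Windy" and the row values
# are illustrative assumptions, not taken from the slide.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy", "Play"]
TARGET = "Play"  # the target attribute (class)

# Each tuple is one example (record, row, instance, case).
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
]


def column(rows, attributes, name):
    """Return all values of one attribute, i.e. one column of the table."""
    i = attributes.index(name)
    return [row[i] for row in rows]


def transpose(rows):
    """Swap rows and columns: one row per attribute, one column per example."""
    return [list(col) for col in zip(*rows)]


features = [a for a in ATTRIBUTES if a != TARGET]
labels = column(ROWS, ATTRIBUTES, TARGET)
print(features)              # ['Outlook', 'Temperature', 'Humidity', 'Windy']
print(labels)                # ['no', 'yes', 'no']
print(len(transpose(ROWS)))  # 5 (one row per attribute)
```

The swapped layout is the same data; only the roles of rows and columns change.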

  3. Example: play tennis data [table; labels: attributes, examples]

  4. Example: play tennis data [table; labels: attributes, examples, target attribute]

  5. Example: play tennis data
     • if Outlook = sunny and Humidity = high then play = no
       • the rule covers three examples and is 100% correct on them
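
Coverage and accuracy of a rule can be counted directly from the table. This sketch assumes the slide's table is the classic 14-row nominal weather data (distributed with Weka as weather.nominal.arff); the transcript itself does not include the table, and `evaluate_rule` is a hypothetical helper.

```python
# Assumed: the classic 14-row nominal weather data (weather.nominal.arff in Weka);
# the slide's own table is not part of this transcript.
COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
DATA = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
RECORDS = [dict(zip(COLUMNS, row)) for row in DATA]


def evaluate_rule(records, condition, target, predicted):
    """Return (number of examples covered, number of those predicted correctly)."""
    covered = [r for r in records if condition(r)]
    correct = sum(1 for r in covered if r[target] == predicted)
    return len(covered), correct


n_covered, n_correct = evaluate_rule(
    RECORDS,
    lambda r: r["Outlook"] == "sunny" and r["Humidity"] == "high",
    target="Play",
    predicted="no",
)
print(n_covered, f"{n_correct / n_covered:.0%}")  # 3 100%
```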

  6. Numeric tennis data [table; label: numeric attributes]

  7. Numeric tennis data [table; label: numeric attributes]

  8. Numeric tennis data
     • if Outlook = sunny and Humidity > 83 then play = no
     • if Temperature < Humidity then play = no
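
The same counting works for numeric conditions. This sketch assumes the slide uses the classic numeric weather data (weather.numeric.arff in Weka); `evaluate_rule` is again a hypothetical helper. The second rule compares two attributes with each other, something the nominal version of the data cannot express.

```python
# Assumed: the classic numeric weather data (weather.numeric.arff in Weka);
# Temperature and Humidity are numbers here.
COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
DATA = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]
RECORDS = [dict(zip(COLUMNS, row)) for row in DATA]


def evaluate_rule(records, condition, target, predicted):
    """Return (number of examples covered, number of those predicted correctly)."""
    covered = [r for r in records if condition(r)]
    correct = sum(1 for r in covered if r[target] == predicted)
    return len(covered), correct


# Threshold on a single numeric attribute
rule1 = evaluate_rule(
    RECORDS, lambda r: r["Outlook"] == "sunny" and r["Humidity"] > 83, "Play", "no"
)
# Inequality between two numeric attributes
rule2 = evaluate_rule(RECORDS, lambda r: r["Temperature"] < r["Humidity"], "Play", "no")
print(rule1)  # (3, 3)
print(rule2)  # (11, 4)
```

On this assumed data the second rule covers many examples but is mostly wrong; it is there to show the form of the condition, not a good rule.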

  9. Types
     • Nominal (categorical, symbolic, discrete)
       • only equality (=)
       • no distance measure
     • Numeric
       • inequalities (<, >, ≤, ≥)
       • arithmetic
       • distance measure
     • Ordinal
       • inequalities
       • no arithmetic or distance measure
     • Binary
       • like nominal, but with only two values; True (1, yes, y) plays a special role
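
One way to make the type distinctions concrete is a table of permitted operations per type. This is a sketch under assumptions: the operator names, `supports`, and the ordinal scale for "highest degree" (an attribute mentioned on slide 12) are hypothetical.

```python
from enum import Enum


class AttrType(Enum):
    NOMINAL = "nominal"
    BINARY = "binary"
    ORDINAL = "ordinal"
    NUMERIC = "numeric"


EQUALITY = {"=="}
ORDER = {"<", ">", "<=", ">="}
ARITHMETIC = {"+", "-", "distance"}

# Operations each type supports, following the list on the slide.
SUPPORTED = {
    AttrType.NOMINAL: EQUALITY,
    AttrType.BINARY: EQUALITY,
    AttrType.ORDINAL: EQUALITY | ORDER,
    AttrType.NUMERIC: EQUALITY | ORDER | ARITHMETIC,
}


def supports(attr_type: AttrType, op: str) -> bool:
    """True if the operation is meaningful for this attribute type."""
    return op in SUPPORTED[attr_type]


# An ordinal attribute needs an explicit order of its values.
# This scale for "highest degree" is a hypothetical example.
DEGREE_SCALE = ["none", "secondary", "bachelor", "master", "doctorate"]


def ordinal_less(a: str, b: str, scale=DEGREE_SCALE) -> bool:
    """Compare two ordinal values by rank; the ranks are not used for arithmetic."""
    return scale.index(a) < scale.index(b)


print(supports(AttrType.ORDINAL, "<"))         # True
print(supports(AttrType.ORDINAL, "distance"))  # False
print(ordinal_less("bachelor", "master"))      # True
```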

  10. Distributions

  11. Univariate (probability) distribution
     • What values occur for an attribute, and how often?
       • count the occurrences of each value
     • The counts are complete information about the sample (for this attribute)
       • the actual data can be ignored from here on
     • The data is a sample of a population
       • relative counts (count divided by sample size) are probability estimates
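
Counting values and dividing by the sample size is all a univariate distribution needs. This sketch uses `collections.Counter`; the input is assumed to be the Outlook column of the classic 14-row weather data, and `univariate_distribution` is a hypothetical name.

```python
from collections import Counter


def univariate_distribution(values):
    """Count each value's occurrences and turn the counts into probability estimates."""
    counts = Counter(values)
    n = sum(counts.values())
    probs = {value: count / n for value, count in counts.items()}
    return counts, probs


# Assumed: the Outlook column of the classic 14-row weather data.
outlook = [
    "sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
    "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy",
]
counts, probs = univariate_distribution(outlook)
# counts: sunny 5, overcast 4, rainy 5
print(round(probs["sunny"], 3))  # 0.357
```

Once `counts` is known, the original list is no longer needed for anything univariate, which is the point of the slide.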

  12. Attribute information: entropy
     • How informative is an attribute?
       • (How informative is an attribute about the value of another attribute?)
       • if an attribute is not informative, it cannot be informative about another
     • Entropy
       • a measure of the amount of information/chaos
     [Figure: usefulness vs. entropy, 1 bit marked; example attributes: do you own a Mercedes?, social security nr., gender, highest degree]

  13. Distribution of a Binary Attribute
     • Only two values
       • probabilities $p$ and $1-p$
     • Entropy: $H(A) = -p \lg p - (1-p)\lg(1-p)$
       • $\lg$ is the base-2 logarithm
     • $H(A)$ is maximal when $p = \tfrac{1}{2} = 1/m$ ($m$ is the number of values)
       • uniform distribution
       • e.g., gender
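
The binary entropy formula translates directly into code. A minimal sketch, using the usual convention $0 \lg 0 = 0$; the value 9/14 assumes the Play attribute of the classic weather data (9 yes, 5 no).

```python
import math


def binary_entropy(p: float) -> float:
    """H(A) = -p lg p - (1-p) lg(1-p) in bits, taking 0 lg 0 as 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            h -= q * math.log2(q)
    return h


print(binary_entropy(0.5))               # 1.0 (maximum, uniform, e.g. a fair coin)
print(round(binary_entropy(9 / 14), 3))  # 0.94 (assumed Play attribute: 9 yes, 5 no)
print(binary_entropy(1.0))               # 0.0 (no uncertainty at all)
```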

  14. Entropy, Binary case
     [Figure; example attributes: gender, coin flip, …; do you own a Mercedes?; do you own a car?; are you an alien?]
     • Entropy: $H(A) = -p \lg p - (1-p)\lg(1-p)$

  15. Distribution of a nominal attribute
     • Multiple values ($m$)
       • each with probability $p_i$
     • Entropy: $H(A) = -\sum_{i=1}^{m} p_i \lg p_i$
       • the binary case is a special case ($m = 2$)
     • $H$ is maximal when $p_i = 1/m$ for all $i$
       • uniform distribution
       • $H_{\max} = -m \cdot \tfrac{1}{m} \lg \tfrac{1}{m} = -\lg \tfrac{1}{m} = \lg m$
     • e.g., season of booking date
       • $m = 4$
       • at most $\lg m = \lg 4 = 2$ bits
       • Q: what if only summer and winter occur?
     [Figure: bar chart]
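
The general formula can be checked numerically. This sketch assumes probabilities (or counts) are given as a plain list; `entropy` and `entropy_from_counts` are hypothetical names, and the Outlook counts 5/4/5 assume the classic weather data.

```python
import math


def entropy(probs):
    """H(A) = -sum_i p_i lg p_i in bits; values with p_i = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def entropy_from_counts(counts):
    """Estimate H(A) from value counts by first converting them to relative frequencies."""
    n = sum(counts)
    return entropy([c / n for c in counts])


m = 4  # seasons
print(entropy([1 / m] * m), math.log2(m))  # 2.0 2.0 (uniform: maximum lg m)
print(entropy([0.5, 0.0, 0.5, 0.0]))       # 1.0 (only summer and winter, equally often)
print(round(entropy_from_counts([5, 4, 5]), 3))  # 1.577 (assumed Outlook counts)
```

With $m = 2$ and probabilities $p, 1-p$ the same function reproduces the binary entropy of slide 13.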
