
CSE 6392 – Data Exploration and Analysis in Relational Databases





  1. CSE 6392 – Data Exploration and Analysis in Relational Databases January 31, 2006

  2. Example Problem
     • Suppose you had the following tables: Employee and Employee-Sample

  3. Possible Queries
     • Some possible queries to get the average salary of all females in the company:
       1. Select avg(salary) from Employee where gender = 'F'
       2. Select avg(salary) from Employee-Sample where gender = 'F'
       3. Select count(*) as C, sum(salary) as S, sum(salary)/count(*) from Employee-Sample where gender = 'F'
     • Is there a difference between 2 and 3 in terms of results? No, provided salary is never NULL (avg skips NULL values, while count(*) counts every row).
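
As a rough check of the claim that queries 2 and 3 agree, the sketch below runs all three queries against an in-memory SQLite database. The rows are hypothetical toy data (the slide's tables were not recovered), the sample is hand-picked rather than random, and the sample table is named EmployeeSample here only because an unquoted hyphen is not a valid SQL identifier.

```python
import sqlite3

# Hypothetical toy data; the actual tables from the slide are not available.
rows = [("Ann", "F", 50000.0), ("Bob", "M", 42000.0), ("Cho", "F", 61000.0),
        ("Dee", "F", 58000.0), ("Eli", "M", 47000.0), ("Fay", "F", 39000.0)]
sample_rows = [rows[0], rows[1], rows[3]]  # hand-picked stand-in for a random sample

conn = sqlite3.connect(":memory:")
for table, data in (("Employee", rows), ("EmployeeSample", sample_rows)):
    conn.execute(f"CREATE TABLE {table} (name TEXT, gender TEXT, salary REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

q1 = scalar("SELECT avg(salary) FROM Employee WHERE gender = 'F'")
q2 = scalar("SELECT avg(salary) FROM EmployeeSample WHERE gender = 'F'")
q3 = scalar("SELECT sum(salary) / count(*) FROM EmployeeSample WHERE gender = 'F'")

print(q1, q2, q3)
```

Queries 2 and 3 return the same number because both divide the sample's female salary total by the sample's female row count; query 1 differs because it sees the whole table.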

  4. Estimator
     • What is an estimator?
     • Ex. (count in the sample) * (population size / sample size)
     • On the previous slide, 2 and 3 are estimators for 1.
     • What is an unbiased estimator?
     • Basically, an estimator that is not tilted towards the lower or higher side of the quantity being estimated
     • Formally:
       • $\hat{x}$ is the estimator for some quantity $x$
       • $\hat{x}$ is an unbiased estimator if $E[\hat{x}] = x$.

  5. Unbiased Estimators
     • Example
       • select count(*) as FC from Employee where gender = 'F'
       • select count(*) * (N/n) as EFC from Employee-Sample where gender = 'F'
     • Here N is the number of rows in Employee and n is the number of rows in Employee-Sample.
     • EFC is an unbiased estimator of FC
     • (N/n) is called the 'ratio scale'
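
One way to see that the scaled count is unbiased is to enumerate every possible sample of a tiny population and average the estimates. The sketch below assumes a uniform random sample drawn without replacement; the eight-row population and the sample size of 3 are hypothetical, not data from the slides.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population of gender flags; not data from the slides.
population = ["F", "M", "F", "F", "M", "M", "F", "M"]
N, n = len(population), 3

true_fc = population.count("F")  # FC: exact count over the full table

# EFC for every possible size-n sample drawn without replacement.
estimates = [
    Fraction(N, n) * sum(1 for g in sample if g == "F")
    for sample in combinations(population, n)
]

# Every sample is equally likely, so E[EFC] is the plain mean over all samples.
expected_efc = sum(estimates) / len(estimates)

print(true_fc, expected_efc)
```

Individual estimates are rarely exact (for instance, a sample with no females estimates 0), but their average over all samples matches the true count, which is what unbiasedness means.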

  6. Unbiased Estimators (1)
     • Example
       • select sum(salary) as TFS from Employee where gender = 'F'
       • select sum(salary)*(N/n) as ETFS from Employee-Sample where gender = 'F'
     • ETFS is an unbiased estimator of TFS
     • Note: Unbiasedness is important to statisticians, but secondary for our purposes; we are more concerned about the error

  7. Unbiased Estimators (2)
     • Example
       • Select avg(salary) as AFS from Employee where gender = 'F'
       • Select count(*) as C, sum(salary) as S, sum(salary)/count(*) as EAFS from Employee-Sample where gender = 'F'
     • Is EAFS unbiased? Not necessarily. A ratio of two unbiased estimators is not guaranteed to be unbiased itself (ratio estimation).
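
The general point about ratios can be illustrated with a sketch that deliberately departs from the slide's setup. It assumes a hypothetical non-uniform design: two female employees, and a one-row sample that picks row i with probability p[i]. Weighting the sampled row by 1/p[i] gives unbiased count and sum estimators, yet their ratio is biased. The salaries and probabilities are made up, and the sketch makes no claim about the uniform sample used in the slides.

```python
from fractions import Fraction

# Hypothetical setup, not from the slides: two female employees and a
# one-row sample that picks row i with probability p[i] (non-uniform).
salaries = [Fraction(40000), Fraction(80000)]
p = [Fraction(1, 4), Fraction(3, 4)]

true_count = len(salaries)
true_sum = sum(salaries)
true_avg = true_sum / true_count

# Weighting the sampled row by 1/p[i] gives unbiased count and sum estimators.
exp_count = sum(p[i] * (1 / p[i]) for i in range(2))
exp_sum = sum(p[i] * (salaries[i] / p[i]) for i in range(2))

# Their ratio is just the sampled salary, so its expectation is skewed by p.
exp_ratio = sum(p[i] * ((salaries[i] / p[i]) / (1 / p[i])) for i in range(2))

print(true_avg, exp_ratio)
```

The expected count and expected sum match the true values, while the expected ratio lands above the true average because the higher salary is sampled more often.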

  8. Probability
     • Example: roll a die. How many times will you get 1, 2, 3, 4, 5 or 6?
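
The die question can be explored with a small simulation. This is only a sketch, assuming a fair six-sided die; the number of rolls and the seed are arbitrary choices.

```python
import random
from collections import Counter

def roll_counts(rolls=60_000, seed=0):
    """Simulate fair die rolls and count how often each face appears."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) for _ in range(rolls))

counts = roll_counts()
for face in range(1, 7):
    print(face, counts[face])
```

Each face should show up in roughly one sixth of the rolls, with small random deviations from run to run.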

  9. Probability Density
     • What is the probability that a random number generator will generate exactly .43 (of numbers between 0 and 1)?
     • Answer: 0% (1/infinity)
     • What about a number between .43 and .53?
     • Answer: 10% (1/10)
     • The probability of an interval is the area under the probability density curve (the integral) over that interval; the total area under the curve is 1.
     • Any single number has a 0% probability, but an interval has a positive probability.
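
The 10% answer can be checked empirically. The sketch below assumes the generator is uniform on [0, 1), which is what Python's `random.random()` provides; the trial count and seed are arbitrary.

```python
import random

def interval_frequency(lo, hi, trials=200_000, seed=0):
    """Fraction of uniform [0, 1) draws that land in [lo, hi)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if lo <= rng.random() < hi)
    return hits / trials

freq_interval = interval_frequency(0.43, 0.53)
print(freq_interval)
```

The observed frequency should sit close to 0.10, the width of the interval, since the uniform density on [0, 1) has height 1.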

  10. Probability Density Function
     • It is a proper distribution if the integral over its whole range = 1

  11. Probability Example
     • How many female employees (out of 50K employees)?

  12. Probability Sample
     • If we sampled another company where the actual number of females is 5K, the variance would decrease.

  13. Relative Error
     • In Approximate Query Processing, absolute error is used for statistical analysis, but relative error is used in practice.
     • $\text{relative error}^2 = \dfrac{(ETFC - TFC)^2}{TFC^2}$, where ETFC is the estimate and TFC is the true value.
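
A minimal sketch of the two error measures follows. The function name `relative_error` and the numbers are hypothetical; the formula is the square root of the squared relative error on this slide.

```python
def relative_error(estimate, truth):
    """Relative error |estimate - truth| / |truth|; truth must be nonzero."""
    if truth == 0:
        raise ValueError("relative error is undefined when the true value is 0")
    return abs(estimate - truth) / abs(truth)

# Hypothetical numbers: true count 5000, estimated count 5200.
abs_err = abs(5200 - 5000)
rel_err = relative_error(5200, 5000)

print(abs_err, rel_err, rel_err ** 2)
```

An absolute error of 200 means little without knowing the scale of the true value, whereas the relative error states directly that the estimate is off by 4%.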

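  14. Central Limit Theorem
     • The main point of this theorem is that it does not matter how the data was originally distributed (as long as its variance is finite): for a large enough sample, the distribution of the sample mean is approximately normal.
     • Normal distribution (density with mean $\mu$ and standard deviation $\sigma$): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$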
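
The theorem can be sanity-checked with a simulation. The sketch below assumes a clearly non-normal source, the exponential distribution with rate 1 (mean 1, standard deviation 1), and checks whether standardized sample means behave like a standard normal. The sample size, trial count, and seed are arbitrary choices.

```python
import math
import random

def standardized_sample_means(n=50, trials=5000, seed=0):
    """Draw `trials` samples of size n from Exp(1) and standardize each mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)
    zs = []
    for _ in range(trials):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((mean - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_sample_means()
# A standard normal puts about 95% of its mass within 1.96 of zero.
within = sum(1 for z in zs if abs(z) <= 1.96) / len(zs)
print(within)
```

Even though individual exponential draws are heavily skewed, the share of standardized means within 1.96 of zero should come out near 0.95, as it would for a normal distribution.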
