
Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing

Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu


Presentation Transcript


  1. Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing. Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu

  2. What is sample selection bias?
     • Inductive learning assumes that the training data (x,y) is sampled randomly from the universe of examples.
     • In many applications, however, the training data (x,y) is not sampled randomly.
     • Insurance and mortgage data: you only observe the people you gave a policy or loan to.
     • School data: students self-select.

  3. What is sample selection bias?

  4. Ubiquitous
     • Loan approval
     • Drug screening
     • Weather forecasting
     • Ad campaigns
     • Fraud detection
     • User profiling
     • Biomedical informatics
     • Intrusion detection
     • Insurance
     • etc.

  5. Different types of sample selection bias
     • There are different ways in which (x,y) can be selected; S = 1 denotes that (x,y) is chosen.
     • S is independent of x and y: a completely random sample (no bias).
     • S depends on y but not on x: class bias.
     • S depends on x but not on y: feature bias.
     • S depends on both x and y: both class and feature bias (complete bias).
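To make the four cases concrete, here is a minimal Python simulation. The selection probabilities, the one-dimensional universe, and the helper names (`selection_prob`, `biased_sample`, `summarize`) are hypothetical illustrations, not taken from the paper. Each example is kept with probability P(S = 1 | x, y), and the summary shows how each bias type shifts the label rate or the feature distribution of the selected sample.

```python
import random

def selection_prob(x, y, bias_type):
    """Hypothetical P(S = 1 | x, y) for each bias type on slide 5."""
    if bias_type == "none":        # S independent of x and y: random sample
        return 0.5
    if bias_type == "class":       # S depends on y only
        return 0.8 if y == 1 else 0.2
    if bias_type == "feature":     # S depends on x only
        return 0.8 if x > 0 else 0.2
    if bias_type == "complete":    # S depends on both x and y
        return 0.9 if (x > 0 and y == 1) else 0.1
    raise ValueError(f"unknown bias type: {bias_type}")

def biased_sample(universe, bias_type, seed=0):
    """Keep each (x, y) with probability P(S = 1 | x, y)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in universe
            if rng.random() < selection_prob(x, y, bias_type)]

def summarize(sample):
    """Return (fraction of positive labels, mean of x)."""
    n = len(sample)
    return sum(y for _, y in sample) / n, sum(x for x, _ in sample) / n

# Hypothetical universe: x ~ U(-1, 1), noisy label y = 1 when x + noise > 0
rng = random.Random(42)
universe = []
for _ in range(10000):
    x = rng.uniform(-1, 1)
    universe.append((x, int(x + rng.gauss(0, 0.5) > 0)))

for t in ["none", "class", "feature", "complete"]:
    pos, mx = summarize(biased_sample(universe, t))
    print(f"{t:8s}  P(y=1|S=1) = {pos:.2f}  mean x = {mx:+.2f}")
```

Under class bias the positive rate of the sample moves away from that of the universe; under feature bias the mean of x shifts, while the rule that generates y from x is unchanged.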

  6. Our method: Original Dataset → Structural Discovery → Structural Re-balancing → Corrected Dataset

  7. Our method
     • Structural discovery via automatic clustering
     • Key idea:
        • Divide the data recursively into two clusters (binary division).
        • Stop dividing a cluster when most of the labeled data in it have the same label.
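The binary-divide-and-stop idea can be sketched as recursive bisecting clustering. This is a sketch under assumptions: the slide does not specify the clustering algorithm or the stopping threshold, so plain 2-means, the 90% purity threshold, and the minimum cluster size (`threshold`, `min_size`) are hypothetical choices, as are all function names. Unlabeled points are marked `None` and still take part in the clustering; only labeled points count toward purity.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def centroid(X, idx):
    dim = len(X[idx[0]])
    return tuple(sum(X[i][d] for i in idx) / len(idx) for d in range(dim))

def two_means_split(X, idx, iters=20, seed=0):
    """Split the points X[idx] into two groups with plain 2-means."""
    rng = random.Random(seed)
    centers = [X[i] for i in rng.sample(idx, 2)]
    left, right = list(idx), []
    for _ in range(iters):
        left, right = [], []
        for i in idx:
            near_first = sq_dist(X[i], centers[0]) <= sq_dist(X[i], centers[1])
            (left if near_first else right).append(i)
        if not left or not right:   # degenerate split; caller stops here
            break
        centers = [centroid(X, left), centroid(X, right)]
    return left, right

def is_pure(y, idx, threshold):
    """True if at least `threshold` of the labeled points share one label."""
    labels = [y[i] for i in idx if y[i] is not None]
    if not labels:
        return True   # assumption: nothing labeled, so nothing to separate
    majority = max(labels.count(c) for c in set(labels))
    return majority / len(labels) >= threshold

def discover(X, y, idx=None, threshold=0.9, min_size=4, seed=0):
    """Recursively bisect until each cluster is (mostly) label-pure."""
    if idx is None:
        idx = list(range(len(X)))
    if len(idx) < min_size or is_pure(y, idx, threshold):
        return [idx]
    left, right = two_means_split(X, idx, seed=seed)
    if not left or not right:
        return [idx]
    return (discover(X, y, left, threshold, min_size, seed)
            + discover(X, y, right, threshold, min_size, seed))

# Usage: two hypothetical 2-D blobs, only every fifth point labeled
rng = random.Random(1)
X = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
     + [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(50)])
y = [(0 if i < 50 else 1) if i % 5 == 0 else None for i in range(100)]
clusters = discover(X, y)
print([len(c) for c in clusters])
```

Each recursive call works on a strictly smaller cluster, so the recursion always terminates. `min_size` must be at least 2 so that 2-means has two seeds to draw.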

  8. Our method
     • Structural re-balancing via sample selection
     • Key idea:
        (1) Select the same proportion of examples from each cluster.
        (2) Select confident and representative examples.
        (3) Label the unlabeled examples using their neighbors.
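A minimal sketch of the three re-balancing steps follows. The paper's confidence criterion (Lemma 3.2) is not given on these slides, so the code substitutes hypothetical proxies: confidence is the share of the k nearest labeled neighbors that agree on a label (1.0 for points that are already labeled), and representativeness is closeness to the cluster centroid. The function names and the `fraction` and `k` parameters are illustrative assumptions.

```python
import math
from collections import Counter

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def centroid(X, idx):
    dim = len(X[idx[0]])
    return tuple(sum(X[i][d] for i in idx) / len(idx) for d in range(dim))

def knn_vote(X, y, labeled, i, k):
    """Majority label of the k nearest labeled points, plus its vote share."""
    nearest = sorted(labeled, key=lambda j: sq_dist(X[i], X[j]))[:k]
    label, count = Counter(y[j] for j in nearest).most_common(1)[0]
    return label, count / len(nearest)

def rebalance(X, y, clusters, fraction=0.5, k=3):
    """Take the same fraction of each cluster, preferring confident and
    representative points; unlabeled picks get their neighbors' label."""
    labeled = [i for i in range(len(X)) if y[i] is not None]
    if not labeled:
        raise ValueError("need at least one labeled example")
    corrected = []
    for idx in clusters:
        c = centroid(X, idx)
        scored = []
        for i in idx:
            # Confidence: 1.0 for labeled points, kNN vote share otherwise
            label, conf = (y[i], 1.0) if y[i] is not None else knn_vote(X, y, labeled, i, k)
            # Sort key: most confident first, then closest to the centroid
            scored.append((-conf, sq_dist(X[i], c), i, label))
        scored.sort()
        n_take = max(1, math.ceil(fraction * len(idx)))  # same proportion per cluster
        corrected.extend((X[i], label) for _, _, i, label in scored[:n_take])
    return corrected

# Usage: two hypothetical clusters with one or two labeled points each
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.2),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.2, 5.2), (4.9, 5.0), (5.3, 5.1)]
y = [0, None, None, None, 1, None, None, None, None, 1]
clusters = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
corrected = rebalance(X, y, clusters, fraction=0.5, k=1)
print([label for _, label in corrected])  # [0, 0, 1, 1, 1]
```

Because the per-cluster count is rounded up with `math.ceil`, the selected proportion is only approximately equal across clusters when `fraction * len(cluster)` is not an integer.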

  9. Our method
     • Theoretical analysis:
        Lemma 3.1 explains why selecting the same proportion of examples from each cluster can reduce sample selection bias.
        Lemma 3.2 derives a criterion for selecting confident examples.
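One way to see why equal-proportion sampling helps (an intuition only; the exact statement of Lemma 3.1 is not on these slides) is that it preserves the cluster composition of the full dataset. Assume clusters C_1, ..., C_K of sizes n_k are found over all N examples, labeled and unlabeled, and the same fraction q is drawn from each:

```latex
% Clusters C_1, ..., C_K with sizes n_k and N = \sum_{j=1}^{K} n_j.
% Draw the same fraction q \in (0, 1] from every cluster:
\frac{q\, n_k}{\sum_{j=1}^{K} q\, n_j} \;=\; \frac{n_k}{N}, \qquad k = 1, \dots, K.
```

So every cluster appears in the corrected sample with the same share it has in the full data, whereas a biased training set may over- or under-represent some clusters.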

  10. Feature Bias: accuracy of corrected minus accuracy of original

  11. Class Bias: accuracy of corrected minus accuracy of original

  12. Complete Bias: corrected vs. original

  13. Advantages:
     1. Type independent
     2. Model independent
     3. Straightforward
     The experiment dataset and related MATLAB code can be downloaded from ftp://202.116.65.69/sxx/SDM08 or http://www.weifan.info
