Sample Selection Bias – Covariate Shift: Problems, Solutions, and Applications
Wei Fan, IBM T.J. Watson Research
Masashi Sugiyama, Tokyo Institute of Technology
Updated PPT is available: http://www.weifan.info/tutorial.htm
Overview of the Sample Selection Bias Problem
A Toy Example
Unbiased: 96.9%, 97.1%, 96.405%
Biased: 95.9%, 92.7%, 92.1%
Effect on Learning: The Yale Face Database B
Figure provided by Fraunhofer FIRST, Berlin, Germany
Movie provided by Fraunhofer FIRST, Berlin, Germany
Bandpower differences between training and test phases.
Features extracted from brain activity during training and test phases.
Figures provided by Fraunhofer FIRST, Berlin, Germany
Khepera Robot
The robot moves autonomously, i.e., goes forward without hitting the wall.
Evaluate the control policy, then improve the control policy.
(Weak) extrapolation: predict output values outside the training region.
[Figure: training samples, test samples, the true function, and the learned function.]
(expectation over the noise)
If the model is correct:
OLS minimizes the bias asymptotically.
If the model is misspecified:
OLS does not minimize the bias even asymptotically.
We want to reduce the bias!
Ordinary Least Squares (OLS) (cf. importance sampling)
Importance-Weighted LS (IWLS) (Shimodaira, JSPI 2000)
Even for misspecified models, IWLS minimizes the bias asymptotically.
We need to estimate the importance in practice.
The training input density is assumed strictly positive, so the importance is well defined.
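To make the OLS-vs-IWLS contrast concrete, here is a minimal numpy sketch of importance-weighted least squares under covariate shift. The Gaussian input densities, the sinc target, the misspecified linear model, and the known importance are all illustrative assumptions, not details from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training and test inputs come from different Gaussians.
x_train = rng.normal(1.0, 0.5, size=200)
x_test = rng.normal(2.0, 0.5, size=200)

def f_true(x):
    return np.sinc(x)                      # true regression function

y_train = f_true(x_train) + 0.1 * rng.normal(size=x_train.shape)

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

# Importance w(x) = p_test(x) / p_train(x), here known in closed form.
w = gauss(x_train, 2.0, 0.5) / gauss(x_train, 1.0, 0.5)

# Misspecified model: a straight line, fit by OLS and by IWLS.
X = np.stack([np.ones_like(x_train), x_train], axis=1)
theta_ols = np.linalg.solve(X.T @ X, X.T @ y_train)
Xw = X * w[:, None]                        # rows reweighted by importance
theta_iwls = np.linalg.solve(X.T @ Xw, Xw.T @ y_train)

# Compare the two fits against the true function in the test region.
Xte = np.stack([np.ones_like(x_test), x_test], axis=1)
err_ols = np.mean((Xte @ theta_ols - f_true(x_test)) ** 2)
err_iwls = np.mean((Xte @ theta_iwls - f_true(x_test)) ** 2)
```

Because the test inputs sit to the right of the training inputs, the importance-weighted line should track the true function in the test region much better than the unweighted one.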
Vapnik's principle: when solving a problem, one should avoid solving a more difficult problem as an intermediate step.
(e.g., support vector machines)
Knowing the densities vs. knowing only their ratio.
KLIEP (Sugiyama et al., NIPS 2007)
[Plot: normalized MSE of the estimated importance vs. input dimensionality (dim), comparing KDE and KLIEP.]
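The idea behind KLIEP can be sketched as follows: model the importance directly as a nonnegative kernel mixture and maximize its mean log value on the test inputs, subject to a unit-mean constraint on the training inputs, without estimating either density. Kernel width, center count, learning rate, and iteration count below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x_tr = rng.normal(1.0, 0.5, size=(300, 1))   # training inputs
x_te = rng.normal(1.5, 0.5, size=(300, 1))   # test inputs (shifted)

def kernel(x, c, sigma=0.3):
    d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

centers = x_te[:50]                  # kernel centers placed on test points
Phi_te = kernel(x_te, centers)       # (n_te, b) design matrix on test data
Phi_tr = kernel(x_tr, centers)       # (n_tr, b) design matrix on train data

alpha = np.ones(centers.shape[0])
for _ in range(2000):
    # Objective: maximize the mean log-importance over the test points.
    w_te = Phi_te @ alpha + 1e-12
    grad = (Phi_te / w_te[:, None]).mean(axis=0)
    alpha += 1e-3 * grad
    alpha = np.maximum(alpha, 0.0)               # nonnegativity
    alpha /= (Phi_tr @ alpha).mean()             # constraint: mean w = 1 on train

w_tr = Phi_tr @ alpha                # estimated importance at training points
w_hi = (kernel(np.array([[2.0]]), centers) @ alpha)[0]
w_lo = (kernel(np.array([[0.5]]), centers) @ alpha)[0]
```

The true ratio for these two Gaussians is exp(2x - 2.5), so the estimate should assign a larger weight at x = 2.0 than at x = 0.5.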
Integration over the model space: average the estimated class probabilities of multiple models, weighted by the model posterior. Posterior weighting removes model uncertainty by averaging.
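A minimal sketch of that averaging step, with made-up numbers: three models each output P(y = + | x) at three test points, and the posterior-weighted mixture is the averaged class probability.

```python
import numpy as np

# Each row: one model's estimate of P(y = + | x) at three test points.
class_probs = np.array([
    [0.9, 0.2, 0.6],   # model 1
    [0.7, 0.4, 0.5],   # model 2
    [0.8, 0.1, 0.7],   # model 3
])

# Posterior probabilities of the models given the data (illustrative).
posterior = np.array([0.5, 0.3, 0.2])

# Integration over model space: posterior-weighted average of probabilities.
averaged = posterior @ class_probs   # approximately [0.82, 0.24, 0.59]
```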
Structural discovery on the original dataset, then structural rebalancing to obtain a corrected dataset.
[Figure: target vs. learned functions; good vs. poor input locations.]
(Fedorov, 1972; Cohn et al., JAIR 1996)
(Wiens, JSPI 2001; Kanamori & Shimodaira, JSPI 2003; Sugiyama, JMLR 2006)
[Figure: importance weighting with polynomial models of order 1, 2, and 3; model complexity varies.]
Cross-validation: split the data into Group 1, Group 2, …, Group k−1, Group k; in turn, one group is held out for validation while the remaining groups are used for training.
(Zadrozny, ICML 2004; Sugiyama et al., JMLR 2007)
Equivalently: Set 1, Set 2, …, Set k−1, Set k alternate between the training and testing roles.
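Under covariate shift this rotation extends to importance-weighted cross-validation (IWCV): each held-out loss is multiplied by the importance of its input, so the CV score estimates test-domain error. A minimal sketch, where the importance is assumed given and the model and data are illustrative:

```python
import numpy as np

def iwcv_score(x, y, w, fit, predict, k=5):
    """Importance-weighted k-fold CV estimate of the test-domain squared loss."""
    idx = np.arange(len(x))
    losses = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(x[train], y[train])
        err = (predict(model, x[fold]) - y[fold]) ** 2
        losses.append(np.mean(w[fold] * err))   # weight the held-out losses
    return float(np.mean(losses))

rng = np.random.default_rng(2)
x = rng.normal(1.0, 0.5, size=100)
y = np.sin(x) + 0.1 * rng.normal(size=100)
w = np.exp(2.0 * x - 2.5)    # importance for N(1, 0.5) -> N(1.5, 0.5) inputs

fit = lambda xs, ys: np.polyfit(xs, ys, 1)       # order-1 polynomial model
predict = lambda m, xs: np.polyval(m, xs)
score = iwcv_score(x, y, w, fit, predict)
```

With unit weights the same routine reduces to ordinary k-fold CV, so the two scores can be compared when selecting models under shift.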
Reverse Testing (Fan and Davidson, 2006)
[Diagram: models MA and MB are trained on the labeled training data; each assigns labels to the unlabeled test data (DA and DB denote the test data labeled by MA and MB, respectively); reverse models MAA, MAB, MBA, and MBB are trained on DA/DB and evaluated back on the labeled training data.]
Estimate the performance of MA and MB based on the order of MAA, MAB, MBA, and MBB.
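A minimal sketch of the reverse-testing loop. Two k-NN variants (k = 1 and k = 15) stand in for the candidate models MA and MB, and the data are synthetic; both are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def knn_fit(x, y, k):
    return (x, y, k)                         # lazy learner: just store the data

def knn_predict(model, xq):
    x, y, k = model
    d = np.linalg.norm(xq[:, None, :] - x[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest training points
    return (y[nn].mean(axis=1) > 0.5).astype(int)

def reverse_testing(train_x, train_y, test_x, models):
    """models: name -> (fit, predict). Returns scores of all reverse models."""
    scores = {}
    for name_a, (fit_a, pred_a) in models.items():
        # Model A labels the unlabeled test data.
        pseudo = pred_a(fit_a(train_x, train_y), test_x)
        for name_b, (fit_b, pred_b) in models.items():
            # Reverse model M_{A,B}: type B trained on A's pseudo-labels,
            # then scored back on the labeled training data.
            reverse_model = fit_b(test_x, pseudo)
            scores[(name_a, name_b)] = float(
                np.mean(pred_b(reverse_model, train_x) == train_y))
    return scores

rng = np.random.default_rng(4)
n = 60
train_x = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(3.0, 1.0, (n, 2))])
train_y = np.array([0] * n + [1] * n)
test_x = np.vstack([rng.normal(0.5, 1.0, (n, 2)), rng.normal(3.5, 1.0, (n, 2))])

models = {
    "MA": (lambda x, y: knn_fit(x, y, 1), knn_predict),
    "MB": (lambda x, y: knn_fit(x, y, 15), knn_predict),
}
scores = reverse_testing(train_x, train_y, test_x, models)
```

The ordering of the four scores is then used to rank MA against MB without any test labels.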
Sparse Region
Challenges as a Data Mining Problem
P(Y|X) = P(Y|X, r) only if the data is exhaustive.
[Figure: training and testing distributions over regions 1, 2, and 3, with positive (+) and negative (−) examples.]
Models Ma and Mb
Choosing the decision threshold VE:
1. Run 10-fold cross-validation (10CV) of the algorithm on the training set.
2. Concatenate the estimated probability values from folds 1, 2, …, 10 into a single "probability - true label" file.
3. Draw the precision-recall plot from that file and read off the decision threshold VE.
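The pipeline above can be sketched as follows: given the concatenated cross-validated probabilities and true labels, scan candidate thresholds and pick one from the precision-recall trade-off. Choosing the threshold where precision is closest to recall is an illustrative criterion, not necessarily the tutorial's:

```python
import numpy as np

def pick_threshold(probs, labels):
    """Pick the threshold whose precision and recall are closest to equal.

    probs: concatenated P(y=1|x) estimates from all CV folds; labels: 0/1.
    """
    best_t, best_gap = 0.5, np.inf
    for t in np.unique(probs):
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        if pred.sum() == 0:
            continue
        precision = tp / pred.sum()
        recall = tp / labels.sum()
        gap = abs(precision - recall)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return float(best_t)

# Toy "probability - true label" file after concatenating the folds.
probs = np.array([0.10, 0.30, 0.35, 0.60, 0.62, 0.90])
labels = np.array([0, 0, 1, 0, 1, 1])
ve = pick_threshold(probs, labels)   # decision threshold VE
```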
Date     P(y = "ozone day" | x, θ)   Label      (sorted by probability)
7/1/98   0.1316                      Normal
7/3/98   0.5944                      Ozone
7/2/98   0.6245                      Ozone
………

Date     P(y = "ozone day" | x, θ)   Label      (sorted by date)
7/1/98   0.1316                      Normal
7/2/98   0.6245                      Ozone
7/3/98   0.5944                      Ozone
………
Classification on future days
Train θ on the whole training set.
If P(y = "ozone day" | x, θ) ≥ VE, predict "ozone day".
Addressing Data Mining Challenges
Scenarios: no user or movie arrival; user arrival; movie arrival.
Task 1 and Task 2: about 17K movies.
[Timeline: training data spans 1998-2005; the qualifier dataset (about 3M ratings) covers 2006.]
Movie sampling probabilities: Movie5 .0011, Movie3 .001, Movie4 .0007, …
Raw rating records (user ID, rating, date):
1488844,3,20050906
822109,5,20050513
885013,4,20051019
30878,4,20051226
823519,3,20040503
…
Sampled (movie, user) pairs: Movie5 - User7, Movie3 - User7, Movie4 - User8, …
User sampling probabilities: User7 .0007, User6 .00012, User8 .00003, …
Task 1: Effective Sampling Strategies
[Diagram: movies, samples, history, users.]
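A minimal sketch of one such strategy: draw (movie, user) pairs in proportion to estimated sampling probabilities, here using the toy probabilities from the slides (renormalized) and independent draws as simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

movies = ["Movie3", "Movie4", "Movie5"]
movie_p = np.array([0.001, 0.0007, 0.0011])    # estimated movie probabilities
users = ["User6", "User7", "User8"]
user_p = np.array([0.00012, 0.0007, 0.00003])  # estimated user probabilities

# Renormalize each set of probabilities to sum to one, then sample pairs.
movie_p = movie_p / movie_p.sum()
user_p = user_p / user_p.sum()
pairs = list(zip(rng.choice(movies, size=5, p=movie_p),
                 rng.choice(users, size=5, p=user_p)))
```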
KL divergence from training to test input distributions
Active learning problem!
Agnostic setup!
(Sugiyama & Nakajima ECMLPKDD2008)
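One simple way to quantify that divergence is a closed-form KL between Gaussian fits of the two input samples; the Gaussian approximation is an illustrative simplification:

```python
import numpy as np

def gaussian_kl(x_train, x_test):
    """KL( N(mu_tr, s_tr^2) || N(mu_te, s_te^2) ) for 1-D input samples."""
    mu1, s1 = x_train.mean(), x_train.std()
    mu2, s2 = x_test.mean(), x_test.std()
    return np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=5000)
kl_same = gaussian_kl(a, rng.normal(0.0, 1.0, size=5000))   # no shift: near 0
kl_shift = gaussian_kl(a, rng.normal(2.0, 1.0, size=5000))  # shifted inputs
```

A near-zero value indicates matched input distributions; a large value flags a strong covariate shift.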
Mean squared error of wafer position estimation