1 / 19

Predictive Modeling with Heterogeneous Sources

Predictive Modeling with Heterogeneous Sources. Xiaoxiao Shi 1 Qi Liu 2 Wei Fan 3 Qiang Yang 4 Philip S. Yu 1 1 University of Illinois at Chicago 2 Tongji University, China 3 IBM T. J. Watson Research Center 4 Hong Kong University of Science and Technology.

nieve
Download Presentation

Predictive Modeling with Heterogeneous Sources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Predictive Modeling with Heterogeneous Sources Xiaoxiao Shi1 Qi Liu2 Wei Fan3 Qiang Yang4 Philip S. Yu1 1 University of Illinois at Chicago 2 Tongji University, China 3 IBM T. J. Watson Research Center 4Hong Kong University of Science and Technology

  2. Why learning with heterogeneous sources? Standard Supervised Learning Training (labeled) Test (unlabeled) Classifier 85.5% New York Times New York Times

  3. Why heterogeneous sources? How to improve the performance? In Reality… Training (labeled) Test (unlabeled) 47.3% Labeled data are insufficient! New York Times New York Times

  4. Why heterogeneous sources? Labeled data from other sources Target domain test (unlabeled) 82.6% 47.3% New York Times Reuters • Different distributions • Different outputs • Different feature spaces

  5. Real world examples • Social Network: • Can various bookmarking systems help predict social tags for a new system given that their outputs (social tags) and data (documents) are different? Wikipedia ODP Backflip Blink …… ? 4/18

  6. Real world examples • Applied Sociology: • Can the suburban housing price census data help predict the downtown housing prices? ? #rooms #bathrooms #windows price 5 2 12 XXX 6 3 11 XXX #rooms #bathrooms #windows price 2 1 4 XXXXX 4 2 5 XXXXX 5/18

  7. Other examples • Bioinformatics • Previous years’ flu data  new swine flu • Drug efficacy data against breast cancer  drug data against lung cancer • …… • Intrusion detection • Existing types of intrusions  unknown types of intrusions • Sentiment analysis • Review from SDM Review from KDD 6/18

  8. Learning with Heterogeneous Sources • The paper mainly attacks two sub-problems: • Heterogeneous data distributions • Clustering based KL divergence and a corresponding sampling technique • Heterogeneous outputs (to regression problem) • Unifying outputs via preserving similarity. 7/18

  9. Learning with Heterogeneous Sources • General Framework Source data Unifying data distributions Unifying outputs Target data Target data Source data 8/18

  10. Unifying Data Distributions • Basic idea: • Combine the source and target data and perform clustering. • Select the clusters in which the target and source data are similarly distributed, evaluated by KL divergence. 9/18

  11. An Example T D Adaptive Clustering Combined Data 10/18

  12. Unifying Outputs • Basic idea: • Generate initial outputs according to the regression model • For the instances similar in the original output space, make their new outputs closer. 11/18

  13. Modification Modification Initial Outputs Initial Outputs 21.25 31.75 37 16 26.5 12/18

  14. Experiment • Bioinformatics data set: 13/18

  15. Experiment 14/18

  16. Experiment • Applied sociology data set: 15/18

  17. Experiment 16/18

  18. Conclusions • Problem: Learning with Heterogeneous Sources: • Heterogeneous data distributions • Heterogeneous outputs • Solution: • Clustering based KL divergence help perform sampling • Similarity preserving output generation help unify outputs

  19. Thanks!

More Related