
A Two-Stage Approach to Domain Adaptation for Statistical Classifiers


Presentation Transcript


  1. A Two-Stage Approach to Domain Adaptation for Statistical Classifiers. Jing Jiang & ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign

  2. What is domain adaptation?

  3. Example: named entity recognition (persons, locations, organizations, etc.). Standard supervised learning: train on labeled New York Times text, test on unlabeled New York Times text. The NER classifier achieves 85.5%.

  4. Example: named entity recognition (persons, locations, organizations, etc.), in the non-standard (realistic) setting: labeled Reuters data is not available, so we train on labeled New York Times text and test on unlabeled Reuters text. The NER classifier achieves only 64.1%.

  5. Domain difference → performance drop.
  • Ideal setting: NER classifier trained on NYT, tested on NYT → 85.5%.
  • Realistic setting: NER classifier trained on NYT, tested on Reuters → 64.1%.

  6. Another NER example: gene name recognition.
  • Ideal setting: recognizer trained on mouse, tested on mouse → 54.1%.
  • Realistic setting: recognizer trained on fly, tested on mouse → 28.1%.

  7. Other examples • Spam filtering: public email collection → personal inboxes • Sentiment analysis of product reviews: digital cameras → cell phones; movies → books • Can we do better than standard supervised learning? • Domain adaptation: to design learning methods that are aware of the training and test domain difference.

  8. How do we solve the problem in general?

  9. Observation 1: domain-specific features, e.g. wingless, daughterless, eyeless, apexless, …

  10. Observation 1: domain-specific features. Names like wingless, daughterless, eyeless, apexless describe phenotype, following fly gene nomenclature, so the feature "-less" is weighted high. Is that feature still useful for other organisms (e.g., CD38, PABPC5)? No!

  11. Observation 2: generalizable features. "…decapentaplegic and wingless are expressed in analogous patterns in each…" "…that CD38 is expressed by both neurons and glial cells…" "…that PABPC5 is expressed in fetal brain and in a range of adult tissues."

  12. Observation 2: generalizable features. In all of these examples ("…decapentaplegic and wingless are expressed in…", "…CD38 is expressed by…", "…PABPC5 is expressed in…"), the same feature "X be expressed" applies, regardless of the organism.

  13. General idea: two-stage approach. [Diagram: the feature space spans the source and target domains; it splits into generalizable features, shared by both domains, and domain-specific features particular to each.]

  14. Goal. [Diagram: learn a classifier on the source domain that also covers the target domain's portion of the feature space.]

  15. Regular classification. [Diagram: a standard classifier trained on labeled source-domain data over the full feature set, including source-specific features.]

  16. Stage 1. Generalization: to emphasize generalizable features in the trained model. [Diagram: the model trained on the source domain concentrates its weight on the generalizable features.]

  17. Stage 2. Adaptation: to pick up domain-specific features for the target domain. [Diagram: the model extends from the generalizable features into the target domain's specific features.]

  18. Regular semi-supervised learning. [Diagram: for contrast, semi-supervised learning applied over the source and target domains without separating generalizable from domain-specific features.]

  19. Comparison with related work • We explicitly model generalizable features; previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007]. • We do not need labeled target data, but we do need multiple source (training) domains; some prior work requires labeled target data [Daumé III 2007]. • We have a 2nd stage of adaptation, which uses semi-supervised learning; previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

  20. Implementation of the two-stage approach with logistic regression classifiers

  21. Logistic regression classifiers. [Diagram: a binary feature vector x over p features (such as "-less" and "X be expressed", extracted from text like "… and wingless are expressed in…") and a per-class weight vector w_y; the model scores class y by w_y^T x, with p(y | x) ∝ exp(w_y^T x).]

  22. Learning a logistic regression classifier. The training objective is the log likelihood of the training data minus a regularization term that penalizes large weights and controls model complexity: w* = argmax_w Σ_i log p(y_i | x_i; w) − λ ||w||².
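A minimal sketch of this objective in Python/numpy for the binary case (the function names are illustrative, not from the paper): the score is w^T x, and training maximizes the log likelihood minus an L2 penalty.

import numpy as np

def log_likelihood(w, X, y, lam):
    # Regularized objective: sum_i log p(y_i | x_i; w) - lam * ||w||^2,
    # with p(y=1 | x) = 1 / (1 + exp(-w^T x)) and labels y in {0, 1}.
    z = X @ w
    return (y * z - np.log1p(np.exp(z))).sum() - lam * (w @ w)

def fit(X, y, lam=1.0, lr=0.1, steps=500):
    # Maximize the objective by plain gradient ascent.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # predicted p(y=1 | x)
        w += lr * (X.T @ (y - p) - 2.0 * lam * w)  # gradient of the objective
    return w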

  23. Generalizable features in weight vectors. [Diagram: weight vectors w_1, w_2, …, w_K learned separately on K source domains D_1, D_2, …, D_K; generalizable features receive consistently large weights in every w_k, while domain-specific features are weighted high only in their own domain.]

  24. We want to decompose w in this way: w is the sum of a vector with h non-zero entries, one for each of the h generalizable features, and a vector carrying the remaining, domain-specific weights. [Diagram: a numeric example of the split.]

  25. Feature selection matrix A. A is an h × p 0/1 matrix that selects the h generalizable features: each row contains a single 1, and z = Ax is the h-dimensional vector of x's generalizable-feature values.
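A tiny numpy illustration of this (toy sizes and feature indices are made up): each row of A holds a single 1 that picks out one generalizable feature, so z = Ax is just x restricted to those h coordinates.

import numpy as np

p, h = 8, 3                        # p features in total, h of them generalizable
gen_idx = [2, 4, 7]                # hypothetical indices of the generalizable features

A = np.zeros((h, p))
A[np.arange(h), gen_idx] = 1.0     # row j of A selects feature gen_idx[j]

x = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)  # a binary feature vector
z = A @ x                          # z = Ax
assert np.allclose(z, x[gen_idx])  # Ax restricts x to the selected features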

  26. Decomposition of w. [Diagram: the example weight vector w splits into weights for the generalizable features (v, in the selected coordinates) and weights for the domain-specific features (u), so that w^T x = v^T z + u^T x.]

  27. Decomposition of w: w^T x = v^T z + u^T x = v^T (Ax) + u^T x = (A^T v)^T x + u^T x, hence w = A^T v + u.
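Continuing the toy example above (v here just holds example values like those on the slides), the identity can be checked numerically:

v = np.array([4.6, 3.2, 3.6])      # weights for the h generalizable features
u = np.random.randn(p)             # weights for the domain-specific features
w = A.T @ v + u                    # the decomposition w = A^T v + u

# The score decomposes exactly as derived: w^T x = v^T z + u^T x
assert np.isclose(w @ x, v @ (A @ x) + u @ x)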

  28. Decomposition of w: w = A^T v + u, where v (the generalizable-feature weights) is shared by all domains and u is domain-specific. [Diagram: a numeric example of the decomposition.]

  29. Framework for generalization. Fix A and optimize the likelihood of the labeled data from the K source domains, where each domain's weight vector is w_k = A^T v + u_k, plus a regularization term with λs >> 1 that penalizes the domain-specific parts u_k.
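A sketch of one plausible form of this stage-1 objective (names are hypothetical, and whether v itself is also regularized is left to the paper): K per-domain binary log likelihoods with w_k = A^T v + u_k, and λs >> 1 shrinking the domain-specific parts u_k.

import numpy as np

def generalization_objective(v, U, A, domains, lam_s=100.0):
    # domains: list of (X_k, y_k) labeled data sets for the K source domains.
    # U: K x p array whose rows are the domain-specific weight vectors u_k.
    total = 0.0
    for (X, y), u in zip(domains, U):
        w = A.T @ v + u                               # w_k = A^T v + u_k
        z = X @ w
        total += (y * z - np.log1p(np.exp(z))).sum()  # log likelihood of domain k
    return total - lam_s * np.sum(U * U)              # lam_s >> 1 penalizes the u_k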

  30. Framework for adaptation. Fix A and optimize the likelihood of pseudo-labeled target-domain examples, with λt = 1 << λs, so the model picks up domain-specific features in the target domain.
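A matching sketch for stage 2 (same caveats): A and v stay fixed from stage 1, and target-specific weights u_t are fit to pseudo-labeled target examples under a much weaker penalty, so target-domain-specific features can enter the model.

import numpy as np

def adaptation_objective(u_t, v, A, X_t, y_pseudo, lam_t=1.0):
    # X_t, y_pseudo: target-domain examples with pseudo labels (e.g., from bootstrapping).
    w = A.T @ v + u_t                                # v is fixed from stage 1
    z = X_t @ w
    ll = (y_pseudo * z - np.log1p(np.exp(z))).sum()  # likelihood of pseudo-labeled data
    return ll - lam_t * (u_t @ u_t)                  # lam_t = 1 << lam_s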

  31. How to find A? (1) • Joint optimization • Alternating optimization

  32. How to find A? (2) • Domain cross validation • Idea: train on (K − 1) source domains and test on the held-out source domain • Approximation: let w_f^k be the weight for feature f learned from domain k, and w_f^(~k) the weight for feature f learned from the other domains; rank features by a score comparing the two • See paper for details
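A hedged sketch of this heuristic; the paper gives the exact ranking score, so the elementwise minimum used here to reward weights that are large in both w_f^k and w_f^(~k) is only one plausible reading of the slide.

import numpy as np

def rank_features_by_domain_cv(W):
    # W: K x p array; W[k, f] is the weight for feature f learned from domain k alone.
    K, p = W.shape
    scores = np.zeros(p)
    for k in range(K):
        w_k = W[k]                                  # w_f^k: learned from domain k
        w_rest = W[np.arange(K) != k].mean(axis=0)  # w_f^(~k): from the other domains
        scores += np.minimum(w_k, w_rest)           # large only if large in both
    return np.argsort(-scores)                      # most generalizable features first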

  33. Intuition for domain cross validation. [Table: weights learned for the features "expressed" and "-less" across domains D1, D2, …, Dk-1 and the held-out domain Dk (fly); "expressed" receives a large weight in every domain, while "-less" receives a large weight only in the fly domain.]

  34. Experiments • Data set: BioCreative Challenge Task 1B, gene/protein name recognition; 3 organisms/domains: fly, mouse, and yeast • Experiment setup: 2 organisms for training, 1 for testing; F1 as the performance measure

  35. Experiments: generalization (F: fly, M: mouse, Y: yeast). [Results charts over source/target domain pairs omitted.] Using generalizable features is effective, and domain cross validation is more effective than joint optimization.

  36. Experiments: adaptation (F: fly, M: mouse, Y: yeast). [Results charts over source/target domain pairs omitted.] Domain-adaptive bootstrapping is more effective than regular bootstrapping.

  37. Experiments: adaptation. Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.

  38. Conclusions and future work • Two-stage domain adaptation • Generalization: outperformed standard supervised learning • Adaptation: outperformed standard bootstrapping • Two ways to find generalizable features; domain cross validation is more effective • Future work: single source domain? setting the parameters h and m

  39. References • S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007. • J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006. • H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

  40. Thank you!
