
Introduction to Machine Learning for Information Retrieval


Presentation Transcript


  1. Introduction to Machine Learning for Information Retrieval Xiaolong Wang

  2. What is Machine Learning? • In short, tricks of maths • Two major tasks: • Supervised Learning: a.k.a. regression, classification, … • Unsupervised Learning: a.k.a. data manipulation, clustering, …

  3. Supervised Learning • Label: usually obtained by manual annotation • Data: a representation of each example, usually as a feature vector • Prediction Function: select, from a predefined family of functions, the one that gives the best predictions [figures: classification and regression examples]

  4. Supervised Learning • Two formulations: • F1: Given a set of (Xi, Yi), learn a function f such that f(Xi) ≈ Yi • Yi • Binary: spam vs. non-spam • Numeric: very relevant (5), somewhat relevant (4), marginally relevant (3), somewhat irrelevant (2), very irrelevant (1) • Xi • Number of words, occurrence of each word, … • f • usually a linear function

  5. Supervised Learning • Two formulations: • F2: Given a set of (Xi, Yi), learn a function f(X, Y) such that Yi = argmax_Y f(Xi, Y) • Yi: a more complex label than binary or numeric • Multiclass learning: entertainment vs. sports vs. politics … • Structural learning: syntactic parsing [figure: mapping from X to a more general Y]

  6. Supervised Learning • Training • Optimization: • Loss: difference between the true label Yi and the predicted label wᵀXi • Squared Loss (regression): (Yi − wᵀXi)² • Hinge Loss (classification): max(0, 1 − Yi·wᵀXi) • Logistic Loss (classification): log(1 + exp(−Yi·wᵀXi))
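To make the three losses above concrete, here is a minimal Python sketch (not from the slides) that evaluates each one on a single example with label y ∈ {−1, +1} and linear score wᵀx; the weights and features are made up for illustration.

```python
# A minimal sketch of the three losses on one example, assuming labels
# y in {-1, +1} and a linear score s = w^T x.
import numpy as np

def squared_loss(y, s):
    # regression: (y - w^T x)^2
    return (y - s) ** 2

def hinge_loss(y, s):
    # classification: max(0, 1 - y * w^T x)
    return max(0.0, 1.0 - y * s)

def logistic_loss(y, s):
    # classification: log(1 + exp(-y * w^T x))
    return np.log1p(np.exp(-y * s))

w = np.array([0.5, -0.2])   # illustrative weight vector
x = np.array([1.0, 3.0])    # illustrative feature vector
y = -1
s = w @ x                   # predicted score w^T x
print(squared_loss(y, s), hinge_loss(y, s), logistic_loss(y, s))
```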

  7. Supervised Learning • Training • Optimization: • Regularization: without regularization, the learned model tends to overfit

  8. Supervised Learning • Training • Optimization: • Regularization: Large margin, small ||w||

  9. Supervised Learning • Optimization: • The art of optimization • Unconstrained: • First order: gradient descent • Second order: Newton's method • Stochastic: stochastic gradient descent (SGD) • Constrained: • Active set methods • Interior point methods • Alternating Direction Method of Multipliers (ADMM)
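As an illustration of SGD, below is a minimal sketch that minimizes the L2-regularized logistic loss from the previous slides; the learning rate, regularization strength, and toy data are assumptions, not values from the presentation.

```python
# A minimal sketch of stochastic gradient descent (SGD) on the
# L2-regularized logistic loss; hyperparameters are illustrative.
import numpy as np

def sgd_logistic(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (w @ X[i])
            # gradient of log(1 + exp(-y w^T x)) + (lam/2)||w||^2 w.r.t. w
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= lr * grad
    return w

# toy data: two separable classes
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(sgd_logistic(X, y))
```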

  10. Unsupervised Learning • Dimensionality reduction: PCA • Clustering: k-means
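A small unsupervised-learning sketch combining PCA for dimensionality reduction with k-means for clustering; scikit-learn is used here as an assumption, since the slides do not name any implementation, and the blob data is synthetic.

```python
# A minimal sketch: PCA to reduce dimensionality, then k-means to cluster.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two synthetic blobs in 5-dimensional space
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])

X2 = PCA(n_components=2).fit_transform(X)                  # project to 2 dimensions
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X2)   # cluster in the low-dim space
print(labels[:5], labels[-5:])
```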

  11. Machine Learning for Information Retrieval • Learning to Rank • Topic Modeling

  12. Learning to Rank http://research.microsoft.com/en-us/people/hangli/li-acl-ijcnlp-2009-tutorial.pdf

  13. Learning to Rank • X = (q, d) • Features: e.g. Matching between Query and Document
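For illustration, a minimal sketch of turning a (query, document) pair into a feature vector X_{q,d}; the particular matching features used here (term overlap, query coverage, document length) are assumptions, not the feature set from the presentation.

```python
# A minimal sketch of extracting query-document matching features.
import numpy as np

def extract_features(query, doc):
    q_terms, d_terms = query.lower().split(), doc.lower().split()
    overlap = sum(t in d_terms for t in q_terms)   # matched query terms
    coverage = overlap / max(len(q_terms), 1)      # fraction of query matched
    doc_len = len(d_terms)                         # document length
    return np.array([overlap, coverage, doc_len], dtype=float)

print(extract_features("machine learning", "an introduction to machine learning"))
```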

  14. Learning to Rank

  15. Learning to Rank • Labels: • Pointwise: relevant vs. irrelevant; 5, 4, 3, 2, 1 • Pairwise: doc A > doc B, doc C > doc D • Listwise: a permutation • Acquisition: • Expert annotation • Clickthrough: a clicked document is preferred over documents skipped above it
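A small sketch of the "clicked > skipped above" heuristic for turning a clickthrough log into pairwise preferences; the input format (a ranked list of doc ids plus a set of clicked ids) is an assumption for illustration.

```python
# A minimal sketch of extracting pairwise preferences from clicks.
def click_skip_above_pairs(ranked_docs, clicked):
    """ranked_docs: doc ids in the order shown; clicked: set of clicked ids.
    Returns (preferred, less_preferred) pairs."""
    pairs = []
    for i, doc in enumerate(ranked_docs):
        if doc in clicked:
            # a clicked doc is preferred over every skipped doc ranked above it
            for skipped in ranked_docs[:i]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs

print(click_skip_above_pairs(["d1", "d2", "d3", "d4"], clicked={"d3"}))
# -> [('d3', 'd1'), ('d3', 'd2')]
```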

  16. Learning to Rank

  17. Learning to Rank • Prediction function: • Extract Xq,d from (q, d) • Rank documents by sorting on wᵀXq,d • Loss function: • Pointwise • Pairwise • Listwise
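A minimal sketch of the prediction step: score each document's feature vector with wᵀX_{q,d} and sort by score; the weights and features here are made up.

```python
# A minimal sketch of ranking documents by their linear scores.
import numpy as np

w = np.array([0.8, 0.3, -0.5])            # learned weight vector (illustrative)
X_qd = {                                  # hypothetical features per document
    "docA": np.array([0.9, 0.2, 0.1]),
    "docB": np.array([0.4, 0.8, 0.0]),
    "docC": np.array([0.1, 0.1, 0.9]),
}
scores = {d: float(w @ x) for d, x in X_qd.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```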

  18. Learning to Rank • Pointwise: • Regression: squared loss • Pairwise: • Classification: (q, d1) > (q, d2) gives the positive example Xq,d1 − Xq,d2 • Listwise: • Optimize NDCG@j = (1/Zj) Σ_{i=1..j} (2^{ri} − 1) / log2(i + 1), where ri is the relevance (e.g. 0/1) of the document at rank i, 2^{ri} − 1 is the gain, 1/log2(i + 1) is the discount of rank i, and Zj normalizes by the DCG of the ideal ranking (Normalized Discounted Cumulative Gain)
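To make the listwise objective concrete, a small sketch of computing NDCG@k from graded relevance labels, following the gain / discount / normalization decomposition above; the example labels are made up.

```python
# A minimal sketch of NDCG@k for a ranked list of graded relevance labels.
import numpy as np

def dcg_at_k(rels, k):
    rels = np.asarray(rels, dtype=float)[:k]
    gains = 2.0 ** rels - 1.0                          # gain of rank i
    discounts = np.log2(np.arange(2, rels.size + 2))   # discount log2(i + 1)
    return float(np.sum(gains / discounts))

def ndcg_at_k(rels, k):
    ideal = dcg_at_k(sorted(rels, reverse=True), k)    # normalizer Z (ideal DCG)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))
```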

  19. Topic Modeling • Topic Modeling • Factorization of the words × documents matrix • Clustering of documents • Projects documents (vectors of length #vocabulary) into a lower dimension (vectors of length #topics) • What is a Topic? • A linear combination of words • Nonnegative weights that sum to 1 => a probability distribution

  20. Topic Modeling • Generative models: story-telling • Latent Semantic Analysis (LSA) • Probabilistic Latent Semantic Analysis (PLSA) • Latent Dirichlet Allocation (LDA)

  21. Topic Modeling • Latent Semantic Analysis (LSA): • Deerwester et al. (1990) • Singular Value Decomposition (SVD) applied to the words × documents matrix • How to interpret negative values?
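A minimal LSA sketch: truncated SVD of a toy term-document count matrix, keeping k = 2 latent dimensions; the matrix itself is made up for illustration.

```python
# A minimal sketch of LSA via truncated SVD on a small count matrix.
import numpy as np

# rows = words, columns = documents
A = np.array([[2., 0., 1.],
              [1., 0., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                           # number of latent "topics"
doc_embeddings = (np.diag(s[:k]) @ Vt[:k]).T    # documents in the k-dim latent space
print(doc_embeddings)
# Note: U and V can contain negative entries, which is the interpretability
# issue the slide raises.
```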

  22. Topic Modeling • Probabilistic Latent Semantic Analysis (PLSA): • Thomas Hofmann (1999) • Describes how words/documents are generated, in terms of probabilities: the words × documents matrix is factored into a words × topics matrix and a topics × documents matrix • Parameters are fit by maximum likelihood over the observed (document, word) pairs, e.g. (d1, voyage), (d2, sky), (d1, fish), (d3, trip), (d1, boat), (d2, voyage), …
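A compact sketch of the standard EM updates used to fit PLSA by maximum likelihood; the count matrix, number of topics, and random initialization below are illustrative assumptions.

```python
# A minimal sketch of EM for PLSA on a document-word count matrix.
import numpy as np

def plsa(N, k, iters=50, seed=0):
    """N: (n_docs, n_words) count matrix; k: number of topics."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = N.shape
    p_z_d = rng.random((n_docs, k)); p_z_d /= p_z_d.sum(1, keepdims=True)   # P(z|d)
    p_w_z = rng.random((k, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(z|d) P(w|z)
        q = p_z_d[:, :, None] * p_w_z[None, :, :]           # shape (d, z, w)
        q /= q.sum(1, keepdims=True) + 1e-12
        # M-step: reweight by the observed counts
        nz = N[:, None, :] * q                              # expected counts (d, z, w)
        p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = nz.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z

N = np.array([[3, 1, 0, 0], [2, 2, 0, 1], [0, 0, 4, 2]])   # toy counts
p_z_d, p_w_z = plsa(N, k=2)
print(np.round(p_z_d, 2))
```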

  23. Topic Modeling • Latent Dirichlet Allocation (LDA) • David Blei et al. (2003) • PLSA with a Dirichlet prior • What is Bayesian inference? Conjugate prior? Posterior? Frequentist vs. Bayesian • Tossing a coin: the head probability r is the parameter to be estimated; posterior ∝ likelihood × prior, i.e. p(r | data) ∝ p(data | r) g(r) • Canonical maximum likelihood (frequentist) is a special case of Bayesian maximum a posteriori (MAP) when the prior g(r) is uniform • Bayesian as an inference method: • Estimate r: posterior mean, or MAP • Estimate the probability that a new toss is a head: the posterior predictive ∫ r p(r | data) dr
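The coin-tossing example worked out as a small sketch: with a Beta prior on the head probability r and a Binomial likelihood, the posterior is again a Beta distribution; the prior hyperparameters and the observed tosses below are made up.

```python
# A minimal Beta-Binomial sketch of the coin-tossing example.
a, b = 2.0, 2.0          # Beta(a, b) prior on the head probability r (assumed)
heads, tails = 7, 3      # observed tosses (assumed)

# posterior ∝ likelihood × prior  =>  Beta(a + heads, b + tails)
post_a, post_b = a + heads, b + tails

posterior_mean = post_a / (post_a + post_b)            # Bayesian estimate of r
map_estimate = (post_a - 1) / (post_a + post_b - 2)    # MAP estimate
mle = heads / (heads + tails)                          # frequentist ML estimate (uniform prior case)
p_next_head = posterior_mean                           # posterior predictive for a new toss
print(posterior_mean, map_estimate, mle, p_next_head)
```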

  24. Topic Modeling • Latent Dirichlet Allocation (LDA) • David Blei et al. (2003) • PLSA with a Dirichlet prior • What additional information do we have about the factors (words × topics and topics × documents)? • Sparsity: • each topic has nonzero probability on few words; • each document has nonzero probability on few topics • The parameters of a multinomial are nonnegative and sum to 1, i.e. they lie on a simplex; the Dirichlet distribution defines a probability on the simplex and can encourage sparsity
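A minimal sketch of fitting LDA with scikit-learn (an assumption; the slides name no implementation). Setting the Dirichlet hyperparameters below 1 pushes the per-document topic and per-topic word distributions toward the sparsity the slide describes; the toy documents are made up.

```python
# A minimal sketch of LDA with small Dirichlet priors to encourage sparsity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the boat sailed on a sea voyage",
        "a fishing boat on the sea",
        "the sky was clear for the trip",
        "a long trip under a clear sky"]
X = CountVectorizer().fit_transform(docs)          # document-word count matrix
lda = LatentDirichletAllocation(n_components=2,
                                doc_topic_prior=0.1,    # Dirichlet prior on P(topic|doc)
                                topic_word_prior=0.1,   # Dirichlet prior on P(word|topic)
                                random_state=0)
doc_topics = lda.fit_transform(X)                  # per-document topic mixtures
print(doc_topics.round(2))
```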
