Presentation Transcript

Seminar: Statistical NLP

Machine Learning for Natural Language Processing

Lluís Màrquez

TALP Research Center

Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Girona, June 2003

Outline
  • Machine Learning for NLP
  • The Classification Problem
  • Three ML Algorithms
  • Applications to NLP

Machine Learning

There are many general-purpose definitions of Machine Learning (or artificial learning):

Making a computer automatically acquire some kind of knowledge from a concrete data domain

  • Learners are computers: we study learning algorithms
  • Resources are scarce: time, memory, data, etc.
  • It has (almost) nothing to do with: cognitive science, neuroscience, theory of scientific discovery and research, etc.
  • Biological plausibility is welcome but not the main goal

Machine Learning
  • Learning... but what for?
    • To perform some particular task
    • To react to environmental inputs
    • Concept learning from data:
      • modelling the concepts underlying the data
      • predicting unseen observations
      • compacting the knowledge representation
      • knowledge discovery for expert systems

We will concentrate on: supervised inductive learning for classification (= discriminative learning)

Machine Learning
  • A more precise definition: obtaining a description of the concept, in some representation language, that explains the observations and helps to predict new instances of the same distribution
  • What to read? Machine Learning (Mitchell, 1997)

Empirical NLP
  • 90's: application of Machine Learning (ML) techniques to NLP problems
  • Lexical and structural ambiguity problems (classification problems):
    • Word selection (SR, MT)
    • Part-of-speech tagging
    • Semantic ambiguity (polysemy)
    • Prepositional phrase attachment
    • Reference ambiguity (anaphora)
    • etc.
  • What to read? Foundations of Statistical Natural Language Processing (Manning & Schütze, 1999)

NLP "classification" problems
  • Ambiguity is a crucial problem for natural language understanding/processing. Ambiguity resolution = classification
  • He was shot in the hand as he chased the robbers in the back street

(The Wall Street Journal Corpus)

NLP "classification" problems: morpho-syntactic ambiguity (Part-of-Speech Tagging)
  • He was shot in the hand as he chased the robbers in the back street
  • Ambiguous word forms such as "shot", "hand" and "back" admit several part-of-speech readings (e.g., NN, VB, JJ)

(The Wall Street Journal Corpus)

NLP "classification" problems: semantic (lexical) ambiguity (Word Sense Disambiguation)
  • He was shot in the hand as he chased the robbers in the back street
  • "hand" is ambiguous between its body-part sense and its clock-part sense

(The Wall Street Journal Corpus)

NLP "classification" problems: structural (syntactic) ambiguity (PP-attachment disambiguation)
  • He was shot in the hand as he (chased (the robbers)NP (in the back street)PP)
  • The prepositional phrase "in the back street" could attach either to the verb "chased" or to the noun phrase "the robbers"

(The Wall Street Journal Corpus)

Outline
  • Machine Learning for NLP
  • The Classification Problem
  • Three ML Algorithms in detail
  • Applications to NLP

Feature Vector Classification (AI perspective)
  • An instance is a vector x = <x1, …, xn> whose components, called features (or attributes), are discrete or real-valued
  • Let X be the space of all possible instances
  • Let Y = {y1, …, ym} be the set of categories (or classes)
  • The goal is to learn an unknown target function, f : X → Y
  • A training example is an instance x belonging to X, labelled with the correct value for f(x), i.e., a pair <x, f(x)>
  • Let D be the set of all training examples

Feature Vector Classification
  • The hypothesis space, H, is the set of functions h : X → Y that the learner can consider as possible definitions
  • The goal is to find a function h belonging to H such that, for every pair <x, f(x)> belonging to D, h(x) = f(x) (a small code illustration of this setting follows)
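
As an illustration of this setting (not part of the original slides), the sketch below encodes a handful of training examples as feature vectors and checks a candidate hypothesis against them; the feature names, the data and the simple rule are invented for the example.

    # Minimal sketch of the classification setting (illustrative data and rule).
    from typing import List, Tuple

    Instance = Tuple[str, str, str]      # features: (COLOR, SHAPE, SIZE)
    Example = Tuple[Instance, str]       # a training example: a pair <x, f(x)>

    # D: a toy set of training examples, labelled with the correct class f(x)
    D: List[Example] = [
        (("red", "circle", "small"), "positive"),
        (("red", "triangle", "big"), "negative"),
        (("blue", "circle", "small"), "negative"),
        (("red", "circle", "big"), "positive"),
    ]

    # A candidate hypothesis h: X -> Y from the hypothesis space H
    def h(x: Instance) -> str:
        color, shape, _size = x
        return "positive" if (color == "red" and shape == "circle") else "negative"

    # Consistency check: does h(x) = f(x) hold for every training example in D?
    print(all(h(x) == y for x, y in D))  # True for this toy data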

An Example

Decision tree:

    COLOR?
      blue → negative
      red  → SHAPE?
               circle   → positive
               triangle → negative

Equivalent rules:

    (COLOR = red) ∧ (SHAPE = circle) ⇒ positive
    otherwise ⇒ negative

An Example

Decision tree:

    SIZE?
      small → SHAPE?
                circle   → positive
                triangle → negative
      big   → COLOR?
                red  → positive
                blue → negative

Equivalent rules:

    (SIZE = small) ∧ (SHAPE = circle) ⇒ positive
    (SIZE = big) ∧ (COLOR = red) ⇒ positive
    otherwise ⇒ negative

(The same rules are written as runnable code in the sketch below.)
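
A decision tree is just a nested sequence of feature tests, so the second example tree can be written directly as code; the following is an illustrative sketch (function and argument names are ours, not the slides').

    # The second example tree, written as nested if/else feature tests.
    def classify(size: str, shape: str, color: str) -> str:
        if size == "small":
            # left subtree: SHAPE decides
            return "positive" if shape == "circle" else "negative"
        else:
            # right subtree (SIZE = big): COLOR decides
            return "positive" if color == "red" else "negative"

    print(classify("small", "circle", "blue"))   # positive
    print(classify("big", "triangle", "red"))    # positive
    print(classify("big", "circle", "blue"))     # negative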

Some important concepts
  • Inductive bias: "Any means that a classification learning system uses to choose between two functions that are both consistent with the training data is called inductive bias" (Mooney & Cardie, 99)
  • Language bias / search bias
  • (Illustrated on the original slide with the COLOR/SHAPE decision tree from the first example)

Some important concepts
  • Inductive bias
  • Training error and generalization error
  • Generalization ability and overfitting
  • Batch learning vs. on-line learning
  • Symbolic vs. statistical learning
  • Propositional vs. first-order learning

Propositional vs. Relational Learning
  • Propositional learning

    color(red) ∧ shape(circle) ⇒ classA

  • Relational learning = ILP (induction of logic programs)

    course(X) ∧ person(Y) ∧ link_to(Y,X) ⇒ instructor_of(X,Y)

    research_project(X) ∧ person(Z) ∧ link_to(L1,X,Y) ∧ link_to(L2,Y,Z) ∧ neighbour_word_people(L1) ⇒ member_proj(X,Z)

The Classification Setting: Class, Point, Example, Data Set, ... (CoLT/SLT perspective)
  • Input space: X ⊆ R^n
  • (Binary) output space: Y = {+1, -1}
  • A point, pattern or instance: x ∈ X, x = (x1, x2, …, xn)
  • Example: (x, y) with x ∈ X, y ∈ Y
  • Training set: a set of m examples generated i.i.d. according to an unknown distribution P(x,y): S = {(x1, y1), …, (xm, ym)} ∈ (X × Y)^m

The Classification Setting: Learning, Error, ...
  • The hypothesis space, H, is the set of functions h : X → Y that the learner can consider as possible definitions. In SVMs they are of the form given below
  • The goal is to find a function h belonging to H such that the expected misclassification error on new examples, also drawn from P(x,y), is minimal (Risk Minimization, RM)
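
The functional form referred to above was an equation on the original slide that did not survive extraction; the standard (linear) SVM hypothesis, which is almost certainly what was shown, is

    h(\mathbf{x}) = \operatorname{sign}\big(\langle \mathbf{w}, \mathbf{x}\rangle + b\big)
                  = \operatorname{sign}\Big(\textstyle\sum_{i=1}^{n} w_i x_i + b\Big)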

The Classification Setting: Learning, Error, ...
  • Expected error (risk): see the formulas below
  • Problem: P itself is unknown; only the training examples are known ⇒ an induction principle is needed
  • Empirical Risk Minimization (ERM): find the function h belonging to H for which the training error (empirical risk) is minimal
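
The risk formulas the bullets refer to were lost in extraction; the standard definitions they paraphrase, with the 0/1 loss L, are

    R(h) = \int L\big(h(\mathbf{x}), y\big)\, dP(\mathbf{x}, y)
    \qquad
    R_{\mathrm{emp}}(h) = \frac{1}{m} \sum_{i=1}^{m} L\big(h(\mathbf{x}_i), y_i\big)
    \qquad
    L\big(h(\mathbf{x}), y\big) = \begin{cases} 0 & \text{if } h(\mathbf{x}) = y \\ 1 & \text{otherwise} \end{cases}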

The Classification Setting: Error, Over(under)fitting, ...
  • Low training error ⇒ low true error?
  • The overfitting dilemma (illustrated on the slide with a figure from Müller et al., 2001): a model that is too simple underfits, a model that is too complex overfits
  • Trade-off between training error and complexity
  • Different learning biases can be used

Outline
  • Machine Learning for NLP
  • The Classification Problem
  • Three ML Algorithms
    • Decision Trees
    • AdaBoost
    • Support Vector Machines
  • Applications to NLP

Learning Paradigms
  • Statistical learning:
    • HMM, Bayesian Networks, ME, CRF, etc.
  • Traditional methods from Artificial Intelligence (ML, AI)
    • Decision trees/lists, exemplar-based learning, rule induction, neural networks, etc.
  • Methods from Computational Learning Theory (CoLT/SLT)
    • Winnow, AdaBoost, SVM’s, etc.

Learning Paradigms
  • Classifier combination:
    • Bagging, Boosting, Randomization, ECOC, Stacking, etc.
  • Semi-supervised learning: learning from labelled and unlabelled examples
    • Bootstrapping, EM, Transductive learning (SVM’s, AdaBoost), Co-Training, etc.
  • etc.

Decision Trees
  • Decision trees are a way to represent rules underlying training data, with hierarchical structures that recursively partition the data.
  • They have been used by many research communities (Pattern Recognition, Statistics, ML, etc.) for data exploration with some of the following purposes: Description, Classification, and Generalization.
  • From a machine-learning perspective: Decision Trees are n-ary branching trees that represent classification rules for classifying the objects of a certain domain into a set of mutually exclusive classes

Decision Trees
  • Acquisition: Top-Down Induction of Decision Trees (TDIDT)
  • Systems:
    • CART (Breiman et al. 84)
    • ID3, C4.5, C5.0 (Quinlan 86, 93, 98)
    • ASSISTANT, ASSISTANT-R (Cestnik et al. 87; Kononenko et al. 95)
    • etc.

An Example

[Figure: a generic decision tree whose internal nodes test features A1, A2, A3, A5, with branches labelled by values v1–v7 and leaves labelled with classes C1, C2, C3, shown next to the concrete tree from the earlier example (SIZE, then SHAPE or COLOR, with positive/negative leaves).]

Learning Decision Trees

Training:  Training Set + TDIDT = DT
Test:      Example + DT = Class

General Induction Algorithm

function TDIDT (X: set-of-examples; A: set-of-features)
  var
    tree1, tree2: decision-tree;
    X': set-of-examples;
    A': set-of-features
  end-var
  if (stopping_criterion(X)) then
    tree1 := create_leaf_tree(X)
  else
    amax := feature_selection(X,A);
    tree1 := create_tree(X, amax);
    for-all val in values(amax) do
      X' := select_examples(X,amax,val);
      A' := A - {amax};
      tree2 := TDIDT(X',A');
      tree1 := add_branch(tree1,tree2,val)
    end-for
  end-if
  return(tree1)
end-function
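
For readers who prefer running code, below is a compact Python rendering of the TDIDT scheme above, using information gain as the feature_selection criterion; it is an illustrative sketch (the data format, a list of (feature-dict, class) pairs, is our assumption), not the implementation of any of the systems cited.

    # Sketch of TDIDT with information gain as the feature-selection criterion.
    from collections import Counter
    from math import log2

    def entropy(examples):
        counts = Counter(y for _, y in examples)
        total = len(examples)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def information_gain(examples, feature):
        total = len(examples)
        remainder = 0.0
        for value in {x[feature] for x, _ in examples}:
            subset = [(x, y) for x, y in examples if x[feature] == value]
            remainder += len(subset) / total * entropy(subset)
        return entropy(examples) - remainder

    def tdidt(examples, features):
        classes = {y for _, y in examples}
        if len(classes) == 1 or not features:      # stopping criterion
            return Counter(y for _, y in examples).most_common(1)[0][0]  # leaf
        amax = max(features, key=lambda a: information_gain(examples, a))
        branches = {}                              # value of amax -> subtree
        for value in {x[amax] for x, _ in examples}:
            subset = [(x, y) for x, y in examples if x[amax] == value]
            branches[value] = tdidt(subset, features - {amax})
        return (amax, branches)

    # Toy data corresponding to the earlier SIZE/SHAPE/COLOR example
    data = [({"SIZE": "small", "SHAPE": "circle",   "COLOR": "red"},  "pos"),
            ({"SIZE": "small", "SHAPE": "triangle", "COLOR": "red"},  "neg"),
            ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "red"},  "pos"),
            ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "blue"}, "neg")]
    print(tdidt(data, {"SIZE", "SHAPE", "COLOR"}))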

Feature Selection Criteria
  • Functions derived from Information Theory (written out below):
    • Information Gain, Gain Ratio (Quinlan 86)
  • Functions derived from distance measures:
    • Gini Diversity Index (Breiman et al. 84)
    • RLM (López de Mántaras 91)
  • Statistically-based:
    • Chi-square test (Sestito & Dillon 94)
    • Symmetrical Tau (Zhou & Dillon 91)
  • RELIEFF-IG: a variant of RELIEFF (Kononenko 94)
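
As a reminder (these formulas are not on the original slide), the two most widely used of these criteria, for a set of examples S, a feature A with values v and class proportions p_i, are

    \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v),
    \qquad
    H(S) = -\sum_i p_i \log_2 p_i,
    \qquad
    \mathrm{Gini}(S) = 1 - \sum_i p_i^2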

Extensions of DTs (Murthy 95)
  • Pruning (pre/post)
  • Minimize the effect of the greedy approach: lookahead
  • Non-linear splits
  • Combination of multiple models
  • Incremental learning (on-line)
  • etc.

Decision Trees and NLP
  • Speech processing (Bahl et al. 89; Bakiri & Dietterich 99)
  • POS Tagging (Cardie 93, Schmid 94b; Magerman 95; Màrquez & Rodríguez 95,97; Màrquez et al. 00)
  • Word sense disambiguation (Brown et al. 91; Cardie 93; Mooney 96)
  • Parsing (Magerman 95,96; Haruno et al. 98,99)
  • Text categorization (Lewis & Ringuette 94; Weiss et al. 99)
  • Text summarization (Mani & Bloedorn 98)
  • Dialogue act tagging (Samuel et al. 98)

Decision Trees and NLP
  • Noun phrase coreference (Aone & Benett 95; Mc Carthy & Lehnert 95)
  • Discourse analysis in information extraction (Soderland & Lehnert 94)
  • Cue phrase identification in text and speech (Litman 94; Siegel & McKeown 94)
  • Verb classification in Machine Translation (Tanaka 96; Siegel 97)

Decision Trees: pros & cons
  • Advantages
    • Acquires symbolic knowledge in an understandable way
    • Very well studied ML algorithms and variants
    • Can be easily translated into rules
    • Existence of available software: C4.5, C5.0, etc.
    • Can be easily integrated into an ensemble

Decision Trees: pros & cons
  • Drawbacks
    • Computationally expensive when scaling to large natural language domains: training examples, features, etc.
    • Data sparseness and data fragmentation: the problem of the small disjuncts => Probability estimation
    • DTs are models with high variance (unstable)
    • Tendency to overfit training data: pruning is necessary
    • Requires quite a big effort in tuning the model

Boosting Algorithms
  • Idea: "to combine many simple and moderately accurate hypotheses (weak classifiers) into a single, highly accurate classifier"
  • AdaBoost (Freund & Schapire 95) has been studied extensively, both theoretically and empirically
  • Many other variants and extensions (1997-2003)

http://www.lsi.upc.es/~lluism/seminari/ml&nlp.html

AdaBoost: general scheme

[Figure: the general AdaBoost scheme. TRAINING: a probability distribution Dt over the training set TSt is maintained and updated at each round t = 1, ..., T; the Weak Learner is called on each reweighted set and produces weak hypotheses h1, h2, ..., hT with weights a1, a2, ..., aT. TEST: the output is the linear combination F(h1, h2, ..., hT). The standard formulas behind the scheme are reconstructed below.]
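
The update and combination formulas were part of the figure and were lost in extraction; the standard (binary, discrete) AdaBoost rules the scheme depicts, with weighted error \epsilon_t and normalization factor Z_t, are

    \alpha_t = \frac{1}{2}\,\ln\frac{1 - \epsilon_t}{\epsilon_t},
    \qquad
    D_{t+1}(i) = \frac{D_t(i)\, \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},
    \qquad
    F(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big)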

AdaBoost: example

Weak hypotheses = vertical/horizontal hyperplanes

AdaBoost and NLP
  • POS Tagging (Abney et al. 99; Màrquez 99)
  • Text and Speech Categorization (Schapire & Singer 98; Schapire et al. 98; Weiss et al. 99)
  • PP-attachment Disambiguation (Abney et al. 99)
  • Parsing (Haruno et al. 99)
  • Word Sense Disambiguation (Escudero et al. 00, 01)
  • Shallow parsing (Carreras & Màrquez 01a; 02)
  • Email spam filtering (Carreras & Màrquez 01b)
  • Term Extraction (Vivaldi et al. 01)

AdaBoost: pros & cons
  • Easy to implement and few parameters to set
  • Time and space grow linearly with number of examples. Ability to manage very large learning problems
  • Does not constrain explicitly the complexity of the learner
  • Naturally combines feature selection with learning
  • Has been successfully applied to many practical problems

AdaBoost: pros & cons
  • Seems to be rather robust to overfitting (number of rounds) but sensitive to noise
  • Performance is very good when there are relatively few relevant terms (features)
  • Can perform poorly when there is insufficient training data relative to the complexity of the base classifiers, or when the training errors of the base classifiers become too large too quickly

SVM: A General Definition
  • "Support Vector Machines (SVM) are learning systems that use a hypothesis space of linear functions in a high dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory." (Cristianini & Shawe-Taylor, 2000)
  • Key concepts (highlighted in the definition): linear functions, high dimensional feature space, learning algorithm from optimisation theory, learning bias from statistical learning theory

Linear Classifiers

[Figure: positive (+) and negative (–) training points separated by a hyperplane with normal vector w.]

  • Hyperplanes in R^N
  • Defined by a weight vector (w) and a threshold (b)
  • They induce the classification rule reconstructed below
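
The rule itself was an equation on the original slide that was lost in extraction; the standard rule induced by (w, b) is

    \langle \mathbf{w}, \mathbf{x} \rangle + b = 0 \quad \text{(the separating hyperplane)},
    \qquad
    h(\mathbf{x}) = \begin{cases} +1 & \text{if } \langle \mathbf{w}, \mathbf{x} \rangle + b \ge 0 \\ -1 & \text{otherwise} \end{cases}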

Optimal Hyperplane: Geometric Intuition

[Figure: the maximal margin hyperplane separating the two classes; the training points that lie on the margin are the support vectors.]

Linearly separable data

[Figure: for linearly separable data the maximal margin hyperplane is obtained by solving a Quadratic Programming problem; the formulation shown on the slide was lost in extraction.]

Non-linear SVMs
  • Implicit mapping into feature space via kernel functions

[Figure: the slide listed the corresponding formulas for the non-linear mapping, the set of hypotheses, the dual formulation, the kernel function and its evaluation; the formulas were lost in extraction.]

Non-linear SVMs
  • Kernel functions
    • Must be efficiently computable
    • Characterization via Mercer’s theorem
    • One of the curious facts about using a kernel is that we do not need to know the underlying feature map in order to be able to learn in the feature space! (Cristianini & Shawe-Taylor, 2000)
    • Examples: polynomials, Gaussian radial basis functions, two-layer sigmoidal neural networks, etc. (standard forms given below)
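
The kernels named in the last bullet are not written out on the extracted slide; their standard forms (our reconstruction, with c, d, sigma, kappa and theta as the usual kernel parameters) are

    K_{\mathrm{poly}}(\mathbf{x}, \mathbf{z}) = \big(\langle \mathbf{x}, \mathbf{z}\rangle + c\big)^{d},
    \qquad
    K_{\mathrm{RBF}}(\mathbf{x}, \mathbf{z}) = \exp\!\big(-\|\mathbf{x} - \mathbf{z}\|^{2} / (2\sigma^{2})\big),
    \qquad
    K_{\mathrm{sig}}(\mathbf{x}, \mathbf{z}) = \tanh\!\big(\kappa\,\langle \mathbf{x}, \mathbf{z}\rangle + \theta\big)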

Non-linear SVMs

[Figure: decision boundaries obtained with a degree 3 polynomial kernel on a linearly separable and on a linearly non-separable data set.]

Toy Examples
  • All the examples below have been run with the 2D graphical interface of LIBSVM (Chang and Lin, National Taiwan University)

"LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. The basic algorithm is a simplification of both SMO by Platt and SVMLight by Joachims. It is also a simplification of the modification 2 of SMO by Keerthi et al. Our goal is to help users from other fields to easily use SVM as a tool. LIBSVM provides a simple interface where users can easily link it with their own programs…"

  • Available from: www.csie.ntu.edu.tw/~cjlin/libsvm (it includes a Web-integrated demo tool)

Toy Examples (I)

[Figure: a linearly separable data set, a linear SVM and its maximal margin hyperplane. Caption: "What happens if we add a blue training example here?", pointing at a location in the plot.]

Toy Examples (I)

[Figure: the data set is still linearly separable. With a linear SVM and a high value of the C parameter, the maximal margin hyperplane shifts and the new example is correctly classified.]

Toy Examples (I)

[Figure: the same (still linearly separable) data set. With a linear SVM and a low value of the C parameter there is a trade-off between margin and training error: the new example is now a bounded support vector. A small code reproduction of this experiment follows.]
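
The same contrast can be reproduced outside the 2D applet; the sketch below uses scikit-learn's SVC, which is built on top of LIBSVM, on invented 2D data, so it illustrates the C trade-off rather than the original demo's configuration.

    # Effect of the C parameter on a linear SVM (scikit-learn's SVC wraps LIBSVM).
    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable groups, plus one extra point close to the other class.
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class +1
                  [4.0, 4.0], [4.5, 3.5], [5.0, 4.5],    # class -1
                  [2.5, 2.0]])                           # extra point, labelled -1
    y = np.array([+1, +1, +1, -1, -1, -1, -1])

    for C in (100.0, 0.01):                              # high C vs. low C
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C:<6}  training accuracy={clf.score(X, y):.2f}  "
              f"support vectors={len(clf.support_)}  "
              f"w={clf.coef_[0].round(2)}  b={clf.intercept_[0]:.2f}")

    # With a high C the optimiser insists on a small-margin hyperplane that also
    # classifies the extra point correctly; with a low C the margin is kept large
    # and the extra point typically ends up as a bounded support vector.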

SVM: Summary
  • SVMs introduced in COLT'92 (Boser, Guyon & Vapnik, 1992). Great development since then
  • Kernel-induced feature spaces: SVMs work efficiently in very high dimensional feature spaces (+)
  • Learning bias: maximal margin optimisation. Reduces the danger of overfitting. Generalization bounds for SVMs (+)
  • Compact representation of the induced hypothesis. The solution is sparse in terms of SVs (+)

SVM: Summary
  • Due to Mercer's conditions on the kernels, the optimisation problems are convex. No local minima (+)
  • Optimisation theory guides the implementation. Efficient learning (+)
  • Mainly for classification but also for regression, density estimation, clustering, etc.
  • Success in many real-world applications: OCR, vision, bioinformatics, speech recognition, NLP: TextCat, POS tagging, chunking, parsing, etc. (+)
  • Parameter tuning (–). Implications in convergence times, sparsity of the solution, etc.
Outline
  • Machine Learning for NLP
  • The Classification Problem
  • Three ML Algorithms
  • Applications to NLP

NLP problems
  • Warning! We will not focus on final NLP applications, but on intermediate tasks...
  • We will classify the NLP tasks according to their (structural) complexity

NLP problems: structural complexity
  • Decisional problems
    • Text Categorization, Document filtering, Word Sense Disambiguation, etc.
  • Sequence tagging and detection of sequential structures
    • POS tagging, Named Entity extraction, syntactic chunking, etc.
  • Hierarchical structures
    • Clause detection, full parsing, IE of complex concepts, composite Named Entities, etc.

POS tagging
  • Morpho-syntactic ambiguity: Part-of-Speech Tagging
  • He was shot in the hand as he chased the robbers in the back street
  • Ambiguous word forms such as "shot", "hand" and "back" admit several part-of-speech readings (e.g., NN, VB, JJ)

(The Wall Street Journal Corpus)

POS tagging

The "preposition-adverb" tree:

    root: P(IN) = 0.81, P(RB) = 0.19
      Word Form = "As"/"as": P(IN) = 0.83, P(RB) = 0.17
        tag(+1) = RB: P(IN) = 0.13, P(RB) = 0.87
          tag(+2) = IN (leaf): P(IN) = 0.013, P(RB) = 0.987
      (other branches for the remaining values of Word Form, tag(+1) and tag(+2))

Probabilistic interpretation:

    P( RB | word="A/as" ∧ tag(+1)=RB ∧ tag(+2)=IN ) = 0.987
    P( IN | word="A/as" ∧ tag(+1)=RB ∧ tag(+2)=IN ) = 0.013

POS tagging

The same "preposition-adverb" tree resolves collocations such as:

    "as_RB much_RB as_IN"
    "as_RB soon_RB as_IN"
    "as_RB well_RB as_IN"

POS tagging

RTT (Màrquez & Rodríguez 97):

[Figure: the RTT architecture. Raw text goes through morphological analysis and then into an iterative disambiguation loop (classify, filter, update, with a stop? test) driven by the language model, producing tagged text.]

See also: A Sequential Model for Multi-class Classification: NLP/POS Tagging (Even-Zohar & Roth, 01)

POS tagging

STT (Màrquez & Rodríguez 97):

[Figure: the STT architecture. Raw text goes through morphological analysis; disambiguation is performed with the Viterbi algorithm over a language model that combines lexical probabilities and contextual probabilities, producing tagged text. A minimal Viterbi sketch follows.]

See also: The Use of Classifiers in Sequential Inference: Chunking (Punyakanok & Roth, 00)
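
To make the Viterbi-based disambiguation step concrete, here is a minimal sketch of the Viterbi algorithm over lexical P(word|tag) and contextual P(tag|previous tag) probabilities; the tag set, the probability values and the input are invented for illustration and are not taken from the STT system.

    # Minimal Viterbi decoder over lexical and contextual (bigram) probabilities.
    from math import log

    tags = ["RB", "IN"]
    lexical = {("as", "RB"): 0.4, ("as", "IN"): 0.6,
               ("much", "RB"): 1.0, ("much", "IN"): 1e-6}
    contextual = {("<s>", "RB"): 0.5, ("<s>", "IN"): 0.5,
                  ("RB", "RB"): 0.5, ("RB", "IN"): 0.5,
                  ("IN", "RB"): 0.3, ("IN", "IN"): 0.7}

    def viterbi(words):
        # delta[t] = best log-probability of any tag sequence ending in tag t
        delta = {t: log(contextual[("<s>", t)]) + log(lexical.get((words[0], t), 1e-9))
                 for t in tags}
        back = []                              # back-pointers, one dict per position
        for w in words[1:]:
            new_delta, pointers = {}, {}
            for t in tags:
                prev = max(tags, key=lambda p: delta[p] + log(contextual[(p, t)]))
                new_delta[t] = (delta[prev] + log(contextual[(prev, t)])
                                + log(lexical.get((w, t), 1e-9)))
                pointers[t] = prev
            delta, back = new_delta, back + [pointers]
        # Recover the best tag sequence by following the back-pointers
        seq = [max(tags, key=lambda t: delta[t])]
        for pointers in reversed(back):
            seq.append(pointers[seq[-1]])
        return list(reversed(seq))

    print(viterbi(["as", "much", "as"]))   # ['RB', 'RB', 'IN'], cf. "as_RB much_RB as_IN"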

Summary/conclusions
  • We have briefly outlined:
    • The ML setting: “supervised learning for classification”
    • Three concrete machine learning algorithms
    • How to apply them to solve intermediate NLP tasks

Summary/conclusions
  • Any ML algorithm for NLP should be:
    • Robust to noise and outliers
    • Efficient in large feature/example spaces
    • Adaptive to new/changing domains: portability, tuning, etc.
    • Able to take advantage of unlabelled examples: semi-supervised learning

Summary/conclusions
  • Statistical and ML-based Natural Language Processing is a very active and multidisciplinary area of research

Some current research lines
  • Appropriate learning paradigm for all kinds of NLP problems: TiMBL (DBZ 99), TBEDL (Brill 95), ME (Ratnaparkhi 98), SNoW (Roth 98), CRF (Pereira & Singer 02).
  • Definition of an adequate (and task-specific) feature space: mapping from the input space to a high dimensional feature space, kernels, etc.
  • Resolution of complex NLP problems: inference with classifiers + constraint satisfaction
  • etc.

Bibliography
  • You may find additional information at:

http://www.lsi.upc.es/~lluism/

tesi.html

publicacions/pubs.html

cursos/talks.html

cursos/MLandNL.html

cursos/emnlp1.html

  • This talk at:

http://www.lsi.upc.es/~lluism/udg03.ppt.gz


Seminar: Statistical NLP

Machine Learning for Natural Language Processing

Lluís Màrquez

TALP Research Center

Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Girona, June 2003
