Seminar: Statistical NLP

Machine Learning for Natural Language Processing

Lluís Màrquez

TALP Research Center

Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Girona, June 2003

There are many general-purpose definitions of Machine Learning (or artificial learning):

Making a computer automatically acquire some kind of knowledge from a concrete data domain

ML4NLP

Machine Learning

- Learners are computers: we study learning algorithms
- Resources are scarce: time, memory, data, etc.
- It has (almost) nothing to do with: cognitive science, neuroscience, theory of scientific discovery and research, etc.
- Biological plausibility is welcome but not the main goal

We will concentrate on:

Supervised inductive learning for classification

= discriminative learning

Machine Learning

Learning... but what for?

- To perform some particular task
- To react to environmental inputs
- Concept learning from data:
  - modelling concepts underlying data
  - predicting unseen observations
  - compacting the knowledge representation
  - knowledge discovery for expert systems

What to read? Machine Learning (Mitchell, 1997)

Machine Learning

A more precise definition:

Obtaining a description of the concept in some representation language that explains observations and helps predict new instances from the same distribution

Lexical and structural ambiguity problems

- Word selection (SR, MT)
- Part-of-speech tagging
- Semantic ambiguity (polysemy)
- Prepositional phrase attachment
- Reference ambiguity (anaphora)
- etc.

Classification problems

Empirical NLP

90’s: Application of Machine Learning techniques (ML) to NLP problems

- What to read? Foundations of Statistical Natural Language Processing (Manning & Schütze, 1999)

Ambiguity is a crucial problem for natural language understanding/processing. Ambiguity Resolution = Classification

NLP “classification” problems

- He was shot in the hand as he chased the robbers in the back street

(The Wall Street Journal Corpus)

Morpho-syntactic ambiguity

NLP “classification” problems

- He was shot in the hand as he chased the robbers in the back street

[Figure: candidate part-of-speech tags (NN, VB, JJ) marked over the ambiguous words of the sentence]

(The Wall Street Journal Corpus)

Morpho-syntactic ambiguity: Part of Speech Tagging

Semantic (lexical) ambiguity

NLP “classification” problems

- He was shot in the hand as he chased the robbers in the back street

[Figure: the two candidate senses of “hand” marked in the sentence: body-part vs. clock-part]

(The Wall Street Journal Corpus)

Semantic (lexical) ambiguity: Word Sense Disambiguation

Structural (syntactic) ambiguity

NLP “classification” problems

- He was shot in the hand as he chased the robbers in the back street

(The Wall Street Journal Corpus)

Structural (syntactic) ambiguity: PP-attachment disambiguation

NLP “classification” problems

- He was shot in the hand as he (chased (the robbers)NP (in the back street)PP)

(The Wall Street Journal Corpus)

Outline

- Machine Learning for NLP
- The Classification Problem
- Three ML Algorithms in detail
- Applications to NLP

An instance is a vector $x = \langle x_1, \dots, x_n \rangle$ whose components, called features (or attributes), are discrete or real-valued.

Let X be the space of all possible instances.

Let $Y = \{y_1, \dots, y_m\}$ be the set of categories (or classes).

The goal is to learn an unknown target function $f : X \to Y$.

A training example is an instance $x \in X$ labelled with the correct value $f(x)$, i.e., a pair $\langle x, f(x) \rangle$.

Let D be the set of all training examples.

Classification

Feature Vector Classification (AI perspective)

The goal is to find a function $h \in H$ such that $h(x) = f(x)$ for every pair $\langle x, f(x) \rangle \in D$.

Feature Vector Classification

- The hypotheses space, H, is the set of functions $h : X \to Y$ that the learner can consider as possible definitions
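As a minimal sketch of these definitions, assuming instances are Python dicts keyed by feature name, the consistency requirement $h(x) = f(x)$ for every pair in D can be checked directly. The features COLOR and SHAPE, the tiny training set D and the two-member hypothesis space H below are hypothetical illustrations, not data from the talk.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, str]            # x = <x1, ..., xn>, keyed here by feature name
Example = Tuple[Instance, str]       # a training example <x, f(x)>
Hypothesis = Callable[[Instance], str]

# Hypothetical training set D: each label plays the role of f(x).
D: List[Example] = [
    ({"COLOR": "red", "SHAPE": "circle"}, "positive"),
    ({"COLOR": "red", "SHAPE": "triangle"}, "negative"),
    ({"COLOR": "blue", "SHAPE": "circle"}, "negative"),
]


def is_consistent(h: Hypothesis, data: List[Example]) -> bool:
    """True if h(x) == f(x) for every training pair <x, f(x)> in data."""
    return all(h(x) == fx for x, fx in data)


def h_red_circle(x: Instance) -> str:
    # (COLOR=red) AND (SHAPE=circle) => positive, otherwise => negative
    return "positive" if x["COLOR"] == "red" and x["SHAPE"] == "circle" else "negative"


def h_red(x: Instance) -> str:
    # (COLOR=red) => positive, otherwise => negative
    return "positive" if x["COLOR"] == "red" else "negative"


# A tiny hypothesis space H; only its consistent members survive.
H: Dict[str, Hypothesis] = {"red_and_circle": h_red_circle, "red": h_red}
consistent = [name for name, h in H.items() if is_consistent(h, D)]
print(consistent)
```

Here h_red is rejected because it labels the red triangle as positive, while f says negative.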

An Example

Decision Tree

- COLOR = blue ⇒ negative
- COLOR = red: test SHAPE
  - SHAPE = circle ⇒ positive
  - SHAPE = triangle ⇒ negative

Rules

(COLOR=red) ∧ (SHAPE=circle) ⇒ positive

otherwise ⇒ negative

An Example

Decision Tree

- SIZE = small: test SHAPE
  - SHAPE = circle ⇒ positive
  - SHAPE = triangle ⇒ negative
- SIZE = big: test COLOR
  - COLOR = red ⇒ positive
  - COLOR = blue ⇒ negative

Rules

(SIZE=small) ∧ (SHAPE=circle) ⇒ positive

(SIZE=big) ∧ (COLOR=red) ⇒ positive

otherwise ⇒ negative

Inductive Bias

“Any means that a classification learning system uses to choose between two functions that are both consistent with the training data is called inductive bias” (Mooney & Cardie, 99)

Language / Search bias

[Figure: the COLOR/SHAPE decision tree from the first example]
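To see why some inductive bias is unavoidable, the sketch below encodes the rule sets of the two example trees and a hypothetical training set on which both make no errors, then shows that they disagree on unseen instances. Which of the two a learner returns is decided by its bias, not by the data. The training set and the unseen instances are invented for illustration.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]


def tree_color(x: Instance) -> str:
    # First example tree: (COLOR=red) AND (SHAPE=circle) => positive
    return "positive" if x["COLOR"] == "red" and x["SHAPE"] == "circle" else "negative"


def tree_size(x: Instance) -> str:
    # Second example tree: (SIZE=small) AND (SHAPE=circle) => positive,
    # (SIZE=big) AND (COLOR=red) => positive, otherwise => negative
    if x["SIZE"] == "small":
        return "positive" if x["SHAPE"] == "circle" else "negative"
    return "positive" if x["COLOR"] == "red" else "negative"


# Hypothetical training data on which both trees are consistent.
D: List[Tuple[Instance, str]] = [
    ({"SIZE": "small", "COLOR": "red", "SHAPE": "circle"}, "positive"),
    ({"SIZE": "big", "COLOR": "red", "SHAPE": "circle"}, "positive"),
    ({"SIZE": "big", "COLOR": "blue", "SHAPE": "circle"}, "negative"),
    ({"SIZE": "small", "COLOR": "blue", "SHAPE": "triangle"}, "negative"),
]
assert all(tree_color(x) == y and tree_size(x) == y for x, y in D)

# Unseen instances: the two consistent hypotheses generalize differently.
unseen = [
    {"SIZE": "small", "COLOR": "blue", "SHAPE": "circle"},
    {"SIZE": "big", "COLOR": "red", "SHAPE": "triangle"},
]
for x in unseen:
    print(x, tree_color(x), tree_size(x))
```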

Some important concepts

- Inductive bias
- Training error and generalization error
- Generalization ability and overfitting
- Batch learning vs. on-line learning
- Symbolic vs. statistical learning
- Propositional vs. first-order learning

Propositional vs. Relational Learning

- Propositional learning: color(red) ∧ shape(circle) ⇒ classA
- Relational learning = ILP (induction of logic programs):
  - course(X) ∧ person(Y) ∧ link_to(Y,X) ⇒ instructor_of(X,Y)
  - research_project(X) ∧ person(Z) ∧ link_to(L1,X,Y) ∧ link_to(L2,Y,Z) ∧ neighbour_word_people(L1) ⇒ member_proj(X,Z)
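The difference can be made concrete with a small sketch: a propositional rule only inspects the attribute values of a single object, whereas a relational rule such as instructor_of must join several facts through shared variables. The fact base below (course and person names, link_to facts) is hypothetical, and only the first ILP rule is evaluated.

```python
from typing import Dict, Set, Tuple


# Propositional: one object described by attribute/value pairs.
def class_a(obj: Dict[str, str]) -> bool:
    # color(red) AND shape(circle) => classA
    return obj.get("color") == "red" and obj.get("shape") == "circle"


# Relational: a hypothetical fact base of ground atoms.
courses: Set[str] = {"cs101", "cs202"}
persons: Set[str] = {"anna", "joan"}
link_to: Set[Tuple[str, str]] = {("anna", "cs101"), ("joan", "cs202"), ("cs101", "joan")}


def instructor_of() -> Set[Tuple[str, str]]:
    # course(X) AND person(Y) AND link_to(Y, X) => instructor_of(X, Y)
    return {(x, y) for (y, x) in link_to if x in courses and y in persons}


print(class_a({"color": "red", "shape": "circle"}))
print(sorted(instructor_of()))
```

The fact link_to(cs101, joan) produces no conclusion, because its second argument is not a course.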

The Classification Setting: Class, Point, Example, Data Set, ... (CoLT/SLT perspective)

- Input Space: $X \subseteq \mathbb{R}^n$
- (binary) Output Space: $Y = \{+1, -1\}$
- A point, pattern or instance: $x \in X$, $x = (x_1, x_2, \dots, x_n)$
- Example: $(x, y)$ with $x \in X$, $y \in Y$
- Training Set: a set of m examples generated i.i.d. according to an unknown distribution $P(x, y)$: $S = \{(x_1, y_1), \dots, (x_m, y_m)\} \in (X \times Y)^m$

The Classification Setting: Learning, Error, ...

- The hypotheses space, H, is the set of functions $h : X \to Y$ that the learner can consider as possible definitions. In SVMs they are of the form $h(x) = \operatorname{sign}(\langle w, x \rangle + b)$
- The goal is to find a function $h \in H$ such that the expected misclassification error on new examples, also drawn from $P(x, y)$, is minimal (Risk Minimization, RM)

The Classification Setting: Learning, Error, ...

- Expected error (risk)
- Problem: P itself is unknown; only the training examples are known, so an induction principle is needed
- Empirical Risk Minimization (ERM): find the function $h \in H$ for which the training error (empirical risk) is minimal
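The expected-error formula itself is not preserved in this transcript. For reference, one standard way to write the expected risk, the empirical risk and the ERM choice in this binary setting (0/1 loss with labels in {+1, −1}, following common textbook notation rather than the original slide) is:

```latex
\begin{aligned}
R[h] &= \int \tfrac{1}{2}\,\lvert h(x) - y \rvert \; dP(x, y) \\
R_{\mathrm{emp}}[h] &= \frac{1}{m} \sum_{i=1}^{m} \tfrac{1}{2}\,\lvert h(x_i) - y_i \rvert \\
h_{\mathrm{ERM}} &= \arg\min_{h \in H} R_{\mathrm{emp}}[h]
\end{aligned}
```

With $h(x), y \in \{+1, -1\}$, the term $\tfrac{1}{2}\lvert h(x) - y \rvert$ is 0 for a correct prediction and 1 for a mistake, so $R_{\mathrm{emp}}$ is simply the training error rate.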

The Classification Setting: Error, Over(under)fitting, ...

- Low training error ⇒ low true error?
- The overfitting dilemma:

[Figure: overfitting vs. underfitting (Müller et al., 2001)]

- Trade-off between training error and complexity
- Different learning biases can be used


Outline

- Machine Learning for NLP
- The Classification Problem
- Three ML Algorithms
  - Decision Trees
  - AdaBoost
  - Support Vector Machines
- Applications to NLP

Algorithms

Learning Paradigms

- Statistical learning:
  - HMM, Bayesian Networks, ME, CRF, etc.
- Traditional methods from Artificial Intelligence (ML, AI)
  - Decision trees/lists, exemplar-based learning, rule induction, neural networks, etc.
- Methods from Computational Learning Theory (CoLT/SLT)
  - Winnow, AdaBoost, SVMs, etc.

Learning Paradigms

- Classifier combination:
  - Bagging, Boosting, Randomization, ECOC, Stacking, etc.
- Semi-supervised learning: learning from labelled and unlabelled examples
  - Bootstrapping, EM, Transductive learning (SVMs, AdaBoost), Co-Training, etc.
- etc.

Decision Trees

- Decision trees are a way to represent rules underlying training data, with hierarchical structures that recursively partition the data.
- They have been used by many research communities (Pattern Recognition, Statistics, ML, etc.) for data exploration with some of the following purposes: Description, Classification, and Generalization.
- From a machine-learning perspective: Decision Trees are n-ary branching trees that represent classification rules for classifying the objects of a certain domain into a set of mutually exclusive classes

Decision Trees

- Acquisition: Top-Down Induction of Decision Trees (TDIDT)
- Systems:
  - CART (Breiman et al. 84)
  - ID3, C4.5, C5.0 (Quinlan 86, 93, 98)
  - ASSISTANT, ASSISTANT-R (Cestnik et al. 87; Kononenko et al. 95)
  - etc.

An Example

[Figure: a generic decision tree whose internal nodes test features (A1, A2, A3, A5, …), whose branches are feature values (v1, …, v7) and whose leaves are classes (C1, C2, C3), shown next to the SIZE/SHAPE/COLOR example tree]

Learning Decision Trees

Training: Training Set + TDIDT = DT

Test: Example + DT = Class

General Induction Algorithm

function TDIDT (X: set-of-examples; A: set-of-features)

  var: tree1, tree2: decision-tree;
       X’: set-of-examples;
       A’: set-of-features
  end-var
  if (stopping_criterion(X)) then
    tree1 := create_leaf_tree(X)
  else
    amax := feature_selection(X, A);
    tree1 := create_tree(X, amax);
    for-all val in values(amax) do
      X’ := select_examples(X, amax, val);
      A’ := A - {amax};
      tree2 := TDIDT(X’, A’);
      tree1 := add_branch(tree1, tree2, val)
    end-for
  end-if
  return(tree1)
end-function
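The TDIDT pseudocode can be turned into a short runnable Python sketch. The concrete choices are assumptions, not the only possible instantiation: feature_selection uses information gain (ID3-style), the stopping criterion is "all examples share one class, or no features are left", leaves predict the majority class, and branches are created only for values observed in X. Trees are nested dicts, and the toy data is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Example = Tuple[Dict[str, str], str]      # (feature values, class)
Tree = Union[str, Dict]                   # a leaf is just a class label


def entropy(examples: List[Example]) -> float:
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(examples: List[Example], feature: str) -> float:
    n = len(examples)
    remainder = 0.0
    for val in {x[feature] for x, _ in examples}:
        subset = [e for e in examples if e[0][feature] == val]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder


def tdidt(examples: List[Example], features: List[str]) -> Tree:
    labels = [label for _, label in examples]
    # stopping_criterion(X): pure node or no features left -> create_leaf_tree(X)
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # feature_selection(X, A): the feature with maximal information gain
    amax = max(features, key=lambda a: info_gain(examples, a))
    tree: Dict = {amax: {}}                              # create_tree(X, amax)
    rest = [a for a in features if a != amax]            # A' := A - {amax}
    for val in sorted({x[amax] for x, _ in examples}):   # values observed in X
        subset = [e for e in examples if e[0][amax] == val]  # select_examples
        tree[amax][val] = tdidt(subset, rest)            # add_branch
    return tree


def classify(tree: Tree, x: Dict[str, str]) -> str:
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[x[feature]]   # assumes the value was seen in training
    return tree


data: List[Example] = [
    ({"COLOR": "red", "SHAPE": "circle"}, "positive"),
    ({"COLOR": "red", "SHAPE": "triangle"}, "negative"),
    ({"COLOR": "blue", "SHAPE": "circle"}, "negative"),
    ({"COLOR": "blue", "SHAPE": "triangle"}, "negative"),
]
tree = tdidt(data, ["COLOR", "SHAPE"])
print(tree)
```

On this toy data COLOR and SHAPE tie on information gain; `max` keeps the first one listed, so the sketch recovers the COLOR/SHAPE tree of the first example.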


Feature Selection Criteria

- Functions derived from Information Theory:
  - Information Gain, Gain Ratio (Quinlan 86)
- Functions derived from Distance Measures:
  - Gini Diversity Index (Breiman et al. 84)
  - RLM (López de Mántaras 91)
- Statistically-based:
  - Chi-square test (Sestito & Dillon 94)
  - Symmetrical Tau (Zhou & Dillon 91)
- RELIEFF-IG: variant of RELIEFF (Kononenko 94)
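As a minimal illustration of two of these criteria, the sketch below scores a candidate split from class counts alone: information gain, Quinlan's gain ratio (gain divided by the split information) and the reduction of the Gini diversity index. The formulas are the standard textbook definitions; the class counts in the example are hypothetical.

```python
import math
from typing import Dict, List


def entropy(counts: List[int]) -> float:
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def gini(counts: List[int]) -> float:
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_scores(parent: List[int], children: List[List[int]]) -> Dict[str, float]:
    """parent: class counts before the split; children: class counts per branch."""
    n = sum(parent)
    weights = [sum(ch) / n for ch in children]
    gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    return {
        "info_gain": gain,
        "gain_ratio": gain / split_info if split_info > 0 else 0.0,
        "gini_reduction": gini_gain,
    }


# Hypothetical node with 8 positive / 8 negative examples, split into two branches.
print(split_scores([8, 8], [[6, 2], [2, 6]]))
```

Gain ratio penalizes splits into many small branches through the split information; with two equal-sized branches, as here, split information is 1 and gain ratio equals information gain.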

Extensions of DTs (Murthy 95)

- Pruning (pre/post)
- Minimize the effect of the greedy approach: lookahead
- Non-linear splits
- Combination of multiple models
- Incremental learning (on-line)
- etc.

Decision Trees and NLP

- Speech processing (Bahl et al. 89; Bakiri & Dietterich 99)
- POS Tagging (Cardie 93; Schmid 94b; Magerman 95; Màrquez & Rodríguez 95, 97; Màrquez et al. 00)
- Word sense disambiguation (Brown et al. 91; Cardie 93; Mooney 96)
- Parsing (Magerman 95, 96; Haruno et al. 98, 99)
- Text categorization (Lewis & Ringuette 94; Weiss et al. 99)
- Text summarization (Mani & Bloedorn 98)
- Dialogue act tagging (Samuel et al. 98)
- Noun phrase coreference (Aone & Bennett 95; McCarthy & Lehnert 95)
- Discourse analysis in information extraction (Soderland & Lehnert 94)
- Cue phrase identification in text and speech (Litman 94; Siegel & McKeown 94)
- Verb classification in Machine Translation (Tanaka 96; Siegel 97)

Decision Trees: pros & cons

- Advantages
  - Acquire symbolic knowledge in an understandable way
  - Very well studied ML algorithms and variants
  - Can be easily translated into rules
  - Existence of available software: C4.5, C5.0, etc.
  - Can be easily integrated into an ensemble
- Drawbacks
  - Computationally expensive when scaling to large natural language domains: training examples, features, etc.
  - Data sparseness and data fragmentation: the problem of the small disjuncts => probability estimation
  - DTs are high-variance (unstable) models
  - Tendency to overfit training data: pruning is necessary
  - Require quite a big effort in tuning the model

Boosting algorithms

- Idea: “to combine many simple and moderately accurate hypotheses (weak classifiers) into a single and highly accurate classifier”
- AdaBoost (Freund & Schapire 95) has been studied extensively, both theoretically and empirically
- Many other variants and extensions (1997-2003): http://www.lsi.upc.es/~lluism/seminari/ml&nlp.html

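AdaBoost: general scheme

[Figure: AdaBoost general scheme. TRAINING: in each round t = 1, …, T a Weak Learner is trained on TSt, the training set weighted by the probability distribution Dt, and outputs the weak hypothesis ht; the probability distribution is then updated for the next round. TEST: the final classifier is the linear combination F(h1, h2, …, hT) of the weak hypotheses with weights a1, a2, …, aT.]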

AdaBoost: example

Weak hypotheses = vertical/horizontal hyperplanes

AdaBoost: round 1

AdaBoost: round 2

AdaBoost: round 3

www.research.att.com/~yoav/adaboost

Combined Hypothesis
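The scheme can be sketched in a few dozen lines of Python. This is a minimal sketch of discrete AdaBoost with labels in {+1, −1}, assuming (as in the toy example) that the weak learner picks the axis-parallel threshold, i.e., a vertical or horizontal hyperplane in 2D, with the lowest weighted error under the current distribution; the weight a_t = ½·ln((1 − ε_t)/ε_t) is the usual choice. The four data points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]
Stump = Tuple[int, float, int]   # (feature index, threshold, polarity)


def stump_predict(s: Stump, x: Point) -> int:
    j, thr, pol = s
    return pol if x[j] > thr else -pol


def best_stump(X: List[Point], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Weak learner: the axis-parallel split with the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        # candidate thresholds: below all values, and midpoints between neighbours
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err


def adaboost(X: List[Point], y: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    m = len(X)
    w = [1.0 / m] * m                          # D1: uniform distribution
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        h, eps = best_stump(X, y, w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # distribution updating: raise the weights of mistakes, lower the rest
        w = [wi * math.exp(-alpha * yi * stump_predict(h, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Point) -> int:
    # F(h1, ..., hT) = sign(a1*h1(x) + ... + aT*hT(x))
    score = sum(alpha * stump_predict(h, x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
y = [1, -1, -1, -1]          # positive only in the upper-right quadrant
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

No single vertical or horizontal split classifies all four points correctly, but the weighted vote of three stumps does.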

AdaBoost and NLP

- POS Tagging (Abney et al. 99; Màrquez 99)
- Text and Speech Categorization (Schapire & Singer 98; Schapire et al. 98; Weiss et al. 99)
- PP-attachment Disambiguation (Abney et al. 99)
- Parsing (Haruno et al. 99)
- Word Sense Disambiguation (Escudero et al. 00, 01)
- Shallow parsing (Carreras & Màrquez, 01a; 02)
- Email spam filtering (Carreras & Màrquez, 01b)
- Term Extraction (Vivaldi et al. 01)

AdaBoost: pros & cons

- Easy to implement, with few parameters to set
- Time and space grow linearly with the number of examples: ability to manage very large learning problems
- Does not explicitly constrain the complexity of the learner
- Naturally combines feature selection with learning
- Has been successfully applied to many practical problems
- Seems to be rather robust to overfitting (number of rounds) but sensitive to noise
- Performance is very good when there are relatively few relevant terms (features)
- Can perform poorly when there is insufficient training data relative to the complexity of the base classifiers, or when the training errors of the base classifiers become too large too quickly

SVM: A General Definition

- “Support Vector Machines (SVM) are learning systems that use a hypothesis space of linear functions in a high dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory”. (Cristianini & Shawe-Taylor, 2000)


Key Concepts

[Figure: positive (+) and negative (−) training points separated by a hyperplane with normal vector w]

Linear Classifiers

- Hyperplanes in $\mathbb{R}^N$.
- Defined by a weight vector (w) and a threshold (b).
- They induce a classification rule: $h(x) = \operatorname{sign}(\langle w, x \rangle + b)$
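A minimal Python sketch of such a linear classifier, assuming the rule h(x) = sign(⟨w, x⟩ + b) with labels in {+1, −1} (a point exactly on the hyperplane is sent to +1 here, an arbitrary convention). It also computes the geometric margin y(⟨w, x⟩ + b)/‖w‖ of labelled points, the quantity that the optimal (maximal margin) hyperplane maximizes over the training set. The hyperplane and points are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def linear_classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classification rule induced by the hyperplane <w, x> + b = 0."""
    return 1 if dot(w, x) + b >= 0 else -1


def geometric_margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Signed distance of (x, y) to the hyperplane; positive iff correctly classified."""
    return y * (dot(w, x) + b) / math.sqrt(dot(w, w))


w, b = (1.0, 1.0), -1.0          # hypothetical hyperplane x1 + x2 = 1
points = [((2.0, 2.0), 1), ((0.0, 0.0), -1), ((1.0, 1.0), 1)]
for x, y in points:
    print(x, linear_classify(w, b, x), round(geometric_margin(w, b, x, y), 3))

# The margin of the whole training set is the smallest individual margin.
set_margin = min(geometric_margin(w, b, x, y) for x, y in points)
print(set_margin)
```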

Optimal Hyperplane: Geometric Intuition

[Figure: the training points closest to the separating hyperplane, annotated “These are the Support Vectors”]

Optimal Hyperplane: Geometric Intuition

[Figure: the Maximal Margin Hyperplane]

SVMs Seminar, 22/05/2001

Linearly separable data

[Figure: the maximal margin hyperplane obtained by solving a Quadratic Programming problem]
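The optimisation problems shown on these slides are not preserved in the transcript. For reference, one standard textbook formulation is given below: the hard-margin problem for linearly separable data, the soft-margin variant for the non-separable case (C is the trade-off parameter that reappears in the toy examples), and the kernelized dual, whose solution defines the classifier. Points with $0 < \alpha_i$ are support vectors; those with $\alpha_i = C$ are the bounded support vectors.

```latex
\begin{aligned}
&\text{Hard margin:} && \min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2
  \quad \text{s.t. } y_i(\langle w, x_i\rangle + b) \ge 1,\quad i = 1,\dots,m \\
&\text{Soft margin:} && \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i
  \quad \text{s.t. } y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \\
&\text{Dual:} && \max_{\alpha}\ \sum_{i=1}^{m}\alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{m}\alpha_i\alpha_j y_i y_j K(x_i, x_j)
  \quad \text{s.t. } 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{m}\alpha_i y_i = 0 \\
&\text{Classifier:} && h(x) = \operatorname{sign}\Big(\sum_{i=1}^{m}\alpha_i y_i K(x_i, x) + b\Big)
\end{aligned}
```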

Non-separable case (soft margin)

[Figure: overview diagram relating Non-linear mapping, Set of hypotheses, Dual formulation, Kernel function and Evaluation]

Non-linear SVMs

- Implicit mapping into feature space via kernel functions

Non-linear SVMs

- Kernel functions
  - Must be efficiently computable
  - Characterization via Mercer’s theorem
  - One of the curious facts about using a kernel is that we do not need to know the underlying feature map in order to be able to learn in the feature space! (Cristianini & Shawe-Taylor, 2000)
  - Examples: polynomials, Gaussian radial basis functions, two-layer sigmoidal neural networks, etc.
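The remark that the feature map need not be known can be checked numerically. A minimal sketch, assuming 2-dimensional inputs and the homogeneous degree-2 polynomial kernel K(x, z) = ⟨x, z⟩², chosen instead of degree 3 only because its explicit feature map φ(x) = (x1², x2², √2·x1·x2) is short to write out. Evaluating the kernel in input space gives the same value as the inner product in the 3-dimensional feature space. A Gaussian RBF kernel is included for comparison: its feature space is infinite-dimensional, so only the kernel side can be computed. The gamma value and the points are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    # K(x, z) = <x, z>^2, computed entirely in the 2-D input space
    return dot(x, z) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    # Explicit feature map for the kernel above (only feasible for small degrees)
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    # K(x, z) = exp(-gamma * ||x - z||^2); gamma is an illustrative choice
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))       # kernel evaluated in input space
print(dot(phi(x), phi(z)))      # same value (up to rounding) via the explicit map
print(rbf_kernel(x, z))
```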

Non-linear SVMs

[Figure: with a degree 3 polynomial kernel, a data set that is linearly non-separable in input space becomes linearly separable]

Toy Examples

- All examples have been run with the 2D graphic interface of LIBSVM (Chang and Lin, National Taiwan University)
“LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. The basic algorithm is a simplification of both SMO by Platt and SVMLight by Joachims. It is also a simplification of the modification 2 of SMO by Keerthi et al. Our goal is to help users from other fields to easily use SVM as a tool. LIBSVM provides a simple interface where users can easily link it with their own programs…”
- Available from: www.csie.ntu.edu.tw/~cjlin/libsvm (it includes an integrated Web demo tool)

Toy Examples (I)

Linearly separable data set

Linear SVM

Maximal margin Hyperplane

What happens if we add a blue training example here?

Toy Examples (I)

(still) Linearly separable data set

Linear SVM

High value of C parameter

Maximal margin Hyperplane

The example is correctly classified

Toy Examples (I)

(still) Linearly separable data set

Linear SVM

Low value of C parameter

Trade-off between: margin and training error

The example is now a bounded SV
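The effect of C can be reproduced without LIBSVM. The sketch below is not LIBSVM's SMO-type solver: it is plain full-batch subgradient descent on the primal soft-margin objective ½‖w‖² + C·Σ max(0, 1 − y_i(⟨w, x_i⟩ + b)), with a fixed learning rate and epoch count chosen for illustration (with a fixed step the iterate oscillates near the optimum rather than converging exactly). Points with y_i(⟨w, x_i⟩ + b) < 1 are inside the margin or misclassified. A small C tolerates more of them in exchange for a smaller ‖w‖, i.e., a wider margin. The toy data is hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: List[Sequence[float]], y: List[int], C: float,
                     lr: float = 0.01, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the 0.5*||w||^2 term
        grad_b = 0.0              # the threshold b is not regularized
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # hinge active
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def margin_violators(X: List[Sequence[float]], y: List[int],
                     w: Sequence[float], b: float) -> List[int]:
    """Indices with y*(<w,x>+b) < 1: inside the margin or misclassified."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1]


X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
for C in (10.0, 0.01):
    w, b = train_linear_svm(X, y, C)
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]
    print(C, [round(v, 3) for v in w], round(b, 3), preds == y, margin_violators(X, y, w, b))
```

With C = 0.01 every training point ends up inside the margin (the analogue of bounded SVs), while with C = 10 the solution is close to the maximal margin hyperplane.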

Toy Examples (II)

Toy Examples (III)

SVM: Summary

- SVMs introduced in COLT’92 (Boser, Guyon, & Vapnik, 1992). Great development since then
- Kernel-induced feature spaces: SVMs work efficiently in very high dimensional feature spaces (+)
- Learning bias: maximal margin optimisation. Reduces the danger of overfitting. Generalization bounds for SVMs (+)
- Compact representation of the induced hypothesis. The solution is sparse in terms of SVs (+)

SVM: Summary

- Due to Mercer’s conditions on the kernels, the optimisation problems are convex. No local minima (+)
- Optimisation theory guides the implementation. Efficient learning (+)
- Mainly for classification but also for regression, density estimation, clustering, etc.
- Success in many real-world applications: OCR, vision, bioinformatics, speech recognition, NLP: TextCat, POS tagging, chunking, parsing, etc. (+)
- Parameter tuning (–). Implications in convergence times, sparsity of the solution, etc.

Outline

- Machine Learning for NLP
- The Classification Problem
- Three ML Algorithms
- Applications to NLP

Applications

NLP problems

- Warning! We will not focus on final NLP applications, but on intermediate tasks...
- We will classify the NLP tasks according to their (structural) complexity

NLP problems: structural complexity

- Decisional problems
  - Text Categorization, Document filtering, Word Sense Disambiguation, etc.
- Sequence tagging and detection of sequential structures
  - POS tagging, Named Entity extraction, syntactic chunking, etc.
- Hierarchical structures
  - Clause detection, full parsing, IE of complex concepts, composite Named Entities, etc.

Morpho-syntactic ambiguity: Part of Speech Tagging

POS tagging

- He was shot in the hand as he chased the robbers in the back street

[Figure: candidate part-of-speech tags (NN, VB, JJ) marked over the ambiguous words of the sentence]

(The Wall Street Journal Corpus)

POS tagging: “preposition-adverb” tree

- root: P(IN)=0.81, P(RB)=0.19; test Word Form
  - “As”, “as”: P(IN)=0.83, P(RB)=0.17; test tag(+1)
    - RB: P(IN)=0.13, P(RB)=0.87; test tag(+2)
      - IN (leaf): P(IN)=0.013, P(RB)=0.987
    - others: ...
  - others: ...

Probabilistic interpretation:

$\hat{P}(\mathrm{RB} \mid \text{word} = \text{A/as},\ \text{tag}(+1) = \mathrm{RB},\ \text{tag}(+2) = \mathrm{IN}) = 0.987$

$\hat{P}(\mathrm{IN} \mid \text{word} = \text{A/as},\ \text{tag}(+1) = \mathrm{RB},\ \text{tag}(+2) = \mathrm{IN}) = 0.013$


Collocations:

“as_RB much_RB as_IN”

“as_RB soon_RB as_IN”

“as_RB well_RB as_IN”
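A minimal sketch of how the tree turns into a probabilistic model: the classifier follows the path given by the word form and the tags of the next two words, and returns the class distribution stored at the deepest node reached. Only the path shown on the slide is encoded, with the slide's probabilities. Falling back to the distribution of the last matched node when an "others" branch is taken is an assumption made here, because those subtrees are elided on the slide.

```python
from typing import Dict, Optional

Dist = Dict[str, float]

# The path of the "preposition-adverb" tree shown above (other branches elided).
ROOT: Dist = {"IN": 0.81, "RB": 0.19}
WORD_AS: Dist = {"IN": 0.83, "RB": 0.17}
NEXT_RB: Dist = {"IN": 0.13, "RB": 0.87}
LEAF: Dist = {"IN": 0.013, "RB": 0.987}


def tag_distribution(word: str, tag_next: Optional[str], tag_next2: Optional[str]) -> Dist:
    # Assumption: an elided "others" branch falls back to the current node's distribution.
    if word not in ("As", "as"):
        return ROOT
    if tag_next != "RB":
        return WORD_AS
    if tag_next2 != "IN":
        return NEXT_RB
    return LEAF


def best_tag(dist: Dist) -> str:
    return max(dist, key=lambda tag: dist[tag])


# "as soon as": the first "as" is followed by soon_RB and as_IN.
d = tag_distribution("as", "RB", "IN")
print(d["RB"], best_tag(d))
```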

A Sequential Model for Multi-class Classification: NLP/POS Tagging (Even-Zohar & Roth, 01)

POS tagging: RTT (Màrquez & Rodríguez 97)

[Figure: RTT architecture. Raw text → Morphological analysis → Disambiguation (Filter, Classify and Update steps, driven by the Language Model and repeated until the stop? test answers yes) → Tagged text]

POS tagging: STT (Màrquez & Rodríguez 97)

[Figure: STT architecture. Raw text → Morphological analysis → Disambiguation with the Viterbi algorithm, using a Language Model made of Lexical probs. + Contextual probs. → Tagged text]

The Use of Classifiers in sequential inference: Chunking (Punyakanok & Roth, 00)
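The decoding step of the STT architecture can be sketched with the standard Viterbi algorithm. This is a minimal sketch assuming a first-order (bigram) model, with contextual probabilities P(tag_i | tag_{i-1}), lexical probabilities P(word | tag) and a start distribution P(tag_1). How STT actually estimates its contextual probabilities is not reproduced here: all tags, words and numbers are hypothetical toy values. Log probabilities are used to avoid underflow on long sentences.

```python
import math
from typing import Dict, List, Tuple


def viterbi(words: List[str], tags: List[str],
            start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            lex: Dict[Tuple[str, str], float]) -> List[str]:
    """Most probable tag sequence under P(t1) * prod P(t_i|t_{i-1}) * prod P(w_i|t_i)."""
    def lg(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: lg(start.get(t, 0.0)) + lg(lex.get((words[0], t), 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(((p, best[i - 1][p] + lg(trans.get((p, t), 0.0))) for p in tags),
                              key=lambda item: item[1])
            best[i][t] = score + lg(lex.get((words[i], t), 0.0))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = ["PRP", "VB", "NN"]
start = {"PRP": 0.8, "VB": 0.1, "NN": 0.1}
trans = {("PRP", "VB"): 0.7, ("PRP", "NN"): 0.3, ("VB", "NN"): 0.6,
         ("VB", "VB"): 0.1, ("VB", "PRP"): 0.3, ("NN", "VB"): 0.5, ("NN", "NN"): 0.5}
lex = {("he", "PRP"): 1.0, ("chased", "VB"): 0.9, ("chased", "NN"): 0.1,
       ("robbers", "NN"): 0.8, ("robbers", "VB"): 0.2}
print(viterbi(["he", "chased", "robbers"], tags, start, trans, lex))
```

Unlike tagging each word independently, Viterbi scores whole tag sequences, so a locally likely tag can be overridden by its context.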

Detection of sequential and hierarchical structures

- Named Entity recognition
- Clause detection

Conclusions

Summary/conclusions

- We have briefly outlined:
  - The ML setting: “supervised learning for classification”
  - Three concrete machine learning algorithms
  - How to apply them to solve intermediate NLP tasks

Any ML algorithm for NLP should be:

- Robust to noise and outliers
- Efficient in large feature/example spaces
- Adaptive to new/changing domains: portability, tuning, etc.
- Able to take advantage of unlabelled examples: semi-supervised learning

Summary/conclusions

- Statistical and ML-based Natural Language Processing is a very active and multidisciplinary area of research

Some current research lines

- Appropriate learning paradigm for all kinds of NLP problems: TiMBL (DBZ99), TBEDL (Brill95), ME (Ratnaparkhi98), SNoW (Roth98), CRF (Pereira & Singer02).
- Definition of an adequate (and task-specific) feature space: mapping from the input space to a high dimensional feature space, kernels, etc.
- Resolution of complex NLP problems: inference with classifiers + constraint satisfaction
- etc.

Bibliography

- You may find additional information at http://www.lsi.upc.es/~lluism/:
  - tesi.html
  - publicacions/pubs.html
  - cursos/talks.html
  - cursos/MLandNL.html
  - cursos/emnlp1.html
- This talk at: http://www.lsi.upc.es/~lluism/udg03.ppt.gz
