Machine Learning in GATE

Valentin Tablan

Machine Learning in GATE
  • Uses classification.

[Attr1, Attr2, Attr3, …, Attrn] → Class

  • Classifies annotations.

(Documents can be classified as well using a simple trick.)

  • Annotations of a particular type are selected as instances.
  • Attributes refer to instance annotations.
  • Attributes have a position relative to the instance annotation they refer to.
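
To make the idea of positioned attributes concrete, here is a minimal sketch, not GATE's own API. It treats each Token as an instance and reads a feature from the annotations one place before and after it. The `Ann` class, the `featureAt` and `buildInstances` helpers, and the choice of offsets are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not GATE classes.
class RelativeAttributesSketch {

    // A bare-bones annotation: a type plus a feature map.
    static class Ann {
        final String type;
        final Map<String, String> features = new HashMap<>();

        Ann(String type, String category) {
            this.type = type;
            this.features.put("category", category);
        }
    }

    // Reads a feature from the annotation `offset` places away from the instance
    // at `index`; returns null when that position falls outside the sequence.
    static String featureAt(List<Ann> anns, int index, int offset, String feature) {
        int target = index + offset;
        if (target < 0 || target >= anns.size()) {
            return null;
        }
        return anns.get(target).features.get(feature);
    }

    // Builds one row per instance: [Attr(-1), Attr(+1), Class(0)].
    static List<String[]> buildInstances(List<Ann> tokens) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            rows.add(new String[] {
                featureAt(tokens, i, -1, "category"), // attribute at position -1
                featureAt(tokens, i, 1, "category"),  // attribute at position +1
                featureAt(tokens, i, 0, "category")   // class, read at position 0
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Ann> tokens = Arrays.asList(
            new Ann("Token", "DT"), new Ann("Token", "NN"), new Ann("Token", "VBZ"));
        for (String[] row : buildInstances(tokens)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Positions that fall outside the document yield a missing value (`null`) here; how a real engine treats such gaps is not covered by the slides.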
Attributes

Attributes can be:

  • Boolean

The presence [or absence] of an annotation of a particular type [partially] overlapping the referred instance annotation.

  • Nominal

The value of a particular feature of the referred instance annotation. The complete set of acceptable values must be specified a priori.

  • Numeric

The numeric value (converted from String) of a particular feature of the referred instance annotation.
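
The three attribute kinds can be sketched as three small functions. This is an illustration under stated assumptions, not GATE code: `Span`, `booleanAttribute`, `nominalAttribute` and `numericAttribute` are hypothetical names, and returning `null` for a value outside the declared set or for an unparsable number is a choice made only for this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not GATE classes.
class AttributeKindsSketch {

    // An annotation reduced to a type and a character span.
    static class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Boolean: true if any annotation of `type` overlaps the instance, even partially.
    static boolean booleanAttribute(Span instance, List<Span> all, String type) {
        for (Span s : all) {
            if (s.type.equals(type) && s.start < instance.end && instance.start < s.end) {
                return true;
            }
        }
        return false;
    }

    // Nominal: the feature value, accepted only if it is in the predeclared set.
    // Returning null for anything else is an assumption of this sketch.
    static String nominalAttribute(Map<String, String> features, String feature,
                                   List<String> allowed) {
        String value = features.get(feature);
        return (value != null && allowed.contains(value)) ? value : null;
    }

    // Numeric: the feature's String value converted to a number,
    // or null if it is absent or cannot be parsed.
    static Double numericAttribute(Map<String, String> features, String feature) {
        String value = features.get(feature);
        if (value == null) {
            return null;
        }
        try {
            return Double.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Span token = new Span("Token", 0, 5);
        List<Span> all = Arrays.asList(token, new Span("Lookup", 3, 9));
        Map<String, String> features = new HashMap<>();
        features.put("category", "NN");
        features.put("length", "5");

        System.out.println(booleanAttribute(token, all, "Lookup"));
        System.out.println(nominalAttribute(features, "category",
            Arrays.asList("NN", "NNP", "NNPS")));
        System.out.println(numericAttribute(features, "length"));
    }
}
```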

Implementation

Machine Learning PR in GATE.

Has two functioning modes:

  • training
  • application

Uses an XML file for configuration:

dataset
    instance type: Token
    attribute POS_category(0)
        type: Token
        feature: category
        position: 0
        values: NN, NNP, NNPS
        []
engine
    gate.creole.ml.weka.Wrapper
    weka.classifiers.j48.J48
    -K 3
    0.85

Attributes Position

Instance type: Token

Machine Learning PR
  • Can save a learnt model to an external file for later use.

Saves the actual model and the collected dataset.

  • Can export the collected dataset in .arff format.
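
For readers unfamiliar with .arff, the following sketch shows roughly what such an export contains: a relation name, one declaration per nominal attribute with its allowed values, and comma-separated data rows with `?` for missing values. It illustrates the file format only and is not the ML PR's export code; `ArffSketch`, `toArff`, and the relation and attribute names are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of GATE or WEKA.
class ArffSketch {

    // Renders nominal attributes and data rows as ARFF text.
    // `attributes` maps each attribute name to its allowed values, in column order.
    static String toArff(String relation, Map<String, List<String>> attributes,
                         List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            // Names are quoted because they contain parentheses.
            sb.append("@attribute '").append(e.getKey()).append("' {")
              .append(String.join(",", e.getValue())).append("}\n");
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            String[] cells = new String[row.length];
            for (int i = 0; i < row.length; i++) {
                cells[i] = (row[i] == null) ? "?" : row[i]; // "?" marks a missing value
            }
            sb.append(String.join(",", cells)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("NN", "NNP", "NNPS");
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("POS_category(-1)", tags);
        attributes.put("POS_category(0)", tags);
        List<String[]> rows = Arrays.asList(
            new String[] {"NN", "NNP"},
            new String[] {null, "NN"});
        System.out.print(toArff("pos_context", attributes, rows));
    }
}
```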
Standard Use Scenario

Application

  • Prepare data by enriching the documents with the annotations needed for the attributes (e.g. run the Tokeniser, POS tagger, Gazetteer).
  • [ Load the previously saved model. ]
  • Run the ML PR in application mode.
  • [ Save the learnt model. ]

Training

  • Prepare training data by enriching the documents with the annotations needed for the attributes (e.g. run the Tokeniser, POS tagger, Gazetteer).
  • Run the ML PR in training mode.
  • Export the dataset as .arff and perform experiments using the WEKA interface in order to find the best attribute set / algorithm / algorithm options.
  • Update the configuration file accordingly.
  • Run the ML PR again to collect the actual data.
  • [ Save the learnt model. ]
An Example

Learn POS category from POS context.

Using Other ML Libraries

The MLEngine Interface

Method Summary

  • void addTrainingInstance(List attributes): adds a new training instance to the dataset.
  • Object classifyInstance(List attributes): classifies a new instance.
  • void init(): called after an engine is created and has its dataset and options set.
  • void setDatasetDefinition(DatasetDefintion definition): sets the definition for the dataset used.
  • void setOptions(org.jdom.Element options): sets the options from an XML JDom element.
  • void setOwnerPR(ProcessingResource pr): registers the PR using the engine with the engine.
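
As a rough illustration of what plugging in another learner involves, the sketch below implements three of the methods listed above for a trivial majority-class learner. `EngineLike` is a stand-in interface written for this example, not `MLEngine` itself, so the GATE- and JDOM-typed methods are left out. Treating the last list element as the class label is an assumption; the slides do not say where the class sits in the attribute list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only; none of these types belong to GATE.
class MajorityEngineSketch {

    // Stand-in mirroring part of the method summary; it is NOT GATE's MLEngine.
    interface EngineLike {
        void init();
        void addTrainingInstance(List<?> attributes);
        Object classifyInstance(List<?> attributes);
    }

    // Always predicts the class label seen most often during training.
    static class MajorityEngine implements EngineLike {
        private final Map<Object, Integer> counts = new HashMap<>();
        private Object majority;

        @Override
        public void init() {
            counts.clear();
            majority = null;
        }

        // Assumption of this sketch: the last element is the class label.
        @Override
        public void addTrainingInstance(List<?> attributes) {
            Object label = attributes.get(attributes.size() - 1);
            int seen = counts.merge(label, 1, Integer::sum);
            if (majority == null || seen > counts.get(majority)) {
                majority = label;
            }
        }

        // Ignores the attributes and returns the majority label (null if untrained).
        @Override
        public Object classifyInstance(List<?> attributes) {
            return majority;
        }
    }

    public static void main(String[] args) {
        EngineLike engine = new MajorityEngine();
        engine.init();
        // Training mode: collect instances.
        engine.addTrainingInstance(Arrays.asList("DT", "NN"));
        engine.addTrainingInstance(Arrays.asList("JJ", "NN"));
        engine.addTrainingInstance(Arrays.asList("DT", "VBZ"));
        // Application mode: classify an unseen instance.
        System.out.println(engine.classifyInstance(Arrays.asList("DT")));
    }
}
```

The `main` method mirrors the two modes of the ML PR: instances are first added in a training pass, then new instances are classified in an application pass.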