
Machine Learning in GATE

Valentin Tablan


Machine Learning in GATE

  • Uses classification.

    [Attr1, Attr2, Attr3, … Attrn]  Class

  • Classifies annotations.

    (Documents can be classified as well using a simple trick.)

  • Annotations of a particular type are selected as instances.

  • Attributes refer to instance annotations.

  • Attributes have a position relative to the instance annotation they refer to.
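The first step on this slide, selecting every annotation of one type as an instance, can be sketched in plain Java. This is a minimal illustration, not the GATE API: the Ann record is a hypothetical stand-in for a GATE annotation (a type, start/end offsets and a feature map), and instances are returned in document order so that relative positions ("previous", "next") are meaningful.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a GATE annotation (NOT the GATE API):
// a type, character offsets and a feature map.
record Ann(String type, long start, long end, Map<String, String> features) {}

class InstanceSelection {
    // Keep only annotations of the configured instance type, sorted by start offset
    // so that "previous" and "next" instances follow document order.
    static List<Ann> selectInstances(List<Ann> annotations, String instanceType) {
        List<Ann> instances = new ArrayList<>();
        for (Ann a : annotations) {
            if (a.type().equals(instanceType)) {
                instances.add(a);
            }
        }
        instances.sort(Comparator.comparingLong(Ann::start));
        return instances;
    }

    public static void main(String[] args) {
        List<Ann> doc = List.of(
            new Ann("Token", 9, 15, Map.of("string", "Tablan", "category", "NNP")),
            new Ann("Person", 0, 15, Map.of()),
            new Ann("Token", 0, 8, Map.of("string", "Valentin", "category", "NNP")));
        // Prints "0 Valentin" then "9 Tablan": the Person annotation is not an instance.
        for (Ann t : selectInstances(doc, "Token")) {
            System.out.println(t.start() + " " + t.features().get("string"));
        }
    }
}
```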



Attributes

Attributes can be:

  • Boolean

    The presence (or absence) of an annotation of a particular type that [partially] overlaps the referred instance annotation.

  • Nominal

    The value of a particular feature of the referred instance annotation. The complete set of acceptable values must be specified a priori.

  • Numeric

    The numeric value (converted from String) of a particular feature of the referred instance annotation.
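The three attribute kinds can be sketched as three small functions. This is an illustrative assumption of how such values might be computed, not GATE's implementation: Annotation is a hypothetical record, a nominal value is encoded as its index in the declared value list (-1 when missing or undeclared), and a numeric value that is absent or unparsable becomes NaN.

```java
import java.util.List;
import java.util.Map;

// Hypothetical annotation stand-in (NOT the GATE API): type, offsets, features.
record Annotation(String type, long start, long end, Map<String, String> features) {}

class AttributeValues {
    // Boolean: is there an annotation of `type` (other than the instance itself)
    // that at least partially overlaps the instance's span?
    static boolean booleanValue(Annotation instance, List<Annotation> all, String type) {
        for (Annotation a : all) {
            boolean overlaps = a.start() < instance.end() && instance.start() < a.end();
            if (a != instance && a.type().equals(type) && overlaps) {
                return true;
            }
        }
        return false;
    }

    // Nominal: the feature value, accepted only if it was declared up front.
    // Returns its index in the declared list, or -1 for a missing/undeclared value.
    static int nominalValue(Annotation instance, String feature, List<String> declaredValues) {
        String v = instance.features().get(feature);
        return v == null ? -1 : declaredValues.indexOf(v);
    }

    // Numeric: the feature value converted from String; NaN if absent or not a number.
    static double numericValue(Annotation instance, String feature) {
        String v = instance.features().get(feature);
        if (v == null) {
            return Double.NaN;
        }
        try {
            return Double.parseDouble(v);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }
}
```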



Implementation

The Machine Learning PR (Processing Resource) in GATE.

It has two modes of operation:

  • training

  • application

    It uses an XML file for configuration:

    <?xml version="1.0" encoding="windows-1252"?>
    <ML-CONFIG>
      <DATASET> … </DATASET>
      <ENGINE> … </ENGINE>
    </ML-CONFIG>
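Because the PR reads this file as XML, a structural mistake such as a root element that is opened but never closed makes the whole configuration unreadable. The sketch below is a hypothetical pre-flight check (ConfigCheck.looksValid is not part of GATE) that uses only the JDK's built-in DOM parser to confirm the file is well-formed and has one DATASET and one ENGINE under ML-CONFIG.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConfigCheck {
    // True if `xml` is well-formed, its root is ML-CONFIG, and the root has
    // exactly one DATASET child and exactly one ENGINE child.
    static boolean looksValid(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return root.getTagName().equals("ML-CONFIG")
                && countChildren(root, "DATASET") == 1
                && countChildren(root, "ENGINE") == 1;
        } catch (Exception ex) {
            // Malformed XML (e.g. an unclosed root element) is reported by the parser here.
            return false;
        }
    }

    // Counts direct child elements with the given tag name.
    private static int countChildren(Element parent, String tag) {
        int n = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element e && e.getTagName().equals(tag)) {
                n++;
            }
        }
        return n;
    }
}
```

Note that the default parser also prints a "[Fatal Error]" line to standard error when it rejects a document; the method still simply returns false.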



<DATASET>

<DATASET>
  <INSTANCE-TYPE>Token</INSTANCE-TYPE>
  <ATTRIBUTE>
    <NAME>POS_category(0)</NAME>
    <TYPE>Token</TYPE>
    <FEATURE>category</FEATURE>
    <POSITION>0</POSITION>
    <VALUES>
      <VALUE>NN</VALUE>
      <VALUE>NNP</VALUE>
      <VALUE>NNPS</VALUE>
    </VALUES>
    [<CLASS/>]
  </ATTRIBUTE>
</DATASET>
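To make the structure of a DATASET element concrete, here is a sketch that reads each ATTRIBUTE into an in-memory record using the JDK's DOM parser. The AttributeDef record and DatasetParser class are hypothetical, not GATE classes. The square brackets around CLASS on the slide presumably mark it as optional, so the sketch treats an attribute as the class attribute when it contains a CLASS element; defaulting a missing POSITION to 0 is likewise an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical in-memory form of one <ATTRIBUTE> element.
record AttributeDef(String name, String type, String feature, int position,
                    List<String> values, boolean isClass) {}

class DatasetParser {
    static List<AttributeDef> parseAttributes(String datasetXml) {
        Element dataset;
        try {
            dataset = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(datasetXml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse DATASET element", e);
        }
        List<AttributeDef> defs = new ArrayList<>();
        NodeList attrs = dataset.getElementsByTagName("ATTRIBUTE");
        for (int i = 0; i < attrs.getLength(); i++) {
            Element a = (Element) attrs.item(i);
            // Collect the declared nominal values (empty for boolean/numeric attributes).
            List<String> values = new ArrayList<>();
            NodeList vs = a.getElementsByTagName("VALUE");
            for (int j = 0; j < vs.getLength(); j++) {
                values.add(vs.item(j).getTextContent().trim());
            }
            String pos = text(a, "POSITION");
            defs.add(new AttributeDef(text(a, "NAME"), text(a, "TYPE"), text(a, "FEATURE"),
                pos == null ? 0 : Integer.parseInt(pos),  // assumption: default position 0
                values,
                a.getElementsByTagName("CLASS").getLength() > 0));
        }
        return defs;
    }

    // Trimmed text of the first descendant element named `tag`, or null if absent.
    private static String text(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() == 0 ? null : nl.item(0).getTextContent().trim();
    }
}
```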



<ENGINE>

<ENGINE>
  <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
  <OPTIONS>
    <CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>
    <CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>
    <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
  </OPTIONS>
</ENGINE>
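The slide does not spell out what CONFIDENCE-THRESHOLD does. One plausible reading, sketched below as an assumption rather than the wrapper's documented behaviour, is that the classifier's class-probability distribution is inspected and the top class is used only if its probability reaches the threshold; otherwise the instance is left unclassified. ConfidenceFilter.decide is a hypothetical helper.

```java
class ConfidenceFilter {
    // Returns the index of the most probable class, or -1 when the distribution
    // is empty or its top probability is below `threshold` (instance left unlabelled).
    static int decide(double[] distribution, double threshold) {
        int best = -1;
        double bestP = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < distribution.length; i++) {
            if (distribution[i] > bestP) {
                bestP = distribution[i];
                best = i;
            }
        }
        return (best >= 0 && bestP >= threshold) ? best : -1;
    }
}
```

With the value 0.85 from the configuration above, a prediction of {0.1, 0.9} would be accepted (class 1), while {0.4, 0.6} would be rejected.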



Attributes Position

Instance type: Token
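With Token as the instance type, an attribute at position p refers to the token p places away from the instance: -1 is the previous token, 0 the instance itself, +1 the next one. The sketch below (PositionWindow is a hypothetical helper, and representing each token by a single feature string is a simplification) reads one value per requested position and uses null as the missing value when the position falls outside the document.

```java
import java.util.ArrayList;
import java.util.List;

class PositionWindow {
    // For the instance at index `i`, return the value found at each relative
    // position (-1 = previous, 0 = the instance itself, +1 = next, ...).
    // Positions outside the document yield null (a missing value).
    static List<String> window(List<String> values, int i, int... positions) {
        List<String> out = new ArrayList<>();
        for (int p : positions) {
            int j = i + p;
            out.add(j >= 0 && j < values.size() ? values.get(j) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> posTags = List.of("DT", "JJ", "NN", "VBZ");
        System.out.println(window(posTags, 0, -1, 0, 1));  // [null, DT, JJ]
        System.out.println(window(posTags, 2, -1, 0, 1));  // [JJ, NN, VBZ]
    }
}
```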



Machine Learning PR

  • Can save a learnt model to an external file for later use.

    Saves the actual model and the collected dataset.

  • Can export the collected dataset in .arff format.
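ARFF is WEKA's plain-text dataset format: an @RELATION line, one @ATTRIBUTE line per attribute (nominal attributes list their allowed values in braces), then an @DATA section with one comma-separated row per instance and ? for a missing value. The sketch below writes that format for nominal attributes only; ArffWriter is a hypothetical helper, not the PR's own exporter.

```java
import java.util.List;

class ArffWriter {
    // Renders nominal attributes and their instance rows in ARFF text format.
    // A null cell in `rows` is written as '?', ARFF's missing-value marker.
    static String toArff(String relation, List<String> names, List<List<String>> allowed,
                         List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(quote(relation)).append("\n\n");
        for (int i = 0; i < names.size(); i++) {
            sb.append("@ATTRIBUTE ").append(quote(names.get(i))).append(" {");
            List<String> vals = allowed.get(i);
            for (int k = 0; k < vals.size(); k++) {
                if (k > 0) sb.append(',');
                sb.append(quote(vals.get(k)));
            }
            sb.append("}\n");
        }
        sb.append("\n@DATA\n");
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) sb.append(',');
                String v = row.get(i);
                sb.append(v == null ? "?" : quote(v));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Leave simple tokens bare; single-quote anything containing other characters
    // (spaces, parentheses, commas, braces, '?', '%', ...), escaping \ and '.
    static String quote(String s) {
        if (s.matches("[A-Za-z0-9_.\\-]+")) {
            return s;
        }
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}
```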



Standard Use Scenario

Application

  • Prepare the data by enriching the documents with the annotations needed for the attributes (e.g. run the Tokeniser, POS tagger, Gazetteer, etc.).

  • [ Load the previously saved model. ]

  • Run the ML PR in application mode.

  • [ Save the learnt model. ]

Training

  • Prepare the training data by enriching the documents with the annotations needed for the attributes (e.g. run the Tokeniser, POS tagger, Gazetteer, etc.).

  • Run the ML PR in training mode.

  • Export the dataset as .arff and experiment in the WEKA interface to find the best attribute set, algorithm and algorithm options.

  • Update the configuration file accordingly.

  • Run the ML PR again to collect the actual data.

  • [ Save the learnt model. ]



An Example

Learn POS category from POS context.
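The idea of this example can be illustrated without WEKA. The sketch below swaps in a deliberately simple count-based learner (not the J48 classifier from the configuration): each Token is an instance, its attributes are the POS categories at positions -1 and +1, its class is its own POS category, and classification returns the category most often seen in the same context. ContextTagger and the "<s>"/"</s>" edge markers are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContextTagger {
    // counts.get(context).get(tag) = how often `tag` was the class in `context`
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // Overall class frequencies, used as a fallback for unseen contexts.
    private final Map<String, Integer> overall = new HashMap<>();

    // Attributes = POS categories at relative positions -1 and +1
    // ("<s>" / "</s>" stand in for positions outside the sentence).
    private static String context(List<String> tags, int i) {
        String prev = i > 0 ? tags.get(i - 1) : "<s>";
        String next = i + 1 < tags.size() ? tags.get(i + 1) : "</s>";
        return prev + "|" + next;
    }

    // Training: every token is an instance whose class is its own POS category.
    void train(List<String> tags) {
        for (int i = 0; i < tags.size(); i++) {
            counts.computeIfAbsent(context(tags, i), k -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
            overall.merge(tags.get(i), 1, Integer::sum);
        }
    }

    // Application: predict the category most often seen in this context,
    // or the overall most frequent category if the context was never seen.
    String classify(List<String> tags, int i) {
        return argmax(counts.getOrDefault(context(tags, i), overall));
    }

    private static String argmax(Map<String, Integer> m) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Only the neighbours' categories are read when classifying, so the value at position i (shown as "?" in the usage below) can be anything.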



Using Other ML Libraries

The MLEngine Interface

Method Summary

  • void addTrainingInstance(List attributes)
    Adds a new training instance to the dataset.

  • Object classifyInstance(List attributes)
    Classifies a new instance.

  • void init()
    Called after an engine has been created and its dataset definition and options have been set.

  • void setDatasetDefinition(DatasetDefintion definition)
    Sets the definition of the dataset to be used.

  • void setOptions(org.jdom.Element options)
    Sets the options from an XML JDOM element.

  • void setOwnerPR(ProcessingResource pr)
    Registers with the engine the PR that is using it.
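To show the shape of an engine plugged in through this interface, here is a baseline that always predicts the most frequent training class. To keep it self-contained it implements SimpleEngine, a simplified, hypothetical stand-in for MLEngine: the GATE-specific parameter types are replaced by plain Java types, setDatasetDefinition and setOwnerPR are omitted, and the class value is assumed to be the last element of each attribute list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for GATE's MLEngine (NOT the real interface):
// options arrive as a Map instead of an org.jdom.Element.
interface SimpleEngine {
    void setOptions(Map<String, String> options);
    void init();
    void addTrainingInstance(List<?> attributes);
    Object classifyInstance(List<?> attributes);
}

// Baseline engine: ignores every attribute except the class (assumed to be the
// last element of each list) and always predicts the most frequent class.
class MajorityClassEngine implements SimpleEngine {
    private final Map<Object, Integer> classCounts = new HashMap<>();

    @Override
    public void setOptions(Map<String, String> options) {
        // This baseline has no options.
    }

    @Override
    public void init() {
        classCounts.clear();
    }

    @Override
    public void addTrainingInstance(List<?> attributes) {
        Object cls = attributes.get(attributes.size() - 1);
        classCounts.merge(cls, 1, Integer::sum);
    }

    @Override
    public Object classifyInstance(List<?> attributes) {
        Object best = null;
        int bestCount = 0;
        for (Map.Entry<Object, Integer> e : classCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;  // null until at least one training instance has been added
    }
}
```

A real engine would follow the same life cycle (options, init, training instances, classification) but wrap an actual learning library, as the WEKA wrapper named in the ENGINE configuration does.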

