

HW7 Extracting Arguments for %

Ang Sun

[email protected]

March 25, 2012


Outline

  • File Format

  • Training

    • Generating Training Examples

    • Extracting Features

    • Training of MaxEnt Models

  • Decoding

  • Scoring


File Format

  • Statistics Canada said service-industry <ARG1> output </ARG1> in August <SUPPORT> rose </SUPPORT> 0.4 <PRED class="PARTITIVE-QUANT"> % </PRED> from July .
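The role tokens can be recovered from such a line with simple string processing. Below is a minimal sketch in Python (a hypothetical helper, not part of the assignment) that pulls out the ARG1, SUPPORT, and PRED annotations with a regular expression:

    import re

    # Matches an inline tag such as <ARG1> output </ARG1> or
    # <PRED class="PARTITIVE-QUANT"> % </PRED>, capturing (role, text).
    TAG = re.compile(r'<(ARG1|SUPPORT|PRED)[^>]*>\s*(.*?)\s*</\1>')

    def parse_line(line):
        """Return (tokens, {role: token}) for one annotated sentence."""
        roles = {role: text for role, text in TAG.findall(line)}
        plain = re.sub(r'</?[^>]+>', ' ', line)   # strip all tags
        return plain.split(), roles               # corpus is pre-tokenized

    line = ('Statistics Canada said service-industry <ARG1> output </ARG1> '
            'in August <SUPPORT> rose </SUPPORT> 0.4 '
            '<PRED class="PARTITIVE-QUANT"> % </PRED> from July .')
    print(parse_line(line)[1])  # {'ARG1': 'output', 'SUPPORT': 'rose', 'PRED': '%'}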


Generating Training Examples

  • Positive Example

    • Only one positive example per sentence

    • The token annotated as ARG1

  • Negative Examples

    • Two methods!

    • Method 1: consider any token that has one of the following POSs

      • NN 1150
      • NNS 905
      • NNP 205
      • JJ 25
      • PRP 24
      • CD 21
      • DT 16
      • NNPS 13
      • VBG 2
      • FW 1
      • IN 1
      • RB 1
      • VBZ 1
      • WDT 1
      • WP 1

      Too many negative examples!

    • Method 2: only consider head tokens (see the sketch after this list)
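A minimal sketch of candidate generation, assuming each sentence arrives as a POS-tag list with the gold ARG1 index known; CANDIDATE_POS and head_indices are illustrative names, and the head finder for Method 2 is left abstract:

    # POS tags from the Method 1 list above.
    CANDIDATE_POS = {'NN', 'NNS', 'NNP', 'JJ', 'PRP', 'CD', 'DT',
                     'NNPS', 'VBG', 'FW', 'IN', 'RB', 'VBZ', 'WDT', 'WP'}

    def generate_examples(pos_tags, arg1_index, head_indices=None):
        """Yield (candidate_index, label) pairs for one sentence."""
        if head_indices is not None:          # Method 2: head tokens only
            candidates = head_indices
        else:                                 # Method 1: any listed POS
            candidates = [i for i, p in enumerate(pos_tags)
                          if p in CANDIDATE_POS]
        for i in candidates:
            # Exactly one positive example: the gold ARG1 token.
            yield i, 'Y' if i == arg1_index else 'N'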


Extracting Features

  • f: candToken=output

  • f: tokenBeforeCand=service-industry

  • f: tokenAfterCand=in

  • f: tokensBetweenCandPRED=in_August_rose_0.4

  • f: numberOfTokensBetween=4

  • f: existVerbBetweenCandPred=true

  • f: existSUPPORTBetweenCandPred=true

  • f: candTokenPOS=NN

  • f: posBeforeCand=NN

  • f: posAfterCand=IN

  • f: possBetweenCandPRED=IN_NNP_VBD_CD

  • f: BIOChunkChain=I-NP_B-PP_B-NP_B-VP_B-NP_I-NP

  • f: chunkChain=NP_PP_NP_VP_NP

  • f: candPredInSameNP=False

  • f: candPredInSameVP=False

  • f: candPredInSamePP=False

  • f: shortestPathBetweenCandPred=NP_NP-SBJ_S_VP_NP-EXT

  (All values are for the candidate "output" and the PRED "%" in the example sentence above.)
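A minimal sketch of a few of these features, assuming the sentence is given as parallel lists of tokens, POS tags, and BIO chunk tags, with cand and pred as token indices (all names are illustrative):

    VERB_POS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

    def extract_features(tokens, pos, chunks, cand, pred, support=None):
        """Return feature strings like 'candToken=output' for one candidate."""
        lo, hi = min(cand, pred) + 1, max(cand, pred)   # span between cand and PRED
        return [
            'candToken=%s' % tokens[cand],
            'tokenBeforeCand=%s' % (tokens[cand - 1] if cand > 0 else 'BOS'),
            'tokenAfterCand=%s' % (tokens[cand + 1] if cand + 1 < len(tokens) else 'EOS'),
            'tokensBetweenCandPRED=%s' % '_'.join(tokens[lo:hi]),
            'numberOfTokensBetween=%d' % (hi - lo),
            'existVerbBetweenCandPred=%s' % any(p in VERB_POS for p in pos[lo:hi]),
            'existSUPPORTBetweenCandPred=%s' % (support is not None and lo <= support < hi),
            'candTokenPOS=%s' % pos[cand],
            'possBetweenCandPRED=%s' % '_'.join(pos[lo:hi]),
            'BIOChunkChain=%s' % '_'.join(chunks[min(cand, pred):hi + 1]),
        ]

The chunk- and parse-tree features (chunkChain, candPredInSameNP, shortestPathBetweenCandPred, etc.) additionally need chunker and parser output, so they are omitted from this sketch.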


Training of MaxEnt Model

  • Each training example is one line

    • candToken=output . . . . . class=Y

    • candToken=Canada . . . . . class=N

  • Put all examples in one file, the training file

  • Use the MaxEnt wrapper, or the program you wrote in HW5 for relation extraction, to train your model
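The HW5 MaxEnt wrapper is not shown here. Purely as an illustration, the same training file could be fit with scikit-learn's LogisticRegression, which is a maximum-entropy classifier (the file name and the space-separated feature=value format are assumptions):

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    X, y = [], []
    with open('hw7.train') as f:                 # hypothetical file name
        for line in f:
            fields = line.split()
            # e.g. {'candToken': 'output', ...} plus the trailing class=Y/N
            X.append(dict(kv.split('=', 1) for kv in fields[:-1]))
            y.append(fields[-1].split('=')[1])

    vec = DictVectorizer()                       # one-hot encodes string features
    model = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)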


Decoding

  • For each sentence

    • Generate testing examples as you did for training

      • One example per feature line (without class=Y/N)

    • Apply your trained model to each of the testing examples

    • Choose the example with the highest probability returned by your model as the ARG1

    • So there must be exactly one ARG1 per sentence (see the sketch below)
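A minimal sketch of the argmax step, reusing the vec and model objects from the training sketch above; candidates is the list of feature dicts for one sentence:

    def predict_arg1(candidates, vec, model):
        """Return the index of the candidate most likely to be the ARG1."""
        probs = model.predict_proba(vec.transform(candidates))
        y_col = list(model.classes_).index('Y')   # column holding P(class=Y)
        return max(range(len(candidates)), key=lambda i: probs[i, y_col])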


Scoring

  • Since you are required to tag exactly one ARG1 per sentence, your system will be evaluated on accuracy

    • Accuracy = #correct_ARG1s / #sentences
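The metric itself is a two-liner (gold and predicted hold one ARG1 token index per sentence; names are illustrative):

    def accuracy(gold, predicted):
        """Fraction of sentences whose predicted ARG1 matches the gold one."""
        return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

    print(accuracy([4, 2, 7], [4, 3, 7]))  # 2 of 3 correct -> 0.666...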


Good Luck!

