
NLP Project 1

程智聪 韩冬 张坚

Agenda
  • Data Preprocessing
  • Feature Selection
  • Classification
  • Result Analysis
  • Summary
Data Preprocessing
  • XML library: XOM 1.1
  • Missing lexical-sample.dtd
  • Missing POS tags
    • na, nx, Ug, …
  • Customized POS tag: *
  • Handling subwords
  • JVM heap settings: -Xms128M -Xmx512M
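The team parsed the data with XOM in Java and notes that lexical-sample.dtd was missing. As an illustrative Python sketch (not the team's code): the standard-library ElementTree parser does not fetch external DTDs, so a missing lexical-sample.dtd does not block parsing. The instance layout below (`<lexelt>`, `<instance>`, `<answer senseid=...>`, and a `<context>` with a `<head>` child) is an assumption modeled on Senseval-style lexical-sample files, and `read_instances` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed instance layout (hypothetical); the DOCTYPE points at a DTD file
# that does not exist, which ElementTree never tries to load.
SAMPLE = """<!DOCTYPE corpus SYSTEM "lexical-sample.dtd">
<corpus lang="chinese">
  <lexelt item="使">
    <instance id="使.1">
      <answer instance="使.1" senseid="use"/>
      <context>吴宏权<head>使</head>出全身</context>
    </instance>
  </lexelt>
</corpus>"""


def read_instances(xml_text):
    """Yield (instance_id, sense, left_text, head_text, right_text) per instance."""
    root = ET.fromstring(xml_text)
    for inst in root.iter("instance"):
        answer = inst.find("answer")
        sense = answer.get("senseid") if answer is not None else None
        context = inst.find("context")
        head = context.find("head") if context is not None else None
        if head is None:
            continue  # no marked target word: nothing to disambiguate
        left = context.text or ""   # text before <head>
        right = head.tail or ""     # text after </head>
        yield inst.get("id"), sense, left, head.text, right


print(list(read_instances(SAMPLE)))
```

The left and right strings would then go to segmentation and POS tagging before feature extraction.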

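钻研中医理论,试图从前人 ("delving into the theory of traditional Chinese medicine, trying to … from predecessors")
POS tags: v n n w * * *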
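The slides do not spell out which tokens receive the customized * tag. A minimal sketch, assuming * replaces any tag that is absent or outside the inventory the pipeline recognizes (for example na, nx, Ug). `KNOWN_TAGS` is a hypothetical, deliberately incomplete PKU-style subset, and `normalize_tags` is an illustrative name.

```python
# Assumption: an illustrative subset of PKU-style tags treated as "known".
# na, nx and Ug are intentionally absent, so they collapse to the custom tag.
KNOWN_TAGS = {"n", "nr", "ns", "nt", "nz", "v", "vn", "a", "d", "p",
              "u", "w", "m", "q", "r", "c", "f", "t"}
CUSTOM_TAG = "*"


def normalize_tags(tagged):
    """Return (token, tag) pairs with missing or unknown tags replaced by '*'."""
    out = []
    for token, tag in tagged:
        if not tag or tag not in KNOWN_TAGS:
            tag = CUSTOM_TAG
        out.append((token, tag))
    return out


print(normalize_tags([("钻研", "v"), ("中医", "n"), (",", "w"), ("X", "nx")]))
```

Collapsing rare or undocumented tags into one symbol keeps the nominal POS attributes from growing values the classifier sees only a handful of times.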

Data Preprocessing
  • Handling multiple target items in the context
  • Handling punctuation

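面临我国心脑血管疾病发病和死亡率逐年上升的严重趋势,现任中国<head>中医</head>研究院长城医院院长的周文志教授历时30余年艰辛探索,采用中医中药方法治疗和预防心脑血管疾病取得显著成果。

("Facing the serious trend of rising incidence and mortality of cardiovascular and cerebrovascular diseases in our country year after year, Professor Zhou Wenzhi, currently president of the Great Wall Hospital of the China Academy of <head>Traditional Chinese Medicine</head>, has spent more than 30 years of painstaking exploration and achieved notable results in treating and preventing cardiovascular and cerebrovascular diseases with traditional Chinese medicine methods and remedies.") The target word 中医 appears twice; only the occurrence inside <head> is the item to disambiguate, while the second one (中医中药) is ordinary context.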
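A sketch of one way to handle a context that repeats the target word and contains punctuation, under two assumptions: only the occurrence wrapped in <head> counts as the target, and punctuation tokens (PKU-style tag w) are dropped before windowing. The slides do not say which punctuation policy the team used, and `prepare_context` is a hypothetical helper; the sample tokens are illustrative.

```python
PUNCT_TAG = "w"  # assumption: PKU-style tag for punctuation


def prepare_context(tokens, drop_punct=True):
    """tokens: list of (word, tag, is_head). Returns (kept_tokens, head_index).

    Only the token flagged is_head is the target; other occurrences of the
    same word (e.g. a second 中医) stay as ordinary context.
    """
    kept = [t for t in tokens
            if not (drop_punct and t[1] == PUNCT_TAG and not t[2])]
    heads = [i for i, t in enumerate(kept) if t[2]]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one marked head, found {len(heads)}")
    return kept, heads[0]


toks = [("现任", "v", False), ("中国", "ns", False), ("中医", "n", True),
        ("研究院", "n", False), (",", "w", False), ("中医", "n", False)]
print(prepare_context(toks))
```

Because the head index is computed after filtering, window offsets measured from it skip over punctuation rather than spending a slot on it.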

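Feature Selection
  • StartOffset = -1, EndOffset = 1 (head word at offset 0)
  • IncludeTokenPreOffset = 1, IncludeTokenPostOffset = 1
  • Example: 吴宏权<head>使</head>出全身 (roughly "Wu Hongquan exerted his whole body's [strength]…")
  • Feature vector: nr,v,吴,宏权,出,全身,use (POS tags, surrounding tokens, sense label)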
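One reading consistent with the example: StartOffset and EndOffset bound a POS window around the head at offset 0, and IncludeTokenPreOffset/IncludeTokenPostOffset widen that window by one token on each side for word features. The sketch assumes that reading, assumes 吴宏权 is segmented as 吴/nr 宏权/nr, and tags 出 and 全身 as v and n for illustration; `extract_features` and the `NONE` padding value are hypothetical. Under those assumptions it reproduces the vector shown above.

```python
PAD = "NONE"  # hypothetical filler when a window runs past the sentence edge


def extract_features(tagged, head, sense, start_offset=-1, end_offset=1,
                     pre_tokens=1, post_tokens=1):
    """Build one comma-separated instance line.

    tagged: list of (token, pos) pairs; head: index of the target word.
    POS features cover offsets start_offset..end_offset (offset 0, the head,
    is skipped); token features extend that range by pre_tokens on the left
    and post_tokens on the right.
    """
    def at(offset, field):
        i = head + offset
        return tagged[i][field] if 0 <= i < len(tagged) else PAD

    pos = [at(o, 1) for o in range(start_offset, end_offset + 1) if o != 0]
    toks = [at(o, 0)
            for o in range(start_offset - pre_tokens, end_offset + post_tokens + 1)
            if o != 0]
    return ",".join(pos + toks + [sense])


# Assumed segmentation and tags for 吴宏权<head>使</head>出全身
tagged = [("吴", "nr"), ("宏权", "nr"), ("使", "v"), ("出", "v"), ("全身", "n")]
print(extract_features(tagged, head=2, sense="use"))  # → nr,v,吴,宏权,出,全身,use
```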

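Classification
  • Data mining libraries
    • Weka 3.6, maxent 20041229
  • Classifiers
    • MLP (multilayer perceptron)
      • Learning rate L = 0.6, hidden layers H: 12, 4, (adaptive), momentum M = 0.9
    • SMO (Weka's support vector machine, trained by sequential minimal optimization)
      • NormalizedPolyKernel, C = 1.9
    • NaiveBayes
    • MEM (maximum entropy model)
      • G = 20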
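To show what the NaiveBayes option does with nominal features like the vector above, here is a from-scratch categorical Naive Bayes with add-one smoothing. It is an illustration only, not Weka's implementation, and `parse_line`, `train_nb` and `predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def parse_line(line):
    """Split 'f1,f2,...,label' into (feature tuple, label)."""
    *feats, label = line.strip().split(",")
    return tuple(feats), label


def train_nb(rows):
    """rows: list of (features, label); every feature is a nominal value."""
    label_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (position, label) -> value counts
    vocab = defaultdict(set)             # position -> values seen in training
    for feats, label in rows:
        for j, v in enumerate(feats):
            value_counts[(j, label)][v] += 1
            vocab[j].add(v)
    return label_counts, value_counts, vocab


def predict_nb(model, feats):
    """Return the label with the highest log posterior."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for j, v in enumerate(feats):
            # Add-one smoothing with one extra slot reserved for unseen values,
            # so each per-position distribution still sums to 1.
            k = len(vocab[j]) + 1
            score += math.log((value_counts[(j, label)][v] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `parse_line("nr,v,吴,宏权,出,全身,use")` yields the feature tuple `("nr", "v", "吴", "宏权", "出", "全身")` with label `"use"`, ready for `train_nb`.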
Result Analysis
  • Only POS
    [Figure; value shown: 68%]
Result Analysis
  • Only POS (cont'd)
    [Figure; axis: Context]

Result Analysis
  • Including tokens
Result Analysis
  • Including tokens (cont'd)
Result Analysis
  • Punctuation optimization
Result Analysis
  • Performance
Summary
  • Fewer POS features are better
  • POS/token features after the target word are more important than those before it
  • Punctuation matters
  • Possible improvements
    • Characteristic words in the sentence as features
    • Collocations as features
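Collocation features could be seeded by ranking words that co-occur with the target more often than chance. A minimal sketch using sentence-level pointwise mutual information, PMI(t, w) = log2(c(t, w) · N / (c(t) · c(w))), where each count is a number of sentences and N is the total number of sentences. PMI is one common choice, not something the slides specify, and `collocates` is a hypothetical helper. The words it returns could also serve as the "characteristic words in the sentence" features, encoded as binary presence indicators.

```python
import math
from collections import Counter


def collocates(sentences, target, top_k=5, min_count=2):
    """Rank words by sentence-level PMI with `target`.

    sentences: list of token lists. Words co-occurring with the target in
    fewer than min_count sentences are skipped, since PMI is unreliable
    for rare pairs.
    """
    n = len(sentences)
    df = Counter()     # sentences containing each word
    joint = Counter()  # sentences containing both the target and the word
    for sent in sentences:
        words = set(sent)
        df.update(words)
        if target in words:
            joint.update(words - {target})
    c_t = df[target]
    scores = {w: math.log2(c_tw * n / (c_t * df[w]))
              for w, c_tw in joint.items() if c_tw >= min_count}
    # Highest PMI first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```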
Q & A

Thanks