Relation Detection And Recognition

Presentation Transcript


  1. Relation Detection And Recognition

  2. Schema • General Description • Named Entity Recognition • RDR Training Corpus Generation • Relation Detection and Recognition • Performance Analysis

  3. General Description-Algorithm • EDR (entity detection and recognition): CRF, character-based • RDR (relation detection and recognition): SVM, POS tags are needed

  4. General Description-Workflow

  5. Schema • General Description • Named Entity Recognition • RDR Training Corpus Generation • Relation Detection and Recognition • Performance Analysis

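  6. Named Entity Recognition-Algorithm • CRF++ • Character-based • Simplest (most naive) tagging scheme, one tag per character: nb = entity begin, nm = entity middle, ne = entity end, non = outside any entity • Example: 北/nb 京/ne 1/non 月/non 1/non 日/non 讯/non 中/nb 华/nm 全/nm 国/nm 总/nm 工/nm 会/ne 今/non 日/non 发/non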
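
The character-level tagging on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `tag_characters` and `to_crfpp_lines` are hypothetical, the entity offsets are supplied by hand, and the handling of single-character entities is an assumption because the slide shows no such case. The output follows the usual CRF++ training layout of one token per line with the label in the last column.

```python
def tag_characters(text, entity_spans):
    """Return (char, tag) pairs using the nb / nm / ne / non scheme.

    entity_spans: list of (start, end) character offsets, end exclusive.
    """
    tags = ["non"] * len(text)
    for start, end in entity_spans:
        if end - start == 1:
            # Assumption: the slide shows no single-character entity; tag it nb.
            tags[start] = "nb"
            continue
        tags[start] = "nb"
        for i in range(start + 1, end - 1):
            tags[i] = "nm"
        tags[end - 1] = "ne"
    return list(zip(text, tags))


def to_crfpp_lines(pairs):
    """One 'char<TAB>tag' line per character, label in the last column."""
    return "\n".join(f"{ch}\t{tag}" for ch, tag in pairs)


# The sentence from the slide, with 北京 and 中华全国总工会 marked as entities.
sentence = "北京1月1日讯中华全国总工会今日发"
pairs = tag_characters(sentence, [(0, 2), (7, 14)])
print(to_crfpp_lines(pairs))
```

The printed tags reproduce the sequence shown on the slide for this sentence.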

  7. Named Entity Recognition-Accuracy • nr (person) precision: 100%, correct: 88, errors: 0 • nt (organization) precision: 100%, correct: 36, errors: 0 • ns (location) precision: 100%, correct: 56, errors: 0 • 180/181 • 海湾战争 nz 9 22

  8. Schema • General Description • Named Entity Recognition • RDR Training Corpus Generation • Relation Detection and Recognition • Performance Analysis

  9. RDR Training Corpus Generation • The feature vector the SVM needs: e1.type, e2.type, order, dist, w-2,w-1,w0,w1,w2,t-2,t-1,t0,t1,t2, w-2,w-1,w0,w1,w2,t-2,t-1,t0,t1,t2, relation • Example: 国家环保局局长解振华庄重宣布 ("Xie Zhenhua, director of the national environmental protection agency, solemnly announces") 国家环保局,2,解振华,1,3,11,NULL,NULL,国家环保局,局,长,局,长,解振华,庄,重,null,null,null,NN,NR,NN,NN,null,VA,DEC,E

  10. RDR Training Corpus Generation • 1. POS-tag the sentence with an NLP tagger: 国家/NN 环保局/NN 局长/NN 解振华/NR 庄重/VA 宣布:/DEC • 2. Align the tagged tokens with the recognized entities: 国家环保局/nt, 解振华/nr • 3. Collect the POS tags before and after each entity: null,null,null,NN,NR and NN,NN,null,VA,DEC

  11. RDR Training Corpus Generation • 4. Label the training corpus by hand: 国家环保局,2,解振华,1,3,11,NULL,NULL,国家环保局,局,长,局,长,解振华,庄,重,null,null,null,NN,NR,NN,NN,null,VA,DEC,E
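
The context windows in the example vector can be reproduced with a small helper. The sketch below is an assumption about how the windows were built, inferred only from the example row: the word window appears to be taken over characters, the tag window over POS-tagged tokens, and the entity's own tag slot is left as null. The function name `context_window` and the hand-written offsets are hypothetical.

```python
def context_window(seq, start, end, center, size=2, pad="null"):
    """Items within `size` positions before and after seq[start:end].

    Positions that fall outside the sequence are filled with `pad`;
    `center` stands in for the entity itself.
    """
    before = [seq[i] if i >= 0 else pad for i in range(start - size, start)]
    after = [seq[i] if i < len(seq) else pad for i in range(end, end + size)]
    return before + [center] + after


sentence = "国家环保局局长解振华庄重宣布"
pos_tags = ["NN", "NN", "NN", "NR", "VA", "DEC"]  # 国家 环保局 局长 解振华 庄重 宣布

# Character windows (entity offsets in characters, end exclusive).
w_e1 = context_window(sentence, 0, 5, "国家环保局", pad="NULL")
w_e2 = context_window(sentence, 7, 10, "解振华", pad="NULL")

# POS-tag windows (entity offsets in tokens, end exclusive).
t_e1 = context_window(pos_tags, 0, 2, "null")
t_e2 = context_window(pos_tags, 3, 4, "null")

print(w_e1, w_e2, t_e1, t_e2, sep="\n")
```

Under these assumptions the four windows match the corresponding fields of the hand-labeled row on the slide.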

  12. RDR Training Corpus Generation • Labeling was done with an assistant program • Tagged corpus: 602 sentences, 3000+ relations

  13. Schema • General Description • Named Entity Recognition • RDR Training Corpus Generation • Relation Detection and Recognition • Performance Analysis

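  14. Overview • Treat relation recognition as a multi-class classification problem. Input: a set of entity-pair vectors X(x1, x2, ..., xn), where xi(f1, f2, ..., fn) represents the entity pair (E1, E2). Output: the class yi that xi belongs to • Build the classifier with an SVM: choose a suitable feature set to describe each entity pair, map it into a high-dimensional real-valued space, and classify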

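  15. SVM • Support Vector Machine (SVM): for a two-class problem, the main idea is to find a hyperplane in a high-dimensional space that separates the two classes while keeping the classification error rate as low as possible. Through learning, it automatically finds the support vectors that discriminate well between the classes, and the resulting classifier maximizes the margin between the classes. • Tool: LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) provides executables for building multi-class classifiers, as well as for training and prediction.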
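
For reference, the margin-maximizing hyperplane described on this slide is usually written as the soft-margin optimization problem below. This is the standard textbook formulation, not something stated in the slides; the symbols $w$, $b$, $\xi_i$, $C$ and the feature map $\phi$ are the conventional ones and are introduced here only for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n,
\end{aligned}
```

Here $y_i \in \{-1, +1\}$ is the class of sample $x_i$, the margin between the classes is $2 / \lVert w \rVert$, and $C$ trades margin width against training errors. A multi-class task such as the C / E / L / N relation labels is typically handled by combining several such two-class problems.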

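  16. Entity-pair filtering module • Relation definitions: C: chief (nr-nt), E: employee (nr-nt), L: located in (nt-ns), N: no relation • Entity-pair filtering: discard every entity pair other than (nr-nt) and (nt-ns); the remaining pairs become candidates for labeling (train) or classification (test)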
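
A minimal sketch of the filtering step follows. It assumes entities arrive as `(text, type)` tuples in sentence order and that a pair qualifies regardless of which entity comes first (the training example earlier in the deck lists the nt entity before the nr entity). The names `ALLOWED_TYPE_PAIRS` and `candidate_pairs` are hypothetical.

```python
from itertools import combinations

# Type pairs that can carry one of the defined relations (C, E, L).
ALLOWED_TYPE_PAIRS = {("nr", "nt"), ("nt", "ns")}


def candidate_pairs(entities):
    """Keep only entity pairs whose types form an allowed pair.

    entities: list of (text, type) tuples in sentence order.
    Assumption: the order of the two types within a pair does not matter.
    """
    kept = []
    for e1, e2 in combinations(entities, 2):
        types = (e1[1], e2[1])
        if types in ALLOWED_TYPE_PAIRS or types[::-1] in ALLOWED_TYPE_PAIRS:
            kept.append((e1, e2))
    return kept


entities = [("国家环保局", "nt"), ("解振华", "nr"), ("北京", "ns")]
for e1, e2 in candidate_pairs(entities):
    print(e1[0], e2[0])
```

With these three entities, the (nt, nr) and (nt, ns) pairs survive and the (nr, ns) pair is dropped.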

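  17. Feature selection and vectorization module • The feature set is built from the following features: e1.type, e2.type, contain, order, dist, w-2,w-1,w1,w2,t-2,t-1,t1,t2, w-2,w-1,w1,w2,t-2,t-1,t1,t2, Relation • Adjusted during actual model training • Mapped to vector form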

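  18. Vectorization module and scale module • Vector format: Label index1:value1 ..., for example 1 1:2 2:3 3:4 ... • Scale (libsvm: svm-scale.exe): rescale the data set to [-1, 1], which simplifies computation and keeps the training set and test set on a common scale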
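
The vector format and the scaling step can be illustrated without LIBSVM itself. The sketch below is a plain-Python stand-in for what the slide delegates to svm-scale.exe, not a reimplementation of that tool: the helper names are hypothetical, features are assumed to be dense numeric lists, and mapping a constant feature to 0 is an assumption. The point it shows is that the ranges are computed on the training set and then reused unchanged on the test set.

```python
def to_libsvm_line(label, values):
    """Format one sample as 'label index:value ...' with 1-based indices."""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(values, start=1))
    return f"{label} {feats}"


def fit_ranges(rows):
    """Per-feature (min, max) computed from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using the given ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            scaled.append(0.0)  # assumption: constant features map to 0
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


train = [[2, 3, 4], [4, 3, 8]]
ranges = fit_ranges(train)                # learned on the training set only
test_row = scale_row([3, 3, 4], ranges)   # same ranges applied to test data
print(to_libsvm_line(1, train[0]))
print(to_libsvm_line(1, test_row))
```

The first printed line has the same shape as the example on the slide.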

  19. Training module • Manually label the relation of each candidate • Libsvm: svm-train.exe • Choose the feature set and parameters (cross-validation) • Build the model
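
Cross-validation, which this slide uses to choose features and parameters, can be sketched as follows. This is a generic k-fold loop, not the authors' procedure: `train_fn` is a hypothetical callable standing in for an SVM trainer, and the majority-class trainer exists only so that the example runs without any SVM library. In practice the samples would be shuffled before splitting.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(samples, labels, k, train_fn):
    """Fraction of samples predicted correctly while held out."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)


def majority_trainer(xs, ys):
    """Stand-in for a real trainer: always predicts the most common label."""
    most_common = Counter(ys).most_common(1)[0][0]
    return lambda x: most_common


samples = list(range(10))
labels = ["N"] * 7 + ["E"] * 3
print(cross_val_accuracy(samples, labels, 5, majority_trainer))
```

Swapping `majority_trainer` for a function that trains an SVM with a given feature set or parameter setting gives one score per setting, which is the basis for comparing them.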

  20. Testing • SampleTestData • P = 76%

  21. Schema • General Description • Named Entity Recognition • RDR Training Corpus Generation • Relation Detection and Recognition • Performance Analysis

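  22. Analysis and improvements • Errors introduced by the preceding steps • Training corpus is not large enough • Errors introduced by manual annotation • Feature-set selection (extract semantic features) • Training-parameter selection (grid search)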
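
The parameter search named in the last bullet could look like the sketch below. It rests on assumptions the slides do not confirm: an RBF kernel with two parameters, C and gamma, searched over exponentially spaced values (a common choice for SVMs). `score_fn` is a hypothetical callable that would return a cross-validation score for one (C, gamma) setting, and `toy_score` is a made-up function used only to make the example runnable.

```python
import math
from itertools import product


def grid_search(score_fn, c_exponents=range(-5, 16, 2),
                gamma_exponents=range(-15, 4, 2)):
    """Try every (C, gamma) = (2**i, 2**j) pair and keep the best score."""
    best = None
    for c_exp, g_exp in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def toy_score(c, gamma):
    """Made-up score that peaks at C = 2**3 and gamma = 2**-5."""
    return -abs(math.log2(c) - 3) - abs(math.log2(gamma) + 5)


score, best_c, best_gamma = grid_search(toy_score)
print(best_c, best_gamma)
```

In a real run, `score_fn` would train on the scaled training vectors and return the cross-validation accuracy for that setting.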
