German-Japan NL WS in Sapporo, 2003/7/4
Terminology Extraction System based on Vocabulary Space
Hiroshi Nakagawa
Information Technology Center, The University of Tokyo
歩留まり (Bu-Domari): success rate?? 横持ち: side take
Terms expressing domain concepts: about 85% are compound nouns and about 15% are simple nouns.
Our purpose is to automatically extract domain-specific terms, including both compound and simple nouns, from a domain corpus.
Example: LN(trigram) = 5, n = 3, m = 2, RN(trigram) = 3
Principle: a simple noun that contributes to forming a large number of compound nouns receives a high score.
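The principle above can be illustrated with a minimal sketch that counts, for each simple noun, how often another noun adjoins it on the left (LN) and on the right (RN) inside compound nouns. The function name `adjacency_counts` and the toy corpus are hypothetical, and counting token frequency (rather than distinct neighbouring nouns) is an assumption; the slides do not state which variant is used.

```python
from collections import Counter

def adjacency_counts(compound_nouns):
    """Return (LN, RN) counters for simple nouns.

    LN[x]: how often some noun directly precedes x inside a compound noun.
    RN[x]: how often some noun directly follows x inside a compound noun.
    Assumption: every occurrence is counted (token frequency).
    """
    ln, rn = Counter(), Counter()
    for nouns in compound_nouns:
        for left, right in zip(nouns, nouns[1:]):
            rn[left] += 1   # `right` adjoins `left` on its right side
            ln[right] += 1  # `left` adjoins `right` on its left side
    return ln, rn

# Hypothetical toy corpus: each tuple is one occurrence of a compound noun.
corpus = [
    ("word", "trigram"),
    ("word", "trigram"),
    ("class", "trigram"),
    ("trigram", "model"),
    ("trigram", "statistics"),
]
ln, rn = adjacency_counts(corpus)
print(ln["trigram"], rn["trigram"])
```

A variant that counts distinct neighbours would collect the adjoining nouns in sets instead of incrementing counters.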
GM(CN) is a geometric mean which does not
depend on the length of CN.
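The GM formula itself did not survive on this slide. The sketch below assumes the form used in the author's published LR scoring, GM(CN) = (∏ (LN(N_i)+1)(RN(N_i)+1))^(1/(2L)) for a compound of L simple nouns N_1 ... N_L; treat this form, and the function name `gm`, as assumptions rather than the slide's exact definition. The LN and RN values for "trigram" are the ones given in the example above.

```python
import math

def gm(nouns, ln, rn):
    """Geometric mean of (LN+1)(RN+1) over the simple nouns of a compound.

    Assumed form: (prod_i (LN(N_i)+1) * (RN(N_i)+1)) ** (1 / (2 * L)).
    The +1 keeps nouns with no neighbours from zeroing the product.
    """
    product = 1.0
    for noun in nouns:
        product *= (ln.get(noun, 0) + 1) * (rn.get(noun, 0) + 1)
    return product ** (1.0 / (2 * len(nouns)))

ln = {"trigram": 5}
rn = {"trigram": 3}

single = gm(("trigram",), ln, rn)             # sqrt(6 * 4)
doubled = gm(("trigram", "trigram"), ln, rn)  # (6 * 4 * 6 * 4) ** (1/4)
print(single, doubled)
```

Because the exponent is 1/(2L), repeating the same noun leaves the score unchanged up to rounding, which is the sense in which the score does not depend on the length of CN.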
if CN occurs independently,
where f(CN) is the number of independent occurrences of the noun CN
(that is, occurrences in which CN does not appear as part of a longer CN)
if f(trigram) = 5
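A minimal sketch of counting f(CN) follows, assuming the corpus has already been reduced to a list of maximal noun sequences (so a sequence counts for CN only when it equals CN as a whole). The combination `fgm`, which multiplies a GM score by f(CN) when CN occurs independently and otherwise leaves the GM score unchanged, is an assumption about the formula that is missing from this slide; all names and the toy data are hypothetical.

```python
from collections import Counter

def independent_frequency(maximal_sequences):
    """f(CN): occurrences where CN is itself a whole noun sequence,
    i.e. it is not embedded in a longer compound noun."""
    return Counter(tuple(seq) for seq in maximal_sequences)

def fgm(gm_score, f_cn):
    """Assumed combination: GM(CN) * f(CN) if CN occurs independently,
    otherwise the plain GM(CN) score."""
    return gm_score * f_cn if f_cn > 0 else gm_score

# Hypothetical toy data: maximal noun sequences found in a corpus.
sequences = [
    ("trigram",),
    ("trigram",),
    ("word", "trigram"),
    ("trigram", "model"),
]
f = independent_frequency(sequences)
print(f[("trigram",)])          # "trigram" occurs on its own twice
print(fgm(4.0, f[("trigram",)]))
print(fgm(4.0, f[("model",)]))  # "model" never occurs independently
```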
Modify the C-value (Frantzi & Ananiadou, 1996) so that it can also score a simple noun.
length(a): number of simple nouns that make up a
freq(a): frequency of a
t(a): frequency of candidate compound nouns that include a
c(a): number of distinct candidate compound nouns that include a
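The modified formula is not shown on this slide. As a hedged sketch, the code below assumes MC-value(a) = length(a) × (freq(a) − t(a)/c(a)), that is, the 1996 C-value with its (length(a) − 1) factor replaced by length(a) so that a simple noun (length 1) no longer scores zero. The helper names, the handling of c(a) = 0, and the toy frequencies are all assumptions for illustration.

```python
def is_nested(shorter, longer):
    """True if `shorter` appears as a contiguous part of the strictly longer `longer`."""
    n, m = len(shorter), len(longer)
    return n < m and any(longer[i:i + n] == shorter for i in range(m - n + 1))

def nesting_stats(a, candidate_freq):
    """Return (t(a), c(a)) from a mapping of candidate compound nouns to frequencies."""
    containing = {cn: f for cn, f in candidate_freq.items() if is_nested(a, cn)}
    return sum(containing.values()), len(containing)

def mc_value(length, freq, t, c):
    """Assumed modified C-value: length * (freq - t / c).

    When no longer candidate includes `a` (c == 0), the nested term is
    taken to be 0; this choice is an assumption.
    """
    nested = t / c if c > 0 else 0.0
    return length * (freq - nested)

# Hypothetical toy frequencies of candidate compound nouns.
candidate_freq = {
    ("trigram",): 5,
    ("word", "trigram"): 3,
    ("trigram", "model"): 2,
}
a = ("trigram",)
t, c = nesting_stats(a, candidate_freq)
# freq(a) is taken here as all occurrences of a, nested ones included: 5 + 3 + 2.
score = mc_value(len(a), 10, t, c)
print(t, c, score)
```

With the original (length(a) − 1) factor, the same simple noun would score 0, which is the limitation the modification addresses.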
The data used in our experiment was developed by NII (Artificial Intelligence field: 1,870 paper abstracts).
[Figure: MCval - GM; axis label: "... of extracted terms"]
11. 推論(inference) 162 ○
12. 支援(assistance) 87 ×
13. 知識表現(knowledge representation) 74 ○
14. エージェント(agent) 256 ○
15. 学習者モデル(learner’s model) 57 ○
16. 機能(function) 294 ×
17. 設計者(designer) 69 ○
18. 対話(dialogue) 205 ○
19. 言語(language) 75 ○
20. 対象(object) 293 ○
11. 方法(method, way to do) 426 ×
12. 支援システム(assistance system) 18 ×
13. 計算機(computer) 128 ○
14. 情報(information) 382 ○
15. モデル(model) 356 ○
16. 自然言語(natural language) 63 ○
17. 我々(we) 332 ×
18. 有効性(effectiveness) 160 ×
19. エキスパートシステム(expert system) 78 ○
20. ユーザ(user) 297 ○
N1, N2: top two systems of NTCIR1
New statistical methods for automatic term recognition (ATR), which are essentially based on how many nouns adjoin the simple noun in question to form compound nouns.
・Best at extracting a small number (up to 1,400) of high-quality domain-specific terms
・Longer terms that include correct terms are better extracted by FGM or GM
Strong at extracting a large number (up to 6,000) of domain-specific terms