
Korean script searching in Korean Library OPACs

Junglim Chae, Yonsei University

Presentation Transcript


  1. Korean script searching in Korean Library OPACs • Junglim Chae • Yonsei University

  2. Indexing Method • N-Gram • Morphological Analysis

  3. N-Gram Indexing • N-gram types: unigram, bigram, trigram, ..., n-gram • E.g.) 아버지가 방에 들어가신다 ("Father goes into the room") • Bigram segmentation yields 12 index terms (0 marks a space) • 아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다 • Many index terms: many results, but a lot of noise • High recall but low precision
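The bigram segmentation on slide 3 can be sketched in a few lines of Python. This is only an illustration of the sliding two-character window, not the code of any particular search engine; the function name `bigram_terms` and the `space_marker` parameter are made up for the example, and the "0" marker simply follows the slide's notation for a space.

```python
def bigram_terms(text: str, space_marker: str = "0") -> list[str]:
    """Return every overlapping 2-character window of the text.

    Spaces are shown as `space_marker`, following the slide's notation.
    """
    marked = text.replace(" ", space_marker)
    return [marked[i:i + 2] for i in range(len(marked) - 1)]


terms = bigram_terms("아버지가 방에 들어가신다")
print(len(terms))  # 12
print(terms)
```

A string of 13 characters (11 syllables plus 2 spaces) produces 12 overlapping pairs, which matches the count on the slide.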

  4. Morphological Analysis • Requires a morphological analysis dictionary • E.g.) 아버지가 방에 들어가신다 • Morphological analysis yields three index terms • 아버지, 방, 들어가다 ("father", "room", "to enter") • Ability to match linguistically similar terms • Faster performance with a smaller index • Accurate matches that meet user expectations • High precision but low recall

  5. N-Gram Vs. Morphological Analysis

  6. A Case Study: Yonsei University Library • Library System: Maestro-Y • Search Engine: K2 by Verity • Indexing Method • N-Gram (bigram) + Morphological Analysis • Indexing Rules • Rule 1: Split the string at spaces • Rule 2: Extract index terms from each segment by bigram indexing • Rule 3: Add the whole string with the spaces removed • Rule 4: Add words found in the Korean morphological analysis dictionary

  7. A Case Study: Yonsei University Library • E.g.) ‘국어문법의 이해’ ("Understanding Korean Grammar") • 국어문법의 / 이해 (Rule 1) • 국어, 어문, 문법, 법의, 이해 (Rule 2) • 국어문법의이해 (Rule 3) • 국어문법 (Rule 4) • Index: 국어, 어문, 문법, 법의, 이해, 국어문법, 국어문법의이해
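The four indexing rules from the case study can be walked through with a small Python sketch. It is a rough approximation under stated assumptions: `build_index` and `TOY_DICTIONARY` are hypothetical names, and Rule 4 is reduced to a substring lookup against a one-word toy dictionary, whereas a real morphological analyzer does considerably more.

```python
# Hypothetical one-entry stand-in for the morphological analysis dictionary.
TOY_DICTIONARY = {"국어문법"}


def bigrams(segment: str) -> list[str]:
    """Overlapping 2-character windows inside one space-delimited segment."""
    return [segment[i:i + 2] for i in range(len(segment) - 1)]


def build_index(title: str, dictionary: set[str]) -> list[str]:
    segments = title.split()                  # Rule 1: split at spaces
    terms: list[str] = []
    for segment in segments:
        terms.extend(bigrams(segment))        # Rule 2: bigrams per segment
    joined = "".join(segments)
    terms.append(joined)                      # Rule 3: whole string, no spaces
    # Rule 4 (simplified): dictionary words that occur in the joined string.
    terms.extend(w for w in sorted(dictionary) if w in joined)
    return list(dict.fromkeys(terms))         # drop duplicates, keep order


print(build_index("국어문법의 이해", TOY_DICTIONARY))
```

For the slide's example this produces the same seven index terms, listed in rule order rather than in the order shown on the slide.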

  8. Search Tips

  9. Search Tips(1) • Keyword Search • 키워드검색, 임의검색 (keyword search, free search) • Default search option • Use at most 3 keywords • Use Boolean operators • Omit stop words

  10. Search Tips(2) • Keyword Search • Follow the Korean word division rules • E.g.) 동해물과 백두산이 (O, correct), 동해물과백두산이 (X, incorrect)

  11. Search Tips(3) • Keyword Search • Compound Nouns • Do not put spaces between the nouns • E.g.) 서울대학교 (O), 서울 대학교 (X)

  12. Search Tips(4) • Browse Search • Begins-with search, i.e. right truncation • 전방일치검색, 우측절단검색 (forward-match search, right-truncation search) • Use when you already know the first word of the title, author, or publisher • E.g.) 한글과
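A begins-with (right-truncation) browse can be sketched as a prefix lookup over a sorted heading list. This is one common way to implement such a search, not necessarily how any given OPAC does it; the `browse` function is a hypothetical name and the headings below are made-up sample data.

```python
from bisect import bisect_left


def browse(sorted_headings: list[str], prefix: str) -> list[str]:
    """Return all headings that begin with `prefix`.

    In a sorted list, matching headings form one contiguous run that
    starts at the insertion point of the prefix itself.
    """
    start = bisect_left(sorted_headings, prefix)
    matches = []
    for heading in sorted_headings[start:]:
        if not heading.startswith(prefix):
            break
        matches.append(heading)
    return matches


# Made-up headings, for illustration only.
headings = sorted(["한글의 이해", "난중일기", "한글과 컴퓨터", "한국 역사", "한글과 문화"])
print(browse(headings, "한글과"))
```

Because precomposed Hangul syllables are encoded in dictionary order, plain string sorting is enough for this toy list; a production catalog would more likely sort with proper collation rules.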

  13. Search Tips(5) • Browse Search • Korean Classics • E.g.) 열여춘향슈절가라

  14. Search Tips(6) • Exact Match • Precise search • 완전일치검색 (exact-match search) • For known items • E.g.) 난중일기

  15. Search Tips(7) • Exact Match • Single character words • E.g.) ‘산’, ‘흙’, ‘C’

  16. Search Tips(8) • Searching in either Hangul or Hancha (Chinese characters) is supported • E.g.) 中國歷史文選/중국역사문선
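The slides do not say how Hangul/Hancha searching is implemented. One possible approach, sketched here purely as an assumption, is to convert Hancha to its Hangul reading both at indexing time and at query time so that either form leads to the same key. `TOY_READINGS` and `to_hangul` are hypothetical, and the table covers only the characters of the example title; it also hardcodes 歷 as 역, the reading used in this title, although the character's reading depends on its position in a word.

```python
# Hypothetical reading table covering only the example title.
TOY_READINGS = {
    "中": "중",
    "國": "국",
    "歷": "역",  # reading as used in this title; varies by position
    "史": "사",
    "文": "문",
    "選": "선",
}


def to_hangul(text: str) -> str:
    """Replace each known Hancha character with its Hangul reading."""
    return "".join(TOY_READINGS.get(ch, ch) for ch in text)


# Both spellings normalize to the same search key.
print(to_hangul("中國歷史文選") == to_hangul("중국역사문선"))  # True
```

A real system would need a full reading dictionary and rules for characters with more than one reading.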

  17. Search Tips(9) • Japanese Kana • Archaic Korean • Russian • Special characters: choose the script from the Multi-language Input Table

  18. E.g.) Multi-Script Input Table

  19. Search Tips(10) • Japanese Kana • 日本の歷史/일본の역사/일본노역사 • 日本デザイン論/일본デザイン론/일본데자인론

  20. Search Tips(11) • Personal names • 윤동주 • 이광수 ; 춘원 • Shakespeare ; 셰익스피어 • Murakami, Haruki ; 村上春樹 ; 촌상춘수, 무라카미 하루키

  21. Search Tips(12) • Space • A space is treated as AND • E.g.) 한국 역사 = 한국 AND 역사 • In some OPACs, spaces in the character fields do make a difference in retrieval
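Treating a space as AND amounts to intersecting the result sets of the individual terms. The sketch below assumes a simple inverted index that maps each term to a set of record IDs; the `search` function, the index contents, and the record IDs are all made up for illustration.

```python
def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Split the query at spaces and AND the per-term result sets."""
    terms = query.split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up inverted index: term -> IDs of records containing it.
toy_index = {"한국": {1, 2}, "역사": {2, 3}}
print(search(toy_index, "한국 역사"))  # {2}
```

Only record 2 contains both terms, so it is the only hit for the two-word query.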

  22. Comparative search with and without space

  23. 謝謝 Thank You 감사합니다 ありがとうございます junglim.chae@yale.edu
