Problems with Non-roman Character (Korean) Searching

Prepared by

Young Ki Lee

Senior Cataloging Specialist

Korean/Chinese Team

RCCD

Library of Congress


Topics to be covered

1. Non-roman script (Korean) searching in CJK data fields without spacing

2. No unified index (normalization) between Hangul (Korean script) and Hancha (Chinese characters)

3. Microsoft Korean IME

4. Display of search results

5. CJK Compatibility Database


Title Word Search for 국경

Search 국경 (國境: the border):

- The ‘ti:’ search returns 363 hits.

- Only 13% of the hits are relevant (13 out of 99) in the first group (Books 1970-1993).

- The system retrieves every record that has the string ‘국경’ in any position in the title fields, including across subfield boundaries, such as

‘한국경제’ = 한국/경제

‘중국경기공’ = 중국/경기공

‘미국경제’ = 미국/경제

‘한국경문대전집’ = 한국/경문/대전집

‘약국경영’ = 약국/경영, etc.

- In Voyager (currently with spaces), the same search (tkey 국경) retrieves only 9 hits.
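The over-retrieval described on this slide can be sketched in a few lines of Python. This is only an illustration of substring matching versus word matching, not the logic of any catalog system; the function names are made up, and the last title (국경 의 밤) is an added example that does not come from the slides.

```python
# Titles as indexed without spaces, mapped to the same titles with word division.
# The first five come from the slide; the last one is an added, illustrative title.
titles = {
    "한국경제": "한국 경제",
    "중국경기공": "중국 경기공",
    "미국경제": "미국 경제",
    "한국경문대전집": "한국 경문 대전집",
    "약국경영": "약국 경영",
    "국경의밤": "국경 의 밤",
}

query = "국경"


def substring_hits(query, unspaced_titles):
    """Match the query anywhere in an unspaced title (what the slide observes)."""
    return [title for title in unspaced_titles if query in title]


def word_hits(query, spaced_titles):
    """Match the query only against whole space-delimited words."""
    return [title for title in spaced_titles if query in title.split()]


false_and_true_hits = substring_hits(query, titles.keys())
true_hits = word_hits(query, titles.values())

print(len(false_and_true_hits), "hits without spacing")
print(len(true_hits), "hit with spacing:", true_hits)
```

With spacing preserved, only the title that really contains the word 국경 is matched; without spacing, every title in which 국 happens to be followed by 경 is matched as well.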


Title Word Search for 어학

Search 어학 (語學: philology):

- In OCLC, the ‘ti:’ search returns 308 hits.

- Only about 38% of the hits are relevant (36 out of 95) in the first group (Books 1900-1991).

- The hits include

‘국어학적’ = 국어학적

‘단어학습’ = 단어/학습

‘언어학’ = 언어학

‘일본어학교’ = 일본어/학교

‘영어학원’ = 영어/학원, etc.

- In Voyager (currently with spaces), the same search (tkey 어학) retrieves 32 hits.


Title Word Search for 고조선

Search 고조선 (古朝鮮: the name of an ancient Korean kingdom) retrieves irrelevant records, such as

‘일본가와사끼쇼와댄고조선인숙사’ = 일본/가와사끼/쇼와/댄고/조선인/숙사

‘CD-ROM을타고조선에가다’ = CD-ROM/을/타고/조선/에/가다

‘중국그리고조선족’ = 중국/그리고/조선족

‘하멜일지그리고조선국에관한기술’ = 하멜/일지/그리고/조선국/에/관한/기술

‘조선도자명고’ = 조선/도자/명고

‘조선로동당제5차대회에서한중앙위원회사업총화보고’ = 조선/로동당/제/5차/대회/에서/한/중앙/위원회/사업/총화/보고

‘조선아동문학문고’ = 조선/아동/문학/문고, etc.


Title Word Search for 한국경제

한국경제 (韓國經濟: Korean economy), ‘ti:’ search:

- Search 한국경제: 300 hits

- Search 韓國經濟: 652 hits

- Search 한국經濟: 3 hits

- Search 韓國경제: 0 hits

- Search ‘Hanguk kyongje’: 1,490 hits

Title phrase search for 한국경제: ‘ti=’ search


Title Phrase Search for 한국경제

한국경제 (韓國經濟: Korean economy), ‘ti:’ search, continued:

- Search 한국#경제: 461 hits (that is, ti: 한국 AND ti: 경제)

Title phrase search for 한국경제: ‘ti=’ search
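The different hit counts for 한국경제, 韓國經濟, 한국經濟 and 韓國경제 follow from the lack of a unified index between Hangul and Hancha. A minimal sketch of what such normalization could look like is given here in Python. The reading table and the function name are hypothetical; a real index would need a complete table and rules for characters with more than one reading, and this sketch does not cover the romanized form.

```python
# Hypothetical, deliberately tiny reading table: Hancha -> Hangul reading.
# Only the four characters used on the slide are included.
HANCHA_TO_HANGUL = {
    "韓": "한",
    "國": "국",
    "經": "경",
    "濟": "제",
}


def index_key(text: str) -> str:
    """Fold Hancha into their Hangul readings so mixed-script forms share one key."""
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in text)


variants = ["한국경제", "韓國經濟", "한국經濟", "韓國경제"]
keys = {index_key(v) for v in variants}

print(keys)
```

All four spellings from the slide collapse to the single key 한국경제, so one search on that key would reach records entered in any of the four forms.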


Search ti: nodongja or 勞動者 or 노동자 or 로동자


Korean IME Problems

1. Personal name search with an invalid character from the Korean IME

- Search 李光洙 in ‘pn:’ with 李 entered as U+F9E1: 0 hits. U+F9E1 is an invalid character produced by the Korean IME.

- Search 李光洙 in ‘pn:’ with 李 entered as U+674E: 157 hits. U+674E is the valid MARC21 character.

2. Title search with an invalid character from the Korean IME

- Search 論文集 in ‘ti:’ with 論 entered as U+F941: 0 hits. U+F941 is an invalid character produced by the Korean IME.

- Search 論文集 in ‘ti:’ with 論 entered as U+8AD6: 21,393 hits. U+8AD6 is the valid MARC21 character.

3. Korean family name “曺”

- No MARC21 equivalent
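The two zero-hit searches above involve code points in the Unicode CJK Compatibility Ideographs block (U+F900 to U+FAFF). The Python sketch below, using only the standard `unicodedata` module, shows one way to spot such characters and fold them to their unified forms with NFC normalization. The function names are illustrative. That the NFC result equals the MARC21 character is confirmed by the slide only for these two characters and is an assumption for any others.

```python
import unicodedata


def compatibility_ideographs(text):
    """List code points that fall in the CJK Compatibility Ideographs block.

    This is a plain range check; a few code points in the block are not
    duplicates of unified ideographs and would need separate handling.
    """
    return [f"U+{ord(ch):04X}" for ch in text if 0xF900 <= ord(ch) <= 0xFAFF]


def to_unified(text):
    """Apply NFC, which replaces compatibility ideographs that have a
    canonical mapping with the corresponding unified ideograph."""
    return unicodedata.normalize("NFC", text)


ime_name = "\uF9E1" + "光洙"   # 李光洙 with the first character as U+F9E1
ime_title = "\uF941" + "文集"  # 論文集 with the first character as U+F941

for query in (ime_name, ime_title):
    fixed = to_unified(query)
    print(compatibility_ideographs(query), "->", f"U+{ord(fixed[0]):04X}")
```

A check like this could run before a search string is sent, so that a name typed through the IME is matched against records that store U+674E or U+8AD6.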


Display Order

1. Browse search: sorted by Unicode value

number – roman – Japanese – Hancha – Hangul

2. Keyword search: sorted in alphabetical order of the romanized form

number – romanization

3. Display order: character by character, by designated value

Character   Unicode   Total strokes   Radical # (meaning)   Stroke

銀          9280      14              167 (gold)            8

門          9580      8               169 (gate)            8

養          990A      15              184 (eat)             6

魂          9B42      14              194 (ghost)           10

가          AC00
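The browse order "number – roman – Japanese – Hancha – Hangul" is what a plain sort by Unicode value produces. The Python sketch below illustrates this with the characters from the table; the digit, the Latin letter and the kana あ are added examples that are not from the slides. Python compares strings by code point, which is used here as a stand-in for the catalog's browse sort.

```python
# Characters from the table plus three added examples ("1", "A", "あ").
entries = ["가", "銀", "あ", "A", "1", "魂", "門"]

# Python's default string ordering is by Unicode code point.
by_code_point = sorted(entries)
print(by_code_point)

# Total stroke counts as given in the table.
total_strokes = {"銀": 14, "門": 8, "養": 15, "魂": 14}

hancha_by_code_point = sorted(total_strokes)
hancha_by_strokes = sorted(total_strokes, key=total_strokes.get)

print(hancha_by_code_point)
print(hancha_by_strokes)
```

The two Hancha orderings differ: by code point 銀 (radical 167) precedes 門 (radical 169), while a sort by total strokes would put 門 first.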


Display Order

1. Browse search: sorted by Unicode value

number – roman – Japanese – Hancha – Hangul

2. Keyword search: sorted in alphabetical order of the romanized form

number – romanization

3. Display order: character by character by designated value, NOT word by word


진: C9C4

침: CE68

중: C911

인: C778


CJK Compatibility Database

  • The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks, matched with their MARC21 equivalents.

  • The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.

  • The list of characters in the database was initially identified by LC staff, and was supplemented by entries in a similar database at Yale University.

  • The database is a cooperative undertaking, and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at [email protected]
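The replacement step that the database supports can be sketched as a simple lookup table. The Python example below is only an illustration: the table holds just the two pairs quoted in the Korean IME slide, not the contents of the actual database, and the names are hypothetical.

```python
# Hypothetical excerpt: non-MARC21 code point -> MARC21 equivalent.
# Only the two pairs mentioned in the Korean IME slide are included.
NON_MARC21_TO_MARC21 = {
    0xF9E1: 0x674E,  # 李
    0xF941: 0x8AD6,  # 論
}


def replace_non_marc21(text: str) -> str:
    """Replace every character that has an entry in the table; leave the rest."""
    return text.translate(NON_MARC21_TO_MARC21)


before = "\uF9E1" + "光洙"
after = replace_non_marc21(before)

print([f"U+{ord(ch):04X}" for ch in before])
print([f"U+{ord(ch):04X}" for ch in after])
```

Characters without an entry, such as the family name 曺, pass through unchanged, so they would still need to be reported and handled by hand.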


Thank you

