
Character Codes Related Problems - UNICODE OPAC and Millennium at WASEDA Univ. Library -

Tsutomu SUZUKI (tsutomu@waseda.jp), Waseda University Library. 4th Hong Kong INNOPAC Users Group Meeting, December 2003.


Presentation Transcript


  1. Character Codes Related Problems - UNICODE OPAC and Millennium at WASEDA Univ. Library - Tsutomu SUZUKI (tsutomu@waseda.jp) Waseda University Library 4th Hong Kong INNOPAC Users Group Meeting December 2003

  2. WASEDA University Overview
     • Founded in 1882
     • Now has:
       -- 10 undergraduate schools
       -- 14 graduate schools
       -- 5 large campus libraries & 27 small libraries
       -- 2 university museums
       -- 44,576 undergraduate and 6,147 graduate students (as of end April, 2002)

  3. Library Overview (as of March 31, 2002)
     • 4,705,597 books (2,980,352 CJK books + 1,725,245 western books)
     • 49,615 journal titles (19,509 current subscriptions)
     • 879,336 items checked out / year
     • ILL transactions: 13,951 requests to other libraries; 18,491 requests received from other libraries
     • Total number of Central Library visits: 1,197,731 (2002.4 – 2003.3)

  4. Current Status of Our INNOPAC
     Recent record numbers (Oct. 29, 2003) from M-I-F-S
     • 1,752,690 bibliographic records
     • 3,434,122 item records
     • 52,133 check-in records
     Public catalog searches, from “ANALYZE patron searches”
     • 5,149,322 searches (2002.4 – 2003.3)

  5. Unicode Port on WEBPAC
     • On November 17th, the Unicode OPAC was released to the public. (Some character code troubles still remain.)
     • Downloading Chinese & Korean bib data from OCLC.
     • Record maintenance: AnzioWin
     • Number of C & K bib records (as of 11th Nov.): 15,971 bibs of Chinese materials; 157 bibs of Korean materials

  6. Appearance - Chinese record -

  7. Appearance - Korean record -

  8. Character code issues

  9. Case1: Mapping Error The screen below shows my patron record on Millennium Circulation. One katakana character, “zu”, is not displayed properly.

  10. Case1: Mapping Error If I search for “suzuki” on the Unicode OPAC, “zu” is ignored and records for “suki” are hit.

  11. Case1: Mapping Error Code points for this character: SJIS 253A; EACC 69253A; Unicode 30BA. This EACC character is NOT mapped to any Unicode character. It should be mapped to 30BA in Unicode.
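The effect of the missing mapping in Case 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how Innopac converts EACC internally, so the table, the function name, and the "silently drop unmapped codes" behaviour are hypothetical. Only the pair 69253A / U+30BA comes from the slide; the EACC values used for "su" and "ki" are assumed for the example.

```python
# Illustrative EACC -> Unicode table. Only the 69253A / U+30BA pair is taken
# from the slide; the two entries below use assumed EACC values.
EACC_TO_UNICODE = {
    "692539": "\u30B9",  # KATAKANA LETTER SU (assumed EACC value)
    "69252D": "\u30AD",  # KATAKANA LETTER KI (assumed EACC value)
    # "69253A" (KATAKANA LETTER ZU) is missing, as reported in Case 1
}


def eacc_to_unicode(codes, table):
    """Convert a list of EACC codes to text, dropping codes with no mapping."""
    return "".join(table[code] for code in codes if code in table)


suzuki = ["692539", "69253A", "69252D"]  # su, zu, ki

# With the incomplete table, "zu" disappears and only "su" + "ki" remain.
broken = eacc_to_unicode(suzuki, EACC_TO_UNICODE)

# Adding the mapping proposed on the slide restores the full name.
fixed_table = {**EACC_TO_UNICODE, "69253A": "\u30BA"}
fixed = eacc_to_unicode(suzuki, fixed_table)

print(broken, fixed)
```

If the conversion really behaves this way, a query for "suzuki" would be reduced to "suki" before it reaches the index, which matches the symptom on slide 10.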

  12. Case2: Shift-JIS to EACC Issue When I search for this kanji on the Shift_JIS OPAC, Innopac returns only 9 records.

  13. Case2: Shift-JIS to EACC Issue Code points: SJIS 97E9; EACC 214930. The EACC character “215D58” is not assigned any glyph, according to OCLC CJK 3.11. But the mapping from Shift_JIS to EACC works fine.

  14. Case2: Shift-JIS to EACC Issue On the other hand, when I searched for this kanji on the Unicode OPAC, Innopac returned more than 2,000 records!

  15. Case2: Shift-JIS to EACC Issue Shift_JIS side: SJIS 97E9, stored as EACC 214930. Unicode side: Unicode 6FDB, stored as EACC 455564. There is no relationship between the two EACC codes. These Shift_JIS and Unicode characters have the same glyph, but Innopac stores them at two different EACC code positions. Therefore we can NOT search for both characters at once.

  16. Case2: Shift-JIS to EACC Issue Shift_JIS side: SJIS 97E9, stored as EACC 214930. Unicode side: Unicode 6FDB, stored as EACC 455564. One of the solutions: change the mapping of this Shift_JIS character from 214930 to 455564.
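The split index in Case 2, and the remapping proposed as one solution, can be sketched as follows. This is a minimal model, not Innopac's actual indexing: the record IDs, the `build_index` helper, and the idea that the index is rebuilt after the mapping change are all assumptions made for the example. The code values are the ones given on the slides.

```python
from collections import defaultdict

# Mappings as reported on the slides.
SJIS_TO_EACC = {"97E9": "214930"}
UNICODE_TO_EACC = {"6FDB": "455564"}

# The solution proposed on slide 16: point the Shift_JIS code at 455564.
SJIS_TO_EACC_FIXED = {"97E9": "455564"}

# Hypothetical records: id -> (source character set, code of the character).
RECORDS = {
    "b1": ("sjis", "97E9"),     # catalogued through the Shift_JIS side
    "b2": ("unicode", "6FDB"),  # loaded through the Unicode side
}


def build_index(records, sjis_map, unicode_map):
    """Index record ids by the EACC code their character is stored under."""
    index = defaultdict(set)
    for record_id, (charset, code) in records.items():
        table = sjis_map if charset == "sjis" else unicode_map
        index[table[code]].add(record_id)
    return dict(index)


before = build_index(RECORDS, SJIS_TO_EACC, UNICODE_TO_EACC)
after = build_index(RECORDS, SJIS_TO_EACC_FIXED, UNICODE_TO_EACC)

print(before)  # the two records sit under different EACC codes
print(after)   # both records sit under 455564
```

In this model the fix only helps once records already stored under 214930 are re-indexed as well; whether that step is needed in Innopac is not stated in the slides.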

  17. Case3: EACC Layers Related Issue Shift_JIS Telnet Screen Sample (my record). The data is displayed correctly.

  18. Case3: EACC Layers Related Issue Code points: SJIS 97E9; EACC 215D58. In the Shift_JIS environment, there is no trouble in searching for or displaying this character.

  19. Case3: EACC Layers Related Issue We can see the same data properly on Millennium. {69253a} is a separate problem, already mentioned in Case 1.

  20. Case3: EACC Layers Related Issue Reviewing the same data AFTER editing an element (NOTE) on Millennium. Raw EACC character codes are displayed in one of the name fields and in the address.

  21. Case3: EACC Layers Related Issue We can see the data correctly on Millennium even after editing.

  22. Case3: EACC Layers Related Issue Shift_JIS side: SJIS 97E9, EACC 215D58. Unicode side: Unicode 9234, EACC 4B5D58. Relationship: the two EACC codes have the same code position on different layers.

  23. Case3: EACC Layers Related Issue Shift_JIS side: SJIS 97E9, EACC 215D58. Unicode side: Unicode 9234, EACC 4B5D58, shown as {4B5D58} (no character assigned). Relationship: the two EACC codes have the same code position on different layers. If records including this character are saved on Millennium, this kanji is NOT stored as the original EACC code (215D58).
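The "same code position on other layers" relationship in Case 3 can be checked mechanically. The sketch below assumes that the first byte of a three-byte EACC code identifies the layer and the remaining two bytes the position within it, which is how the two codes on the slide relate to each other; the function names are hypothetical.

```python
def split_eacc(code):
    """Split a 6-hex-digit EACC code into (layer byte, two-byte position)."""
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    int(code, 16)  # raises ValueError if the string is not hexadecimal
    return code[:2].upper(), code[2:].upper()


def differs_only_by_layer(code_a, code_b):
    """True when two EACC codes share a position but sit on different layers."""
    layer_a, position_a = split_eacc(code_a)
    layer_b, position_b = split_eacc(code_b)
    return position_a == position_b and layer_a != layer_b


print(split_eacc("215D58"))
print(split_eacc("4B5D58"))
print(differs_only_by_layer("215D58", "4B5D58"))
```

A check like this could be run over saved records to find characters that came back on a different layer than the one they were originally stored on.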

  24. Case4: Duplication codes in EACC

  25. Case4: Duplication codes in EACC There are more than 1,000 records for “matsu” on the Shift_JIS OPAC.

  26. Case4: Duplication codes in EACC There is ONLY one record for “matsu” on the Unicode OPAC. (The screen below shows the direct-hit result.)

  27. Case4: Duplication codes in EACC Shift_JIS side: SJIS 8FBC, EACC 21442D. Unicode side: Unicode 677E, EACC 276163. We can DISPLAY both 21442D and 276163 in the Unicode OPAC, but only 276163 is searchable. Because of this EACC code duplication, the search results are NOT the same between the Shift_JIS OPAC and the Unicode OPAC.
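One way to make duplicated EACC codes searchable together is to fold them onto a single key before indexing and before searching. The sketch below is an assumption about how that could look, not a description of Innopac: the record IDs are made up, and the choice of 276163 as the canonical code is arbitrary (it is simply the one the slide reports as searchable).

```python
# Duplicate EACC codes folded onto one canonical code (direction assumed).
EACC_CANONICAL = {"21442D": "276163"}

# Hypothetical records: id -> EACC code stored for the character "matsu".
RECORDS = {"r1": "21442D", "r2": "21442D", "r3": "276163"}


def search_key(eacc_code):
    """Return the canonical code used for matching."""
    return EACC_CANONICAL.get(eacc_code, eacc_code)


def search(query_code, records, fold_duplicates):
    """Return sorted ids of records whose stored code matches the query."""
    key = search_key(query_code) if fold_duplicates else query_code
    hits = []
    for record_id, stored in records.items():
        stored_key = search_key(stored) if fold_duplicates else stored
        if stored_key == key:
            hits.append(record_id)
    return sorted(hits)


raw_hits = search("276163", RECORDS, fold_duplicates=False)
folded_hits = search("276163", RECORDS, fold_duplicates=True)
print(raw_hits, folded_hits)
```

Without folding, the query only finds the record stored under 276163, which mirrors the single hit on the Unicode OPAC; with folding, all three records are found.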

  28. Case5: Not Unified characters in UNICODE Do you think these two characters are the same or not? UNICODE: 5618 UNICODE: 5653

  29. Case5: Not Unified characters in UNICODE The result of searching “uso” on Shift_JIS OPAC.

  30. Case5: Not Unified characters in UNICODE The same search on the Unicode OPAC. The result does not seem correct.

  31. Case5: Not Unified Characters in UNICODE If the other “uso” is entered by picking it from the code table, the result is the same as on the Shift_JIS OPAC.

  32. Case5: Not Unified Characters in UNICODE Mapped chain: SJIS 8952, EACC 21373B, Unicode 5653. Searching with Unicode 5618: NOT HIT!

  33. Case5: Not Unified Characters in UNICODE Mapped chain: SJIS 8952, EACC 21373B, Unicode 5653. Unicode 5618 should be normalized to 5653 when searching.
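The normalization asked for in Case 5 is not provided by the standard Unicode normalization forms, because U+5618 and U+5653 are separate ideographs with no decomposition mapping. A search layer would therefore need its own variant table. The Python sketch below shows the idea; the table and the `search_form` function are hypothetical, and the direction of folding (5618 to 5653) follows the slide.

```python
import unicodedata

# Application-level variant table: fold U+5618 onto U+5653, as the slide
# suggests. A real table would hold many more pairs.
VARIANT_FOLD = str.maketrans({"\u5618": "\u5653"})


def search_form(text):
    """Normalize text for matching: NFKC first, then variant folding."""
    return unicodedata.normalize("NFKC", text).translate(VARIANT_FOLD)


# Standard normalization alone leaves the two characters distinct.
nfkc_alone = unicodedata.normalize("NFKC", "\u5618")

# After folding, both variants produce the same search key.
key_a = search_form("\u5618")
key_b = search_form("\u5653")
print(nfkc_alone == "\u5653", key_a == key_b)
```

The same folding has to be applied to the indexed text and to the query, otherwise the two sides still disagree.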

  34. Normalization issue This search string is “Harry Potter” written in katakana. Some special characters are ignored when searching on the Unicode OPAC. In this sample, the “cho-on”, the Japanese prolonged sound symbol, does not work.
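A small sketch of why ignoring the cho-on matters. It assumes the search layer builds a key by dropping a set of "ignorable" characters; that is a guess at the mechanism, not something the slide confirms, and the function name and ignore sets are hypothetical. The katakana string is "Harry Potter" with a middle dot between the two words.

```python
CHOON = "\u30FC"       # KATAKANA-HIRAGANA PROLONGED SOUND MARK
MIDDLE_DOT = "\u30FB"  # KATAKANA MIDDLE DOT


def search_key(text, ignored):
    """Build a search key by dropping every character in `ignored`."""
    return "".join(ch for ch in text if ch not in ignored)


# "Harry Potter" in katakana: ha ri (cho-on) (dot) po (small tsu) ta (cho-on)
harry_potter = "\u30CF\u30EA\u30FC\u30FB\u30DD\u30C3\u30BF\u30FC"

# If the cho-on is treated as ignorable, the long vowels vanish from the key.
key_without_choon = search_key(harry_potter, {CHOON, MIDDLE_DOT})

# Keeping the cho-on and dropping only the separator preserves the reading.
key_with_choon = search_key(harry_potter, {MIDDLE_DOT})

print(key_without_choon, key_with_choon)
```

Under this assumption, words that differ only in vowel length would collapse to the same key, so which characters count as ignorable is part of the normalization scheme that has to be decided.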

  35. Example of NOT unified characters (Case5) Unicode: 6236, 6237, 6238

  36. Related Documents & Information
     • The Library of Congress Homepage: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media -- CHARACTER SETS: Part 3 -- Code Table 9: EAST ASIAN (June 16, 2003) http://www.loc.gov/marc/specifications/specchareacc.html
     • The Unicode Standard Version 3.0. The Unicode Consortium. ISBN 0201616335 (Version 4.0 has since been released)
     • OCLC CJK and its contents in HELP http://www.oclc.org/cjk/

  37. Unicode OPAC in Japan
     • University of Tokyo: Multilingual OPAC, the University of Tokyo http://mulopac.dl.itc.u-tokyo.ac.jp/
     • National Diet Library: NDL Asian Language Materials OPAC http://asiaopac.ndl.go.jp/index_e.html

  38. The Best Solution: Unicode + normalization scheme. Thank you!!
