Character Set and Language Negotiation in Z39.50 Version 3

Presentation Transcript

  1. Character Set and Language Negotiation in Z39.50 Version 3

  2. Scope • Negotiate language of messages • Negotiate character set of InternationalString • Z39.50 “message” strings • Optionally retrieve records in negotiated character set • Character set negotiation only valid for version 3 (Stockholm, 10 August 1999)

  3. Negotiation Basics • Carried in UserInfo external object in Init • Similar to option negotiation • origin proposes list of possibilities • target selects one from list • Only a single round of negotiation takes place • Applies to complete session • Cannot change during session
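
The single-round exchange described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `select_proposal`, the string labels for the proposals, and the "first supported proposal wins" policy are assumptions made for the example, not something the slides specify.

```python
from typing import AbstractSet, Optional, Sequence


def select_proposal(proposals: Sequence[str],
                    supported: AbstractSet[str]) -> Optional[str]:
    """Return the first origin proposal the target supports, else None.

    Assumption: the target honours the origin's ordering. The slides only
    say that the target selects one item from the proposed list.
    """
    for candidate in proposals:
        if candidate in supported:
            return candidate
    return None


# Hypothetical labels standing in for the real ASN.1 proposal structures.
origin_proposals = ["utf-8", "ucs-2", "iso2022"]
target_supported = {"iso2022", "ucs-2"}

# One round only: the result is fixed for the whole session.
chosen = select_proposal(origin_proposals, target_supported)
print(chosen)
```

Because only one round takes place, an origin that can work with several character sets would list all of them in the Init request rather than retrying.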

  4. UserInfoFormat-charSetandLanguageNegotiation-2 {1 2 840 10003 10 2} DEFINITIONS ::=
     BEGIN
     CharSetandLanguageNegotiation ::= CHOICE {
         proposal [1] IMPLICIT OriginProposal,
         response [2] IMPLICIT TargetResponse }

  5. Character Sets • ISO 2022 is “code page” approach to character set • ISO 10646 is ~ Unicode • Different procedures for negotiating character sets: • ISO 2022 • ISO 10646 • Can negotiate “private” character set

  6. OriginProposal ::= SEQUENCE {
         proposedCharSets [1] IMPLICIT SEQUENCE OF CHOICE {
             iso2022  [1] Iso2022,
             iso10646 [2] IMPLICIT Iso10646,
             private  [3] PrivateCharacterSet } OPTIONAL,
             -- proposedCharSets must be omitted
             -- if origin proposes version 2
     }

  7. ISO 2022 • Supports 7- and 8-bit environments • “Page” is 96 graphic characters (“G set”) and 32 control characters (“C set”) • 2 G pages active at any one time (G-Left [hex 20-7F], G-Right [hex A0-FF]) • 2 C sets active (C0 [00-1F], C1 [80-9F]) • Can define 4 G pages and swap into GL, GR as needed
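
A minimal sketch of how the 8-bit code table divides into the four areas named on this slide, assuming the usual ISO 2022 layout with GL at hex 20-7F and GR at hex A0-FF. The function name `classify_byte` is made up for the example.

```python
def classify_byte(value: int) -> str:
    """Name the ISO 2022 code-table area that an 8-bit value falls in."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if value <= 0x1F:
        return "C0"  # control characters, hex 00-1F
    if value <= 0x7F:
        return "GL"  # graphic characters, left half, hex 20-7F
    if value <= 0x9F:
        return "C1"  # control characters, hex 80-9F
    return "GR"      # graphic characters, right half, hex A0-FF


for sample in (0x1B, 0x41, 0x85, 0xE9):
    print(f"{sample:02X} -> {classify_byte(sample)}")
```

In a 7-bit environment only the C0 and GL areas are directly available, so values classified as C1 or GR here cannot occur.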

  8. ISO 2022 Escapes • Assign character sets to pages G0-G3, C0-C1 • Make G pages active in GL, GR • Character sets identified by 1 or 2 characters in the escape sequence • Character sets and the escape sequences to identify them are registered: • http://www.itscj.or.jp/ISO-IR/index.htm

  9. ISO 2022 negotiation • Negotiate initial assignment of G0-G3 • Negotiate initial assignment of GL, GR • Sequence of origin proposals for all of these • Target response chooses one of these proposals • In absence of negotiation must assume IRV in GL with GR undefined • no characters above hex 7F

  10. Iso2022 ::= CHOICE {
          originProposal [1] IMPLICIT SEQUENCE {
              proposedEnvironment  [0] Environment OPTIONAL,
              proposedSets         [1] IMPLICIT SEQUENCE OF INTEGER,
              proposedInitialSets  [2] IMPLICIT SEQUENCE OF InitialSet,
              proposedLeftAndRight [3] IMPLICIT LeftAndRight },
      }
      Environment ::= CHOICE {
          sevenBit [1] IMPLICIT NULL,
          eightBit [2] IMPLICIT NULL }

  11. InitialSet ::= SEQUENCE {
          g0 [0] IMPLICIT INTEGER,
          g1 [1] IMPLICIT INTEGER,
          g2 [2] IMPLICIT INTEGER,
          g3 [3] IMPLICIT INTEGER,
          c0 [4] IMPLICIT INTEGER,
          c1 [5] IMPLICIT INTEGER }
      LeftAndRight ::= SEQUENCE {
          gLeft  [3] IMPLICIT INTEGER {g0 (0), g1 (1), g2 (2), g3 (3)},
          gRight [4] IMPLICIT INTEGER {g1 (1), g2 (2), g3 (3)} }
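
To make the relationship between InitialSet and LeftAndRight concrete, here is a small Python sketch that mirrors the two ASN.1 types as dataclasses and works out which registered sets end up active in GL and GR. The class layout and the helper `active_sets` are assumptions for illustration; the numeric values are the ones that appear in the initRequest example on slide 20.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InitialSet:
    """Registration numbers initially designated to G0-G3 and C0-C1."""
    g0: int
    g1: int
    g2: int
    g3: int
    c0: int
    c1: int


@dataclass(frozen=True)
class LeftAndRight:
    """Which G page (0-3) is invoked into GL, and which (1-3) into GR."""
    g_left: int
    g_right: int


def active_sets(initial: InitialSet, sides: LeftAndRight) -> Tuple[int, int]:
    """Return the registration numbers active in (GL, GR)."""
    pages = (initial.g0, initial.g1, initial.g2, initial.g3)
    if sides.g_left not in (0, 1, 2, 3):
        raise ValueError("gLeft must name one of g0-g3")
    if sides.g_right not in (1, 2, 3):
        raise ValueError("gRight must name one of g1-g3")
    return pages[sides.g_left], pages[sides.g_right]


# Values taken from the proposal shown on slide 20.
initial = InitialSet(g0=2, g1=1001, g2=1001, g3=1001, c0=1, c1=67)
gl, gr = active_sets(initial, LeftAndRight(g_left=0, g_right=1))
print(gl, gr)
```

The asymmetry in the allowed values follows the ASN.1 above: gLeft may name any of g0-g3, while gRight is restricted to g1-g3.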

  12. ISO 10646 • Defines a single set of 2^32 possible characters (4+ billion!) • Divided into “planes” of 2^16 characters • Only first plane currently has characters defined: “Basic Multilingual Plane” (BMP) • BMP is coterminous with Unicode • Z39.50 negotiates ISO 10646, not Unicode per se
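
The arithmetic behind these figures can be checked quickly. The sketch below reads the slide as describing a code space of 2^32 values divided into planes of 2^16 characters, which is the reading consistent with the "4+ billion" remark. The names `plane_of` and `in_bmp` are invented for the example.

```python
PLANE_SIZE = 2 ** 16   # characters per plane
CODE_SPACE = 2 ** 32   # total code positions in a 32-bit code space


def plane_of(code_point: int) -> int:
    """Return the index of the 2^16-character plane holding code_point."""
    if not 0 <= code_point < CODE_SPACE:
        raise ValueError("code point outside the 32-bit code space")
    return code_point // PLANE_SIZE


def in_bmp(code_point: int) -> bool:
    """True if the code point lies in the first plane (the BMP)."""
    return plane_of(code_point) == 0


print(CODE_SPACE)                # total positions, a little over 4 billion
print(CODE_SPACE // PLANE_SIZE)  # number of planes of this size
print(in_bmp(ord("é")), in_bmp(0x10400))
```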

  13. Unicode Encoding Rules • UCS-4: 32-bit characters • UCS-2: 16-bit characters (BMP only) • UTF-16: like UCS-2, with “surrogate” mechanism for characters in planes above 0 • UTF-8: 8-bit character encoding, with variable length multi-byte characters for all characters other than first 128
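
The surrogate mechanism is easiest to see with one character from outside plane 0. The sketch below computes the pair by hand and compares it with Python's built-in `utf-16-be` and `utf-32-be` codecs. The helper name `surrogate_pair` and the sample character U+10400 are choices made for this example.

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above the BMP into (high, low) 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogates only encode U+10000..U+10FFFF")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


char = "\U00010400"  # a character in plane 1
high, low = surrogate_pair(ord(char))
manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")

utf16 = char.encode("utf-16-be")  # two 16-bit units (a surrogate pair)
ucs4 = char.encode("utf-32-be")   # one 32-bit unit

print(manual.hex(), utf16.hex(), ucs4.hex())
```

For characters inside the BMP no surrogates are needed, so the 16-bit encodings coincide and each character occupies a single 16-bit unit.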

  14. UTF-8 • Intended to be a “file system safe” encoding • Guarantees that every byte with value below hex 80 represents an ASCII character, including hex 00 • All characters with values above 7F are encoded as 2, 3 or 4 bytes • Transformation between UTF-8 and UCS-2 is simple and efficient
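
These properties can be observed directly with Python's standard `utf-8` codec. The sample characters are arbitrary picks for the example: one ASCII letter, one Latin-1 letter, the euro sign, and one character from outside the BMP.

```python
samples = ["A", "é", "€", "\U00010400"]

# Encoded length grows with the code point value: 1, 2, 3, then 4 bytes.
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)

# Pure ASCII text is byte-for-byte identical in UTF-8, including hex 00.
ascii_text = "Z39.50\x00"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Every byte of a multi-byte character is hex 80 or above, so ASCII
# bytes such as "/" or NUL never appear inside one.
multibyte_is_high = all(b >= 0x80 for b in "é€".encode("utf-8"))

# Converting between UTF-8 and a 16-bit form is a lossless round trip.
text = "Málaga €5"
round_trip = text.encode("utf-8").decode("utf-8").encode("utf-16-be")
print(same_as_ascii, multibyte_is_high, round_trip.decode("utf-16-be") == text)
```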

  15. Negotiating ISO 10646 • Specify the “character repertoire” (i.e. the subset of the full UCS that will be used) • Specify the encoding • Handled by object identifiers • For Unicode: • character repertoire is the full BMP • encoding can be UTF-16 or UTF-8

  16. Iso10646 ::= SEQUENCE {
          collections   [1] IMPLICIT OBJECT IDENTIFIER,
              -- oid of form 1.0.10646.implementationLevel
              -- .repertoireSubset.arc1.arc2. ....
              -- [use 1.0.10646.1.2.1.3 for Unicode]
          encodingLevel [2] IMPLICIT OBJECT IDENTIFIER
              -- oid of form 1.0.10646.1.0.form
              -- where value of 'form' is 2, 4, 5, or 8
              -- for ucs-2, ucs-4, utf-16, utf-8
      }
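
As a rough illustration, the encodingLevel identifier can be assembled from the 'form' values listed in the comment above. The sketch follows the arcs used in the initRequest example on slide 19, where UTF-8 appears as {1 0 10646 1 0 8}. The names `ENCODING_FORMS`, `encoding_level_oid` and `dotted` are hypothetical helpers, not part of any Z39.50 toolkit.

```python
from typing import Tuple

# 'form' values as listed in the ASN.1 comment on this slide.
ENCODING_FORMS = {"ucs-2": 2, "ucs-4": 4, "utf-16": 5, "utf-8": 8}

# Prefix assumed from the slide 19 example: {1 0 10646 1 0 8} for UTF-8.
ENCODING_PREFIX = (1, 0, 10646, 1, 0)


def encoding_level_oid(name: str) -> Tuple[int, ...]:
    """Return the encodingLevel OID arcs for a named encoding form."""
    try:
        form = ENCODING_FORMS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown encoding form: {name!r}") from None
    return ENCODING_PREFIX + (form,)


def dotted(oid: Tuple[int, ...]) -> str:
    """Render OID arcs in dotted notation."""
    return ".".join(str(arc) for arc in oid)


print(dotted(encoding_level_oid("utf-8")))
print(dotted(encoding_level_oid("UTF-16")))
```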

  17. Language Negotiation • Instances of InternationalString are either “message” or “name” • Language negotiation applies to “message strings” • Origin proposes one or more language codes • Codes from Z39.53 • Target may choose 1 of these proposed codes

  18. proposedLanguages         [2] IMPLICIT SEQUENCE OF LanguageCode OPTIONAL,
      recordsInSelectedCharSets [3] IMPLICIT BOOLEAN OPTIONAL
          -- default 'false'

  19. initRequest { -- SEQUENCE
        referenceId -- "9" --,
        protocolVersion 'e0'H,
        options 'eda2'H,
        preferredMessageSize 15000,
        exceptionalRecordSize 15000,
        implementationName -- "Amicus Professional Workstation" --,
        implementationVersion -- "3.0" --,
        otherInfo { -- SEQUENCE OF
          { -- SEQUENCE
            category { -- SEQUENCE
              categoryTypeId {1 2 840 10003 10 2},
              categoryValue 0
            },
            information externallyDefinedInfo { -- SEQUENCE
              direct-reference {1 2 840 10003 10 2},
              encoding single-ASN1-type proposal { -- SEQUENCE
                proposedCharSets { -- SEQUENCE OF
                  iso10646 { -- SEQUENCE
                    collections {1 0 10646 1 2 1 3},
                    encodingLevel {1 0 10646 1 0 8}
                  },

  20. iso2022 originProposal { -- SEQUENCE
        proposedEnvironment eightBit NULL,
        proposedSets { -- SEQUENCE OF
          2, 1000, 1001, 1002, 1003, 1, 67
        },
        proposedInitialSets { -- SEQUENCE OF
          { -- SEQUENCE
            g0 2, g1 1001, g2 1001, g3 1001, c0 1, c1 67
          }
        },
        proposedLeftAndRight { -- SEQUENCE
          gLeft 0,
          gRight 1
        }
      },
      }, -- closes proposedCharSets

  21. proposedLanguages { -- SEQUENCE OF
        "ENG"
      },
      recordsInSelectedCharSets TRUE
      } } } } }