Standardization of Internationalized Domain Name at IETF

24 Jan 2002, Yoshiro YONEYA <yone@nic.ad.jp>, JPNIC

Presentation Transcript


  1. Standardization of Internationalized Domain Name at IETF
     24 Jan 2002
     Yoshiro YONEYA <yone@nic.ad.jp>, JPNIC

  2. What is IDN?
     • IDN stands for Internationalized Domain Name.
     • Current domain names are written with ASCII letters, digits, and the hyphen only.
     • IDN is the technical challenge of writing domain names with non-ASCII characters as well as ASCII.
     APAN2002 Conference

  3. What is Internationalization?
     • A framework to extend the character repertoire of domain names.
     • It needs to be a global standard so that global communication is not lost.
     • The IETF IDN (Internationalized Domain Name) WG is doing this work.
     • The word ‘Multilingualization’ causes some confusion.
         • Characters are just one component of a language.
         • A multilingual domain name is a service-level aspect.

  4. Internationalized Domain Names
     • 华人.公司.cn
     • 華人.商業.tw
     • 高島屋.会社.jp
     • 삼성.회사.kr
     • 三星.회사.kr
     • الاهرام.م
     • viagénie.qc.ca
     • ישראל.קום
     • ทีเอชนิค.พาณิชย์.ไทย
     • 現代.com
     • ヤフー.com
     Source: http://www.jdna.jp/activities/event/jdn-tutorial/IDNSDK.pdf

  5. Why IDN?
     • More and more Internet users are not familiar with English.
         • IDNs are easier to memorize, type in, etc.
     • The use of domain names has changed drastically.
         • A domain name is now used not only as a host name but also as a signboard.
     • IDNs create new business opportunities.
         • Many ventures have started services.

  6. Drawbacks of IDN
     • Loses global acceptability at the end-user interface.
         • Non-ASCII characters are hard to type in or display without appropriate I/O devices and/or software.
     • Has an impact on operations.
         • Requires software updates and/or additional processing.
         • Deployment issues.

  7. History of IDN WG
     • Established in January 2000.
     • Discussion takes place mainly on the mailing list.
     • Held its first meeting at the 47th IETF in Adelaide.
         • Has met at every IETF since then.
     • Decided on the WG’s solution at the last (52nd) IETF.
         • IDNA, NAMEPREP and Punycode (formerly known as AMC-ACE-Z).
         • Waiting for WG last call.

  8. Scope and priority of IDN WG
     • Provide a standard.
         • Do not divide the global connectivity and communication of the Internet.
     • Backward compatibility.
         • Compatible with the current DNS and application protocols, so that IDNs work with the current Internet infrastructure.
     • No localization.
         • Independent of particular regions, countries and/or languages.
         • Refer to existing universal standards.
         • A common framework is essential to internationalization.

  9. IDNA (Internationalizing Domain Names in Applications), draft-ietf-idn-idna-06.txt
     • An architecture that specifies how IDNs are processed.
     • Uses Unicode, a superset of ASCII, as the character set.
     • Normalizes characters that have several code-point representations (upper/lower case, full-width/half-width, composed characters) into a single internal representation, so that matching does not fail.
     • Non-ASCII characters entered or displayed at the user interface are represented on the network as an ASCII Compatible Encoding (ACE) string.
     • These processes are performed in application software.

  10. Important points of IDNA
     • The representation at the user interface layer differs from that at the network layer.
         • For ASCII domain names, though, the two are the same.
     • An application-level solution.
         • Minimal impact on the Internet infrastructure.
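The split between the user-interface and network representations can be illustrated with Python's built-in `idna` codec. One assumption to note: that codec implements the final IDNA standard (RFC 3490, with NAMEPREP and Punycode), which fixed the ACE prefix as `xn--`; the drafts discussed in this talk had not yet chosen a prefix, so the output differs from the `ZQ--` examples later on. `to_network` and `to_display` are hypothetical helper names.

```python
def to_network(domain: str) -> str:
    """User-interface form (Unicode) -> network form (ACE), label by label.

    Non-ASCII labels go through NAMEPREP and Punycode and get the "xn--"
    prefix; pure-ASCII labels are passed through unchanged.
    """
    return domain.encode("idna").decode("ascii")


def to_display(domain: str) -> str:
    """Network form (ACE) -> user-interface form (Unicode)."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ["Bücher.example", "www.example.jp"]:
        wire = to_network(name)
        print(f"{name!r} -> {wire!r} -> {to_display(wire)!r}")
```

ASCII-only names such as `www.example.jp` come out unchanged, which is the "same for ASCII domain names" case; `Bücher` is case-folded by NAMEPREP before encoding, so the display form returned from the network is `bücher`.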

  11. Image of the IDNA
     [Figure: Local User ↔ Application UI ↔ To/From Unicode (internal representation) ↔ NAMEPREP ↔ To/From ACE ↔ Resolver API, all inside the end system, which communicates with Int’l DNS servers and application servers]

  12. NAMEPREP (Stringprep Profile for Internationalized Host Names), draft-ietf-idn-nameprep-07.txt
     • A profile of STRINGPREP (Preparation of Internationalized Strings).
         • draft-hoffman-stringprep-00.txt
     • Some scripts, including alphabets, have multiple representations for one character.
     • Domain names are case-insensitive.
     • Normalization unifies strings that are the same in meaning or appearance into a single representation:
         • Case (upper/lower)
         • Compatibility characters (full-width/half-width)
         • Composed characters

  13. Important point of NAMEPREP
     • Normalizes the representation of an internationalized domain name string so that it matches correctly, e.g.:
         • ‘a’ vs ‘A’
         • ‘u’ + combining ‘¨’ vs ‘ü’
         • ‘ア’ (full-width) vs ‘ｱ’ (half-width)
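The three example pairs can be checked with `encodings.idna.nameprep` from Python's standard library, which implements NAMEPREP as finally published (RFC 3491); the draft cited above may differ in detail. In the code the combining diaeresis is written as U+0308 and the half-width katakana as U+FF71.

```python
from encodings.idna import nameprep

# Pairs that should compare equal after NAMEPREP (the examples on this slide).
PAIRS = [
    ("a", "A"),              # upper/lower case
    ("u\u0308", "\u00fc"),   # 'u' + combining diaeresis vs precomposed 'ü'
    ("\u30a2", "\uff71"),    # full-width 'ア' vs half-width 'ｱ'
]

if __name__ == "__main__":
    for left, right in PAIRS:
        a, b = nameprep(left), nameprep(right)
        print(f"{left!r} vs {right!r}: {a!r} / {b!r}, match={a == b}")
```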

  14. Processes in NAMEPREP
     • map
         • Case folding of upper/lower case characters (UTR#21).
     • normalize
         • Normalizes the representation of the string (UAX#15, NFKC).
     • prohibit
         • Rejects characters that are inappropriate in domain names.
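The map / normalize / prohibit steps can be sketched with Python's `stringprep` and `unicodedata` modules. This is a simplified illustration, not a conforming implementation: it uses the RFC 3454 tables that the final NAMEPREP profile (RFC 3491) applies (B.1 and B.2 for mapping, tables C.1.2 to C.9 for prohibition) and omits the unassigned-code-point and bidirectional checks. `simplified_nameprep` is a hypothetical name.

```python
import stringprep
import unicodedata

# Prohibition tests from RFC 3454 that NAMEPREP applies (bidi checks omitted).
PROHIBITED = (
    stringprep.in_table_c12,  # non-ASCII space characters
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c7,   # inappropriate for canonical representation
    stringprep.in_table_c8,   # change display properties or deprecated
    stringprep.in_table_c9,   # tagging characters
)


def simplified_nameprep(label: str) -> str:
    # 1. map: drop characters "commonly mapped to nothing" (B.1),
    #    then case-fold with the table meant for use with NFKC (B.2).
    mapped = "".join(
        stringprep.map_table_b2(c) for c in label if not stringprep.in_table_b1(c)
    )
    # 2. normalize: Unicode Normalization Form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. prohibit: reject characters that may not appear in a domain name.
    for c in normalized:
        if any(test(c) for test in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(c):04X}")
    return normalized


if __name__ == "__main__":
    # Full-width letters and a decomposed 'ü' all collapse to plain "bücher".
    print(simplified_nameprep("\uff22\uff55\u0308cher"))
```

Mapping runs before normalization so that case folding and NFKC together reach a stable form; prohibition runs last because normalization can itself produce new characters.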

  15. ACE (ASCII Compatible Encoding)
     • Represents non-ASCII characters with ASCII characters.
         • Easy to apply to the current DNS.
         • Minimal impact on current applications.
     • Decreases the maximum number of characters in each label.
         • Penalty of using only 5 bits to represent 8-bit data.
         • Requires some sort of compression algorithm.
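The size penalty can be estimated with a deliberately naive ACE that does no compression at all: the label is encoded as UTF-16 and the bytes are written in base32 (32 symbols, so 5 bits per output character). The four-character `zq--` prefix is borrowed from the HTTP example later in this talk; `naive_ace` and `max_chars` are hypothetical helpers, not the algorithm of any actual ACE proposal.

```python
import base64

LABEL_LIMIT = 63   # maximum length of one DNS label, in octets
PREFIX = "zq--"    # example ACE prefix (hypothetical here)


def naive_ace(label: str) -> str:
    """Uncompressed ACE: UTF-16BE bytes written in base32 (5 bits per character)."""
    raw = label.encode("utf-16-be")                     # 16 bits per BMP character
    body = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return PREFIX + body


def max_chars(bits_per_char: int = 16) -> int:
    """Characters that fit in one label when each ACE character carries 5 bits."""
    return (LABEL_LIMIT - len(PREFIX)) * 5 // bits_per_char


if __name__ == "__main__":
    print(max_chars())
    print(len(naive_ace("日" * 18)), len(naive_ace("日" * 19)))
```

Under these assumptions only (63 - 4) * 5 // 16 = 18 CJK characters fit in a label; a 19-character label already exceeds 63 octets. Schemes such as Punycode do better by encoding small code-point differences instead of raw 16-bit values.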

  16. ACE Identifier
     • An explicit ACE identifier (ACE-ID) is required.
         • It makes the reverse conversion possible.
     • Choosing the ACE-ID is a political issue.
         • Because an ACE-ID is itself an ASCII string, any proposed ACE-ID can be registered as an ordinary ASCII domain name.
         • This actually happened in the gTLDs.
         • IANA will assign the ACE-ID.
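Why the reverse conversion needs an explicit identifier: the Punycode body of ‘bücher’ is `bcher-kva`, which is also a perfectly ordinary ASCII label, so a decoder may only convert labels that carry the prefix. This sketch assumes `xn--`, the prefix eventually standardized in RFC 3490 (the ACE-ID had not been assigned at the time of this talk), and uses Python's built-in `punycode` codec; `label_to_unicode` is a hypothetical helper.

```python
ACE_PREFIX = "xn--"  # prefix later standardized in RFC 3490


def label_to_unicode(label: str) -> str:
    """Decode a label only if it carries the ACE identifier."""
    if label.lower().startswith(ACE_PREFIX):
        body = label[len(ACE_PREFIX):]
        return body.encode("ascii").decode("punycode")
    return label  # plain ASCII label: leave it alone


if __name__ == "__main__":
    print(label_to_unicode("xn--bcher-kva"))  # ACE label: decoded
    print(label_to_unicode("bcher-kva"))      # same characters, no ACE-ID: unchanged
```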

  17. Criteria for ACE selection
     • Simple algorithm.
         • For ease of implementation.
         • Interoperability.
     • Effective compression for practical IDNs.
         • To accommodate as many characters as possible.
     • One-to-one correspondence between encoding and decoding.
         • To avoid alternative encoded representations for one IDN.
     • Security considerations.
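The one-to-one criterion can be phrased as a round-trip property: decoding an encoded label must give back exactly the original label, and distinct labels must never share an encoding. A minimal check using Python's `punycode` codec on labels taken from the IDN examples above; it tests only the encoding layer and assumes NAMEPREP has already been applied (NAMEPREP is what makes ‘A’ and ‘a’ share one encoding). `round_trips` and `is_one_to_one` are hypothetical names.

```python
# Labels from the IDN examples earlier in this talk.
LABELS = ["华人", "公司", "華人", "商業", "高島屋", "会社",
          "삼성", "회사", "三星", "viagénie", "現代", "ヤフー"]


def round_trips(label: str) -> bool:
    encoded = label.encode("punycode")           # ASCII bytes
    return encoded.decode("punycode") == label   # must give back the same label


def is_one_to_one(labels: list[str]) -> bool:
    """Every label round-trips and no two distinct labels collide."""
    encodings = {label.encode("punycode") for label in labels}
    return all(round_trips(l) for l in labels) and len(encodings) == len(set(labels))


if __name__ == "__main__":
    for label in LABELS:
        print(label, label.encode("punycode").decode("ascii"))
    print(is_one_to_one(LABELS))
```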

  18. Comparison of ACE proposals
     [Figure: encoding sample of ‘日本語ドメイン名試験.JP’]
     [Figure: evaluation results from existing Japanese JP domain names]

  19. Punycode, draft-ietf-idn-punycode-00.txt
     • The ACE selected by the IDN WG.
     • Compression algorithm.
         • Processes characters in ascending order of code point.
         • Encodes the code-point difference from the previously processed character, together with the position, into an integer.
         • Letters, digits and hyphens (basic code points) are copied to the output as-is, as in Bootstring.
     • ASCII conversion algorithm.
         • Introduces a new concept called ‘Generalized variable-length integers’.
         • BASE36 (A-Z, 0-9).
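The Punycode steps above (copy the basic code points, then visit the remaining code points in ascending order and encode each code-point delta together with an insertion position as a generalized variable-length integer) can be sketched as follows. This follows the encoder as later published in RFC 3492 with its parameter values (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128); that draft-00 used identical values is an assumption. In RFC 3492 digit values 0-25 are written a-z and 26-35 are written 0-9, and the output can be cross-checked against Python's built-in `punycode` codec.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation: tunes the thresholds to the size of recent deltas."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def digit(d: int) -> str:
    """Digit value 0-25 -> 'a'-'z', 26-35 -> '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(text: str) -> str:
    code_points = [ord(c) for c in text]
    output = [c for c in text if ord(c) < 0x80]   # basic code points copied as-is
    b = h = len(output)
    if b:
        output.append("-")                        # delimiter after the basic part
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(c for c in code_points if c >= n)  # next code point, ascending
        delta += (m - n) * (h + 1)                # skip all (code point, position) pairs below m
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                # Write delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break                     # a digit below the threshold ends the number
                    output.append(digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


if __name__ == "__main__":
    for word in ["bücher", "文字列例"]:
        print(word, punycode_encode(word), word.encode("punycode"))
```

For “文字列例” this produces `fsqw5d78mbsk`, the lowercase form of the “FSQW5D78MBSK” quoted later in this talk. Unlike the fixed thresholds of the simplified example, the real algorithm recomputes the thresholds after every character through `adapt`.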

  20. Compression process of Punycode (simplified for understanding)
     • Example string: “文字列例”
     • Compression:
         • Code points by position: 1:U+6587 2:U+5B57 3:U+5217 4:U+4F8B
         • Sort by code point and take differences: 4:0x4F8B 3:0x28C 2:0x940 1:0xA30
         • Convert to integers (diff*chars+position): 0x13E30 0xA33 0x2502 0x28C1

  21. Generalized variable-length integers of Punycode
     • 12345 in decimal is represented as 1*10^4+2*10^3+3*10^2+4*10^1+5*10^0.
     • Every place uses the digits 0-9, so the digit sequence 12345 alone cannot distinguish 123 followed by 45 from 1234 followed by 5.
     • Furthermore, 012345 and 12345 are the same value with different representations.
     • GVLI (Generalized variable-length integers) is an idea that solves this problem.
         • A threshold is defined for each place, and a digit below that threshold marks the end of a number.
         • Each threshold is a suitable number smaller than the base.

  22. Encoding process of Punycode (simplified for understanding)
     • Assign the symbols 0-9 and A-Z to the GVLI digit values 0-35.
     • Assume base 36 and thresholds 10, 18, 25, 25; the place weights are then 1, 26 (=1*(36-10)), 468 (=26*(36-18)) and 5148 (=468*(36-25)).
     • 0x13E30 = 24*1+18*26+30*468+13*5148 → OIUD
     • 0xA33 = 11*1+28*26+4*468 → BS4
     • 0x2502 = 10*1+22*26+19*468 → AMJ
     • 0x28C1 = 33*1+22*26+21*468 → XML
     • “文字列例” => “OIUDBS4AMJXML”.
     • Real Punycode generates “FSQW5D78MBSK”.
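The simplified compression and GVLI encoding of slides 20-22 can be reproduced in a few lines, which is a handy way to check the hexadecimal arithmetic. The conventions are read off the worked example and are assumptions: positions are 1-based, each integer is difference * number of characters + position, digit values 0-35 are written as 0-9 followed by A-Z (the ordering that turns 0x13E30 into OIUD), and the last threshold (25) is reused for any further places. `compress` and `gvli` are hypothetical names, and this toy scheme is not real Punycode.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # digit value -> symbol
BASE = 36
THRESHOLDS = [10, 18, 25, 25]                      # last one reused if needed


def compress(text: str) -> list[int]:
    """Sort by code point, take differences, fold in the 1-based position."""
    chars = sorted((ord(c), pos) for pos, c in enumerate(text, start=1))
    previous, result = 0, []
    for code_point, pos in chars:
        result.append((code_point - previous) * len(text) + pos)
        previous = code_point
    return result


def gvli(value: int) -> str:
    """Least-significant digit first; a digit below its threshold ends the number."""
    out, place = [], 0
    while True:
        t = THRESHOLDS[min(place, len(THRESHOLDS) - 1)]
        if value < t:
            out.append(ALPHABET[value])
            return "".join(out)
        out.append(ALPHABET[t + (value - t) % (BASE - t)])
        value = (value - t) // (BASE - t)
        place += 1


if __name__ == "__main__":
    integers = compress("文字列例")
    print([hex(v) for v in integers])
    print("".join(gvli(v) for v in integers))
```

Because every non-final digit is at least its threshold and the final one is below it, a decoder can split the concatenated string back into numbers without any separator.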

  23. Standardization of IDN is just the starting point of its use
     • End users use IDNs through application software.
         • Web, mail, etc.
     • IDNA requires applications to support it.
         • How application protocols handle IDNs must be defined.
     • Standardization of IDN does not mean IDNs are ready to use; it is just the starting point for applications to incorporate new features.

  24. HTTP request (DNS resolution only)
     • User enters: http://ジェーピーニック.JP/
     • DNS: ZQ--HCKQZ9BZB1CYRB.JP resolves to the web server’s IP address.
     • Web: the HTTP request still carries the non-ASCII name → Error!
         GET http://ジェーピーニック.JP/ HTTP/1.1
         Host: ジェーピーニック.JP
         Referer: http://ジェーピーニック.JP/

  25. HTTP request (ACE in HTTP header)
     • User enters: http://ジェーピーニック.JP/
     • DNS: ZQ--HCKQZ9BZB1CYRB.JP resolves to the web server’s IP address.
     • Web: the HTTP request carries the ACE form → Contents
         GET http://ZQ--HCKQZ9BZB1CYRB.JP/ HTTP/1.1
         Host: ZQ--HCKQZ9BZB1CYRB.JP
         Referer: http://ZQ--HCKQZ9BZB1CYRB.JP/
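The lesson of the two HTTP slides, that the ACE form must be used in protocol headers and not only for the DNS lookup, can be sketched as a request builder that converts the host name once and uses the converted form everywhere. It relies on Python's `idna` codec and therefore produces the standardized `xn--` prefix rather than the draft-era `ZQ--` of the example; `build_request` is a hypothetical helper, and the header set is just the three lines shown on the slides.

```python
def build_request(host: str, path: str = "/") -> str:
    """Build HTTP/1.1 request headers with the host name in ACE form throughout."""
    ace_host = host.encode("idna").decode("ascii")   # Unicode -> NAMEPREP -> ACE
    url = f"http://{ace_host}{path}"
    return "\r\n".join([
        f"GET {url} HTTP/1.1",
        f"Host: {ace_host}",
        f"Referer: {url}",
        "",                                          # blank line ends the headers
        "",
    ])


if __name__ == "__main__":
    print(build_request("ジェーピーニック.JP"))
```

Converting once at the application boundary keeps the request line, `Host` and `Referer` consistent with the name that was resolved, which is the situation on the second slide.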

  26. References
     • IETF IDN WG Web page: http://www.i-d-n.net/
     • Unicode Consortium: http://www.unicode.org/

  27. Acknowledgement
     • Telecommunications Advancement Organization of Japan (TAO).
         • JPNIC’s research on the security of IDN is part of TAO’s research.
         • http://www.shiba.tao.go.jp/

  28. IDN-compliant clients & implementations
     • Mozilla: http://playground.i-dns.net/mozilla/index.html
         • Plug-in for Mozilla; resolution using RACE.
     • Opera: http://www.opera.com/
         • Native; resolution using RACE.
     • Internet Explorer 5 or higher: http://www.microsoft.com/windows/ie/default.asp
         • Uses a keyword search engine as the RACE converter.
     • mDNkit: http://www.nic.ad.jp/jp/research/idn/mdnkit/download/
         • Open-source toolkit for developing IDN-compliant software.
