Discussion on Chinese Domain Name technology including encoding, testing

maalik

Presentation Transcript


  1. Discussion on Chinese Domain Name technology including encoding, testing

  2. 8-bit clean & UTF-8 problems • The rule for the escape character “\” must be clearly defined. • Ex. 成功 → 成功\ • Other characters are special in the UNIX shell • Ex. 教育 (contains “|”) • Quoting it as “教育” works
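The collisions on this slide can be reproduced with a short Python sketch. It assumes the names were entered in Big5 (the slide does not name the encoding), and `ascii_collisions` is a hypothetical helper written for this illustration. It reports every multi-byte character whose trailing byte equals an ASCII metacharacter.

```python
# Assumption: the slide's examples are Big5-encoded. In Big5 the trailing
# byte of a two-byte character may fall in the ASCII range, so it can be
# mistaken for "\" (0x5C), "|" (0x7C) or "@" (0x40).
SPECIAL = {0x5C: "\\", 0x7C: "|", 0x40: "@"}

def ascii_collisions(text, encoding="big5"):
    """Return (character, colliding ASCII symbol) pairs found in text."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) < 2:
            continue  # a genuine single-byte character is not a collision
        for b in raw[1:]:  # inspect only the trailing bytes
            if b in SPECIAL:
                hits.append((ch, SPECIAL[b]))
    return hits

print(ascii_collisions("成功"))           # Big5: 功 ends in the "\" byte
print(ascii_collisions("教育"))           # Big5: 育 ends in the "|" byte
print(ascii_collisions("成功", "utf-8"))  # UTF-8 trailing bytes are never ASCII
```

Under UTF-8 the list is always empty, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher.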

  3. 8-bit clean & UTF-8 problems • Windows 9X • URL syntax: http://user:account@location@host.domain/ • Ex. http://統一企業 produces an error • “\” is inserted automatically for DNS, but not for DHCP • Ex. ping 成功\大學 • Windows 2K • The resolver (ping, ftp) uses UTF-8 • nslookup is 8-bit clean • Double encoding occurs in IE5 and the resolver
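One plausible reading of the http://統一企業 failure is that, in Big5, the character 一 is encoded as the bytes 0xA4 0x40, and 0x40 is “@”, the separator between user information and host in a URL. The sketch below assumes Big5 input and a parser that splits on raw bytes; `split_authority` is a hypothetical stand-in, not the actual Windows 9X code.

```python
def split_authority(authority: bytes):
    """Naive byte-level split of 'userinfo@host', as a parser that is
    unaware of multi-byte characters might do it (illustrative only)."""
    userinfo, sep, host = authority.rpartition(b"@")
    return (userinfo if sep else None), host

# Assumption: the host name is typed in Big5.
big5_host = "統一企業".encode("big5")
userinfo, host = split_authority(big5_host)
print(userinfo is not None)   # True: part of the name was taken as user info
print(host == big5_host)      # False: the host handed to DNS is truncated

# The same name in UTF-8 contains no "@" byte and survives the split.
utf8_host = "統一企業".encode("utf-8")
print(split_authority(utf8_host) == (None, utf8_host))  # True
```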

  4. Transcoding between Windows client and server

  5. Suggestion • Sub-domain names that mix Chinese characters and alphanumeric characters. • If a sub-domain name contains any 8-bit character, then that sub-domain name is case sensitive

  6. Suggestion • For example, these are the same: • www.A王.tw • wWw.A王.TW • For example, these are different: • www.a王.tw • www.A王.tw
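The suggested comparison rule can be sketched per label as follows. This is a minimal illustration, not the BIND implementation; the function names are hypothetical and Big5 is assumed as the wire encoding.

```python
def has_8bit(label: bytes) -> bool:
    """True if any byte of the label has its 8th bit set."""
    return any(b >= 0x80 for b in label)

def labels_equal(a: bytes, b: bytes) -> bool:
    # A label containing an 8-bit byte is compared exactly (case sensitive);
    # a pure-ASCII label is compared case-insensitively, as in ordinary DNS.
    if has_8bit(a) or has_8bit(b):
        return a == b
    return a.lower() == b.lower()

def names_equal(x: str, y: str, encoding: str = "big5") -> bool:
    xs = [label.encode(encoding) for label in x.split(".")]
    ys = [label.encode(encoding) for label in y.split(".")]
    return len(xs) == len(ys) and all(labels_equal(p, q) for p, q in zip(xs, ys))

print(names_equal("www.A王.tw", "wWw.A王.TW"))  # True: only ASCII labels differ in case
print(names_equal("www.a王.tw", "www.A王.tw"))  # False: the mixed label is case sensitive
```

Comparing the mixed label byte for byte avoids case-folding a trailing byte that only looks like an ASCII letter.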

  7. Multi-lingual • The problem of multi-byte characters vs. single-byte characters • Multilingual text uses multi-byte characters

  8. Problem (1) • Some bytes of a multi-byte character are identical to single-byte ASCII codes, and intermediate software packages (e.g. BIND, sendmail, web proxies) cannot tell them apart. This matters most for control characters (“\”, “@”, “|”, …)

  9. Solutions • Solution 1 • Write each multi-byte character as \nnn\nnn. • Solution 2 • Transform non-ASCII codes to UTF-8. • Solution 3 • Transform all characters to pure ASCII codes (UTF-7, UTF-5). • Solution 4 • Clear byte stuffing with an escape-code rule: “\\”, “\@”
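Solution 1 can be illustrated with a small escaping routine. The slide does not say whether nnn is decimal or octal; the sketch assumes three decimal digits, as in the \DDD escapes of DNS master files, and Big5 input. `escape_label` is a hypothetical name.

```python
def escape_label(label: str, encoding: str = "big5") -> str:
    """Rewrite every byte of each multi-byte character as \\nnn.
    Assumption: nnn is a 3-digit decimal byte value."""
    out = []
    for ch in label:
        raw = ch.encode(encoding)
        if len(raw) > 1:
            out.extend("\\%03d" % b for b in raw)  # lead and trailing bytes
        elif ch == "\\":
            out.append("\\\\")  # a literal backslash is doubled (cf. Solution 4)
        else:
            out.append(ch)      # plain ASCII passes through unchanged
    return "".join(out)

print(escape_label("功"))   # \165\092  (Big5 bytes 0xA5 0x5C)
print(escape_label("www"))  # www
```

The escaped form is pure ASCII, so intermediate software no longer sees a bare “\” byte inside 功, at the cost of roughly quadrupling the length of each Chinese character.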

  10. Problem (2) • All-alphanumeric domain names are case insensitive, but multi-byte characters are case sensitive.

  11. Solutions • Solution 1 • Convert alphanumeric characters to lower (or upper) case first. (client iDNS UTF-5) • Solution 2 • Transform all multi-byte characters to UTF-8, so that every byte of a multi-byte character has its 8th bit set (a negative value as a signed byte) and is easily recognized. (Win 2K DNS server)
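Both solutions can be checked with a few lines of Python. This is a sketch under the assumption that names are held as Unicode strings on the client; `fold_ascii` and `all_high_bytes` are hypothetical helpers, not part of iDNS or the Windows 2000 DNS server.

```python
def fold_ascii(name: str) -> str:
    """Solution 1: lower-case only the ASCII characters before encoding."""
    return "".join(c.lower() if c.isascii() else c for c in name)

def all_high_bytes(text: str, encoding: str) -> bool:
    """True if every byte of every non-ASCII character has the 8th bit set."""
    return all(
        b >= 0x80
        for ch in text if not ch.isascii()
        for b in ch.encode(encoding)
    )

print(fold_ascii("wWw.A王.TW"))              # www.a王.tw
print(all_high_bytes("成功教育", "utf-8"))  # True: Solution 2 holds for UTF-8
print(all_high_bytes("成功教育", "big5"))   # False: 功 and 育 have ASCII-range trailing bytes
```

Because UTF-8 never reuses ASCII byte values inside a multi-byte sequence, a server can case-fold the bytes below 0x80 without touching Chinese characters.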

  12. Solutions • Solution 3 • If a sub-domain name contains at least one multi-byte character, then that sub-domain name is case sensitive. (BIND server) • For example, in www.A王.tw, “A王” is case sensitive

  13. Why solution 3 is needed • 8-bit clean operation is possible • Lead-byte encodings are already in wide use (BIG5, GB, JIS, …) • Compression ratio and conversion efficiency? • An intermediate stage toward Unicode.
