
Overview of XML & XHTML

Overview of XML & XHTML. Instructor: Joseph DiVerdi, Ph.D., MBA. Character Sets. A Brief Digression. Character Sets. Character A Unit of a Written Language System ay, bee, see, dee, eff, gee, aych, eye Glyph An Actual Printed or Displayed Character = a b c 5 , $ ó.


Presentation Transcript


  1. Overview of XML & XHTML Instructor: Joseph DiVerdi, Ph.D., MBA

  2. Character Sets • A Brief Digression...

  3. Character Sets • Character • A Unit of a Written Language System ay, bee, see, dee, eff, gee, aych, eye • Glyph • An Actual Printed or Displayed Character = a b c 5 , $ ó

  4. Character Sets • A Character May Associate With Several Glyphs • Close Quote - " or » • A Glyph May Correspond to Several Characters • Comma - Pause in Sentence or Decimal Indicator • In Certain Languages

  5. Character Sets • Each Character Is Assigned • A Specific Numeric Value • Number of Characters in a Character Set • Limited by the Bit-depth of Its Encoding • 8-Bit Encoded Character Set - 256 characters • 16-Bit Encoded Character Set - 65,536 characters • HTML v2.0 & v3.2 are based on ISO 8859-1 • 8-Bit Character Set • AKA Latin-1
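As a rough illustration of slide 5, the sketch below prints the numeric value assigned to a few characters and computes how many values a given bit depth allows. The helper name `capacity` is hypothetical and is not part of the presentation.

```python
# Each character is assigned a specific numeric value.
for ch in "A5$ó":
    print(ch, ord(ch))


def capacity(bits: int) -> int:
    """Number of distinct values available at a given bit depth (hypothetical helper)."""
    return 2 ** bits


print(capacity(8))   # size of an 8-bit character set
print(capacity(16))  # size of a 16-bit character set
```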

  6. Character Sets • ISO-8859-1 Character Set • 8-Bit Depth • First 128 Values From US-ASCII

     Numeric Value   Glyph   Description
     13              CR      carriage return
     48              0       digit zero
     65              A       uppercase aye
     94              ^       caret
     177             ±       plus-or-minus
     191             ¿       inverted question mark
     255             ÿ       lowercase wye w/umlaut
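One way to check entries in an ISO 8859-1 table is to decode single bytes with Python's built-in `iso-8859-1` codec. This is only a sketch: the list of sample values is chosen here for illustration, and the second half checks the claim that the first 128 values come from US-ASCII.

```python
# Decode single byte values as ISO 8859-1 to see the glyph each one maps to.
sample_values = [48, 64, 65, 94, 177, 191, 255]
glyphs = {v: bytes([v]).decode("iso-8859-1") for v in sample_values}
for value, glyph in glyphs.items():
    print(value, glyph)

# The first 128 values decode identically under US-ASCII and ISO 8859-1.
low_half = bytes(range(128))
same_as_ascii = low_half.decode("ascii") == low_half.decode("iso-8859-1")
print(same_as_ascii)
```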

  7. Character Sets (continued) • Common Character Sets

     Character Set   Script     Width
     ISO 8859-1      Latin-1    8-bit
     ISO 8859-5      Cyrillic   8-bit
     ISO 8859-6      Arabic     8-bit
     ISO 8859-7      Greek      8-bit
     ISO 8859-8      Hebrew     8-bit
     Shift_JIS       Japanese   multibyte
     EUC-JP          Japanese   multibyte

  8. Uses of Character Sets

     Language    Language Code   Character Sets
     French      fr              iso-8859-1
     Greek       el              iso-8859-7
     Hebrew      iw (now he)     iso-8859-8
     Hungarian   hu              iso-8859-2
     Icelandic   is              iso-8859-1
     Italian     it              iso-8859-1
     Japanese    ja              shift_jis, iso-2022-jp, euc-jp
     Romanian    ro              iso-8859-2
     Russian     ru              koi8-r, iso-8859-5
     Serbian     sr              iso-8859-5
     Slovak      sk              iso-8859-2
     Spanish     es              iso-8859-1
     Turkish     tr              iso-8859-9
     Ukrainian   uk              iso-8859-5
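To see why the character set matters for each language, the following sketch decodes the same byte under several of the 8-bit sets listed above. The byte value 0xE1 is an arbitrary choice for illustration; any value above 127 would make a similar point.

```python
# The same byte value maps to different characters depending on the character set.
raw = bytes([0xE1])
charsets = ["iso-8859-1", "iso-8859-2", "iso-8859-5",
            "iso-8859-7", "iso-8859-8", "iso-8859-9"]
decoded = {name: raw.decode(name) for name in charsets}
for name, char in decoded.items():
    print(name, char)
```

Bytes in the US-ASCII range (0 to 127) would decode to the same character under every set in the list, so only the upper half of each table differs.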

  9. Character Sets (continued) • 256 Characters are Sufficient • For Certain Languages • Insufficient for Others • Japanese (kanji) • Chinese • Korean • Vietnamese • Hence the Need For • 16-Bit Encoded Character Sets
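A small sketch of the limitation described on slide 9: text that fits in an 8-bit set encodes without trouble, while Japanese text cannot be represented in ISO 8859-1 at all. The function name `fits_in_latin1` and the sample strings are assumptions made for this example.

```python
def fits_in_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1 (hypothetical helper)."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_latin1("français"))  # Latin-1 covers French
print(fits_in_latin1("日本語"))     # kanji fall outside any 256-character set
```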

  10. Character Sets • 16-Bit Encoded Character Sets • Two Contiguous Bytes Represent One Character • 65,536 Possible Characters in One Set • Unicode Began as a 16-bit Character Set • Its Code Space Now Extends to U+10FFFF, Encoded as UTF-8, UTF-16, or UTF-32 • Developed by the Unicode Consortium • Practically Identical to ISO 10646-1 • First 256 Slots Allocated to ISO 8859-1 • Backwards Compatible (woo-hoo!)
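The two points on slide 10 can be checked with a short sketch. It assumes UTF-16 (big-endian, no byte-order mark) as the 16-bit encoding, which the slide does not name, and then confirms that the first 256 Unicode values line up with ISO 8859-1.

```python
# Two contiguous bytes represent one character in a 16-bit encoding
# (UTF-16 big-endian is assumed here; the slide does not name an encoding).
latin_bytes = "ó".encode("utf-16-be")
kanji_bytes = "語".encode("utf-16-be")
print(len(latin_bytes), len(kanji_bytes))

# The first 256 Unicode code points match the ISO 8859-1 byte values.
backwards_compatible = all(
    ord(bytes([value]).decode("iso-8859-1")) == value for value in range(256)
)
print(backwards_compatible)
```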

  11. Character Sets • A Brief Digression... • Bottom Line • Specify Your Encoding As Required • Important For International Applications • Multi-Lingual Applications • There, now you know about it.
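Since the course covers XML, one plausible way to "specify your encoding" is the XML declaration. The sketch below uses Python's standard `xml.etree.ElementTree` to serialize a document as ISO 8859-1 and read it back. The element name `greeting` and the sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build a tiny document containing non-ASCII characters
# (the element name and text are hypothetical).
root = ET.Element("greeting")
root.text = "¿Qué pasó?"

# Serializing with an explicit encoding yields bytes that begin with an
# XML declaration naming that encoding.
data = ET.tostring(root, encoding="iso-8859-1")
print(data)

# A parser reads the declared encoding and recovers the original text.
parsed = ET.fromstring(data)
print(parsed.text)
```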
