Content Types: Text and Metadata

Introduction

Text documents come in many forms:
    Article (news, conference, journal, etc.)
    Email, memo, …
    Book, manual, manuscript, transcript, …
    Any part of one of the above

Syntax can express:
    Structure
    Presentation style
    Semantics (e.g. software code)
p_i is the probability of symbol i (the number of occurrences of symbol i divided by the total number of symbols in the text)
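
The estimate of p_i described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: symbols are taken to be individual characters, and the function name symbol_probabilities and the sample string are hypothetical, not from the slides.

```python
from collections import Counter


def symbol_probabilities(text):
    """Estimate p_i for each symbol i as count(i) / total number of symbols.

    Assumption: a "symbol" is a single character of the input string.
    """
    counts = Counter(text)            # occurrences of each symbol
    total = sum(counts.values())      # total number of symbols in the text
    if total == 0:
        return {}                     # empty text: no symbols, no probabilities
    return {symbol: count / total for symbol, count in counts.items()}


# Hypothetical sample text: 11 symbols, of which 5 are "a".
probs = symbol_probabilities("abracadabra")
for symbol, p in sorted(probs.items()):
    print(f"p_{symbol} = {p:.3f}")
```

The probabilities sum to 1 by construction. To treat words rather than characters as symbols, one could pass a list of tokens instead of a string, since Counter accepts any iterable.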
Need a text model for real language