XML Watermarking & Information Hiding

Sun Xingming (孙星明), Ph.D., Professor and Doctoral Supervisor, School of Computer and Communication, Hunan University; Hunan Provincial Key Laboratory of Network and Information Security.

Presentation Transcript


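  1. XML Watermarking & Information Hiding • Sun Xingming (孙星明), Ph.D., Professor and Doctoral Supervisor • School of Computer and Communication, Hunan University • Hunan Provincial Key Laboratory of Network and Information Security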

  2. Markup Language • SGML (Standard Generalized Markup Language) • XML (Extensible Markup Language) • HTML (HyperText Markup Language) • XHTML

  3. Publishing Information in WWW

  4. Publishing Information in WWW

  5. XML Document • For each XML element type, the corresponding watermarking and information hiding techniques can be employed • text • image • video • audio • executable code • … • Can we use the markup's own information for watermarking or information hiding?

  6. Known content-based techniques • Change font size or color • Append white space at the end of a line • bit 0: space • bit 1: tab
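
The space/tab scheme on this slide can be sketched as follows. This is a minimal illustration rather than the implementation behind the slides: the function names `embed_bits` and `extract_bits` are hypothetical, one bit is carried per line, and the cover text is assumed to have no trailing white space of its own.

```python
def embed_bits(text: str, bits: str) -> str:
    """Append a trailing space (bit 0) or tab (bit 1) to successive lines."""
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the payload")
    marks = {"0": " ", "1": "\t"}
    marked = [line + marks[bit] for line, bit in zip(lines, bits)]
    # Lines beyond the payload are left untouched.
    return "\n".join(marked + lines[len(bits):])


def extract_bits(text: str) -> str:
    """Read the payload back from the trailing white space of each line."""
    bits = []
    for line in text.split("\n"):
        if line.endswith(" "):
            bits.append("0")
        elif line.endswith("\t"):
            bits.append("1")
    return "".join(bits)


stego = embed_bits("a\nb\nc\nd", "101")
recovered = extract_bits(stego)
```

Every carried bit adds one byte to the page, and the trailing characters become visible as soon as the text is selected.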

  7. Shortcomings • White space at the end of a line: • Increases the page size • May change the layout • Is easily detected by selecting the text

  8. Specification • Element (Entity) • <name attribute1 … attributen> contents </name> • <name attribute1 … attributen> </name> • <name attribute1 … attributen> • Attribute • name="value" • Example • <font face="Verdana" size="4" color="#FFFF00">Student Number: </font>

  9. Properties of markup labels • Property 1: In HTML, element and attribute names are case-insensitive (XML and XHTML names are case-sensitive) • <font face="Verdana" size="4" color="#FFFF00">Student Number: </font> • <Font face="Verdana" size="4" color="#FFFF00">Student Number: </font> • <font face="Verdana" size="4" color="#FFFF00">Student Number: </Font> • <Font face="Verdana" size="4" color="#FFFF00">Student Number: </Font> • …

  10. Properties of markup labels • Property 2: Attributes are order-insensitive • <font face="Verdana" size="4" color="#FFFF00">Student Number: </font> • <font size="4" face="Verdana" color="#FFFF00">Student Number: </font>
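
Properties 1 and 2 can be checked with an HTML parser. The sketch below uses Python's standard `html.parser`, which lowercases tag and attribute names; the class name `TagCollector` and the helper `parse` are hypothetical. Attributes are compared as a dictionary so that their order is ignored.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (tag, attributes) pairs; attributes are stored as a dict."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def parse(markup: str):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


variants = [
    '<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>',
    '<Font face="Verdana" size="4" color="#FFFF00">Student Number: </Font>',
    '<font size="4" face="Verdana" color="#FFFF00">Student Number: </font>',
]
parsed = [parse(v) for v in variants]
all_equivalent = all(p == parsed[0] for p in parsed)
```

All three variants parse to the same element, which is what leaves name case and attribute order free to carry hidden bits in HTML.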

  11. Pair attributes technique • Order of attribute pairs (Corinna John) • Key attribute, corresponding attribute • Key before corresponding encodes 1; corresponding before key encodes 0 • <font face="Verdana" size="4" color="#FFFF00">Student Name:</font> • <font size="4" face="Verdana" color="#FFFF00">Student Name:</font> • Requires a key / corresponding attribute table • Table size; difficult to detect
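
A minimal sketch of the pair-order idea, assuming a one-entry table in which `face` is the key attribute and `size` its corresponding attribute (this pairing and the function names are hypothetical, chosen to match the two example tags). Attributes are modeled as a list of `(name, value)` tuples rather than raw markup.

```python
KEY, CORRESPONDING = "face", "size"  # hypothetical key / corresponding pair


def embed_bit(attrs, bit):
    """Order the pair so that key-before-corresponding encodes 1, else 0."""
    names = [name for name, _ in attrs]
    i, j = names.index(KEY), names.index(CORRESPONDING)
    key_first = i < j
    if key_first != (bit == 1):
        attrs = list(attrs)
        attrs[i], attrs[j] = attrs[j], attrs[i]  # swap the two attributes
    return attrs


def extract_bit(attrs):
    """Recover the bit from the relative order of the pair."""
    names = [name for name, _ in attrs]
    return 1 if names.index(KEY) < names.index(CORRESPONDING) else 0


cover = [("face", "Verdana"), ("size", "4"), ("color", "#FFFF00")]
one = embed_bit(cover, 1)
zero = embed_bit(cover, 0)
```

Each listed pair carries exactly one bit, so the capacity grows with the number of table entries that actually occur together in an element.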

  12. Attributes permutation technique • Equivalent attribute permutations • <font face="Verdana" size="4" color="#FFFF00">Student Name:</font> • <font face="Verdana" color="#FFFF00" size="4">Student Name:</font> • <font size="4" face="Verdana" color="#FFFF00">Student Name:</font> • <font size="4" color="#FFFF00" face="Verdana">Student Name:</font> • <font color="#FFFF00" face="Verdana" size="4">Student Name:</font> • <font color="#FFFF00" size="4" face="Verdana">Student Name:</font> • Lexicographic (alphabetic) order: a permutation f precedes a permutation g iff f(k) < g(k) for the smallest k such that f(k) ≠ g(k).

  13. Attributes permutation technique • Generating attribute permutations in lexicographic order • <font color="#FFFF00" face="Verdana" size="4">Student Name:</font> • <font color="#FFFF00" size="4" face="Verdana">Student Name:</font> • <font face="Verdana" color="#FFFF00" size="4">Student Name:</font> • <font face="Verdana" size="4" color="#FFFF00">Student Name:</font> • <font size="4" color="#FFFF00" face="Verdana">Student Name:</font> • <font size="4" face="Verdana" color="#FFFF00">Student Name:</font> • Attribute permutation → order number • color face size → 0 • color size face → 1 • face color size → 2 • face size color → 3 • size color face → 4 • size face color → 5
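
The mapping between attribute permutations and order numbers can be sketched with the factorial number system. This is one standard way to rank and unrank permutations, not necessarily the procedure used by the authors; the names `rank` and `unrank` are hypothetical.

```python
from math import factorial


def unrank(names, number):
    """Return the permutation of names with the given 0-based lexicographic number."""
    pool = sorted(names)
    if not 0 <= number < factorial(len(pool)):
        raise ValueError("order number out of range")
    perm = []
    for remaining in range(len(pool), 0, -1):
        # Each choice of the next name covers (remaining - 1)! permutations.
        index, number = divmod(number, factorial(remaining - 1))
        perm.append(pool.pop(index))
    return perm


def rank(perm):
    """Return the 0-based lexicographic order number of a permutation."""
    pool = sorted(perm)
    number = 0
    for name in perm:
        index = pool.index(name)
        number += index * factorial(len(pool) - 1)
        pool.pop(index)
    return number


attrs = ["face", "size", "color"]
third = unrank(attrs, 3)
round_trip = [rank(unrank(attrs, k)) for k in range(6)]
```

To embed, the sender writes the attributes in the order returned by `unrank`; the receiver calls `rank` on the order found in the page. Under strictly lexicographic ordering, `size, color, face` receives number 4 and `size, face, color` receives number 5.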

  14. Attributes permutation technique • If an element has at least 2 attributes, it can be used to embed hidden information or a watermark • Let $e_1, \dots, e_m$ be the elements in a web page whose numbers of attributes satisfy $n_i \ge 2$; since $e_i$ admits $n_i!$ attribute orderings, the embedded capacity is $\sum_{i=1}^{m} \log_2(n_i!)$ bits
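
As a rough illustration of the capacity count, assume that an element with n distinct attributes can take any of its n! orderings and therefore carries log2(n!) bits. The sketch below sums this over a page using Python's standard `html.parser`; the class `AttrCounter` and the function `capacity_bits` are hypothetical names.

```python
import math
from html.parser import HTMLParser


class AttrCounter(HTMLParser):
    """Record the attribute count of every element with at least 2 attributes."""

    def __init__(self):
        super().__init__()
        self.counts = []

    def handle_starttag(self, tag, attrs):
        distinct = {name for name, _ in attrs}
        if len(distinct) >= 2:
            self.counts.append(len(distinct))


def capacity_bits(page: str) -> float:
    """Sum log2(n!) over all usable elements of the page."""
    counter = AttrCounter()
    counter.feed(page)
    counter.close()
    return sum(math.log2(math.factorial(n)) for n in counter.counts)


page = (
    '<table border="1" width="100%"><tr><td>'
    '<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>'
    "</td></tr></table>"
)
bits = capacity_bits(page)
```

Here the `table` element contributes log2(2!) = 1 bit and the `font` element log2(3!) ≈ 2.58 bits; `tr` and `td` have no attributes and contribute nothing.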

  15. Embedded capacity example

  16. Perceptibility • Imperceptible when browsing the page • Hard to notice when reading the source code

  17. Robustness against editing • Contents can be changed without affecting the watermark

  18. Robustness against editing • Font, size, and color can be changed without affecting the watermark

  19. Security • Attribute permutation → order number • color face size → 0 • color size face → 1 • face color size → 2 • face size color → 3 • size color face → 4 • size face color → 5 • Apply a hash to the concatenation of the attributes and a secret key to obtain the order number
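
The slide does not spell out how the hash yields an order number. One possible reading, sketched here as an assumption, is to hash each permutation together with the key and number the permutations by the sorted hash values. The sketch swaps in HMAC-SHA256 for a bare hash of the concatenation; the function name `keyed_order_table` and the `|` separator are hypothetical.

```python
import hashlib
import hmac
from itertools import permutations


def keyed_order_table(names, key: bytes):
    """Map each attribute permutation to a key-dependent order number."""

    def tag(perm):
        # Keyed hash of the attribute names concatenated in this order.
        message = "|".join(perm).encode("utf-8")
        return hmac.new(key, message, hashlib.sha256).digest()

    ordered = sorted(permutations(sorted(names)), key=tag)
    return {perm: number for number, perm in enumerate(ordered)}


table = keyed_order_table(["face", "size", "color"], b"secret key")
same_table = keyed_order_table(["color", "size", "face"], b"secret key")
```

Without the key, an observer who sees the attribute order cannot tell which order number it stands for, whereas the plain lexicographic table is public.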

  20. Performance comparison

  21. Other potential properties • String delimiters • name="value" • name='value' • White space between the element's name and the first attribute • <font face="verdana" size="3"> • <font  face="verdana" size="3"> • White space between attributes • <font face="verdana" size="3"> • <font face="verdana"  size="3">

  22. Other potential properties • White space after "=" • <font face="verdana" size="3"> • <font face= "verdana" size="3"> • White space between elements • <td>con1</td><td>con2</td> • <td>con1</td> <td>con2</td>

  23. Other potential properties • The default value of an attribute • <font face="verdana" size="3"> • <font face="verdana">
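
Of the properties listed on these slides, the string delimiter is the simplest to sketch: a double quote can stand for bit 0 and a single quote for bit 1. That bit assignment, the regular expression, and the function names are assumptions made for illustration; the regular expression handles only simple quoted attributes.

```python
import re

# name="value" or name='value'; the closing quote must match the opening one.
ATTR = re.compile(r'''([A-Za-z_:][\w:.-]*)=(["'])(.*?)\2''')


def embed_quotes(tag: str, bits: str) -> str:
    """Re-quote successive attributes: double quote for 0, single quote for 1."""
    remaining = iter(bits)

    def requote(match):
        bit = next(remaining, None)
        if bit is None:
            return match.group(0)  # payload exhausted, keep as-is
        name, value = match.group(1), match.group(3)
        quote = "'" if bit == "1" else '"'
        if quote in value:
            raise ValueError("value contains the chosen delimiter")
        return f"{name}={quote}{value}{quote}"

    return ATTR.sub(requote, tag)


def extract_quotes(tag: str, count: int) -> str:
    """Read one bit per attribute from its delimiter."""
    found = ATTR.findall(tag)[:count]
    return "".join("1" if quote == "'" else "0" for _, quote, _ in found)


stego = embed_quotes('<font face="verdana" size="3">', "01")
recovered = extract_quotes(stego, 2)
```

This carries one bit per attribute and can be combined with attribute ordering, since the two properties are independent.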

  24. Current progress • Introduce insignificant attributes • <font face="verdana"> • <font face="verdana" xyz="abcd"> • Break through the capacity bottleneck • Web page watermarking • Text watermarking

  25. Our focus on watermarking • Text content security • Funded by NSFC Key Project 60736016 • Funded by NSFC 60373062 • Software watermarking • Funded by NSFC 60573045 • Wireless sensor network security • Funded by 973 Project 2006CB303000 • Funded by NSFC 60873198 • Steganalysis • Funded by 115 Project

  26. Thank you • Phone: 0731-8821341, 13875971258 • Email: sunnudt@163.com • http://nisl.hnu.cn/

  27. HyperText Markup Language (HTML), version 4.0, the publishing language of the World Wide Web • Recall that in HTML, element and attribute names are case-insensitive; the convention is meant to encourage readability. • Element and attribute names in this document have been marked up and may be rendered specially by some user agents. • http://www.w3.org/TR/1998/REC-html40-19980424/about.html#h-1.2.1

  28. http://www.w3.org/TR/html/#xhtml • HTML 4 [HTML4] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web. • SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML. • SGML has been around since the middle 1980's and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web. • HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later. • In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to interoperability problems for documents across different platforms.

  29. XML™ is the shorthand name for Extensible Markup Language [XML]. • XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features. • While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.

  30. XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4]. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in more detail in [XHTMLMOD]. • XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits: • XHTML documents are XML conforming. As such, they are readily viewed, edited, and validated with standard XML tools. • XHTML documents can be written to operate as well or better than they did before in existing HTML 4-conforming user agents as well as in new, XHTML 1.0 conforming user agents. • XHTML documents can utilize applications (e.g. scripts and applets) that rely upon either the HTML Document Object Model or the XML Document Object Model [DOM]. • As the XHTML family evolves, documents conforming to XHTML 1.0 will be more likely to interoperate within and among various XHTML environments. • The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.

  31. Terrorism • http://www.arabteam2000-forum.com/ • Partial English translation of a Jihad training manual on information hiding techniques (original in Arabic)

  32. Watermark embedding

  33. Watermark detection

  34. Classification of watermarking by host • Image • Audio • Video • Text (document) • Software / executable code • Database

  35. Text watermarking & information hiding • Text carriers: unformatted TXT, email, web pages, PDF, Word, WPS, PS, etc., books • Two branches: watermarking and information hiding

  36. Any redundancy? • No: each character corresponds one-to-one to a code

  37. Utilize format information • Line-shift Coding • vertically displacing an entire text line • Word-shift Coding • horizontally shifting the location of a word within a text line • Character feature coding • altering a particular feature of an individual character

  38. Utilize language information • Synonym substitution • Syntactic transform • TMR tree (text meaning representation) • Add spaces at the end of a line

  39. Text recoverable watermarking • Format based watermarking? • Natural language watermarking? • How to combine?? • Text recoverable watermarking???
