
eXVisXML, an emblematic tool for document analysis

Daniela da Cruz, Pedro Rangel Henriques. Departamento de Informática, Universidade do Minho.


Presentation Transcript


  1. eXVisXML, an emblematic tool for document analysis. Daniela da Cruz, Pedro Rangel Henriques, Departamento de Informática, Universidade do Minho. Universidade de Aveiro

  2. Context

  3. Motivation

  4. Motivation

  5. Motivation

  6. Motivation

  7. Motivation

  8. XML Document Visualization
     • The role of visualization technology in program comprehension (PC) and software engineering (SE) is recognized as very fruitful.
     • Software visualization (SV) features let us capture a large amount of information more quickly.
     • Graphical representations have a positive impact on the learning process.

  9. XML Document Visualization
     • Retrieving information from plain documents efficiently is not an easy task.
     • Machine manipulation: XSL and other production systems can easily extract information and transform it.
     • Human manipulation: it is not as easy as desirable, because the annotation is complex or the document is too big.

  10. XML Document Visualization
     • Many tools have appeared to aid the visualization of XML documents:
       • XML Schema Designer (Microsoft)
       • XPath Analyzer (Altova)
       • …
     Although these tools offer syntax highlighting and easy manipulation (collapse/expand), their view is hierarchical and textual.

  11. Traditional XML Document Visualization

  12. Our Proposal for XML Document Visualization
     In this context, we want a visualization that eases the comprehension process. However, we should be careful with graphical or iconic representations, since their suitability depends on the problem domain. Inspired by Alma, the eXVisXML interface for the visual inspection of XML documents is divided into 3 main parts:

  13. Our Proposal for XML Document Visualization
     • One window that displays the source document;
     • One window that shows the textual hierarchy;
     • One window that shows the tree associated with the source document (graphical).

  14. Our Proposal for XML Document Visualization

  15. XML Document Slicing
     • The slicing concept was introduced in 1979 by Weiser.
     • It is applied to a program with respect to a slicing criterion (a pair composed of a line number and a set of variables).
     • The objective is to find the statements that may affect those variables.
     • This technique can also be applied to XML documents. How?

  16. XML Document Slicing
     • Input: an XML document plus a slicing criterion (an XPath expression can be regarded as a simplified slicing criterion).
     • A document slice is a new XML document composed of those elements that are strictly necessary to maintain the tree structure.
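
The idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the eXVisXML implementation: the function name slice_xml is hypothetical, the criterion is limited to the XPath subset that xml.etree.ElementTree supports, and the tag names in the sample are made up for the example. The sketch keeps every element matched by the criterion (with its subtree) plus the ancestors needed to keep the result a tree.

```python
import copy
import xml.etree.ElementTree as ET


def slice_xml(root, criterion):
    """Return a sliced copy of the tree, or None if nothing matches.

    Assumption: `criterion` is an ElementTree-compatible XPath expression
    evaluated relative to `root`.
    """
    matched = {id(elem) for elem in root.findall(criterion)}

    def build(elem):
        if id(elem) in matched:
            return copy.deepcopy(elem)          # keep the matched subtree whole
        kept = [c for c in map(build, elem) if c is not None]
        if not kept:
            return None                         # nothing below is relevant
        clone = ET.Element(elem.tag, dict(elem.attrib))
        clone.extend(kept)                      # ancestor kept only as structure
        return clone

    return build(root)


# Hypothetical sample document (tag names are assumptions for illustration).
SAMPLE = (
    "<PLAY><ACT>"
    "<SPEECH><SPEAKER>GREGORY</SPEAKER><LINE>a</LINE></SPEECH>"
    "<SPEECH><SPEAKER>SAMPSON</SPEAKER></SPEECH>"
    "</ACT><ACT><TITLE>t</TITLE></ACT></PLAY>"
)
sliced = slice_xml(ET.fromstring(SAMPLE), ".//SPEECH[SPEAKER='GREGORY']")
sliced_text = ET.tostring(sliced, encoding="unicode")
```

In this sample only the first SPEECH matches, so the slice keeps PLAY and the first ACT as bare structural ancestors and drops everything else.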

  17. XML Document Slicing
     Josep Silva proves, in Slicing XML documents, that slicing techniques applied to XML and DTD documents produce valid XML and DTD slices with respect to the slicing criterion.

  18. XML Document Slicing
     • Given the whole XML document of the Romeo and Juliet screenplay, and
     • the slicing criterion Greg, the result is:

  19. XML Document Slicing

  20. XML Document Metrics
     • Effective management of any process requires quantification, measurement, and modeling.
     • Software metrics provide a quantitative basis for the development and validation of models of the software development process.
     • Metrics can be used to improve software productivity and quality.

  21. XML Document Metrics
     In the field of XML, quality assessment is also relevant, because the approach that engineers or end users follow to design the annotation schema, or even to mark up existing texts, is often improvised and naive. Concepts like well-formedness or validity are not sufficient to appraise XML documents. So a set of metrics was defined to form the basis for measuring the quality of an XML document.

  22. XML Document Metrics
     • Size
     • Structure Complexity
     • Structure Depth
     • Fan-in / Fan-out
     • Instability
     • Tree Impurity
     • Attributes per Element
     • Non-used Components
     • Text Length

  23. XML Document Metrics: Successor Graph (SG)
     Given a DTD, we say that each new component (element/attribute) is an immediate successor of the element under definition. We then introduce an arrow (directed edge) from the element to the component. Example:
       <!ELEMENT Item (FileName, Artist?)>
       <!ELEMENT FileName (#PCDATA)>
       <!ELEMENT Artist (#PCDATA)>
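
A minimal sketch of how such a graph could be built from the example above, assuming a simplified regex-based reading of the DTD: only ELEMENT declarations are handled (attributes declared via ATTLIST are ignored here), and the function name successor_graph is hypothetical.

```python
import re

# Matches "<!ELEMENT name content-model>", tolerating a stray space after "<".
ELEMENT_DECL = re.compile(r"<\s*!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")


def successor_graph(dtd_text):
    """Map each element name to the set of its immediate successors."""
    graph = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        tokens = re.findall(r"#?[\w.:-]+", model)
        # Drop keywords such as #PCDATA, EMPTY and ANY: they are not components.
        children = [t for t in tokens
                    if not t.startswith("#") and t not in ("EMPTY", "ANY")]
        graph.setdefault(name, set()).update(children)
        for child in children:
            graph.setdefault(child, set())      # make sure every node is listed
    return graph


DTD = """
<!ELEMENT Item (FileName, Artist?)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>
"""
graph = successor_graph(DTD)
node_count = len(graph)                                  # 3 nodes
edge_count = sum(len(succ) for succ in graph.values())   # 2 edges
```

For the three declarations of the example, Item gets two outgoing edges (to FileName and Artist), and the two leaf elements have none.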

  24. Successor Graph (Romeo and Juliet screenplay)

  25. XML Document Metrics: Size
     Given a DTD, its size (i.e. the value of this metric) is the total number of nodes in the SG (the number of DTD components).

  26. XML Document Metrics: Structure Complexity
     Here e is the number of edges in the SG, n is the number of nodes in the SG, and n_idref is the number of IDREF attributes.
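
The formula itself is not present in this transcript. One plausible reading, stated here purely as an assumption, is a McCabe-style cyclomatic number over the SG, with each IDREF attribute counted as one extra potential link:

```latex
\[
  \mathit{Compl}(SG) = e - n + 1 + n_{idref}
\]
```

Under that assumption, a DTD whose SG is a tree (e = n - 1) and which declares no IDREF attributes would have complexity 0; every extra edge or IDREF attribute adds 1.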

  27. XML Document Metrics: Structure Depth
     According to Meike Klettke, in Metrics for XML document collections, an SG with a depth much higher than 7 is complex and indicates poor DTD design.

  28. XML Document Metrics: Fan-in / Fan-out
     For the graph as a whole, the average and maximum values of these parameters help to spot unusual nodes, which can then be inspected to detect the anomaly and fix the problem. Elements with a high Fan-in/Fan-out value are more complex than elements with a lower value.
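
As a rough sketch, and assuming the usual graph reading (fan-in of a node is its number of incoming edges in the SG, fan-out its number of outgoing edges), the per-node values and the graph-wide average and maximum could be computed as follows. The function name and the sample graph are hypothetical.

```python
def fan_in_out(graph):
    """Return (fan_in, fan_out) dictionaries for a successor graph.

    `graph` maps each node to the set of its immediate successors.
    """
    fan_in = {node: 0 for node in graph}
    fan_out = {node: len(succ) for node, succ in graph.items()}
    for successors in graph.values():
        for child in successors:
            fan_in[child] = fan_in.get(child, 0) + 1
            fan_out.setdefault(child, 0)        # child listed only as a successor
    return fan_in, fan_out


# Hypothetical successor graph, for illustration only.
SG = {
    "PLAY": {"TITLE", "ACT"},
    "ACT": {"TITLE", "SCENE"},
    "SCENE": {"TITLE"},
    "TITLE": set(),
}
fan_in, fan_out = fan_in_out(SG)
avg_fan_in = sum(fan_in.values()) / len(fan_in)
max_fan_in = max(fan_in.values())
```

In this sample TITLE is the node that stands out: it is referenced by three other elements, well above the average fan-in.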

  29. XML Document Metrics: Instability
     A node with low instability is less dependent on other nodes, while many nodes depend on it.
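
The definition of the metric is not in the transcript. The description matches the common software-engineering definition of instability, fan-out divided by the sum of fan-in and fan-out, so the sketch below assumes that form. Both the formula and the convention of returning 0.0 for an isolated node are assumptions.

```python
def instability(fan_in, fan_out):
    """Assumed definition: fan_out / (fan_in + fan_out), in the range 0..1."""
    total = fan_in + fan_out
    return fan_out / total if total else 0.0    # isolated node: assumed 0.0


stable = instability(fan_in=3, fan_out=0)       # many dependents, no dependencies
unstable = instability(fan_in=0, fan_out=2)     # depends on others, nobody uses it
balanced = instability(fan_in=1, fan_out=1)
```

Under this reading, a value near 0 corresponds to the slide's description of a node that many others depend on, and a value near 1 to a node that mostly depends on others.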

  30. XML Document Metrics: Tree Impurity
     A tree impurity of 0% means that the graph is a tree, and a tree impurity of 100% means that it is a fully connected graph.
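
The slide's formula is missing from the transcript. A standard definition that satisfies both boundary cases stated here is 2(e - n + 1) / ((n - 1)(n - 2)), with edges counted as undirected; the sketch below assumes that definition, and the function name is hypothetical.

```python
def tree_impurity(nodes, edges):
    """Tree impurity in percent, assuming 2(e - n + 1) / ((n - 1)(n - 2)).

    Edges are counted as undirected. The measure is undefined for fewer
    than 3 nodes; returning 0.0 in that case is a choice made for this sketch.
    """
    if nodes < 3:
        return 0.0
    return 100.0 * 2 * (edges - nodes + 1) / ((nodes - 1) * (nodes - 2))


as_tree = tree_impurity(nodes=4, edges=3)       # a tree has n - 1 edges
complete = tree_impurity(nodes=4, edges=6)      # complete graph: n(n - 1)/2 edges
```

With 4 nodes, 3 edges give 0% (a tree) and 6 edges give 100% (every pair of nodes connected), matching the two cases on the slide.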

  31. XML Document Metrics: Attributes per Element
     The AttrsEle(DTD) metric gives the average number of attributes defined per element in the DTD. The AttrsEle(XML) metric, applied directly to the XML document, gives the average number of attributes actually used per element present in the XML document.
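
For the document-side metric, a minimal sketch with Python's xml.etree.ElementTree could look like this. The function name attrs_per_element_xml is hypothetical, and the sketch assumes that every element occurrence in the document counts once.

```python
import xml.etree.ElementTree as ET


def attrs_per_element_xml(xml_text):
    """Average number of attributes per element occurrence in the document."""
    elements = list(ET.fromstring(xml_text).iter())    # root and all descendants
    return sum(len(elem.attrib) for elem in elements) / len(elements)


# Three elements carrying 1, 2 and 0 attributes respectively.
average = attrs_per_element_xml('<a x="1"><b y="2" z="3"/><c/></a>')
```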

  32. XML Document Metrics: Non-used Components
     If Attr(DTD) represents the set of attributes defined in the DTD, and Attr(XML) represents the set of actual attributes (the attributes used in the XML document instance), then NonAttr(XML) is the set of non-used attributes, that is, the attributes in Attr(DTD) that do not appear in Attr(XML).
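
Read as a set difference, this could be sketched as below. The representation of an attribute as an (element, attribute) pair, the function names, and the sample data are all assumptions made for illustration; the declared set is passed in directly rather than parsed from a DTD.

```python
import xml.etree.ElementTree as ET


def used_attributes(xml_text):
    """Attr(XML): the (element, attribute) pairs that occur in the document."""
    return {(elem.tag, name)
            for elem in ET.fromstring(xml_text).iter()
            for name in elem.attrib}


def non_used_attributes(declared, xml_text):
    """NonAttr(XML): declared attributes that never occur in the document."""
    return set(declared) - used_attributes(xml_text)


# Hypothetical Attr(DTD) and document instance.
DECLARED = {("Item", "id"), ("Item", "lang")}
unused = non_used_attributes(DECLARED, '<Item id="1"><FileName>a</FileName></Item>')
```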

  33. XML Document Metrics: Text Length
     Here length(PCDATA) is the total length of the document's text (the sum of the lengths of all text fragments, i.e. text associated with element tags, or untagged text), and nPCDATA is the number of text fragments (the number of PCDATA leaves that appear in the XML document tree).
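
The two quantities suggest an average fragment length, length(PCDATA) divided by nPCDATA; the formula itself is not in the transcript, so that ratio is an assumption. The sketch below also assumes that whitespace-only fragments are ignored and that surrounding whitespace is stripped before measuring; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def text_length(xml_text):
    """Assumed metric: total text length divided by number of text fragments."""
    fragments = []
    for elem in ET.fromstring(xml_text).iter():
        # elem.text is the text before the first child; elem.tail is the
        # text that follows the element inside its parent (mixed content).
        for piece in (elem.text, elem.tail):
            if piece and piece.strip():
                fragments.append(piece.strip())
    if not fragments:
        return 0.0
    return sum(len(f) for f in fragments) / len(fragments)


# Two fragments, "abcd" and "ab": (4 + 2) / 2.
average_length = text_length("<a><b>abcd</b><c>ab</c></a>")
```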

  34. Metric Results (Romeo and Juliet screenplay)

  35. Conclusion
