
TweaXML

TweaXML: a language to manipulate and extract data from XML files. Kaushal Kumar (kk2457), Srinivasa Valluripalli (sv2232). Contents: overview and motivation, language features, XML handling functionalities, architectural design, tutorial (with example), lessons learned, summary.


Presentation Transcript


  1. TweaXML: A Language to manipulate & extract data from XML files
     Kaushal Kumar (kk2457), Srinivasa Valluripalli (sv2232)

  2. Contents
     • Overview and motivation
     • Language features
     • XML handling functionalities
     • Architectural Design
     • Tutorial (with example)
     • Lessons learned
     • Summary

  3. Overview and Motivation
     • TweaXML is a language to parse and extract data from XML files and create new csv/txt files in user-defined data formats.
     • XML is a widely used format for passing data between heterogeneous systems.
     • But parsing an XML file programmatically is not straightforward.
     • To parse an XML file:
         • First you need to learn a general-purpose language (Java, for example).
         • Then you need to learn APIs like DOM-Parser and SAX-Parser.
         • Using these APIs can be complicated.
     • TweaXML provides a much simpler language to parse XML files. Moreover, it provides a way to create output files containing this data in user-defined formats.
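
To give a feel for the DOM-style API the slide refers to, here is a minimal sketch in Python. It uses the standard library's `xml.dom.minidom` as a stand-in for the Java DOM parser mentioned above; the function name `midterm_by_student` and the inline sample document are assumptions made for illustration only, not part of the presentation.

```python
from xml.dom import minidom

# Hypothetical sample input, shaped like the tutorial data later in the deck.
XML_TEXT = (
    "<students>"
    "<student><name>kaushal</name><midterm>70</midterm></student>"
    "</students>"
)


def midterm_by_student(xml_text):
    """Return {name: midterm mark} using a DOM-style API."""
    document = minidom.parseString(xml_text)
    marks = {}
    for student in document.getElementsByTagName("student"):
        name_element = student.getElementsByTagName("name")[0]
        midterm_element = student.getElementsByTagName("midterm")[0]
        # In the DOM, an element's text lives in a child text node,
        # not on the element itself, hence the extra firstChild step.
        name = name_element.firstChild.data
        marks[name] = int(midterm_element.firstChild.data)
    return marks


print(midterm_by_student(XML_TEXT))
```

Even for one field per student, the caller has to know about documents, element lists, and text nodes, which is the overhead TweaXML aims to hide behind `getchild` and `getvalue`.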

  4. Language Features
     • Carefully chosen set of keywords
     • Multiple types (int, string, node, file, array)
     • Several operators
         • Unary operators (~, !)
         • Arithmetic operators (+, -, *, /)
         • Comparison operators (<, <=, >, >=, ==, !=)
         • Logical operators (&&, ||)
         • Node operators (getchild, getvalue)
         • File operators (open, create, print, close)
     • Inbuilt functions (add, subtract, multiply, divide, length)

  5. Language Features (cont)
     • Various types of statements
         • Conditional statements (if … else)
         • Iterative statements (while)
         • Jump statements (return, continue, break)
         • I/O statements (open, create, print, close)
         • Inbuilt function calls (add, subtract, multiply, divide, length)

  6. XML Handling functionalities
     • Open an XML file to read (open)
         • returns the root node of the XML file
     • Get the child nodes of a node, using the XPath of the child nodes (getchild)
         • returns an array of child nodes
     • Get the length of the child-nodes array (length)
     • Get the value of a node (getvalue)
         • returns the value of the node in string format
     • Add the values of two nodes (add)
         • implicit checks of data types
     • Subtract the values of two nodes (subtract)
     • Multiply the values of two nodes (multiply)
     • Divide the values of two nodes (divide)
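
The operations on this slide can be approximated in Python to make their behaviour concrete. This is only a sketch built on `xml.etree.ElementTree`, not the TweaXML runtime: `open_xml` parses a string rather than a file, and the exact type check and number formatting in `add` and `divide` are assumptions, since the slides only say that values are strings and that types are checked implicitly.

```python
import xml.etree.ElementTree as ET


def open_xml(xml_text):
    """Stand-in for `open`: parse XML (a string here) and return the root node."""
    return ET.fromstring(xml_text)


def getchild(node, path):
    """Stand-in for `getchild`: return the list of child nodes matching a path."""
    return node.findall(path)


def length(nodes):
    """Stand-in for `length`: number of nodes in the array."""
    return len(nodes)


def getvalue(node):
    """Stand-in for `getvalue`: the node's text as a string."""
    return (node.text or "").strip()


def _to_number(value):
    # Assumed form of the "implicit check of data types": reject non-numeric text.
    try:
        return float(value)
    except ValueError:
        raise TypeError(f"not a numeric value: {value!r}")


def _to_string(number):
    # Assumed formatting: whole numbers print without a decimal point.
    return str(int(number)) if number.is_integer() else str(number)


def add(left, right):
    return _to_string(_to_number(left) + _to_number(right))


def divide(left, right):
    return _to_string(_to_number(left) / _to_number(right))


root = open_xml("<student><mark>85</mark><mark>70</mark></student>")
marks = getchild(root, "mark")
print(length(marks), add(getvalue(marks[0]), getvalue(marks[1])))
```

`subtract` and `multiply` would follow the same pattern as `add`.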

  7. File Handling functionalities
     • Create an output file to write (create)
         • returns a value of the file type
     • Write to the file (print)
     • Close the output file once you are done (close)
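
The create / print / close cycle maps onto ordinary file handling. The sketch below mirrors it in Python; the helper names `create`, `print_to`, `close`, and `demo` are illustrative assumptions. It assumes that `print` writes its argument without appending a newline, which is consistent with the tutorial program printing "\n" explicitly.

```python
import os
import tempfile


def create(path):
    """Stand-in for `create`: open a new output file and return its handle."""
    return open(path, "w", encoding="utf-8")


def print_to(handle, text):
    """Stand-in for `print`: write text as-is, with no newline added."""
    handle.write(text)


def close(handle):
    """Stand-in for `close`: flush and release the output file."""
    handle.close()


def demo():
    # Write one record into a temporary directory and read it back.
    path = os.path.join(tempfile.mkdtemp(), "AvgMarks.csv")
    output = create(path)
    for piece in ("kaushal", "\t", "82.5", "\n"):
        print_to(output, piece)
    close(output)
    with open(path, encoding="utf-8") as written:
        return written.read()


print(repr(demo()))
```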

  8. Architectural Design
     • Front end (TweaXMLLexer & TweaXMLParser)
     • Tree walker (TweaXmlWalker & TweaXmlCodeGen)
     • Back end (CodeGen.java)
     • Run-time libraries (Apache’s DOM Parser)

  9. Tutorial - Example
     A TweaXML program to extract students' performance data and create a csv file with the average marks of each student.

     Input XML file (marks_data.xml):

         <students>
             <student>
                 <name>kaushal</name>
                 <homework1>85</homework1>
                 <homework2>85</homework2>
                 <midterm>70</midterm>
                 <final>90</final>
             </student>
             <student>
                 <name>Srini</name>
                 <homework1>80</homework1>
                 <homework2>85</homework2>
                 <midterm>87</midterm>
                 <final>95</final>
             </student>
             … …
         </students>

  10. TweaXML program:

          start(){
              file output;
              node rootNode;
              output = create "AvgMarks.csv";
              rootNode = open "marks_data.xml";
              node studentNodes[];
              studentNodes = getchild rootNode "student";
              int len;
              len = length studentNodes;
              if(len > 0) {
                  int j;
                  j=0;
                  while(j < len) {
                      node nameNode[], homework1Node[], homework2Node[], midtermNode[], finalNode[];
                      string name, homework1Marks, homework2Marks, midtermMarks, finalMarks;
                      nameNode = getchild studentNodes[j] "name";
                      homework1Node = getchild studentNodes[j] "homework1";
                      homework2Node = getchild studentNodes[j] "homework2";
                      midtermNode = getchild studentNodes[j] "midterm";
                      finalNode = getchild studentNodes[j] "final";

  11.                 name = getvalue nameNode[0];
                      homework1Marks = getvalue homework1Node[0];
                      homework2Marks = getvalue homework2Node[0];
                      midtermMarks = getvalue midtermNode[0];
                      finalMarks = getvalue finalNode[0];
                      string totalMarks;
                      totalMarks = add homework1Marks homework2Marks;
                      totalMarks = add totalMarks midtermMarks;
                      totalMarks = add totalMarks finalMarks;
                      string avgMarks;
                      avgMarks = divide totalMarks "4";
                      print output name;
                      print output "\t";
                      print output avgMarks;
                      print output "\n";
                      j = j + 1;
                  }
              }
              close output;
          }

  12. Output
      Output file (AvgMarks.csv):

          kaushal	82.5
          Srini	86.75
          … …
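
As a cross-check of the tutorial, the same extraction can be sketched in Python with `xml.etree.ElementTree`. This is an illustrative equivalent, not code generated by TweaXML: the function name `average_marks` is an assumption, the input is limited to the two students shown on the slide (the elided entries are left out), and the result is returned as a string instead of being written to AvgMarks.csv.

```python
import io
import xml.etree.ElementTree as ET

# Only the two students visible on the input slide are included here.
MARKS_XML = """
<students>
  <student>
    <name>kaushal</name>
    <homework1>85</homework1>
    <homework2>85</homework2>
    <midterm>70</midterm>
    <final>90</final>
  </student>
  <student>
    <name>Srini</name>
    <homework1>80</homework1>
    <homework2>85</homework2>
    <midterm>87</midterm>
    <final>95</final>
  </student>
</students>
"""

MARK_FIELDS = ("homework1", "homework2", "midterm", "final")


def average_marks(xml_text):
    """Return tab-separated 'name<TAB>average' lines, one per student."""
    root = ET.fromstring(xml_text)
    output = io.StringIO()
    for student in root.findall("student"):
        name = student.findtext("name")
        total = sum(float(student.findtext(field)) for field in MARK_FIELDS)
        # Same layout as the TweaXML program: name, tab, average, newline.
        output.write(f"{name}\t{total / len(MARK_FIELDS)}\n")
    return output.getvalue()


print(average_marks(MARKS_XML), end="")
```

The averages agree with the slide: (85 + 85 + 70 + 90) / 4 = 82.5 and (80 + 85 + 87 + 95) / 4 = 86.75.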

  13. Lessons Learned
      • Start early on the project
      • More functionalities could have been added
      • More data types could have been provided
      • User-defined functions could have been added

  14. Summary
      • TweaXML provides an easier way to deal with XML files.
      • Data can be extracted and written out in user-defined formats.
      • No need to learn APIs like DOMParser and SAXParser.
      • It’s not perfect, but it’s highly useful.
      • More functionalities could have been provided if given more time.
