XML London 2014: XML Processing in Scala. William Narmontas and Dino Fancellu, www.scala.contractors. Dino Fancellu: 35 years in IT (Scala, Java, XML). William Narmontas: 10 years in IT (Scala, XML, Web). What is Scala? Scala processes XML fast. It is powerful, modular, concise and functional.
XML LONDON 2014
XML Processing in Scala
William Narmontas, Dino Fancellu
www.scala.contractors
Dino Fancellu: 35 years IT, Scala • Java • XML
William Narmontas: 10 years IT, Scala • XML • Web
What is Scala?
Modular • Concise • Functional • Type-safe • Performant • Object-oriented • Strongly-typed • Statically-typed • Unopinionated • Composable • Java-interoperable • First-class XML
Who uses Scala?
Apple, Bank of America, Barclays, BBC, BSkyB, Cisco, Citigroup, Credit Suisse, eBay, eHarmony, EDF, FourSquare, Gawker, HSBC, ITV, Klout, LinkedIn, Morgan Stanley, Netflix, Novell, Rackspace, Sky, Sony, Springer, The Guardian, TomTom, Trafigura, Tumblr, Twitter, UBS, VMware, Xerox
Projects in Scala
- Less code to write = less to maintain
- Communication clearer
- Testing easier
- Software robust
- Time to market: fast
- Happier developers
Values

XQuery:
let $conferenceName := "XML London 2014"

Scala:
val conferenceName = "XML London 2014"

Scala (Mutable):
var conferenceName = "XML London 2014"
conferenceName = "XML London 2015"
Strings

val language = "Scala"

s"XML Processing in $language"
| XML Processing in Scala

s"""An introduction to:
   |The "$language" programming language""".stripMargin
| An introduction to:
| The "Scala" programming language

s"$language has ${language.length} chars in its name"
| Scala has 5 chars in its name
Functions

XQuery:
declare function local:fun($x as xs:integer, $y as xs:double) as xs:string {
  concat($x, ": ", $y)
};

Scala:
def fun(x: Int, y: Double) = s"$x: $y"
Everything is an expression

val trainSpeed = if (train.speed.mph >= 60) "Fast" else "Slow"

def divide(numerator: Int, denominator: Int) =
  try { s"${numerator / denominator}" }
  catch {
    case _: java.lang.ArithmeticException =>
      s"Cannot divide $numerator by $denominator"
  }
Types: Explicit

def withTitle(name: String, title: String): String = s"$title. $name"

val x: Int = {
  val y = 1000
  100 + y
}
| x: Int = 1100
Functions: named parameters

Further clarity in method calls:

def makeLink(url: String, text: String) = s"""<a href="$url">$text</a>"""

makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
| <a href="http://www.xmllondon.com">XML London 2014</a>
Functions: default parameters

Reduce repetition in method calls:

def withTitle(name: String, title: String = "Mr") = s"$title. $name"

withTitle("John Smith")
| Mr. John Smith

withTitle("Mary Smith", "Miss")
| Miss. Mary Smith
Functional

def incrementedByOne(x: Int) = x + 1

(1 to 5).map(incrementedByOne)
| Vector(2, 3, 4, 5, 6)
Lambdas

(1 to 5).map(x => x + 1)
| Vector(2, 3, 4, 5, 6)

(1 to 5).map(_ + 1)
| Vector(2, 3, 4, 5, 6)
For comprehensions

for { x <- (1 to 5) } yield x + 1
| Vector(2, 3, 4, 5, 6)
Implicit classes: Enrich types

implicit class stringWrapper(str: String) {
  def wrapWithParens = s"($str)"
}

"Text".wrapWithParens
| (Text)
Powerful features for scalability
- Case classes
- Traits
- Partial functions
- Pattern matching
- Implicits
- Flexible Syntax
- Generics
- User defined operators
- Call-by-name
- Macros
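A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not taken from the talk: `FeaturesDemo`, `Shape`, `Circle`, `Rect`, `area`, `describeSquare` and `unless` are hypothetical names invented for the example.

```scala
object FeaturesDemo {
  // Trait: a common interface. `sealed` lets the compiler check matches for exhaustiveness.
  sealed trait Shape

  // Case classes: immutable data with equality and pattern-matching support built in.
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Pattern matching: deconstructs the case classes by shape.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Partial function: defined only for rectangles whose sides are equal.
  val describeSquare: PartialFunction[Shape, String] = {
    case Rect(w, h) if w == h => s"Square of side $w"
  }

  // Call-by-name: `body` is evaluated only if the condition is false.
  def unless[A](condition: Boolean)(body: => A): Option[A] =
    if (condition) None else Some(body)
}
```

For instance, `FeaturesDemo.area(FeaturesDemo.Rect(2.0, 3.0))` evaluates to `6.0`, and `FeaturesDemo.describeSquare.lift` turns the partial function into one that returns an `Option`.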
Values: Inline XML

val url = "http://www.xmllondon.com"
val title = "XML London 2014"
val xmlTree = <div>
  <p>Welcome to <a href={url}>{title}</a>!</p>
</div>
| xmlTree: scala.xml.Elem =
| <div>
|   <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
| </div>
XML Lookups

val listOfPeople = <people>
  <person>Fred</person>
  <person>Ron</person>
  <person>Nigel</person>
</people>

listOfPeople \ "person"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)

listOfPeople \ "_"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
XML Lookups

val fact = <fact type="universal">
  <variable>A</variable> = <variable>A</variable>
</fact>

fact \\ "variable"
| NodeSeq(<variable>A</variable>, <variable>A</variable>)

fact \ "@type"
| :scala.xml.NodeSeq = universal

fact \@ "type"
| :String = universal
XML Loading

val pun = """<pun rating="extreme">
            |  <question>Why do CompSci students need glasses?</question>
            |  <answer>To C#<!-- C# is Microsoft's programming language -->.</answer>
            |</pun>""".stripMargin

scala.xml.XML.loadString(pun)
| <pun rating="extreme">
|   <question>Why do CompSci students need glasses?</question>
|   <answer>To C#.</answer>
| </pun>
Collections: expressive

val root = <numbers>
  {for {i <- 1 to 10} yield <number>{i}</number>}
</numbers>
val numbers = root \ "number"

numbers(0)
| <number>1</number>

numbers.head
| <number>1</number>

numbers.last
| <number>10</number>

numbers take 3
| NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)
Collections: expressive

numbers filter (_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)

numbers(_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)

numbers maxBy (_.text)
| <number>9</number>

numbers maxBy (_.text.toInt)
| <number>10</number>

numbers.reverse
| NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>,
|   <number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)

numbers.groupBy(_.text.toInt % 3)
| Map(
|   2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
|   1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
|   0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))
XML Methods: a rich API

% :+ aggregate attributes combinations copyToArray diff dropWhile flatMap foreach head init isInstanceOf lastIndexOfSlice map mkString padTo prefixLength reduceRight runWith segmentLength sortWith strict_== takeRight toBuffer toSeq transpose withFilter zipAll
++ :\ andThen buildString companion copyToBuffer distinct endsWith flatten genericBuilder headOption inits isTraversableAgain lastIndexWhere max nameToString par product reduceRightOption sameElements seq sorted stringPrefix takeWhile toIndexedSeq toSet union xmlType zipWithIndex
++: \ apply canEqual compose corresponds doCollectNamespaces exists fold getNamespace indexOf intersect iterator lastOption maxBy namespace partition reduce repr scan size span sum text toIterable toStream unzip xml_!=
+: \@ applyOrElse child contains count doTransform filter foldLeft groupBy indexOfSlice isAtom label length min nonEmpty patch reduceLeft reverse scanLeft slice splitAt tail theSeq toIterator toString unzip3 xml_==
/: \\ asInstanceOf collect containsSlice descendant drop filterNot foldRight grouped indexWhere isDefinedAt last lengthCompare minBy nonEmptyChildren permutations reduceLeftOption reverseIterator scanRight sliding startsWith tails to toList toTraversable updated xml_sameElements
/:\ addString attribute collectFirst copy descendant_or_self dropRight find forall hasDefiniteSize indices isEmpty lastIndexOf lift minimizeEmpty orElse prefix reduceOption reverseMap scope sortBy strict_!= take toArray toMap toVector view zip
For-comprehensions: similar to XQuery

XQuery:
<bib>{
  for $b in $xml/book
  let $year := $b/@year
  where $b/publisher = "Addison-Wesley" and $year > 1991
  return <book year="{ $year }"> { $b/title } </book>
}</bib>

Scala:
<bib>{
  for {
    b <- xml \ "book"
    year = b \@ "year"
    if b \ "publisher" === "Addison-Wesley" && year > 1991
  } yield <book year={ year }> { b \ "title" } </book>
}</bib>

Nice! ... yet is general purpose
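The Scala version on the slide appears to rely on helpers that are not shown: as far as we know, plain scala-xml has no `===` on `NodeSeq`, and `b \@ "year"` yields a `String`, which cannot be compared with `1991` directly. A minimal sketch of the same query using only `.text` and `.toInt` follows. It assumes the scala-xml module is on the classpath; `BibQuery` and the sample `<bib>` data are made up for illustration.

```scala
import scala.xml.Elem

object BibQuery {
  // Hypothetical sample data, shaped like the input the slide's query expects.
  val xml: Elem =
    <bib>
      <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
      <book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
      <book year="1990"><title>An Older Book</title><publisher>Addison-Wesley</publisher></book>
    </bib>

  // Same structure as the slide: generator, mid-stream binding, guard, yield.
  val result: Elem =
    <bib>{
      for {
        b <- xml \ "book"
        year = (b \@ "year").toInt                      // attribute text converted explicitly
        if (b \ "publisher").text == "Addison-Wesley" && year > 1991
      } yield <book year={ year.toString }>{ b \ "title" }</book>
    }</bib>
}
```

With this sample data, `BibQuery.result` keeps the 1994 and 1992 Addison-Wesley books and drops the other two.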
Hybrid XML
- XQuery for Scala
- javax.xml.* for free
  - Look up: XPath
  - Transform: XSLT
  - Stream: StAX
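Because Scala runs on the JVM, the JDK's XML APIs can be called directly, with no extra library. The sketch below shows an XPath lookup using only `javax.xml` and the W3C DOM; `JdkXPath` and `select` are hypothetical names chosen for this example.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object JdkXPath {
  // Parse an XML string into a DOM, evaluate the XPath expression,
  // and return the text content of every matching node.
  def select(xmlText: String, expression: String): List[String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xmlText)))
    val nodes = XPathFactory.newInstance().newXPath()
      .evaluate(expression, doc, XPathConstants.NODESET)
      .asInstanceOf[NodeList]
    // NodeList is a Java type, so index over it to build a Scala List.
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }
}
```

Calling `JdkXPath.select(xml, "/widgets/widget[not(@id)]")` on a `<widgets>` document returns the text of the widgets that have no `id` attribute.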
XQuery for Scala (XQS)
- Wraps XQuery API for Java (javax.xml.xquery)
- Scala access to XQuery in:
  - MarkLogic, BaseX, Saxon, Sedna, eXist, …
- Converts DOM to Scala XML & vice versa
- http://github.com/fancellu/xqs
XQuery via XQS

val widgets = <widgets>
  <widget>Menu</widget>
  <widget>Status bar</widget>
  <widget id="panel-1">Panel</widget>
  <widget id="panel-2">Panel</widget>
</widgets>

import com.felstar.xqs.XQS._
val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
| NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
|   <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)
XPath

import com.felstar.xqs.XQS._
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList

val widgets = <widgets>
  <widget>Menu</widget>
  <widget>Status bar</widget>
  <widget id="panel-1">Panel</widget>
  <widget id="panel-2">Panel</widget>
</widgets>

val xpath = XPathFactory.newInstance().newXPath()
val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets), XPathConstants.NODESET).asInstanceOf[NodeList]

(nodes: NodeSeq)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)

Natively in Scala:

(widgets \ "widget")(widget => (widget \ "@id").isEmpty)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
XSLT

import com.felstar.xqs.XQS._
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult

val peopleXml = <people>
  <john>Hello, John.</john>
  <smith>Smith is here.</smith>
  <another>Hello.</another>
</people>

val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="john">
    <xsl:copy>Hello, John.</xsl:copy>
  </xsl:template>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

val xmlResultResource = new java.io.StringWriter()
val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
xmlResultResource.getBuffer
| <?xml version="1.0" encoding="UTF-8"?><people>
|   <john>Hello, John.</john>
|   <smith>Smith is here.</smith>
|   <another>Hello.</another>
| </people>
XML Stream Processing

// 4GB file, comes back in a second
val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader)

implicit class XMLEventIterator(ev: XMLEventReader) extends scala.collection.Iterator[XMLEvent] {
  def hasNext = ev.hasNext
  def next = ev.nextEvent()
}

er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
  case (ev, idx) => println(s"${idx + 1}:\t$ev")
}
src.close()
| 1: <feed>
| 2:
|
| 3: <doc>
| 4:
|
| 5: <title>
| 6: Wikipedia: Anarchism
| 7: </title>
| 8:
|
| 9: <url>
| 10: http://en.wikipedia.org/wiki/Anarchism
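The streaming slide needs network access and a multi-gigabyte download. The sketch below applies the same iterator adapter to an in-memory string so it can be run offline. `StaxDemo` and `titles` are hypothetical names, and any input document passed to it is the caller's own; only JDK StAX classes are used.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent

object StaxDemo {
  // Adapter from the slide: exposes a StAX event reader as a Scala Iterator.
  implicit class XMLEventIterator(ev: XMLEventReader) extends Iterator[XMLEvent] {
    def hasNext: Boolean = ev.hasNext
    def next(): XMLEvent = ev.nextEvent()
  }

  // Collect the text of every <title> element, pulling events one at a time.
  def titles(xmlText: String): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xmlText))
    try {
      reader.collect {
        // getElementText must be called right after the start element was read,
        // which is the case here because collect applies this function per event.
        case e if e.isStartElement && e.asStartElement.getName.getLocalPart == "title" =>
          reader.getElementText
      }.toList // force evaluation before the reader is closed
    } finally reader.close()
  }
}
```

Because events are pulled lazily, combinators such as `take` or `dropWhile` stop reading as soon as enough events have been seen, which is presumably why the slide's example returns quickly on a very large file.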
Use Cases
- Data extraction
- Serving XML via REST
- Dynamically generated XSLT
- Interfacing with XML databases
- Flexibility to choose the best tool for the job
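One way the "dynamically generated XSLT" use case could look is sketched below, using only the JDK's `javax.xml.transform`. This is an illustration under stated assumptions, not the authors' implementation: `GeneratedXslt`, `stylesheetFor` and `transform` are hypothetical names, the stylesheet is declared as XSLT 1.0, and the element name and greeting are assumed to be trusted values that need no escaping.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.stream.{StreamResult, StreamSource}

object GeneratedXslt {
  // Build an identity stylesheet plus one template generated from the arguments.
  // Assumption: elementName is a valid XML name and greeting contains no markup.
  def stylesheetFor(elementName: String, greeting: String): String =
    s"""<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
       |  <xsl:template match="$elementName">
       |    <xsl:copy>$greeting</xsl:copy>
       |  </xsl:template>
       |  <xsl:template match="node()|@*">
       |    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
       |  </xsl:template>
       |</xsl:stylesheet>""".stripMargin

  // Apply a stylesheet (given as text) to an XML document (given as text).
  def transform(xmlText: String, xslt: String): String = {
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(xslt)))
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(xmlText)), new StreamResult(out))
    out.toString
  }
}
```

A call such as `GeneratedXslt.transform(doc, GeneratedXslt.stylesheetFor("john", "Hello, John."))` rewrites every `<john>` element and copies everything else unchanged. Building the stylesheet as a Scala XML literal instead of a string would avoid the escaping assumption.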
Excellent Ecosystem
SBT • Akka • Spark • Spray • Specs • scalaz • shapeless • scala-xml • Scaladin • ScalaTest • macro-paradise • scala-maven-plugin • JVM
Conclusion
- Practical
- Practical for XML processing
Where do I start?
- atomicscala.com
- typesafe.com/activator
- scala-lang.org
- scala-ide.org
- IntelliJ
Matt Stephens, Charles Foster
Open to consulting
www.scala.contractors
Follow us on Twitter: @DinoFancellu @ScalaWilliam @MaffStephens