Groovying XML - Part 1

By Pan Pantziarka

Introducing Groovy and XML

Younger readers may find this hard to believe, but there was a time before XML hell. There was a time of anticipation and hope, when the idea was that XML was going to solve just about every tricky problem in software development and then some. Problems with swapping data between applications or platforms? No problem, XML would fix that. Need to store complex data structures in a portable format? Hey, this XML stuff can sort that for you. Need a registry, dictionary, serialisation format, properties file… You had a problem and XML was the answer.

And growing up at around the same time was this programming language called Java. Partners in crime, Java and XML have advanced and moved on together, but if we're honest it's never been an easy partnership. You've only got to look at the number of parsers for XML common in the Java world: DOM, JDOM, DOM4J, XOM - and that's just the tree models. Processing XML in Java is a common enough requirement, but it can be extremely verbose, requiring plenty of boiler-plate code and all kinds of extraneous scaffolding that obscures what are often very straightforward functions.

Groovy, as befits a sprightly young scripting language for the Java Virtual Machine, can do XML in a way that is relatively free of bloat and allows the developer to focus on the real problem at hand. In this two-part tutorial we'll look at how Groovy can help in reading and writing XML documents. Instructions on installing and running Groovy can be had from the home page at http://groovy.codehaus.org/, or else look at the first part of our Groovy and SQL piece here at TechBookReport.

At some time or other we've probably all written code that just uses strings to write out a fragment of XML. It's cheap and clunky, but it saves having to instantiate a DOM tree or any of that fiddly stuff. Groovy's flexible strings allow us to do this very, very simply.

Here's how we create a fragment of XML that contains a list of names each in an element called hello, and each with an order attribute:

names=['john','bill','ted']
x=0
frag=''
names.each {
  x++
  frag+="""<hello order="$x">$it</hello>\n"""
}
print frag

Running this code in the GroovyConsole, or dumping it into a file called xml.groovy and running it from the command line produces the following output:

<hello order="1">john</hello>
<hello order="2">bill</hello>
<hello order="3">ted</hello>
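The manual counter in the listing above can be avoided. As a minimal sketch (an alternative phrasing, not the listing used in the rest of this tutorial), Groovy's eachWithIndex iterator hands the closure a zero-based index along with each element:

```groovy
def names = ['john', 'bill', 'ted']

// eachWithIndex passes a zero-based index alongside each element,
// so no separate counter variable is needed.
def frag = ''
names.eachWithIndex { name, i ->
    frag += "<hello order=\"${i + 1}\">${name}</hello>\n"
}

print frag
```

Either way, the values are pasted into the string as they are, so a name containing & or < would need escaping by hand.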

But that's only a bunch of strings. There's no XML declaration and no root element, so as it stands it can't be processed as an XML document. If we do need to turn this into a DOM tree, we can do that easily enough too. Let's make our hello elements children of a root element called names and then turn it into a DOM tree which we can interrogate:

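// Wrap the fragment in a root element and parse it into a DOM document
def s=new StringReader("<names>\n" + frag + "</names>")
def xmldoc=groovy.xml.DOMBuilder.parse(s)
def names=xmldoc.documentElement

println names 

// DOMCategory adds property-style navigation to DOM nodes inside this block
use (groovy.xml.dom.DOMCategory){
    println names.hello[0].text()                     // text of the first hello element
    println names.hello.size()                        // number of hello elements
    println names.hello[1].attributes.item(0).text()  // first attribute of the second hello
}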

This code spools out our well-formed XML and the results of some simple tree walking:

<?xml version="1.0" encoding="UTF-8"?>
<names>
<hello order="1">john</hello>
<hello order="2">bill</hello>
<hello order="3">ted</hello>
</names>

john
3
2
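Fetching an attribute by position, as attributes.item(0) does, only works while the element has a single attribute. As a hedged sketch of an alternative, DOMCategory also lets us read an attribute by name using an '@' prefix. The XML here is typed in by hand so the snippet stands alone, and the variable names root and second are ours:

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// The same document as above, written out literally for a standalone example
def xml = '''<names>
<hello order="1">john</hello>
<hello order="2">bill</hello>
<hello order="3">ted</hello>
</names>'''

def root = DOMBuilder.parse(new StringReader(xml)).documentElement

def second
use(DOMCategory) {
    // '@order' looks the attribute up by name rather than by position
    second = root.hello[1].'@order'
}

println second
```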

It should be clear by now that the combination of Groovy's convenience methods (from the groovy.xml.* libraries), iterators and powerful string handling makes for very succinct code for creating XML from relatively straightforward data. The equivalent Java code would take a lot more typing and certainly wouldn't be half as readable or easy to follow. Still, so far there's nothing tremendously taxing here.

However, Groovy includes a very handy feature for handling tree structures: builders. In the same way that Groovy has excellent built-in support for collections - lists and maps - it also comes with support for trees. Builders are perfect for all kinds of tree structures, from HTML to GUI elements to XML.

For a real world example let's say we have a nested structure that we want to export to XML. It could be the results of a query to a MySQL or Apache Derby database, the data from a class hierarchy or some other source. In our example the data relates to a simple personnel database, with a record for each person. We store this as follows:

pers=["john":[surname:"smith",age:37,gender:'m',children:2],
      "jill":[surname:"jones",age:28,gender:'f',children:0]
      ]

Before we dive into the builder, let's remind ourselves of what we can do with Groovy's iterators and closures:

pers.each {name, data ->
  println name + ' ' + data['surname'] + ' is ' + data['age'] + ' years old'
}

That single statement iterates through each person record, mapping the key to the name variable, and the map of data to the variable we've cleverly labelled data. We can then address the contents of the data map directly by name. Running it gives the following on the command line:

john smith is 37 years old
jill jones is 28 years old

We're going to do something similar using a groovy.xml.MarkupBuilder object, as shown below:

s_xml=new StringWriter()
builder=new groovy.xml.MarkupBuilder(s_xml)
// people is the root element; each person record becomes a child element
people=builder.people{
  pers.each{ name, data ->
    person(first_name:name, surname:data['surname']){
      age(data['age'])
      gender(data['gender'])
      children('count':data['children'])
    }
  }
}

println s_xml

This clever little bit of code creates a builder object which writes its data to the StringWriter variable called s_xml. The builder takes a closure that walks our data source, pers, using the each iterator as in the previous example. The magic is in the pers.each closure. Here we use a set of pseudo-methods called person, age, gender and children. These are all turned into XML elements: named arguments to a pseudo-method become attributes of the element, while a plain argument becomes its text content. If we run the above code we can see the results clearly enough:

<people>
  <person first_name='john' surname='smith'>
    <age>37</age>
    <gender>m</gender>
    <children count='2' />
  </person>
  <person first_name='jill' surname='jones'>
    <age>28</age>
    <gender>f</gender>
    <children count='0' />
  </person>
</people>
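To make the attribute and text rules concrete, here is a small sketch with made-up data. The element names employer and child and all the values are ours, chosen purely for illustration. It also shows one advantage over pasting strings together: the builder escapes special characters in text for us.

```groovy
import groovy.xml.MarkupBuilder

def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

builder.person(id: 7) {
    surname('smith')               // plain argument: text content
    employer('Smith & Sons')       // the & is escaped as &amp; in the output
    child(age: 5, 'sam')           // named argument: attribute; plain argument: text
}

println writer
```

A single pseudo-method call can therefore carry both attributes and text, which covers most simple element shapes without any extra code.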

No wonder the language is called Groovy. We can even spool that out to file in a few lines of code as well:

def fw= new FileWriter('pers.xml')
// Write the XML declaration first, then the markup produced by the builder
fw.write('<?xml version="1.0"?>\n')
fw.write(s_xml.toString())
fw.close()
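As a hedged alternative, Groovy's withWriter method on File closes the file for us, even if a write fails, and lets us name the encoding explicitly. In this sketch the markup is a stand-in string and the output goes to a temporary file, both assumptions made so that the snippet runs on its own without overwriting anything:

```groovy
// Stand-in for the markup produced by the builder (an assumption for this sketch)
def xml = '<people />'

// A temporary file keeps the example from overwriting a real pers.xml
def file = File.createTempFile('pers', '.xml')
file.deleteOnExit()

// withWriter opens the file, runs the closure, then closes the writer
file.withWriter('UTF-8') { w ->
    w.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    w.write(xml)
}

println file.getText('UTF-8')
```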

Anyone who's ever had to write code to get complex, hierarchical data out into XML will recognise that this is a very easy and natural way to go about navigating through the data and organising it into the required format.

In part two of this series we'll be turning our attention to the other side of the equation - reading in XML and querying or transforming it.
