Thursday, March 04, 2010

Tidy Microsoft XML on Linux

I am currently working on a project that involves a lot of XML checking and coding. XML is a very structured language and is normally easy to read; the issue is that I have to work with the XML from a docx document on a Linux machine. The Microsoft Office Open XML format is hard to read because all the information in the document is stored without line breaks. After testing several tools to tidy the XML, I found that the best option (in my opinion) is xmlindent, an XML stream reformatter created by Pekka Enberg and documented by Thomas Fischer.

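Below you can see an example of how to use it to format Microsoft Office Open XML from a Linux command line. xmlindent reads the XML from standard input, -o names the output file, and -f forces a newline on elements without children: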


jolouwer@clusterbox056:/tmp$
jolouwer@clusterbox056:/tmp$ cat test1.xml | xmlindent -o test1.new.xml -f
jolouwer@clusterbox056:/tmp$
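
If xmlindent is not installed, a rough equivalent can be sketched with Python's standard library. This is not the tool used above, only an alternative under some assumptions: a docx file is a ZIP archive whose main body is normally stored as word/document.xml, and the helper names pretty_docx_xml and make_sample_docx, as well as the tiny sample document, are hypothetical and exist only for illustration.

```python
import io
import zipfile
from xml.dom import minidom


def pretty_docx_xml(docx_file, member="word/document.xml", indent="  "):
    """Return an indented copy of one XML part stored inside a .docx archive.

    docx_file may be a path or a file-like object. The default member name
    is an assumption: it is where the main document body usually lives.
    """
    with zipfile.ZipFile(docx_file) as archive:
        raw = archive.read(member)
    # minidom parses the whole part into memory and re-serialises it
    # with one element per line.
    return minidom.parseString(raw).toprettyxml(indent=indent)


def make_sample_docx():
    """Build a tiny in-memory stand-in for a .docx file (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "word/document.xml",
            '<w:document xmlns:w="urn:example"><w:body><w:p><w:r>'
            "<w:t>Hello</w:t></w:r></w:p></w:body></w:document>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(pretty_docx_xml(make_sample_docx()))
```

With a real file you would pass its path instead, for example pretty_docx_xml("report.docx"), where report.docx is a placeholder name. The result is meant for reading only: the added indentation changes whitespace, so it should not be packed back into the docx.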