was kind enough to outline the basic steps needed:
So the general idea is to use StAX to read the first element of the XML file, determine the version of the file from that element, and then act on it appropriately. We had this PortalDataKey concept that we would parse out of each XML file. From that we'd first look in a Map<PortalDataKey, DataUpgrader> to see if there was a registered DataUpgrader implementation for that version of the XML file. If there was, we would run it on the XML file, then re-parse the first element and look for another DataUpgrader. Once we didn't find a DataUpgrader, we'd look up the DataImporter mapped to the PortalDataKey for the current version of the XML and run it on the contents.
This is a much more generalized solution, as it was designed to batch parse, update and import potentially millions of XML files across 20+ different schemes from 4+ versions.
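A minimal sketch of that lookup loop, to make the control flow concrete. All names here (UpgradeChain, the String keys, the keyReader function) are hypothetical stand-ins for illustration, not uPortal's actual PortalDataKey, DataUpgrader or DataImporter types, and the XML is passed around as a plain String to keep the example short:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch: keys are plain Strings such as "portlet-definition:3.2".
final class UpgradeChain {
    private final Function<String, String> keyReader; // reads the type/version key from a document
    private final Map<String, UnaryOperator<String>> upgraders = new HashMap<>();
    private final Map<String, Consumer<String>> importers = new HashMap<>();

    UpgradeChain(Function<String, String> keyReader) {
        this.keyReader = keyReader;
    }

    void registerUpgrader(String key, UnaryOperator<String> upgrader) {
        upgraders.put(key, upgrader);
    }

    void registerImporter(String key, Consumer<String> importer) {
        importers.put(key, importer);
    }

    /** Upgrades the document until no upgrader matches, then imports it. Returns the final key. */
    String importData(String xml) {
        String key = keyReader.apply(xml);
        UnaryOperator<String> upgrader;
        while ((upgrader = upgraders.get(key)) != null) {
            xml = upgrader.apply(xml);
            String next = keyReader.apply(xml);
            // Guards against an upgrader that leaves the key unchanged (longer cycles are not detected).
            if (next.equals(key)) {
                throw new IllegalStateException("Upgrader for " + key + " did not change the key");
            }
            key = next;
        }
        Consumer<String> importer = importers.get(key);
        if (importer == null) {
            throw new IllegalArgumentException("No importer registered for " + key);
        }
        importer.accept(xml);
        return key;
    }
}
```

Only the newest format needs an importer; every older format just needs an upgrader that moves it one step forward.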
What the process breaks down to is:
1. Parse the first bit of the XML document using StAX. StAX is used here because it only reads as much of the XML file as you request and can easily have validation disabled, which is what you want for the first step.
   - Set up the Woodstox StAX parser for reading the first element: PortalDataKeyFileProcessor.java#L63
   - Read the first element and parse the type and version information from it: PortalDataKeyFileProcessor.java#L104
   - This code uses a bunch of custom StAX utility code that exists in org.jasig.portal.xml, primarily to avoid ever reading more of a file than absolutely necessary. I'm not sure you'll need all of this extra logic, since you're not parsing thousands or millions of files. You can simply re-create the StAX stream, which is much simpler.
2. Update the XML if needed.
   - Most updates can be done via XSLT if you're comfortable with it.
   - Example of an XSLT-based updater using StAX: XsltDataUpgrader.java
   - Example XSLT files for chained upgrades of an XML file: https://github.com/Jasig/uPortal/tree/master/uportal-war/src/main/resources/org/jasig/portal/io/xml/portlet
   - Example XML files for those XSLTs: https://github.com/Jasig/uPortal/blob/master/uportal-war/src/test/resources/org/jasig/portal/io/xml/portlet/
     - Look at the files whose names start with test-portlet-1
     - You can see the upgrade path from 26 to 30 to 31 to 32 to 40
3. You now have the XML in the most recent format and can do whatever validation you would otherwise do on it.
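For the first step, if the custom org.jasig.portal.xml utilities turn out to be unnecessary, a rough sketch using only the standard javax.xml.stream API might look like this. The DataKey class and the "version" attribute name are assumptions for illustration, not uPortal's actual PortalDataKey, and the factory lookup should normally pick up Woodstox when it is on the classpath:

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RootElementPeek {
    /** Hypothetical stand-in for PortalDataKey: root element name plus its version attribute. */
    static final class DataKey {
        final String name;
        final String version; // null when the root element has no "version" attribute

        DataKey(String name, String version) {
            this.name = name;
            this.version = version;
        }
    }

    /** Reads only as far as the first start element, then stops. */
    static DataKey peek(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // No validation for the peek; the format version is not known yet.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Assumes the version lives in an un-namespaced "version" attribute.
                    return new DataKey(reader.getLocalName(),
                            reader.getAttributeValue(null, "version"));
                }
            }
            throw new XMLStreamException("No root element found");
        } finally {
            reader.close();
        }
    }
}
```

After the peek you would simply open a fresh stream on the same file for the actual upgrade or import, which is the "re-create the StAX stream" approach mentioned above.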
Where we got a lot of mileage from this process is that, as you make changes to the XML file format, you only ever have to worry about creating an XSLT to go from the most recently released XML format to the new one. It is also easy to write unit tests, as weird cases come along, to verify that the XSLT for each version change handles things correctly.
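To illustrate the chaining idea, here is a small sketch that applies a list of stylesheets in order using the standard javax.xml.transform API. It works on in-memory Strings rather than StAX streams, so it is not how XsltDataUpgrader does it; the XsltChain class and its apply method are hypothetical names:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltChain {
    /**
     * Applies each stylesheet in order, feeding the output of one
     * transform into the next (for example 26 -> 30 -> 31 -> 32 -> 40).
     */
    static String apply(String xml, List<String> stylesheets) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        for (String stylesheet : stylesheets) {
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(current)),
                    new StreamResult(out));
            current = out.toString();
        }
        return current;
    }
}
```

A unit test for a single version change then reduces to calling apply with one stylesheet and a sample file and checking the result, which keeps each test small as odd cases turn up.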
Some of this we already have in place, like the move to StAX and peeking at the version.