Use an "upgrade" approach to validate and bind (JAXB) mapping XML

Description

was kind enough to outline the basic steps needed:

So the general idea is to use StAX to read the first element of the XML file, determine the version of the file from that element and then act on it appropriately. We had this PortalDataKey concept that we would parse out of each XML file. From that we'd first go look in a Map<PortalDataKey, DataUpgrader> to see if there was a registered DataUpgrader implementation for that version of the XML file. If there was we would run it on the XML file then repeat the parsing of the first element and looking for another DataUpgrader. Once we didn't find a DataUpgrader we'd look up the DataImporter that was mapped to the PortalDataKey for the current version of the XML and run it on the contents.

This is a much more generalized solution as it was designed to batch parse, update and import potentially millions of XML files across 20+ different schemes from 4+ versions.
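The lookup-and-repeat loop described above can be sketched roughly as follows. This is only an illustration, not the uPortal code: `UpgradeLoopSketch`, the plain string keys (standing in for PortalDataKey), and the functional types standing in for DataUpgrader and DataImporter are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch: keys are plain strings here instead of a PortalDataKey type.
class UpgradeLoopSketch {
    private static final int MAX_STEPS = 100; // guard against a cycle of upgraders

    private final Map<String, UnaryOperator<String>> upgraders = new HashMap<>();
    private final Map<String, Function<String, String>> importers = new HashMap<>();
    private final Function<String, String> keyReader; // reads the key from the first element

    UpgradeLoopSketch(Function<String, String> keyReader) {
        this.keyReader = keyReader;
    }

    void registerUpgrader(String key, UnaryOperator<String> upgrader) {
        upgraders.put(key, upgrader);
    }

    void registerImporter(String key, Function<String, String> importer) {
        importers.put(key, importer);
    }

    /** Upgrades the XML until no upgrader matches, then hands it to the importer. */
    String process(String xml) {
        String key = keyReader.apply(xml);
        UnaryOperator<String> upgrader;
        int steps = 0;
        while ((upgrader = upgraders.get(key)) != null) {
            if (++steps > MAX_STEPS) {
                throw new IllegalStateException("Upgrade loop did not terminate at " + key);
            }
            xml = upgrader.apply(xml);   // run the upgrader for this version
            key = keyReader.apply(xml);  // re-parse the first element
        }
        Function<String, String> importer = importers.get(key);
        if (importer == null) {
            throw new IllegalArgumentException("No importer registered for " + key);
        }
        return importer.apply(xml);
    }
}
```

Because each upgrader only needs to know about one version step, registering a new format version means adding one map entry rather than touching the importer.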

What the process breaks down to is:

  • Parse the first bit of the XML document using StAX. StAX is used here because it only reads as much of the XML file as you request and can easily have validation disabled, which is what you want for the first step.

    • Set up the Woodstox StAX parser for reading the first element: PortalDataKeyFileProcessor.java#L63

    • Read the first element and parse the type and version information from it: PortalDataKeyFileProcessor.java#L104

    • This code uses a bunch of custom StAX utility code that exists in org.jasig.portal.xml primarily to try and avoid ever reading from a file more than absolutely necessary. Not sure you'll need all of this extra logic since you're not parsing thousands or millions of files. You can simply re-create the StAX stream which is much simpler.

  • Update the XML if needed

  • You now have the XML in the most recent format and can do whatever validation you would otherwise do on it.
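As a rough illustration of the first step, peeking at the root element with the standard `javax.xml.stream` API might look like the sketch below. It does not reproduce PortalDataKeyFileProcessor or the Woodstox-specific setup; the class name `RootElementPeek` and the assumption that the version lives in a root `version` attribute are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads only as far as the root start tag.
class RootElementPeek {

    /** Returns {root local name, value of the root "version" attribute or null}. */
    static String[] peek(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX properties: no validation, no DTD processing, no external entities.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // A null namespace URI means the attribute namespace is not checked.
                        return new String[] {
                            reader.getLocalName(),
                            reader.getAttributeValue(null, "version")
                        };
                    }
                }
                throw new IllegalArgumentException("No root element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read root element", e);
        }
    }
}
```

Re-creating the stream for the later full parse, as suggested above, keeps this helper free of any buffering or rewind logic.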

Where we got a lot of mileage from this process is that, as you make changes to the XML file format, you only ever have to worry about creating an XSLT to go from the most recent released XML format to the new XML format. It is also easy to write unit tests as weird cases come along to verify that the XSLT for each version change is handling things correctly.
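A single version step of this kind could be sketched with the JDK's `javax.xml.transform` API as shown below. The stylesheet, the class name `XsltUpgradeStep`, and the 1.0 to 2.0 step are invented for illustration; a real upgrade would carry whatever structural changes the new format needs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical one-step upgrader: copies the document and bumps the root version.
class XsltUpgradeStep {

    // Identity transform plus one override for the root element's version attribute.
    static final String V1_TO_V2 =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/*/@version\">"
        + "<xsl:attribute name=\"version\">2.0</xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String upgrade(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(V1_TO_V2)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Upgrade transform failed", e);
        }
    }
}
```

A unit test for a step like this only needs a sample document in the old format and a few checks on the transformed output, which is what makes per-version regression cases cheap to add.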

Some of this we already have in place, like the move to StAX and peeking at the version.

Activity

Steve Ebersole February 8, 2014 at 12:49 AM

Calling this done. Still have work to do under HHH-8893, but validation is all handled.

Steve Ebersole February 5, 2014 at 1:57 PM
Edited

One item of note: I think it will be better to use simple replacement wrapping/delegation for handling different JPA ORM schema versions, rather than a full-blown XSLT to essentially transform one (or 2[1]) elements as part of the StAX pipeline. Eric kindly contributed code for handling that as well.

I am a little nervous about this because when "upgrading" to 2.1 (or to our new "combined" XSD) we would need to adjust namespaces, and I have had trouble with this approach when introducing namespaces. IIRC the issue is that each built element/attribute representation encodes its namespace as resolved initially, not after I alter the root. I assume the same issue would present itself when changing a namespace. http://stackoverflow.com/questions/10653416/stax-and-namespaces is the issue I saw originally when trying to play with namespaces (see the answer and my response).

[1] upgrading 1.0 requires we handle mapping-metadata-complete in addition to the version element
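For readers unfamiliar with the wrapping/delegation idea, a minimal sketch using the standard `javax.xml.stream.util.EventReaderDelegate` is shown below. It is not the contributed code mentioned in the comment; the class name `VersionRewritingReader` is hypothetical, it assumes the version is carried as a `version` attribute on the root element, and it deliberately leaves namespaces untouched, so it does not address the namespace concern raised above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

// Hypothetical delegate: replaces the root element's "version" attribute on the fly.
class VersionRewritingReader extends EventReaderDelegate {
    private final XMLEventFactory events = XMLEventFactory.newInstance();
    private final String targetVersion;
    private boolean rootSeen;

    VersionRewritingReader(XMLEventReader parent, String targetVersion) {
        super(parent);
        this.targetVersion = targetVersion;
    }

    @Override
    public XMLEvent nextEvent() throws XMLStreamException {
        XMLEvent event = super.nextEvent();
        if (rootSeen || !event.isStartElement()) {
            return event; // everything except the root start tag passes through unchanged
        }
        rootSeen = true;
        StartElement root = event.asStartElement();
        List<Attribute> attributes = new ArrayList<>();
        Iterator<?> it = root.getAttributes();
        while (it.hasNext()) {
            Attribute attribute = (Attribute) it.next();
            boolean isVersion = "version".equals(attribute.getName().getLocalPart());
            attributes.add(isVersion
                ? events.createAttribute(attribute.getName(), targetVersion)
                : attribute);
        }
        // Namespace declarations are copied as-is; only the attribute list changes.
        return events.createStartElement(root.getName(), attributes.iterator(), root.getNamespaces());
    }

    /** Convenience method: streams the document through the delegate into a string. */
    static String rewrite(String xml, String targetVersion) {
        try {
            XMLEventReader reader = new VersionRewritingReader(
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml)),
                targetVersion);
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (reader.hasNext()) {
                writer.add(reader.nextEvent());
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite version", e);
        }
    }
}
```

In a binding pipeline the delegate would be handed to the unmarshaller in place of the raw event reader, so the rest of the pipeline only ever sees the target version.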

Fixed

Details

Time tracking

1.37h logged

Created January 24, 2014 at 2:45 PM
Updated May 5, 2022 at 11:09 AM
Resolved February 8, 2014 at 12:49 AM