Saturday, January 7, 2012

Simplify with an XML data model - Part 4


Part 4: Client-side data validation using Schematron.
Data validation prevents data corruption and security risks. This often goes beyond simple data type checking of strings and integers, and can consist of complex business rules such as “email address must be valid” or “if the user does not enter a zip code, they must enter a city and state”.

Data must be validated on the server side no matter what, since rich JavaScript applications can be tampered with and HTTP requests can be sent to the server by hand. Data should also be validated on the client before it is sent to the server. This makes for a better user experience, because errors are surfaced to the user before anything is submitted. Client validation also takes load off the server and makes for a more responsive application. Since validation is worthwhile on both the client and the server, it would be ideal to define the rules once and use them in both places.


Since we are using XML as our data model, it makes sense to use XML-based validation. A Schematron schema consists of assertions written in XPath, each tied to a context node. Here is an example schema that we will use to validate an address in an HTML form:

<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
            xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:sch="http://www.ascc.net/xml/schematron"
            queryBinding='xslt1' schemaVersion="ISO19757-3">
   <iso:title>Test ISO schematron file. Introduction mode</iso:title>
   <iso:pattern>
       <iso:rule context="address/street">
           <iso:assert test="text()">Street is required</iso:assert>
       </iso:rule>
       <iso:rule context="address/state">
           <iso:assert test="text() or ../zip/text()">State is required if no zipcode is entered</iso:assert>
       </iso:rule>
       <iso:rule context="address/zip">
           <iso:assert test="string-length()=0 or (number(text())>0 and number(text())&lt;99999 and string-length()=5)">Zip Code Invalid</iso:assert>
       </iso:rule>
   </iso:pattern>
</iso:schema>

Pretty simple and intuitive, right? Each rule has a context, which is an XPath expression that selects nodes, and some assertions that must hold for each selected node. This is truly a great thing; even XSD borrowed the idea of assertions for its 1.1 release!
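To make the three assertions concrete, here is a rough plain-JavaScript reading of what they check. This is only a sketch for illustration, not how Schematron evaluates rules: `validateAddress` and `xpathNumber` are hypothetical names, and the address is assumed to be a plain object whose `street`, `state` and `zip` fields are strings (empty when the element has no text).

```javascript
// Approximates XPath 1.0 number(): optional minus sign, digits, optional
// fraction. Anything else (exponents, hex, empty string) becomes NaN.
function xpathNumber(text) {
  const trimmed = text.trim();
  return /^-?(\d+(\.\d*)?|\.\d+)$/.test(trimmed) ? Number(trimmed) : NaN;
}

// Hypothetical stand-in for the <address> element: { street, state, zip },
// all strings. Returns one entry per failed assertion.
function validateAddress(address) {
  const errors = [];

  // context address/street, test="text()"
  if (address.street.length === 0) {
    errors.push({ field: "street", message: "Street is required" });
  }

  // context address/state, test="text() or ../zip/text()"
  if (address.state.length === 0 && address.zip.length === 0) {
    errors.push({ field: "state", message: "State is required if no zipcode is entered" });
  }

  // context address/zip, test="string-length()=0 or (number(text())>0
  //   and number(text())<99999 and string-length()=5)"
  const zipNumber = xpathNumber(address.zip);
  const zipOk =
    address.zip.length === 0 ||
    (zipNumber > 0 && zipNumber < 99999 && address.zip.length === 5);
  if (!zipOk) {
    errors.push({ field: "zip", message: "Zip Code Invalid" });
  }

  return errors;
}
```

Compared with this hand-written version, the Schematron schema says the same thing declaratively and can be handed to any XSLT engine, which is what makes it reusable on the server.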

So now that we have defined some rules, how do we use them? The answer is pretty cool. The reference implementation transforms each Schematron schema via XSLT into a new XSL stylesheet. You then apply that new stylesheet to your XML document, and your validation report is spit out. In other words, you use XSLT to transform your validation rules into a new XSL stylesheet, which you then use to transform your data into a report.
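The two-stage flow can be sketched in a few lines of JavaScript. The names here are assumptions made for illustration: `makeValidator` is hypothetical, and `transform(stylesheetDoc, inputDoc)` stands for whatever XSLT engine is at hand. `browserTransform` shows one possible implementation on top of the standard `XSLTProcessor` API.

```javascript
// `transform(stylesheetDoc, inputDoc)` is an assumed helper: it applies an
// XSLT stylesheet to an input document and returns the result document.
function makeValidator(transform, compilerXsl, schemaDoc) {
  // Stage 1 (once): validation rules -> validation stylesheet
  const validationXsl = transform(compilerXsl, schemaDoc);

  // Stage 2 (per document): data -> validation report
  return function validate(dataDoc) {
    return transform(validationXsl, dataDoc);
  };
}

// One possible browser-side `transform`, using the XSLTProcessor API.
function browserTransform(stylesheetDoc, inputDoc) {
  const processor = new XSLTProcessor();
  processor.importStylesheet(stylesheetDoc);
  return processor.transformToDocument(inputDoc);
}
```

This sketch does not establish whether a browser's XSLT processor can run the Schematron compiler stylesheet itself. Stage 1 can equally be done ahead of time with any XSLT engine, so that only stage 2 has to run in the page.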

The XSL that transforms your schema is available here (it’s an ISO standard by the way). Download iso-schematron-xslt1.zip, which contains the XSLT files you will need. The stylesheet generated from your schema outputs a format called Schematron Validation Report Language, or SVRL.

Here’s an example of the output:

<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:schold="http://www.ascc.net/xml/schematron"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        title="Test ISO schematron file. Introduction mode"
                        schemaVersion="ISO19757-3">
   <!-- -->
   <svrl:active-pattern/>
   <svrl:fired-rule context="address/street"/>
   <svrl:fired-rule context="address/state"/>
   <svrl:fired-rule context="address/zip"/>
   <svrl:failed-assert test="number(text())&gt;0 and number(text())&lt;99999 and string-length()=5"
                       location="/root/address/zip">
      <svrl:text>Zip Code Invalid</svrl:text>
   </svrl:failed-assert>
</svrl:schematron-output>

Using this information, we can parse the results, look for any failed assertions, and link back to the offending node using the location attribute. 
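As a minimal sketch of that parsing step, the function below reduces an SVRL document to a list of location and message pairs. `collectFailures` is a hypothetical name. It assumes `svrlDoc` is a DOM Document (for example the result of an XSLT transform) and relies only on the standard `getElementsByTagNameNS`, `getAttribute` and `textContent` members.

```javascript
const SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

// Returns [{ location, message }, ...], one entry per svrl:failed-assert.
function collectFailures(svrlDoc) {
  const failures = [];
  const asserts = svrlDoc.getElementsByTagNameNS(SVRL_NS, "failed-assert");
  for (let i = 0; i < asserts.length; i++) {
    // The human-readable message lives in the svrl:text child element.
    const texts = asserts[i].getElementsByTagNameNS(SVRL_NS, "text");
    failures.push({
      location: asserts[i].getAttribute("location"),
      message: texts.length > 0 ? texts[0].textContent.trim() : ""
    });
  }
  return failures;
}
```

For the report above this would yield a single entry with location `/root/address/zip` and message `Zip Code Invalid`, which is enough to find the offending form field and show the error next to it.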


To generate the validation stylesheet, simply use your favorite XSLT engine to apply the iso_svrl_for_xslt1.xsl downloaded from the Schematron site to your schema. I used the Eclipse XSL Tools to do my transform. There are many others out there.



Now that we have generated our stylesheet that outputs SVRL, we need to update our JavaScript code to transform the XML model on the fly and parse the results.

The first thing we’ll do is load our validation stylesheet into an XSLTProcessor object with the help of Sarissa:

   var xmlhttp = new XMLHttpRequest();
   xmlhttp.open("GET", "example4-validation.sch.xsl", false);
   xmlhttp.send('');
   var validationXslt = new XSLTProcessor();
   validationXslt.importStylesheet(xmlhttp.responseXML);

We will also need to store a mapping from the node path to the actual HTML form element so that we can display validation errors on the correct HTML element:

   var bindings = {};
   ...
   //Binds an element to the corresponding XPath expression
   function bind(element, xpath, xmlDom) {
       // Save a reference from xpath to input
       bindings[xpath] = element;
       ...

The next thing we’ll do is wrap each form field in a new element so that we have a placeholder to put validation information. We’ll do this in the bind function.

   //Binds an element to the corresponding XPath expression
   function bind(element, xpath, xmlDom) {
       ...
       //wrap the element for validation messages
       wrap(element);
   ...

   function wrap(element){
       // Wrap the input in a div for error output
       var parent = element.parentNode;
       var wrapperDiv = document.createElement("div");
       wrapperDiv.id = "__wrapped_" + element.name;
       parent.insertBefore(wrapperDiv, element);
       wrapperDiv.appendChild(parent.removeChild(element));
   }


The last thing we need is the meat and potatoes: the transformation of the model, parsing of the SVRL, and display of the results. Let’s first add a call to do the validation when a model value is set:



   //Binds an element to the corresponding XPath expression
   function bind(element, xpath, xmlDom) {
   ...
       // Method to update and pretty print the model
       var setModelValue = function() {
           setXmlValue(element, xpath, xmlDom);
           doValidation(xmlDom);
           ...




Finally we can add the validation method:

   function doValidation(xmlDom){
       // Transform the xml model with the schematron stylesheet
       var resultDocument = validationXslt.transformToDocument(xmlDom);

       // clear current errors: keep only the wrapper's first child (the input)
       for ( var path in bindings) {
           var parent = bindings[path].parentNode;
           while (parent.childNodes.length > 1) {
               parent.removeChild(parent.childNodes[1]);
           }
           parent.className = "parent-validation-ok";
       }

       // add any new errors
       var failures = resultDocument.getElementsByTagNameNS('*', 'failed-assert');
       for ( var i = 0; i < failures.length; i++) {
           var location = failures[i].getAttribute("location");
           var failedInput = bindings[location];
           if (failedInput == null) {
               alert('no mappings for ' + location + ' were found');
               continue;
           }
           var error = document.createElement("div");
           error.className = "validation-error";
           // use the first child element (svrl:text) as the message
           for ( var j = 0; j < failures[i].childNodes.length; j++) {
               if (failures[i].childNodes[j].nodeType == 1) {
                   // textContent, so the message is never parsed as markup
                   error.textContent = failures[i].childNodes[j].textContent;
                   break;
               }
           }
           failedInput.parentNode.appendChild(error);
           failedInput.parentNode.className = "parent-validation-error";
       }
   }




Sweet! Now we can validate the client form in real time, and use the same validation on the server.

One major limitation of this approach is the lack of support for regular expressions, which are among the most powerful and useful tools for validation. The XPath matches function belongs to XPath 2.0, and browsers ship XSLT 1.0 processors. Unfortunately, WebKit does not support it. Firefox and IE (with some tricks) do.
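One possible stopgap, which is not part of the original example, is to run regular expression checks in JavaScript next to the XSLT-based validation and report them in the same shape as an SVRL failure (a location plus a message). Everything in this sketch is hypothetical: the `regexRules` table, the `checkRegexRules` function, and the assumption that the current field values are available as a map from node path to text.

```javascript
// Hypothetical rule table, keyed by the node paths that SVRL reports
// in its `location` attribute.
const regexRules = {
  "/root/address/zip": { pattern: /^\d{5}$/, message: "Zip Code Invalid" }
};

// `values` maps node path -> current text of that node (an assumed shape).
// Empty values are skipped so "required" checks stay in the Schematron rules.
function checkRegexRules(values, rules) {
  const failures = [];
  for (const path of Object.keys(rules)) {
    const value = values[path];
    if (value === undefined || value === "") {
      continue;
    }
    if (!rules[path].pattern.test(value)) {
      failures.push({ location: path, message: rules[path].message });
    }
  }
  return failures;
}
```

The trade-off is that such rules live outside the schema, so they are no longer defined once and shared with the server.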


There seems to be zero interest in updating the XSLT processors in any of the modern browsers to 2.0. Hopefully this will change in the future, as I feel this approach to data binding and validation is very powerful and makes for easy, standards-based development. One solution might be to use a JavaScript port of an XSLT engine.


Anyway, I hope this series was useful. Please leave a comment!



You can view this example here.
As always you can download all the source for these examples as a zip.
Remember, you will need the Sarissa library.

2 comments:

  1. Daniel,

    Very interesting. I did some work last year hooking-up Schematron validation in XForms using the XSLTForms implementation.


    Basically, the original goal I had in mind was to be able to load a Schematron schema into an XForms instance and, through a series of daisy-chained transforms using the ISO Schematron transforms, compile the resulting XSLT that could then be applied to the primary instance data in the form. Having an SVRL report document as the result would allow me to render the validation report.

    The realities when using XSLTForms (using beta 3 last year) were that the output of XSLTForms’ transform function was not an XML DOM but a serialisation of the transform result as a string. Not massively helpful, and this made it impossible to daisy-chain any transforms. So, the limits of the work so far are:

    * Pre-compile the Schematron schema external to the form.
    * Apply schema transform to the primary instance data.
    * Format the plain-text result (not SVRL) as best one can.


    Regards

    Philip

  2. Thanks for your comment, Philip!
    I have not yet looked into XForms very deeply. I am curious about how active the community is.

    The beauty I gleaned from Schematron was that using declarative assertions to define validation makes for concise, maintainable code. The other benefit is that XPath is one of the easier ways to work with a data model. The tools that one uses to transform a set of assertions into a set of validation results could be implemented in anything. PHP or JavaScript work just as well as XSLT for validation, and the result doesn't need to be exclusively SVRL to be useful.
