Wednesday, May 27, 2009

Validating structured data

In an earlier post I discussed how HTML form processing, when sufficiently generalized, leads to processing structured data (by which I mean data in Perl-ish hashes and lists). Here is an example of some structured data:
my $user_record = {
   username => 'Joe Blow',
   occupation => 'Programmer',
   tags => ['Perl', 'programming', 'Moose' ],
   employer => {
      name => 'TechTronix',
      country => 'Utopia',
   },
   options => {
      flags => {
         opt_in => 1,
         email => 0,
      },
      cc_cards => [
         {
            type => 'Visa',
            number => '4248999900001010',
         },
         {
            type => 'MasterCard',
            number => '4335992034971010',
         },
      ],
   },
   addresses => [
      {
         street => 'First Street',
         city => 'Prime City',
         country => 'Utopia',
         id => 0,
      },
      {
         street => 'Second Street',
         city => 'Secondary City',
         country => 'Graustark',
         id => 1,
      },
      {
         street => 'Third Street',
         city => 'Tertiary City',
         country => 'Atlantis',
         id => 2,
      },
   ],
};

Here is the HTML::FormHandler form that defines field validators to process that structure:
   package Structured::Form;
   use HTML::FormHandler::Moose;
   extends 'HTML::FormHandler';

   has_field 'username';
   has_field 'occupation';
   has_field 'tags' => ( type => 'Repeatable' );
   has_field 'tags.contains' => ( type => 'Text' );
   has_field 'employer' => ( type => 'Compound' );
   has_field 'employer.name';
   has_field 'employer.country';
   has_field 'options' => ( type => 'Compound' );
   has_field 'options.flags' => ( type => 'Compound' );
   has_field 'options.flags.opt_in' => ( type => 'Boolean' );
   has_field 'options.flags.email' => ( type => 'Boolean' );
   has_field 'options.cc_cards' => ( type => 'Repeatable' );
   has_field 'options.cc_cards.type';
   has_field 'options.cc_cards.number';
   has_field 'addresses' => ( type => 'Repeatable' );
   has_field 'addresses.street';
   has_field 'addresses.city';
   has_field 'addresses.country';
   has_field 'addresses.id';


The names of the fields are flattened references to the elements of the structure, with special field types for Repeatable and Compound elements. These types of structures can be used to update a database with DBIx::Class (although there are limits, of course).
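
To make the "flattened reference" idea concrete, here is a minimal sketch in plain Perl of how a dotted name could be resolved against a nested structure. It is not FormHandler's implementation: `values_at_path` is a hypothetical helper, and it assumes that a list met along the path means "apply the rest of the path to every element", which mirrors the way a Repeatable field is declared without an index.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::FormHandler): return every value
# reachable from $data by following a dotted path such as 'employer.name'.
# Array refs met along the way are expanded element by element.
sub values_at_path {
    my ( $data, $path ) = @_;
    my @nodes = ($data);
    for my $key ( split /\./, $path ) {
        my @next;
        for my $node (@nodes) {
            # Expand a list into its elements before looking up the key.
            my @items = ref($node) eq 'ARRAY' ? @$node : ($node);
            push @next, map { $_->{$key} }
                grep { ref($_) eq 'HASH' && exists $_->{$key} } @items;
        }
        @nodes = @next;
    }
    # A path that ends on a list yields the list's elements.
    return map { ref($_) eq 'ARRAY' ? @$_ : $_ } @nodes;
}

# A cut-down version of the record above, just for the demonstration.
my $record = {
    employer  => { name => 'TechTronix', country => 'Utopia' },
    tags      => [ 'Perl', 'Moose' ],
    addresses => [
        { city => 'Prime City',     id => 0 },
        { city => 'Secondary City', id => 1 },
    ],
};

print join( ', ', values_at_path( $record, 'employer.name' ) ),  "\n";
print join( ', ', values_at_path( $record, 'addresses.city' ) ), "\n";
print join( ', ', values_at_path( $record, 'tags' ) ),           "\n";
```

The same dotted name therefore addresses one value inside a Compound element and one value per row inside a Repeatable element.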

I've left off the actual validators, but they can be defined pretty easily using Moose types or other constraints.
  has_field 'cc_type' => ( apply => [ CCType ] );
It would probably be better to define some of the fields in a role or field, to keep related validation in the same place. The validation of the credit card numbers depends on the type of the credit card, for example. Error messages can be retrieved either as an array of error fields or as plain error messages, but there need to be more flexible ways of getting those messages. I'm not exactly sure what people will want yet.
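
As a rough illustration of why the card number check belongs next to the card type, here is a standalone sketch in plain Perl. It does not use FormHandler at all; `card_errors` and the `%number_pattern_for` table are hypothetical names, and the patterns are deliberately simplified assumptions rather than the real issuer ranges.

```perl
use strict;
use warnings;

# Illustrative rules only: real card-number ranges are broader than this.
my %number_pattern_for = (
    Visa       => qr/\A4\d{15}\z/,         # assumed: 16 digits starting with 4
    MasterCard => qr/\A5[1-5]\d{14}\z/,    # assumed: 16 digits starting with 51-55
);

# Hypothetical helper: check one { type => ..., number => ... } entry and
# return a list of error messages (empty when the entry is acceptable).
sub card_errors {
    my ($card) = @_;
    my $type   = defined $card->{type}   ? $card->{type}   : '';
    my $number = defined $card->{number} ? $card->{number} : '';

    # The rule applied to the number is chosen by the sibling 'type' value.
    my $pattern = $number_pattern_for{$type}
        or return ("Unknown card type '$type'");
    return $number =~ $pattern
        ? ()
        : ("'$number' is not a valid $type number");
}

my @cards = (
    { type => 'Visa',       number => '4248999900001010' },
    { type => 'MasterCard', number => '4335992034971010' },
);
for my $card (@cards) {
    my @errors = card_errors($card);
    print "$card->{type}: ", ( @errors ? "@errors" : 'ok' ), "\n";
}
```

Under these assumed rules the second sample card is rejected because its number begins with 4, which is exactly the kind of check that needs both values of a `cc_cards` entry at once.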

Real Soon Now (tm) I'm going to work on a KiokuDB model... So many programming tasks, so little time.


zby said...

Hmm - I think you omitted here the most interesting part of the FormHandler design that we collaborated on - which is that in the parent fields you have access to the values of child fields because they are processed first. So you can do things like building objects out of attribute lists or doing additional validation for fields that require cross-checking.

mo said...

The example shows that HTML::FormHandler is not HTML only. So why did you go with that prefix?

I would prefer a form processor which is independent from the rendering and processing of the input data.

A more generic FormHandler class could have Processors like HTML (for parsing query and body parameters), XML, JSON and so on.

Then the FormHandler's validation process runs and renders the output according to a Render class (e.g. HTML, XML, JSON etc.)

G.Shank said...

The rendering is done in a separate role. The only one provided so far is HTML::FormHandler::Render::Simple. So it's not like FormHandler is creating HTML that you're not going to use... But there are some structural issues with being prepared to render HTML data that flowed through the architecture, and I do think that the main use case is HTML forms or something rather similar. If the data format is something that can be translated to and from similar structured data, it would be fairly trivial to write a couple of methods to do that. The problem with a totally generic validator is that in many cases the more generic you get, the more difficult the programming problem is, and often you don't end up doing anything particularly well. In order to be simple to use, you must make some assumptions about the most common use case. The use case that I chose to focus on was HTML forms. If there's something that fits your needs better, use that.
