Wednesday, May 27, 2009

Validating structured data

In an earlier post I discussed how HTML form processing, when sufficiently generalized, leads to processing structured data (by which I mean data in Perl-ish hashes and lists). Here is an example of some structured data:
my $user_record = {
   username => 'Joe Blow',
   occupation => 'Programmer',
   tags => ['Perl', 'programming', 'Moose' ],
   employer => {
      name => 'TechTronix',
      country => 'Utopia',
   },
   options => {
      flags => {
         opt_in => 1,
         email => 0,
      },
      cc_cards => [
         {
            type => 'Visa',
            number => '4248999900001010',
         },
         {
            type => 'MasterCard',
            number => '4335992034971010',
         },
      ],
   },
   addresses => [
      {
         street => 'First Street',
         city => 'Prime City',
         country => 'Utopia',
         id => 0,
      },
      {
         street => 'Second Street',
         city => 'Secondary City',
         country => 'Graustark',
         id => 1,
      },
      {
         street => 'Third Street',
         city => 'Tertiary City',
         country => 'Atlantis',
         id => 2,
      }
   ]
};

Here is the HTML::FormHandler form that defines field validators to process that structure:
{
   package Structured::Form;
   use HTML::FormHandler::Moose;
   extends 'HTML::FormHandler';

   has_field 'username';
   has_field 'occupation';
   has_field 'tags' => ( type => 'Repeatable' );
   has_field 'tags.contains' => ( type => 'Text' );
   has_field 'employer' => ( type => 'Compound' );
   has_field 'employer.name';
   has_field 'employer.country';
   has_field 'options' => ( type => 'Compound' );
   has_field 'options.flags' => ( type => 'Compound' );
   has_field 'options.flags.opt_in' => ( type => 'Boolean' );
   has_field 'options.flags.email' => ( type => 'Boolean' );
   has_field 'options.cc_cards' => ( type => 'Repeatable' );
   has_field 'options.cc_cards.type';
   has_field 'options.cc_cards.number';
   has_field 'addresses' => ( type => 'Repeatable' );
   has_field 'addresses.street';
   has_field 'addresses.city';
   has_field 'addresses.country';
   has_field 'addresses.id';

}

The names of the fields are flattened references to the elements of the structure, with special field types for Repeatable and Compound elements. These types of structures can be used to update a database with DBIx::Class (although there are limits, of course).

I've left off the actual validators, but they can be defined pretty easily using Moose types or other constraints.
  has_field 'cc_type' => ( apply => [ CCType ] );
It would probably be better to define some of the fields in a role or field, to keep some of the related validation in the same place. The validation of the credit card numbers depends on the type of the credit card, for example. Error messages can be retrieved from an array of error fields or plain error messages, but there need to be more flexible ways of getting those messages. I'm not exactly sure what people will want yet.
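To make the "number depends on type" point concrete, here is a minimal sketch in plain Perl, outside of any form framework. The sub name validate_card and the %number_pattern table are made up for illustration, and the patterns are deliberately simplified (prefix and length only, no checksum), so treat this as a sketch rather than real card validation.

```perl
use strict;
use warnings;

# Simplified, illustrative patterns: prefix and length only.
my %number_pattern = (
    Visa       => qr/^4\d{12}(?:\d{3})?$/,    # starts with 4, 13 or 16 digits
    MasterCard => qr/^5[1-5]\d{14}$/,         # starts with 51-55, 16 digits
);

# Hypothetical helper: returns an error string, or undef if the card looks OK.
sub validate_card {
    my ($card) = @_;
    my $type    = defined $card->{type}   ? $card->{type}   : '';
    my $number  = defined $card->{number} ? $card->{number} : '';
    my $pattern = $number_pattern{$type}
        or return "unknown card type '$type'";
    return "number does not look like a $type number"
        unless $number =~ $pattern;
    return undef;
}

my $visa_error = validate_card( { type => 'Visa', number => '4248999900001010' } );
my $mc_error   = validate_card( { type => 'MasterCard', number => '4335992034971010' } );
print defined $visa_error ? "Visa: $visa_error\n" : "Visa: ok\n";
print defined $mc_error   ? "MasterCard: $mc_error\n" : "MasterCard: ok\n";
```

The point is only that the check needs both values at once, which is why it is convenient to keep the two fields and their validation together in one compound field or role.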

Real Soon Now (tm) I'm going to work on a KiokuDB model... So many programming tasks, so little time.

Tuesday, May 19, 2009

Complexity happens

I've heard a programmer's job described as 'managing complexity'. People who like programming tend to like other complex systems, like, say, D&D. (They also like fantasy in general... no doubt there's some clever comment there that my cold-fogged brain can't work out.)

And yet, one of the primary goals in a programming project is to keep it simple. (Otherwise known as KISS.) So there's this continual tension between simplicity and complexity. Simplicity in one area may require complexity in another. Creating a simple, easy-to-use API often requires more underlying complexity than a non-intuitive but straightforward interface. Though there are occasionally golden moments when things fall into place and you can achieve both greater simplicity of interface AND greater code simplicity. Just don't hold your breath waiting for them...

Problems that seem simple to start with acquire complexity when you add features, when you handle more use cases. Simple, stupid things like the fact that you don't get anything in CGI parameters for an un-checked checkbox introduce irregularities that flow through to surprising corners of code. Decisions made about what it means to not have a particular parameter or have it set to empty cascade through formerly pristine and clean lines of code.
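As a small illustration of the checkbox problem, here is a sketch in plain Perl. The sub name normalize_checkboxes and the field names are assumptions for the example; the only fact it relies on is that an un-checked checkbox sends no parameter at all.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in an explicit 0 for every checkbox field
# that the browser did not send. Only meaningful for a form that was
# actually submitted; for an initial display the absence of the key
# means "no data yet", which is exactly the kind of irregularity
# that leaks into the rest of the code.
sub normalize_checkboxes {
    my ( $params, @checkbox_fields ) = @_;
    my %normalized = %$params;    # shallow copy, input is left untouched
    for my $name (@checkbox_fields) {
        $normalized{$name} =
            ( exists $params->{$name} && $params->{$name} ) ? 1 : 0;
    }
    return \%normalized;
}

# 'email' was left un-checked, so it is simply missing from the input.
my $submitted = { username => 'Joe Blow', opt_in => 'on' };
my $clean = normalize_checkboxes( $submitted, qw( opt_in email ) );

print "$_ => $clean->{$_}\n" for sort keys %$clean;
```

Note that the caller has to know which fields are checkboxes and whether the form was posted, so the "simple" fix already needs information from two other places.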

I'm not quite sure whether this is a complaint or simply a report. Sometimes the logical complexity is fascinating. You poke something to see what happens; you try some new way of factoring to see if that magical moment of greater order occurs... And then sometimes you can hardly stay awake and certainly can't concentrate, and you find yourself surfing Amazon for some new fantasy novel that's a lot more your speed.

Or desperately trying to think of something to write that's somehow remotely related to Perl programming. Because you foolishly committed to DOING THAT in a moment of insanity.

Let me just repeat that key phrase a couple of more times, in case foolish sites that count dumb things and claim they mean something aren't paying attention: Perl programming Perl programming Perl programming

Thanks. I'm done blathering now. I think I've got enough of a word count.

Tuesday, May 12, 2009

Defining the form processing problem

This week I've been discussing the goals of a form processor and working on adding support to HTML::FormHandler for multiple rows. When I first started doing web programming and learning Perl 18 months ago, I looked for a module that did what I wanted, and thought "it can't be that hard". Once I got into it, the problem started to look more and more complex. You need to map different representations of the data onto each other, where the levels in each representation relate to each other in different ways, and still keep the information accessible from every representation.

Let's start with a Perl-ish structure of hashes and lists:

{
   addresses => [
      {
         city => 'Middle City',
         country => 'Graustark',
         address_id => 1,
         street => '101 Main St',
      },
      {
         city => 'DownTown',
         country => 'Utopia',
         address_id => 2,
         street => '99 Elm St',
      },
      {
         city => 'Santa Lola',
         country => 'Grand Fenwick',
         address_id => 3,
         street => '1023 Side Ave',
      },
   ],
   'occupation' => 'management',
   'user_name' => 'jdoe',
}

This structure represents a user with a user name, an occupation, and an array of addresses. This example includes an array of hashrefs, because that's what I'm working on this week... The first problem is that this structure can't be directly represented in CGI/HTTP parameters, since they don't do nested hashrefs. So in order to get this structure into and out of an HTML form, we flatten it into a hashref with names that can be munged by something like CGI::Expand:
my $params = {
   'addresses.0.city' => 'Middle City',
   'addresses.0.country' => 'Graustark',
   'addresses.0.address_id' => 1,
   'addresses.0.street' => '101 Main St',
   'addresses.1.city' => 'DownTown',
   'addresses.1.country' => 'Utopia',
   'addresses.1.address_id' => 2,
   'addresses.1.street' => '99 Elm St',
   'addresses.2.city' => 'Santa Lola',
   'addresses.2.country' => 'Grand Fenwick',
   'addresses.2.address_id' => 3,
   'addresses.2.street' => '1023 Side Ave',
   'occupation' => 'management',
   'user_name' => 'jdoe',
};
A corollary of this is that the form processing program should be able to take in structures of either type, and output at least the flattened structure so that it can be used to fill in the form with current data. Then we consider where the initial data is going to come from. Often the data is in a database, so now we have the problem of taking a database object, like a 'user' row with a relationship pointing to a number of addresses, and converting it to the flat CGI hash. And we also want to go in the opposite direction, converting a flat CGI hash into a structure suitable for putting back into the database. The data in a database (or other data source) isn't necessarily in a form suitable for displaying as strings in an HTML form, so there are inflation and deflation steps. The database structure and data must be mapped to a CGI structure.
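The flattening and its inverse can be sketched in a few lines of plain Perl. This is a hand-rolled illustration, not how CGI::Expand or HTML::FormHandler do it; the sub names flatten_struct, expand_params and listify are made up, and the sketch ignores awkward cases such as empty arrays, gaps in the indexes, or a name that is both a value and a prefix.

```perl
use strict;
use warnings;

# Nested hashes/arrays -> one-level hash with dotted names.
sub flatten_struct {
    my ( $data, $prefix, $flat ) = @_;
    $flat ||= {};
    if ( ref($data) eq 'HASH' ) {
        for my $key ( keys %$data ) {
            my $name = defined($prefix) ? "$prefix.$key" : $key;
            flatten_struct( $data->{$key}, $name, $flat );
        }
    }
    elsif ( ref($data) eq 'ARRAY' ) {
        for my $i ( 0 .. $#$data ) {
            my $name = defined($prefix) ? "$prefix.$i" : $i;
            flatten_struct( $data->[$i], $name, $flat );
        }
    }
    else {
        $flat->{$prefix} = $data;    # a leaf value
    }
    return $flat;
}

# One-level hash with dotted names -> nested hashes/arrays.
sub expand_params {
    my ($flat) = @_;
    my $tree = {};
    for my $name ( keys %$flat ) {
        my @parts = split /\./, $name;
        my $leaf  = pop @parts;
        my $node  = $tree;
        for my $part (@parts) {
            $node->{$part} ||= {};
            $node = $node->{$part};
        }
        $node->{$leaf} = $flat->{$name};
    }
    return listify($tree);
}

# Turn every hash whose keys are all numeric into an array.
sub listify {
    my ($node) = @_;
    return $node unless ref($node) eq 'HASH';
    my %converted;
    $converted{$_} = listify( $node->{$_} ) for keys %$node;
    my @keys = keys %converted;
    if ( @keys && !grep { !/^\d+$/ } @keys ) {
        return [ map { $converted{$_} } sort { $a <=> $b } @keys ];
    }
    return \%converted;
}

my $user = {
    user_name => 'jdoe',
    addresses => [ { city => 'Middle City' }, { city => 'DownTown' } ],
};
my $flat       = flatten_struct($user);
my $round_trip = expand_params($flat);
print "$_ => $flat->{$_}\n" for sort keys %$flat;
```

Even in this toy version the two directions are not symmetric: going down you know what is an array, coming back up you have to guess from the numeric names.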

Then there's the question of how to define the validators which are the main purpose of this exercise. The data that's input from the parameters passed in must be validated (and/or inflated). If there are errors, the program has to present that information in such a way that an HTML form can be constructed with the errors presented to the user for correction.

There are a number of choices to be made about how to define the fields to allow these validations and conversions to happen in a simple and regular fashion. A common solution is to treat the nested elements as subforms, but they are not actually separate forms, they are simply ... nested elements.

The way that feels best to me is to allow the definition to be done in one 'form' class, which represents one HTML form.
   package HasMany::Form::User;
   use HTML::FormHandler::Moose;
   extends 'HTML::FormHandler::Model::DBIC';

   has_field 'user_name';
   has_field 'occupation';

   has_field 'addresses' => ( type => 'Repeatable' );
   has_field 'addresses.address_id' => ( type => 'PrimaryKey' );
   has_field 'addresses.street';
   has_field 'addresses.city';
   has_field 'addresses.country';
This flat representation matches the flatness of the HTML form. The field names with dots give information to allow the creation of nested elements. The field names are also related to the database object, where 'addresses' is the DBIC relationship accessor, and street/city/country are columns in the address table. In practice there would be more to these field definitions, since there would be validators associated with them, but I'll leave them out for now to simplify the problem.

Constructing the arrays is tricky. The form object doesn't know how many elements are in the array until it is handed the information from the database or the parameters from the form. So the arrays of address fields must be cloned from the fields that have been defined and put into some structure to hold the definitions and the data. There is a choice of structures here. We can either match the
 'addresses.1.country' => 'Utopia' 
format, or match the
 { addresses => []}
structure. These structures have different numbers of levels, since we have to add the ".1." level to indicate the array. It could be set up either way and mapped to the other. For the purposes of constructing HTML, however, you want to have some place to act as a container for an individual address so that it can be wrapped in a div, so the structure with the numbered level seems more useful. So now the 'HasMany' field container will create an array of field container objects (instances) that contain an address record.

Once constructed and filled, the nested fields can be accessed with
 $form->field('addresses')->field('1')->field('city')->value
or using the shortcut method
 $form->field('addresses.1.city')->value 
There's something awkward about this, because it's oddly modal. The field structures are different depending on whether the form has been filled out with data or not. The implementation which I have working right now has an array of fields (the same as other non-has_many compound fields) which is cloned into subfields which are created on the fly. It would be possible, I suppose, to have a dummy subfield to contain the field definitions. I'll have to think about that one...
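The dotted shortcut is really just a walk down the structure one name at a time. Here is a rough sketch of that walk over plain data rather than over FormHandler field objects; the sub name value_at is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a dotted name through nested hashes and
# arrays, returning undef as soon as a level is missing.
sub value_at {
    my ( $data, $path ) = @_;
    for my $part ( split /\./, $path ) {
        if ( ref($data) eq 'HASH' ) {
            $data = $data->{$part};
        }
        elsif ( ref($data) eq 'ARRAY' && $part =~ /^\d+$/ ) {
            $data = $data->[$part];
        }
        else {
            return undef;    # nothing to descend into
        }
    }
    return $data;
}

my $record = {
    addresses => [
        { city => 'Middle City', country => 'Graustark' },
        { city => 'DownTown',    country => 'Utopia' },
    ],
};

my $city    = value_at( $record, 'addresses.1.city' );
my $missing = value_at( $record, 'addresses.5.city' );
print "city: $city\n";
print "missing: ", ( defined $missing ? $missing : 'undef' ), "\n";
```

The numbered level ('1' here) only exists once there is data, which is the modal awkwardness in miniature.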

In order to interface with the database object in a regular, MVC-ish way, the form program should output structured data that can be saved by the database model. Inflations may be associated with this.

So in the end, it seems like what you end up with is a program which will take structured data, process it and validate it, and return structured data. This is a much more general problem than it first appears when "all" you want to do is process an HTML form.

Monday, May 4, 2009

Moose beginners: clear, predicate, and triggers

One of the nice things about Moose is that it adds another state for your instance variables -- whether or not the variable is actually set. With standard Perl variables you can check whether the variable is defined or undefined and true or false. In Moose, an attribute that has been set to undef is in a different state from one that has not been set at all. In order to take advantage of this additional state you need to use the 'clearer' and 'predicate' methods for your attribute:
   has 'my_var' => ( isa => 'Str|Undef', is => 'rw', 
          clearer => 'clear_my_var',
          predicate => 'has_my_var' );
Setting 'my_var' to undef is different from calling 'clear_my_var'. If you set it to undef:
   $my_obj->my_var(undef);
then the predicate 'has_my_var' will return true. If you check for truth in the usual way:
   if( $my_obj->my_var ) { ... }
you get false both when the attribute has been set to undef and when it has been cleared. So you have to use the predicate method:
   if( $my_obj->has_my_var ) { ... }
The predicate method will return true if 'my_var' has been set to undefined, and false if 'my_var' has been cleared.
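Here is a small script that walks through the three states. It assumes Moose is installed; the class name My::Thing is made up for the example.

```perl
use strict;
use warnings;

{
    package My::Thing;    # hypothetical class, just for this demo
    use Moose;

    has 'my_var' => (
        is        => 'rw',
        isa       => 'Str|Undef',
        clearer   => 'clear_my_var',
        predicate => 'has_my_var',
    );
}

my $obj = My::Thing->new;
my @seen;

push @seen, $obj->has_my_var ? 'set' : 'unset';    # never set
$obj->my_var(undef);
push @seen, $obj->has_my_var ? 'set' : 'unset';    # set to undef
$obj->clear_my_var;
push @seen, $obj->has_my_var ? 'set' : 'unset';    # cleared again

print join( ', ', @seen ), "\n";    # prints "unset, set, unset"
```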
An important piece of related behavior is that a trigger on an attribute is called when you set it, whether or not you are setting it to undefined, but the trigger is not called when you do a clear. So if you have an object in which only one of two variables should have a value, you can create triggers for both of them and use clear to un-set the other variable.
   has 'my_var' => ( isa => 'Str', is => 'rw',
           clearer => 'clear_my_var',
           predicate => 'has_my_var',
           trigger => sub { shift->clear_my_other_var }
   );
   has 'my_other_var' => ( isa => 'Str', is => 'rw',
           clearer => 'clear_my_other_var',
           predicate => 'has_my_other_var',
           trigger => sub { shift->clear_my_var }
   );
If the triggers set the other attribute to undef instead of clearing it, you would end up in an infinite recursion, since each attempt to set the other variable would cause the trigger in that variable to fire. Another issue is whether or not you should allow a particular attribute to be set to undefined. Some attributes may need an explicit undefined state, in which case you must set your isa to 'Str|Undef', but if you don't actually need an undefined state then you are better off not allowing it and using a predicate, in which case you must clear the variable to un-set it since setting it to undef will fail.
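To see the two triggers working together, here is a short script, again assuming Moose is installed. The class name My::Choice is invented for the example; the attributes are the ones from the snippet above.

```perl
use strict;
use warnings;

{
    package My::Choice;    # hypothetical class, just for this demo
    use Moose;

    has 'my_var' => (
        is        => 'rw',
        isa       => 'Str',
        clearer   => 'clear_my_var',
        predicate => 'has_my_var',
        trigger   => sub { $_[0]->clear_my_other_var },
    );
    has 'my_other_var' => (
        is        => 'rw',
        isa       => 'Str',
        clearer   => 'clear_my_other_var',
        predicate => 'has_my_other_var',
        trigger   => sub { $_[0]->clear_my_var },
    );
}

my $choice = My::Choice->new;
$choice->my_var('first');
$choice->my_other_var('second');    # the trigger clears my_var

printf "my_var set: %s, my_other_var set: %s\n",
    ( $choice->has_my_var       ? 'yes' : 'no' ),
    ( $choice->has_my_other_var ? 'yes' : 'no' );

# With isa => 'Str', undef is rejected by the type constraint,
# so the only way to un-set the attribute is the clearer.
my $set_undef_ok = eval { $choice->my_var(undef); 1 };
print "setting undef ", ( $set_undef_ok ? "worked" : "failed" ), "\n";
```

Because the clearer does not fire a trigger, the two attributes can clear each other without looping.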