Tuesday, May 12, 2009

Defining the form processing problem

This week I've been discussing the goals of a form processor and working on adding support to HTML::FormHandler for multiple rows. When I first started doing web programming and learning Perl 18 months ago, I looked for a module that did what I wanted, and thought "it can't be that hard". Once I got into it, the problem started to look more and more complex. You need to map different representations of the data onto each other, where each level relates to the others in a different way, and still keep the information accessible from every representation.

Let's start with a Perl-ish structure of hashes and lists:

   {
      addresses => [
         {
            city => 'Middle City',
            country => 'Graustark',
            address_id => 1,
            street => '101 Main St',
         },
         {
            city => 'DownTown',
            country => 'Utopia',
            address_id => 2,
            street => '99 Elm St',
         },
         {
            city => 'Santa Lola',
            country => 'Grand Fenwick',
            address_id => 3,
            street => '1023 Side Ave',
         },
      ],
      'occupation' => 'management',
      'user_name' => 'jdoe',
   }

This structure represents a user with a user name, an occupation, and an array of addresses. The example includes an array of hashrefs because that's what I'm working on this week. The first problem is that this structure can't be directly represented in CGI/HTTP parameters, since they are flat name/value pairs and don't do nested hashrefs. So in order to get this structure into and out of an HTML form, we flatten it into a hashref with names that can be munged by something like CGI::Expand:

my $params = {
   'addresses.0.city' => 'Middle City',
   'addresses.0.country' => 'Graustark',
   'addresses.0.address_id' => 1,
   'addresses.0.street' => '101 Main St',
   'addresses.1.city' => 'DownTown',
   'addresses.1.country' => 'Utopia',
   'addresses.1.address_id' => 2,
   'addresses.1.street' => '99 Elm St',
   'addresses.2.city' => 'Santa Lola',
   'addresses.2.country' => 'Grand Fenwick',
   'addresses.2.address_id' => 3,
   'addresses.2.street' => '1023 Side Ave',
   'occupation' => 'management',
   'user_name' => 'jdoe',
};

A corollary of this is that the form processor should be able to take in structures of either type, and output at least the flattened structure so that it can be used to fill in the form with current data.

Then we consider where the initial data is going to come from. Often the data is in a database, so now we have the problem of taking a database object, like a 'user' row with a relationship pointing to a number of addresses, and converting it to the flat CGI hash. We also want to go in the opposite direction, converting a flat CGI hash into a structure suitable for putting back into the database. The data in a database (or other data source) isn't necessarily in a form suitable for displaying as strings in an HTML form, so there are inflation and deflation steps. The database structure and data must be mapped to a CGI structure.
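
To make the round trip between the two shapes concrete, here is a minimal sketch in plain Perl. It is not CGI::Expand itself: `flatten` and `expand` are made-up helper names for illustration, and the sample data is a cut-down version of the user structure above.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a nested structure into dotted keys.
# Assumes the top level is a hashref, as in the example above.
sub flatten {
    my ( $data, $prefix ) = @_;
    my %flat;
    if ( ref($data) eq 'HASH' ) {
        for my $key ( keys %$data ) {
            my $name = defined $prefix ? "$prefix.$key" : $key;
            %flat = ( %flat, flatten( $data->{$key}, $name ) );
        }
    }
    elsif ( ref($data) eq 'ARRAY' ) {
        for my $index ( 0 .. $#$data ) {
            %flat = ( %flat, flatten( $data->[$index], "$prefix.$index" ) );
        }
    }
    else {
        $flat{$prefix} = $data;    # leaf value
    }
    return %flat;
}

# Hypothetical helper: rebuild the nested structure from dotted keys.
# A name segment made only of digits is treated as an array index.
sub expand {
    my (%flat) = @_;
    my $root;
    for my $name ( keys %flat ) {
        my $slot = \$root;         # reference to the place we are filling
        for my $part ( split /\./, $name ) {
            if ( $part =~ /^\d+$/ ) {
                $$slot //= [];
                $slot = \$$slot->[$part];
            }
            else {
                $$slot //= {};
                $slot = \$$slot->{$part};
            }
        }
        $$slot = $flat{$name};
    }
    return $root;
}

my $user = {
    user_name => 'jdoe',
    addresses => [
        { address_id => 1, country => 'Graustark' },
        { address_id => 2, country => 'Utopia' },
    ],
};

my %flat   = flatten($user);
my $nested = expand(%flat);
print "$_ => $flat{$_}\n" for sort keys %flat;
```

The sketch ignores cases a real module has to deal with, such as hash keys that contain dots or consist only of digits.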

Then there's the question of how to define the validators, which are the main purpose of this exercise. The data passed in as parameters must be validated (and/or inflated). If there are errors, the program has to present that information in such a way that an HTML form can be constructed with the errors presented to the user for correction.
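
One way to picture the error-reporting requirement is a sketch like the following, assuming validators are plain code refs that return an error message or undef. This is not how HTML::FormHandler defines validators; `%validators` and `validate` are hypothetical names. The point is only that errors come back keyed by the same flat names the HTML form uses, so each message can be shown next to its input.

```perl
use strict;
use warnings;

# Hypothetical validators, keyed by field name with the row index removed.
# Each returns an error message, or undef when the value is acceptable.
my %validators = (
    'user_name'         => sub { $_[0] =~ /^\w+$/  ? undef : 'must be a single word' },
    'addresses.street'  => sub { length( $_[0] )   ? undef : 'is required' },
    'addresses.country' => sub { length( $_[0] )   ? undef : 'is required' },
);

sub validate {
    my ($params) = @_;
    my %errors;
    for my $name ( sort keys %$params ) {
        # 'addresses.1.country' is checked by the 'addresses.country' rule.
        ( my $rule = $name ) =~ s/\.\d+\././g;
        my $check = $validators{$rule} or next;
        my $value = defined $params->{$name} ? $params->{$name} : '';
        my $message = $check->($value);
        $errors{$name} = $message if defined $message;
    }
    return \%errors;
}

my $errors = validate(
    {
        'user_name'           => 'jdoe',
        'addresses.0.street'  => '101 Main St',
        'addresses.0.country' => 'Graustark',
        'addresses.1.street'  => '',
        'addresses.1.country' => 'Utopia',
    }
);
print "$_: $errors->{$_}\n" for sort keys %$errors;
```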

There are a number of choices to be made about how to define the fields to allow these validations and conversions to happen in a simple and regular fashion. A common solution is to treat the nested elements as subforms, but they are not actually separate forms; they are simply ... nested elements.

The way that feels best to me is to allow the definition to be done in one 'form' class, which represents one HTML form.

   package HasMany::Form::User;
   use HTML::FormHandler::Moose;
   extends 'HTML::FormHandler::Model::DBIC';

   has_field 'user_name';
   has_field 'occupation';

   has_field 'addresses' => ( type => 'Repeatable' );
   has_field 'addresses.address_id' => ( type => 'PrimaryKey' );
   has_field 'addresses.street';
   has_field 'addresses.city';
   has_field 'addresses.country';

This flat representation matches the flatness of the HTML form. The field names with dots give information to allow the creation of nested elements. The field names are also related to the database object, where 'addresses' is the DBIC relationship accessor, and street/city/country are columns in the address table. In practice there would be more to these field definitions, since there would be validators associated with them, but I'll leave them out for now to simplify the problem.

Constructing the arrays is tricky. The form object doesn't know how many elements are in the array until it is handed the information from the database or the parameters from the form. So the arrays of address fields must be cloned from the fields that have been defined and put into some structure to hold the definitions and the data. There is a choice of structures here. We can either match the 'addresses.1.country' => 'Utopia' format, or match the { addresses => [] } structure. These structures have different numbers of levels, since we have to add the ".1." level to indicate the array. It could be set up either way and mapped to the other. For the purposes of constructing HTML, however, you want to have some place to act as a container for an individual address so that it can be wrapped in a div, so the structure with the numbered level seems more useful. So now the 'HasMany' field container will create an array of field container objects (instances) that contain an address record.
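
The cloning step can be sketched roughly as follows, using plain hashes instead of real field objects. This is not the HTML::FormHandler implementation; `build_instances` and the hash layout are assumptions made up for illustration. Each row gets its own numbered container holding copies of the subfields, which is what gives the HTML something to wrap in a div.

```perl
use strict;
use warnings;

# Subfield names for one address, declared once (hypothetical layout).
my @address_fields = qw(address_id street city country);

# Build one container per row; each holds its own copies of the subfields.
sub build_instances {
    my ( $name, $subfields, $rows ) = @_;
    my @instances;
    for my $index ( 0 .. $#$rows ) {
        my %container = ( name => "$name.$index", fields => [] );
        for my $sub (@$subfields) {
            my %field = (
                name  => "$name.$index.$sub",
                value => $rows->[$index]{$sub},
            );
            push @{ $container{fields} }, \%field;
        }
        push @instances, \%container;
    }
    return \@instances;
}

my $instances = build_instances(
    'addresses',
    \@address_fields,
    [
        { address_id => 1, street => '101 Main St', city => 'Middle City', country => 'Graustark' },
        { address_id => 2, street => '99 Elm St',   city => 'DownTown',    country => 'Utopia' },
    ],
);

# One div per address; no HTML escaping here, since this is only a sketch.
for my $instance (@$instances) {
    print qq{<div id="$instance->{name}">\n};
    print qq{  <input name="$_->{name}" value="$_->{value}">\n}
        for @{ $instance->{fields} };
    print "</div>\n";
}
```

The number of containers depends entirely on the rows handed in, so the field tree only exists in this numbered form once there is data.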

Once constructed and filled, the nested fields can be accessed with
or using the shortcut method
. There's something awkward about this, because it's oddly modal. The field structures are different depending on whether the form has been filled out with data or not. The implementation which I have working right now has an array of fields (the same as other non-has_many compound fields) which is cloned into subfields which are created on the fly. It would be possible, I suppose, to have a dummy subfield to contain the field definitions. I'll have to think about that one...

In order to interface with the database object in a regular, MVC-ish way, the form program should output structured data that can be saved by the database model. Inflations may be associated with this.

So in the end, it seems like what you end up with is a program which will take structured data, process it and validate it, and return structured data. This is a much more general problem than it first appears when "all" you want to do is process an HTML form.


dami said...


I'm working on an application that frequently needs structured data (will be presented at http://yapceurope2009.org/ye2009/talk/1897). To do this we assembled the following toolbox:
- CGI::Expand (flat HTTP <=> structured tree)
- Data::Domain (tree validation)
- Alien::GvaScript (client-side filling form from structured tree)
- DBIx::DataModel (nested insert into database)

These are all on CPAN if you are interested.

Laurent Dami
