The first step in building a web application framework that fits your own way of working, in a way no pre-made framework can, is to abstract out the functions for working with your data storage mechanism. If you don’t need to store data, you can of course skip this step, but in that case your application is probably simple enough that it doesn’t need a framework at all. If you do need to store and retrieve data on the server, however, here’s something to think about:
Are you going to be using a database?
There are two traditional methods for storing data on the server. One is to maintain a bunch of text files (the blogging tool Movable Type does this, for example); the other is to use a database (the blogging tool WordPress does this, for example, as do most other web applications). Recently some other storage options have become available, such as Amazon’s S3 storage service. Of course, if your application is a mashup of someone else’s data, your “storage” will be the web or REST interfaces of the providers of that data. Only in that last case do you have no choice in the data storage mechanism you will be using.
Assuming your application is not one of those special cases, I would suggest you simply use a database. Databases are mature, well-proven tools for storing and retrieving data, and it will be difficult to match their versatility and performance with, for example, a flat-file storage mechanism you design yourself. Which database engine you choose is up to you: for the purposes of this series it doesn’t really make a difference (other than the fact that some cost money and others don’t). Unless you’re dealing with sensitive or financial data, in which case you probably won’t get to decide which database to use anyway, I’d say just pick one. MySQL and PostgreSQL tend to be popular choices. For really lightweight applications, SQLite might also be an option.
Hide the fact that you’re using a database
Once you’ve chosen a database (or some other storage mechanism), it’s time to think about which operations you wish to perform on your data. Do this thinking in the most general way possible: if you find that you need to perform INSERT queries, abstract that into “I need to store bits of data”. The reason is that when you’re ready to design the parts of your application that your users will work with, you really don’t want to keep thinking about INSERT queries. What you want is to “store some data”.
The common operations to perform on data are creating bits of data, reading bits of data, updating bits of data and deleting bits of data. A web application that needs to do any more than that is a rare piece of software indeed. Unless your application is really, really special (and it probably isn’t), create, read, update and delete (commonly abbreviated to CRUD) are all you need to worry about.
Once you’ve determined that you need to be able to do CRUD, write a module that exposes only those four features. This will be quite a bit of work, as you’ll need to abstract out a lot of the underlying data storage mechanism (connecting to a database server, composing queries, running queries, that sort of thing). From now on, those four features are the only things you’ll ever need to think about when writing code that interacts with your data.
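To make that a little more concrete, here is a minimal sketch of the “create” and “read” half of such a module. It is not my actual module: the class name `Data`, the `schema` argument and the choice of SQLite (through Python’s standard `sqlite3` module) are assumptions made purely for illustration. The point is only that the connection and the SQL stay inside the module.

```python
import sqlite3


class Data:
    """Hypothetical storage module: callers never see connections or SQL."""

    def __init__(self, path=":memory:", schema=None):
        # The connection is an internal detail. SQLite is an assumption
        # made for this sketch; any database engine could sit here.
        self._conn = sqlite3.connect(path)
        self._conn.row_factory = sqlite3.Row
        if schema:
            self._conn.executescript(schema)

    def create(self, table, values):
        """Store one item and return the id of the new row."""
        # Table and column names are interpolated into the query, so they
        # must come from your own code, never from user input.
        columns = ", ".join(values)
        placeholders = ", ".join("?" for _ in values)
        sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        with self._conn:  # commits on success, rolls back on error
            cursor = self._conn.execute(sql, tuple(values.values()))
        return cursor.lastrowid

    def read(self, table):
        """Return every item in the table as a list of plain dicts."""
        rows = self._conn.execute(f"SELECT * FROM {table}").fetchall()
        return [dict(row) for row in rows]


if __name__ == "__main__":
    data = Data(schema="CREATE TABLE users (name TEXT, password TEXT)")
    data.create("users", {"name": "some_name", "password": "some_password"})
    print(data.read("users"))
```

Code that uses this sketch only ever says “store this” and “give me that”; swapping SQLite for another engine would, at least in principle, only touch the inside of the class.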
Keep in mind that for reads, updates and deletes you will need to be able to filter your data set. For example, when updating it will be necessary to specify which conditions data items need to match before they’re updated. In practice this means you are going to have to expose more of your data storage mechanism than might seem sensible at first glance. My own data storage abstraction module (which I will not publish here in full as a tiny bit of it was developed on company time), for example, does in fact provide a method for supplying raw SQL code. It does not, however, require me to supply SQL, so for something simple like updating the password of a specific user I can simply do:
Data.update({'name': 'some_name'}, {'password': 'some_password'})
In other words, update the “password” field to “some_password” for items where the “name” field has the value “some_name”.
In fact, the data abstraction module I’ve built also requires you to supply the table name (as I feel thinking about stored data in terms of tables and records is, in fact, quite a sensible way to think about such things), meaning the actual code(*) reads something like this:
Data.update('users', {'name': 'some_name'}, {'password': 'some_password'})
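For the curious, here is one way a call like that could be turned into a query behind the scenes. This is a sketch under assumptions, not the real (Perl) implementation: the function names `build_update` and `update` are hypothetical, the filter only supports “field equals value” conditions joined with AND, and the database is again SQLite via Python’s `sqlite3` module.

```python
import sqlite3


def build_update(table, where, values):
    """Translate a filter dict and a values dict into a parameterized UPDATE."""
    if not values:
        raise ValueError("nothing to update")
    # Column and table names are interpolated, so they must come from
    # trusted code. The values themselves always travel as parameters.
    assignments = ", ".join(f"{column} = ?" for column in values)
    sql = f"UPDATE {table} SET {assignments}"
    params = list(values.values())
    if where:
        conditions = " AND ".join(f"{column} = ?" for column in where)
        sql += f" WHERE {conditions}"
        params += list(where.values())
    return sql, params


def update(conn, table, where, values):
    """Run the update and return the number of rows that were changed."""
    sql, params = build_update(table, where, values)
    with conn:  # commits on success, rolls back on error
        return conn.execute(sql, params).rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('some_name', 'old_password')")
    update(conn, "users", {"name": "some_name"}, {"password": "some_password"})
    print(conn.execute("SELECT name, password FROM users").fetchall())
```

The caller supplies two dictionaries and a table name; the query text and its placeholders never leave the module.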
What this should show you is that your own data storage module should strive to expose nothing but the simple CRUD features and that it should only expose more if not doing so would make your module unusable (as a database abstraction class without a notion of “tables” would be). As far as allowing raw SQL is concerned, I tend to think database abstraction classes that don’t allow this become inordinately complicated as they need to implement their own version of common SQL features such as LIKE clauses. Simply allowing raw SQL when it’s necessary (but not requiring it for the simple operations) is probably best(**).
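As an illustration of that trade-off, the sketch below lets a read take a simple dictionary filter for the common case and an optional raw SQL condition for anything the dictionary can’t express, such as a LIKE clause. The function name `read` and the `raw_where` and `raw_params` arguments are hypothetical names chosen for this example (using Python’s `sqlite3` module), not the interface of my own module.

```python
import sqlite3


def read(conn, table, where=None, raw_where=None, raw_params=()):
    """Read rows matching a dict filter and/or a raw SQL condition."""
    clauses = []
    params = []
    # The simple case: every key/value pair becomes "column = ?".
    for column, value in (where or {}).items():
        clauses.append(f"{column} = ?")
        params.append(value)
    # The escape hatch: a raw condition, still with bound parameters.
    if raw_where:
        clauses.append(f"({raw_where})")
        params.extend(raw_params)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("alan", "editor"), ("bob", "admin")],
    )
    # Simple filter, no SQL needed.
    print(read(conn, "users", {"role": "admin"}))
    # Raw SQL only where the dictionary filter falls short.
    print(read(conn, "users", raw_where="name LIKE ?", raw_params=("al%",)))
```

The abstraction doesn’t have to reinvent LIKE, yet the everyday calls stay free of SQL.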
That’s it for now. Next time I will use the data storage abstraction to build objects for specific types of data and construct the beginnings of a simple Model View Controller framework.
* The real code is written in Perl, as that is what my company uses. For reasons of clarity, however, the examples in this series use Python, since Python code often reads like pseudocode that even programmers who don’t use Python can understand reasonably well.
** In fact, my own database abstraction module also provides a function for executing complete queries. This violates the idea that such an abstraction needs to hide the fact that you’re working with a database, but it does allow you to make use of its features should you really need to.
(see also part one and part three of this series)