After encountering difficulties installing CouchDB on Webfaction (still a work in progress - let me know if you've had success!), I decided to try to backport some of my CouchDB refactor to the Django ORM. I'd not really used any of the model inheritance functionality, so I assumed it would follow the classical Pythonic inheritance model. Unfortunately, it seems that the Django team have instead decided to use a lesser form, built on a 'has a' relationship rather than the traditional 'is a'.

What this means in practice is that subclasses, while accessible through the managers of their superclasses, are not first class citizens. Take the example in the Django documentation:


>>> p = Place.objects.get(name="Bob's Cafe")
# If Bob's Cafe is a Restaurant object, this will give the child class:
>>> p.restaurant
<Restaurant: ...>

Here you can see that, rather than restaurant being a specialised version of place, it is in fact a separate entity with a one-to-one mapping between the two. This, I guess, was the reason behind the team choosing to implement this form of inheritance - it maps to the RDBMS model very smoothly. It is also the reason I fundamentally disagree with it - ORMs are designed to abstract the database to make the interface more like the language. In this situation, the ORM has chosen to be less Pythonic in order to keep the backend implementation simpler.

The consequences are quite annoying. What I want is to have a generic, but not abstract, model, and certain specialisations of this model. I wish to be able to override behaviour in the subclasses to provide different functionality. Specifically, I wish to have a generic Article class that will store postings to my website, but subclass it when further functionality is required, for example when posting a photo. In this situation, the content attribute has a different semantic meaning: in the case of the Photo, I wish to generate the content from the other attributes.
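
In plain Python, with no ORM involved, the behaviour described above looks roughly like the sketch below. Only Article, Photo and content come from the post; the title and image_url attributes and the render helper are made up for illustration.

```python
class Article:
    """Generic posting: content is stored as-is."""

    def __init__(self, title, content=""):
        self.title = title
        self._content = content

    @property
    def content(self):
        return self._content


class Photo(Article):
    """Specialised posting: content is generated from other attributes."""

    def __init__(self, title, image_url):
        super().__init__(title)
        self.image_url = image_url

    @property
    def content(self):
        # Overrides Article.content; callers never need to know.
        return '<img src="%s" alt="%s">' % (self.image_url, self.title)


def render(article):
    # Duck typing: works for any object with title and content.
    return "<h1>%s</h1>%s" % (article.title, article.content)
```

The point is that render never has to ask which kind of article it was given.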

Unfortunately, with the way inheritance is implemented in Django, I cannot simply override the content property in my Photo model. Because the extra data is only stored in the Photo table, I either have to check whether each instance is a photo in all of my views so I can use the relationship to access the method (which violates the duck typing principle), or I have to have Photo-specific logic within my Article model (violating encapsulation). It is unfortunate that this is the way it has been implemented. I'm going to try to find a workaround, because neither of these options is satisfactory.

Posted by Peter Braden. — Modified 23/01/2009 (6 comments) Tagged: code django

Moving to CouchDB

10th of January 2009

As I briefly mentioned before, I've been moving this website over to CouchDB, after Mikeal sold me on how awesome it is. Unfortunately, as it's still such a new database, and as it approaches the database problem from such a different angle, it's not supported by Django, my framework of choice for web development.

Several people have moved their Django sites across though, and I wasn't about to be left behind. What immediately struck me was how well the CouchDB architecture fits with Django - a CouchDB document is simply a collection of key-value pairs, which is pretty much how Python stores attributes in an object or dictionary. It's therefore trivial to marshal your Python objects into CouchDB documents. Although CouchDB is a schemaless database, it is still useful to have some notion of classes or types. The Python CouchDB wrapper is still pretty immature and lacks a lot of documentation, so the only real way to get a proper feel for how it works is to go through the code, and it was while exploring it that I found schema.py, a module specifically designed for this problem.
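
To make the "attributes are just key-value pairs" point concrete, here is a minimal sketch in plain Python that does not use the couchdb library at all. The to_document and from_document helpers, the type key and the Article attributes are all assumptions made for illustration, not part of any CouchDB API.

```python
import json


class Article:
    def __init__(self, title, tags):
        self.title = title
        self.tags = tags


def to_document(obj):
    """Copy an object's attributes into a dict that can be sent as JSON."""
    doc = dict(vars(obj))
    # Record the class name so the document keeps some notion of type.
    doc["type"] = type(obj).__name__
    return doc


def from_document(cls, doc):
    """Rebuild an instance from a document without calling __init__."""
    obj = cls.__new__(cls)
    obj.__dict__.update({k: v for k, v in doc.items() if k != "type"})
    return obj


doc = to_document(Article("Moving to CouchDB", ["code", "django"]))
body = json.dumps(doc)  # the JSON text a document store would receive
```

This only works while every attribute is JSON-friendly, which is exactly the gap a schema layer fills.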

What was even more useful were the similarities with the Django ORM. I managed to port my old models across with only a tiny amount of modification to the code - instead of extending django.db.models.Model, you extend couchdb.schema.Document, and instead of ManyToMany relationships, you can put your data in a List field. Most of the rest of my models could be kept exactly the same.

The schemaless nature of the database was still available - I wanted to have a different type for photos on my homepage than for essays, so I created a generic Article type and then extended it for my photos. This allowed me to override the content field, so that the following:


content = models.TextField(default="")

became

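from django.template import Context
from django.template.loader import get_template

# Inside the Photo class: build the HTML on demand instead of storing it.
@property
def content(self):
	return get_template("photo_to_blog.html").render(Context({'photo': self}))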

Rather than having to store the HTML for a photo, I can generate it on the fly with a template - this means that when I update the structure of the photo HTML, I don't have to go back and change the database. CouchDB allows me to store the content field only when I need it, and my page templates won't know the difference.

Problems

I've already hit a few issues. One that confused me a lot came up when I tried to use the Django Paginator with my new CouchDB models. The schema code allows you to attach CouchDB views to a model. When I tried to paginate the results from a view, I got a paginator with the correct length but an empty object list. The problem lies in the way the Python couchdb library wraps result sets: it implements slicing between CouchDB keys, rather than between list indices as Django and I expected. I threw a quick hacky patch together to make this work, but it could do with some love:


--- a/couchdb/client.py
+++ b/couchdb/client.py
@@ -713,6 +713,8 @@ class ViewResults(object):
     def __getitem__(self, key):
         options = self.options.copy()
         if type(key) is slice:
+            if isinstance(key.start, int) and isinstance(key.stop, int):
+                return list(self)[key.start:key.stop]
             if key.start is not None:
                 options['startkey'] = key.start
             if key.stop is not None:
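
To show why the paginator came back empty, here is a toy reproduction in plain Python. KeyRangeResults and PositionalResults are made-up stand-ins, not the library's ViewResults class, and the inclusive upper bound is an assumption about how a key range behaves. The only thing taken from the patch is the idea of treating integer slice bounds as list positions.

```python
class KeyRangeResults:
    """Toy result set: rows are (key, value) pairs sorted by key."""

    def __init__(self, rows):
        self.rows = sorted(rows)

    def __len__(self):
        return len(self.rows)

    def __iter__(self):
        return iter(self.rows)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        # Key-range semantics: the bounds are compared with row keys.
        return [
            row for row in self.rows
            if (key.start is None or row[0] >= key.start)
            and (key.stop is None or row[0] <= key.stop)
        ]


class PositionalResults(KeyRangeResults):
    def __getitem__(self, key):
        if (isinstance(key, slice)
                and isinstance(key.start, int)
                and isinstance(key.stop, int)):
            # Same idea as the patch: integer bounds mean list positions.
            return list(self)[key.start:key.stop]
        return super().__getitem__(key)


rows = [(2007, "first post"), (2008, "second post"), (2009, "third post")]
first_page_before = KeyRangeResults(rows)[0:2]   # no keys between 0 and 2
first_page_after = PositionalResults(rows)[0:2]  # the first two rows
```

A paginator that asks for items 0 to 2 gets nothing from the first class even though len() reports three rows. The trade-off is that, with this change, a view whose keys really are integers can no longer be sliced by key range when both bounds are given, which is part of why the patch is hacky.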

Another issue was that schema.py didn't actually seem to be converting the values that it got back from the database into Python types. Maybe I was using it wrong, or maybe it's just because the library is young, but here's another hacky patch:


--- a/couchdb/schema.py
+++ b/couchdb/schema.py
@@ -164,6 +164,11 @@ class Schema(object):
 
     def wrap(cls, data):
         instance = cls()
+
+        for k,v in data.items():
+            if v and k in instance._fields.keys():
+                data[k] = instance._fields[k]._to_python(v)  
+
         instance._data = data
         return instance
     wrap = classmethod(wrap)
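
The effect of that patch can be sketched without the library. The Schema and DateTimeField classes below are simplified stand-ins written for this example, and the date format is an assumption; only the names wrap, _fields, _data and _to_python mirror the patch.

```python
from datetime import datetime


class DateTimeField:
    def _to_python(self, value):
        # Assumed storage format: ISO 8601 in UTC, e.g. 2009-01-10T12:00:00Z
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


class Schema:
    _fields = {}

    @classmethod
    def wrap(cls, data):
        instance = cls()
        # Unlike the patch, convert into a copy so the caller's dict is
        # left untouched.
        converted = dict(data)
        for name, value in data.items():
            # Like the patch, falsy values are left unconverted.
            if value and name in cls._fields:
                converted[name] = cls._fields[name]._to_python(value)
        instance._data = converted
        return instance


class Article(Schema):
    _fields = {"posted": DateTimeField()}


raw = {"title": "Moving to CouchDB", "posted": "2009-01-10T12:00:00Z"}
article = Article.wrap(raw)
```

Keys without a matching field, such as title here, pass through unchanged.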

I'm sure that I'm going to hit a whole lot more issues before I can merge my branch down to the live site, but I've been surprised how nicely the technologies mesh so far.

Posted by Peter Braden. — Modified 10/01/2009 (0 comments) Tagged: code django

MySQL Encoding Woes

27th of December 2008

Posting this so hopefully no-one else wastes the 3 hours I just did trying to track down such a pointless bug.

So a new laptop means a whole load of installing and updating, and as I start working on my website again I notice that there are some Unicode issues in some of my database data. I'm still stuck using MySQL (until I bring my CouchDB branch up to date), mainly because I lack the patience to move my stuff to Postgres. Unfortunately this means I'm stuck with the many nasty design choices made by the MySQL team.

I track the issue down to the fact that my data is being double encoded into utf-8. I've made specific efforts to keep all my databases in utf-8 because it is the only sane encoding. I seriously have less respect for anyone who chooses to use another character set as default because of the amount of my life that has been wasted chasing down the inevitable issues.

So the database is in utf-8, but when I do a SET NAMES 'utf8'; in mysql to ensure the client is getting utf-8 I get the same issues - basically, unless I leave the client encoding as latin1 it's going to doubly encode the data. Rather than tell you what I think of the developer that coded that, I'll just give you the solution that worked for me in Django - put this line in your settings:

DATABASE_OPTIONS = dict(charset="latin1", use_unicode=False)
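
For anyone who hasn't met double encoding before, this is roughly what it looks like in plain Python, with no database involved. The sample string and the undo_double_encoding helper are made up for illustration, and the sketch assumes the extra round trip went through latin1 exactly once.

```python
original = "caf\u00e9"  # "café"

# Correct: the text is encoded as UTF-8 once.
stored_once = original.encode("utf-8")

# The bug: those UTF-8 bytes are misread as latin1 text somewhere along
# the way, and that text is then encoded as UTF-8 a second time.
misread = stored_once.decode("latin-1")
stored_twice = misread.encode("utf-8")


def undo_double_encoding(raw):
    """Reverse one extra round of latin1-to-UTF-8 encoding."""
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


repaired = undo_double_encoding(stored_twice)
```

Reading stored_twice back as UTF-8 gives the familiar mojibake with two junk characters in place of the accented letter. MySQL's latin1 is closer to Windows-1252 than to ISO 8859-1, so real data may need "cp1252" in place of "latin-1" here.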

Update 2009-01-05

So admittedly that was a hacky solution, and it turns out it breaks the Django admin. After more hours looking through the Django source code trying to find a way to fix the hack, I give up.

The problem is that if you have latin1 data in your database, no matter what you do to the database the data will remain in that encoding. If you have the patience to go through every column in the database, converting to a blob and back, then this may be your solution. You could probably even script it. For me, I've spent far too long trying to fix this bug - it only shows up in my staging environment, so I'll live with it. And as soon as I can get away from the nastiness of MySQL, I will.

Posted by Peter Braden. — Modified 5/01/2009 (0 comments) Tagged: code django
