Send Plone content to a Django web app via ContentMirror

Export basic Plone Archetype-based content to Django, using the ContentMirror Product and a PostgreSQL database.

Introduction

Overview and prerequisites.

Overview

We've got content in Plone sites that needs to be consumed by other services.  ContentMirror allows us to push this content out to an RDBMS, which we can then use in any number of ways.  Since Django includes some basic out-of-the-box functionality with RDBMS back-ends, we thought this might be interesting.  Turns out, it was!

Environment

This was first done on 12 February 2009, with this stack:

  • CentOS 5 Linux
  • Plone 3.2.1 (via buildout)
  • ContentMirror 0.4.1
  • PostgreSQL 8.1.11
  • Django 1.1 pre-alpha
  • mod_wsgi (version 2.4?)

I tried it again on 25 June 2010, with this stack:

  • CentOS 5 Linux
  • Plone 3.2.3 (via buildout)
  • ContentMirror 0.4.1
  • PostgreSQL 8.1.18
  • Django 1.1.1
  • mod_wsgi 2.4

In other environments, YMMV.

What you should know or read before you start this

  • Installation of Plone Products via buildout "productdistros"
  • Installation and Configuration of PostgreSQL databases (or MySQL databases).
  • Installation and Configuration of Django apps

Prepare the Database

Create or setup a database for ContentMirror to use.

The details of these steps are way beyond the scope of this post. If you need help setting up your database, contact your system administrator and/or try Googling it yourself. Good luck!

PostgreSQL

  1. Create a PostgreSQL database that will be used by both ContentMirror and by Django.  Something like this:
    createdb mydatabase
  2. Set up user accounts and permissions so everything plays nicely.

You should be able to run psql mydatabase at the command prompt.

MySQL

  1. Create a MySQL database that will be used by both ContentMirror and by Django.  Something like this:
    $ mysql
    mysql> CREATE DATABASE mydatabase;
  2. Set up user accounts and permissions so everything plays nicely.

You should be able to run mysql mydatabase at the command prompt.

SQLite

I haven't tried using SQLite, but I understand that SQLAlchemy (which ContentMirror uses for the database layer) now supports SQLite. I'd be interested in your comments on this.

Prepare the Django Project

Configure a new Django project and server.

Set up a fresh Django project

  1. Install Django.  (This is beyond the scope of this document.)
  2. Create a new Django project.
    django-admin.py startproject mydjangoproject
    cd mydjangoproject
  3. The rest of these commands happen inside the "mydjangoproject" Django project directory.

  4. Configure the settings.py file to point to the PostgreSQL database.
    DATABASE_ENGINE = 'postgresql_psycopg2'  # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
    DATABASE_NAME = 'mydatabase'             # Or path to database file if using sqlite3.
    DATABASE_USER = 'myuseraccount'          # Not used with sqlite3.
    DATABASE_PASSWORD = 'mypassword'         # Not used with sqlite3.
    
  5. Run Django's syncdb so that your PostgreSQL database has the minimum set of Django tables.
    python manage.py syncdb
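
The flat DATABASE_* settings above are what Django 1.1 reads. If you are trying this on Django 1.2 or later, the same values go into a DATABASES dictionary instead. Here is a minimal sketch of that mapping; the helper name to_databases_setting is made up for illustration, and the values are the same placeholders used in the steps above.

```python
# Sketch: map the Django 1.1 flat settings onto the DATABASES
# dictionary that Django 1.2 and later expect.
# "to_databases_setting" is a hypothetical helper, not part of Django.
def to_databases_setting(engine, name, user="", password="", host="", port=""):
    """Return a DATABASES dict built from old-style DATABASE_* values."""
    return {
        "default": {
            # Django 1.2+ wants the full dotted path to the backend module.
            "ENGINE": "django.db.backends." + engine,
            "NAME": name,
            "USER": user,
            "PASSWORD": password,
            "HOST": host,  # an empty string means localhost
            "PORT": port,  # an empty string means the default port
        }
    }


# Same placeholder values as in the settings.py step above.
DATABASES = to_databases_setting(
    engine="postgresql_psycopg2",
    name="mydatabase",
    user="myuseraccount",
    password="mypassword",
)
```

In a real settings.py you would normally just write the dictionary out by hand; the function is only there to make the old-to-new correspondence explicit.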

Configure and Run ContentMirror

Connect ContentMirror to your database and create table structures for your Plone content types.

Install ContentMirror 

The fine details of this are beyond the scope of this document, but you can find out more here: ContentMirror.

For my buildout-based installation, I added ContentMirror to the "productdistros" section, like this:

[productdistros]
urls +=
    http://contentmirror.googlecode.com/files/ContentMirror-0-4-1.tgz

Then, I re-ran buildout.

Configure ContentMirror

  1. Find the settings-example.zcml.  For buildout-based installations, it is located at parts/productdistros/ContentMirror/settings-example.zcml.  For non-buildout installations, look in Products/ContentMirror/settings-example.zcml.
  2. Copy settings-example.zcml to settings.zcml.
  3. Edit settings.zcml to point to the database:
    <!-- setup a database connection -->
    <db:engine url="postgres://localhost/mydatabase"
               name="mirror-db"
               echo="True"/>
    Be sure that you specify the correct username and password in this file, too. See the "Configuration" section here: http://code.google.com/p/contentmirror/wiki/Installation
  4. Create SQL statements for tables
    ddl.py postgres > somestuff.sql
  5. Run that SQL against your database to create the ContentMirror tables.

    psql mydatabase < somestuff.sql
  6. Run bulk.py to export existing Plone content into the database.  (NOTE: bulk.py is buried in the eggs somewhere.)
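
The db:engine url in step 3 is a SQLAlchemy-style database URL, and it can carry the username and password as scheme://user:password@host/database. The sketch below builds such a URL and percent-encodes the credentials, which matters if the password contains characters like "@" or "/". The function name build_engine_url and the sample credentials are assumptions for illustration, and the code is written for a current Python 3 interpreter run outside of Plone.

```python
from urllib.parse import quote_plus


def build_engine_url(database, user=None, password=None,
                     host="localhost", scheme="postgres"):
    """Build a scheme://user:password@host/database URL string.

    "postgres" matches the settings.zcml example above; newer
    SQLAlchemy releases expect "postgresql" as the scheme instead.
    """
    credentials = ""
    if user:
        credentials = quote_plus(user)
        if password:
            # Percent-encode characters such as '@' or '/' so that they
            # are not mistaken for URL delimiters.
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return "%s://%s%s/%s" % (scheme, credentials, host, database)


print(build_engine_url("mydatabase"))
print(build_engine_url("mydatabase", "myuseraccount", "p@ss/word"))
```

The resulting string is what goes into the url attribute of the db:engine element in settings.zcml.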

Hack ContentMirror code

Everything worked fine the first time I tried this. But the second time, I ran into a couple of roadblocks. They might not be true bugs, in the sense that ContentMirror might work fine on normal data. But my database is full of bad, corrupted data. So, I had to make a couple of hacks to get this to work.

Hack #1 - Run as Admin User

ContentMirror is only designed to mirror content that has been Published, and is viewable by Anonymous users. I needed to mirror ALL content, regardless of its workflow state. In bulk.py, I added three lines in main():

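    def main( app ):
        if not len(sys.argv) == 2:
            print "mirror-batch portal_path"
            sys.exit(1)

        # Hack: run as admin user - 06/30/2010
        # Look up the "admin" user and make it the active security context,
        # so that the export is not limited to what Anonymous can view.
        from AccessControl.SecurityManagement import newSecurityManager
        admin = app.acl_users.getUserById("admin")
        newSecurityManager(None, admin)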

Incidentally, ContentMirror does mirror the workflow state to the external database, so I am able to see which content is Public, Private, etc. after export.

Hack #2 - Fix ReferenceTransform for bad data

I also had to hack transform.py, so that ContentMirror would keep going, despite the bad data in my database. In class ReferenceTransform, I added four lines to the copy() method:

    def copy( self, instance, peer ):
        value = self.context.getAccessor( instance )()

        if not value:
            return

        if not isinstance( value, (list, tuple)):
            value= [ value ]

        #Hack - remove None values.
        goods = [x for x in value if x is not None]
        if len(goods) < 1:
            return
        value = goods

I even compared this to the latest source code (version 0.6.0rc2) and found that the latest source still has this vulnerability. So, in true open-source spirit, I submitted an issue to the ContentMirror team.
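
If you want to convince yourself of what the Hack #2 lines do before patching transform.py, the same normalization can be pulled out into a standalone function and run on a few sample values. This is only a sketch of the filtering logic: the function name clean_reference_values and the sample inputs are made up, and the real copy() method goes on to do more with the value afterwards.

```python
def clean_reference_values(value):
    """Normalize a reference field value to a list with no None entries.

    Returns None when there is nothing left to copy, which corresponds
    to the early "return" statements in the patched copy() method.
    """
    if not value:
        return None
    # A single reference is wrapped so the rest of the code sees a list.
    if not isinstance(value, (list, tuple)):
        value = [value]
    # Hack: drop the None entries left behind by broken references.
    goods = [x for x in value if x is not None]
    if not goods:
        return None
    return goods


print(clean_reference_values(None))             # None
print(clean_reference_values("doc-1"))          # ['doc-1']
print(clean_reference_values(["doc-1", None]))  # ['doc-1']
print(clean_reference_values([None, None]))     # None
```

The last case is the one that bit me: a non-empty list that contains nothing but None gets past the first "if not value" check, so it needs the second check after filtering.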

Massage the Tables

Adjust the database tables that ContentMirror created so that they are usable by Django.

Massage the ContentMirror tables

Short-cut: Just use this file: Massage ContentMirror Tables for Django

For each of the ContentMirror tables, there must be an ID field that has unique values, in order for Django to use the data.

Tables affected in this attempt:

  • atbtreefolder
  • atdocument
  • atevent
  • atfavorite
  • atfile
  • atfolder
  • atimage
  • atlink
  • atnewsitem
  • files
  • relations

So, repeat these steps for almost all of the tables (excluding Content, which has ID built-in), replacing sometable with the actual table name.

  1. Create a sequence to be used by the new ID field.
    CREATE SEQUENCE sometable_id_seq;
  2. Add the ID field to the table.
    ALTER TABLE sometable ADD id INT UNIQUE;
  3. Make the ID field auto-increment.
    ALTER TABLE sometable ALTER COLUMN id SET DEFAULT NEXTVAL('sometable_id_seq');
  4. Update the data values in the ID field.
    UPDATE sometable SET id = NEXTVAL('sometable_id_seq');

Steps taken from this blog post:
http://pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-an-auto-increment-column-to-a-table.html
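
Typing those four statements eleven times gets old quickly. As an alternative, here is a small sketch of a script that prints the statements for every table listed above, so the output can be saved to a file and fed to psql in one go. The table list is copied from this page; the function names and the script file name used in the usage note are made up.

```python
# Tables from the "Tables affected in this attempt" list.
TABLES = [
    "atbtreefolder", "atdocument", "atevent", "atfavorite", "atfile",
    "atfolder", "atimage", "atlink", "atnewsitem", "files", "relations",
]


def id_column_sql(table):
    """Return the four statements that add an auto-incrementing id column."""
    seq = "%s_id_seq" % table
    return [
        "CREATE SEQUENCE %s;" % seq,
        "ALTER TABLE %s ADD id INT UNIQUE;" % table,
        "ALTER TABLE %s ALTER COLUMN id SET DEFAULT NEXTVAL('%s');" % (table, seq),
        "UPDATE %s SET id = NEXTVAL('%s');" % (table, seq),
    ]


def build_script(tables=TABLES):
    """Return one SQL script covering every table, with a comment per table."""
    lines = []
    for table in tables:
        lines.append("-- %s" % table)
        lines.extend(id_column_sql(table))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_script(), end="")
```

Assuming you save it as massage_tables.py (a hypothetical name), running "python massage_tables.py > massage.sql" followed by "psql mydatabase < massage.sql" applies all of the changes at once.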

Finish the Django configuration

Show ATContentTypes in Django.

New Django App for Archetypes Content

  1. Create a new "app" to hold your Archetypes Content
    python manage.py startapp atcontent
  2. Let Django make basic models out of ContentMirror tables
    python manage.py inspectdb > atcontent/models.py
  3. Edit the generated code, per the comments Django puts in it. (Django will warn you on the next step if you get it wrong.)
    • Remove django default table definitions.
    • Remove auto-generated "id" fields, because Django infers them anyway and complains if you define them yourself.
    • Reorder the model classes so that references come after the model classes they refer to are defined.
    • Add related_name parameters to ForeignKey definitions where necessary.
    • The second time I did this, I also removed "At" from the Django model class names; since the Django app is "atcontent," it just seemed redundant.
  4. Make sure that your data model is valid. (If it's not, go back to the previous step and fix it.)
    python manage.py sql atcontent
  5. Synchronize with the db.
    python manage.py syncdb

Here's the file I came up with (use at your own risk): models.py
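
Two of the edits in step 3 are mechanical enough to script: dropping the explicit "id" fields and stripping the "At" prefix from the class names. The sketch below does both with regular expressions on the text that inspectdb wrote. Treat it as a starting point rather than what I actually ran: the sample input is a made-up fragment (not the real ContentMirror schema), the exact field definitions inspectdb emits depend on your Django version, and reordering classes and adding related_name still has to be done by hand.

```python
import re

# An "id = models.SomeField(...)" line inside a class body.
ID_FIELD = re.compile(r"^\s+id\s*=\s*models\.\w+\(.*\)\s*$")
# A generated class definition such as "class Atdocument(models.Model):".
CLASS_DEF = re.compile(r"^class (At\w+)\(models\.Model\):", re.MULTILINE)


def tidy_models(source):
    """Drop explicit id fields and strip the 'At' prefix from class names."""
    lines = [ln for ln in source.splitlines() if not ID_FIELD.match(ln)]
    text = "\n".join(lines) + "\n"
    # Rename each class everywhere it appears as a whole word, so that
    # ForeignKey references follow the definitions. The match is
    # case-sensitive, so lowercase db_table values are left alone.
    for old in CLASS_DEF.findall(text):
        new = old[2:].capitalize()
        text = re.sub(r"\b%s\b" % re.escape(old), new, text)
    return text


# Hypothetical inspectdb-style fragment, for illustration only.
SAMPLE = """class Atdocument(models.Model):
    title = models.TextField()
    id = models.IntegerField(unique=True, null=True, blank=True)
    class Meta:
        db_table = u'atdocument'
"""

print(tidy_models(SAMPLE))
```

Run "python manage.py sql atcontent" afterwards, as in step 4, to see whether Django accepts the result.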

Plug in the App and start serving

  1. Add "atcontent" to INSTALLED_APPS in mydjangoproject/settings.py.
  2. Add a custom admin.py in your app.  (Here's the file I used: admin.py). If you're brave, you can add some neat filtering and searching to the admin interface, using Django ModelAdmin classes.
  3. Add the admin panel and follow the other instructions in this tutorial:

http://docs.djangoproject.com/en/dev/intro/tutorial02/#intro-tutorial02

At this point, all of the default Plone Content Types are mirrored to Django. You can (sort of) view them via the admin interface. However, they're not pretty. (If you want that, you'll need to build Django "views". This document is not going to cover that. Sorry.)

Here's what it looks like:

[Screenshot: Django admin interface to ATContentTypes]

Conclusion

What have we got and where do we go from here?

What have we got?

Now our Plone content is actually mirrored to a database, in near-real-time.  Pretty cool.  (You can actually watch it happen, if you run Plone with instance fg.)  We can query the content via anything that can access a PostgreSQL database.

Also, we can view (sort of) our Plone content via Django's admin interface.  If you just need proof-of-concept, here it is.

What's next?

We do not have a full-featured Django web application.  Django does not have any of our workflow or permission definitions from Plone.  Nor do we even really have a proper way to display Plone content items.

Further Observations

It appears that ContentMirror is designed to work with a single Plone site. This one-site model will cause me a bit more work, because the name of the Plone site is not exported with the content, AFAICT.

I tried installing plango, the Django app for serving Plone content, since Andy did so much work on it. I found that I had to hack a ton of it to remove search-specific code that requires Postgres 8.3 (since I'm running 8.1.18). It worked OK, but not spectacularly. When I realized that it is licensed under the GPL 3, and I'm not sure I have approval to go GPL myself, I had to yank it entirely.

My next step will be to try another Django-based CMS, like django-cms. I don't expect to serve the exported atcontent data directly. Instead, I expect to write some migration code that converts from the ATContentTypes models into django-cms content type models.