Writing migrations
You need migrations, because things tend to change.
Migrations
Sometimes we don't get it right on the first attempt. In fact, we rarely do. The first evidence of this can be found in the following lines in 'RichDocument/__init__.py':
sys.modules['Products.RichDocument.RichDocument'] = content.richdocument
sys.modules['Products.RichDocument.FileAttachment'] = content.attachments
sys.modules['Products.RichDocument.ImageAttachment'] = content.attachments
sys.modules['Products.RichDocument.widgets'] = widget
sys.modules['Products.RichDocument.ImagesManagerWidget'] = widget.images
sys.modules['Products.RichDocument.AttachmentsManagerWidget'] = widget.attachments
These lines are necessary because we moved some of the RichDocument Python files around to get in line with best practice. Unfortunately, there are still ZODBs out there which depend on the old layout, so we have to provide aliases for the old module paths to make sure that existing objects and code don't break.
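To see what such an alias does, here is a minimal sketch in plain Python (no Zope or Plone needed). The module names demo_new_layout and demo_old_layout are made up for this illustration. The point is only that anything looking up the old module path, as an unpickler would, ends up at the same module object and therefore at the same class.

```python
import importlib
import sys
import types


class RichDocument(object):
    """Stand-in for a content class that has moved to a new module."""


# The module that now holds the class (hypothetical name).
new_module = types.ModuleType("demo_new_layout")
new_module.RichDocument = RichDocument
sys.modules["demo_new_layout"] = new_module

# Register the old module path as an alias for the new module.
sys.modules["demo_old_layout"] = new_module

# Anything that still refers to the old path resolves to the same
# module object, and therefore to the same class.
old_style = importlib.import_module("demo_old_layout")
resolved_class = getattr(old_style, "RichDocument")
```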
Things can move around inside content types, too, and sometimes you want to replace a whole content type with another, better version. Doing so is called "migration". You may have come across this term if you ever upgraded Plone. Because Plone has lots of state in the ZODB, we need to write migrations to ensure upgraded sites are consistent with freshly installed new ones. Migrations can be tricky, mostly because there are so many different ways in which you can mess up your site that we just didn't think about, and you'll see plenty of blood, sweat and tears in Products/CMFPlone/migrations. Now that you've learnt about unit testing, you can also check out Products/CMFPlone/tests/testMigrations.py and see how we test migrations to make sure they do what we think they do, and that they fail gracefully when something unexpected happens.
Luckily, most products are not quite as complex as Plone itself and hence can get away with simpler migrations. Even more luckily, if what you are migrating are content types, ATContentTypes contains a migration framework that takes most of the work out of it for you.
Using Walkers and Migrators
ATContentTypes keeps track of its migrators in a registry managed by the portal_atct tool. This is quite flexible, and enables the migrators to be hidden behind a sensible user interface. However, it's also a lot of work. In most cases, a manually created External Method will suffice, since migrations are typically only called once.
The ATContentTypes migration framework uses the concept of walkers and migrators. A walker, as found in Products/ATContentTypes/migration/walker.py, is responsible for finding content to migrate. The simplest walker is the CatalogWalker, which does a catalog query for all content of a given type. This has two implications:
- The old content must (still) be in the catalog
- It finds and migrates all content of the source type, no special rules.
Later on, we'll see an example of a more flexible walker.
Migrators are simple classes that know how to migrate a given content type. The framework contains base classes to make writing migrators surprisingly easy.
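The division of labour between the two can be sketched in plain Python. This is a toy model, not the ATContentTypes code: the "catalog" is just a list of dictionaries, and the class names ToyCatalogWalker and ToyPageMigrator are invented for the example. The walker finds objects of the source type, and the migrator knows how to turn one such object into the destination type.

```python
class ToyCatalogWalker(object):
    """Finds every catalogued object of the migrator's source type."""

    def __init__(self, catalog, migrator_class):
        self.catalog = catalog  # a list standing in for the portal catalog
        self.migrator_class = migrator_class

    def walk(self):
        # Comparable in spirit to a catalog query on portal_type.
        wanted = self.migrator_class.src_portal_type
        return [obj for obj in self.catalog if obj["portal_type"] == wanted]

    def go(self):
        # Apply the migrator to each object the walk turned up.
        return [self.migrator_class(obj).migrate() for obj in self.walk()]


class ToyPageMigrator(object):
    """Knows how to turn one 'Page' into a 'RichDocument'."""

    src_portal_type = "Page"
    dst_portal_type = "RichDocument"

    def __init__(self, old):
        self.old = old

    def migrate(self):
        new = dict(self.old)  # copy the old data over
        new["portal_type"] = self.dst_portal_type
        return new


catalog = [
    {"id": "a", "portal_type": "Page"},
    {"id": "b", "portal_type": "Document"},
    {"id": "c", "portal_type": "Page"},
]
result = ToyCatalogWalker(catalog, ToyPageMigrator).go()
```

Note how both implications listed above show up here: only objects present in the catalog list are found, and every object of the source type is migrated without exception.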
RichDocument 1.0 was a very different beast internally to what you see now. It didn't use ATContentTypes, the display menu or many of the other advanced techniques you have learnt about in this tutorial. It was, however, the base type for a number of other types, including a Page type (not to be confused with Plone's Page type, which is really just a renamed Document... hey, Limi, I had the name first!) in a customer project. Page was like an old-fashioned RichDocument with an added subtitle field. We later decided we didn't care about the subtitle, and we wanted to migrate it to a standard RichDocument to make it easier to maintain. This posed a few challenges:
- We had to change all the Page objects to RichDocument ones. There were far too many Pages in the portal to do this manually.
- RichDocument didn't use to use the "display" menu for the float/preview images options. Instead, it had an IntegerField displayImages that took on different values: 0 for basic view, 1 for float and 2 for display preview box.
- RichDocument used to store plain old Images instead of ImageAttachment objects for its attached images.
The migration thus had to deal with all of this. We used the following external method, called 'Page/Extensions/migrate.py':
from Products.CMFCore.utils import getToolByName
from StringIO import StringIO
from Products.ATContentTypes.migration.walker import CatalogWalker
from Products.ATContentTypes.migration.migrator import CMFFolderMigrator, CMFItemMigrator

class RichDocumentMigrator(CMFItemMigrator):
    """Base class to migrate to RichDocument or a derivative. Takes care of
    contained images.
    """

    def migrate_imageAttachments(self):
        """Previously, we used standard images. Now we use our own
        ImageAttachment type.
        """
        for img in self.old.objectValues(('Image', 'ATImage',)):
            self.new.invokeFactory('ImageAttachment', img.getId())
            newImg = getattr(self.new, img.getId())
            newImg.setImage(img.getImage())

    def migrate_imageSettings(self):
        """Previously, we used an integer variable to determine whether to
        display floats or previews or the standard document view. Now, we
        use the "display" menu for this.
        """
        displayImages = self.old.getDisplayImages()
        self.new.setDisplayImages(False)
        if displayImages == 1:
            self.new.setLayout('richdocument_view_float')
        elif displayImages == 2:
            self.new.setLayout('richdocument_view_preview')
        else:
            self.new.setLayout('richdocument_view')

class PageMigrator(RichDocumentMigrator):
    walkerClass = CatalogWalker
    src_meta_type = 'Page'
    src_portal_type = 'Page'
    dst_meta_type = 'RichDocument'
    dst_portal_type = 'RichDocument'
    map = {'getRawText' : 'setText'}

def migrate(self):
    """Run the migration"""
    out = StringIO()
    print >> out, "Starting migration"

    portal_url = getToolByName(self, 'portal_url')
    portal = portal_url.getPortalObject()

    migrators = (PageMigrator,)
    for migrator in migrators:
        walker = migrator.walkerClass(portal, migrator)
        walker.go(out=out)
        print >> out, walker.getOutput()

    print >> out, "Migration finished"
    return out.getvalue()
This method was set up in portal_skins/custom as a new External Method, with the id migrateTypes, module Page.migrate and function name migrate. To run it, we simply had to hit the Test tab of the External Method in the ZMI, or call it on http://localhost:8080/plone-site/migrateTypes.
WARNING: Migrations are typically not easy to Undo in the ZMI, because they tend to span many transactions. It's therefore vital that you keep a backup of your site pre-migration to ensure you can roll back if something goes wrong!
The migration framework works by instantiating a walker with a migrator to apply to all the objects it finds. This is what happens in the migrate() function. Migrators are classes usually derived from CMFFolderMigrator or CMFItemMigrator, or an intermediary. You can see these and the various ATContentTypes migration base classes in Products/ATContentTypes/migration/migrator.py and Products/ATContentTypes/migration/atctmigrator.py. Basically, a CMFItemMigrator migrates a standard CMF type (including those made with Archetypes), together with all standard metadata, local roles etc. A CMFFolderMigrator does the same, and also ensures that folder contents are migrated with the folder.
There are three ways of performing custom migration:
- Any method in the class with a name starting with migrate_ will be called automatically. This is similar to how test... methods are called during unit testing. If you follow the hierarchy of base classes from CMFItemMigrator, you will see a number of such methods taking care of various parts of the migration process. In these methods, as well as the other automatically called methods described below, you have two basic variables available: self.old is the old object being replaced, and self.new is the new object being created. If you need more control there are some additional method prefixes you can use:
  - beforeChange_... methods are called before the migration takes place, meaning before self.new is created. Hence, they must work in terms of self.old.
  - last_migrate_... methods are called just before the migrator finishes migrating an object.
- The method custom() is called after migrate_... methods, but before the last_migrate_... methods. The default implementation is empty, but you can override it to perform any custom migration.
- The easiest way of performing migrations is with the map class variable. Here, you can define a map of attributes and/or methods that should get migrated. From the docstring of migrate_withMap() in 'Products/ATContentTypes/migration/migrator.py':

      """Migrates other attributes from obj.__dict__ using a map

      The map can contain both attribute names and method names

      'oldattr' : 'newattr'       new.newattr = oldattr
      'oldattr' : ''              new.oldattr = oldattr
      'oldmethod' : 'newattr'     new.newattr = oldmethod()
      'oldattr' : 'newmethod'     new.newmethod(oldatt)
      'oldmethod' : 'newmethod'   new.newmethod(oldmethod())
      """
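The order in which the hooks from the list above are called can be modelled in a few lines of plain Python. This is a simplified sketch, not the ATContentTypes implementation: ToyMigrator and DemoMigrator are invented names, the "objects" are dictionaries, and calling same-prefix methods in alphabetical order is an assumption of the sketch.

```python
class ToyMigrator(object):
    """Simplified model of the hook order described above."""

    def __init__(self, old):
        self.old = old
        self.new = None
        self.log = []

    def _call_prefixed(self, prefix):
        # Call every method whose name starts with the prefix, in name
        # order (the ordering is an assumption of this sketch).
        for name in sorted(dir(self)):
            if name.startswith(prefix):
                getattr(self, name)()

    def custom(self):
        """Empty by default; override for one-off logic."""

    def migrate(self):
        self._call_prefixed("beforeChange_")  # only self.old exists here
        self.new = {}                          # stand-in for creating the new object
        self._call_prefixed("migrate_")
        self.custom()
        self._call_prefixed("last_migrate_")
        return self.new


class DemoMigrator(ToyMigrator):
    def beforeChange_note(self):
        self.log.append("beforeChange")

    def migrate_title(self):
        self.new["title"] = self.old["title"]
        self.log.append("migrate")

    def custom(self):
        self.log.append("custom")

    def last_migrate_done(self):
        self.log.append("last_migrate")


demo = DemoMigrator({"title": "Hello"})
new_obj = demo.migrate()
```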
In the code above, we define a base class for old-to-new RichDocument migrations. Note that it uses CMFItemMigrator instead of CMFFolderMigrator even though RichDocument is folderish, because we take care of the Image-to-ImageAttachment translation in the automatically called migrate_imageAttachments(). The method migrate_imageSettings() takes care of the change from using an integer to hold the display mode to using the "display" menu.
The PageMigrator now becomes very simple. It sets the source and destination portal types and meta types for the walker to search for, and defines the map that migrates the value of the getRawText accessor to the setText mutator. In most cases the portal type and the meta type are the same, but check the definition of your Archetypes class to be sure. In ATContentTypes core they are actually different: the meta type is ATImage for the portal type Image, and so on.
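If the map semantics feel abstract, the following plain-Python sketch mimics the rules from the migrate_withMap() docstring. It is written from that docstring only and is not the framework's code. The function migrate_with_map and the classes OldPage and NewDocument are hypothetical stand-ins.

```python
def migrate_with_map(old, new, mapping):
    """Copy values from old to new following the documented map rules."""
    for old_key, new_key in mapping.items():
        if not new_key:
            new_key = old_key             # 'oldattr' : '' keeps the same name
        value = getattr(old, old_key)
        if callable(value):
            value = value()               # old side is a method: call it
        target = getattr(new, new_key, None)
        if callable(target):
            target(value)                 # new side is a method: pass the value
        else:
            setattr(new, new_key, value)  # new side is a plain attribute


class OldPage(object):
    def __init__(self, text):
        self._text = text
        self.subject = "news"

    def getRawText(self):
        return self._text


class NewDocument(object):
    def __init__(self):
        self.text = None

    def setText(self, value):
        self.text = value


old_page = OldPage("<p>Hi</p>")
new_doc = NewDocument()
migrate_with_map(old_page, new_doc, {"getRawText": "setText", "subject": ""})
```

The first entry is the same accessor-to-mutator mapping that PageMigrator uses. The second shows the empty-string shorthand for keeping an attribute name unchanged.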
Migrating between different versions of the same type
Perhaps more common than migrating from one content type to a completely different one is the case where you have changed the internal organisation of a content type. You may have renamed fields, or changed the internal conventions for how your data is stored.
archetype_tool provides some means of performing basic migrations, via its Update schema tab. This will make sure that the in-ZODB schemata of your types are synced with the current story on the filesystem. However, you may end up losing data this way. If you rename a field, Archetypes' storage will store it in a different location (e.g. a different attribute). Hence, existing instances will have a blank/default value for these fields. To retain the old data, you will probably need migration.
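The data-loss problem is easy to reproduce with a toy model. The class below imitates attribute-style storage, where each field value lives in an attribute named after the field. It is not Archetypes code, and the field names summary and teaser are invented for the example.

```python
class ToyContent(object):
    """Stores each field value in an attribute named after the field."""

    def get_field(self, field_name, default=""):
        return getattr(self, field_name, default)

    def set_field(self, field_name, value):
        setattr(self, field_name, value)


doc = ToyContent()
doc.set_field("summary", "Old data")

# The schema now calls the field "teaser": existing instances look empty,
# although the old value is still sitting under the old name.
before = doc.get_field("teaser")

# A migration step moves the stored value to the new location.
doc.set_field("teaser", doc.get_field("summary"))
del doc.summary
after = doc.get_field("teaser")
```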
The contentmigration product was written to help with this scenario. It extends the ATContentTypes migrator in a few ways. From its README:
- A CustomQueryWalker can be used to specify a more specific catalog query for a walker to use (e.g. which content to actually migrate). This can be used with any migrator.
- A BaseInlineMigrator is similar to BaseMigrator, but does not migrate by copying the old object to a temporary location, creating a new object and applying migration methods. Instead, migration methods are applied in-place. This simplifies the code significantly, because attributes, local roles etc. do not need to be copied over. Note that whereas BaseMigrator works in terms of self.old and self.new as the objects being migrated, BaseInlineMigrator only has a single object, stored in self.obj. This can be used with any walker.
- An extension of this class called FieldActionMigrator uses the action-based migration framework for Archetypes fields, found in field.py. Please refer to that file for full details, but briefly, you specify a list of attributes to migrate at the storage level, instructing the migrator whether to rename, transform, unset or change the storage for an attribute.
You can get contentmigration from the Collective svn repository. Note that it is not a product to install in Plone. Instead, you can call it from your own migration methods, with statements like:
from Products.contentmigration.walker import CustomQueryWalker
from Products.contentmigration.migrator import FieldActionMigrator
from Products.Archetypes.public import *

class MyMigrator(FieldActionMigrator):
    src_portal_type = 'MyType'
    src_meta_type = 'MyType'

    # A sequence of action dictionaries. Note the trailing comma, which
    # makes this a one-element tuple rather than a bare dictionary.
    fieldActions = ({'fieldName'    : 'someField',
                     'storage'      : AttributeStorage(),
                     'newFieldName' : 'renamedField',
                     'newStorage'   : AnnotationStorage(),
                     'transform'    : lambda obj, val, **kw: val + 10,
                     },
                    )
The CustomQueryWalker is a generic walker that you can use in other migrations too. It allows you to specify a catalog query to use to find objects. In this way, you can restrict migration to a subset of the content in your site. You initialise it with a query like this:
walker = CustomQueryWalker(portal, MyMigrator,
                           query = {'path' : '/some/path'})
walker.go()
Note that the src_portal_type and src_meta_type still exist in the migrator (MyMigrator in this case) and are inserted in the query. In fact, they override any setting of meta_type or portal_type in the query itself.
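The overriding behaviour just described amounts to something like the following plain-Python sketch. The function build_query and the class ToyTypeInfo are hypothetical, and the real walker may assemble its query differently. The sketch only illustrates that type settings from the migrator win over whatever the caller put in the query.

```python
def build_query(user_query, migrator_class):
    """Merge a caller's query with the migrator's source type settings."""
    query = dict(user_query)  # leave the caller's dictionary untouched
    # The migrator's source types always replace the caller's values.
    query["portal_type"] = migrator_class.src_portal_type
    query["meta_type"] = migrator_class.src_meta_type
    return query


class ToyTypeInfo(object):
    """Stand-in for a migrator class; only the type attributes matter."""

    src_portal_type = "MyType"
    src_meta_type = "MyType"


user_query = {"path": "/some/path", "portal_type": "SomethingElse"}
query = build_query(user_query, ToyTypeInfo)
```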
The BaseInlineMigrator and its extension FieldActionMigrator are the base classes you can use for "inline" migration - that is, migration that doesn't migrate from one type to another, but alters the fields in the current object. You can use migrate_..., beforeChange_..., last_migrate_... methods, and the custom() override here too. Note that the map class variable is not available, because it is superseded by the fieldActions variable and is generally less useful for inline migrations. Also note that any custom logic you implement should use self.obj instead of self.old and self.new, since there is only one object taking part in the migration.
The fieldActions method of migration is similar to the map feature of the BaseMigrator, but is more powerful. It uses a list of actions, specified as dictionaries. Actions work on fields, accessed with storages. Note that the fields and storages do not need to exist in the current schema. Hence, if a field name has been changed, and/or the storage of the field changed, you can use actions like the one above to specify the change. Actions can also apply transformations via a callback method (the lambda above), and execute a callback method before or after the migration of a specific action.
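To make the idea of an action concrete, here is a toy interpreter for a single action dictionary. It is only a sketch: storages are modelled as plain dictionaries rather than Archetypes storage objects, the function apply_field_action is invented, and falling back to the old name and storage when no new ones are given is an assumption. The action keys and the transform signature are the ones used in the MyMigrator example.

```python
def apply_field_action(obj, action):
    """Move one value between toy storages, renaming and transforming it."""
    storage = action["storage"]
    new_storage = action.get("newStorage", storage)
    field_name = action["fieldName"]
    new_name = action.get("newFieldName", field_name)

    value = storage.pop(field_name)  # read the old value and unset it
    transform = action.get("transform")
    if transform is not None:
        value = transform(obj, value)
    new_storage[new_name] = value


# Dictionaries standing in for the two storages of one object.
attribute_storage = {"someField": 32}
annotation_storage = {}

action = {
    "fieldName": "someField",
    "storage": attribute_storage,
    "newFieldName": "renamedField",
    "newStorage": annotation_storage,
    "transform": lambda obj, val, **kw: val + 10,
}
apply_field_action(object(), action)
```

After the call, the value has left the old storage and sits under its new name in the new storage, with the transform applied.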
To learn more about field actions, see contentmigration/field.py. For examples, see contentmigration/tests/testATFieldMigration.py and contentmigration/tests/cmtc.py.