Archetypes Developers Guide

by Alex Limi last modified Dec 10, 2009 08:49 PM
This is the main resource for Archetype developers.

Archetypes Developers Guide

Contents

Missing Chapters:

  • Getting help
  • Mailing lists
  • Wiki

Introduction and Installation of Archetypes

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.6 $
Web site:http://sourceforge.net/projects/archetypes

Introduction

Archetypes is a framework for developing new content types in Plone. Compared to building content types using the stock CMF, Archetypes gives you the following advantages:

  1. Automatically generates forms
  2. Provides a library of stock field types, form widgets, and field validators
  3. Integrates custom fields, widgets, and validators
  4. Automates transformations of rich content

The project is hosted on the Archetypes Project 21 at SourceForge 20. The latest version of this document can be always found under the under the Developers/Archetypes 22 directory of plone-docs.

Installing Archetypes

Requirements

Archetypes is currently being tested and run in various environments using the following combination:

  • Zope 2.6
  • CMFPlone 1.0.2
  • CMF 1.3.1

It is also known to work smoothly with Zope 2.5.

Using the tarball

  1. Go to the Archetypes Sourceforge page 23 and download the latest stable version of Archetypes.
  1. Decompress the tarball:

    tar xvfz archetypes-1.0.1.tgz
    

It should contain the following directories:

Archetypes
ArchExample
generator
validation
  1. Install generator and validation as described in generator/README and validaton/README
  2. Restart your Zope.
  3. Check in the Control Panel and the Products Management screen to make sure everything imported just fine. If a product appears with a

Checking out from CVS

If you want to get the latest version of Archetypes from CVS, here is how to do it.

Please note: the version of Archetypes in the CVS is experimental and may not function as expected. You have been warned!

Using Windows

  1. Download TortoiseCVS 24, a Windows CVS client.
  2. Install the program and reboot if necessary.
  3. Right-click inside the Products folder, and choose CVS checkout
  4. Use the following settings to set up the CVSROOT.
Protocol:Password server (:pserver:)
Server:cvs.sourceforge.net
Port:(leave blank)
Repository directory:/cvsroot/archetypes
User name:anonymous
Module:Archetypes
  1. Repeat for the modules ArchExample, generator, validator

Using *nix

Quick and dirty:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes co Archetypes
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes co ArchExample
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes co validation
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes co generator

Installing an Archetypes based Plone site

To a new Plone site

  1. From the ZMI add drop-down, select Plone Site
  2. In the Add Plone Site form choose Archetypes Site from the Site Type drop-down.

To an existing Plone site

  1. Edit Products/Archetypes/config.py and change INSTALL_DEMO_TYPES to 1
  2. Restart the Zope server
  3. From the ZMI add drop-down, select External method
Id:install_archetypes
Title:Install Archetypes (optional)
Module Name:Archetypes.Install
Function Name:install
  1. Click on the external method install_archetypes that you just created
  2. Click on the Test tab to run the external method

Getting started with Archetypes - A Simple Example

Author:Alexander Limi
Contact:limi@plone.org
Date:$Date: 2004/03/03 13:02:29 $
Version:$Revision: 1.7 $
Web site:http://sourceforge.net/projects/archetypes
Original URL:http://plone.org/documentation/archetypes/gettingstarted/

Introduction

Archetypes is Plone's next-generation content type generation framework. This section is aimed at the beginner level and is still a work in progress. (ArchExample hasn't been updated to reflect all the changes here yet.)

What is Archetypes?

Archetypes (formerly known as CMFTypes) is a Zope Product which simplifies the creation of new content types under Zope 2 and the Content Management Framework (CMF). Most content management projects involve introducing new types of content, which in the non-trivial case requires an informed understanding of how Zope and the CMF work. Archetypes provides a simple, extensible framework that can ease both the development and maintenance costs of CMF content types while reducing the learning curve for the simpler cases.

Merriam-Webster entry:

Main Entry:ar-che-type
Pronunciation:'är-ki-"tIp Listen
Function:noun
Etymology:Latin archetypum, from Greek archetypon, from neuter of archetypos archetypal, from archein + typos type
Date:1545

1. The original pattern or model of which all things of the same type are representations or copies - PROTOTYPE - also: a perfect example.

  1. IDEA.

3. an inherited idea or mode of thought in the psychology of C. G. Jung that is derived from the experience of the race and is present in the unconscious of the individual.

Create your Product based on ArchExample

Go to Products dir, create InstantMessage directory.

Copy contents of the ArchExample directory to your InstantMessage directory. (You do not need the .pyc files, these are compiled Python files that are generated when the scripts are run.)

It should now look like this:

- InstantMessage
- Extensions
    - Install.py
- skins
    - archexample
    - article_view.pt
- __init__.py
- Article.py
- ChangeLog
- config.py
- Permissions.py

Now we will go through the files one by one and add what we need to produce our Instant Message content type.

Permissions.py - The security permissions for your content type

We have decided that we need three types of permissions for our Instant Message type - Add, Edit/Delete and View:

# Add an Instant Message
ADD_CONTENTS_PERMISSION = 'InstantMessage: Add Message'

# Edit/Delete an Instant Message
EDIT_CONTENTS_PERMISSION = 'InstantMessage: Edit Message'

# View Instant Message
VIEW_CONTENTS_PERMISSION = 'InstantMessage: View Message'

The reason the variables here are UPPERCASED is that it should be treated like a constant, not a variable, and this makes them easy to spot.

In normal CMF, all content types use the same basic "Add portal content" permission - however, this does not allow you to have a fine-grained control over your content types - let's say you want Members to be able to add Documents, but only Managers should be able to add News Items - then one generic permission for all of them won't do.

Therefore, we recommend using separate permissions when creating a Plone Product, and to use the standard way of naming permissions: '<ProductName>: <Permission>'. An example: 'MessageBoard: Add Message'. This will group the related permissions together, and allows us to recognize which permissions belong to which Product.

config.py - The configuration details for your content type

First we have to import two classes from our Product and Archetypes, respectively:

from Permissions import ADD_CONTENTS_PERMISSION
from Products.Archetypes.public import DisplayList

ADD_CONTENTS_PERMISSION refers to your Content Type permission for adding your type defined in Permissions.py.

The second import is Displaylist, a data container we use when displaying pulldowns/radiobuttons/checkmarks with different choices. Let's say we wanted priorities on our instant messages, and we wanted those to be High, Normal and Low. We will specify these later in the file.

The next two lines sets the project (Product in Zope) name, and points to the skin directory. We change the PROJECTNAME to InstantMessage:

PROJECTNAME = "InstantMessage"
SKINS_DIR = 'skins'

The next line sets a GLOBALS reference point that is used later during installation (finding the Product directory etc):

GLOBALS = globals()

Then we are done with the generic setup, and need to specify our Priority pulldown. It should look like this:

MESSAGE_PRIORITIES = DisplayList((
    ('high', 'High Priority'),
    ('normal', 'Normal Priority'),
    ('low', 'Low Priority'),
    ))

(The reason for double parantheses is that DisplayList is a class that you pass a tuple of tuples to.)

__init__.py - The startup script

__init__.py is the script that is run when Zope and Plone is started.

The first part looks like this:

from Globals import package_home
from Products.Archetypes.public import process_types, listTypes
from Products.CMFCore import utils
from Products.CMFCore.DirectoryView import registerDirectory
import os, os.path

These are import statements that essentially gets various things you need to get the type up and running.

The first one imports package_home from Zope, this tells the script where on the filesystem your package is located.

Then we import process_types and listTypes from the Archetypes package. These are convenience functions - process_types generates Content Types, constructors and Factory Type Information from all the available types in your Product, and listTypes enables you to list the types available in your Product.

The two next lines are needed to talk to the CMF, and to register Directory Views inside CMF.

Finally we import some convenience classes from the standard Python library to ease working with the operating system.

Next step is to import the config details for out Product, defined in config.py:

from config import SKINS_DIR, GLOBALS, PROJECTNAME
from config import ADD_CONTENTS_PERMISSION

The next line registers the skins and global directories as File System Directory Views in the CMF:

registerDirectory(SKINS_DIR, GLOBALS)

Now for the real action. We define a function to initialize our Content Type(s):

def initialize(context):
##Import Types here to register them
import InstantMessage

We change Article to InstantMessage, since that is what our type is known as. You can also import several Content Types at once here, just separate them with commas: import InstantMessage, Person, Location.

The next section generates the content types, constructors and FTIs required to make your types work with the CMF:

content_types, constructors, ftis = process_types(
    listTypes(PROJECTNAME),
    PROJECTNAME)

The final section instantiates an object of the type ContentInit (from CMFCore), and registers your types in the CMF:

utils.ContentInit(
    PROJECTNAME + ' Content',
    content_types      = content_types,
    permission         = ADD_CONTENTS_PERMISSION,
    extra_constructors = constructors,
    fti                = ftis,
    ).initialize(context)

InstantMessage.py - the Schema for your content type

Finally we are ready to create the Schema for your content type. Rename ` Article.py` to InstantMessage.py. A step-by-step guide of the Schema creation follows.

First, we import our message priorities. Change the first line to:

from Products.InstantMessage.config import MESSAGE_PRIORITIES

Then we import the basic Schema from Archetypes:

from Products.Archetypes.public import BaseSchema, Schema

The heart of an Archetype is its Schema, which is a sequence of fields. Archetypes includes three stock schemas: BaseSchema, BaseFolderSchema, and BaseBTreeFolderSchema. All three include two fields, id and title, as well as the standard metadata fields. BaseSchema is a normal content type, BaseFolderSchema is a folderish content type (can contain other content types), and BaseBTreeFolder is for folders which need to handle hundreds of thousands or millions of objects.

The Schema works like a definition of what your object will contain and how to present the information contained. When Zope starts up, during product initialization, Archetypes reads the schema of the registered classes and automagically generates methods to read (access) and change (mutate) each of the fields defined on a Schema.

Next, we import some of the widgets we need for displaying our content type:

from Products.Archetypes.public import SelectionWidget, TextAreaWidget
from Products.Archetypes.public import RichWidget

A SelectionWidget is a UI widget that will let you choose one of many choices, a TextAreaWidget lets you enter text, and a RichWidget enables the currently selected Visual Editor (WYSIWYG editors like Epoz, or tranformed editors like MS Word or OpenOffice).

Full list of the widgets can be found in generators/widgets.py. You can also write your own.

Next up, we import BaseContent (which we inherit from with our Content Type, if it had been folderish, we would have imported and inherited from BaseFolder or BaseBTreeFolder):: The registerType function is needed to initialize our content type later, so we import it here.

from Products.Archetypes.public import BaseContent, registerType

You also can import all all the fields, widgets and classes made publicly available by Archetypes. Note that this is considered bad style by most Python programmers (importing * from a module), but this doesn't mean it can't be done and, of course, it makes life a lot easier: it makes a lot of stuff available to our module, although we are only going to use a small amount of it:

from Products.Archetypes.public import *

Import Permissions

We also need to import our permissions, so we write:

import Permissions

Now we have everything we need to start building the schema itself. We start out by defining it as the BaseSchema (Dublin Core metadata) plus our own additions:

schema = BaseSchema +  Schema((

We pass 'Schema' a tuple of field objects:

StringField('priority',
            vocabulary=MESSAGE_PRIORITIES,
            widget=SelectionWidget(),
            ),

TextField('body',
          searchable=1,
          required=1,
          allowable_content_types=('text/plain',
                                   'text/structured',
                                   'text/html',),
          widget=RichWidget,
          ),

We define the body of the Instant Message to be a RichWidget, so you can use formatting of your message. To see the list of available fields, check the Fields section in the Archetypes chapter.

The last step is to create the class for the InstantMessage itself. It inherits from BaseContent, which automatically gives it 'id' and 'title' attributes, and the entire Dublin Core set:

class InstantMessage(BaseContent):

Then we assign the schema defined outside of this class definition to this class:

schema = schema

(The reason we have to explicitly set it like this is because of namespace rules in Python.)

We can also assign it a dedicated view action, although if you do not do this, Archetypes will generate one for you:

actions = ({

    'id': 'view',
    'name': 'View',
    'action': 'instantmessage_view',
    'permissions': (Permissions.VIEW_CONTENTS_PERMISSION,)
    },)

Now everything is ready for use, and we activate the type by registering it:

registerType(InstantMessage)

Congratulations! You have just created your first Archetype!

Beginners (not only) with knowledge about the Unified Modeling Language UML (or person who want to learn it) should have a look at ArchGenXML, a command line utility for generating complete working products out of UML-class-diagrams.

XXX (Mention 'REFRESH.txt' as a useful mechanism when developing, but with Archetypes it does not work properly in every case!)

XXX (Still missing: the custom view description)

More examples: ArchRecipes

Author:Ivo van der Wijk
Contact:ivo@amaze.nl
Date:$Date: 2003/09/14 15:31:05 $
Version:$Revision: 1.1 $
Web site:http://sourceforge.net/projects/archetypes
Original URL:http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/checkout/archetypes/ArcheRecipes/README?rev=1.4

Introdcution

Here you will find some Archetype Recipes, small code examples with documentation explaining how certain features of Archetypes work and how certain functionality can be achieved using Archetypes.

I write them myself mostly as an excercise, as a way to learn certain features, to help colleagues along or just because I want to show what cool things can be done using Archetypes :)

To me, a piece of code often says more than a thousand words. I hope you find these code snippets usefull. If you want any of your own archetypes included, please mail me at:

ivo@amaze.nl

or contact me on Freenode / #plone, nick VladDrac or VladDrak

Currently included recipes:

FieldSharing
Demonstrates how to "share" a specific field between two schema's (standard and Metadata), i.e. use the Dublin Core 'description' metadata field for your newsarticles Teaser.
StoreAlot
Demonstrates a Folderish (ordered) content type in which an unlimited number of attachements and linkobject can be stored, subobject filtering (allowable_content_types etc) and a custom view template.
SQLDocument
Demonstrates how to store specific fields in a non-default storage (PostgreSQL storage in this case)
DemoEvent
Demonstrates how to create events that show up in the portal calendar. Mainly focusses on working around a CalendarTool bug.
CustomValidator
Demonstrates how to write your own 'pluggable' validator.

recipes to write:

  • i18n
  • archetypes caveats

How to access Archetypes fields using Page Templates

Author:Alexander Limi
Contact:limi@plone.org
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes
Original location:http://plone.org/documentation/howto/HowToAccessArchetypesFields

Use the widget method to get the macro

<input metal:use-macro="python: here.widget('title', mode='edit')"
       name="textfield" type="text" size="30">

Multi-valued vocabulary-based fields

<div tal:define="field python:here.Schema()['my_field_name']">
 <ul tal:define="accessor python:getattr(here, field.accessor);
              vocab python:field.Vocabulary(here)">
  <li tal:repeat="item accessor"
      tal:content="python:vocab.getValue(item)"></li>
 </ul>
</div>

Single-valued vocabulary-based fields

<div tal:define="field python:here.Schema()['my_field_name'];
              accessor python:getattr(here, field.accessor);
              vocab python:field.Vocabulary(here)"
  tal:content="python:vocab.getValue(accessor())">
</div>

Single-valued non-vocabulary fields (extended version)

<div tal:omit-tag=""
     tal:define="field python:here.Schema()['my_field_name'];
                 accessor python:getattr(here, field.accessor)"
  tal:content="python:accessor()">
</div>

Singlevalued non-vocabulary fields (compact version)

<div tal:omit-tag=""
     tal:define="accessor python:getattr(here, here.Schema()['my_field_name'].accessor)"
     tal:content="accessor">
</div>

Schema

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes
Document:Archetypes Developers Guide

The heart of an archetype is its Schema, which is a sequence of fields. Archetypes includes three stock schemas: BaseSchema, BaseFolderSchema, and BaseBTreeFolderSchema. All three include two fields, 'id' and 'title', as well as the standard metadata fields.

The Schema works like a definition of what your object will contain and how to present the information contained. When Zope starts up, during product initialization, Archetypes reads the schema of the registered classes and automagically generates methods to access and mutate each of the fields defined on a Schema.

Advanced: Class Attributes

Besides the schema, you can define all of the content properties you see when you click on a content type in the 'portal_types' tool. Here is a list of class attributes, with their default values (see 'ArchetypeTool.py'):

Default class attributes/methods

modify_fti : method
Is looked up on the module and called before product registration. Works as a hook to allow you to modify the standard factory type information provided by Archetypes.
add${classname} : method
Is looked up on the module. If it doesnt exist, a basic one is autogenerated for you.
content_icon
A name of an image (that must be available in the context of your object) to be used as the icon for your content type inside CMF.
global_allow
Overrides the default global_allow setting on the default factory type information.
allowed_content_types
Overrides the default allowed_content_types setting on the default factory type information. If set, supercedes the filter_content_types in case it is not provided on the class.
filter_content_types
Overrides the default filter_content_types setting on the default factory type information.

Additional notes about Factory Type Information

  • If your class declares to implement IReferenceable, you will get a references tab on your object, allowing you to make references to other objects.

  • If your class declares to implement IExtensibleMetadata, you will get a properties tab on your object, allowing you to modify the metadata.

  • Custom actions: Define an actions member on your content type and the external method will apply this to the types tool for you. This means the if you want custom views or something you only need to say something like:

    class Foo(BaseContent):
        actions = ({'id': 'view',
                    'name': 'View',
                    'action': 'custom_view',
                    'permissions': (CMFCorePermissions.View,)
                   },)
    

Fields

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.2 $
Web site:http://sourceforge.net/projects/archetypes

Introduction

You add additional fields to a schema by using one of available field types. These fields share a set of properties (below, with their default values), which you may modify on instantiation. Your fields override those that are defined in the base schema.

Properties

More commonly used field properties:

required
Makes the field required upon validation. Defaults to 0 (not required).
widget
One of the Widgets to be used for displaying and editing the content of the given field.

Less commonly used field properties:

default
Sets the default value of the field upon initialization.
vocabulary

This parameter specifies a vocabulary. It can be given either as a static instance of DisplayList or as a method name (it has to be the name as a string). The method is called and the result is taken as the vocabulary. The method should return a DisplayList.

The contents of the vocabulary are used as the values which can be choosen from to fill this field.

An example for a DisplayList usage can be found in the ArchExample directory in config.py.

enforceVocabulary
If set, checks if the value is within the range of vocabulary upon validation
multiValued
If set, allows the field to have multiple values (eg. a list) instead of a single value
isMetadata
If set, the field is considered metadata
accessor 1
Name of the method that will be used for getting data out of the field. If the method already exists, nothing is done. If the method doesn't exist, Archetypes will generate a basic method for you.
edit_accessor
Name of the method that will be used for getting data out of the field just before edition. Unlike the standard accessor method which could apply some transformation to the accessed data, this method should return the raw data without any transformation. If the method already exists, nothing is done. If the method doesn't exist, Archetypes will generate a basic method for you.
mutator
Name of the method that will be used for changing the value of the field. If the method already exists, nothing is done. If the method doesn't exist, Archetypes will generate a basic method for you.
mode
One of r, w or rw. If r, only the accessor is generated. If w only the mutator and the edit accessor are generated. If rw, accessor and mutator and edit accessor are generated.
read_permission
Permission needed to view the field. Defaults to CMFCorePermissions.View. Is checked when the view is being auto-generated.
write_permission
Permission needed to view the field. Defaults to CMFCorePermissions.ModifyPortalContent. Is checked when the submitted form is being processed..
storage
One of the Storage options. Defaults to AttributeStorage, which just sets a simple attribute on the instance.
generateMode
Deprecated?
force
Deprecated?
validators
One of the Validators. You can also create your own validator.
index

A string specifying the kind of index to create on portal_catalog for this field. To include in catalog metadata, append :schema, as in FieldIndex:schema. You can specify another field type to try if the first isn't available by using the | character. Both combinations can be used together, as in:

...
index="TextIndex|FieldIndex:schema",
...
schemata
Schemata is used for grouping fields into fieldsets. Defaults to default on normal fields and metadata on metadata fields.

Example

Here is an example of a schema (from 'examples/SimpleType.py'):

schema = BaseSchema + Schema((
  TextField("body",
        required=1,
        searchable=1,
        default_output_type="text/html",
        allowable_content_types=("text/plain",
                                 "text/restructured",
                                 "text/html",
                                 "application/msword"),
        widget  = RichWidget,
        ),
  ))
[1]

Depending on the mode of each Field in the Schema the runtime system will look for an accessor or mutator. If, for example, the mode of a field is "rw" (as is the default), then the generator will ensure that accessors and mutators exist for that field. This can happen one of two ways: either you define the methods directly on your class, or you let the generator provide them for you. If you don't require specialized logic, then letting the generator create these methods on your new type is a good idea.

The format for accessors and mutators is as follows:

field -> title

accessor -> getTitle()          here/getTitle
mutator  -> setTitle(value)

Views

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes

Views

Views are auto-generated for you by default, based on the options you specified on your Schema (Widgets, Fields, widget labels, etc.) if you use the default FTI actions (eg: don't provide an actions attribute in your class. See Additional notes about Factory Type Information.

XXX Add link to Factory Type Information

Customizing Views

Override parts of the generated view

If you want only to override a few parts of the generated View, like the header or footer, you can:

  1. Create a template named ${your_portal_type_lowercase}_view 2

  2. On this template, you may provide the following macros:

    header
    body
    footer
    
  3. When building the auto-generated view, archetypes looks for these macros and includes them in the view, if available. Note that the body macro overrides the auto-generated list of fields/values.

[2]Currently, this is only implemented for the auto-generated view template.

Customizing only a widget

Or, for customizing only a widget:

  1. Set the attributes macro_view or macro_edit to the location of your custom macro upon instantiation of the Widget.
  2. Your custom macro template must contain a macro with the same name as the mode where it will be used. Eg: a template that is being used on macro_view must have a macro named view. The same applies to macro_edit and edit.

Widgets

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes

Introduction

When Archetypes generates a form from a schema, it uses one of the available Widgets for each field. You can tell Archetypes which widget to use for your field using the widget field property.

Note, though, that a field cannot use just any widget, only one that yields data appropriate to its type.

Built-in Widgets

Below is a list of possible widget properties, with their default values (see generator/widget.py). Individual widgets may have additional properties.

attributes

Used for??

description

The tooltip for this field. Appears in response to onFocus.

description_msgid

i18n id for the description

label

Is used as the label for the field when rendering the form

label_msgid

i18n id for the label

visible

Defaults to 1. Use 0 to render a hidden field, and -1 to skip rendering.

Validators

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes

Pre-defined Validators

Archetypes provides some pre-defined validators. You use them by passing a sequence of strings in the validator field property, each string being a name of a validator. The validators and the conditions they test are:

inNumericRange
The argument must be numeric
isDecimal
The argument must be decimal, may be positive or negative, may be in scientific notation
isInt
The argument must be an integer, may be positive or negative
isPrintable
The argument must only contain one or more alphanumerics or spaces
isSSN
The argument must contain only nine digits (no separators) (Social Security Number?)
isUSPhoneNumber
The argument must contain only 10 digits (no separators)
isInternationalPhoneNumber
The argument must contain only one or more digits (no separators)
isZipCode
The argument must contain only five or nine digits (no separators)
isURL
The argument must be a valid URL (including protocol, no spaces or newlines)
isEmail
The argument must be a valid email address

Pre/post validation hooks

The current usefulness of Archetypes' validators is mitigated by weak error messaging, and the lack of support for separators in SSNs, phone numbers, and ZIP codes.

There are also hooks for pre and post validation that can be used to assert things about the entire object. These hooks are:

pre_validate(self, REQUEST, errors)
post_validate(self, REQUEST, errrors)

You must extract values from REQUEST and write values into errors using the field name as the key. If pre_validate throws errors, then other custom validators (including post) will not be called.

Writing a custom validator

Custom validator example

If you need custom validation, you can write a new validator in your product.:

from validation.interfaces import ivalidator
class FooValidator:
    __implements__ = (ivalidator,)
    def __init__(self, name):
        self.name = name
    def __call__(self, value, *args, **kwargs):
        if value == 'Foo':
            return """Validation failed"""
        return 1

Registering your validator

Then you need to register it in FooProduct/__init__.py method initialize:

from validation import validation
from validator import FooValidator
validation.register(FooValidator('isFoo'))

The validator is now registered, and can be used in the schema of your type.

Storage

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/10/27 09:30:16 $
Version:$Revision: 1.4 $
Web site:http://sourceforge.net/projects/archetypes
Original URL:http://plone.org/documentation/archetypes/quickref/

Basic Storages

There are a few basic storages available by default on Archetypes, including storages that store data on SQL. Here's a listing:

AttributeStorage

Simply stores the attributes right into the instance.

MetadataStorage

Stores the attributes inside a PersistentDict named _md in the instance.

ReadOnlyStorage

Used to mark a field as being ReadOnly

ObjectManagedStorage

Uses the ObjectManager methods to keep the attribute inside the instance. Allows you to make a folderish content object behave like a simple content object.

Advanced Storages

SQLStorage

Experimental storage layer, which puts the data inside SQL. Available variations are: MySQL and PostgreSQL. There's an initial implementation of an Oracle storage, but it isn't tested at the moment.

See How-To Below

APE (Adaptable Persistence)

The use of APE (formerly known as Adaptable Storage) as a storage layer with Archetypes is currently under investigation.

How to write your own Storage

XXX Not written yet.

Advanced Storage Manual

Author:Joel Burton
Contact:joel@joelburton.com
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.4 $
Web site:http://sourceforge.net/projects/archetypes
Covers:Archetypes 1.0beta

SQLStorage How-To: Using and Advanced Tips

Introduction

Where is the knowledge we have lost in information?

—T. S. Eliot, The Rock

The SQLStorage storage for Archetypes 27 allows you to transparently store attributes of your Archetypes objects in an SQL-backed database.

This is useful in many situations, such as:

  • You have other applications that need to simultaneous access data in relational database format.
  • You have a lot of existing data in a relational database format.
  • You or your boss/client are more comfortable knowing that your data is accessible in a relational database format.

This HOWTO assumes that you are comfortable installing Archetypes, and using it with non-SQLStorage, and are comfortable administering Plone/CMF/Zope sites.

This HOWTO explains how you can use relational database features like triggers and rules to store your data in traditional relational database parent/child tables, while still accessing it with traditional Zope accessors. While the advanced techniques in this HOWTO can be used with any database that supports triggers and updatable views, the example code is for PostgreSQL 25. It should not be difficult to translate these ideas to Oracle or other database. At the time of this writing, MySQL does not support triggers, views, or rules, so these ideas could not easily be implemented in MySQL.

Typographic conventions

In most relational databases, commands are case-insensitive: SELECT is the same as select. To help you understand the commands, however, I'll follow the traditional format of putting SQL commands in upper case.

Database identifiers (field names, table names, etc.) may or may not be case-sensitive, depending on your database. The examples here will be using PostgreSQL, which is case-insensitive for identifiers. I'll show table names with a leading capital letter, and field name in all lower case. However, you can enter these any way you like.

Comparing SQLStorage to other relational database interface strategies

The scope of what Archetypes accomplishes is similar to, yet very different from, other systems of connecting Zope to relational databases.

Archetypes stores the objects in the ZODB as an archetype object, having traditional Plone/CMF/Zope methods such as Title(), absolute_url(), etc. However, individual attributes (such as title, author, body, etc.) can be looked up in and stored in the relational database. Since objects are real Zope objects, they work naturally with acquisition, catalogs, and other Zope technologies. Since you can choose which attributes to store in the relational database, attributes that don't have a naturally tight fit with relational database structures can be left in the ZODB, as can ones that might easily fit in a relational database structure, but for which you have no external relational database access requirements.

Versus ZSQL methods

A more traditional method of Zope/Relational Database connection has been to store rows of information in a relational database, and create ZSQL Methods to look up and display this information. With this technique, you can associate a Python class with a relational database row 3, but the objects aren't real persistent Zope objects, and aren't found during catalog calls. This strategy requires customized integration to work with key Plone technologies such as acquisition, workflow, portal_forms, etc.

While there are worthwhile Zope product to simplify some of the details of traditional relational database storage and Zope (such as Znolk 28, which auto-generates database forms and storage methods), these still fall quite short of the interface simplication and power that Archetypes delivers.

Traditional SQL Method strategies for using Zope with relational databases are of most use when converting an existing site built using other web technologies (such as PHP or Perl), and in which you already have written the SQL statements for insterting, updating, deleting, viewing, etc., all of your object types.

[3]This technique goes by the colorful name pluggable brains.

Versus APE (formerly Adaptable Storage)

Shane Hathaway's product APE 26 (formerly called Adaptable Storage) allows you to store your Zope objects in different formats (such as in standard filesystem objects or on a relational database). In this case, segments of the ZODB space are "mounted" from results in a relational database. This means the entire object is kept in the relational database--all attributes, etc. Deleting an object from the relational database, adding it, or modifying it affects the the ZODB instantly since the this part of the ZODB is just a mounted pointer the relational database.

While APE is a technological accomplishment, and very useful for some projects, it doesn't fit perfectly into an existing natural database role. All ZODB objects are stored in a few very APE-oriented tables, rather than being stored in customizable, traditional-relational-database tables.

In addition, APE works by location, rather than by type (as Archetypes does). That is, everything in the folder /foo is controlled (mounted) by APE. If /foo contains all and only objects of a certain portal_type (like Customers) you could treat these tables as the "customer relational database", and work around the unusual object-to-relational database table structure. However, if there are different types stored in that directory, you end up with a mishmash of different types of data stored in the same tables, and don't have the straightforward setup of a "customer" table versus an "orders" table, etc. 4 With Archetypes, each portal_type maps to an individual table, regardless of where it is stored.

Lastly, APE does not produce the integrated form production/validation/editing systems that Archetypes does.

[4]It seems as if APE supports custom object marshallers and unmarshallers to create different table types for different object types; however, at this point, I haven't been able to find working examples of this that I could understand and apply.

Creating a Database-Stored Object

Let's start with a simple Archetypes object, representing a Customer:

# Customer.py
# Customer portal type (non-SQL storage)

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin

schema = BaseSchema + Schema((
    
    TextField('body',
              required=1,
              primary=1,
              searchable=1,
              default_output_type='text/html',
              allowable_content_types=('text/restructured',
                                       'text/plain',
                                       'text/html',
                                       'application/msword'),
              widget=RichWidget,
              ),
    
    StringField("phone",
                 index="FieldIndex",
                 ),
    
    )) + TemplateMixin.schema

class Customer(TemplateMixin, BaseContent):
    """Our example object"""
    schema = schema
    archetype_name = "Customer"
    actions = TemplateMixin.actions
                  
registerType(Customer)

This object defines two custom fields, body and phone (plus all the traditional metadata attributes that are brought in by BaseSchema). This object would be stored entirely in the ZODB by Archetypes; however, we can convert this to being stored in a relational database by making just two simple changes to the object:

  • Add an import to the beginning for the appropriate SQL database storage method.
  • Add an attribute storage to the fields we want stored in the database, and set these to our storage method.

Since we're using PostgreSQL in this example, we'll import the PostgreSQL storage method.

Our new object then becomes:

# CustomerSQL.py
# Customer portal type (SQL storage)

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
from Products.Archetypes.SQLStorage import PostgreSQLStorage

schema = BaseSchema + Schema((
    
    TextField('body',
              required=1,
              primary=1,
              searchable=1,
              storage=PostgreSQLStorage(),
              default_output_type='text/html',
              allowable_content_types=('text/restructured',
                                       'text/plain',
                                       'text/html',
                                       'application/msword'),
              widget=RichWidget,
              ),
    
    StringField("phone",
                storage=PostgreSQLStorage(),
                index="FieldIndex",
                ),

    )) + TemplateMixin.schema

class CustomerSQL(TemplateMixin, BaseContent):
    """Our example object"""
    schema = schema
    archetype_name = "Customer SQL"
    actions = TemplateMixin.actions
                  
registerType(CustomerSQL)

At this point, you should install our new Archetypes type and register it with portal_types.

Now, before we can begin using this object, we must do two things:

  • Add a database connector (in our case, PostgreSQL) to our site. We can use any PostgreSQL adapter; however, I've used ZPyscopgDA 29 for testing this, as this appears to be the best maintained of the noncommercial adapters.
  • In the archetype_tool, under the Connections tab, we need to set our database connector for this type of object to our new database connector. Note that in this tab, we have a default connection, and we can override this for an portal_type that uses SQLStorage. In our case, you can either set the default to the new connection, or the specific connection for our CustomerSQL type. However, since we'll be adding several other Archetypes types, it will be easier to point the default setup to your database adapter connection.

Before you go any further, make sure that the user you defined in your database connection has the ability to create tables, and insert, update, and delete from tables in your database. 5

[5]In real life, it isn't neccessary that the user have to have CREATE TABLE privileges, as you can create the table yourself before Archetypes tries to. However, it's easiest to let Archetypes create the table for you. In which case a good strategy is to grant CREATE TABLE permissions for the zope database connector user, let it create the table, then rescind that right (for security's sake) once the table has been created.

Testing Our New Object

Now, we can add an instance of our object through the standard Plone interface. Plone will recommend a unique ID; let's change that to "new_example". Put in values for body and phone. Notice that you can see these values in the view view, and can re-edit them in the edit view.

Switch to your database monitor (for PostgreSQL, this is psql) and examine the database:

database=# \d
                  List of relations
 Schema |           Name           |   Type   | Owner
--------+--------------------------+----------+-------
 public | customersql              | table    | joel

Archetypes has created our table for us. Examine the table:

database=# \d customersql
  Table "public.customersql"
  Column   | Type | Modifiers
-----------+------+-----------
 uid       | text | not null
 parentuid | text |
 body      | text |
 phone     | text |
Indexes: customersql_pkey primary key btree (uid)

Notice that Archetypes has created our body field as text field and the phone field as a text field. These transformations are part of the PostgreSQLStorage method, and can be easily changed in the source, should your needs require different mappings. 6 We'll look at changing those mappings later in this document, in Changing SQLStorage's Storage Methods: An Example With Lists.

Also, notice that there are two new fields created:

  • UID (uid): this is a unique identifier for your object
  • Parent UID (parentuid): this is the unique identifier (if any) for the parent (enclosing) container for your object.
[6]Or you can create the table in advanced of Archetypes, and choose whatever field types you want, as long as your database can cast Archetypes values into your field types. For instance, in our example, though text is an acceptable choice for phone numbers, we might prefer to have this stored as a varchar(20). We could have created the table ourselves and made this choice; when Archetypes goes to insert a text-type value into phone, PostgreSQL can cast this to varchar(20)

About UIDs

One of the smartest things about Archetypes is that it introduces the ideas of unique identifiers into CMF sites. Zope IDs must be unique within a folder, but need not be unique across a site. Therefore, keeping track of the fact that you have an object called Eliot isn't useful, since you may have several objects called that in different folders.

A common workaround has been to refer to objects by their path (eg, /animals/cats/Eliot), but this is fragile, since any change to the object ID, or the IDs of any of the parent objects will change the path and break these references.

Archetypes assigns each object a unique ID at creation 7, and then maintains a mapping of that unique ID to the current location of the object in the ZODB. If the object is deleted, Archetypes will remove it from its UID mapping.

[7]The IDs that are created are in the Plone default style, e.g., PortalType.2003-07-23.4911

Please note the difference between the Zope ID (the standard name for the object returned by getId()) and the Archetypes UID. When our object was created, Plone assigned it an ID like CustomerSQL.2003-07-23.4911. Archetypes used this ID as its UID. Even though we may change the object ID to new_example, it will keep its UID for the lifetime of the object. The UID should be treated as an immutable attribute.

Archetypes also creates a portal_catalog index for the UID field, so you can easily query the catalog using the UID. It also exposes several methods in its API for finding an object by its UID (from ArchetypeTool.py):

## Reference Engine Support
def lookupObject(self, uid):
    if not uid:
        return None
    object = None
    catalog = getToolByName(self, 'portal_catalog')
    result  = catalog({'UID' : uid})
    if result:
        #This is an awful workaround for the UID under containment
        #problem. NonRefs will aq there parents UID which is so
        #awful I am having trouble putting it into words.
        for object in result:
            o = object.getObject()
            if o is not None:
                if IReferenceable.isImplementedBy(o):
                    return o
    return None

def getObject(self, uid):
    return self.lookupObject(uid)

def reference_url(self, object):
    """Return a link to the object by reference"""
    uid = object.UID()
    return "%s/lookupObject?uid=%s" % (self.absolute_url(), uid)

We can use the method lookupObject(uid) to get the actual object by UID, or use reference_url(object) to generate a "safe" URL to an object that will always find it given its UID.

You can see the list of currently-tracked UIDs and actual objects in the archetype_tool, UID tab.

Parent UID

The Parent UID field created in our table is the UID of the container, if it is an Archetypes object (or some other kind of future object that might expose a UID).

This is very helpful for creating a simple parent/child relationship in Plone, as we'll see in the next section.

Customers and Orders

For example, a common database example is a database of customers and orders, where one customer can have several orders. Pseudo-SQL for this would be:

CREATE TABLE Customer
  ( custid SERIAL NOT NULL PRIMARY KEY
  , custname TEXT
  ... other customer fields ...
  );

CREATE TABLE Orders
  ( orderid SERIAL NOT NULL PRIMARY KEY
  , custid INT REFERENCES Customer
  ... other order fields ...
  );

The field custid in the orders table is a reference (called a foreign key) to the field custid in the customer table.

To create a similar structure in Archetypes, we need to create just two types: CustomerFolder and Orders. Objects of both of these types will get UIDs from Archetypes. But if we change our Customer type to become folderish (ie, derived from Archetypes's BaseFolder rather than BaseContent), it can contain objects, and we can add Orders objects inside of it. These Orders objects will have their Parent UID field set to the CustomerFolder UID, giving us an easy way to write ZCatalog queries for all orders with a certain customer UID, or SQL queries asking the same thing.

Creating in Archetypes

Let's create these two new archetypes. First, the CustomerFolder. This will be exactly the same as CustomerSQL, except using BaseFolder rather than BaseContent:

# CustomerFolder.py
# Customer portal type (SQL storage, folderish)

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
from Products.Archetypes.SQLStorage import PostgreSQLStorage

schema = BaseSchema + Schema((
    
    TextField('body',
              required=1,
              primary=1,
              searchable=1,
              storage=PostgreSQLStorage(),
              default_output_type='text/html',
              allowable_content_types=('text/restructured',
                                       'text/plain',
                                       'text/html',
                                       'application/msword'),
              widget=RichWidget,
              ),
    
    StringField("phone",
                storage=PostgreSQLStorage(),
                index="FieldIndex",
                ),
    
    )) + TemplateMixin.schema

class CustomerFolder(TemplateMixin, BaseFolder):
    """Our example object"""
    schema = schema
    archetype_name = "Customer Folder"
    actions = TemplateMixin.actions
                  
registerType(CustomerFolder)

Our Order type is straightforward. It will include the cost of an order, and shipping details:

# Orders.py

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
from Products.Archetypes.SQLStorage import PostgreSQLStorage

schema = BaseSchema + Schema((
    
    TextField('shipping_details',
              required=1,
              storage=PostgreSQLStorage()),
    FixedPointField('total_cost',
                    storage=PostgreSQLStorage())
    
    )) + TemplateMixin.schema

class Orders(TemplateMixin, BaseContent):
    """Our example object"""
    schema = schema
    archetype_name = "Orders"
    actions = TemplateMixin.actions
                  
registerType(Orders)

Testing Them Out

Register these two new types with portal_types and add a CustomerFolder object. You should be able to edit this data and see the resulting information in the table customerfolder without a problem.

As of the writing of this HOWTO, Archetypes does not expose a "folder contents" tab for folderish objects like our CustomerFolder. However, you can go to this view manually by visiting the new customer folder object, and changing the end of the URL to point to folder_contents. 8

[8]And you can add this action to the type so that it automatically shows as a tab up by adding it to the actions for CustomerFolder in portal_type.

Inside of the new customer folder, add an Orders object and enter details. Then, examine the orders table in the database:

database=# SELECT * FROM Orders;
          uid           |           parentuid            | shipping_details | total_cost
------------------------+--------------------------------+------------------+------------
 Orders.2003-07-23.4935 | CustomerFolder.2003-07-23.4609 | Shipping         |          0
(1 rows)

Notice how we get the parentuid value correctly. From our relational database, we could write a traditional query now on customers and the total of the orders as:

database=# SELECT C.uid, C.phone, SUM(O.total_cost)
             FROM CustomerFolder as C
               INNER JOIN Orders as O on (O.parentuid = C.uid)
             GROUP BY C.uid, C.phone;

Working With Existing Table Structure

Of course, if you're working with existing tables, or if you want to work with other SQL tools, chances are you want to use a more traditional primary key/foreign key setup than the Archetypes UID. Many databases use a serial column 9 (integers that increase for each new record) as a primary key.

[9]MySQL uses an integer columns with the autoincrement feature.

To do this with Archetypes, you can simply either:

  • create the table before you insert the first Archetypes record

or

  • modify the table after Archetypes creates it and starts using it.

For example, our customerfolder table was created automatically by Archetypes, and it contains a UID field, but not a traditional, numeric primary key. We can fix this by adding this:

ALTER TABLE Customerfolder ADD customerid INT;

CREATE SEQUENCE customerfolder_customerid_seq;

UPDATE Customerfolder SET customerid = nextval('customerfolder_customerid_seq');

ALTER TABLE Customerfolder ALTER customerid
  SET DEFAULT nextval('customerfolder_customerid_seq');

ALTER TABLE Customerfolder ALTER customerid SET NOT NULL;

ALTER TABLE Customerfolder DROP CONSTRAINT customerfolder_pkey;

ALTER TABLE Customerfolder ADD PRIMARY KEY ( customerid );

ALTER TABLE Customerfolder ADD UNIQUE ( uid );

Note that syntax for altering tables, adding primary keys, etc., varies considerably from one relational database to another, so if you're not using PostgreSQL, you'll want to research how to do this with your relational database. Also note that it's rather wordy to make this changes, whereas having the table setup properly in the first place is much more succinct:

CREATE TABLE Customerfolder ( customerid SERIAL NOT NULL PRIMARY KEY,
                              ...
                            )

So it may often be to your advantage to create the table before Archetypes.

Now we have a traditional primary key that is automatically increased, but since its not part of Archetypes's schema, it will leave it alone.

Important

Notice that we make the UID field UNIQUE. This guarantees that two records cannot have the same UID. Even though we're no longer using the Archetypes UID as our primary key, it is still critical to keep this field unique.

When Archetypes edits an object, it doesn't know if the object exists in the relational database yet or not. Therefore, it tries to insert a record for this object. If this fails, it then updates the existing record.

This behavior may change in future versions of Archetypes, but, unless it does, you must make sure UID stays unique or else you'll have multiple copies of your objects' data in the relational database, only one of which will be correct.

If You Need A Very Different Table Structure

Instead of having Archetypes write to the real table, we can have Archetypes insert to a view of the table. Such a view can have fields that looks like those that Archetypes expects, but actually insert the information in different places and different ways.

This is especially useful if you have existing relational database tables that have non-Zope-like fields, names, etc.

To do this, let's first move the real table out of the way:

ALTER TABLE customerfolder RENAME TO customerfolder_table;

This is because Archetypes expects to work with customerfolder, and we want that to be our view. The actual table name doesn't have to be customerfolder_table; it can be whatever we want it to be.

Now, let's create our view:

CREATE VIEW customerfolder AS
  SELECT uid, parentuid, body, phone
    FROM customerfolder_table;

Now, we'll make this view updatable so that new records can be inserted into it. The syntax for this is very relational database-specific; you'll need to change this for other database systems. Following is our PostgreSQL syntax:

CREATE RULE customerfolder_ins AS
  ON UPDATE TO customerfolder DO INSTEAD (
    INSERT INTO customerfolder_table ( uid, parentuid, body, phone )
      VALUES ( NEW.uid, NEW.parentuid, NEW.body, NEW.phone ); );

Now, Archetypes can insert to customerfolder, assuming that it is a table, when in fact, we're rewriting its work to write to the real table.

So that Archetypes can do updates and deletes, we'll need to add rules for that, too:

CREATE RULE customerfolder_del AS
  ON DELETE TO customerfolder DO INSTEAD
    DELETE FROM customerfolder_table WHERE uid=OLD.uid;

CREATE RULE customerfolder_upd AS
  ON UPDATE TO customerfolder DO INSTEAD
    UPDATE customerfolder_table
      SET parentuid = NEW.parentuid
        , body = NEW.body
        , phone = NEW.phone;

In this example, our real table and view are only slightly different, but this strategy is helpful when dealing with existing tables that have many fields not of interest to Archetypes, or when our relational database tables have a different type of structure than is natural to Archetypes. We'll see advanced uses of this later.

FIXME: Show It Working

Using Traditional Referential Integrity For the Child Table

For our orders table, we can do the same thing to give that a serial-type primary key that is more traditinal for a relational database. In addition, though, it's likely that we want to child orders table to relate to the parent customerfolder table by the new customerid rather than the Archetypes-oriented Parent UID.

To do this, let's create a customerid field to the order table:

ALTER TABLE Orders ADD customerid INT;

UPDATE orders 
  SET customerid = Customerfolder.customerid
  FROM Customerfolder
  WHERE Orders.parentuid = Customerfolder.uid;

ALTER TABLE Orders ALTER customerid SET NOT NULL;

ALTER TABLE Orders ADD FOREIGN KEY (customerid) 
  REFERENCES Customerfolder;

Now we have a traditional primary key/foreign key relationship between our tables. If we have a orders record for customer #1, we won't be able to delete this customer until we delete these orders.

We need to set it up so that when we add an order via Plone, we look up the customerid from the customerfolder table and set it in the orders table for the new record.

To do this, we'll add a trigger that, before completing an insert on the order table, figures out the customerid and makes that part of the insert.

Different database implement triggers in different ways. In PostgreSQL, a trigger statement is a simple statement that calls a function. This function can reference and change a record structure called new which reflects the new record being inserted (or for an update, the new record to be written). Functions in PostgreSQL can be written in different languages, including Python; for our example, however, we'll use PostgreSQL's built-in PL/PgSQL language, a PL/SQL-like language that is simple to write and understand.

Before you can write PL/PgSQL functions, you must enable this by adding this language to your database. From the shell:

$ createlang plpgsql your_db_name

Our trigger function will be:

CREATE OR REPLACE FUNCTION order_ins () RETURNS TRIGGER AS '
  BEGIN
  NEW.customerid := customerid 
                      FROM customerfolder AS C
                      WHERE NEW.parentuid = C.uid;
  RETURN NEW;
  END;
' LANGUAGE plpgsql;

Now, let's create the trigger:

CREATE trigger order_ins_trig BEFORE INSERT ON Orders
  FOR EACH ROW EXECUTE order_ins();

Our real test is whether this works in Plone, but for a Q&D simulation, we'll test this in the SQL monitor by manually inserting a child orders record and seeing if it gets the parent UID (for your tests, use the real UID of one of your CustomerFolder objects):

database=# insert into orders (uid,parentuid)
             values ('test', 'CustomerFolder.2003-07-23.4609');
INSERT 35162 1
database=# select uid, parentuid, customerid from orders;
          uid           |           parentuid            | customerid
------------------------+--------------------------------+-----------
 Orders.2003-07-23.4935 | CustomerFolder.2003-07-23.4609 |          1
 test                   | CustomerFolder.2003-07-23.4609 |          1
(2 rows)

In the above output, the second record is our newly inserted record, and it did get the correct customerid field.

Referential Integrity & Prevention of Deletions

Now our traditional referenial integrity is set up. If we try to delete a customer that has related orders, we'll get the error that we expect and want:

database=# DELETE FROM Customerfolder;
ERROR:  $1 referential integrity violation - key in
        customerfolder still referenced from orders

However, we can still have problems in Plone.

Our current example has the child order objects nested inside of the parent customer objects, so it's not possible to delete a customer without deleting the orders because the customer itself is a folderish object, so the orders would be deleted automatically.

However, this may not always be the setup. Sometimes, you won't be able to have a child object contained physically in the parent object, and you'll connect things using attributes yourself.

For example, we might want to keep track of which staff member handles this customer. We could do this by nesting the CustomerFolder objects inside a Staff object, but this might, for different reasons, not be possible or preferable. Instead, we would create a staffuid attribute on the CustomerFolder type, and populate this with the UID of the staff member.

In cases like this, if you have the referential integrity in the database connected properly, you won't be able to delete the staff record if related customers exist, but you will be able to delete the customer object in the ZODB without problems--stranding the data in the relational database and ruining your database connections.

This is because the current version of Archetypes doesn't deal properly with deletion exceptions. Archetypes issues an SQL delete on the staff record, but since there are related children, it fails. This raises an exception, but Zope only stops a deletion by raising and propagating a particular exception--others just get logged and ignored. Therefore, the database record can't be deleted (your database will refuse to do this, regardless of how Zope asks), but the pointer to it in the ZODB will be deleted. So the staff member won't be visible on the site, but the data will stay in the relational database.

To fix this, apply the patch FIXME included with this howto. This raises the proper exception (BeforeDeleteException) if the SQL deletion call fails, which causes the Plone object deletion to fail. Unfortunately, you'll get an standard error message, rather than a polite, user-friendly explanation, but this is better than silently ignoring the database failure and moving on. 10

This patch was developed for the current version of Archetypes. This fix may be included by the time you read this HOWTO. If so, please let me know, and I'll update this section.

[10]If you want to make a nicer deletion error message, you could modify the standard_error_message method.

Cascading

PostgreSQL and most other databases that support referential integrity can handle deletion of parent records in other ways. The default is to block the deletion of parent with related children, but you can also opt to automatically delete the children when a related parent is deleted.

This option is called "cascading" a deletion. To set this up, we'd create our child table differently:

CREATE TABLE Child (
  childid SERIAL NOT NULL PRIMARY KEY,
  parentid INT NOT NULL REFERENCES Parent ON DELETE CASCADE
  ...
);

Now, when the parent is deleted in the database, it will delete the related child records rather than raising an exception.

Of course, this won't automatically delete the Zope ZODB objects for the children, but the next section of this tutorial deals with the question of how to have operations in the database "notify" Zope of changes to make in the ZODB. Using techniques explained there, we'll be able to have the child ZODB deleted for us.

Propagating Direct DB Changes Back to ZODB

Sometimes in Zope projects, the changes all come from the Zope interface, and the relational DB storage is just to soothe ZODB-nervous customers, or to allow reporting from standard SQL tools. In this case, the setup we have would be acceptable.

In cases where changes must propigate to Zope, here are some problems we need to solve:

  • Records that are inserted directly into the database are never visible to Zope, as ZODB objects aren't instantiated for these records.
  • Records that are deleted directly in the database are never deleted from Zope. Therefore, objects will remain in the ZODB that point to records that are no longer in the relational database. The current version of Archetypes raises an error if you try to view these objects or get the attributes that are stored in SQLStorage.
  • Records that are changed in the database are visible immediately to Zope, but any Catalog entries won't be updated, making Catalog queries incorrect.

Forcing Catalog Reindexes on Update

There's no way for our relational database to directly affect the ZODB. Instead, we'd have to either make a request that the ZServer hears and passes on to Plone, or we'd have to write a standalone Python program that connects to the ZODB to make these requests. 11

[11]wget-based strategy outlined below, and talk directly to Zope.

The latter can be very slow (starting connecting to the ZODB can take a while), and would only work on the machine that the ZODB is hosted on, whereas the first choice is ZEO-friendly, remote database machine friendly, and generally easier and faster.

By creating a custom function in PostgreSQL, we can execute a web or XMLRPC request to reindex the catalog.

We'll need a bit of Zope support: Zope will be given the UID for the record that has changed, and it needs to find the real Zope object, and call reindexObject() on it.

We could do this by adding a method to ArchetypesTool.py 12, but, for simplicity's sake, we'll implement it as a PythonScript:

# "reindex_by_uid"

## Parameters: uid

o = context.archetype_tool.lookupObject(uid)
o.reindexObject()
return "ok"

You can test calling this by giving it a UID of an existing object. This should be recataloged; you can see the changed catalog information by viewing the portal_catalog.

[12]I think it would be a good idea to move helper methods like this into the Archetypes API. We'll see what happens.

Functions can be written in several procedural languages in PostgreSQL, including Python. However, making a web request is an "unsafe" act in PostgreSQL, so we need to use a language that supports making unsafe calls. PostgreSQL refers to these languages as "untrusted" languages, and traditionally names them with a trailing U. At this time, the Python language is implemented as a trusted language but not as an untrusted language. The built in, easy-to-use PL/PgSQL is also implemented as a trusted language only.

This is changing, howver, in PostgreSQL 7.4. Due to RExec (restricted environment) module being dropped from Python, PL/Python (trusted) is no longer part of PostgreSQL, and PL/PythonU (a new, untrusted variant) is being added. It would be easy to write the functions below as PL/PythonU functions. 13

Our current options for untrusted languages, though, are PL/tclU (tcl untrusted), PL/perlU (perl untrusted), and C. We'll use Perl's untrusted language, plperlu.

[13]In earlier versions of PostgreSQL, you could use PL/Python (trusted) by recompiling PL/Python after adding the required-for-import modules to the list of acceptable modules. However, you'd have to install this modified PL/Python library on every PostgreSQL server that was going to use these functions.

Make sure that Perl untrusted functions are enabled for your database:

$ createlang plperlu your_db_name

Then, in psql, we'll create a function that uses wget, a common, simple command line http request tool:

CREATE OR REPLACE FUNCTION reindex_by_uid (text) RETURNS text as '
  $uid = shift;

  # Auth required to run PythonScript w/right role
  #    or, dont pass auth stuff and make PyScript proxied to role
  $AUTH = "--http-user=joel --http-passwd=foo";

  # Set to your server and portal name
  $SERVER = "http://localhost:8080/arch";

  # wget options
  #   -q     is quiet (no status output)
  #   -O -   is to send results to standard output
  #   -T 5   is to timeout after 5 seconds
  $WGET = "/usr/bin/wget -q -O - -T 5";

  $cmd = "$WGET $AUTH $SERVER/reindex_by_uid?uid=$uid";

  # output it to psql log so the user has some idea whats going on
  elog NOTICE, $cmd;

  return `$cmd`;
' LANGUAGE plperlu;

As noted in the function comments, above, you can pass authorization headers via wget (using the $AUTH variable). If you don't want to have the username and password in a plaintext script on the PostgreSQL server, or don't want them to travel across the network, you could instead make the PythonScript usable by anonymous users, and have it proxied to a high-enough level role that it can reindex all needed content.

The elog statement in the function outputs the $cmd variable through the PostgreSQL logging system, making it appear in psql as a notice. This is useful for debugging and providing feedback, but may confuse some database front-end systems that don't expect notices to be raised. In addition, it exposes your username and password to the log everytime a record is updated. Once you have the function working for your database, you should probably remove this line.

Now, in PostgreSQL, if we update a record, we can force a reindex by calling this, as in:

database =# SELECT reindex_by_uid('*uid*');

Using Triggers to Automate This

Of course, we'll want to have this happen automatically when we update a record, rather than having to execute the select statement. To do this, we'll write a trigger in PostgreSQL that triggers whenever an update is made to our customer table.

To do this, we need a trigger function that is called when our table is changed. In a perfect world, we could use our Perl function, above. However, at this time, Perl functions can't be used as trigger functions (though functions written in PL/Python, PL/PgSQL, C, PL/tcl, and other procedural languages can). For a simple wrapper function like this, PL/PgSQL would be the normal choice.

Our trigger function is:

CREATE OR REPLACE FUNCTION customer_upd() RETURNS TRIGGER as '
  BEGIN
    PERFORM reindex_by_uid(OLD.uid);
    RETURN NEW;
    END;
' LANGUAGE plpgsql;

Then the trigger itself:

CREATE TRIGGER customer_upd 
  AFTER UPDATE ON Customerfolder
    FOR EACH ROW EXECUTE PROCEDURE customer_upd();

Now, whenever we make an change to our table, our trigger calls the PL/PgSQL function customer_upd. This, in turn, calls our general reindexing function, which makes a simple web request that Zope hears and calls the reindexing support PythonScript. It seems like a lot of redirection, but works fine. Go test it out. Make a change to your object's body field directly via PostgreSQL, then check the catalog results and see that the appropriate field (in this case, SearchableText) has been updated.

Syncronizing Deletes and Inserts

(For advanced readers, since some of the detail is left to you to fill in).

Deletes and inserts are a bit trickier than updates, but also more critical to get right. If you don't use the update/reindexing technique, above, everything works fine, except your ZCatalog calls will be out-of-date, and even those will be fixed for you the next time you edit the object in Plone or do a manual recataloging. If inserts and deletes aren't propagated from the relational database to ZODB, data will be missing and errors will be raised.

For this reason, it may be reasonable to decide that record additions and deletions should happen only via the Plone interface. You can make quick data changes in the relational database (helpful for fixing typos across a series of records, or reformatting a field, etc.), but never inserts or updates.

However, if you want or need to have insertions/deletions made to the relational database and propagated into the ZODB, the following sections explain how to do this.

Inserts

Inserts would be handled using the saem general concepts as the update/reindex fix: write a Zope PythonScript that creates the object when it is passed the different fields required for object creation. Then write a PL/perlU function that crafts a command line wget statement that calls our Zope PythonScript, all of this being set into motion by a trigger on our table.

First, we'll want to create a PythonScript that will create our content for us. The trickiest part is coming up with a good, unique UID. If we knew that something in our table that was being inserted was unique, we could use that (prepended by the type name, to make sure it was unique across types so that Archetypes could use it). However, to look and feel consistent with UIDs created through the web, we'll copy in the same UID-generating code that Plone itself uses. 14

[14]It would be smarter, of course, to simply call an API method to get a unique ID in this format, but I couldn't find a suitable way to do so given how this has been implemented. If any readers have a suggestion, please let me know.

Our script will be called create_customerfolder, and will be:

## create_customerfolder
## Arguments: phone, body

# this function ripped out of CMFPlone/FactoryTool.py
def generateId(self, type):
    now = DateTime()
    name = type.replace(' ', '')+'.'+now.strftime('%Y-%m-%d')+'.'+now.strftime('%H%M%S')

    # Reduce chances of an id collision (there is a very small chance that somebody will
    # create another object during this loop)
    base_name = name
    objectIds = self.getParentNode().objectIds()
    i = 1
    while name in objectIds:
        name = base_name + "-" + str(i)
        i = i + 1
    return name

context.invokeFactory( "CustomerFolder"
                     , id=generateId(context, 'CustomerFolder')
                     , phone=phone
                     , body=body)
return "ok"

You can test this script by calling it through the web, or by using the Test tab on the PythonScript. Give it a body and a phone and it will create a new CustomerFolder object in the current context.

Now, we'll write a plperlu function that will craft a proper wget web request to call this script:

CREATE OR REPLACE FUNCTION customerfolder_add (text,text) RETURNS text as '
  $body = shift;
  $phone = shift;
  $sec = "--http-user=joel --http-passwd=foo";
  $portal = "/arch";
  $server = "localhost:8080";
  $wget = "/usr/bin/wget";
  $cmd = "$wget $sec -q -O - http://$server/$portal/create_customer?body=$body".
         ''\\\&''."phone=$phone";
  return `$cmd`;
' LANGUAGE plperlu;

FIXME -- CLEAN THIS UP LIKE PREVIOUS

The difference is that we don't want to really do an insert in the database, though--when Zope does its object creation, it will create the database record itself in Archetypes. So we want our original direct-in-DB insert to be ignored.

We could do this with a trigger, and have the trigger raise a failure so the insert didn't happen. This, though, would be confusing for the user, who would see an error message, and, if we were in the middle of transaction, would spoil that transaction, aborting it and preventing other actions from happening in the relational database. A better solution, then, would be to use a feature of PostgreSQL called rules, which we saw briefly earlier If You Need A Very Different Table Structure.

Rules are rewritings of a query to do anything instead of the called-for-query. We'll "rewrite" our INSERT query to a SELECT query, which in this case will select the PL/perlU function that calls wget to notify the Zope PythonScript to create the object. Again, it seems like a lot of redirection, but works well.

Rule creation is covered in the PostgreSQL documentation, in The Rule System 30. Our rule will be:

CREATE RULE customer_ins AS 
  ON INSERT TO Customerfolder 
  WHERE NEW.uid = 'direct'
  DO INSTEAD 
    SELECT customerfolder_add ( NEW.body, NEW.phone );

Now, when you want to insert a record directly, you can do so by:

INSERT INTO customer_ins ( uid, body, phone ) 
  VALUES ( 'direct', 'body goes here', '555 1212' );

The WHERE NEW.uid = 'direct' clause is required to prevent Zope's insertion from triggering our rule which would trigger Zope's insertion ... and so on into permanent recursion. Any attempt to insert a record with a UID not equal to "direct" will go directly into the database without triggering any action from Zope. Since Zope will be inserting a record with real UID, it will always therefore bypass our rule.

Deletes

Deletes would be handled like inserts, but our PythonScript would obviously do the deleting for us instead.

Details here can be figured here by the reader, but you'll need a PythonScript to handle the deletion, a plperlu function to craft the proper wget command, and a trigger that handles ON DELETE.

Since we can't stop recursion from happening with a DELETE the way we can with an INSERT, we should have our trigger call Zope not just as DO INSTEAD but DO, so the Zope deletion happens and the normal PostgreSQL deletion happens. When the Zope deletion tries ...

XXX FIXME XXX

Inserting a Child Record

If we want to allow direct database insertion of the child Orders objects, we have to consider one additional wrinkle: the Orders objects are meant to be physically contained in their related parent Customer object. Therefore, our PythonScript that would add their child Orders record must make the context for the invokeFactory call be the context of the enclosing Customer object.

We could accomplish this easily by passing the child Orders PythonScript add helper the UID of the Customer, and it could lookup the Customer object (using the API demonstrated earlier for looking up an object given its UID). Then we could use that context for our invokeFactory call.

FIXME: show this.

Changing SQLStorage's Storage Methods: An Example With Lists

Creating the Types and Fixing the Mapping

If we add a list type to our customer object, we run into a snag with marshalling and unmarshalling.

Let's add the object type, first as a standard Archetypes object stored completely in the ZODB:

# CustomerList.py

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin

schema = BaseSchema + Schema((
    
    TextField('body',
              required=1,
              primary=1,
              searchable=1,
              default_output_type='text/html',
              allowable_content_types=('text/restructured',
                                       'text/plain',
                                       'text/html',
                                       'application/msword'),
              widget=RichWidget,
              ),
    
    StringField("phone",
                 index="FieldIndex",
                 ),
    LinesField("clients"),
    
    )) + TemplateMixin.schema

class CustomerList(TemplateMixin, BaseContent):
    """Our example object"""
    schema = schema
    archetype_name = "Customer List"
    actions = TemplateMixin.actions
                  
registerType(CustomerList)

Put this in the schema and restart Archetypes.

As we're storing this in the ZODB (not in the relational database), everything works fine. The form widget for the clients field is a textarea in which the user enters newline-separated entries. These are converted by Zope to a Python list and stored as an attribute of the object.

If we create a new Archetypes type that contains this same lines field, but tries to store it in the relational database, we run into problems with Archetypes's default behaviors.

First, the object type:

# CustomerListSQL.py

from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
from Products.Archetypes.SQLStorage import PostgreSQLStorage

schema = BaseSchema + Schema((
    
    TextField('body',
              required=1,
              storage=PostgreSQLStorage(),
              primary=1,
              searchable=1,
              default_output_type='text/html',
              allowable_content_types=('text/restructured',
                                       'text/plain',
                                       'text/html',
                                       'application/msword'),
              widget=RichWidget,
              ),
    
    StringField("phone",
                 index="FieldIndex",
                 storage=PostgreSQLStorage(),
                 ),
    LinesField("clients",
              storage=PostgreSQLStorage()),
    
    )) + TemplateMixin.schema

class CustomerListSQL(TemplateMixin, BaseContent):
    """Our example object"""
    schema = schema
    archetype_name = "Customer List SQL"
    actions = TemplateMixin.actions
                  
registerType(CustomerListSQL)

Restart Archetypes, and don't forget to add the new type to portal_types.

At the time of this writing, Archetypes tries to create the new table with the field type lines for the clients field. This is not a valid field type for PostgreSQL (or any other database I know of), and therefore, the addition of the table fails, and any attempt to add an object of this type fails since there is no table to store them in.

There are several different ways we could fix this problem.

  • Create the table before Archetypes does.

    If the table already exists, Archetypes won't create it. We can easily create the table, and give it a text type for the clients field.

    The table structure would be:

    CREATE TABLE Customerlistsql (
        uid text NOT NULL PRIMARY KEY,
        parentuid text,
        body text,
        phone text,
        clients text
    );
    
  • Change the mapping performed by Archetypes.

    We can fix this problem by patching SQLStorage.py to do the right thing and create a text field by changing the type mapping that Archetypes does. You can do this either by editing SQLStorage.py and making changes for your database type, or, if you'd rather not modify the Archetypes source code, you can subclass your storage type, make the changes there, and use this new subclassed storage type. We'll look explicitly at the subclassing strategy later in this document; for now, we'll make changes directly to SQLStorage.py.

    The change we want is in the dictionary db_type_map, which translates an Archetypes field type into the relational database field type. As of this writing, there is no translation for lines, so Archetypes uses lines as the relational database field type. We'll add a translation for lines to become text:

    db_type_map = {
        'object': 'bytea',
        'file': 'bytea',
        'fixedpoint': 'integer',
        'reference': 'text',
        'datetime': 'timestamp',
        'string': 'text',
        'metadata': 'text', 
        'lines':'text', # THIS IS THE CHANGE
        }
    

    If you restart Archetypes and try to add your object now, it will create the table and let you create objects.

  • Create a suitable domain in PostgreSQL.

    PostgreSQL, like many advanced SQL databases, supports the notion of domains. A domain is a custom-defined explanation of a database type, which can be referenced as if it were a real type.

    For example, if you commonly want to use a varchar(20) not null for telephones in a database, you could create a domain called telephone that is defined as varchar(20) not null, and then you can simply create your tables with the field type telephone to get the right definition and behavior.

    We'll create a domain called lines:

    CREATE DOMAIN lines AS text;
    

    Domains can contain restrictions (such as CHECK constraints and NOT NULL requirements), but in this case, we don't want or need any of these. This simple definition will be enough.

    Now, when Archetypes tries to create a field with the type lines, it will succeed.

    In some ways, this is the best strategy of our three, as it lets other applications and users understand that this is a lines field. It's still stored as text, and behaves as such, but if you look at the table structure, you'll see lines, which can remind you of its newline-separated, list-oriented use.

Fixing the Translations

A serious problem still persists, though.

The newline-separated entries from the form (the "lines") are turned into a Python list by Archetypes, such as:

[ 'cat', 'dog', 'bird' ]

but SQLStorage attempts to store this list directly in the database. This ends up as the literal string value "['cat,'dog','bird']" which is Archetypes stores in the database:

database=# SELECT uid, clients FROM Customerlistsql ;
               uid               |        clients
---------------------------------+------------------------
 CustomerListSQL.2003-07-23.1619 | ['cat', 'dog', 'bird']
(1 row)

Unfortunately, this string representation of a Python list is a difficult format to work with in the database, and not handled correctly coming out by Archetypes. When Archetypes gets the data back from the relational database, it sees it as a single string. It tries to turn this isnto a list, with the following results:

[
'
c
a
t
'
,

'
d
o
g
'
,

'
b
i
r
d
'
]

As this is the way Python handles being given a string and being told to treat it like a list.

The solution is that we want to write a custom marshaller and unmarshaller. These are the routines that Archetypes will run on a value before it tries to write them to the database, and after it retrieves the value from the database.

There are hooks in Archetypes for this: any function called map_XXX is called when storing field type XXX and a method called unmap_XXX is called when retrieving field type XXX.

Our mapping will convert this list back to a newline-separated string, and this is the format it will be given to our relational database as:

def map_lines(self, field, value):
    return '\n'.join(value)

Our unmapping method will convert the newline-separated string back to a Python list:

def unmap_lines(self, field, value):
    return value.split('\n')

Both of these should go into SQLStorage.py, as methods the class SQLStorage or as methods of the class for your particular relational database. If don't want to (or can't) modify the source to Archetypes, you could subclass your storage class, add the methods to the subclass, and have your object schema fields use your new, subclass storage type. We'll cover this concept of subclassing a storage class extensively later, when we subclass an improved version of the PostgreSQL storage class.

Now we can transparently work with our lists: they appear and are edited on the form as a newline-separated string (so we can easily edit them in a textarea), they're handled in Zope as a Python list object (so we can work naturally with them and don't have to be concerned with how they're stored), and they're stored in the relational database as a simple newline separated list so we can access them simply. 15

[15]Thought PostgreSQL doesn't have any problems arising here, you should ensure that there are no problems with your database converting newlines to another character or messing with the text fields in any way that will prevent our unmap_lines method from being able to retrieve the individual lines. If some kind of unhelpful conversion does occur, you should check to see what kind of binary storage your database offers, as binary storages perform no conversion at all in a relational database. PostgreSQL offers a binary storage type, bytea, but we don't need it in this case as it leaves the newlines alone.

Even Better: Turning Into Arrays

While our solution above lets Archetypes store the data and get it back in one piece, it isn't very suitable in the relational database: most relational database querying programs and reporting programs are ill-equipped to deal with searching for individual values that are stuffed into text fields.

To find all customers that have two values, "fish" and "cat", in clients, you could write queries like:

SELECT * FROM Customerlistsql
  WHERE clients LIKE 'cat\n%fish'
    OR clients LIKE 'cat\n%fish\n%'
    OR clients LIKE '%\ncat\n%fish'
    OR clients LIKE '%\ncat\n%fish\n%'
    OR clients LIKE 'fish\n%cat'
    OR clients LIKE 'fish\n%cat\n%'
    OR clients LIKE '%\nfish\n%cat'
    OR clients LIKE '%\nfish\n%cat\n%'

(and this is still an incomplete example for this!)

However, this is ugly, slow, unindexable 16, and error-prone, especially as you add more predicates to the logic.

[16]Actually, in some databases, including PostgreSQL, there are ways to do a full-text index on a field like this, but this would still be suboptimal compared to more natural ways to store multiple values on a field.

We'll exploit a feature of PostgreSQL that allows us to store arrays in a field, so that one field holds an array of values. While this is similar to storing as newline-separated text, there are many functions in PostgreSQL that can quickly find records having a value in an array, or count the number of values in an array, and so on--all the things that would be slow and unwieldy using text.

First, let's change our table structure to use arrays:

database=# ALTER TABLE Customerlistsql DROP CLIENTS;
ALTER TABLE

database=# ALTER TABLE Customerlistsql ADD CLIENTS text[];
ALTER TABLE

The type text[] is a PostgreSQL type for storing an array of text values. We can test out the array storage works directly in PostgreSQL by updating an existing record and examining it:

database=# UPDATE Customerlistsql SET clients='{cat,dog,bird}';
UPDATE 1

database=# SELECT uid, clients FROM Customerlistsql;
               uid               |    clients
---------------------------------+----------------
 CustomerListSQL.2003-07-23.1619 | {cat,dog,bird}
(1 row)

database=# SELECT uid, clients[1] FROM customerlistsql;
               uid               | clients
---------------------------------+---------
 CustomerListSQL.2003-07-23.1619 | cat

Now we can change our map_lines and unmap_lines methods from above, to write out and retrieve values written in this format:

def map_lines(self, field, value):
    return "{%s}" % ','.join(value)

def unmap_lines(self, field, value):
    return value.strip("{}").split(',')

Note

These are very naive implementations, as they do not deal with the possibility that a list item might have a comma in it. It would be quite easy, though, to write versions that quoted the comma and unquoted it for unmapping.

Restart Archetypes to pick up the changes to the storage type, then edit an existing or add a new object. Notice how the values you put into the clients field end up as array in PostgreSQL, and are read correctly.

Creating a new storage class

FIXME

Dynamically changing storage types

FIXME, but for now:

| You add "storage=PostgreSQLStorage()" on fields that you want
| in the PostgreSQL database. This makes it easy to have some
| fields stored in a PostgreSQL database and others in a Sybase
| database. I realise this.
|
| Let's say that I store everything in PostgreSQL and then one day
| I want to switch to Sybase. Then I'd have to change every
| "storage=PostgreSQLStorage" to "storage=SybaseStorage".
| What if the class had a 'storage' attribute that you could set
| instead? Then you could only specify "storage=SQLStorage".
|
| I think most people only use one RDBMS? So then this would
| be the simplest thing. And also if you want your new type public,
| everyone won't probably use the same database. Then you'd
| like to change a field in the ZMI to say that all relational database
| fields should be stored with Sybase. Anyone agree?
|
| But I may be ranting about something that already exists?

In fact there is an (undocumented) method to change storages. It is
called (guess what?) setStorage, and it's a method of the Field
class. So, if you have an existing object, and you want to migrate
from one storage to another, you need to do something like this (in
an External Method, of course):

----------------------------------------------------------------------
from Products.Archetypes.Storage import SybaseStorage # Does not exist currently

results = root.portal_catalog(portal_type='MyObjectType')
for r in results:
   r.getObject().Schema()['fieldname'].setStorage(SybaseStorage())
---------------------------------------------------------------------

Then you would shut down your zope instance and adjust the schema
definitions accordingly. NOTE: After migrating your objects.

You also need to make sure the column already exists in your
databases.

from mail on the Archetypes-devel mailing ilst by Sidnei da Silva.

Integrating relational database schemas into Archetypes

One interesting trick that could benefit us is using schemas with Archetypes.

While many documents refer to a database schema as the definition of its tables, views, etc., I'm referring here to the feature of PostgreSQL that give a database multiple namespaces.

For example, in the database mydb, we could have only one table called ideas. But if we create a new namespace (schema), we can put this a new ideas table in the new namespace.

For example:

database=# CREATE TABLE Ideas ( idea text );
CREATE TABLE

database=# CREATE TABLE Ideas ( idea text );
ERROR:  Relation 'ideas' already exists

database=# CREATE SCHEMA More;
CREATE SCHEMA

database=# CREATE TABLE More.Ideas ( idea text );
CREATE TABLE

Now we have two independent tables called ideas, and can choose which one to use by referring to it by namespace:

database=# INSERT INTO Ideas VALUES ( 'commute to work.' );
INSERT 116491 1

database=# INSERT INTO More.Ideas values ( 'go out with friends.' );
INSERT 116492 1

The tables are completely separate:

database=# SELECT * FROM Ideas;
       idea
------------------
 commute to work.
(1 row)

database=# SELECT * FROM More.Ideas;
         idea
----------------------
 go out with friends.
(1 row)

If you don't name a schema explicitly, PostgreSQL uses your schema path to determine your schema. So, even before you knew you were using schemas, you were! The default schema in most PostgreSQL setups is the same name as the user, so I could see the "unnamed" ideas table above as joel.ideas.

The interesting idea here is that we could have different database adapters that have different connecting PostgreSQL users, each having a different schema path. Then, depending on which database adapter I used, I could see different results.

Or, even more interesting, we can ask PostgreSQL to change our schema path for us by executing the SQL command SET search_path=XXX, where XXX is our new, comma-separated search path. If we set our search path differently, we could get different results. Useful ideas would be to have the search path set differently based on who the logged in user was, or what her or his roles were, etc.

Using this ideas we could implemented Archetypes-stored objects that provided versioning or security capabilities that were dependant on users or roles: the user june would see results from one table, where rebecca would see results from another. All this would happen transparently and quickly at the PostgreSQL level, without having to put code for this in all your accessors in Archetypes.

Of course, we'd have to make sure inserts and deletes kept the tables in sync, otherwise june would get an error trying to look at a ZODB object for which there is no corresponding record in her schema's table. But this syncronization would be fairly straightforward to create using triggers on the tables.

There are no hooks currently in Archetypes SQLStorage to execute the required SET command before the various SQL calls, but it would be easy to implement this.

And, of course, this trick isn't specific to Archetypes--it's also useful when working with relational databases in Zope the traditional way with SQL Methods. It's particularly cool as an idea with APE--you could make the ZODB reconfigure itself depending who the logged in user is! 19

[19]In truth, though, I haven't tried this with APE, and I suspect you'd to test it thoroughly--there might be lots of nasty surprises if APE finds the ZODB changing underneath of it in unexpected ways. Or maybe not. Let me know if you try this out.

The status of SQLStorage

SQLStorage is newer than Archetypes itself, and does not appear to be as soundly developed and tested. In email to me, Alan Runyan said that:

NOTE: SQLStorage is incredibly inefficient. It works quite well and we have done a project with it (that why it exists). But really it should be rewritten if you are going to use it in a very large scale production environment. I would consider the implementation 'alpha' but stable.

I have not had a chance to audit the code to see what the inefficiencies are that he is referring to; however, as seen here, there are several buglets that prevent SQLStorage from working correctly (failing to catch deletion errors, failing to map lists correctly, etc.)

By the time you read this, these errors may be corrected and SQLStorage may be better-tested and more efficiently implemented. Stay tuned!

Marshaller - Im-/Export of ArcheTypes Content

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.5 $
Web site:http://sourceforge.net/projects/archetypes
Original URL:http://plone.org/documentation/archetypes/quickref/

Definition: Marshalling

From The Free On-line Dictionary of Computing (09 FEB 02) [foldoc]:

marshalling

<communications> (US -ll- or -l-) The process of packing one or more items of data into a message {buffer}, prior to transmitting that message buffer over a communication channel. The packing process not only collects together values which may be stored in non-consecutive memory locations but also converts data of different types into a standard representation agreed with the recipient of the message.

Marshalling is used in Archetypes to convert data into a single file for example, when someone fetches the content object via FTP or WebDAV. The inverse process is called Demarshalling.

How it works

Archetypes currently has a few sample marshallers, but they are somewhat experimental (there are no tests to confirm that they work, and that they will keep working). One of the sample marshallers is the RFC822Marshaller, which does a job very similar to what CMF does when using FTP and WebDAV with content types. Here's what happens, basically:

  1. Find the primary field for the content object, if any.
  2. Get the content type for the primary field and its content.
  3. Build a dict with all the other fields and its values.
  4. Use the function formatRFC822Headers from CMFCore.utils to encode the dict into RFC822-like fields.
  5. Append the primary field content as the body.
  6. Return the result, content_type and data.

When putting content back, the inverse is done:

1. The body is separated from the headers, using parseHeadersBody from CMFCore.utils.

2. The body, with the content type, is passed to the mutator of the primary field.

3. For each of the headers, we call the mutator of the given matching field with the header value.

That's it.

An example of using a Marshaller

To use a Marshaller, you just need to pass a Marshaller instance as one of the arguments for the Schema. For example:

from Products.Archetypes.Marshall import RFC822Marshaller
class Story(BaseContent):
    schema = BaseSchema + Schema ((

        TextField('story_description',
                  primary = 1,
                  default_output_type = 'text/plain',
                  allowable_content_types = ('text/plain', 'text/restructured',),
            widget = TextAreaWidget(label = 'Description',
                                    description = 'A short story.'
                                    )),

        ),
        marshall = RFC822Marshaller())

Predefined Marshallers

PrimaryFieldMarshaller
Handles the content of the field which has the attribute primary set to 1 in schema.
RFC822Marshaller
Handles the content of all fields as "fieldname: value" pairs as defined in RFC822 as it's used by email. The field with attribute primary set to 1 in schema is handled as the content.

Examples and more information

Examples can be found on the ArchExample product, that is included in the download. You can also browse the cvs repository 33.

Refreshing

If you don't want to restart Zope all the time, you can use the refresh option in the ZMI. Go to Control PanelProduct ManagementArchetypesrefresh tab (at the top right)

Auto Refresh works every time it is needed. Just remember to deactivate it on a production site, as it will slow down things

Advanced: How to write your own marshaller

Have a look at the Archetypes code in Marshall.py. It shows how a marshaller work.

Special Thanks

To Vladimir Iliev, for contributing with i18n and lots of other nice ideas and Bill Schindler, for lots of nice patches and reviewing documentation.

Portal Transforms

Author:Sylvain ThŽnault
Contact:sylvain.thenault@logilab.fr
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.5 $
Web site:http://sourceforge.net/projects/archetypes

Introduction

Portal Transforms is a transformation module for Archetypes. You can convert Archetype fields from StructuredText to HTML, or HTML to PDF, or Word XP to HTML. It has a low bar of entry for writing your own custom transformations, say from Excel to HTML.

This product is part of the Archetypes project, but the only requirement to use it is to have a CMF based site. If you are using Archetypes, this package replaces the transform package.

This Zope product provides two new tools for the CMF in order to make MIME types based transformations on the portal contents, and so an easy to way to plugin some new transformations for previously unsupported content types. The provided tools are :

  • mimetypes_registry (the mimetypes tool) : handle mime types information
  • portal_transform (the transform tool) : handle transformation of data from a mime type to another

A bunch of ready to use transformations are also provided. Look at the documentation for more information.

Notice this package can also be used as a standalone Python package. If you've downloaded the Python distribution, you can't make it a Zope product since Zope files have been removed from this distribution.

This product is an off-spring of the Archetypes project.

Installation

First download product PortalTransforms 34 and copy it to your Zope instance Products directory.

WARNING : The two installation methods may conflict, choose the one adapted to
your need.

Zope

  • Put this package in your Zope's Products directory and restart Zope

  • either use the QuickInstaller to add this product to your CMF site or add an external method to the root of your CMF site with the following information :

    module:

    PortalTransforms.Install

    method:

    install

and click the test tab to run it.

Python

  • Extract the tarball
  • Run "python setup.py install". See "python setup.py install --help" for installation options.
  • That's it, you should have the library and the transform command line tool installed.

Documentation

See the docs directory of the package.

Mailing-list

Discussion about this products occurs to the archetypes mailing list : http://sourceforge.net/mail/?group_id=75272 or on the #plone channel of irc.freenode.net.

Portal Transforms User manual

Author:Sylvain Thenault
Contact:syt@logilab.fr
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.4 $
Web site:http://sourceforge.net/projects/archetypes

What does this package provide ?

This package is both a python library for MIME type based content transformation, including a command line tool, (what i call the python package) and a Zope product providing two new tools for the CMF (what i call the Zope product). A python only distribution will be available, where all the Zope specific files won't be included.

Python side

The transform command line tool

command line tool for MIME type based transformation:

USAGE: transform [OPTIONS] input_file output_file

OPTIONS:
-h / --help
display this help message and exit.

-o / --output <output mime type>
output MIME type. (conflict with --transform)

-t / --transform <transform id>
id of the transform to apply. (conflict with --output)

EXAMPLE:

$ transform -o text/html dev_manual.rst dev_manual.html

Customization hook

You can customize the transformation engine by providing a module "transform_customize" somewhere in your Python path. The module must provide a initialize method which will take the engine as only argument. This method will have the reponsability to initialize the engine with desired transformations. When it's not found, the initialize method from the transforms subpackage will be used.

Zope side

The MIME types registry

This tool registered known MIME types. The information associated with a MIME type are:

  • a title
  • a list rfc-2046 types
  • a list of files extensions
  • a binary flag
  • an icon path

You can see regitered types by going to the mimetypes_registry object at the root of your CMF site, using the ZMI. There you can modify existent information or add / delete types. This product cames with a default set of MIME types icons located in portal_skins/mimetypes_icons.

The tranformation tool

It's a MIME type based transformation engine. It's has been designed to transform portal content from a given MIME type to another. You can add / delete transformations by going to the portal_transforms object at the root of your CMF site, using the ZMI. Some transformations are configurable, but not all. A transform is a Python object implementing a special interface. See the developper documentation if you're interested in writing a new transformation.

Archetypes integration

Archetypes will use this product for automatic transformation if you have configurated it to use the new base unit (set USE_NEW_BASEUNIT to 1 in the Archetypes config.py). If you're using the old base unit (still default in 1.0), the transform tool won't be used (at least by the Archetypes library).

Default transformations

The default transformations are described here. They are separated in two groups, safe and unsafe. Safe transforms are located in the transforms directory of this product. Unsafe transforms are located in the unsafe_transforms directory and are not registered by default. Moreover, there is no __init__.py file in this directory so it requires a manual intervention to make them addable to the transforms tool. Usually unsafe transforms are so called since they allow configuration of a path to a binary executable on the server, which may be indesirable for Zope service providers.

Safe transforms

st
transforms Structured Text to HTML. Not configurable.
rest
transforms Re Structured Text to HTML. You need docutils to use this transformation. Not configurable.
word_to_html
transforms M$ Word file to HTML, using either COM (on windows), wvWare or PyUNO (from OpenOffice.org). Not configurable.
pdf_to_html
transforms Adobe PDF to HTML. This transforms requires the "pdftohtml" program. Not Configurable.
lynx_dump
transforms HTML to plain text. This transforms requires the "lynx" program. Not Configurable.
python
transforms Python source code to colorized HTML. You can configure used colors.

Unsafe transforms

command

this is a fully configurable transform based on external commands. For instance, you can obtain the same transformation as the previous lynx_dump:

  1. add a new transform named "lynx_dump" with "Products.PortalTransforms.unsafe_transforms.command" as module (this supposes that you've added a __init__.py file to the unsafe_transforms directory to make them importable).

  2. go to the configure tab of this transform and set the following parameters :

    binary_path:

    '/usr/bin/lynx'

    command_line:

    '-dump %s'

    input:

    'text/html'

    output:

    'text/plain'

xml

this transform has been designed to handle XML file on a doctype / DTD basis. All the real transformation work is done by a xslt processor. This transform only associate XSLT to doctypes or DTD, and use give the correct transformation to the processor when some content has to be transformed.

FIXME: add an example on how to setup docbook transform.

Advanced features

Transformation chains

A transformation chain is an ordered suite of transformations. A chain itselve is a transformation. You can build a transformations chain using the ZMI.

Transformation policy

You can set a simple transformation policies for the transforms tool. A policy say that when you try to convert content to a given MIME type, you have to include a given transformation. For instance, imagine you have a html_tidy tranformation which tidy HTML page, you can say that the transformation path to text/html should include the html_tidy transform.

Caches

For efficiency, transformation's result are cached. You can set the life time of a cached result using the ZMI. This is a time exprimed in seconds.

Portal Transforms Developer manual

Author:Sylvain Thenault
Contact:syt@logilab.fr
Date:$Date: 2003/10/27 09:30:15 $
Version:$Revision: 1.3 $
Web site:http://sourceforge.net/projects/archetypes

Tools interfaces

The MIME types registry

class isourceAdapter(Interface):

def __call__(data, \**kwargs):
    """convert data to unicode, may take optional kwargs to aid in conversion"""

class imimetypes_registry(isourceAdapter):

def classify(data, mimetype=None, filename=None):
    """return a content type for this data or None
    None should rarely be returned as application/octet can be
    used to represent most types.
    """

def lookup(mimetypestring):
    """Lookup for imimetypes object matching mimetypestring.
    mimetypestring may have an empty minor part or containing a wildcard (*).
    mimetypestring may be an imimetype object (in this case it will be
    returned unchanged, else it should be a RFC-2046 name.
    Return a list of mimetypes objects associated with the RFC-2046 name.
    Return an empty list if no one is known.
    """

def lookupExtension(filename):
    """ return the mimetypes object associated with the file's extension
    or None if it is not known.
    filename maybe a file name like 'content.txt' or an extension like 'rest'
    """

def mimetypes():
    """return all defined mime types, each one implements at least imimetype
    """

def list_mimetypes():
    """return all defined mime types, as string"""

The tranformation tool

class iengine(Interface):

def registerTransform(transform):
    """register a transform
    transform must implements itransform
    """

def unregisterTransform(name):
    """ unregister a transform
    name is the name of a registered transform
    """

def convertTo(mimetype, orig, idata=None, \**kwargs):
    """Convert orig to a given mimetype
    return an object implementing idatastream or None if not path has been
    found
    """

def convert(name, orig, idata=None, \**kwargs):
    """run a tranform of a given name on data
    name is the name of a registered transform
    return an object implementing idatastream
    """

def __call__(name, orig, idata=None, \**kwargs):
    """run a transform returning the raw data product
    name is the name of a registered transform
    return an object implementing idatastream
    """

Writing a new transformation

Configurable transformation

You can make your transformation configurable through the ZMI by setting a config dictionnary on your transform instance or class. Keys are parameter's name and values parameter's value. Another dictionnary config_metadata describes each parameter. In this mapping, keys are also parameter's name but values are a tree-uple : (<parameter's type>, <parameter's label>, <parameter's description>).

Possible types for parameters are:

int:field is an integer
string:field is a string
list:field is a list
dict:field is a dictionnary

You can look at the command and xml transforms for an example of configurable transform.

Images / sub-objects management

A transformation may produce some sub-objects, for instance when you convert a PDF document to HTML. That's the purpose of the setObjects method of the idatastream interface.

Some utilities

Transform utilities may be found in the libtransforms subpackage. You'll find there the following modules :

commandtransform
provides a base class for external command based transforms.
retransform
provides a base class for regular expression based transforms.
html4zope
provides a docutils HTML writer for Zope.
utils
provides some utilities functions.

Write a test your transform !

Every transform should have its test... And it's easy to write a test for your transform ! Imagine you have made a transform named "colabeer" which transforms cola into beer (I let you find MIME type for these content types ;). Basically, your test file should be:

from test_transforms import make_tests

tests =('Products.MyTransforms.colabeer', "drink.cola", "drink.beer", None, 0)

def test_suite():
    return TestSuite([makeSuite(test) for test in make_tests()])

if __name__=='__main__':
    main(defaultTest='test_suite')

In this example :

  • "Products.MyTransforms.colabeer" is the module defining your transform (you can also give directly the transform instance).
  • "drink.cola" is the name of the file containing data to give to your transform as input.
  • "drink.beer" is the file containing expected transform result (what the getData method of idatastream will return).
  • Additional arguments (None and 0) are respectivly an optional normalizing function to apply to both the transform result and the output file content, and the number of subobjects that the transform is expected to produce.

This example supposes your test is in the tests directory of PortalTransforms and your input and output files respectively in tests/input and tests/output.

ArchGenXML - generating Archetypes using UML

Author:Jens W. Klein (aka Jensens)
Contact:jens.klein@jensquadrat.de
Date:$Date: 2004/03/03 13:02:29 $
Version:$Revision: 1.26 $
Web site:http://sourceforge.net/projects/archetypes
Original URL:cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/archetypes co ArchGenXML

Overview

ArchGenXML is a commandline utility that generates Plone applications (products) based on the Archetypes framework from UML models using XMI (.xmi, .zargo, .zuml) and XSD (XMLSchema) files.

Design your Archetypes (code for content objects in plone) offline using a UML design tool like Poseidon or ObjectDomain. Export the UML to XMI or XSD and use ArchGenXML to generate Phyton code for use as a Product. This new Product is available as a new item (document) type in the Plone userinterface.

This time the tool is not round-trip enabled: Generated can't be back-converted into XMI. But you can re-generate over existing code: Method bodys are kept.

ArchGenXML is started, programmed and maintained by Phil Auersperg 35

Download

Download the latest ArchGenXML-Package 36 release or check it out from CVS:

cvs -d :pserver:anonymous@cvs.sf.net:/cvsroot/archetypes co ArchGenXML

Documentation

You'll find the complete ArchGenXML Documentation inside the package.

Documentation Guide

Author:Sidnei da Silva
Contact:sidnei@x3ng.com
Date:$Date: 2003/09/14 15:27:26 $
Version:$Revision: 1.4 $
Web site:http://sourceforge.net/projects/archetypes

Introduction

This guide's purpose is to help developers to mantain some level of organization on how to write docstrings and how to keep documentation updated all the time.

Main documentation

The entry point for documentation in archetypes is the Archetypes API. It is maintained as a reStructuredText document inside the Archetypes directory of the Developers module in plone-docs CVS. Its purpose is to serve as a guide for newbies that want to learn how to use Archetypes to build their custom objects and as a reference of the features available.

Docstrings

Starting from the 1.0 release, we are requiring all docstrings to be written using reStructuredText, so they can be extracted using the Docutils 37 package. That will generate a API guide to be used in conjunction with the Archetypes API Document.

PEP 257 38 and PEP 258 39, specially the section about the Python Source Reader 40, are a MUST READ for all developers and contributors.

Additional Documentation

Additional documentation must be provided in the case where a feature or functionality is important enough and big enough not to fit in the API. An example of this is the Archetypes i18n How-to, which describes how to build objects that support multiple languages both for the management interface and for storing i18n-ized content.

XXX Make a link to Archetypes i18n How-to

Filed under: