Attention

This document was written for an unsupported version of Plone, Plone 2.1.x, and was last updated 863 days ago.


Transforms

by Martin Aspeli last modified Jan 10, 2010 09:12 PM
Working with PortalTransforms' portal_transforms tool to register new transforms, and writing tests for transforms.

Transforms are registered for one or more input MIME types and a single output MIME type. Once they are registered, 'portal_transforms' can use the available transforms to convert between two MIME types.
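To make the registration model concrete, here is a toy sketch in plain Python 3. It is not how PortalTransforms is implemented: 'ToyRegistry', 'register()' and 'find_path()' are hypothetical names, and the input type given for the HTML-to-text transform is an assumption based on its name. The sketch files each transform under its input types and searches breadth-first for a sequence of transforms that leads from one MIME type to another.

```python
from collections import deque


class ToyRegistry:
    """Toy stand-in for a transform registry (hypothetical, illustration only)."""

    def __init__(self):
        # Maps an input MIME type to a list of (transform name, output type).
        self._by_input = {}

    def register(self, name, inputs, output):
        # A transform accepts several input types but has exactly one output.
        for mimetype in inputs:
            self._by_input.setdefault(mimetype, []).append((name, output))

    def find_path(self, source, target):
        """Return the transform names converting source to target, or None."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            mimetype, path = queue.popleft()
            if mimetype == target:
                return path
            for name, output in self._by_input.get(mimetype, []):
                if output not in seen:
                    seen.add(output)
                    queue.append((output, path + [name]))
        return None


registry = ToyRegistry()
registry.register('web_intelligent_plain_text_to_html',
                  ('text/x-web-intelligent',), 'text/html')
# Assumed input type for the reverse transform.
registry.register('html_to_web_intelligent_plain_text',
                  ('text/html',), 'text/x-web-intelligent')

path = registry.find_path('text/x-web-intelligent', 'text/html')
print(path)
```

A lookup for a pair of types that no registered transform connects returns None in this sketch.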

The 'intelligenttext' transforms are found in 'intelligenttext/transforms'. By convention, each transform in this directory lives in its own module (i.e. its own .py file). Each module should contain a class implementing the 'itransform' interface and a 'register()' function that returns a new instance of the transform. The '__init__.py' file in the 'transforms' package (directory) collects the available transforms so that they can be registered. As before, we will use 'Extensions/Install.py' to register the transforms manually at install time.

First of all, '__init__.py' contains the following code:

from Products.PortalTransforms.libtransforms.utils import MissingBinary
modules = [
    'web_intelligent_plain_text_to_html',
    'html_to_web_intelligent_plain_text',
    ]

g = globals()
transforms = []
for m in modules:
    try:
        ns = __import__(m, g, g, None)
        transforms.append(ns.register())
    except ImportError, e:
        print "Problem importing module %s : %s" % (m, e)
    except MissingBinary, e:
        print e
    except:
        import traceback
        traceback.print_exc()

def initialize(engine):
    for transform in transforms:
        engine.registerTransform(transform)

All of this is boilerplate, except for the list of 'modules'. These are the names of the Python modules under 'transforms/'.
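The collecting loop can be tried outside Zope. The following is a rough Python 3 sketch of the same pattern, not the PortalTransforms code: 'collect_transforms()' and the in-memory 'demo_transform' module are hypothetical, and 'importlib' is used in place of the '__import__' call above. It shows that a module which cannot be imported is reported and skipped rather than stopping the others from loading.

```python
import importlib
import sys
import types


def collect_transforms(module_names):
    """Import each named module and collect what its register() returns."""
    transforms = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
            transforms.append(module.register())
        except ImportError as e:
            # A missing or broken module is reported, not fatal.
            print("Problem importing module %s : %s" % (name, e))
    return transforms


# Build a throwaway in-memory module so the sketch runs on its own.
demo = types.ModuleType('demo_transform')
demo.register = lambda: 'demo transform instance'
sys.modules['demo_transform'] = demo

found = collect_transforms(['demo_transform', 'no_such_transform_module'])
print(found)
```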

Each transform module contains a transform class and a 'register()' function. The module 'intelligenttext/transforms/web_intelligent_plain_text_to_html.py' contains the following:

from Products.PortalTransforms.interfaces import itransform
from htmlentitydefs import entitydefs
import re

class WebIntelligentPlainTextToHtml:
    """Transform which replaces urls and email into hyperlinks"""

    __implements__ = itransform

    __name__ = "web_intelligent_plain_text_to_html"
    output = "text/html"

    def __init__(self, name=None, inputs=('text/x-web-intelligent',),
                    tab_width = 4):
        self.config = { 'inputs' : inputs, 'tab_width' : tab_width}
        self.config_metadata = {
            'inputs' : ('list', 'Inputs',
                            'Input(s) MIME type. Change with care.'),
            'tab_width' : ('string', 'Tab width',
                            'Number of spaces for a tab in the input')
            }
        if name:
            self.__name__ = name

        self.urlRegexp = re.compile(r'((?:ftp|https?)://(?:[a-z0-9]' \
        r'(?:[-a-z0-9]*[a-z0-9])?\.)+(?:com|edu|biz|org|gov|int|info' \
        r'|mil|net|name|museum|coop|aero|[a-z][a-z])\b(?::\d+)' \
        r'?(?:\/[^;"\'<>()\[\]{}\s\x7f-\xff]*(?:[.,?]+[^;"\'<>()' \
        r'\[\]{}\s\x7f-\xff]+)*)?)', re.I|re.S)
        self.emailRegexp = re.compile(r'["=]?(\b[A-Z0-9._%-]+@' \
        r'[A-Z0-9._%-]+\.[A-Z]{2,4}\b)', re.I|re.S)
        self.indentRegexp = re.compile(r'^(\s+)', re.M)

    def name(self):
        return self.__name__

    def __getattr__(self, attr):
        if attr in self.config:
            return self.config[attr]
        raise AttributeError(attr)

    def convert(self, orig, data, **kwargs):

        text = orig

        # Do &amp; separately, else, it may replace an already-inserted & from
        # an entity with &amp;, so < becomes &lt; becomes &amp;lt;
        text = text.replace('&', '&amp;')
        # Make funny characters into html entity defs
        for entity, letter in entitydefs.items():
            if entity != 'amp':
                text = text.replace(letter, '&' + entity + ';')

        # Replace hyperlinks with clickable <a> tags
        def replaceURL(match):
            url = match.groups()[0]
            return '<a href="%s">%s</a>' % (url, url)
        text = self.urlRegexp.subn(replaceURL, text)[0]

        # Replace email strings with mailto: links
        def replaceEmail(match):
            url = match.groups()[0]
            return '<a href="mailto:%s">%s</a>' % (url, url)
        text = self.emailRegexp.subn(replaceEmail, text)[0]

        # Make leading whitespace on a line into &nbsp; to preserve indents
        def indentWhitespace(match):
            indent = match.groups()[0]
            indent = indent.replace(' ', '&nbsp;')
            # tab_width may be stored as a string when configured through the web
            return indent.replace('\t', '&nbsp;' * int(self.tab_width))
        text = self.indentRegexp.subn(indentWhitespace, text)[0]

        # Finally, make \n's into br's
        text = text.replace('\n', '<br />')

        data.setData(text)
        return data

def register():
    return WebIntelligentPlainTextToHtml()

The class 'WebIntelligentPlainTextToHtml' implements 'itransform'. Notice the '__name__' attribute, which contains the name of the transform as registered with 'portal_transforms', and the 'output' attribute, which specifies the output type of the transform. The '__init__()' method is used to initialise the transform. Providing 'self.config' and 'self.config_metadata' makes the transform configurable through the web. By convention, we allow the list of input MIME types to be configured. We also allow the tab width to be specified. The '__getattr__()' method exposes each configuration value as an attribute, which is why 'convert()' can refer to 'self.tab_width'.
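The way configuration values surface as attributes can be seen in isolation. Below is a minimal Python 3 sketch, assuming a hypothetical cut-down class 'ConfigurableTransform' that keeps only the configuration part of the transform. Unlike the listing above, it reads 'config' from the instance dictionary, a defensive choice made for this sketch so that a missing 'config' cannot cause '__getattr__()' to call itself.

```python
class ConfigurableTransform:
    """Hypothetical cut-down transform showing only the configuration part."""

    def __init__(self, inputs=('text/x-web-intelligent',), tab_width=4):
        self.config = {'inputs': inputs, 'tab_width': tab_width}

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        # Reading 'config' from __dict__ avoids recursing into __getattr__.
        config = self.__dict__.get('config', {})
        if attr in config:
            return config[attr]
        raise AttributeError(attr)


transform = ConfigurableTransform(tab_width=8)
print(transform.tab_width)

# Changing the configuration is immediately visible through the attribute.
transform.config['tab_width'] = 2
print(transform.tab_width)
```

Because the values are looked up in 'config' on every access, a change made to the configuration takes effect the next time the attribute is read.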

All the magic happens in the 'convert()' method. It first escapes HTML special characters. It then uses the regular expressions compiled in the '__init__()' method (to avoid compiling the same expression more than once) to replace URLs and email addresses with clickable hyperlinks, turns leading whitespace into non-breaking spaces to preserve indentation, and converts newlines to '<br />' tags. The method returns a data stream, described in the 'idatastream' interface. In this case, the stream simply contains the replaced text.
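The order of these steps matters, and it is easier to see in a stripped-down version. The following Python 3 sketch is not the transform itself: 'to_html()' is a hypothetical function, the URL pattern is deliberately much simpler than the one above, email handling is left out, and 'html.escape()' stands in for the entity loop.

```python
import html
import re

# Deliberately simplified patterns (assumptions made for this sketch only).
URL_RE = re.compile(r'((?:ftp|https?)://[^\s<>"]+)', re.I)
INDENT_RE = re.compile(r'^([ \t]+)', re.M)


def to_html(text, tab_width=4):
    # 1. Escape markup characters first, so that the tags and entities
    #    inserted by the later steps are not escaped again.
    text = html.escape(text, quote=False)

    # 2. Turn URLs into clickable links.
    def link(match):
        url = match.group(1)
        return '<a href="%s">%s</a>' % (url, url)
    text = URL_RE.sub(link, text)

    # 3. Preserve leading indentation with non-breaking spaces.
    def indent(match):
        spaces = match.group(1).replace('\t', ' ' * tab_width)
        return '&nbsp;' * len(spaces)
    text = INDENT_RE.sub(indent, text)

    # 4. Finally, convert newlines to <br /> tags.
    return text.replace('\n', '<br />')


print(to_html('See http://plone.org\n  a < b'))
```

Escaping has to come first: if it ran after the link step, the '<a>' tags just inserted would themselves be escaped.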

The 'html_to_web_intelligent_plain_text' transform follows the same pattern in the opposite direction, but is rather longer and more complicated.

To install the transforms, 'Extensions/Install.py' contains:

from Products.CMFCore.utils import getToolByName

from StringIO import StringIO
from types import InstanceType

...

def registerTransform(self, out, name, module):
    transforms = getToolByName(self, 'portal_transforms')
    transforms.manage_addTransform(name, module)
    print >> out, "Registered transform", name

def unregisterTransform(self, out, name):
    transforms = getToolByName(self, 'portal_transforms')
    try:
        transforms.unregisterTransform(name)
        print >> out, "Removed transform", name
    except AttributeError:
        print >> out, "Could not remove transform", name, "(not found)"


def install(self):

    out = StringIO()

    print >> out, "Installing text/web-intelligent mimetype and transform"

    ...

    # Register transforms
    registerTransform(self, out, 'web_intelligent_plain_text_to_html',
        'Products.intelligenttext.transforms.web_intelligent_plain_text_to_html')
    registerTransform(self, out, 'html_to_web_intelligent_plain_text',
        'Products.intelligenttext.transforms.html_to_web_intelligent_plain_text')

    return out.getvalue()

def uninstall(self):

    out = StringIO()

    # Remove transforms
    unregisterTransform(self, out, 'web_intelligent_plain_text_to_html')
    unregisterTransform(self, out, 'html_to_web_intelligent_plain_text')

    ...

    return out.getvalue()

Finally, we need to test our transforms. The appropriate tests are found in 'intelligenttext/tests/test_transforms.py'. This contains two simple Archetypes test cases that exercise the transforms via various strings. Take a look at this file if you want to understand the transforms in more detail.
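For readers without the product at hand, the general shape of such a test can be sketched in plain Python 3. This is not the content of 'test_transforms.py': 'StubDataStream' and 'NewlineToBreak' are hypothetical stand-ins that exist only so the example runs without Plone, and the real tests are Archetypes test cases rather than bare 'unittest' ones. The idea is the same, though: feed a string to 'convert()' and compare what ends up in the data stream.

```python
import unittest


class StubDataStream:
    """Minimal stand-in offering only setData()/getData() (hypothetical)."""

    def __init__(self):
        self._data = ''

    def setData(self, value):
        self._data = value

    def getData(self):
        return self._data


class NewlineToBreak:
    """Hypothetical one-step transform, here only to have something to test."""

    def convert(self, orig, data, **kwargs):
        data.setData(orig.replace('\n', '<br />'))
        return data


class TestNewlineToBreak(unittest.TestCase):

    def convert(self, text):
        # Run the transform and return the text stored in the data stream.
        return NewlineToBreak().convert(text, StubDataStream()).getData()

    def test_newline_becomes_break(self):
        self.assertEqual(self.convert('one\ntwo'), 'one<br />two')

    def test_text_without_newline_is_unchanged(self):
        self.assertEqual(self.convert('plain'), 'plain')


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNewlineToBreak)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```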

