Transforms
Transforms are registered for one or more input MIME types, and a single output MIME type. Once registered, 'portal_transforms' will be able to use the available transforms to convert between two MIME types.
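As a rough illustration of this idea, the sketch below keeps a lookup table keyed by input and output MIME type. It is a toy model only, not the 'portal_transforms' implementation: the names 'ToyEngine', 'UpperCase', 'register' and 'convert', and the MIME types used, are hypothetical. It is written for current Python 3 so that it can be run on its own.

```python
class UpperCase:
    """Hypothetical transform: several input types, a single output type."""
    inputs = ("text/plain", "text/x-demo")
    output = "text/x-shouting"

    def convert(self, text):
        return text.upper()


class ToyEngine:
    """Toy registry standing in for the idea behind 'portal_transforms'."""

    def __init__(self):
        self._routes = {}

    def register(self, transform):
        # One table entry per (input type, output type) pair
        for mimetype in transform.inputs:
            self._routes[(mimetype, transform.output)] = transform

    def convert(self, text, source, target):
        try:
            transform = self._routes[(source, target)]
        except KeyError:
            raise LookupError("no transform from %s to %s" % (source, target))
        return transform.convert(text)


engine = ToyEngine()
engine.register(UpperCase())
print(engine.convert("hello", "text/x-demo", "text/x-shouting"))
```

The real tool does considerably more than this. The sketch only shows why a transform needs to declare its input types and its output type when it is registered.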
The 'intelligenttext' transforms are found in 'intelligenttext/transforms'. By convention, each transform in this directory lives in its own module (i.e. its own .py file). Each module should contain a class implementing the 'itransform' interface and a 'register()' function that returns a new instance of the transform. The '__init__.py' file in the 'transforms' package (directory) should be able to register the available transforms. As before, we will use 'Extensions/Install.py' to register the transforms manually at install time.
First of all, '__init__.py' contains the following code:

    from Products.PortalTransforms.libtransforms.utils import MissingBinary

    # Names of the transform modules in this package
    modules = [
        'web_intelligent_plain_text_to_html',
        'html_to_web_intelligent_plain_text',
    ]

    g = globals()
    transforms = []
    for m in modules:
        try:
            ns = __import__(m, g, g, None)
            transforms.append(ns.register())
        except ImportError, e:
            print "Problem importing module %s : %s" % (m, e)
        except MissingBinary, e:
            print e
        except:
            import traceback
            traceback.print_exc()

    def initialize(engine):
        for transform in transforms:
            engine.registerTransform(transform)

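All of this is boilerplate, except for the list of 'modules'. These are the names of the Python modules under 'transforms/'. A module that fails to import is reported and skipped, so one broken transform does not prevent the others from being registered.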
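The import-and-register loop above can be tried in isolation. The following is a minimal sketch in current Python 3, using 'importlib' in place of the '__import__' call. The module names and the stand-in 'register()' functions are made up for the demonstration, and the modules are fabricated in memory so that the sketch runs without the real package.

```python
import importlib
import sys
import types


def make_demo_module(name):
    """Create an in-memory stand-in for a transform module (hypothetical)."""
    mod = types.ModuleType(name)
    mod.register = lambda: "transform:" + name
    sys.modules[name] = mod


make_demo_module("demo_text_to_html")
make_demo_module("demo_html_to_text")

# The last name does not exist, to show that a failure is reported and skipped
modules = ["demo_text_to_html", "demo_no_such_module", "demo_html_to_text"]

transforms = []
for name in modules:
    try:
        transforms.append(importlib.import_module(name).register())
    except ImportError as exc:
        print("Problem importing module %s : %s" % (name, exc))

print(transforms)
```

Only the two modules that could be imported end up in 'transforms'. The missing one produces a message instead of stopping the loop.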
Each transform module contains a transform class and a 'register()' function. The module 'intelligenttext/transforms/web_intelligent_plain_text_to_html.py' contains the following:

    from Products.PortalTransforms.interfaces import itransform
    from htmlentitydefs import entitydefs
    import re

    class WebIntelligentPlainTextToHtml:
        """Transform which replaces urls and email into hyperlinks"""

        __implements__ = itransform

        __name__ = "web_intelligent_plain_text_to_html"
        output = "text/html"

        def __init__(self, name=None, inputs=('text/x-web-intelligent',),
                     tab_width=4):
            # Store the tab_width argument rather than a hard-coded value
            self.config = {'inputs': inputs, 'tab_width': tab_width}
            self.config_metadata = {
                'inputs': ('list', 'Inputs',
                           'Input(s) MIME type. Change with care.'),
                'tab_width': ('string', 'Tab width',
                              'Number of spaces for a tab in the input')
                }
            if name:
                self.__name__ = name

            # Compile the patterns once, when the transform is created.
            # The URL pattern allows an optional ':port' after the host name.
            self.urlRegexp = re.compile(r'((?:ftp|https?)://(?:[a-z0-9]' \
                r'(?:[-a-z0-9]*[a-z0-9])?\.)+(?:com|edu|biz|org|gov|int|info' \
                r'|mil|net|name|museum|coop|aero|[a-z][a-z])\b(?::\d+)' \
                r'?(?:\/[^;"\'<>()\[\]{}\s\x7f-\xff]*(?:[.,?]+[^;"\'<>()' \
                r'\[\]{}\s\x7f-\xff]+)*)?)', re.I|re.S)
            self.emailRegexp = re.compile(r'["=]?(\b[A-Z0-9._%-]+@' \
                r'[A-Z0-9._%-]+\.[A-Z]{2,4}\b)', re.I|re.S)
            self.indentRegexp = re.compile(r'^(\s+)', re.M)

        def name(self):
            return self.__name__

        def __getattr__(self, attr):
            # Expose configuration values as attributes, e.g. self.tab_width
            if attr in self.config:
                return self.config[attr]
            raise AttributeError(attr)

        def convert(self, orig, data, **kwargs):
            text = orig

            # Do & separately, else, it may replace an already-inserted & from
            # an entity with &amp;, so < becomes &lt; becomes &amp;lt;
            text = text.replace('&', '&amp;')
            # Make funny characters into html entity defs
            for entity, letter in entitydefs.items():
                if entity != 'amp':
                    text = text.replace(letter, '&' + entity + ';')

            # Replace hyperlinks with clickable <a> tags
            def replaceURL(match):
                url = match.groups()[0]
                return '<a href="%s">%s</a>' % (url, url)
            text = self.urlRegexp.subn(replaceURL, text)[0]

            # Replace email strings with mailto: links
            def replaceEmail(match):
                url = match.groups()[0]
                return '<a href="mailto:%s">%s</a>' % (url, url)
            text = self.emailRegexp.subn(replaceEmail, text)[0]

            # Make leading whitespace on a line into &nbsp; to preserve indents
            def indentWhitespace(match):
                indent = match.groups()[0]
                indent = indent.replace(' ', '&nbsp;')
                return indent.replace('\t', '&nbsp;' * self.tab_width)
            text = self.indentRegexp.subn(indentWhitespace, text)[0]

            # Finally, make \n's into br's
            text = text.replace('\n', '<br />')

            data.setData(text)
            return data

    def register():
        return WebIntelligentPlainTextToHtml()

The class 'WebIntelligentPlainTextToHtml' implements 'itransform'. Notice the '__name__' attribute, which contains the name of the transform as registered with 'portal_transforms', and the 'output' attribute, which specifies the output type of the transform. The '__init__()' method is used to initialise the transform. By providing 'self.config' and 'self.config_metadata', the transform becomes configurable through the web. By convention, we allow the list of input MIME types to be configured. We also allow the tab width to be specified. The '__getattr__()' method makes each configuration value available as an attribute, which is why 'convert()' can refer to 'self.tab_width'.
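The way configuration values surface as attributes can be seen in a few lines. This is a standalone sketch in current Python 3, and the class 'ConfigDemo' is a hypothetical stand-in rather than part of PortalTransforms. Unlike the transform's own '__getattr__()', it reads 'config' through '__dict__' so that a lookup made before 'config' exists fails cleanly.

```python
class ConfigDemo:
    """Hypothetical stand-in showing config values exposed as attributes."""

    def __init__(self, inputs=('text/x-web-intelligent',), tab_width=4):
        self.config = {'inputs': inputs, 'tab_width': tab_width}

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails
        config = self.__dict__.get('config', {})
        if attr in config:
            return config[attr]
        raise AttributeError(attr)


demo = ConfigDemo(tab_width=8)
print(demo.tab_width)

# Changing the dictionary, as a configuration form might, changes the attribute
demo.config['tab_width'] = 2
print(demo.tab_width)
```

Because the value is looked up in the dictionary on every access, a change to the configuration takes effect without recreating the object.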
All the magic happens in the 'convert()' method. It first escapes special characters as HTML entities. It then uses the regular expressions compiled in '__init__()' (so that each expression is compiled only once) to replace URLs and email addresses with clickable hyperlinks, converts leading whitespace to non-breaking spaces, and turns newlines into '<br />' tags. The method returns a data stream, as described by the 'idatastream' interface. In this case, the stream simply contains the converted text.
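To experiment with the order of these steps outside Plone, the following standalone sketch in current Python 3 may help. It is not the transform itself. The URL and email patterns are deliberately simplified, 'html.escape' is swapped in for the 'entitydefs' loop, and the function name 'intelligent_text_to_html' is an assumption made for this example.

```python
import re
from html import escape

# Simplified patterns, assumed for this sketch only
URL_RE = re.compile(r'((?:ftp|https?)://[^\s<>"\']+[^\s<>"\'.,;:?!])', re.I)
EMAIL_RE = re.compile(r'(\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b)', re.I)
INDENT_RE = re.compile(r'^([ \t]+)', re.M)


def intelligent_text_to_html(text, tab_width=4):
    # 1. Escape &, < and > first, so that tags inserted later stay intact
    text = escape(text, quote=False)

    # 2. Wrap URLs in <a> tags
    text = URL_RE.sub(
        lambda m: '<a href="%s">%s</a>' % (m.group(1), m.group(1)), text)

    # 3. Wrap email addresses in mailto: links
    text = EMAIL_RE.sub(
        lambda m: '<a href="mailto:%s">%s</a>' % (m.group(1), m.group(1)),
        text)

    # 4. Preserve leading indentation with non-breaking spaces
    def indent(match):
        spaces = match.group(1).replace(' ', '&nbsp;')
        return spaces.replace('\t', '&nbsp;' * tab_width)
    text = INDENT_RE.sub(indent, text)

    # 5. Finally, turn newlines into <br /> tags
    return text.replace('\n', '<br />')


print(intelligent_text_to_html("See http://example.com/page."))
print(intelligent_text_to_html("a < b\n\tmail me@example.org"))
```

The order matters. Escaping has to come first, otherwise the angle brackets of the inserted '<a>' and '<br />' tags would themselves be escaped.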
The 'html_to_web_intelligent_plain_text' transform follows the same structure, but is rather longer and more complicated.
To install the transforms, 'Extensions/Install.py' contains:

    from Products.CMFCore.utils import getToolByName
    from StringIO import StringIO
    from types import InstanceType

    ...

    def registerTransform(self, out, name, module):
        transforms = getToolByName(self, 'portal_transforms')
        transforms.manage_addTransform(name, module)
        print >> out, "Registered transform", name

    def unregisterTransform(self, out, name):
        transforms = getToolByName(self, 'portal_transforms')
        try:
            transforms.unregisterTransform(name)
            print >> out, "Removed transform", name
        except AttributeError:
            print >> out, "Could not remove transform", name, "(not found)"

    def install(self):
        out = StringIO()

        print >> out, "Installing text/x-web-intelligent mimetype and transform"

        ...

        # Register transforms
        registerTransform(self, out, 'web_intelligent_plain_text_to_html',
            'Products.intelligenttext.transforms.web_intelligent_plain_text_to_html')
        registerTransform(self, out, 'html_to_web_intelligent_plain_text',
            'Products.intelligenttext.transforms.html_to_web_intelligent_plain_text')

        return out.getvalue()

    def uninstall(self):
        out = StringIO()

        # Remove transforms
        unregisterTransform(self, out, 'web_intelligent_plain_text_to_html')
        unregisterTransform(self, out, 'html_to_web_intelligent_plain_text')

        ...

        return out.getvalue()

Finally, we need to test our transforms. The appropriate tests are found in 'intelligenttext/tests/test_transforms.py'. This file contains two simple Archetypes test cases that exercise the transforms with a variety of input strings. Take a look at it if you want to understand the transforms in more detail.
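The general shape of such a test can be sketched without Plone. The example below is not taken from 'test_transforms.py'. It uses plain 'unittest' in current Python 3 instead of the Archetypes test cases, a stand-in data stream that offers only 'setData()' and 'getData()', and a hypothetical one-step transform, so that the 'convert(orig, data)' calling convention can be exercised on its own.

```python
import unittest


class FakeDataStream:
    """Stand-in offering the two data stream methods used in this sketch."""

    def __init__(self):
        self._data = None

    def setData(self, value):
        self._data = value

    def getData(self):
        return self._data


class NewlineToBreak:
    """Hypothetical one-step transform, here only to have something to test."""

    def convert(self, orig, data, **kwargs):
        data.setData(orig.replace('\n', '<br />'))
        return data


class TestNewlineToBreak(unittest.TestCase):

    def _convert(self, text):
        # Call convert() the way the transform in this section expects
        return NewlineToBreak().convert(text, FakeDataStream()).getData()

    def test_newlines_become_breaks(self):
        self.assertEqual(self._convert('a\nb'), 'a<br />b')

    def test_text_without_newlines_is_unchanged(self):
        self.assertEqual(self._convert('plain'), 'plain')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNewlineToBreak)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test feeds one input string through 'convert()' and compares the contents of the returned stream with the expected markup, which is the same pattern the real test cases apply to a larger set of strings.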
