HTML Filtering options

Warning: This item is marked as outdated.

This How-to applies to: Plone 2.1.x
This How-to is intended for: Integrators, Customizers

A discussion of the main ways Plone filters HTML.

Plone filters HTML in several different places, each of which has its own pros and cons. Unfortunately this can cause confusion, as the different filters have different effects.

Kupu
Filters out unwanted tags and attributes, and turns HTML into XHTML. Configurable TTW. Runs only on the client, before the form is submitted, so it is not secure.
mxTidy
Runs the HTML through the HTML Tidy program. XML configuration file (not TTW). The default configuration can mess up non-ASCII characters and <pre> blocks. Runs on the server when an HTML field is updated.
Safe HTML
Removes dangerous tags and javascript from attributes. No configuration (although it can be disabled TTW). Runs on the server when an HTML field is rendered.

Kupu HTML Filter

Kupu's HTML filter runs on the client browser whenever HTML is saved. The client can bypass this filtering so it cannot be relied on for any form of security.

Kupu lets you blacklist attributes on particular tags, or all occurrences of a specific attribute or a tag.

By default the following tags are removed:

  • center
  • span
  • tt
  • big
  • small
  • u
  • s
  • strike
  • basefont
  • font

The following attributes are removed from any tag:

  • dir
  • lang
  • valign
  • halign
  • border
  • frame
  • rules
  • cellspacing
  • cellpadding
  • bgcolor

The attributes width, height are removed from the tags table, th, td (largely because IE keeps inserting them when you didn't want them, and pasting from Word always includes inappropriate width and height attributes).
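
The attribute rules described so far can be pictured as a small lookup. The Python below is only an illustrative sketch of the default rules as listed here, not Kupu's actual code (Kupu's filter is JavaScript running in the browser); the names STRIPPED_EVERYWHERE, STRIPPED_PER_TAG and filter_attributes are invented for this example.

```python
# Illustrative sketch only: Kupu's real filter is client-side JavaScript.
# All names here are hypothetical.

# Attributes removed from any tag (the defaults listed above).
STRIPPED_EVERYWHERE = {
    "dir", "lang", "valign", "halign", "border",
    "frame", "rules", "cellspacing", "cellpadding", "bgcolor",
}

# Attributes removed only from particular tags.
STRIPPED_PER_TAG = {
    "table": {"width", "height"},
    "th": {"width", "height"},
    "td": {"width", "height"},
}


def filter_attributes(tag, attrs):
    """Return a copy of attrs without the attributes blacklisted for tag."""
    banned = STRIPPED_EVERYWHERE | STRIPPED_PER_TAG.get(tag.lower(), set())
    return {name: value for name, value in attrs.items()
            if name.lower() not in banned}
```

With these defaults, filter_attributes("td", {"width": "120", "class": "odd"}) keeps only the class attribute, while the same width on an img tag is left alone.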

The style attribute is handled specially: most style attribute values are stripped, but text-align, list-style-type, and float are allowed to remain since Kupu can generate them under some circumstances.
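
As a rough illustration of this whitelist approach, the following Python sketch keeps only the three permitted properties from a style value. It is an assumption-laden stand-in, not Kupu's implementation: the names ALLOWED_STYLES and filter_style are hypothetical, and splitting on ";" is a simplification that ignores semicolons inside quoted strings or url(...) values.

```python
# Illustrative sketch only; names are hypothetical, not Kupu's API.
ALLOWED_STYLES = {"text-align", "list-style-type", "float"}


def filter_style(style):
    """Keep only whitelisted declarations from a style attribute value."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Skip empty fragments, malformed declarations and unlisted properties.
        if sep and value and name in ALLOWED_STYLES:
            kept.append(f"{name}: {value}")
    return "; ".join(kept)
```

For instance, a value such as "color: red; text-align: center" would come back as "text-align: center", and a value with no permitted properties comes back empty.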

There is also a blacklist for the class attribute: by default it is empty but there is a sample kupu configuration which adds in some class names used by Microsoft Word.

All of these are configurable either through the control panel, or by scripting or Python. A sample script is supplied which can be edited fairly easily to change any of Kupu's configurable options.

Kupu also scrubs any event attributes (on...), any tags or attributes which the HTML spec doesn't define, and any attributes which aren't permitted on a particular tag. This is not configurable (except by editing the filter code).

Configuring Kupu's filter

The simple way is just to use the control panel to change the options. A better way is to copy Kupu's sample-kupu-customisation-policy.py script from the kupu_plone skin folder, name the copy kupu-customisation-policy.py, and then edit it. This ensures that any customisations you make will be preserved even if you uninstall and reinstall Kupu (e.g. when upgrading).

Disabling Kupu's filter

Short of editing Kupu, there is no way to entirely disable the content filter.

mxTidy

By default Plone's content types run all HTML through the external HTML Tidy program (if it is available) whenever HTML is saved. HTML Tidy will pretty-print the HTML, but it does not have options to strip specific unwanted tags or attributes, so it cannot be used for the same kind of tidying as Kupu or the Safe HTML cleanup. Also, many combinations of mxTidy options can result in output HTML which does not display the same as the input HTML.

Configuring mxTidy

The default configuration is in the ATContentTypes folder ATContentTypes\etc\atcontenttypes.conf.in. You should not edit this directly: make a copy of the file renamed to atcontenttypes.conf and put it in your Zope instance's etc folder, then edit that copy.
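
The copy step can be scripted. The sketch below is a standalone helper for a current Python 3, run outside Zope; the function name install_local_config is invented here, and both paths are supplied by the caller because the real locations depend on your installation.

```python
import shutil
from pathlib import Path


def install_local_config(template, instance_etc):
    """Copy atcontenttypes.conf.in to <instance_etc>/atcontenttypes.conf.

    Refuses to overwrite an existing local copy, so earlier edits are kept.
    """
    target = Path(instance_etc) / "atcontenttypes.conf"
    if target.exists():
        raise FileExistsError(f"{target} already exists; edit it instead")
    shutil.copyfile(template, target)
    return target
```

You would call it with the path to the shipped atcontenttypes.conf.in and the path to your instance's etc folder, then edit the returned file.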

The default options are:

<mxtidy>
    enable           yes
    drop_font_tags   yes
    drop_empty_paras yes
    input_xml        no
    output_xhtml     yes
    quiet            yes
    show_warnings    yes
    indent_spaces    yes
    word_2000        yes
    wrap             72
    tab_size         4
    char_encoding    raw
</mxtidy>

Problems with this configuration include:

No character encoding has been specified, so accented characters will be corrupted: even numeric character entities are reduced modulo 256. Solution: Set char_encoding to utf8.

HTML Tidy will add a newline following <br> tags inside <pre> sections. This double-spaces the pre section. I'm not sure whether there is any combination of options which avoids this issue.

Kupu does not generate a summary attribute for tables (although it should), so HTML Tidy will generate a warning message every time you save a document containing a table. You can suppress this by setting show_warnings to no.
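
If you prefer to script these changes rather than edit your copy of the file by hand, something along the following lines may do. This is a minimal sketch that assumes the section looks like the default block shown above (one option per line between <mxtidy> and </mxtidy>); the helper name set_mxtidy_option is hypothetical and the code does not use any Plone or ZConfig API.

```python
def set_mxtidy_option(conf_text, option, value):
    """Return conf_text with `option` in the <mxtidy> section set to `value`.

    Raises KeyError if the option does not appear in that section.
    """
    lines = conf_text.splitlines()
    inside = False
    changed = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == "<mxtidy>":
            inside = True
        elif stripped == "</mxtidy>":
            inside = False
        elif inside and stripped.split()[:1] == [option]:
            # Preserve the original indentation and column alignment.
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}{option:<16} {value}"
            changed = True
    if not changed:
        raise KeyError(option)
    return "\n".join(lines) + "\n"
```

Applying it twice to the text of your atcontenttypes.conf, once with ("char_encoding", "utf8") and once with ("show_warnings", "no"), makes the two changes suggested above; the same helper works for any other option in the section.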

Disabling mxTidy

One option is simply to ensure that HTML Tidy is not accessible to Plone. If you don't install HTML Tidy, Plone will simply not try to use it.

A less drastic solution is to copy the configuration file as described above, and change the enable line to:

enable           no

Safe HTML

The safe html transform is applied to the main body of documents when they are rendered by Plone. The intention of this filter is simply to ensure that certain security holes are closed. The most obvious such hole is that if a (non-Manager) user can create a web page containing a script tag and get a site administrator to view that page the script can then perform actions on the site which require administrator access.

Before version 1.3.9 of PortalTransforms there is no configuration for Safe HTML, either TTW or through a configuration file (it is of course possible to edit the source, or to inject patches from another product). It is also possible, but extremely inadvisable, to disable it. If you need to customise the transform then upgrade to version 1.3.9-rc1 or later. This is [or at the time of writing 'will be'] available in Plone 2.1.2, or check it out from Subversion at https://svn.plone.org/svn/archetypes/PortalTransforms/branches/archetypes-1_3-branch

Plone 2.1.2 will include PortalTransforms 1.3.9 which allows customising as follows:

  • You can disable the entire transform.
  • You can edit the lists of nasty tags, valid tags, and whether or not javascript is to be stripped.

The following tags are permitted by Safe HTML:

a b base big blockquote body br caption cite code dd del div dl dt
em h1 h2 h3 h4 h5 h6 head hr html i img ins kbd li meta ol p pre q
small span strong sub sup table tbody td th title tr tt u ul

Version 1.3.9 also permits:

area map

These tags and anything they contain are removed:

script object embed applet

Any other tags are removed, although their content will remain. This means that the following tags, although legal HTML, will be removed by Safe HTML:

abbr acronym address area basefont bdo button center dfn dir fieldset font
form iframe input isindex label map menu noframes noscript s samp
select strike textarea var

Attributes whose values start with 'javascript:' are also removed.
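
To make the three categories of tag concrete, here is a small Python sketch built on the standard library's html.parser. It only mimics the rules as described above (using the pre-1.3.9 list of permitted tags) and is not the PortalTransforms code; the class and function names are invented, and it handles only attribute values that begin with 'javascript:'.

```python
from html import escape
from html.parser import HTMLParser

VALID_TAGS = set(
    "a b base big blockquote body br caption cite code dd del div dl dt "
    "em h1 h2 h3 h4 h5 h6 head hr html i img ins kbd li meta ol p pre q "
    "small span strong sub sup table tbody td th title tr tt u ul".split()
)
# Removed together with everything they contain. 'embed' is also on the
# nasty list, but it is a void element with no content to track, so this
# sketch simply drops it like any other tag that is not in VALID_TAGS.
NASTY_CONTAINERS = {"script", "object", "applet"}
VOID_TAGS = {"base", "br", "hr", "img", "meta"}


class SafeHTMLSketch(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.nasty_depth = 0  # > 0 while inside a nasty container

    def handle_starttag(self, tag, attrs):
        if tag in NASTY_CONTAINERS:
            self.nasty_depth += 1
        elif self.nasty_depth == 0 and tag in VALID_TAGS:
            kept = []
            for name, value in attrs:
                value = value or ""
                if value.strip().lower().startswith("javascript:"):
                    continue  # drop the whole attribute
                kept.append(f' {name}="{escape(value)}"')
            self.out.append(f"<{tag}{''.join(kept)}>")
        # Any other tag is dropped, but its text content is still emitted.

    def handle_endtag(self, tag):
        if tag in NASTY_CONTAINERS:
            self.nasty_depth = max(0, self.nasty_depth - 1)
        elif (self.nasty_depth == 0 and tag in VALID_TAGS
              and tag not in VOID_TAGS):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.nasty_depth == 0:
            self.out.append(escape(data, quote=False))


def scrub(html_text):
    """Return html_text filtered by the rules sketched above."""
    parser = SafeHTMLSketch()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Feeding it a paragraph that contains a font tag and a script tag should show the difference: the font tag disappears but its text stays, while the script tag vanishes along with its content.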

Disabling Safe HTML

This section has been removed. Versions prior to 1.3.9 could not reliably delete a transform so the instructions which were given here wouldn't work reliably (if at all). Upgrade and disable it through the configuration screen (.../portal_transforms/safe_html/manage_main) instead.

Do this only if you trust everyone who can create content for your site, as you are leaving it wide open for scripting attacks.

Alternatives to disabling Safe HTML

Another solution is to decide what JavaScript you want to permit, and attach it to tags based on the HTML. For example, Plone applies clickable column headings to any table with the class 'listing' when the page loads. Adding additional behaviour this way allows end users to use JavaScript in a controlled manner.

by Duncan Booth last modified June 18, 2006 - 21:43
Contributors: Duncan Booth
All content is copyright Plone Foundation and the individual contributors.

CMF filters tags also

Posted by Chris Calloway at February 6, 2006 - 16:22

I was asked in comments to add a link to this How-to from my FAQ which explains tag filtering in the CMF:

http://plone.org/documentation/faq/tags-filtered

Maybe a link from this How-to back to my FAQ would complete the loop?

Event Attributes

Posted by Mike Takahashi at June 16, 2006 - 18:06

Found this while searching on how to enable event attributes. If you wish to enable event attributes such as "onclick" as stated above, here is how you do it.

From Duncan Booth's original message:

The reason [event attributes] are filtered out is simply that allowing end users to set events in the html they create is a big security threat, so in general blocking all events makes sense.

If you are happy to customise your copy of kupucontentfilters.js, then around line 280 you should find the event attributes:

    // All event attributes are here but commented out so we don't
    // have to remove them later.
    this.events = []; // 'onclick|ondblclick|onmousedown|onmouseup|onmouseover|onmousemove|onmouseout|onkeypress|onkeydown|onkeyup'.split('|');
    this.focusevents = []; // ['onfocus','onblur']
    this.loadevents = []; // ['onload', 'onunload']
    this.formevents = []; // ['onsubmit','onreset']
    this.inputevents = [] ; // ['onselect', 'onchange']

For these lines, simply deleting "[] ; // " should be sufficient to add the events back in to the attribute tables. You should then be able to control the events in the same way as other attributes.

reload the transforms

Posted by johannes raggam at March 21, 2007 - 13:37
After editing portal transforms, don't forget to reload the transforms!

/portal_transforms/manage_reloadAllTransforms

PHP-based Filter

Posted by John Tully at April 29, 2008 - 05:00
In case this helps, I've integrated the htmlawed PHP filter thru a system call.

bioinformatics.org/phplabware/internal_utilities/htmLawed

The htmlawed filter is almost perfect for me.
