XML in Plone with Marshall

Marshall lets you save and load Plone content using XML. As a configurable system, it has lots of options. This hands-on how-to shows exactly what to do to make the basics work.

Introduction

Background information on the tutorial.

As a CMS, Plone needs to fit in with other information systems. Increasingly, the XML stack is the preferred way to exchange semi-structured content between systems. Customers also view XML as a future-proof storage format.

Can Plone give XML representations of content types defined in Archetypes? This how-to gives a hands-on treatment of Marshall, a Collective add-on which provides XML saving and loading of your content.

As background, I'm really, really dense. This how-to is written for someone like me, who wants to be told exactly what to do to achieve some initial result.

Outcomes

In this first draft of the how-to, the goal is to get an XML representation of a Page in the fewest possible steps. We'll also show how to create a new Page from a file on disk.

Later installments will show a more flexible configuration, where you can define the kind of thing to be added via an XML element in your file. If Sidnei has enough patience with my questions, I'll add more to this how-to.

Setup

Prerequisites and configuration for software.

Plone doesn't yet support XML saving and loading as part of its default setup.  We need to add and configure some software.

Prerequisites

Marshall has been around a while, so in a sense, it should work with semi-recent versions of software. However, to get the experience described herein, you should use this:

  • Zope 2.8.4
  • Plone 2.1.2
  • Marshall 0.6 (link here)

You also need libxml2. This is an industrial-strength XML parser with a good Python binding. If you are on Linux or OS X, you already have libxml2, though you might not have the Python binding in the Python you are using for your Zope.

How to find out? Run the Python for your Zope and do:

import libxml2

If that works, you're golden. If not, you have some compiling to do.
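If you would rather script that check, for example to test several interpreters in a row, something like the following should do. This is only a sketch: the helper name has_module is made up for this how-to, and the code sticks to constructs that work on both old and new Pythons.

```python
import sys


def has_module(name):
    """Return True if this interpreter can import the named module."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Run this with the same Python binary that starts your Zope.
    if has_module("libxml2"):
        sys.stdout.write("libxml2 binding found\n")
    else:
        sys.stdout.write("libxml2 binding missing, time to compile\n")
```

Save it under any name you like and run it with the Python that starts your Zope, not whichever python happens to be first on your path.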

Finally, this walkthrough shows the use of DAV clients. The steps show both cadaver and oXygen. You can use one of these, both of these, or neither of these, as they are only for illustration.

Setup

Once your software is in place, the next step is to configure Zope to provide a WebDAV port. For example, if your ZMI port is 8080, you might want to connect to Plone on port 8880 for WebDAV. In your Zope instance, open etc/zope.conf and uncomment the webdav-source-server section:

<webdav-source-server>
# valid keys are "address" and "force-connection-close"
address 8880
force-connection-close off
</webdav-source-server>
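Once Zope has been restarted with this section enabled, you can confirm that something is actually listening on the new port. The sketch below assumes a recent Python 3 on the machine you test from; the function name port_is_open is invented for this how-to, and 8880 is just the example port used here.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 8880 is the example WebDAV port used in this how-to.
    if port_is_open("localhost", 8880):
        print("WebDAV port is accepting connections")
    else:
        print("Nothing on 8880; check zope.conf and restart Zope")
```

An open port only tells you that a server is listening, not that it speaks WebDAV, so treat this as a first sanity check.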

Make sure you restart your Zope. Next, log into Plone as a Manager. Click on plone setup (top right corner) and install the Marshall product.


Much of the remaining work is in the ZMI. (Yeh, we should provide a configlet for this. If someone teaches me how to make a configlet, I'll do it and maintain it.) Thus, in Plone Setup, click on the link for Zope Management Interface.

Plone Setup

Marshaller Registry

Archetypes by default uses its own "marshaller" for exporting content. This step points it at Marshall's ATXML exporter.

The bits are now installed but not configured. We need a way to tell Archetypes to use this XML marshaller when exporting a Page's content. Specifically, we want to add a "predicate" to use the ATXML marshaller.

  1. In ZMI, at the portal root, click on marshaller_registry.

      Marshaller Registry Tool

  2. Click on the Add Marshaller Predicate button.
  3. Fill in the fields and click Add:
    • Leave Predicate Type as Default Predicate
    • Choose an Id, such as myatxml
    • Choose a Title, such as My ATXML Predicate
    • Set the Component Name to ATXML Marshaller
    • Leave the Condition blank.
    • Click the Add button.

      Add Predicate


Getting and Editing XML

Now that the ATXML marshaller is configured, let's work with it.

Good news: you have already reached a point of success!  Visit Plone's front-page in a browser and add /manage_FTPget to the end of the URL, as shown in the URL bar below:

Viewing XML from Marshall
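The same request can be made from a script instead of a browser. The following is a sketch under a few assumptions: a Python 3 interpreter outside Zope, a site reachable at http://localhost:8080/atxml, and HTTP basic authentication with a Manager login. The function names ftpget_url and fetch_atxml are invented for this how-to.

```python
import base64
import urllib.request


def ftpget_url(base_url, path):
    """Build the manage_FTPget URL for an item below the site root."""
    return base_url.rstrip("/") + "/" + path.strip("/") + "/manage_FTPget"


def fetch_atxml(base_url, path, user, password):
    """Return the marshalled representation of one item as text."""
    request = urllib.request.Request(ftpget_url(base_url, path))
    # Assumption: the site accepts HTTP basic authentication.
    token = base64.b64encode((user + ":" + password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call (needs a running site and real credentials):
# print(fetch_atxml("http://localhost:8080/atxml", "front-page",
#                   "admin", "secret"))
```

The login shown in the example call is a placeholder; substitute your own.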


Ahh, look at all that XML goodness, ain't she beautiful?  But can we edit an existing entry?  Let's use the cadaver command-line DAV tool and find out:

$ cadaver http://localhost:8880/atxml/
dav:/atxml/> edit front-page

Note the use of the WebDAV port number (8880) rather than the ZMI port.  Depending on your editor settings, the second command will give you an editor such as vi.  Change the value of the <dc:title> element, then save and exit the editor.  cadaver will send the changes to Plone and unlock the resource.

Re-open the Plone front page in a web browser.  You should see your new title.  Cool, eh?

Some notes on this:

  1. Eagle-eyed observers will note that the body field was, errm, how shall we say this... encoded. In Marshall 0.6, if a field's contents are HTML, the marshaller puts the content in a CDATA section.  Why?  Because we can't promise that the HTML is well-formed XML.  Future versions of Marshall might revisit this policy.  Note that image and file field contents are currently not serialized.
  2. cadaver has a helpful readline mode, if your compilation supported it.
  3. This XML format is in flux!  Sidnei and I are discussing how to get more "meaning" out of Plone, in a way that fits the median of expectations.
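The title change made by hand in the editor can also be scripted. Here is a minimal sketch using Python 3's standard ElementTree; the function name set_title is invented for this how-to, and the namespace URIs are the ones that appear in the ATXML export. One caveat: ElementTree does not preserve CDATA sections, so the body comes back as ordinary escaped text, which is equivalent XML but looks different.

```python
import xml.etree.ElementTree as ET

AT_NS = "http://plone.org/ns/archetypes/"
CMF_NS = "http://cmf.zope.org/namespaces/default/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep readable prefixes on output instead of ns0, ns1, ...
ET.register_namespace("", AT_NS)
ET.register_namespace("cmf", CMF_NS)
ET.register_namespace("dc", DC_NS)


def set_title(atxml_text, new_title):
    """Return the document with the text of dc:title replaced."""
    root = ET.fromstring(atxml_text)
    title = root.find("{%s}title" % DC_NS)
    if title is None:
        raise ValueError("no dc:title element found")
    title.text = new_title
    return ET.tostring(root, encoding="unicode")
```

The returned text is what you would then send back over WebDAV, which is the step cadaver performs for you when you leave the editor.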

I like using the oXygen editor, as noted above.  With oXygen I can open the Plone site, browse to front-page, and edit the content in a real XML editor.  I can even download the Relax NG schema for Marshall and validate before saving.

In this next screenshot, I show browsing the contents of the Plone site using oXygen's WebDAV browser:

oXygen Browsing

I double-click to open front-page and tell oXygen that this is an XML document (lack of file extension means it couldn't guess).  In the next screenshot, I have the front-page, exported as ATXML, open for editing.  I have changed the dc:title and I have also associated the Relax NG schema with this page, thus giving me the inspector on the right:

oXygen Changed


Finally, I show the schema validation in action.  I mistakenly change the id attribute on a field to be xid, which is not allowed in the schema.  Note the red underline, the completely accurate warning message in the status bar, and the appearance of the right-hand inspector pane:

oXygen Validator

Creating New Entries with CTR

Loading a new XML file should create the correct content type. The Content Type Registry helps us.

As shown, editing existing entries was straightforward.  Creating new content in Plone based on external XML files is more problematic.  Namely, what content type should we use for the new resource?

As this isn't a one-size-fits-all situation, Marshall approaches this with configurability in mind.  For mortals like me, choice means confusion.  So this first example shows the simplest possible way to make it work, albeit in a clumsy-to-use fashion.

For this example, we will set a policy that any XML file ending in .atxmlpage will be used to create a Page resource.  The id of that resource will come from the filename; as the result later shows, the extension currently stays part of the id.

The CMF's Content Type Registry is responsible for policies related to file extensions.  This tool can be reached via the ZMI in the portal root under the name "content_type_registry".

  1. Click on the content_type_registry tool in the ZMI.
  2. Scroll to the bottom.
  3. Add a predicate with a name such as atxmlpage, using Extension as the predicate type in the drop down, and click Add.
    Add CTR Predicate
  4. After saving, change the settings. Set the extensions value to atxmlpage and the content type in the drop-down to Page, then click Change.
    CTR Predicate Editing
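For the dense among us, the rule just configured boils down to a lookup from file extension to content type. The sketch below is purely an illustration in plain Python, not the CMF's actual implementation; the names EXTENSION_RULES and type_for_filename are invented, and Document is the type name that Plone shows as Page.

```python
import os

# One rule, mirroring the predicate configured above.
EXTENSION_RULES = {"atxmlpage": "Document"}


def type_for_filename(filename, rules=EXTENSION_RULES):
    """Return the content type for a filename, or None if no rule matches."""
    extension = os.path.splitext(filename)[1].lstrip(".")
    return rules.get(extension)


if __name__ == "__main__":
    print(type_for_filename("mynewxmlpage.atxmlpage"))
    print(type_for_filename("notes.txt"))
```

The real registry tries its predicates in order and supports other predicate types as well; only the extension case is sketched here.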

The content type registry is now set up.  On your local disk, create a file somewhere named mynewxmlpage.atxmlpage and give it contents as shown below:

<?xml version="1.0" ?>
<metadata xmlns="http://plone.org/ns/archetypes/"
xmlns:cmf="http://cmf.zope.org/namespaces/default/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmp="adobe:ns:meta">
<cmf:type>
Document
</cmf:type>
<dc:title>
My first page from XML
</dc:title>
<dc:description>
Congratulations! You have successfully installed Plone.
</dc:description>
<field id="text">
<![CDATA[
<h2>Initial Content</h2>
<p>This content came from an XML file on disk.</p>
]]>
</field>
</metadata>
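A typo in a hand-written XML file is easy to make, so it can be worth confirming that the file is well-formed before going further. The following sketch uses Python 3's standard ElementTree; the function name summarize is invented for this how-to, and the namespace URIs are copied from the file above.

```python
import xml.etree.ElementTree as ET

AT_NS = "http://plone.org/ns/archetypes/"
CMF_NS = "http://cmf.zope.org/namespaces/default/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def summarize(atxml_text):
    """Parse an ATXML document and report what it declares.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML.
    """
    root = ET.fromstring(atxml_text)
    return {
        "type": (root.findtext("{%s}type" % CMF_NS) or "").strip(),
        "title": (root.findtext("{%s}title" % DC_NS) or "").strip(),
        "fields": [f.get("id") for f in root.findall("{%s}field" % AT_NS)],
    }


# Example call against the file created above:
# with open("mynewxmlpage.atxmlpage", encoding="utf-8") as handle:
#     print(summarize(handle.read()))
```

This only checks well-formedness and pulls out a few values. It does not validate against the Marshall schema.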

Save this file, then return to cadaver.  In the top folder of your Plone site, use cadaver to add the file:

put mynewxmlpage.atxmlpage

cadaver uploads the file to Plone.  The CTR sees the extension and knows to create a Page (Document) using the ATXML marshaller, which reads the XML file for all the initial settings.
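The same upload can be done without cadaver, since a WebDAV put is just an HTTP PUT request. The sketch below assumes Python 3, the WebDAV port 8880 from the setup section, HTTP basic authentication, and a text/xml content type; the function names and the credentials in the example call are placeholders invented for this how-to.

```python
import base64
import urllib.request


def build_put_request(dav_url, filename, body, user, password):
    """Prepare (but do not send) a PUT request for one file."""
    url = dav_url.rstrip("/") + "/" + filename
    request = urllib.request.Request(url, data=body, method="PUT")
    # Assumption: the server accepts HTTP basic authentication.
    token = base64.b64encode((user + ":" + password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    # Assumption: text/xml is an acceptable content type for ATXML.
    request.add_header("Content-Type", "text/xml")
    return request


def upload(dav_url, local_path, filename, user, password):
    """Send the file and return the HTTP status code."""
    with open(local_path, "rb") as handle:
        body = handle.read()
    request = build_put_request(dav_url, filename, body, user, password)
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (needs a running site and real credentials):
# upload("http://localhost:8880/atxml", "mynewxmlpage.atxmlpage",
#        "mynewxmlpage.atxmlpage", "admin", "secret")
```

The filename in the URL is what the Content Type Registry sees, so the .atxmlpage extension has to be kept there.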

In Plone you can now go to the URL http://localhost:8080/atxml/mynewxmlpage.atxmlpage and see your new page.

Sure, there are a bunch of caveats to note:

  1. Be very careful to ensure you don't have a field with id="id" in your upload, nor a uid entry.
  2. It would be nice if the .atxmlpage extension disappeared.
  3. If you want to see all the settings, such as dc:subject and workflow state, that you can serialize to XML, go change some things on an existing Page and open it via cadaver.  There's lots there!  Sidnei's XML format was meant to capture lots of semantics.
  4. In a potential future addition to this how-to, I'll cover how to use cmf:type in the XML to create different types without the use of crazy file extensions.
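The first caveat is easy to check mechanically before uploading. The sketch below uses Python 3's standard ElementTree, and the function name risky_entries is invented for this how-to. It also rests on an assumption: that a uid entry shows up as an element whose local name is uid. Compare against a real export from your site before relying on it.

```python
import xml.etree.ElementTree as ET


def risky_entries(atxml_text):
    """List entries that should not be present in a file used for upload."""
    root = ET.fromstring(atxml_text)
    problems = []
    for element in root.iter():
        # Drop the "{namespace}" part that ElementTree puts in front of tags.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "field" and element.get("id") == "id":
            problems.append('field with id="id"')
        elif local_name == "uid":
            # Assumption: uid entries are elements named "uid".
            problems.append("uid entry")
    return problems
```

An empty list means neither of the two troublemakers was found. It says nothing about whether the rest of the file is sensible.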


Conclusion

What we did and what more we could do.

Plone is a CMS, and a CMS should have good facilities for getting stuff in and out.  Plone is especially neat, in that Archetypes lets you define new kinds of semi-structured content types.  Those types should also provide a nice way to talk to the outside world.

Marshall is one approach to doing this.  Hopefully this how-to provided enough information to get started.