XMLForest

Export and import a hierarchy of Archetypes-based content using IMS Content Packages and Marshall's XML marshallers.

Current release

No stable release available yet.

If you are interested in getting the source code of this project, you can get it from the code repository.

Project Description

0. Introduction

XMLForest is a tool for importing and exporting a bunch of Archetypes based content.

1. Installation

XMLForest should live in the Products folder of your Zope instance; simply put it there and it should work.

2. Dependencies

I have developed XMLForest on my box with the following components installed.

  • Archetypes version >=1.3.7
  • Plone version 2.1.2
  • CompoundField version 1.0-beta build 241 (only for (doc-)tests)
  • Marshall (version 1.1, better gogo-jensens-merge-task branch)
  • Relations version 0.5 (UNRELEASED)
  • Zope 2.8.6 or 2.9.1

3. Testing

To run the included doctest, execute "./bin/zopectl test Products/XMLForest" in your Zope root directory. The doctest is a good source of information on the implementation of XMLForest.

4. What does it do?

XMLForest is a portal tool you can use to export or import Archetypes objects in XML format. The doctest shows you a quick example. In the doctest we set up a few objects, then we export them as XML and re-import them to check some of their attributes as well as a contentish relation. You can see how contentish and non-contentish references are exported and imported. You can also see how 'ordinary' objects are exported.

5. What about IMS support?

IMS is an XML standard for learning technologies. We thought it would be rather challenging and extremely enriching to support a broad standard like IMS. The core documentation on the parts of IMS that are important for XMLForest is found here: http://www.imsglobal.org/content/packaging/cpv1p1p3/imscp_bindv1p1p3.html As you can see, IMS specifies a so-called 'manifest' tag. This is the root tag of the manifests we use for import. The manifest explains what data is to be imported and reflects the hierarchical and relational structure of the objects. To accomplish import and export the manifest has two major parts: organizations and resources.

Organizations define the structure of the objects. There are three types of organizations in XMLForest: hierarchical, relational and 'Hephaistos'. The organizations describe only the structure of the objects, not their content. The content is defined in the resource tags.

'Hierarchical Organizations' reflect the containment of objects. One object can contain another, e.g. a folder might contain a file. This containment is reflected in XML as a tag that contains another tag. Here is an example of a hierarchical organization that contains some items:

<organization identifier="HIERARCHICAL" structure="hierarchical">
    <item identifier="ForestTestFolder" identifierref="1">
        <item identifier="item1" identifierref="2">
            <item identifier="subitem1" identifierref="3">
                <item identifier="subitem" identifierref="4" />
            </item>
        </item>
        <item identifier="item2" identifierref="5" />
    </item>
</organization>

When you look at the 'item' tags you see that they are nested. The item with the identifier 'ForestTestFolder' contains an item with the identifier 'item1'. When you look closer you see that 'ForestTestFolder' also contains another item with the identifier 'item2'. The item 'subitem' shows how an item can be nested several levels deep: it is within the 'subitem1' tag, and 'subitem1' is within 'item1'. This shows how deeper levels of containment are translated into an XML structure.
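To make the nesting concrete, here is a minimal Python sketch that walks the hierarchical organization shown above and prints the containment path of every item. It uses only the standard library and is not the code XMLForest itself runs; the function name walk_items is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The hierarchical organization from the example above.
ORGANIZATION = """
<organization identifier="HIERARCHICAL" structure="hierarchical">
    <item identifier="ForestTestFolder" identifierref="1">
        <item identifier="item1" identifierref="2">
            <item identifier="subitem1" identifierref="3">
                <item identifier="subitem" identifierref="4" />
            </item>
        </item>
        <item identifier="item2" identifierref="5" />
    </item>
</organization>
"""


def walk_items(element, parents=()):
    """Yield (containment path, identifierref) for every nested <item>.

    Hypothetical helper for illustration; items are visited depth first.
    """
    for item in element.findall("item"):
        path = parents + (item.get("identifier"),)
        yield "/".join(path), item.get("identifierref")
        for entry in walk_items(item, path):
            yield entry


paths = list(walk_items(ET.fromstring(ORGANIZATION)))
for path, ref in paths:
    print(path, "-> resource", ref)
```

Each printed path names the containers an item lives in, and the identifierref is the pointer into the resources part of the manifest.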

The next type of organization I would like to explain is the 'Relational Organization'. The items contained in a hierarchical tree can carry references to other items. Such references can be called 'relations'. Relations have a certain type, e.g. one relation could be named 'Father_of' while another is called 'Mother_of'. One object could have two relations, one to its father object and one to its mother object. Those references can also carry information themselves; these are the so-called 'container objects'. These container objects are like any other Archetypes object, but they 'belong' to a certain reference and are stored next to it automatically. Relations are a way to introduce very complex structures into data, raising the dimensionality and density of information. Therefore we support them in XMLForest:

<organization identifier="RELATIONS" structure="relations" xmlns="http://zuccaro.biblhetz.it/xsd/xmlforest.xsd">

   <item PortalType="foresttestitems_other"
         identifier="83aa4d62f296c28ea2cf2c6bf982ca44">
     <metadata>
       <rnode sourceidentifierref="1"
              targetidentifierref="2"
              type="foresttestitems_other" />
     </metadata>
   </item>

   <item PortalType="ForestContentReference"
         identifier="261a8cde16d4cb9211806bd1458e623e"
         identifierref="4">
     <metadata>
       <rnode
         contentidentifierref="4"
         sourceidentifierref="5"
         targetidentifierref="6"
         type="ForestContentReference" />
     </metadata>
   </item>

</organization>

The first item in this example for relational organizations is a simple non-contentish relation. It points out that there is a 'foresttestitems_other' relation from the item with the identifier '1' to another item with the identifier '2'. The reference itself has the identifier '83aa4d62f296c28ea2cf2c6bf982ca44'. That odd identifier is just an automatically generated identifier, since references are always stored automatically by the underlying references/relations system.

The second item is a 'contentish' relation. It has a source object and a target object (just like the non-contentish reference before), but it also has a 'content' object and a pointer to its resource.
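As an illustration of how the two kinds of relation differ in the XML, the following sketch reads the relational organization with Python's standard library. It is not taken from XMLForest; the function name read_relations and the dictionary keys are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <organization> tag in the example above.
NAMESPACES = {"f": "http://zuccaro.biblhetz.it/xsd/xmlforest.xsd"}

RELATIONS = """
<organization identifier="RELATIONS" structure="relations"
              xmlns="http://zuccaro.biblhetz.it/xsd/xmlforest.xsd">
   <item PortalType="foresttestitems_other"
         identifier="83aa4d62f296c28ea2cf2c6bf982ca44">
     <metadata>
       <rnode sourceidentifierref="1" targetidentifierref="2"
              type="foresttestitems_other" />
     </metadata>
   </item>
   <item PortalType="ForestContentReference"
         identifier="261a8cde16d4cb9211806bd1458e623e"
         identifierref="4">
     <metadata>
       <rnode contentidentifierref="4" sourceidentifierref="5"
              targetidentifierref="6" type="ForestContentReference" />
     </metadata>
   </item>
</organization>
"""


def read_relations(xml_text):
    """Return one dict per relation item (hypothetical helper)."""
    relations = []
    root = ET.fromstring(xml_text)
    for item in root.findall("f:item", NAMESPACES):
        rnode = item.find("f:metadata/f:rnode", NAMESPACES)
        relations.append({
            "identifier": item.get("identifier"),
            "type": rnode.get("type"),
            "source": rnode.get("sourceidentifierref"),
            "target": rnode.get("targetidentifierref"),
            # Only contentish relations carry a content object;
            # for plain references this attribute is absent (None).
            "content": rnode.get("contentidentifierref"),
        })
    return relations


relations = read_relations(RELATIONS)
for relation in relations:
    kind = "contentish" if relation["content"] else "non-contentish"
    print(relation["type"], relation["source"], "->", relation["target"], kind)
```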

There is also a 'hephaistos' organization implemented. This organization is sort of a 'don't care' organization that just delegates object creation to a mechanism called hephaistos that would do 'the right thing' and knows where all the objects should be stored.

So the first part of the manifest file is explained now, leaving the other half in obscure foggy distance. No, wait! I will explain it right now.

The second part of the manifest file keeps the so-called 'resources'. A resource is a definition of the content of an object. As we have seen before, the structural data always had references to resources in the tags. On object creation those references are resolved, and the content of objects is written into the created objects as well. Let us put that mechanism on a glass plate and have a closer look.

Whenever an object is created it is filled with the content that was stored for that object on export. Those resources can be extra files, or they can be a part of the manifest file that holds the so-called 'metadata'. You can copy/paste the metadata from any exported resource XML file into the manifest, and XMLForest should then take the pasted metadata instead of following a 'file' reference. Here is an example:

<resources>
   <resource UID="4260799500487c24c9682da3dde073c8"
             identifier="1"
             type="file">
       <file href="4260799500487c24c9682da3dde073c8.xml" />
   </resource>
   <resource UID="079275ea242e7b1ec9db18bfc868aeaf"
             identifier="2"
             type="file">
       <file href="079275ea242e7b1ec9db18bfc868aeaf.xml" />
   </resource>
   ....
</resources>

This resource definition just uses files. It holds references to files on the file system that contain all the content information or 'metadata' that is needed to restore the object. When an object is found in the organization definitions and created, a lookup takes place, matching the resource reference from the structural definitions with an identifier in the resources part of the manifest. XMLForest then knows what to restore into the object after it has been created. The other possibility is to put the metadata part into the resource tag like this:

<resource identifier="RESOURCE_ID_01" type="inline">
   <metadata xmlns="http://plone.org/ns/archetypes/"
       xmlns:cmf="http://cmf.zope.org/namespaces/default/"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:xmp="adobe:ns:meta">

       <dc:creator> portal_owner </dc:creator>
       <xmp:CreateDate> 2005-11-25T11:29:08Z </xmp:CreateDate>
       <xmp:ModifyDate> 2005-11-25T11:29:08Z </xmp:ModifyDate>
       <field id="pattr"> {'y': (1,), 'x': (2,)} </field>
       <field id="allowDiscussion"> 0 </field>
       <field id="id"> item1 </field>
       <field id="tattr">blabla blablabla</field>
       <field id="xpattr"> {'y': (3,), 'x': (4,)} </field>
       <uid> 079275ea242e7b1ec9db18bfc868aeaf </uid>
       <cmf:type> ForestTestItem </cmf:type>
       <cmf:workflow_history>
       <cmf:workflow cmf:id="plone_workflow">
       <cmf:history>
       <cmf:var cmf:id="action" cmf:type="None"
                cmf:value="None"/>
       <cmf:var cmf:id="review_state" cmf:type="str"
                cmf:value="visible"/>
       <cmf:var cmf:id="actor" cmf:type="str"
                cmf:value="portal_owner"/>
       <cmf:var cmf:id="comments" cmf:type="str" cmf:value=""/>
       <cmf:var cmf:id="time" cmf:type="date"
                cmf:value="2005-11-25 12:29:08"/>
             </cmf:history>
       </cmf:workflow>
       </cmf:workflow_history>
       <cmf:security>
       <cmf:local_role cmf:role="Owner" cmf:user_id="portal_owner"/>
       </cmf:security>
   </metadata>
</resource>

There is no 'file' tag in the resource definition; it simply contains the metadata information as a tag. This format is also valid for XMLForest.
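The lookup from an item's identifierref to its resource, for both the file variant and the inline variant, can be sketched roughly like this. This is a simplified illustration using a shortened manifest fragment, not the lookup code of XMLForest; index_resources and resolve are hypothetical names.

```python
import xml.etree.ElementTree as ET

AT_NS = "http://plone.org/ns/archetypes/"

# Shortened fragment: one file resource and one inline resource.
RESOURCES = """
<resources>
   <resource UID="4260799500487c24c9682da3dde073c8"
             identifier="1" type="file">
       <file href="4260799500487c24c9682da3dde073c8.xml" />
   </resource>
   <resource identifier="RESOURCE_ID_01" type="inline">
       <metadata xmlns="http://plone.org/ns/archetypes/">
           <field id="id"> item1 </field>
       </metadata>
   </resource>
</resources>
"""


def index_resources(xml_text):
    """Map every resource identifier to its <resource> element."""
    root = ET.fromstring(xml_text)
    return dict((res.get("identifier"), res)
                for res in root.findall("resource"))


def resolve(index, identifierref):
    """Return ("file", href) or ("inline", <metadata> element)."""
    resource = index[identifierref]
    file_tag = resource.find("file")
    if file_tag is not None:
        return "file", file_tag.get("href")
    return "inline", resource.find("{%s}metadata" % AT_NS)


index = index_resources(RESOURCES)
file_result = resolve(index, "1")
inline_kind, inline_metadata = resolve(index, "RESOURCE_ID_01")
print(file_result)
print(inline_kind, inline_metadata.find("{%s}field" % AT_NS).text.strip())
```

The sketch decides by the presence of a file tag rather than by the type attribute; whether XMLForest checks the tag or the attribute is not stated here, so treat that choice as an assumption.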

The metadata structure itself is not part of the XMLForest specification and implementation. It is part of the Marshall Product. I have written a pluggable primary/binary namespace for Marshall. If anyone needs to export and import binary data, feel free to contact me. It can handle images, audio and other multimedia files as uuencoded or base64-encoded binary data.

6. How does XMLForest do that?

To answer that question I will go into detail of the implementation. Let us assume we export a tree of objects using XMLForest. The most important command will be:

tool.export_tree(root_object, directory)

This command tells XMLForest to export the given root_object and all sub-objects it contains, recursively. The second parameter, "directory", is a valid filesystem path to the folder where all data should be put. XMLForest will create a number of files; a typical XML import/export directory layout looks like this:

0d670efa20d8dc7cfdf96b27f913283a.xml a2ccba0cdc80cbf9bf8528fb2595d7a6.xml
0f0a74010887766cd7d1a2c3769af85d.xml ab8e5b89407f93a61f9b58d8d2503833.xml
15f5eff422d0428a49c95eca9e174603.xml afcbc1d7d35233de34ef82af069d5e9d.xml
206da6f7f828a332dbf0bbeed187cb29.xml b321df7ee39c0523131e81420f2fa0b3.xml
22f3b66739440c657c8a627bfa4327f2.xml b6e934ea42c01acb479a58a52d5bfbac.xml
313413b7538cee2eb7a0c5536f15d905.xml c7ed17925ac9d5774cdb04b37ab8797a.xml
356e3876310c67f26ec26f87dd05bef9.xml cc6f124c0c6c5682f55c7f87801c9d53.xml
44a42d89934f8ab04fcb5481b6eaefbd.xml d19fdaf004d8ae0749d1a179ac098fef.xml
4cf01b0099ec8ebc461653a9e38ed2ed.xml d4ae5ed5b2dd529fdf66c255e79d1f80.xml
50da5b8a5f173dbbb981b9edafc679a9.xml e075c90db05cf2079b1ac68beed9acd5.xml
6da66d74f116dc06372cb9df168820da.xml e5220d1557104e0a00afe26479122aea.xml
7558b88d4bee7c52deb3ec153f96e850.xml e5fd71c1d33d19f821ad604f2a2b85bf.xml
76154f4d247a1f7a393217e9638162c8.xml f74bc45b93b295469ccf884a7f5f1c39.xml
7abb80f09eab110f509752fd5b6951c0.xml fa3aedf97b9a46d1a90a3ce947ea023a.xml
8127af64f9fda189d6719b934c1a6596.xml fd73770c4fcb35c2f23daeb6767a7d25.xml
97916f68774397d750644744300e9e2f.xml fddbf9ddd9f49fd53e6823f71b86bbd2.xml
9ddec7f637767cb1ef3756a1ba602687.xml manifest.xml

Don't panic. As I said I will go into detail in this very README.txt and sort things out.

The most important file in the output directory is "manifest.xml". It contains all meta-information of the export. First I want to explain the structure of this file.

The outermost tag in "manifest.xml" is the tag <manifest>. Within that tag all information about the hierarchy and the references of the export/import is declared. The first part of the file contains all hierarchy nodes called <hnode>. A <hnode> may contain other <hnode> elements, reflecting the structure of the tree. I give an example:

<hnode
    dataref="b321df7ee39c0523131e81420f2fa0b3.xml"
    id="b321df7ee39c0523131e81420f2fa0b3"
    name="folder2"
    type="ForestTestFolder">
    <hnode
        dataref="50da5b8a5f173dbbb981b9edafc679a9.xml"
        id="50da5b8a5f173dbbb981b9edafc679a9"
        name="item_nested_0"
        type="ForestTestItem"/>
</hnode>

That is only a fragment of the hnode section from the doctest. The attributes used explain in which file the content information of the Archetypes object is kept and what type and id the object has. For example the first hnode given in the fragment above has the following attributes:

dataref="b321df7ee39c0523131e81420f2fa0b3.xml"
id="b321df7ee39c0523131e81420f2fa0b3"
name="folder2"
type="ForestTestFolder"

The "id" attribute is an id that is uniquely used in the "manifest.xml" file. It will be used later, in the part of the file that holds the information for references.

The "name" attribute contains the Zope id of the Archetypes object. During the import the object represented by the <hnode> will have the id given here.

The "type" attribute reflects the portal_type of the Archetype object. When XMLForest imports data it will create the object with the given meta_type.

The "dataref" attribute tells XMLForest where to look for the content data of the Archetypes object. In my example the hierarchy of the output directory is flat and all parts of the export are in one folder. However, when you want to do a custom import, you might have a different directory layout and "dataref" would also contain a valid path to the file containing content data. The path will typically be relative to the directory parameter given, but it may also be absolute. When you do a custom import you are not bound to the directory layout the "export_tree" method uses. Just make sure the path is resolvable. XMLForest uses the UID as a part of the filename for content data. You may use any file name, as long as it is unique.
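One way to handle both relative and absolute "dataref" values is sketched below. This only illustrates the rule described above and is not the resolution code of XMLForest; the function name resolve_dataref and the example paths are made up.

```python
import os.path


def resolve_dataref(directory, dataref):
    """Return the filesystem path for a dataref value.

    Hypothetical helper: a relative dataref is looked up below
    `directory`, an absolute one is used as given, because
    os.path.join discards `directory` when `dataref` is absolute.
    """
    return os.path.normpath(os.path.join(directory, dataref))


# Example paths, assumed for this illustration only.
flat = resolve_dataref("/tmp/export", "b321df7ee39c0523131e81420f2fa0b3.xml")
nested = resolve_dataref("/tmp/export", "content/folder2.xml")
absolute = resolve_dataref("/tmp/export", "/data/custom/folder2.xml")
print(flat)
print(nested)
print(absolute)
```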

I will now show an actual content data file. To keep the context clear I choose "b321df7ee39c0523131e81420f2fa0b3.xml":

<?xml version="1.0" ?>
<metadata xmlns="http://plone.org/ns/archetypes/"
          xmlns:at_data="uuns:ns:data"
          xmlns:cmf="http://cmf.zope.org/namespaces/default/"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xmp="adobe:ns:meta">
    <dc:creator>
        portal_owner
    </dc:creator>
    <xmp:CreateDate>
        2005-11-25T15:07:41Z
    </xmp:CreateDate>
    <xmp:ModifyDate>
        2005-11-25T15:07:41Z
    </xmp:ModifyDate>
    <field id="allowDiscussion">
        0
    </field>
    <field id="id">
        folder2
    </field>
    <uid>
        b321df7ee39c0523131e81420f2fa0b3
    </uid>
    <cmf:type>
        ForestTestFolder
    </cmf:type>
    <cmf:workflow_history>
        <cmf:workflow cmf:id="plone_workflow">
            <cmf:history>
                <cmf:var cmf:id="action"
                 cmf:type="None" cmf:value="None"/>
                <cmf:var cmf:id="review_state"
                 cmf:type="str" cmf:value="visible"/>
                <cmf:var cmf:id="actor" cmf:type="str"
                 cmf:value="portal_owner"/>
                <cmf:var cmf:id="comments" cmf:type="str" cmf:value=""/>
                <cmf:var cmf:id="time"
                 cmf:type="date" cmf:value="2005-11-25 16:07:41"/>
            </cmf:history>
        </cmf:workflow>
    </cmf:workflow_history>
    <cmf:security>
        <cmf:local_role cmf:role="Owner" cmf:user_id="portal_owner"/>
    </cmf:security>
</metadata>

The data in a content file is generated by the "Marshall" Product. The <metadata> tag contains information about the different namespaces that are used. You might be missing the "uuns" namespace, since "uuns" is a custom namespace I wrote for exporting and importing binary data. The installation of "uuns" is not explained here and you might only want to use it if you do have binary data. Namespaces are registered in the "Marshall" Product itself (in Marshall/namespaces/__init__.py). Each namespace will put a section of data into the export file. On import the different namespaces are called to deserialize the data into an Archetypes object.
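If you want to inspect such a content file outside of Zope, a small standard-library sketch like the following reads the values of a shortened version of the file shown above. It does not use Marshall and does not deserialize anything into an Archetypes object; read_content and the keys of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Prefixes chosen for this sketch; the URIs are the ones declared
# on the <metadata> tag of the content file.
NAMESPACES = {
    "at": "http://plone.org/ns/archetypes/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cmf": "http://cmf.zope.org/namespaces/default/",
}

# Shortened version of the content file shown above.
CONTENT = """<?xml version="1.0" ?>
<metadata xmlns="http://plone.org/ns/archetypes/"
          xmlns:cmf="http://cmf.zope.org/namespaces/default/"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:creator>
        portal_owner
    </dc:creator>
    <field id="allowDiscussion">
        0
    </field>
    <field id="id">
        folder2
    </field>
    <uid>
        b321df7ee39c0523131e81420f2fa0b3
    </uid>
    <cmf:type>
        ForestTestFolder
    </cmf:type>
</metadata>
"""


def text_of(root, path):
    """Return the stripped text of the first match, or None."""
    node = root.find(path, NAMESPACES)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def read_content(xml_text):
    """Collect uid, type, creator and all <field> values as strings."""
    root = ET.fromstring(xml_text)
    fields = dict((field.get("id"), (field.text or "").strip())
                  for field in root.findall("at:field", NAMESPACES))
    return {
        "uid": text_of(root, "at:uid"),
        "portal_type": text_of(root, "cmf:type"),
        "creator": text_of(root, "dc:creator"),
        "fields": fields,
    }


data = read_content(CONTENT)
print(data)
```

All values come back as plain strings with the surrounding whitespace stripped; converting them to field types is left out of this sketch.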

References

We return to the "manifest.xml" file, where there is an optional second section for references. This section consists of <rnode> tags. Those nodes represent metadata of references between objects. I have another fragment of data from the doctest "manifest.xml" ready as an example:

<rnode
    id="ba93aa00fb6e28105b6b7a095f94e961"
    sourceID="e5fd71c1d33d19f821ad604f2a2b85bf"
    targetID="97916f68774397d750644744300e9e2f"
    type="foresttestitems_other"/>

This fragment represents one reference, in this case a reference that has no content object.

The attribute "id" is an identifier that is unique within "manifest.xml".

The attribute "sourceID" is a reference to another identifier within "manifest.xml". It is used to declare which of the objects is a source object in a reference. References exist between two objects, one is called the source object, the other one is called target object.

The attribute "targetID" is a reference within "manifest.xml" that refers to the target object.

The attribute "type" contains the meta_type of the reference.

With the attributes mentioned so far a reference is defined and can be established during an import. In my example here I can show you how to find out which source object is referencing which target object. XMLForest will take the sourceID given to look up the source object in a storage of <hnode> elements it creates during the first (hierarchy) pass. In our case the identifiers used are the UIDs of the Archetypes objects, but when you do a custom import you will not be able to provide that information during your export. To say it in one sentence: "sourceID" refers to an "id" within "manifest.xml", and the identifier will be resolved by performing a lookup.

In this example the lookup will take "sourceID" and it contains the value "e5fd71c1d33d19f821ad604f2a2b85bf". This value is used to look for an entry in the <hnode> section. It will find the following <hnode> tag:

<hnode
    dataref="e5fd71c1d33d19f821ad604f2a2b85bf.xml"
    id="e5fd71c1d33d19f821ad604f2a2b85bf"
    name="item0" type="ForestTestItem"/>

Since the object has been created in the first pass it will have a new Archetypes UID, and this new UID is kept in the internal storage of XMLForest during the import. The lookup resolves the "sourceID" attribute and finds the corresponding Archetypes object to establish the relation. For "targetID" the same procedure is executed and XMLForest finds the right pair of objects. Once the pair is found a reference of the "type" given in the <rnode> tag is created.

In this example "sourceID" refers to an object with the Zope id "item0", that is an instance of "ForestTestItem". The attribute "targetID" refers to another object with the Zope id "item1", another "ForestTestItem". XMLForest will create a reference with the type "foresttestitems_other" between those two objects.
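The two passes can be sketched in plain Python as follows. This is a simplified model, not the implementation of XMLForest: FakeObject stands in for an Archetypes object, import_manifest is a hypothetical name, containment is ignored, and the second <hnode> in the fragment is reconstructed from the description above rather than copied from the doctest.

```python
import xml.etree.ElementTree as ET

# Reduced manifest: two hnodes and the rnode from the example.
# The "item1" hnode is reconstructed from the text (an assumption).
MANIFEST = """
<manifest>
    <hnode dataref="e5fd71c1d33d19f821ad604f2a2b85bf.xml"
           id="e5fd71c1d33d19f821ad604f2a2b85bf"
           name="item0" type="ForestTestItem"/>
    <hnode dataref="97916f68774397d750644744300e9e2f.xml"
           id="97916f68774397d750644744300e9e2f"
           name="item1" type="ForestTestItem"/>
    <rnode id="ba93aa00fb6e28105b6b7a095f94e961"
           sourceID="e5fd71c1d33d19f821ad604f2a2b85bf"
           targetID="97916f68774397d750644744300e9e2f"
           type="foresttestitems_other"/>
</manifest>
"""


class FakeObject(object):
    """Hypothetical stand-in for an Archetypes object."""

    def __init__(self, name, portal_type):
        self.name = name
        self.portal_type = portal_type
        self.references = []  # list of (reference type, target object)


def import_manifest(xml_text):
    """Return a dict mapping manifest ids to the created objects."""
    root = ET.fromstring(xml_text)
    created = {}
    # Pass 1: create every object and remember it under its manifest id.
    for hnode in root.iter("hnode"):
        created[hnode.get("id")] = FakeObject(hnode.get("name"),
                                              hnode.get("type"))
    # Pass 2: resolve sourceID and targetID through the storage
    # built in pass 1, then establish the reference.
    for rnode in root.iter("rnode"):
        source = created[rnode.get("sourceID")]
        target = created[rnode.get("targetID")]
        source.references.append((rnode.get("type"), target))
    return created


objects = import_manifest(MANIFEST)
source = objects["e5fd71c1d33d19f821ad604f2a2b85bf"]
for ref_type, target in source.references:
    print(source.name, "--" + ref_type + "-->", target.name)
```

The second pass can only work because the first pass has already created both ends of every reference, which is why the import needs two passes.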

Relations

XMLForest also handles contentish references, in this "README.txt" I call them relations. I have an example for an <rnode> element that defines a contentish relation in the doctest, too:

<rnode
    dataref="cc6f124c0c6c5682f55c7f87801c9d53.xml"
    id="ec8e3504522190ee905e0c72c5ddfba3"
    sourceID="8127af64f9fda189d6719b934c1a6596"
    targetID="d19fdaf004d8ae0749d1a179ac098fef"
    type="ForestContentReference"/>

As you can see it holds the same information as the <rnode> tag explained before, but now we also find an additional attribute "dataref". The attribute "dataref" has the same meaning as the "dataref" attributes you find in <hnode> tags. It is used to provide a reference to the data file that contains the information from the content object of the relation. As soon as the relation between the objects is established XMLForest looks up the content object and fills it with the data from the file defined in "dataref".
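Going by the two examples in this README, the presence of "dataref" is what separates a relation with a content object from a plain reference. A short sketch of that distinction follows; the wrapping <manifest> tag and the function name split_rnodes are assumptions for this illustration, and XMLForest may decide differently internally.

```python
import xml.etree.ElementTree as ET

# Both rnode examples from this README, wrapped in a root tag.
RNODES = """
<manifest>
    <rnode id="ba93aa00fb6e28105b6b7a095f94e961"
           sourceID="e5fd71c1d33d19f821ad604f2a2b85bf"
           targetID="97916f68774397d750644744300e9e2f"
           type="foresttestitems_other"/>
    <rnode dataref="cc6f124c0c6c5682f55c7f87801c9d53.xml"
           id="ec8e3504522190ee905e0c72c5ddfba3"
           sourceID="8127af64f9fda189d6719b934c1a6596"
           targetID="d19fdaf004d8ae0749d1a179ac098fef"
           type="ForestContentReference"/>
</manifest>
"""


def split_rnodes(xml_text):
    """Separate plain references from relations with content data."""
    references, relations = [], []
    for rnode in ET.fromstring(xml_text).iter("rnode"):
        dataref = rnode.get("dataref")
        if dataref is None:
            references.append(rnode.get("id"))
        else:
            # The content object is filled from this file once the
            # relation itself has been established.
            relations.append((rnode.get("id"), dataref))
    return references, relations


references, relations = split_rnodes(RNODES)
print("plain references:", references)
print("contentish relations:", relations)
```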

7. Further documentation

As mentioned exhaustively in this "README.txt" you will find a lot of hints in the doctest. You find the doctest at "XMLForest/doc/testXMLForestTool.txt".

8. Epilog

During the design of XMLForest I discussed the implementation with several people in the Plone community. Kapil gave me a lot of input; we found that it is necessary to do an import in two passes. We also discussed that it might be useful to provide a "path" attribute, whenever it is possible, to speed up the lookup process for objects. We hope that we found a stable basis for future XML import/export tasks, such as importing and exporting not only a "tree" of objects, but also a "soup", meaning that we try to provide an XML schema that is also able to hold meta-information of objects organized in a "cloud" and not a hierarchy. The "path" attribute is reserved for such imports and exports. Further, there are several optional "...UID" attributes that can be used to import e.g. references between Archetypes objects that already live in the ZODB.

I hope you now understand the internal behavior of XMLForest, so you are able to create your own custom imports. Feel free to contact me if you have any questions about XMLForest. I would also like to thank Ullrich Eck for his input during the design process of XMLForest. A lot of "thank-yous" go out for Philipp Auersperg, who flattened my path during the implementation of XMLForest. He kept my spirit up!

And finally the sponsoring credits go to the ZUCCARO project team of Bibliotheca Hertziana in Rome, namely Martin Raspe and Georg Schelbert!

Regards, Georg Gogo. BERNHARD <gogo@bluedynamics.com>