Catalog anything in Zope/Plone

by Yuri last modified Oct 01, 2010 06:14 AM
This how-to shows how to index non-object data in a Zope catalog and search it. It is useful when you have a list of items that you want to query like an SQL table.

Purpose

This how-to shows how to index non-object data in a Zope catalog and search it. It is useful when you have a list of items that you want to query like an SQL table.

Prerequisites

None. It should work in any version of Zope 2; it was tested on Zope 2.6.

Step by step

I've developed an approach to store data in the catalog without creating custom objects (which would require writing a product). The example uses a File, but you can use virtually anything. The File content is indexed line by line, so I can look up an item very quickly through the catalog. Instead of a File you can use a property from the PropertySheet; I used a File because it can be easily edited and uploaded TTW (through the web).

Some numbers:

I need heavy searches, and loading and searching the file took 0.35 sec.
With the catalog, a search takes about 0.003 sec, roughly 100 times faster (thanks to the BTree behind the index; I had 3000 authors to manage).

Step 1

  • Add a catalog to a folder (in this example it is called catalogo_autori; it is a normal ZCatalog, added from the Add... menu)
  • Go to the Indexes tab and add an index called "keys", of type KeywordIndex
  • Go to the Metadata tab and add a metadata column called "keys"

Step 2

Add a Python Script with this code:

# File format: one "author|description" pair per line
data = context.the_folder['the file']
autori = str(data)
# or context.the_folder.getProperty('authors') if you prefer a PropertySheet

# clear the catalog
context.catalogo_autori.manage_catalogClear()

# cycle through the file line by line
for aut in autori.split('\n'):
    toks = aut.strip().split('|')
    if len(toks) < 2:
        continue  # skip blank or malformed lines
    autore = toks[0]
    description = toks[1]
    # the dictionary keys are what the "keys" index and metadata will see
    record = {}
    record[autore] = ''
    record[description] = ''
    # the second argument is the unique id of this entry in the catalog
    context.catalogo_autori.catalog_object(record, autore)
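
The parsing step can be tried outside Zope before running the script. The following is a minimal sketch in plain Python, not part of the original script: the helper name parse_lines and the sample data are made up for illustration. It assumes one "author|description" pair per line, skips blank or malformed lines, and splits only on the first "|" so that a description may itself contain that character.

```python
def parse_lines(text):
    """Return a list of (author, description) tuples from 'author|description' lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines without a separator.
        if not line or "|" not in line:
            continue
        # Split on the first '|' only; the rest belongs to the description.
        author, description = line.split("|", 1)
        records.append((author.strip(), description.strip()))
    return records


# Hypothetical sample content, standing in for the File object.
sample = "Dante|Italian poet\n\nHomer|Greek poet\nbroken line\n"
records = parse_lines(sample)
```

Each tuple in records corresponds to one dictionary that the Python Script would pass to catalog_object.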

An alternative is to use a temporary DTML Document (created in a temporary folder, so it does not touch the FileStorage), give it two properties and catalog that object. This allows better names than "keys" and two distinct values, "autore" and "description"; the catalog then needs indexes and metadata columns with those names instead of "keys". The script below is not tested; it should be slower at indexing time but makes metadata extraction easier:

# File format: one "author|description" pair per line
data = context.the_folder['the file']
autori = str(data)
# or context.the_folder.getProperty('authors') if you prefer a PropertySheet

# clear the catalog
context.catalogo_autori.manage_catalogClear()

# cycle through the file line by line
for aut in autori.split('\n'):
    toks = aut.strip().split('|')
    if len(toks) < 2:
        continue  # skip blank or malformed lines
    autore = toks[0]
    description = toks[1]
    # create a throwaway DTML Document to carry the two properties
    context.temp_folder.manage_addProduct['OFSP'].manage_addDTMLDocument('test')
    doc = context.temp_folder['test']
    doc.manage_addProperty('autore', autore, 'string')
    doc.manage_addProperty('description', description, 'string')
    # catalog the document itself, using the author as unique id
    context.catalogo_autori.catalog_object(doc, autore)
    context.temp_folder.manage_delObjects(ids=['test'])

Step 3

Run it, and you will see the catalog being populated with the lines of the File. You have to run it again every time you upload a new file or edit it TTW, so that the new entries are cataloged.

How does this work?

Dictionaries have a method called "keys()", which returns the list of the dictionary's keys, and this can be used to store and retrieve values. When the catalog looks up the attribute "keys" for the Index "keys", it finds that method, calls it and gets back a list. The same list is stored in the Metadata "keys".

So you can search using the "keys" Index, and retrieve the result using the "keys" Metadata.
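
To make the mechanism concrete, here is a toy model in plain Python of what a keyword index does with such a dictionary. It is only a sketch for illustration, not the ZCatalog implementation: the class ToyKeywordIndex and the sample authors are invented. It looks up the attribute named after the index, calls it if it is callable, and files the object's id under every keyword it gets back.

```python
class ToyKeywordIndex:
    """Toy stand-in for a keyword index; not the real ZCatalog code."""

    def __init__(self, name):
        self.name = name      # attribute looked up on each cataloged object
        self.postings = {}    # keyword -> set of unique ids

    def index_object(self, uid, obj):
        value = getattr(obj, self.name, None)
        if callable(value):
            # A method such as dict.keys is called to obtain the value.
            value = value()
        for keyword in value or ():
            self.postings.setdefault(keyword, set()).add(uid)

    def search(self, keyword):
        return sorted(self.postings.get(keyword, ()))


index = ToyKeywordIndex("keys")
index.index_object("Dante", {"Dante": "", "Italian poet": ""})
index.index_object("Homer", {"Homer": "", "Greek poet": ""})
hits = index.search("Italian poet")
```

Because both the author and the description end up as keywords, an entry can be found by either of them.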

For example, you can retrieve all the entries with:

for autore in context.catalogo_autori():
    print autore.keys
return printed

This prints the metadata keys, which stores the autore and the description as a list. The list is unordered, so you need a trick to tell which item is the author and which is the description.

For example, I do:

record['1' + autore] = ''
record['2' + description] = '' 

and search with (the_author is a string holding the author whose description I'm looking for):

result = context.catalogo_autori(keys='1' + the_author)
data = list(result[0].keys)  # this is the Metadata called "keys"
data.sort()  # after sort(), the author is first in the list, the description second
# strip out '1' and '2'
author = data[0][1:]
description = data[1][1:]
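
The prefix trick can be checked in isolation. Below is a small sketch in plain Python under the same assumptions as above ('1' marks the author, '2' marks the description); the function names encode_record and decode_keys are hypothetical and not part of the original scripts.

```python
def encode_record(author, description):
    """Build the dictionary to catalog, tagging each value with a sort prefix."""
    return {"1" + author: "", "2" + description: ""}


def decode_keys(keys):
    """Recover (author, description) from the unordered 'keys' metadata."""
    ordered = sorted(keys)  # '1...' sorts before '2...'
    return ordered[0][1:], ordered[1][1:]


record = encode_record("Dante", "Italian poet")
author, description = decode_keys(record.keys())
```

Sorting a copy with sorted() leaves the original list untouched, which is a little safer than sorting the metadata in place.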

Further information

Contact me, or add a comment, with improvements and suggestions.

Comments (2)

Christian Vielma Apr 29, 2009 07:39 PM
Hi! This is a very good article, and very interesting too. May I suggest indicating more clearly the paths of the objects you work with, so the steps are easier to follow?

I had some trouble with a few steps, but it worked for me.

Yuri Sep 30, 2009 07:53 AM
I'm glad it helped you :) Can you post your script, so we can improve the how-to? Thanks :)