How to search in file fields in custom content-types
Purpose
Suppose you have a content-type (called "MyType", of course) which has a file upload field (called "file"). Suppose you want the site search to also look into the uploaded files, and to include their containing MyType objects in the search results if the file contents match the search. Alternatively, you may want to scan the files for certain words through a script. In both cases, you want this file field indexed.
Prerequisites
You'll need:
- Plone: the product TextIndexNG (http://plone.org/products/textindexng/)
- Your operating system: the external converters (http://www.zopyx.de/projects/TextIndexNG3/documentation/external-converters)
This how-to is intended for Plone 2.5.
Step by step
- Install TextIndexNG3
- You might try easy_installing Products.TextIndexNG3, possibly in a workingenv (see referenced items) to prevent "corrupting" your system-wide python 2.4.
- Make sure it downloads the TextIndexNG3.1 branch which is intended for use with Plone 2.5.
- Example:
easy_install-2.4 "Products.TextIndexNG3<3.2"
- Check that it works, for example by adding a "File" item somewhere (not a MyType item yet; we'll get to that) and confirming that its contents get indexed.
- Import textindexng stuff in your class MyType
from textindexng.interfaces import IIndexableContent
from textindexng.content import IndexContentCollector as ICC
- Make your class MyType implement IIndexableContent (if "implements" is not yet imported in your module, it comes from zope.interface)
implements(IIndexableContent)
- Add a method indexableContent to MyType as defined below. (If your file upload field is not called "file", call that field's accessor instead of "getFile()".)
def indexableContent(self, fields):
    if 'SearchableText' in fields:
        file = self.getFile()
        if file:  # file is a file object
            mimetype = file.getContentType()
            icc = ICC()
            icc.addBinary('SearchableText', str(file), mimetype, 'iso-8859-15', None)
            return icc
    return None
- In the TextIndexNG Preferences, click "recreate indexes"
- You should now be able to search for words that appear in your file upload field
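If you want to check the logic of indexableContent before touching a live site, the method can be exercised outside Zope with stand-ins. The following is only a minimal sketch: StubCollector and StubFile are hypothetical helpers invented for this illustration, not part of TextIndexNG or Plone, and the parameter names on addBinary are guesses that simply mirror the five arguments passed in the method above. In your real class you would keep using ICC from textindexng.content.

```python
class StubCollector(object):
    """Hypothetical stand-in for textindexng.content.IndexContentCollector."""

    def __init__(self):
        self.binaries = []

    def addBinary(self, field, data, mimetype, encoding=None, language=None):
        # Only records the call; the real collector passes the data on
        # to a converter that matches the mimetype.
        self.binaries.append((field, data, mimetype, encoding, language))


class StubFile(object):
    """Hypothetical stand-in for the object returned by getFile()."""

    def __init__(self, data, content_type):
        self.data = data
        self.content_type = content_type

    def getContentType(self):
        return self.content_type

    def __str__(self):
        return self.data

    def __len__(self):
        # Makes an empty upload evaluate as false in "if file:".
        return len(self.data)


class MyType(object):
    """Reduced MyType: only the parts needed to exercise indexableContent."""

    def __init__(self, file=None):
        self._file = file

    def getFile(self):
        return self._file

    def indexableContent(self, fields):
        if 'SearchableText' in fields:
            file = self.getFile()
            if file:
                mimetype = file.getContentType()
                icc = StubCollector()  # ICC() in the real class
                icc.addBinary('SearchableText', str(file), mimetype,
                              'iso-8859-15', None)
                return icc
        return None


if __name__ == '__main__':
    obj = MyType(StubFile('hello plone', 'text/plain'))
    collected = obj.indexableContent(['SearchableText'])
    print(collected.binaries)
```

The sketch also makes the two "nothing to index" cases visible: the method returns None when SearchableText is not among the requested fields, and when no file (or an empty one) has been uploaded.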
Further reading
See "related items" below. Other hints for further reading are welcome.
Updates to this how-to
August 2008:
- Added reference to "Integrating MS Office and PDF files into Plone 2" and "How to hack your Zope 2 instance so that you can install Python packages using easy_install"
- Added a way to install TextIndexNG3 using easy_install
