How to search in file fields in custom content-types

by Kees Hink last modified Jun 12, 2009 12:52 PM
If your custom content type contains a file upload field, there's a good chance you want the uploaded files to be indexed, too. The key to getting this done is TextIndexNG. This how-to will hopefully get you started quickly. Note that it's intended for Plone 2.5.

Purpose

Suppose you have a content type (called "MyType", of course) with a file upload field (called "file"). Suppose you want the site search to also look into the uploaded files, and to include the MyType objects that contain them in the search results when a file's contents match the query. Alternatively, you may want a script to scan the files for certain words. In both cases, you want this file field indexed.

Prerequisites

You'll need:

  • Plone: the product TextIndexNG (http://plone.org/products/textindexng/)
  • Your operating system: the external converters for the file formats you want indexed (http://www.zopyx.de/projects/TextIndexNG3/documentation/external-converters)

This how-to is intended for Plone 2.5.
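Because TextIndexNG relies on external programs to extract text from binary files, it can help to confirm that those programs are actually on the server's PATH before going further. The following is a minimal sketch for Unix-like systems (it does not handle Windows ".exe" suffixes). The program names in CONVERTERS are assumptions for illustration only; replace them with the ones the converters page lists for the formats you care about.

```python
import os

# Hypothetical list: example converter programs only. Check the TextIndexNG3
# external-converters page for the programs your version actually uses.
CONVERTERS = ["pdftotext", "wvWare"]


def find_on_path(program, path=None):
    """Return the full path of an executable `program` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Must be a regular file that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_converters(programs=CONVERTERS, path=None):
    """Return the programs from `programs` that could not be found."""
    return [p for p in programs if find_on_path(p, path) is None]


if __name__ == "__main__":
    for name in missing_converters():
        print("not found on PATH: %s" % name)
```

Run it with any Python on the server; it prints one line for each listed program it could not find, and nothing if all of them are present.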

Step by step

  • Install TextIndexNG3
    • You might try easy_installing Products.TextIndexNG3, possibly in a workingenv (see referenced items), to avoid "corrupting" your system-wide Python 2.4.
    • Make sure it downloads the 3.1 branch of TextIndexNG3, which is intended for use with Plone 2.5.
    • Example:
      easy_install-2.4 "Products.TextIndexNG3<3.2"
  • Check that it works: for example, add a regular "File" item somewhere (not a MyType item; we'll get to that) and confirm that its contents get indexed.
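  • In the module that defines your class MyType, import the TextIndexNG interface and content collector (plus implements from zope.interface, used in the next step)
    from zope.interface import implements
    from textindexng.interfaces import IIndexableContent
    from textindexng.content import IndexContentCollector as ICC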
    
  • Make your class MyType implement IIndexableContent by adding this line to its class body
    implements(IIndexableContent)
    
  • Add a method indexableContent to MyType as defined below. (If your file upload field has a different name, call that field's accessor instead of getFile().)
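        def indexableContent(self, fields):
            # Only contribute content when SearchableText is requested
            if 'SearchableText' in fields:
                uploaded = self.getFile()  # accessor of the "file" field
                if uploaded:
                    # uploaded is a file object; str() returns its raw data
                    mimetype = uploaded.getContentType()
                    icc = ICC()
                    icc.addBinary('SearchableText',
                                  str(uploaded),
                                  mimetype,
                                  'iso-8859-15',  # encoding
                                  None)           # language (none given)
                    return icc
            # No file, or SearchableText not requested: nothing to add
            return None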
    
  • In the TextIndexNG preferences, click "recreate indexes" so that existing MyType items are reindexed, including their file contents
  • You should now be able to search for words that appear in your file upload field
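Since indexableContent only runs inside Zope, it is not obvious from the code alone when it returns a collector and when it returns None. The sketch below mirrors the method's logic outside Zope, using hypothetical stand-ins (FakeCollector, FakeFile, FakeMyType) that are not TextIndexNG's or Archetypes' real classes. The addBinary parameter names (field, data, mimetype, encoding, language) are an assumption based on the call in the method above.

```python
# Hypothetical stand-ins so the control flow of indexableContent can run
# outside Zope; they are NOT the real TextIndexNG or Archetypes classes.


class FakeCollector(object):
    """Records addBinary() calls instead of feeding an index."""

    def __init__(self):
        self.calls = []

    def addBinary(self, field, data, mimetype, encoding=None, language=None):
        self.calls.append((field, data, mimetype, encoding, language))


class FakeFile(object):
    """Minimal stand-in for the object returned by getFile()."""

    def __init__(self, data, content_type):
        self.data = data
        self.content_type = content_type

    def getContentType(self):
        return self.content_type

    def __str__(self):
        return self.data


class FakeMyType(object):
    """Stand-in for MyType with an optional uploaded file."""

    def __init__(self, uploaded=None):
        self.uploaded = uploaded

    def getFile(self):
        return self.uploaded

    def indexableContent(self, fields):
        # Same logic as the how-to's method, with FakeCollector as ICC
        if 'SearchableText' in fields:
            uploaded = self.getFile()
            if uploaded:
                icc = FakeCollector()
                icc.addBinary('SearchableText', str(uploaded),
                              uploaded.getContentType(), 'iso-8859-15', None)
                return icc
        return None
```

For example, FakeMyType(FakeFile("hello world", "text/plain")).indexableContent(['SearchableText']) returns a collector holding one addBinary call, while asking only for ['Title'], or calling it on an object without a file, returns None.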

Further reading

See "related items" below. Other hints for further reading are welcome.

Updates to this how-to

August 2008:

  • Added reference to "Integrating MS Office and PDF files into Plone 2" and "How to hack your Zope 2 instance so that you can install Python packages using easy_install"
  • Added a way to install TextIndexNG3 using easy_install