Enable indexing of PDF and Word documents on Windows in five steps: three minutes of your time, without problems!

Five very simple instructions for indexing PDF and Word documents in Plone on Windows

Purpose

Clearly written, useful instructions for indexing PDF and Word documents on Windows.

 

Step by step: only five!

 

First: install OpenOffice.org on your system. It is very simple to use and, at least for most users, a very good replacement for Microsoft Office.

Second: get the Windows xpdf package (http://www.foolabs.com/xpdf/download.html). You can download the Windows version directly from this link: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl1-win32.zip

Third: unpack the files. Unzip the xpdf package into C:\WINDOWS\system32

Fourth: launch Plone and check in Plone/portal_transforms that the transform word_to_html is present.

Fifth: click on Add Transform. In ID, enter: pdf_to_text
In Module, enter: Products.PortalTransforms.transforms.pdf_to_text

Now you can upload your Word and PDF documents, and they will be indexed automatically.
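If step five fails, one thing worth checking is whether the files unpacked in step three can actually be found by other programs. The following is only a minimal sketch, assuming the pdf_to_text transform relies on the xpdf `pdftotext` program being reachable through the PATH environment variable; the helper name `find_executable` is hypothetical and not part of Plone.

```python
import os


def find_executable(name, search_path=None):
    """Return the full path of `name` if it is on the search path, else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    # On Windows the file is called pdftotext.exe, so try both spellings.
    candidates = [name, name + ".exe"]
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full_path = os.path.join(directory, candidate)
            if os.path.isfile(full_path):
                return full_path
    return None


if __name__ == "__main__":
    location = find_executable("pdftotext")
    if location:
        print("pdftotext found at " + location)
    else:
        print("pdftotext not found on PATH; re-check step three")
```

Run it from a command prompt with any Python interpreter. If it reports that pdftotext was not found, the xpdf files are probably not in C:\WINDOWS\system32 or another directory listed in PATH.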

To find out what has been indexed from the uploaded documents, look at SearchableText inside Plone/portal_catalog/Catalog/ for the documents tracked in the index.
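The same check can be sketched in Python instead of clicking through the management interface. This is a hedged example, assuming the catalog object offers the ZCatalog methods `getrid` and `getIndexDataForRID`; the function name `indexed_words` and the document path in the usage comment are hypothetical.

```python
def indexed_words(catalog, path):
    """Return the SearchableText index data stored for the object at `path`.

    Returns None when the path is not tracked in the catalog.
    """
    rid = catalog.getrid(path)  # record id of the catalogued object
    if rid is None:
        return None
    index_data = catalog.getIndexDataForRID(rid)  # maps index name to data
    return index_data.get("SearchableText")


# Possible use from a Zope debug prompt (site id and path are assumptions):
#   catalog = app.Plone.portal_catalog
#   print(indexed_words(catalog, "/Plone/my-document.pdf"))
```

If the result for a PDF holds only the words of the title and description, the file body was most likely not converted.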

 

Further information

For a POSIX guide, see http://plone.org/documentation/how-to/enable-full-text-indexing-of-word-documents-and-pdfs-in-plone-3-0-gnu-linux/?searchterm=index%20pdf

 

For an alternative "hard" Windows guide, see: http://plone.org/documentation/how-to/enable-full-text-indexing-of-word-documents-and-pdfs-in-plone-3-0-windows/?searchterm=indexing%20windows

 

I need help

Posted by rajkumar at Mar 05, 2008 01:57 PM
I logged into Plone on my local computer.
I got the following exception while following the steps (Very simple five instructions to index pdf and word documents in Plone with Windows).

Can any of you help me?

Something wrong?

Posted by marie christine olchanski at Mar 28, 2008 12:07 PM
I followed the steps, except for OpenOffice, as I have Word.
After the fifth step I get this error:

Traceback (innermost last):
  Module ZPublisher.Publish, line 119, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 42, in call_object
  Module Products.PortalTransforms.TransformEngine, line 389, in manage_addTransform
  Module Products.PortalTransforms.TransformEngine, line 263, in _mapTransform
  Module Products.MimetypesRegistry.MimeTypesRegistry, line 218, in lookup
   - __traceback_info__: ("'BROKEN'", 'BROKEN')
  Module Products.MimetypesRegistry.MimeTypesRegistry, line 449, in split
MimeTypeException: Malformed MIME type (BROKEN)

Where did I go wrong?
Plone 3, Windows XP...

Note

Posted by Stefano Saltannecchi at Mar 28, 2008 03:46 PM
Note that you may need to install OpenOffice with the complete UNO component to get the full transform of Word files. See the openoffice.org site for more instructions.
All Plone versions from 3.0.0 onward need either OpenOffice (with UNO) or other programs to transform Word files (up to the Word 2003 format). If you want to transform Word 2007 files, you must use other, specific Plone products.

Good with pdf files, still problems with doc files

Posted by marie christine olchanski at Apr 02, 2008 09:47 AM
On my PC, where I have Word, the search works for both pdf and .doc files.

On another machine, I have OpenOffice.
It works for pdf files, but not for doc files.
But I installed OpenOffice at the fifth step...
Could that be the reason? Do I have to reinstall in your order?

catalog

Posted by marie christine olchanski at Apr 02, 2008 09:53 AM
I forgot to say that in both cases (see above), the doc files are in the catalog...

Can I install Plone on a server?

Posted by ernesto fuentes at Oct 30, 2008 03:57 PM
Can I install Plone with a server provider?

No indexing and this here doesn't work :-(

Posted by Carsten Kirck at Jun 01, 2009 08:43 PM
I installed Plone 3.2.2. According to the product description, nothing else should be necessary for Plone to index Word and PDF files.
I tried to upload some Word files, and none of the content was indexed!
Then I found this description and followed it, but nothing happens except an error! After trying to add the transform as described above, there is only an error message and "I'm glad, as I am an administrator, I am able to read the message". That message is as helpful as an orange!

Why does this not work "out of the box" as promised? Or is anybody able to write a description of how to set up full-text indexing of Office documents (I would also like to index PowerPoint, Excel, OpenOffice, ...) with Plone 3?

At the moment I use Plone 2.5 and PloneExFile, and I would really like to upgrade to Plone 3...

If you could help, I would really appreciate your answer. My mail address is: carsten.kirck@gmx.net

Best Wishes!
Carsten.