wildcard.pdfpal

by Nathan Van Gheem last modified Oct 12, 2011 02:38 AM

PDF Thumbnail generation, OCR indexing and extra views integrated with plone.app.async

Project Description

Introduction

This package provides the following integrations for PDF-heavy web sites:

  • Generates thumbnails from PDFs
  • Adds a folder view for PDFs that uses the generated thumbnails
  • Adds OCR for PDF indexing
  • Everything is configurable, so you can turn off thumbnail generation or OCR
  • Ability to create searchable PDFs with hOCR
  • Use the @@async-monitor URL to monitor asynchronous jobs that have yet to run

OCR

OCR requires Ghostscript and Tesseract to be installed. Use your package manager to install them (package names vary by distribution; on Debian and Ubuntu, Tesseract is usually packaged as tesseract-ocr):

# sudo apt-get install ghostscript tesseract
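
As a rough illustration of how the two tools fit together, the sketch below builds (but does not run) the command lines for one OCR pass: Ghostscript renders a PDF page to a TIFF image, and Tesseract reads that image and writes a text file. This is not taken from the package's source; the function name ocr_commands, the 300 dpi resolution, the tiffgray device and the file naming are all assumptions chosen for the example.

```python
import os


def ocr_commands(pdf_path, page, workdir):
    """Build the commands for OCR of a single PDF page (hypothetical helper).

    Returns (render, recognize): two argument lists suitable for
    subprocess.call(). Nothing is executed here.
    """
    # Assumed naming scheme for the intermediate files.
    image = os.path.join(workdir, "page-%04d.tif" % page)
    text_base = os.path.join(workdir, "page-%04d" % page)

    # Ghostscript: render only the requested page to a grayscale TIFF.
    render = [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffgray", "-r300",  # device and resolution are assumptions
        "-dFirstPage=%d" % page, "-dLastPage=%d" % page,
        "-sOutputFile=" + image,
        pdf_path,
    ]

    # Tesseract: reads the image and writes text_base + ".txt".
    recognize = ["tesseract", image, text_base]
    return render, recognize
```

Passing each list to subprocess.call() in order would run the pass for one page; the recognized text could then be fed to the catalog's searchable text index.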

Searchable PDFs

Creating searchable PDFs requires an svn checkout of Tesseract version 3.0.1 or 3.0.0 with the hOCR configuration in place. This thread explains how to configure hOCR: http://ubuntuforums.org/showthread.php?t=1647350

In addition, you'll need exactimage and pdftk installed:

# sudo apt-get install exactimage pdftk
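
Before enabling OCR or searchable PDFs, it can help to confirm that the external programs are actually on the PATH. The following is a minimal standalone sketch (Python 3, not part of the package). The executable names are assumptions: gs for Ghostscript, tesseract, pdftk, and hocr2pdf, which is the tool shipped with exactimage. The helper name missing_tools is hypothetical.

```python
import shutil

# Assumed executable names for each feature.
OCR_TOOLS = ("gs", "tesseract")
SEARCHABLE_PDF_TOOLS = ("hocr2pdf", "pdftk")


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without
    touching the real system.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    for label, tools in (("OCR", OCR_TOOLS),
                         ("Searchable PDFs", SEARCHABLE_PDF_TOOLS)):
        missing = missing_tools(tools)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print("%s: %s" % (label, status))
```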

Plone 3

  • Requires hashlib

Extra

You can queue all PDFs for conversion at once by calling the URL @@queue-up-all.
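
Both views mentioned in this document, @@queue-up-all and @@async-monitor, are reached by appending the view name to a content URL. The small sketch below shows one way to build such a URL. The site address http://localhost:8080/Plone is a made-up example, the helper name view_url is hypothetical, and which context the views should be called on is not stated here, so treat the site root as an assumption.

```python
def view_url(context_url, view_name):
    """Join a content URL and a browser view name (hypothetical helper).

    Accepts the view name with or without the leading "@@".
    """
    return context_url.rstrip("/") + "/@@" + view_name.lstrip("@")


# Example with an assumed local site address.
SITE = "http://localhost:8080/Plone"
queue_url = view_url(SITE, "queue-up-all")
monitor_url = view_url(SITE, "@@async-monitor")
```

Opening the first URL in a browser while logged in with sufficient permissions should queue the jobs, and the second can then be used to watch them.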

Changelog

0.7b1 ~ 2011-01-06

  • fixes for quality and size issues [vangheem]

0.6b2 ~ 2011-01-04

  • fix async monitor view to work with plone.app.async = 1.0, which changed the order of some args in the job. [vangheem]

0.6b1 ~ 2011-01-04

  • added ability to make PDFs searchable; this works seamlessly if wc.pageturner is installed, so the flex paper is created with the searchable PDF version.

0.5b5 ~ 2010-12-07

  • fix: plone.app.async was not being imported conditionally

0.5b4 ~ 2010-12-06

  • better info on async monitor
  • only reindex searchabletext when doing OCR so the modification date on the object does not get set.
  • make sure to catch exceptions so it doesn't leave around files after a bad conversion
  • add colorbox for pdf folder view

0.5b3 ~ 2010-12-02

  • add ability to queue up all pdf files

0.5b2 ~ 2010-12-02

  • fix async monitor view

0.5b1 ~ 2010-12-02

  • Initial release

Current Release
wildcard.pdfpal 0.7b1

Released Oct 12, 2011 — tested with Plone 4.1, Plone 4, Plone 3


Download (all platforms): wildcard.pdfpal-0.7b1.tar.gz
If you are using Plone 3.2 or higher, you probably want to install this product with buildout. See our tutorial on installing add-on products with buildout for more information.

All Releases

Version  Released      Description  Compatibility                Licenses  Status
0.7b1    Oct 12, 2011  see docs     Plone 4.1, Plone 4, Plone 3  GPL       final
0.5b1                                                            GPL       pre-release
