TextIndexNG
TextIndexNG is a new fulltext index for Zope and is the most feature-complete solution for fulltext indexing under Zope.
Current release
No stable release available yet.
Project Description
Features
- DocumentConverters
- StemmerSupport for 13 languages
- SimilaritySearch for English text (based on the Levenshtein distance)
- NearSearch
- PluggableParsers
- extended StopWords support
- full integration in ZCatalog
- TestFunctionality through ZMI
- ExtensibleArchitecture
- MoreEfficient than the current TextIndex
- full globbing support (wildcard search)
- NormalizationSupport (e.g. reducing accented characters to their base form)
- full UnicodeAwareness
- Relevance ranking of search results: results are now ranked using an extended cosine measure. The cosine measure is based on a vector model and computes a document's "score" from the frequency of the query terms in the documents of the result set.
- Much faster phrase/near search: the old implementation of TextIndexNG had to do expensive work at query time whenever a phrase or near search was performed. Re-using the WidCode module of ZCTextIndex made this operation less expensive.
- Left-truncation added: TextIndexNG can be configured at creation time to support left-truncation (i.e. you can search for "*suffix"). Left-truncation is optional because it requires a second, reversed index inside the lexicon and much more memory.
- optional auto-expansion support: this optional feature improves search results when some of the query terms cannot be found. If there is no hit for a query term "foo", the index expands it to "foo*". Expansion is currently configured globally for the index; it will be available on a per-query basis in a later version, which will also extend auto-expansion to search for similar terms.
- improved HTML converter: now using Chris Withers' "Strip-o-Gram" module instead of the Strip-Tag-Parser
- added converter for text/sgml
- Similarity search (soundex, metaphone, doublemetaphone) dropped and replaced with a more general, language-independent approach using the Levenshtein distance.
- range searches like "Fi..Foo"
- substring searches like "substring"
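DocumentConverters turn a document of a given content type into plain text before it is indexed. As a rough illustration of what an HTML converter does, here is a sketch using only Python's standard `html.parser`. It is a stand-in, not Strip-o-Gram, and the names `html_to_text` and `_TextExtractor` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping <script>/<style>."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())
```

For example, `html_to_text("<p>A &amp; B</p>")` yields `"A & B"`, which can then be split into words by the index's parser.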
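NormalizationSupport reduces accented characters to their base form so that, say, "Élève" and "eleve" hit the same lexicon entry. One common way to do this in Python (assumed here, not necessarily what TextIndexNG does internally) is Unicode NFKD decomposition followed by dropping the combining marks; `normalize` is a hypothetical name.

```python
import unicodedata


def normalize(word: str) -> str:
    """Lowercase `word` and strip accents, e.g. 'Élève' -> 'eleve'."""
    # NFKD splits 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT;
    # unicodedata.combining() is non-zero for such marks, so we drop them.
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```

Letters without a canonical decomposition, such as "ß" or "ø", pass through unchanged, so a real normalizer would likely add an explicit mapping table for them.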
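SimilaritySearch is based on the Levenshtein distance, i.e. the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is the textbook dynamic-programming version plus a lexicon lookup. How TextIndexNG chooses the distance threshold is not stated here, so `similar_words` and its `max_distance` parameter are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_words(term, lexicon, max_distance=2):
    """Lexicon words within `max_distance` edits of `term`, closest first."""
    scored = sorted((levenshtein(term, word), word) for word in lexicon)
    return [word for distance, word in scored if distance <= max_distance]
```

Because the distance only compares characters, this works for any language, which is why it can replace language-specific phonetic codes such as soundex or metaphone.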
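Phrase and near searches need to know where each word occurs in a document. The sketch below keeps a plain list of word positions per term; it does not reproduce the compact encoding of ZCTextIndex's WidCode module, and `build_positions`, `near` and `phrase` are hypothetical names.

```python
from collections import defaultdict


def build_positions(doc_terms):
    """Map each term to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    for pos, term in enumerate(doc_terms):
        positions[term].append(pos)
    return positions


def near(positions, term_a, term_b, distance):
    """True if term_a and term_b occur within `distance` words of each other."""
    pa, pb = positions.get(term_a, []), positions.get(term_b, [])
    i = j = 0
    # Merge-walk both sorted lists, always advancing the smaller position.
    while i < len(pa) and j < len(pb):
        if abs(pa[i] - pb[j]) <= distance:
            return True
        if pa[i] < pb[j]:
            i += 1
        else:
            j += 1
    return False


def phrase(positions, terms):
    """True if `terms` occur as consecutive words (an exact phrase)."""
    sets = [set(positions.get(t, ())) for t in terms]
    return bool(terms) and any(
        all(start + k in s for k, s in enumerate(sets)) for start in sets[0]
    )
```

Usage: with `pos = build_positions("the zope catalog indexes zope objects".split())`, `phrase(pos, ["zope", "catalog"])` is true and `near(pos, "the", "objects", 3)` is false.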
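Relevance ranking uses a cosine measure over a vector model. TextIndexNG's "extended" variant is not specified here, so the following is only the standard tf-idf cosine similarity: each document and the query become vectors of term weights, and the score is the cosine of the angle between them. `cosine_rank` and its input layout are assumptions.

```python
import math
from collections import Counter


def cosine_rank(query_terms, documents):
    """Rank documents by tf-idf cosine similarity to the query.

    documents: mapping of doc id -> list of (already normalized) terms.
    Returns [(doc_id, score), ...], best first; zero-score documents are dropped.
    """
    n_docs = len(documents)
    df = Counter()  # in how many documents does each term occur?
    for terms in documents.values():
        df.update(set(terms))
    # +1 so that a term present in every document still carries some weight
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in df}

    def weights(terms):
        tf = Counter(t for t in terms if t in idf)
        return {t: f * idf[t] for t, f in tf.items()}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q = weights(query_terms)
    results = []
    for doc_id, terms in documents.items():
        d = weights(terms)
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        if dot > 0:  # both norms are non-zero whenever dot > 0
            results.append((doc_id, dot / (norm(q) * norm(d))))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

A document in which the query terms make up a larger share of its weight scores higher, so `["zope", "index", "zope"]` outranks `["zope", "catalog"]` for the query `["zope"]`.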
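Globbing and left-truncation can both be answered from sorted word lists. A "prefix*" query is a binary search in the sorted lexicon; a "*suffix" query becomes a prefix search in a second list of reversed words, which is why left-truncation costs extra memory and is optional. This toy `Lexicon` class is hypothetical and only handles the two truncation forms.

```python
import bisect


class Lexicon:
    """Toy lexicon answering 'prefix*' and, if enabled, '*suffix' queries."""

    def __init__(self, words, left_truncation=False):
        self.words = sorted(set(words))
        self._word_set = set(self.words)
        # Left truncation needs a second index of reversed words,
        # so it is only built on request (it costs extra memory).
        self.reversed_words = (
            sorted(w[::-1] for w in self.words) if left_truncation else None
        )

    @staticmethod
    def _with_prefix(sorted_words, prefix):
        """Binary-search to the first candidate, then scan while it matches."""
        result = []
        i = bisect.bisect_left(sorted_words, prefix)
        while i < len(sorted_words) and sorted_words[i].startswith(prefix):
            result.append(sorted_words[i])
            i += 1
        return result

    def match(self, pattern):
        if pattern.endswith("*"):  # right truncation: "prefix*"
            return self._with_prefix(self.words, pattern[:-1])
        if pattern.startswith("*"):  # left truncation: "*suffix"
            if self.reversed_words is None:
                raise ValueError("left truncation is not enabled")
            hits = self._with_prefix(self.reversed_words, pattern[1:][::-1])
            return sorted(w[::-1] for w in hits)
        return [pattern] if pattern in self._word_set else []
```

With `Lexicon(["indexing", "index", "indent", "reindex", "zope"], left_truncation=True)`, the query `"*index"` finds `["index", "reindex"]` by looking up the prefix `"xedni"` among the reversed words.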
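Auto-expansion rewrites a query term "foo" as "foo*" when "foo" itself has no hit. A minimal sketch of that rule, assuming the lexicon is available as a set of words; `expand_if_missing` and its `auto_expand` flag (standing in for the index-wide setting) are hypothetical.

```python
def expand_if_missing(term, lexicon_words, auto_expand=True):
    """Return the lexicon words to search for `term`.

    If `term` is in the lexicon, search for it alone. Otherwise, when
    auto-expansion is enabled, treat it as "term*" and return every
    lexicon word starting with it.
    """
    if term in lexicon_words:
        return [term]
    if not auto_expand:
        return []
    # Linear scan for clarity; a real index would use its sorted lexicon.
    return sorted(w for w in lexicon_words if w.startswith(term))
```

So with a lexicon of `{"foobar", "foo2", "bar"}`, a search for `"foo"` falls back to `["foo2", "foobar"]`, while `"bar"` is looked up unchanged.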
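A range search like "Fi..Foo" selects every lexicon word between two bounds, which a sorted lexicon answers with two binary searches. The sketch assumes that both bounds are inclusive and that the lexicon is stored lowercased; neither is specified above, and `range_search` is a hypothetical name.

```python
import bisect


def range_search(sorted_words, query):
    """Return words w with low <= w <= high for a query "low..high".

    Assumes `sorted_words` is sorted and lowercased; bounds are inclusive.
    """
    low, sep, high = query.lower().partition("..")
    if not sep:
        raise ValueError("expected a query of the form 'low..high'")
    lo = bisect.bisect_left(sorted_words, low)    # first word >= low
    hi = bisect.bisect_right(sorted_words, high)  # first word > high
    return sorted_words[lo:hi]
```

For the lexicon `["fa", "fi", "fig", "file", "foo", "fop", "g"]`, the query `"Fi..Foo"` returns `["fi", "fig", "file", "foo"]`.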
