TextIndexNG
Category: Services
Current release
No stable release available yet.
Experimental releases
Upcoming and alpha/beta/candidate releases
- Alpha releases should only be used for testing and development.
- Beta releases and Release Candidates are normally released for production testing, but should not be used on mission-critical sites.
- Always install on a separate test server first, and make sure you have proper backups before installing.
Project Description
Features
- DocumentConverters
- StemmerSupport for 13 languages
- SimilaritySearch for English text (based on the Levenshtein distance)
- NearSearch
- PluggableParsers
- extended StopWords support
- full integration in ZCatalog
- TestFunctionality through ZMI
- ExtensibleArchitecture
- being MoreEfficient than the current TextIndex
- full globbing support (wildcard search)
- NormalizationSupport (e.g. reducing accented characters to their base form)
- full UnicodeAwareness
- Relevance ranking of search results added. Searches are now ranked using an extended cosine measure. The cosine measure is based on a vector model and calculates the document "score" based on the frequency of the query terms inside the document result set.
- Much faster phrase/near search: the old implementation of TextIndexNG had to perform a very expensive job at query time when phrase/near search was performed. Re-using the WidCode module of ZCTextIndex made this operation less expensive.
- Left-truncation added: TextIndexNG can be configured at creation time to support left-truncation (meaning you can search for "*suffix"). Left-truncation is optional because it requires a second, reversed index inside the lexicon and much more memory.
- optional auto-expansion support: This optional feature allows you to get better search results when some of the query terms could not be found. The index expands a query term "foo" to "foo*" if there was no hit for "foo". This expansion is currently global for the index. This feature will be available on a per-query basis in a later version. (Auto-expansion will be extended in a later version to search for similar terms.)
- improved HTML converter: now using Chris Withers' "Strip-o-Gram" module instead of the Strip-Tag-Parser
- added converter for text/sgml
- Similarity search (soundex, metaphone, doublemetaphone) dropped and replaced with a more general, language-independent approach using the Levenshtein distance.
- range searches like "Fi..Foo"
- substring searches "substring"
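The Levenshtein-based similarity search listed above can be illustrated with a small standalone sketch. This is not TextIndexNG's code: the function names `levenshtein` and `similar_terms` and the `max_distance` parameter are assumptions made for illustration, and the lexicon is modelled as a plain list of words.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similar_terms(term, lexicon, max_distance=1):
    """Hypothetical helper: words in the lexicon within max_distance edits."""
    return sorted(w for w in lexicon if levenshtein(term, w) <= max_distance)
```

Because the distance only counts character edits, it needs no language-specific phonetic rules, which is what makes it more general than soundex or metaphone.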
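The relevance ranking entry above describes a cosine measure over a vector model. The sketch below shows the plain cosine similarity between term-frequency vectors, not the "extended" variant TextIndexNG uses, whose details are not given here. The names `cosine_score` and `rank` and the dictionary-of-token-lists document model are assumptions for illustration.

```python
import math
from collections import Counter


def cosine_score(query_terms, doc_terms):
    """Cosine of the angle between query and document term-frequency vectors."""
    query_vec = Counter(query_terms)
    doc_vec = Counter(doc_terms)
    dot = sum(query_vec[t] * doc_vec[t] for t in query_vec)
    norm = (math.sqrt(sum(v * v for v in query_vec.values()))
            * math.sqrt(sum(v * v for v in doc_vec.values())))
    return dot / norm if norm else 0.0


def rank(query_terms, docs):
    """Return ids of matching documents, best score first (ties by id)."""
    scored = [(cosine_score(query_terms, terms), doc_id)
              for doc_id, terms in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]
```

A document that mentions a query term often relative to its overall length scores higher than a long document that mentions it once.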
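The memory cost of left-truncation mentioned above comes from keeping every word a second time in reversed form, so that a "*suffix" query becomes an ordinary prefix lookup. A minimal sketch of that idea follows, assuming a sorted-list lexicon. The `Lexicon` class and its method names are hypothetical and do not reflect TextIndexNG's internal data structures.

```python
import bisect


class Lexicon:
    """Toy lexicon with an optional reversed word list for "*suffix" queries."""

    def __init__(self, words, left_truncation=False):
        self.forward = sorted(set(words))
        # The second, reversed list is what costs the extra memory.
        self.reverse = (sorted(w[::-1] for w in self.forward)
                        if left_truncation else None)

    @staticmethod
    def _with_prefix(sorted_words, prefix):
        # All words sharing a prefix are contiguous in a sorted list.
        start = bisect.bisect_left(sorted_words, prefix)
        matches = []
        for i in range(start, len(sorted_words)):
            if not sorted_words[i].startswith(prefix):
                break
            matches.append(sorted_words[i])
        return matches

    def right_truncated(self, prefix):
        """Answer a "prefix*" query."""
        return self._with_prefix(self.forward, prefix)

    def left_truncated(self, suffix):
        """Answer a "*suffix" query via a prefix lookup on reversed words."""
        if self.reverse is None:
            raise ValueError("lexicon was created without left-truncation")
        reversed_hits = self._with_prefix(self.reverse, suffix[::-1])
        return sorted(w[::-1] for w in reversed_hits)
```

Making the reversed list optional at construction time mirrors the trade-off in the feature description: indexes that never need "*suffix" queries do not pay for the second list.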
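The auto-expansion behaviour described above (retrying "foo" as "foo*" when "foo" has no hits) can be sketched as follows. The index is modelled here as a plain dictionary mapping words to sets of document ids, and the `search` function and its `autoexpand` flag are assumptions for illustration only.

```python
def search(index, term, autoexpand=True):
    """Look up term exactly; if nothing matches, retry it as a prefix query."""
    hits = set(index.get(term, ()))
    if hits or not autoexpand:
        return hits
    # No exact hit: behave as if the user had typed term + "*".
    for word, doc_ids in index.items():
        if word.startswith(term):
            hits.update(doc_ids)
    return hits
```

For example, with an index containing only "foobar" and "food", a query for "foo" returns the documents for both words, while a query for an existing word is never expanded.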