CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex

by Junyong Pan last modified Feb 16, 2011 02:03 AM

CJKSplitter is a ZCTextIndex splitter for CJK (Chinese, Japanese, Korean) text. It makes Zope/Plone capable of searching CJK words.

Project Description

CJKSplitter is a ZCTextIndex splitter for CJK (Chinese, Japanese, Korean) text stored as Unicode. It uses a simple but workable "hack" instead of doing real word splitting from dictionaries. Compared to a dictionary-based word splitter, this produces a bigger index and more matches than necessary, but that is a cheap price to pay for the reduced complexity.

Features

  • Compatible with the default English whitespace splitter.
  • Supports multiple encodings: unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5. Three splitters are provided (more to come):
    • 'CJK splitter': supports unicode/utf-8 encodings.
    • 'CJK GB splitter': supports unicode/gb18030/gbk/gb2312/mbcs encodings.
    • 'CJK BIG5 splitter': supports unicode/big5/mbcs encodings.
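Supporting several byte encodings amounts to turning the input into Unicode before splitting. The sketch below assumes each splitter simply tries its listed encodings in order; the mapping `SPLITTER_ENCODINGS` and the function `to_unicode` are hypothetical illustrations, not the product's real lookup logic.

```python
import codecs

# Hypothetical mapping from the splitter names on this page to the byte
# encodings they list. gb18030 is a superset of gbk and gb2312, so it is
# tried first.
SPLITTER_ENCODINGS = {
    "CJK splitter": ["utf-8"],
    "CJK GB splitter": ["gb18030", "gbk", "gb2312", "mbcs"],
    "CJK BIG5 splitter": ["big5", "mbcs"],
}


def to_unicode(data, splitter_name):
    """Return data as str, decoding bytes with the splitter's encodings."""
    if isinstance(data, str):
        return data  # already Unicode: accepted by every splitter
    for encoding in SPLITTER_ENCODINGS[splitter_name]:
        try:
            codecs.lookup(encoding)  # the 'mbcs' codec exists only on Windows
        except LookupError:
            continue
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next listed encoding
    raise ValueError(f"cannot decode input for {splitter_name!r}")


print(to_unicode("中文".encode("big5"), "CJK BIG5 splitter"))
```

Once decoded, the text can be split as ordinary Unicode, which is why every splitter also accepts unicode input directly.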

About ZOpen

ZOpen is a professional Zope/Plone consulting company located in Shanghai, China. We also run CZUG.org (China Zope User Group). We are working to make Zope/CMF/Plone work for Chinese users.

Current Release

No stable release available yet.

If you are interested in the source code of this project, you can get it from the Code repository.

