CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex
Category: Services
Current release
No stable release available yet.
If you are interested in getting the source code of this project, you can get it from the code repository.
Experimental releases
There are no experimental releases available at the moment.
Project Description
CJKSplitter is a ZCTextIndex splitter for CJK (Chinese, Japanese, Korean) text stored as Unicode. It uses a simple but workable "hack" instead of attempting real dictionary-based word splitting. Compared to a dictionary-based word splitter, this produces a bigger index and more matches than necessary, but that is a small price to pay for the reduced complexity.
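The description does not say what the "hack" is. One approach consistent with the trade-off described (a bigger index and extra matches, with no dictionary) is to index overlapping two-character pairs, or bigrams. The sketch below assumes that approach; the name `bigram_split` and the character ranges are illustrative and are not taken from the CJKSplitter source.

```python
import re

# Assumed character ranges: kana, CJK unified ideographs, Hangul syllables.
_CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
# A token is either a run of CJK characters or a run of other word characters.
_TOKEN = re.compile("[%s]+|[^\\W%s]+" % (_CJK, _CJK))
_IS_CJK = re.compile("[%s]" % _CJK)


def bigram_split(text):
    """Split Unicode text into index terms (hypothetical helper).

    Non-CJK runs are kept whole and lowercased, like a whitespace splitter.
    CJK runs are broken into overlapping two-character terms.
    """
    terms = []
    for run in _TOKEN.findall(text):
        if _IS_CJK.match(run):
            if len(run) == 1:
                terms.append(run)  # a lone character is its own term
            else:
                terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            terms.append(run.lower())
    return terms


print(bigram_split("Zope 中文分词"))
```

A four-character run yields three terms under this scheme, where a dictionary-based splitter might yield two words, which illustrates why the index grows and why some matches are spurious.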
Features
- Compatible with the default English whitespace splitter.
- Supports multiple encodings: unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5.
- Provides three splitters (more to come):
  - 'CJK splitter': supports unicode/utf-8 encodings.
  - 'CJK GB splitter': supports unicode/gb18030/gbk/gb2312/mbcs encodings.
  - 'CJK BIG5 splitter': supports unicode/big5/mbcs encodings.
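One way to picture the per-splitter encoding support is a decode step that tries each candidate encoding in turn before any splitting happens. The sketch below is an assumption about how such a step could look, not the product's actual code: `SPLITTER_ENCODINGS` and `to_unicode` are hypothetical names, and `mbcs` is left out because that codec exists only on Windows.

```python
# Hypothetical mapping from splitter name to the byte encodings it tries.
# gb18030 is a superset of gbk and gb2312, so it stands in for all three.
SPLITTER_ENCODINGS = {
    "CJK splitter": ("utf-8",),
    "CJK GB splitter": ("gb18030",),
    "CJK BIG5 splitter": ("big5",),
}


def to_unicode(data, splitter="CJK splitter"):
    """Return text as Unicode, decoding bytes with the splitter's encodings."""
    if isinstance(data, str):
        return data  # already Unicode, nothing to decode
    for encoding in SPLITTER_ENCODINGS[splitter]:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("could not decode input for %r" % splitter)


gb_bytes = "中文".encode("gbk")
print(to_unicode(gb_bytes, "CJK GB splitter"))
```

Keeping a separate splitter per encoding family avoids guessing: the same byte sequence can be valid in more than one legacy encoding, so the choice is made when the index is configured rather than at decode time.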
About ZOpen
ZOpen is a professional Zope/Plone consulting company located in Shanghai, China. We also run CZUG.org (China Zope User Group). We are trying to make Zope/CMF/Plone work for Chinese users.