scikits.learn.feature_extraction.text.CharNGramAnalyzer

class scikits.learn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)

Compute character n-gram features of a text document.
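To illustrate what "character n-gram features" means for the `min_n` and `max_n` parameters, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `char_ngrams` is hypothetical, and plain lowercasing is an assumed stand-in for the `RomanPreprocessor()` default. For every length n from `min_n` to `max_n` (inclusive), a window of width n slides over the text one character at a time.

```python
def char_ngrams(text, min_n=3, max_n=6):
    """Return every character n-gram of `text` with min_n <= n <= max_n.

    Hypothetical sketch: lowercasing stands in for the real preprocessor.
    """
    text = text.lower()
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the string.
        for i in range(len(text) - n + 1):
            ngrams.append(text[i:i + n])
    return ngrams


print(char_ngrams("Hello", min_n=3, max_n=4))
# → ['hel', 'ell', 'llo', 'hell', 'ello']
```

Texts shorter than `min_n` characters yield no features at all, and the number of n-grams grows roughly with `len(text) * (max_n - min_n + 1)`.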

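This analyzer is useful because it is language-agnostic: it operates on raw character sequences rather than on words, so it works well even for languages where word segmentation is less straightforward than in English, such as Chinese (written without spaces between words) or German (which forms long compound words).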
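The following sketch suggests why character n-grams help with German compounds. It uses a hypothetical helper `char_ngram_set` (not part of scikits.learn) with the same `min_n=3, max_n=6` defaults as the class signature, and compares a compound word with one of its parts. As whole-word tokens the two share nothing, while their character n-gram sets overlap heavily.

```python
def char_ngram_set(text, min_n=3, max_n=6):
    """Hypothetical helper: the set of distinct character n-grams of `text`."""
    text = text.lower()
    return {text[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)}


compound = "Donaudampfschiff"
part = "Dampfschiff"

# Word-level features: the two tokens are simply different strings.
word_overlap = {compound.lower()} & {part.lower()}
# Character-level features: every n-gram of the part also occurs in the compound.
char_overlap = char_ngram_set(compound) & char_ngram_set(part)

print(len(word_overlap))  # → 0
print(len(char_overlap))  # → 30
```

The same reasoning applies to unsegmented text such as Chinese: no tokenizer is needed, because features are taken directly from the character stream.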