6.15.4. scikits.learn.feature_extraction.text.CharNGramAnalyzer

class scikits.learn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)

Compute character n-gram features of a text document.

This analyzer is useful because it is language agnostic: it works well even for languages where word segmentation is less trivial than in English, such as Chinese and German.

Because of this, it can be considered a basic morphological analyzer.
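
The idea can be illustrated with a minimal, standalone sketch. This is not the library's implementation: the helper name char_ngrams, the lowercasing step, and the whitespace-collapsing pattern are assumptions made for illustration. Only the min_n=3 and max_n=6 defaults are taken from the class signature above.

```python
import re

# Hypothetical helper for illustration only; not part of scikits.learn.
# Assumption: runs of whitespace are collapsed to a single space.
_WHITE_SPACES = re.compile(r"\s+")


def char_ngrams(text, min_n=3, max_n=6):
    """Return every character n-gram of `text` with min_n <= n <= max_n."""
    # Assumption: simple normalization (lowercase, collapse whitespace).
    text = _WHITE_SPACES.sub(" ", text.lower())
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the text; texts shorter than n
        # yield an empty range and therefore contribute nothing.
        for i in range(len(text) - n + 1):
            ngrams.append(text[i:i + n])
    return ngrams
```

Because the window slides over raw characters, no word boundaries are needed, and overlapping n-grams capture shared stems and affixes across related word forms.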

Methods

analyze(text_document)

__init__(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)

white_spaces = <_sre.SRE_Pattern object at 0x292f988>
