6.15.4. scikits.learn.feature_extraction.text.CharNGramAnalyzer
- class scikits.learn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
Compute character n-gram features of a text document.
This analyzer is language agnostic: it works well even for languages where word segmentation is less straightforward than in English, such as Chinese and German.
Because character n-grams capture subword units such as prefixes, suffixes, and stems, it can be considered a basic morphological analyzer.
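To make the idea of character n-grams concrete, here is a minimal standalone sketch in plain Python. It is not the class's actual implementation: the helper name `char_ngrams` is hypothetical, and the lowercasing and whitespace collapsing are assumed normalization steps chosen for illustration. Only the `min_n` and `max_n` defaults are taken from the signature above.

```python
import re


def char_ngrams(text, min_n=3, max_n=6):
    """Return every character n-gram of `text` with min_n <= n <= max_n.

    Hypothetical helper for illustration only; it does not reproduce
    CharNGramAnalyzer.analyze or its preprocessor.
    """
    # Assumed normalization: lowercase, collapse whitespace runs to one space.
    text = re.sub(r"\s+", " ", text.lower()).strip()

    ngrams = []
    for n in range(min_n, max_n + 1):
        # The inner range is empty when the text is shorter than n.
        for i in range(len(text) - n + 1):
            ngrams.append(text[i:i + n])
    return ngrams
```

For example, `char_ngrams("abcd", min_n=2, max_n=3)` returns `['ab', 'bc', 'cd', 'abc', 'bcd']`. No tokenizer is involved, which is why the approach carries over to text without explicit word boundaries.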
Methods
analyze(text_document)
- __init__(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
- white_spaces = <_sre.SRE_Pattern object at 0x292f988>