scikits.learn.feature_extraction.text.CharNGramAnalyzer
- class scikits.learn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
Compute character n-gram features of a text document.
This analyzer is language agnostic, and it works well even for languages where word segmentation is less trivial than in English, such as Chinese and German.
Because of this, it can be considered a basic morphological analyzer.
Methods
- __init__(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
- analyze(text_document)
Convert a text document into a list of tokens (character n-grams).
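To illustrate the idea of character n-gram extraction, here is a minimal standalone sketch in plain Python. It is not the library's implementation: the helper name `char_ngrams` is hypothetical, and the lowercasing and whitespace normalization are assumed stand-ins for whatever the configured preprocessor does. Only the `min_n` and `max_n` parameter names and defaults are taken from the signature above.

```python
import re


def char_ngrams(text, min_n=3, max_n=6):
    """Hypothetical helper: list every character n-gram with min_n <= n <= max_n."""
    # Assumed preprocessing (not taken from the library): lowercase the text
    # and collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text.lower()).strip()

    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the text; if the text is shorter
        # than n, the range is empty and nothing is added.
        for i in range(len(text) - n + 1):
            ngrams.append(text[i:i + n])
    return ngrams


if __name__ == "__main__":
    print(char_ngrams("abcd", min_n=3, max_n=4))
```

Because the window slides over raw characters rather than words, no tokenizer or word boundary rule is needed, which is what makes this kind of analysis usable on text where word segmentation is hard.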