9.16.2.3. sklearn.feature_extraction.text.CharNGramAnalyzer
- class sklearn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
Compute character n-gram features of a text document.
This analyzer is language agnostic: it works well even for languages where word segmentation is less straightforward than in English, such as Chinese and German.
Because of this, it can be considered a basic morphological analyzer.
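The idea of character n-grams can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the helper name char_ngrams is hypothetical, and only the min_n and max_n defaults are taken from the signature above. Charset decoding and preprocessing are left out.

```python
def char_ngrams(text, min_n=3, max_n=6):
    """Return every character n-gram of text for min_n <= n <= max_n.

    Hypothetical helper for illustration only; it is not part of
    sklearn.feature_extraction.text.
    """
    ngrams = []
    for n in range(min_n, max_n + 1):
        # A window of width n fits at len(text) - n + 1 start positions;
        # the range is empty when the text is shorter than n.
        for start in range(len(text) - n + 1):
            ngrams.append(text[start:start + n])
    return ngrams


tokens = char_ngrams("hello", min_n=3, max_n=4)
print(tokens)  # ['hel', 'ell', 'llo', 'hell', 'ello']
```

Because the windows slide over raw characters, no word boundaries are needed, which is why the approach carries over to languages without whitespace-delimited words.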
Methods
analyze(text_document): Convert a text document into character n-gram tokens.
set_params(**params): Set the parameters of the estimator.
- __init__(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
- analyze(text_document)
Convert a text document into character n-gram tokens.
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self
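The <component>__<parameter> naming convention can be sketched as follows. This is an illustration under stated assumptions, not scikit-learn's code: the class SketchEstimator and the attribute name analyzer are hypothetical stand-ins chosen for the example.

```python
class SketchEstimator:
    """Hypothetical stand-in for an estimator; not scikit-learn code."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            # Split once on the double underscore: "<component>__<parameter>".
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested form: delegate the parameter to the named component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Simple form: set the attribute on this object.
                setattr(self, name, value)
        return self


analyzer = SketchEstimator(min_n=3, max_n=6)
vectorizer = SketchEstimator(analyzer=analyzer)

# Update a parameter of the nested component through the outer object.
vectorizer.set_params(analyzer__max_n=4)
print(analyzer.max_n)  # 4
```

Returning self lets calls be chained, for example vectorizer.set_params(analyzer__min_n=2).set_params(analyzer__max_n=5).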
- white_spaces = <_sre.SRE_Pattern object>