This documentation is for scikit-learn version 0.11-git.

8.7.2.3. sklearn.feature_extraction.text.CharNGramAnalyzer

class sklearn.feature_extraction.text.CharNGramAnalyzer(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)

Compute character n-gram features of a text document.

This analyzer is language agnostic: it works well even for languages where word segmentation is less trivial than in English, such as Chinese and German.

Because of this, it can be considered a basic morphological analyzer.
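To make the idea concrete, here is a minimal pure-Python sketch of character n-gram extraction. It is not the library's implementation: the function name `char_ngrams` is hypothetical, and the lowercasing and whitespace collapsing are only an assumed stand-in for what a preprocessor might do. The `min_n` and `max_n` arguments mirror the constructor parameters documented above.

```python
def char_ngrams(text, min_n=3, max_n=6):
    """Return every character n-gram of `text` with min_n <= n <= max_n.

    Illustrative sketch only; not scikit-learn's CharNGramAnalyzer.
    """
    # Assumed preprocessing: lowercase and collapse runs of whitespace.
    text = " ".join(text.lower().split())

    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the string.
        for start in range(len(text) - n + 1):
            ngrams.append(text[start:start + n])
    return ngrams


tokens = char_ngrams("Haus", min_n=3, max_n=4)
print(tokens)  # ['hau', 'aus', 'haus']
```

Because the windows ignore word boundaries, sub-word fragments such as stems and affixes appear as features without any language-specific tokenizer.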

Methods

analyze(text_document)    Convert a text document into a sequence of tokens.
set_params(**params)      Set the parameters of the estimator.
__init__(charset='utf-8', preprocessor=RomanPreprocessor(), min_n=3, max_n=6)
analyze(text_document)

Convert a text document into a sequence of tokens.

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.

Returns: self
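The <component>__<parameter> convention can be illustrated with a small self-contained sketch. The classes below (`ToyBase`, `ToyPreprocessor`, `ToyAnalyzer`) and the `lowercase` parameter are hypothetical stand-ins written for this example; they are not scikit-learn classes and only approximate how such a `set_params` might behave.

```python
class ToyBase:
    """Hypothetical base class; not part of scikit-learn."""

    def set_params(self, **params):
        for key, value in params.items():
            # "component__parameter" targets a parameter of a nested object;
            # a plain name targets a parameter of this object.
            name, sep, sub_name = key.partition("__")
            if not hasattr(self, name):
                raise ValueError("Invalid parameter %r" % name)
            if sep:
                getattr(self, name).set_params(**{sub_name: value})
            else:
                setattr(self, name, value)
        # Returning self allows calls to be chained.
        return self


class ToyPreprocessor(ToyBase):
    def __init__(self, lowercase=True):
        self.lowercase = lowercase


class ToyAnalyzer(ToyBase):
    def __init__(self, preprocessor, min_n=3, max_n=6):
        self.preprocessor = preprocessor
        self.min_n = min_n
        self.max_n = max_n


analyzer = ToyAnalyzer(preprocessor=ToyPreprocessor())
result = analyzer.set_params(max_n=4, preprocessor__lowercase=False)
```

Here `max_n=4` updates the analyzer itself, while `preprocessor__lowercase=False` is routed to the nested preprocessor object.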
white_spaces = <_sre.SRE_Pattern object> (class attribute holding a compiled regular expression)