8.7.2.6. sklearn.feature_extraction.text.Vectorizer

Vectorizer(analyzer=WordNGramAnalyzer(charset='utf-8', max_n=1, min_n=1, preprocessor=RomanPreprocessor(), stop_words='english', token_pattern=u'\b\w\w+\b'), max_df=1.0, max_features=None, norm='l2', use_idf=True, smooth_idf=True)

Convert a collection of raw documents to a matrix of tf–idf features.

Equivalent to CountVectorizer followed by TfidfTransformer.
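The two-stage equivalence can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names `count_terms` and `tfidf_weight` are hypothetical, stop-word removal is skipped, lowercasing stands in for the preprocessor, and the smoothed idf formula used here is an assumption that may differ from what this version of the library computes.

```python
import math
import re
from collections import Counter

# Same regular expression as the default token_pattern in the signature above.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def count_terms(raw_documents):
    """Stage 1 (the CountVectorizer role): per-document term counts."""
    return [Counter(TOKEN_PATTERN.findall(doc.lower())) for doc in raw_documents]


def tfidf_weight(counts):
    """Stage 2 (the TfidfTransformer role): idf reweighting, then l2 normalisation.

    Assumed smoothed idf: log((1 + n_docs) / (1 + df)) + 1.
    """
    n_docs = len(counts)
    vocabulary = sorted(set().union(*counts))
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocabulary}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocabulary}
    rows = []
    for c in counts:
        row = [c[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocabulary, rows


docs = ["the cat sat", "the dog sat down", "cat and dog"]
vocabulary, matrix = tfidf_weight(count_terms(docs))
```

Each row of `matrix` corresponds to one document and each column to one entry of `vocabulary`; with norm='l2' every non-empty row has unit Euclidean length.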

Methods

`fit`
`fit_transform`
`inverse_transform`
`set_params`
`transform`

Vectorizer.fit(raw_documents)

Learn a mapping from raw documents to array data.

Vectorizer.fit_transform(raw_documents, y=None)

Learn the representation and return the vectors.

Parameters :

raw_documents : iterable

An iterable which yields either str, unicode or file objects.

Returns :

vectors : array, shape = [n_samples, n_features]

Vectorizer.inverse_transform(X)

Return terms per document with nonzero entries in X.

Parameters :

X : {array, sparse matrix}, shape = [n_samples, n_features]

Returns :

X_inv : list of arrays, len = n_samples

List of arrays of terms.
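As a rough illustration of what "terms per document with nonzero entries" means, the sketch below works on a dense list of lists. The function name `inverse_transform_sketch` and the three-term vocabulary are hypothetical; the real method also accepts sparse matrices and returns arrays rather than lists.

```python
def inverse_transform_sketch(X, feature_names):
    """Return, for each row of X, the terms whose entries are nonzero."""
    return [
        [feature_names[j] for j, value in enumerate(row) if value != 0]
        for row in X
    ]


feature_names = ["cat", "dog", "sat"]  # hypothetical vocabulary, in column order
X = [
    [0.7, 0.0, 0.7],
    [0.0, 1.0, 0.0],
]
terms_per_document = inverse_transform_sketch(X, feature_names)
print(terms_per_document)  # [['cat', 'sat'], ['dog']]
```

The weights themselves are discarded: only the presence of a term in each document is recovered.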

Vectorizer.set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns :

self
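The <component>__<parameter> convention can be sketched with a toy class. `ToyEstimator` is hypothetical and not part of scikit-learn; unlike the real method, it does not validate parameter names, it only shows how a double-underscore key is routed to a nested object.

```python
class ToyEstimator:
    """Hypothetical estimator used only to illustrate parameter routing."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # <component>__<parameter>: delegate to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Plain parameter: set it on this object.
                setattr(self, name, value)
        return self


analyzer = ToyEstimator(max_n=1, min_n=1)
vectorizer = ToyEstimator(analyzer=analyzer, norm="l2")
result = vectorizer.set_params(norm="l1", analyzer__max_n=2)
```

After the call, `vectorizer.norm` is "l1" and the nested `analyzer.max_n` is 2, while `analyzer.min_n` is unchanged. Returning `self` is what allows calls to be chained.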

Vectorizer.transform(raw_documents, copy=True)

Transform raw text documents to tf–idf vectors

Parameters :

raw_documents : iterable

An iterable which yields either str, unicode or file objects.

Returns :

vectors : sparse matrix, shape = [n_samples, n_features]
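The split between fit and transform can be sketched as follows. `TinyTfidf` is a hypothetical stand-in, not scikit-learn code: it returns dense lists instead of a sparse matrix, uses an assumed smoothed idf formula, and assumes that terms not seen during fit are simply ignored at transform time.

```python
import math
import re
from collections import Counter


class TinyTfidf:
    """Hypothetical stand-in showing the fit/transform split."""

    token_pattern = re.compile(r"\b\w\w+\b")

    def _tokens(self, doc):
        return self.token_pattern.findall(doc.lower())

    def fit(self, raw_documents):
        # Learn the vocabulary (term -> column index) and one idf weight per term.
        docs = [set(self._tokens(d)) for d in raw_documents]
        terms = sorted(set().union(*docs))
        self.vocabulary_ = {t: j for j, t in enumerate(terms)}
        n = len(docs)
        # Assumed smoothed idf; the library's exact formula may differ.
        self.idf_ = [
            math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1.0 for t in terms
        ]
        return self

    def transform(self, raw_documents):
        # Apply the learned vocabulary and idf weights to new documents.
        rows = []
        for doc in raw_documents:
            counts = Counter(self._tokens(doc))
            row = [0.0] * len(self.vocabulary_)
            for term, count in counts.items():
                j = self.vocabulary_.get(term)
                if j is not None:  # terms unseen during fit are ignored
                    row[j] = count * self.idf_[j]
            norm = math.sqrt(sum(v * v for v in row))
            rows.append([v / norm for v in row] if norm else row)
        return rows


model = TinyTfidf().fit(["the cat sat", "the dog sat down"])
vectors = model.transform(["the cat barked", "zebra"])
```

Both output rows have one column per term learned during fit, so n_features is fixed by the training documents; the second row is all zeros because "zebra" was never seen.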
