scikits.learn.feature_extraction.text.TfidfTransformer
- class scikits.learn.feature_extraction.text.TfidfTransformer(use_tf=True, use_idf=True)
Transform a count matrix to a TF or TF-IDF representation
TF means term-frequency while TF-IDF means term-frequency times inverse document-frequency:
http://en.wikipedia.org/wiki/TF-IDF
The goal of using TF-IDF instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
TF-IDF can be seen as a smooth alternative to stop-word filtering.
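The down-weighting described above can be illustrated with a minimal pure-Python sketch. It is not this class's implementation: the formulas used here (term frequency as count divided by document length, and IDF as log(n_docs / document_frequency)) are a common textbook variant assumed for illustration, and the helper name tf_idf and the toy vocabulary are hypothetical.

```python
import math


def tf_idf(counts):
    """Textbook TF-IDF over a dense list-of-lists count matrix (illustrative only)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Assumed IDF formula; a term present in every document gets weight 0.
    idf = [math.log(n_docs / df[j]) if df[j] else 0.0 for j in range(n_terms)]
    weighted = []
    for row in counts:
        total = sum(row)
        # Term frequency: counts normalized by document length.
        tf = [c / total if total else 0.0 for c in row]
        weighted.append([tf[j] * idf[j] for j in range(n_terms)])
    return weighted


# Hypothetical columns: "the", "cat", "dog". "the" appears in every document.
counts = [
    [3, 1, 0],
    [2, 0, 1],
    [4, 0, 0],
]
weights = tf_idf(counts)
```

In this toy corpus the first column behaves like a stop word: it occurs in every document, so its weight drops to zero, while the rarer terms keep a positive weight.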
Parameters :
use_tf: boolean
enable term-frequency normalization
use_idf: boolean
enable inverse-document-frequency reweighting
Methods
- __init__(use_tf=True, use_idf=True)
- fit(X, y=None)
Learn the IDF vector (global term weights)
Parameters :
X: sparse matrix, [n_samples, n_features]
a matrix of term/token counts
- transform(X, copy=True)
Transform a count matrix to a TF or TF-IDF representation
Parameters :
X: sparse matrix, [n_samples, n_features]
a matrix of term/token counts
Returns :
vectors: sparse matrix, [n_samples, n_features]
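
The fit/transform split documented above (learn the global IDF weights once, then reweight any count matrix with them) can be sketched with a hypothetical stand-in class. TinyTfidf is not part of scikits.learn: it works on dense lists of counts instead of sparse matrices, assumes a textbook IDF of log(n_samples / document_frequency), and leaves out the copy argument because its behaviour is not described here.

```python
import math


class TinyTfidf:
    """Hypothetical stand-in mirroring the fit/transform interface (not the library code)."""

    def __init__(self, use_tf=True, use_idf=True):
        self.use_tf = use_tf
        self.use_idf = use_idf
        self.idf = None

    def fit(self, X, y=None):
        # Learn one global weight per feature (column) from the training counts.
        n_samples = len(X)
        n_features = len(X[0])
        self.idf = []
        for j in range(n_features):
            df = sum(1 for row in X if row[j] > 0)
            # Assumed formula: log(n_samples / document frequency).
            self.idf.append(math.log(n_samples / df) if df else 0.0)
        return self

    def transform(self, X):
        if self.use_idf and self.idf is None:
            raise ValueError("call fit() before transform() when use_idf is True")
        out = []
        for row in X:
            values = [float(c) for c in row]
            if self.use_tf:
                # Normalize counts by document length.
                total = sum(values)
                if total:
                    values = [v / total for v in values]
            if self.use_idf:
                # Reweight each feature by its learned global weight.
                values = [v * w for v, w in zip(values, self.idf)]
            out.append(values)
        return out


train = [[3, 1, 0], [2, 0, 1], [4, 0, 0]]
tfidf = TinyTfidf().fit(train).transform([[1, 1, 0]])
tf_only = TinyTfidf(use_idf=False).fit(train).transform([[1, 1, 0]])
```

With use_idf disabled the sketch returns plain length-normalized frequencies. With both flags enabled, the first feature, which occurs in every training document, is weighted down to zero.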