scikits.learn.feature_extraction.text.TfidfTransformer
class scikits.learn.feature_extraction.text.TfidfTransformer(use_tf=True, use_idf=True)

Transform a count matrix to a TF or TF-IDF representation.

TF means term frequency, while TF-IDF means term frequency times inverse document frequency (see http://en.wikipedia.org/wiki/TF-IDF).

The goal of using TF-IDF instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus. Such tokens are empirically less informative than features that occur in only a small fraction of the training corpus.

TF-IDF can be seen as a smooth alternative to stop-word filtering: instead of removing very common words outright, it gives them low weights.

Parameters:
- use_tf: boolean (default: True)
  Enable term-frequency normalization.
- use_idf: boolean (default: True)
  Enable inverse-document-frequency reweighting.

Methods

__init__(use_tf=True, use_idf=True)
fit(X, y=None)

Learn the IDF vector (global term weights).

Parameters:
- X: sparse matrix, [n_samples, n_features]
  A matrix of term/token counts.
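The fit step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's own code: a sparse row is modeled as a dict mapping token index to count, and the IDF of a token is assumed to be log(n_samples / df), where df is the number of documents containing the token, with no smoothing. The function name `fit_idf` is hypothetical.

```python
import math


def fit_idf(rows, n_features):
    """Hypothetical sketch of the fit step: one global IDF weight per token.

    rows: one dict per document mapping token index -> count (only
    non-zero counts are stored, as in a row of a sparse matrix).
    Assumes idf = log(n_samples / document_frequency), with no smoothing.
    """
    n_samples = len(rows)
    doc_freq = [0] * n_features
    for row in rows:
        for token, count in row.items():
            if count > 0:
                doc_freq[token] += 1  # token occurs in this document
    # Tokens that never occur get weight 0.0 to avoid dividing by zero.
    return [math.log(n_samples / df) if df else 0.0 for df in doc_freq]


# Four documents; token 0 occurs in all of them, token 1 in only one.
rows = [{0: 3, 1: 1}, {0: 2, 2: 1}, {0: 4}, {0: 1, 2: 2}]
idf = fit_idf(rows, n_features=3)
```

Under these assumptions, token 0 occurs in every document and gets an IDF of 0.0, while the rarer tokens 1 and 2 get positive weights of log(4) and log(2), which is the down-weighting of frequent tokens described in the class overview.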
transform(X, copy=True)

Transform a count matrix to a TF or TF-IDF representation.

Parameters:
- X: sparse matrix, [n_samples, n_features]
  A matrix of term/token counts.

Returns:
- vectors: sparse matrix, [n_samples, n_features]
  The TF or TF-IDF representation of X.
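How the two flags combine at transform time can be sketched as follows. This is an illustration under assumptions, not the scikits.learn implementation: term-frequency normalization is assumed to divide each count by the document's total count, IDF reweighting multiplies each entry by a per-token weight learned beforehand, and rows are again modeled as dicts of non-zero counts. The function name `transform_sketch` and the IDF values below are hypothetical.

```python
def transform_sketch(rows, idf, use_tf=True, use_idf=True):
    """Hypothetical sketch of the transform step.

    rows: one dict per document mapping token index -> count (only
    non-zero counts are stored, as in a row of a sparse matrix).
    idf: one weight per token, e.g. as learned by a fit step.
    Returns new dicts; the input rows are not modified.
    """
    vectors = []
    for row in rows:
        total = sum(row.values())  # number of tokens in the document
        new_row = {}
        for token, count in row.items():
            value = float(count)
            if use_tf and total > 0:
                value /= total  # assumed TF: count / document length
            if use_idf:
                value *= idf[token]  # global per-token reweighting
            new_row[token] = value
        vectors.append(new_row)
    return vectors


idf = [0.0, 1.0, 0.5]  # hypothetical weights, as if returned by a fit step
counts = [{0: 3, 1: 1}, {0: 2, 2: 2}]

tfidf = transform_sketch(counts, idf)                    # TF-IDF
tf_only = transform_sketch(counts, idf, use_idf=False)   # TF only
```

With `use_idf=False` the rows are only length-normalized, and with both flags set to False the sketch returns the raw counts as floats. Zero entries are never created, so the output stays as sparse as the input.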
 
