This documentation is for scikit-learn version 0.11-gitOther versions

Citing

If you use the software, please consider citing scikit-learn.

This page

8.4.1.3. sklearn.datasets.fetch_20newsgroups_vectorized

sklearn.datasets.fetch_20newsgroups_vectorized(subset='train', data_home=None)

Load the 20 newsgroups dataset and transform it into tf-idf vectors.

This is a convenience function; the tf-idf transformation is done using the default settings for sklearn.feature_extraction.text.Vectorizer. For more advanced usage (stopword filtering, n-gram extraction, etc.), combine fetch_20newsgroups with a custom Vectorizer or CountVectorizer.

Parameters :

subset: ‘train’ or ‘test’, ‘all’, optional :

Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both, with shuffled ordering.

data_home: optional, default: None :

Specify an download and cache folder for the datasets. If None, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

Returns :

bunch : Bunch object

bunch.data: sparse matrix, shape [n_samples, n_features] bunch.target: array, shape [n_samples] bunch.target_names: list, length [n_classes]