8.4.1.3. sklearn.datasets.fetch_20newsgroups_vectorized
- sklearn.datasets.fetch_20newsgroups_vectorized(subset='train', data_home=None)
Load the 20 newsgroups dataset and transform it into tf-idf vectors.
This is a convenience function; the tf-idf transformation is done using the default settings for sklearn.feature_extraction.text.Vectorizer. For more advanced usage (stopword filtering, n-gram extraction, etc.), combine fetch_20newsgroups with a custom Vectorizer or CountVectorizer.
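As a rough illustration of default versus customized vectorization, the sketch below runs scikit-learn's TfidfVectorizer on a tiny made-up corpus that stands in for the raw text returned by fetch_20newsgroups. It assumes a recent scikit-learn release, where TfidfVectorizer (not the Vectorizer class named above) is the tf-idf vectorizer; its defaults may differ in detail from the ones this function uses.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up stand-in for fetch_20newsgroups(...).data: a list of raw strings.
docs = [
    "The shuttle launch was delayed by the weather.",
    "The goalie made a great save in the final game.",
    "Weather satellites track storms from orbit.",
]

# Default settings: lowercased unigrams, no stop-word filtering,
# each row scaled to unit L2 norm.
default_vec = TfidfVectorizer()
X_default = default_vec.fit_transform(docs)
row_norms = np.linalg.norm(X_default.toarray(), axis=1)

# Custom settings: drop English stop words, then add word bigrams.
custom_vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X_custom = custom_vec.fit_transform(docs)

print(X_default.shape[0], X_custom.shape[0])  # 3 3
print("the" in default_vec.vocabulary_, "the" in custom_vec.vocabulary_)  # True False
print("shuttle launch" in custom_vec.vocabulary_)  # True
```

Because stop words are removed before bigrams are formed, "shuttle launch" becomes a feature while "the" disappears entirely.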
Parameters : subset: ‘train’, ‘test’ or ‘all’, optional, default: ‘train’ :
Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both sets combined, in shuffled order.
data_home: optional, default: None :
Specify a download and cache folder for the datasets. If None, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
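The ‘all’ option of subset can be pictured as stacking the training and test matrices and then shuffling rows and labels with one shared permutation, so every row of data keeps its matching entry in target. The sketch below shows that idea with NumPy and SciPy on toy data; combine_subsets is a hypothetical helper written for this example, not part of scikit-learn, and it is not claimed to be how fetch_20newsgroups_vectorized is implemented.

```python
import numpy as np
import scipy.sparse as sp


def combine_subsets(X_train, y_train, X_test, y_test, random_state=0):
    """Hypothetical helper: stack train and test, then shuffle rows and labels together."""
    X_all = sp.vstack([X_train, X_test]).tocsr()
    y_all = np.concatenate([y_train, y_test])
    rng = np.random.RandomState(random_state)
    perm = rng.permutation(X_all.shape[0])
    # Index rows and labels with the same permutation so they stay aligned.
    return X_all[perm], y_all[perm]


# Toy data: each row stores (label + 1) in its only column so alignment can be checked.
y_train = np.array([0, 1, 2])
y_test = np.array([1, 0])
X_train = sp.csr_matrix((y_train + 1).reshape(-1, 1).astype(float))
X_test = sp.csr_matrix((y_test + 1).reshape(-1, 1).astype(float))

X_all, y_all = combine_subsets(X_train, y_train, X_test, y_test)
print(X_all.shape)  # (5, 1)
print(X_all.toarray().ravel() - 1, y_all)  # the two arrays match element by element
```

Shuffling the matrix and the label array independently would silently break the pairing, which is why a single permutation is applied to both.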
Returns : bunch : Bunch object
bunch.data: sparse matrix, shape [n_samples, n_features]
bunch.target: array, shape [n_samples]
bunch.target_names: list, length [n_classes]
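To show how the three returned attributes fit together, the sketch below builds a stand-in Bunch from a made-up two-class corpus (so it runs without downloading anything) and trains a classifier on it. It assumes a recent scikit-learn where Bunch is importable from sklearn.utils and TfidfVectorizer is available; MultinomialNB is just one reasonable classifier for non-negative tf-idf features. With the real dataset, bunch would come from fetch_20newsgroups_vectorized instead.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils import Bunch

# Made-up stand-in corpus with two classes (0 = space, 1 = hockey).
docs = [
    "the rocket reached orbit",
    "nasa launched a new satellite",
    "the goalie stopped the puck",
    "a hockey team won the game",
]
vectorizer = TfidfVectorizer()
bunch = Bunch(
    data=vectorizer.fit_transform(docs),  # sparse matrix, shape [n_samples, n_features]
    target=np.array([0, 0, 1, 1]),  # array, shape [n_samples]
    target_names=["sci.space", "rec.sport.hockey"],  # list, length [n_classes]
)

# Row i of data belongs to label target[i]; label values index target_names.
assert bunch.data.shape[0] == bunch.target.shape[0]

clf = MultinomialNB().fit(bunch.data, bunch.target)
new_doc = vectorizer.transform(["the puck and the goalie"])
predicted = bunch.target_names[clf.predict(new_doc)[0]]
print(predicted)  # rec.sport.hockey
```

New documents must be transformed with the same fitted vectorizer so their columns line up with the features in bunch.data.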