8.4.1.2. sklearn.datasets.fetch_20newsgroups
- sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, download_if_missing=True)
Load the filenames of the 20 newsgroups dataset.
Parameters :
subset: ‘train’, ‘test’ or ‘all’, optional, default: ‘train’ :
Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both, with shuffled ordering.
data_home: optional, default: None :
Specify a download and cache folder for the datasets. If None, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
categories: None or collection of string or unicode :
If None (default), load all the categories. Otherwise, a list of the category names to load; all other categories are ignored.
shuffle: bool, optional :
Whether to shuffle the data. Shuffling can matter for models that assume the samples are independent and identically distributed (i.i.d.), such as those trained with stochastic gradient descent.
random_state: numpy random number generator or seed integer :
Used to shuffle the dataset.
download_if_missing: optional, True by default :
If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
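The parameters above can be combined as in the following minimal sketch, which reads a two-category training subset from the local cache only. The helper name `load_cached_newsgroups` is hypothetical, the category names `sci.space` and `rec.autos` are example choices, and the attributes `filenames` and `target_names` on the returned object are assumptions that this section does not document.

```python
from sklearn.datasets import fetch_20newsgroups


def load_cached_newsgroups(categories=None, subset="train", data_home=None):
    """Return the requested subset from the local cache, or None if absent.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    try:
        return fetch_20newsgroups(
            data_home=data_home,
            subset=subset,
            categories=categories,
            shuffle=True,
            random_state=42,  # fixed seed so the shuffled order is repeatable
            download_if_missing=False,  # never attempt a download
        )
    except IOError:
        # Raised when the data is not available locally.
        return None


if __name__ == "__main__":
    bunch = load_cached_newsgroups(categories=["sci.space", "rec.autos"])
    if bunch is None:
        print("Dataset not cached; call again with download_if_missing=True.")
    else:
        # `filenames` and `target_names` are assumed attribute names.
        print(len(bunch.filenames), "files in", list(bunch.target_names))
```

Passing `download_if_missing=False` keeps the call offline, which can be useful in tests or on machines without network access; leaving it at the default lets the first call populate the cache folder.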