8.4.1.2. sklearn.datasets.fetch_20newsgroups
- sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, download_if_missing=True)
- Load the filenames of the 20 newsgroups dataset.
- Parameters :
  - subset: ‘train’, ‘test’ or ‘all’, optional (default: ‘train’) :
    - Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both, with shuffled ordering.
  - data_home: optional, default: None :
    - Specify a download and cache folder for the datasets. If None, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
  - categories: None or collection of strings (str or unicode) :
    - If None (default), load all the categories. Otherwise, the list of category names to load; all other categories are ignored.
  - shuffle: bool, optional (default: True) :
    - Whether or not to shuffle the data. This can matter for models that assume the samples are independent and identically distributed (i.i.d.), such as those trained with stochastic gradient descent.
  - random_state: numpy random number generator or seed integer (default: 42) :
    - Used to shuffle the dataset.
  - download_if_missing: optional, True by default :
    - If False, raise an IOError when the data is not available locally instead of trying to download it from the source site.
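The parameters above can be combined as in the following minimal sketch. The helper names `newsgroups_kwargs` and `load_newsgroups` are hypothetical and not part of scikit-learn; only `fetch_20newsgroups` and its keyword arguments come from the signature documented here. The category names in the usage note are given for illustration.

```python
ALLOWED_SUBSETS = ("train", "test", "all")


def newsgroups_kwargs(subset="train", categories=None, seed=42, allow_download=True):
    """Build keyword arguments for fetch_20newsgroups (hypothetical helper).

    Rejects an unknown subset name before any download is attempted.
    """
    if subset not in ALLOWED_SUBSETS:
        raise ValueError(
            "subset must be one of %r, got %r" % (ALLOWED_SUBSETS, subset)
        )
    return {
        "subset": subset,
        # None means "load every category"; otherwise pass an explicit list.
        "categories": None if categories is None else list(categories),
        "shuffle": True,
        # A fixed seed keeps the shuffled order reproducible between runs.
        "random_state": seed,
        "download_if_missing": allow_download,
    }


def load_newsgroups(**options):
    """Call fetch_20newsgroups with the arguments built above (hypothetical helper)."""
    # Imported lazily so newsgroups_kwargs can be used without scikit-learn installed.
    from sklearn.datasets import fetch_20newsgroups

    return fetch_20newsgroups(**newsgroups_kwargs(**options))
```

For example, `load_newsgroups(subset="test", categories=["alt.atheism", "sci.space"], allow_download=False)` would load only those two categories from the test set, and, per the description of `download_if_missing`, would raise an IOError if no cached copy exists under `data_home`.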
