.. _example_document_classification_20newsgroups.py:

======================================================
Classification of text documents using sparse features
======================================================

This example shows how scikit-learn can be used to classify documents
by topic using a bag-of-words approach. It stores the features in a
scipy.sparse matrix instead of a standard numpy array.

The dataset used is the 20 newsgroups dataset, which is downloaded
automatically and then cached. You can restrict the categories by
passing their names to the dataset loader, or set them to None to load
all 20.

The example demonstrates various linear classifiers with different
training strategies.

To run this example use::

  % python examples/document_classification_20newsgroups.py [options]

Options are:

--report            Print a detailed classification report.
--confusion-matrix  Print the confusion matrix.

**Python source code:** :download:`document_classification_20newsgroups.py`

.. literalinclude:: document_classification_20newsgroups.py
    :lines: 31-
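
The bag-of-words idea described above can be illustrated without any
library. The following is a minimal sketch, not the code of the example
script: it assumes a tiny made-up corpus in place of the 20 newsgroups
data, stores each document as a dict of term counts in place of a
scipy.sparse matrix, and swaps in a hand-written multi-class perceptron
for scikit-learn's linear classifiers. All names in it (``train_docs``,
``to_sparse``, ``train_perceptron``, and so on) are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus standing in for two newsgroup categories.
train_docs = [
    ("the rocket reached orbit after launch", "sci.space"),
    ("nasa plans a new moon launch", "sci.space"),
    ("the goalie stopped every shot in the hockey game", "rec.sport.hockey"),
    ("the team won the hockey playoff game", "rec.sport.hockey"),
]


def tokenize(text):
    """Split on whitespace; real vectorizers do more (stop words, n-grams)."""
    return text.lower().split()


# Map every training token to a column index (the bag-of-words vocabulary).
vocabulary = {}
for text, _ in train_docs:
    for token in tokenize(text):
        vocabulary.setdefault(token, len(vocabulary))


def to_sparse(text):
    """Return {column index: count}, keeping only the non-zero entries."""
    known = (vocabulary[t] for t in tokenize(text) if t in vocabulary)
    return dict(Counter(known))


def score(weights, features):
    """Sparse dot product between a weight vector and a document."""
    return sum(weights.get(i, 0.0) * v for i, v in features.items())


def predict(model, features):
    """Pick the label whose linear score is highest (ties: first by name)."""
    return max(sorted(model), key=lambda label: score(model[label], features))


def train_perceptron(samples, labels, epochs=10):
    """Multi-class perceptron: one sparse weight vector per label."""
    model = {label: {} for label in labels}
    for _ in range(epochs):
        for features, target in samples:
            guess = predict(model, features)
            if guess != target:
                # Move the weights toward the true label, away from the guess.
                for i, v in features.items():
                    model[target][i] = model[target].get(i, 0.0) + v
                    model[guess][i] = model[guess].get(i, 0.0) - v
    return model


labels = sorted({label for _, label in train_docs})
samples = [(to_sparse(text), label) for text, label in train_docs]
model = train_perceptron(samples, labels)

predicted = predict(model, to_sparse("a hockey game tonight"))
print(predicted)  # rec.sport.hockey
```

Each document touches only a handful of vocabulary columns, so storing
only the non-zero counts keeps memory proportional to document length
and not to vocabulary size. That is the motivation for using a sparse
matrix in the real example.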