.. _example_document_classification_20newsgroups.py:


======================================================
Classification of text documents using sparse features
======================================================

This example shows how scikit-learn can be used to classify documents
by topic using a bag-of-words approach. It stores the features in a
scipy.sparse matrix instead of a standard numpy array, and demonstrates
several classifiers that handle sparse matrices efficiently.

The dataset used in this example is the 20 newsgroups dataset, which is
downloaded automatically and then cached.

You can adjust the number of categories by passing their names to the
dataset loader, or by setting them to None to load all 20 of them.

**Python source code:** :download:`document_classification_20newsgroups.py`

.. literalinclude:: document_classification_20newsgroups.py
    :lines: 18-
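
To make the bag-of-words and sparse-storage ideas concrete without downloading the dataset, the following is a minimal pure-Python sketch. It is not the code of the example above and not scikit-learn's implementation: the toy corpus, the helper names (``tokenize``, ``to_sparse_row``, ``fit_naive_bayes``, ``predict``) and the choice of a multinomial Naive Bayes classifier are all assumptions made for illustration. Each document is stored as a mapping from column index to count, so only the non-zero entries are kept, which is the same idea a scipy.sparse matrix relies on.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the 20 newsgroups data.
train_docs = [
    ("the rocket reached orbit after launch", "sci.space"),
    ("nasa plans a new moon launch", "sci.space"),
    ("the goalie stopped every shot in the game", "rec.sport.hockey"),
    ("the team won the hockey game in overtime", "rec.sport.hockey"),
]


def tokenize(text):
    """Split on whitespace after lowercasing (a deliberately naive tokenizer)."""
    return text.lower().split()


# Vocabulary: every distinct training token gets one column index.
vocabulary = {}
for text, _ in train_docs:
    for token in tokenize(text):
        vocabulary.setdefault(token, len(vocabulary))


def to_sparse_row(text):
    """Return {column_index: count}; tokens outside the vocabulary are dropped."""
    return dict(Counter(vocabulary[t] for t in tokenize(text) if t in vocabulary))


X_train = [to_sparse_row(text) for text, _ in train_docs]
y_train = [label for _, label in train_docs]

# Fraction of the (documents x vocabulary) grid that is actually non-zero.
n_nonzero = sum(len(row) for row in X_train)
density = n_nonzero / (len(X_train) * len(vocabulary))


def fit_naive_bayes(rows, labels, n_features, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing `alpha`."""
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        feature_counts[label].update(row)  # adds the per-column counts
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(feature_counts[label].values()) + alpha * n_features
        log_prior = math.log(n_docs / len(labels))
        log_likelihood = {
            col: math.log((count + alpha) / total)
            for col, count in feature_counts[label].items()
        }
        unseen = math.log(alpha / total)  # columns never seen with this label
        model[label] = (log_prior, log_likelihood, unseen)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one sparse row."""
    def score(label):
        log_prior, log_likelihood, unseen = model[label]
        return log_prior + sum(
            count * log_likelihood.get(col, unseen) for col, count in row.items()
        )
    return max(model, key=score)


model = fit_naive_bayes(X_train, y_train, n_features=len(vocabulary))
print("density: %.2f" % density)
print(predict(model, to_sparse_row("the hockey team won")))
print(predict(model, to_sparse_row("nasa rocket launch")))
```

Scoring a document only iterates over its non-zero columns, so the cost depends on the document length rather than on the vocabulary size. That property is what makes sparse storage attractive for text, where each document uses a small fraction of the vocabulary.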