.. _example_document_classification_20newsgroups.py:


======================================================
Classification of text documents using sparse features
======================================================

This is an example showing how the scikit-learn can be used to classify
documents by topics using a bag-of-words approach. This example uses
a scipy.sparse matrix to store the features instead of standard numpy arrays.

The dataset used in this example is the 20 newsgroups dataset which will be
automatically downloaded and then cached.

You can adjust the number of categories by giving there name to the dataset
loader or setting them to None to get the 20 of them.

This example demos various linear classifiers with different training
strategies.

To run this example use::

  % python examples/document_classification_20newsgroups.py [options]

Options are:

  --report
      Print a detailed classification report.
  --confusion-matrix
      Print the confusion matrix.


**Python source code:** :download:`document_classification_20newsgroups.py <document_classification_20newsgroups.py>`

.. literalinclude:: document_classification_20newsgroups.py
    :lines: 31-