This documentation is for scikit-learn version 0.11-git.


2.4.1. Tutorial setup

The following assumes you have extracted the source distribution of this tutorial somewhere on your local disk. Alternatively, you can use git to clone the repository directly from GitHub onto your local disk:

% git clone
% cd doc/tutorial/text_analytics

In the following we will name this folder $TUTORIAL_HOME. It should contain the following folders:

  • data - folder to put the datasets used during the tutorial
  • skeletons - sample incomplete scripts for the exercises
  • solutions - solutions to the exercises
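
As a quick sanity check of the layout described above, the following minimal sketch reports which of the three folders are missing. It assumes a Python 3 interpreter and that TUTORIAL_HOME has been exported as an environment variable (falling back to the current directory otherwise); the helper name missing_folders is hypothetical and not part of the tutorial.

```python
import os
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("data", "skeletons", "solutions")


def missing_folders(tutorial_home):
    """Return the expected sub-folders that are absent from tutorial_home."""
    root = Path(tutorial_home)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: TUTORIAL_HOME is an environment variable; if it is not
    # set, the current directory is checked instead.
    home = os.environ.get("TUTORIAL_HOME", ".")
    missing = missing_folders(home)
    if missing:
        print("Missing folders in %s: %s" % (home, ", ".join(missing)))
    else:
        print("Tutorial layout looks complete in %s" % home)
```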

You can already copy the skeletons into a new folder named workspace, where you will edit your own files for the exercises while keeping the original skeletons intact:

% cp -r skeletons workspace

Install scikit-learn build dependencies

Please refer to the scikit-learn install page for per-system instructions.

You must have numpy, scipy, matplotlib and ipython installed:

  • Under Debian or Ubuntu Linux you should use:

    % sudo apt-get install build-essential python-dev python-numpy \
      python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib \
      ipython

  • Under MacOSX you should probably use a scientific python distribution such as Scipy Superpack.

  • Under Windows, Python(x,y) is probably your best bet to get a working numpy / scipy environment up and running.
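
Whichever platform you are on, one way to confirm that the required packages are visible to your interpreter is sketched below. It assumes Python 3 and the usual import names (note that ipython is imported as IPython); the helper name find_missing is hypothetical.

```python
import importlib.util

# Import names of the packages listed above. IPython is imported with
# capital letters even though the package is usually installed as "ipython".
REQUIRED_MODULES = ("numpy", "scipy", "matplotlib", "IPython")


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that each package can be located; it does not import it or verify that the installed version is recent enough.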

Alternatively, under Windows and MacOSX you can use the EPD (Enthought Python Distribution), which is a (non-open source) packaging of the scientific python stack.

Build scikit-learn from source

Here are the instructions to install the current master from source on a POSIX system (e.g. Linux and MacOSX):

% git clone
% cd scikit-learn

You can then build it locally and install this working directory as an “editable” python package:

% python setup.py build_ext -i
% pip install -e .

Alternatively you can install the library globally (or in a virtualenv):

% python setup.py build
% sudo python setup.py install

You should also be able to launch the tests from anywhere in the system (if nose is installed) with the following:

% nosetests sklearn

The output should end with OK as in:

Ran 589 tests in 36.876s

OK

If this is not the case, please send an email to the scikit-learn mailing list, including the error messages along with the version numbers of all the aforementioned dependencies and your operating system.
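
To gather the version information for such a report, something like the following sketch may help. It assumes Python 3 and that each dependency exposes a __version__ attribute, which is common but not guaranteed; the helper names module_version and environment_report are hypothetical.

```python
import importlib
import platform
import sys

# Import names assumed for the dependencies named in this section.
MODULES = ("numpy", "scipy", "matplotlib", "IPython", "nose", "sklearn")


def module_version(name):
    """Return the module's __version__, or a short note if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed"
    except Exception:
        # The package is present but fails while being imported.
        return "import failed"
    return getattr(module, "__version__", "unknown")


def environment_report(modules=MODULES):
    """Build a plain-text report suitable for pasting into an email."""
    lines = [
        "Operating system: " + platform.platform(),
        "Python: " + sys.version.split()[0],
    ]
    for name in modules:
        lines.append("%s: %s" % (name, module_version(name)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```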

In the rest of the tutorial, the path to the scikit-learn source folder will be named $SKL_HOME.

As usual, building from source under Windows is slightly more complicated. Check out the build instructions on the scikit-learn website.

Download the datasets

Machine Learning algorithms need data. Go to each $TUTORIAL_HOME/data sub-folder and run the script found there (after having read it first).

For instance:

% cd $TUTORIAL_HOME/data/languages
% less
% python
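
To see which sub-folders and scripts there are before running anything, a small listing helper along these lines could be used. This is only a sketch: it assumes Python 3, that TUTORIAL_HOME is an environment variable, and that the scripts are Python files ending in .py; the helper name dataset_scripts is hypothetical. It lists the scripts without executing them, so you can still read each one first.

```python
import os
from pathlib import Path


def dataset_scripts(tutorial_home):
    """Map each data sub-folder name to the sorted Python scripts it contains."""
    data_dir = Path(tutorial_home) / "data"
    found = {}
    if not data_dir.is_dir():
        return found
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        found[folder.name] = sorted(s.name for s in folder.glob("*.py"))
    return found


if __name__ == "__main__":
    # Assumption: TUTORIAL_HOME is an environment variable; if it is not
    # set, the current directory is inspected instead.
    home = os.environ.get("TUTORIAL_HOME", ".")
    for folder, scripts in dataset_scripts(home).items():
        print("%s: %s" % (folder, ", ".join(scripts) or "(no script found)"))
```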