How to select the right algorithm for the task
==============================================

To conclude this session here are some practical hints for selecting
the right algorithm when facing a practical problem.

- If the data is high dimensional and sparse (text data), most of the time
  linear classifiers with a bit of regularization will work well.

- If the data is dense, low to medium dimensional: try to further reduce the
  dimensionality with PCA for instance and try both linear and non linear
  models (e.g. SVC with RBF kernel).

- ``SVC`` with gaussian RBF kernel and ``KMeans`` clustering can
  benefit a lot from data normalization either with feature feature with
  scaling using ``Scaler`` or doing whitening with ``RandomizedPCA``
  with ``whiten=True``. Try various values for ``n_components`` with
  grid search to be sure no to truncate the data too hard.

- There is no free lunch: the best algorithm is data-dependent. If
  you try many different models, reserve a held out evaluation set
  that is not used during the model selection process.

A comprehensive practical guide / FAQ / Howto is under work. Stay tuned!

  http://scikit-learn.org