In Classification of Hand-written Digits (1), I qualitatively described the machine learning task of classification and sketched out two classic examples, then went into more detail about another well-known example: the classification of hand-written digits. The challenge here is to program a classifier that correctly predicts the value represented in a scanned image of a hand-written digit.

### Analysis Strategy

In any analysis, it pays to have a sensible strategy for how to proceed. I typically go about a machine learning analysis like this:

1. Plot and summarize the training data to build an understanding of the problem you’re trying to solve; ideally, this reveals underlying relationships in the data and points you toward avenues for further analysis. It’s generally useful to plot full distributions for each feature variable (plus interesting multivariate combinations) as well as to calculate the usual summary statistics (mean, variance, skewness, etc.).
2. Choose a learning algorithm that fits your problem/dataset and implement a basic version that produces a baseline result against which you can compare subsequent iterations. The choice of algorithm is important but subjective, since for a given task there may be many valid options. Implementing a bare-bones version first is relatively quick and easy, and it gives you an idea of how well the algorithm works on your problem/dataset. If it doesn’t work out, for whatever reason, at least you haven’t wasted your time implementing all the bells and whistles!
3. Optimize the algorithm’s parameters to improve upon your baseline performance. The tweakable parameters depend on the algorithm, e.g. the number of hidden layers in an artificial neural network, the number of neighbors to include in a nearest-neighbor algorithm, etc. A common technique for estimating a model’s prediction error is cross-validation, in which only a fraction of the training set is used to train the model, and the remainder (called the validation set) is used to check the model’s predictions against the actual values. The idea is to choose the model parameters that minimize the prediction error.
4. Run the optimal model on the test data to get your final result, and interpret/predict away! If you’ve done a good job in steps 1–3, you can feel confident that the result is on solid statistical footing.
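To make steps 2 and 3 concrete, here’s a minimal sketch using a k-nearest-neighbors classifier with cross-validation. The dataset (scikit-learn’s bundled 8×8 digits), the choice of k-NN, and the candidate values of k are all illustrative assumptions, not a prescription:

```python
# Sketch of steps 2-3: a bare-bones baseline, then cross-validation
# to tune a single parameter. Dataset and candidate k values are
# illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # 8x8 digit images, flattened to 64 features

# Step 2: bare-bones baseline with default parameters.
baseline = KNeighborsClassifier()
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()

# Step 3: sweep one tweakable parameter (n_neighbors) and keep the best.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in (1, 3, 5, 7, 9)
}
best_k = max(scores, key=scores.get)
print(f"baseline accuracy: {baseline_acc:.3f}, best k: {best_k}")
```

Each `cross_val_score` call here trains on four fifths of the data and validates on the held-out fifth, rotating five times; the test set stays untouched until step 4.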

The specifics of one’s analysis strategy depend on the problem at hand, of course, but in general, what I’ve written above will get you moving in the right direction. :)

### Visualizing Hand-written Digits

As it turns out, there’s not a whole lot to visualize in the hand-written digits training set. Each digit (class) is represented by a few thousand examples, and one of the biggest challenges is in correctly accounting for the variability within each class. So, on that note, I produced a plot showing ten examples of each of the ten digits:
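A grid like that can be assembled in a few lines. This sketch uses scikit-learn’s bundled 8×8 digits dataset as a stand-in for the actual training set (an assumption; the original data is larger and higher-resolution):

```python
# Assemble a 10x10 grid of example images: ten examples of each of
# the ten digit classes, using sklearn's small digits set as a stand-in.
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
n_examples = 10  # examples to show per class

# For each digit 0-9, take the first ten matching images.
grid = np.array([
    digits.images[digits.target == d][:n_examples]
    for d in range(10)
])

print(grid.shape)  # → (10, 10, 8, 8): classes x examples x height x width
```

From there, a nested loop over `matplotlib` subplots with `imshow` renders the grid.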

Besides the obvious variations in shape (e.g. 4s closed/open, 7s with/without hanging lines or cross-bars, etc.), one thing I noticed is that some examples are much bolder than others. To see if that varied systematically by digit (which could potentially be a useful discriminating feature!), I plotted the mean pixel brightness (on a scale from 0 to 255, i.e. white to black) for each class:
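The per-class brightness comparison might look like the following sketch. It again uses scikit-learn’s digits data as a stand-in; note that its pixel values run 0–16 rather than 0–255, so they are rescaled here purely to match the range quoted above:

```python
# Compute mean pixel brightness per digit class, a candidate
# discriminating feature. Rescaling 0-16 -> 0-255 is an assumption
# to match the range described in the text.
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X = X * (255.0 / 16.0)  # rescale pixel values to 0-255

mean_brightness = {d: X[y == d].mean() for d in range(10)}
for d, b in sorted(mean_brightness.items()):
    print(f"digit {d}: mean pixel value {b:.1f}")
```

Plotting these ten means (e.g. as a bar chart) shows at a glance whether boldness varies systematically by digit.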