Deciding What to Try Next
If you are developing a machine learning system, or trying to improve the performance of a machine learning system, how do you go about deciding which are the most promising avenues to try next?
Suppose you find that the system is making unacceptably large errors in its predictions. What should you then try changing in order to improve the learning algorithm?
One thing you could try is to get more training examples, but sometimes getting more training data doesn't actually help. Other things you might try include using a smaller set of features.
Fortunately, there is a pretty simple technique that can let you very quickly rule out many of the things on this list as not being promising things to pursue, and potentially save you a lot of time pursuing something that is just not going to work.
These are machine learning diagnostics. A diagnostic is a test you can run to gain insight into what is or isn't working with an algorithm, and which will often give you guidance on the most promising things to try to improve a learning algorithm's performance.
Evaluating a Hypothesis
If there is any sort of ordering to the data, it is better to send a random 70% of your data to the training set and the remaining random 30% to the test set.
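A minimal sketch of such a random 70/30 split, using only NumPy (the function name and seed are illustrative, not from the course):

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7, seed=0):
    """Shuffle before splitting, in case the data has any ordering."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random reordering of example indices
    cut = int(train_frac * len(X))         # boundary between train and test
    train_idx, test_idx = idx[:cut], idx[cut:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Example: 10 examples -> 7 go to training, 3 to test
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, y_train, X_test, y_test = train_test_split(X, y)
```

Libraries such as scikit-learn provide an equivalent `train_test_split` helper, but the idea is just a shuffled index split.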
Model Selection and Train/Validation/Test Sets
Send 60% of your data to your training set, maybe 20% to your cross validation set, and 20% to your test set.
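The 60/20/20 three-way split can be sketched the same way (again a random shuffle first; the function name and fractions default are just for illustration):

```python
import numpy as np

def train_cv_test_split(X, y, fracs=(0.6, 0.2, 0.2), seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n = len(X)
    c1 = int(fracs[0] * n)                 # end of training set
    c2 = c1 + int(fracs[1] * n)            # end of cross validation set
    return (X[idx[:c1]], y[idx[:c1]],      # training set (60%)
            X[idx[c1:c2]], y[idx[c1:c2]],  # cross validation set (20%)
            X[idx[c2:]], y[idx[c2:]])      # test set (20%)

X = np.arange(10).reshape(10, 1)
y = np.arange(10)
X_tr, y_tr, X_cv, y_cv, X_te, y_te = train_cv_test_split(X, y)
```

The cross validation set is used for model selection (choosing degree, lambda, architecture), so that the test set remains an unbiased estimate of generalization error.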
Diagnosing Bias vs. Variance
If your algorithm is suffering from high bias, the training set error will be high, and you might find that the cross validation error is also high, perhaps close to or just slightly higher than the training error. In contrast, if your algorithm is suffering from high variance, the training error will be low, but the cross validation error will be much higher than the training error.
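That rule of thumb can be written as a tiny diagnostic function. This is a sketch with illustrative thresholds (`baseline` and `gap_tol` are my assumptions, not values from the course):

```python
def diagnose(j_train, j_cv, baseline=0.0, gap_tol=0.1):
    """Rough bias/variance diagnostic.
    high bias:     J_train high, and J_cv close to J_train
    high variance: J_cv much higher than J_train
    Thresholds are illustrative; in practice compare against the error
    level you consider acceptable for your problem."""
    if j_train > baseline + gap_tol and j_cv - j_train < gap_tol:
        return "high bias"
    if j_cv - j_train >= gap_tol:
        return "high variance"
    return "looks ok"
```

For example, `diagnose(0.5, 0.52)` flags high bias (both errors high, close together), while `diagnose(0.01, 0.6)` flags high variance (large gap).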
Regularization and Bias/Variance
Looking at the plot of the cross validation error, you can, either manually or automatically, select the point that minimizes the cross-validation error, and pick the value of lambda corresponding to that low cross-validation error.
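A minimal sketch of that selection loop for regularized linear regression, using the closed-form ridge solution (for brevity this version regularizes all parameters, including the intercept; the helper names are mine):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized linear regression: (X'X + lam*I)^-1 X'y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def cv_error(theta, X, y):
    """Average squared error WITHOUT the regularization term."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

def pick_lambda(X_tr, y_tr, X_cv, y_cv, lambdas):
    errs = []
    for lam in lambdas:
        theta = ridge_fit(X_tr, y_tr, lam)   # train with regularization
        errs.append(cv_error(theta, X_cv, y_cv))  # evaluate without it
    best = int(np.argmin(errs))
    return lambdas[best], errs

# Tiny noiseless example: y = 2*x1 + 3*x2 exactly, so lambda = 0 should win
X_tr = np.array([[1., 0], [0, 1], [1, 1], [2, 1], [1, 2]])
y_tr = X_tr @ np.array([2.0, 3.0])
X_cv = np.array([[3., 1], [1, 3]])
y_cv = X_cv @ np.array([2.0, 3.0])
best_lam, errs = pick_lambda(X_tr, y_tr, X_cv, y_cv, [0.0, 1.0, 10.0])
```

Note the detail the course emphasizes: the cost minimized during training includes the regularization term, but J_cv used for selecting lambda does not.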
A learning curve is often a very useful thing to plot, either to sanity check that your algorithm is working correctly, or to improve its performance. To plot a learning curve, plot J_train, which is, say, the average squared error on the training set, and J_cv, the average squared error on the cross validation set, each as a function of m, the number of training examples.
In the high variance setting, getting more training data is, indeed, likely to help.
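The curve values themselves can be computed with a simple loop: train on the first m examples, score J_train on those m examples only, and score J_cv on the full cross validation set. This sketch uses unregularized linear least squares (function names are mine; starting at m = 2 is an assumption so the 2-parameter model is determined):

```python
import numpy as np

def lin_fit(X, y):
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def sq_err(theta, X, y):
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

def learning_curve(X_tr, y_tr, X_cv, y_cv):
    """J_train on the first m examples only; J_cv always on the full CV set."""
    j_train, j_cv = [], []
    for m in range(2, len(X_tr) + 1):
        theta = lin_fit(X_tr[:m], y_tr[:m])
        j_train.append(sq_err(theta, X_tr[:m], y_tr[:m]))
        j_cv.append(sq_err(theta, X_cv, y_cv))
    return j_train, j_cv

# Noiseless example: y = 2 + 3x, with an intercept column of ones
X_tr = np.array([[1., 1], [1, 2], [1, 3], [1, 4]])
y_tr = 2 + 3 * X_tr[:, 1]
X_cv = np.array([[1., 5], [1, 6]])
y_cv = 2 + 3 * X_cv[:, 1]
j_train, j_cv = learning_curve(X_tr, y_tr, X_cv, y_cv)
```

Plotting `j_train` and `j_cv` against m then shows the characteristic shapes: curves converging at a high error for high bias, a persistent gap between them for high variance.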
Deciding What to Do Next Revisited
- getting more training examples fixes high variance
- trying a smaller set of features fixes high variance
- adding features is usually a solution for fixing high bias
- similarly, adding polynomial features fixes high bias
- decreasing lambda fixes high bias
- increasing lambda fixes high variance
It turns out that if you're applying a neural network, using a large neural network often works well; very often, the larger, the better.
Using a single hidden layer is a reasonable default, but if you want to choose the number of hidden layers, another thing you can try is this: make a training, cross-validation, and test set split, train neural networks with one, two, or three hidden layers, and see which of those networks performs best on the cross-validation set.
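The selection procedure described above is independent of any particular training library, so it can be sketched as a generic loop. Here `train_fn` and `cv_score_fn` are placeholders for your own training and evaluation code, and the cross-validation accuracies in the demo are made-up numbers purely for illustration:

```python
def select_num_layers(candidates, train_fn, cv_score_fn):
    """Train one model per candidate (e.g. 1, 2, or 3 hidden layers)
    and keep whichever scores best on the cross validation set."""
    best_cand, best_model, best_score = None, None, float("-inf")
    for cand in candidates:
        model = train_fn(cand)          # train on the training set
        score = cv_score_fn(model)      # evaluate on the cross validation set
        if score > best_score:
            best_cand, best_model, best_score = cand, model, score
    return best_cand, best_model

# Hypothetical stand-ins: pretend these CV accuracies came from real training runs
fake_cv_acc = {1: 0.90, 2: 0.93, 3: 0.91}
best_layers, _ = select_num_layers(
    [1, 2, 3],
    train_fn=lambda n_layers: n_layers,        # placeholder "model"
    cv_score_fn=lambda model: fake_cv_acc[model],
)
```

The test set plays no role in this loop; it is held out so the final chosen network can still be evaluated on data it was never selected against.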