08 Aug 2016
Overfitting and why it’s dangerous
In the world of machine learning, predictive performance is key.
A learning machine that can make accurate predictions of future performance based on analysis of past data is an invaluable tool for any business.
In fact, from a business perspective, predictive performance is really the whole point of using machine learning.
A machine that can even somewhat accurately predict the future behaviour of a target audience, customer base, or stock can dictate a strong strategy which will drive revenue.
This is a process that is at once incredibly simple and incredibly complex.
Several factors separate effective predictive performance from ineffective predictive performance in machine learning. The ability to analyse data, to separate the “signal” from the “noise” in that data, and to identify the key generalisations and patterns about past performance contained in that signal all affect how well future performance will be predicted.
Understanding how to discern between what is “signal” and what is “noise” is key.
The signal and the noise
When sitting in the cinema watching a movie, there are two clear information streams we want our brains to focus on: the images on the screen and the film’s soundtrack.
These two streams of information are the signal. With these, we are able to follow, understand and enjoy the film we paid to watch.
But, as anyone who has ever gone to the cinema will undoubtedly know, there are other things going on in our awareness while we are focused on the film and its soundtrack:
– The guy on your left loudly crunching his popcorn
– The couple behind you who are making out
– The kid who keeps getting up to go to the bathroom
– The lady 3 rows down who is texting on her bright mobile phone screen
This information, while possibly louder, closer, or more intrusive than the information being broadcast on the screen or pumped out of the cinema’s speakers, will not help us understand the film we are watching.
Such irrelevant information is the noise.
Luckily for us, our brains have the ability to filter out such irrelevant information, allowing us to differentiate between signal and noise, ignore what we don’t need, and focus on what we do.
This process is called “selective filtering”.
This allows us to prioritise the dialogue of the film over the crunching of nearby popcorn.
It isn’t a perfect system, but it is pretty close.
Rarely will we be so distracted by irrelevant information that we are rendered incapable of following the plot of the film we are watching (especially if we are watching a Michael Bay film).
Which brings us to overfitting.
Overfitting and why it is dangerous
Overfitting occurs when a learning machine proves incapable of generalising trends from the training data it has been given and attributes equal importance to all variations in the data. This causes the machine to substantially decrease, or even lose altogether, its predictive analysis power.
In overfitting (if we can return to our cinema analogy for a moment), equal credence is given to the sound of popcorn being chewed and to the dialogue of the film.
We remember the scene in the film where one of the characters forgot to turn his phone off, then answered it while running up the aisle toward the exit, whispering “hold on one minute, please”, and attempt to use this information to understand the film’s plot.
Whatever happened to that character, anyway? I bet he knows where Dory is!
The problem is obvious.
Thank goodness for selective filtering.
Sadly for our machine learning counterparts, selective filtering does not happen automatically.
We must do it for them.
Every machine learning algorithm has a target function, an end goal. Some data is helpful in completing this target function, and some is not.
The parameters for separating useful data (data with a causal relation to the target function) from irrelevant data (data with no causal relation to the target function) must be set by humans before effective machine learning can begin.
Begin this process with too many parameters, or not enough training data, and you get overfitting.
Rather than learning to generalise and identify trends, the machine will instead lean towards memorising the training data. Memorisation lets the machine perform exceptionally well when training, but it is effectively useless once unknown data is added into the mix.
This hampers predictive performance and leaves us with an algorithm that has not learned how to filter information or generalise at all.
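To make this concrete, here is a minimal sketch of the "too many parameters, not enough training data" situation. Everything in it is an assumption chosen for illustration rather than something taken from a real project: the sine-wave signal, the noise level, the ten training points, and the polynomial degrees are all made up, and the helper names (`true_signal`, `make_data`, `mse`) are hypothetical. It fits a modest curve and a very flexible curve to the same small, noisy sample, then compares the error on the training data with the error on data the models have never seen.

```python
import numpy as np


def true_signal(x):
    """The underlying pattern (the "signal") we would like the model to learn.

    A sine wave is an arbitrary choice made for this illustration.
    """
    return np.sin(2 * np.pi * x)


def make_data(n, rng, noise_std=0.3):
    """Draw n points of signal plus random noise (the popcorn crunching)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_signal(x) + rng.normal(0.0, noise_std, size=n)
    return x, y


def mse(model, x, y):
    """Mean squared error of a fitted model on the given points."""
    return float(np.mean((model(x) - y) ** 2))


rng = np.random.default_rng(0)            # fixed seed so the run is repeatable
x_train, y_train = make_data(10, rng)     # deliberately tiny training set
x_test, y_test = make_data(500, rng)      # "future" data the models never saw

# A modest model: 4 coefficients for 10 training points.
simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=3)

# A very flexible model: 10 coefficients for 10 training points,
# enough to pass through every training point, noise included.
flexible = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

simple_train = mse(simple, x_train, y_train)
simple_test = mse(simple, x_test, y_test)
flexible_train = mse(flexible, x_train, y_train)
flexible_test = mse(flexible, x_test, y_test)

print(f"degree 3: train MSE = {simple_train:.4f}, test MSE = {simple_test:.4f}")
print(f"degree 9: train MSE = {flexible_train:.4f}, test MSE = {flexible_test:.4f}")
```

The flexible model has as many coefficients as there are training points, so it can reproduce the training data almost exactly, noise and all. Its near-zero training error is the memorisation described above, and the gap between that figure and its error on the unseen points is the telltale sign of overfitting. In a setup like this, the modest model typically has the higher training error but holds up far better on unseen data.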
Such an algorithm may be helpful for organising and classifying known data, but attempting to use it to predict future data is an exercise in futility.
Sort of like making out in a cinema.