You may have heard the term “white noise” in signal processing, but that is not its only application. It is also a way to describe data in statistical modeling. The “noise” refers to there being no set pattern: all variation is random. The “white” refers to all frequencies being represented equally, by analogy with white light.
White noise can be most easily defined as the variation in data that cannot be explained by any regression model: a sequence of uncorrelated random values with constant mean and variance.
There are some clear benefits to using white noise when building time series models:
Every statistical test relies on a model for the data, and white noise is the simplest such model. Being able to model white noise is therefore a starting point for making sense of data.
One benefit of white noise is that it can be used to check for errors and to diagnose a misspecified model: if a model’s residuals are not white noise, the model has failed to capture some structure in the data. This can help protect the data scientist from misleading results.
A white noise model is also useful for representing phenomena such as thermal noise in real-world systems.
The white noise model is important to understand for the following reasons:
Should you find that your data is essentially white noise surrounding a fixed level, you can simply fit that fixed level (for example, the sample mean) rather than a more complex model.
By understanding white noise, you can demonstrate the strength of a fitted model: if its residual errors are merely white noise, the model has extracted all the available signal.
The white noise model is important to understand on the way to working with the “random walk” model.
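The first point above can be sketched with a small simulation. This is a minimal illustration, assuming a NumPy environment; the level and noise scale are arbitrary choices, and the variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: white noise fluctuating around a fixed level of 50.
level = 50.0
data = level + rng.normal(loc=0.0, scale=2.0, size=1000)

# For such data, the best "model" is the fixed level itself,
# estimated by the sample mean.
fitted_level = data.mean()
residuals = data - fitted_level

# By construction the residuals are centered on zero; if they are
# white noise, the fixed-level model has captured all the structure.
print(f"estimated level: {fitted_level:.2f}")
print(f"residual mean:   {residuals.mean():.2e}")
```

With 1,000 points, the estimated level lands very close to the true level of 50, and anything more elaborate than the mean would just be fitting noise.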
Consider an example. A company wanted to examine the changes in its inventory of several items over its history. To do this, a great deal of data was collected from inventory managers and the organization’s records. When the data was plotted, however, some correlations seemed questionable. After some testing, it was found that the data contained a significant amount of white noise.
Here are some best practices to consider when working with white noise models:
Testing whether data is white noise should be one of the first things a data scientist does. This way, little time is wasted fitting models to data sets that hold no meaningful information to extract.
If a data set turns out not to be pure white noise, fit a model and then test its residual errors for white noise; the result indicates how much of the available information the model has extracted.
Two useful techniques for determining whether time series data is white noise are the Ljung-Box test and autocorrelation (ACF) plots.
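The Ljung-Box test can be sketched directly in NumPy. This is a simplified illustration, not a production implementation; it compares the Q statistic for seeded white noise against a strongly trending series, using the well-known 95th-percentile value of the chi-square distribution with 10 degrees of freedom (about 18.31):

```python
import numpy as np

def ljung_box_q(x, max_lag=10):
    """Ljung-Box Q statistic: tests whether the first `max_lag` sample
    autocorrelations are jointly indistinguishable from zero."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(xc[k:] * xc[:-k]) / denom  # lag-k sample autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(42)
q_noise = ljung_box_q(rng.normal(size=1000))   # should be small
q_trend = ljung_box_q(np.arange(1000.0))       # trending data: huge Q

# Under the white noise hypothesis, Q ~ chi-square(10); its 95th
# percentile is about 18.31, so Q above that rejects white noise.
print(f"noise Q = {q_noise:.2f}, trend Q = {q_trend:.2f}")
```

In practice you would more likely reach for a library routine such as `acorr_ljungbox` in statsmodels, which also returns p-values, but the sketch shows what the statistic is actually measuring.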
A time series with white noise around a level can be represented by the additive model Y_i = L_i + N_i, where L_i is the level and N_i is the noise term. For instances where the size of the random variation is proportional to the current level, there is a multiplicative version, expressed as Y_i = L_i * N_i.
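A small simulation makes the two forms concrete. This is a sketch with arbitrary choices for the level and noise scale; note that in the multiplicative form the noise term fluctuates around 1 rather than 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
level = np.full(n, 100.0)  # constant level L_i

# Additive: Y_i = L_i + N_i, noise centered on 0.
additive = level + rng.normal(loc=0.0, scale=3.0, size=n)

# Multiplicative: Y_i = L_i * N_i, noise centered on 1, so the
# size of the variation scales with the level.
multiplicative = level * rng.normal(loc=1.0, scale=0.03, size=n)

print(f"additive mean:       {additive.mean():.2f}")
print(f"multiplicative mean: {multiplicative.mean():.2f}")
```

Both series fluctuate around the level of 100; the difference only becomes visible when the level itself changes, since the multiplicative noise grows and shrinks with it.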
A “random walk” model describes a series in which each value is the previous value plus a white noise step: Y_i = Y_(i-1) + N_i. The series itself appears highly correlated, even though the underlying increments are pure white noise.
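The relationship can be shown in a few lines. In this sketch (seed and length are arbitrary) the walk is just the cumulative sum of white noise steps, and differencing it recovers those steps exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
steps = rng.normal(size=500)  # white noise increments N_i

# Y_i = Y_(i-1) + N_i: the walk is the cumulative sum of the noise.
walk = np.cumsum(steps)

# Differencing recovers the white noise increments, which is why
# the first differences of a random walk test as white noise.
recovered = np.diff(walk)
print(np.allclose(recovered, steps[1:]))  # True
```

This is why differencing is the standard first step when analyzing a suspected random walk: the walk itself fails white noise tests badly, while its differences pass them.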
Some examples of models built on an underlying white noise process would be the autoregressive AR(p) model, the MA(q) moving average model, and the random walk model.
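In all of these models, white noise plays the role of the driving innovation. A minimal AR(1) sketch, with an arbitrary coefficient of 0.6 chosen purely for illustration, makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
phi = 0.6                 # illustrative AR(1) coefficient
e = rng.normal(size=n)    # white noise innovations

# AR(1): each value is a fraction of the previous value plus fresh noise.
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

# The recursion holds exactly by construction.
print(np.allclose(y[1:], phi * y[:-1] + e[1:]))  # True
```

The MA(q) and random walk models differ only in how the past noise terms enter the equation; the white noise series e is the common ingredient.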
There may be times when you run across variation in data that is simply unexplainable. This variation nevertheless needs to be accounted for. Thankfully, if you understand white noise and how to test for and present it, this should not create a problem.