In other articles, we’ve discussed discrete data, attribute data, and continuous data. Now it’s time to talk about variable data. Let’s look at what variable data is, contrast it with some of the other types of data, and suggest some best practices for dealing with variable data.
Overview: What is variable data?
Simply put, variable data is the value you get when you measure something with a measuring device (scale, tape measure, stopwatch, etc.) that (1) can take on any value over a continuum of possible values and (2) can be logically subdivided given the resolution of the measuring device.
The term continuous data is used interchangeably with the term variable data. Some examples are weight, volume, time, length, and speed. All are measured, can take on any value, and can be logically subdivided into smaller and smaller units.
By contrast, discrete or attribute data are counted, not measured. You can have 5 people, 10 boxes, or 10 invoice errors. It makes no sense to talk in terms of 5.3 people or 10.636 errors on an invoice.
There is one more case: you take counts but can treat the results as what we call pseudo-continuous, or pseudo-variable, data. This applies when the counts are large, span a significant range of values, and are distributed across that range.
For example, if you wanted to know the average number of cases of a product produced in a day, you could count each day’s production for a month. Counting a case, by definition, would be discrete data. In most situations, that would be a large number. If the range of each day’s production was wide (more subjective than objective), and the values were distributed across the range of values, then you might decide to consider the data as variable and use the appropriate variable data statistical tools.
3 benefits of variable data
While you might not have a choice of the type of data you can collect, you should strive to use variable data as often as you can.
1. Sample size
Variable data does not need as large a sample size as discrete data to provide a good understanding of the underlying distribution.
2. Resolution
Because you can subdivide the data into smaller and smaller logical values, you gain greater resolution, which allows you to distinguish between values. If you could only measure in units of 1 foot, you couldn’t discriminate between 2 inches and 5 inches, or between 4 inches and 8 inches. Everything would be 1 foot.
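The loss of resolution can be illustrated with a small sketch. The function name `measure_in_whole_feet` and the sample lengths are hypothetical, and the sketch assumes a device that reports the next whole foot, so that anything up to 12 inches reads as 1 foot.

```python
import math

# Hypothetical lengths in inches, as a fine-resolution device would report them.
lengths_in = [2.0, 5.0, 4.0, 8.0]


def measure_in_whole_feet(inches: float) -> int:
    """Simulate a coarse device that can only report whole feet.

    Assumption: the device rounds up to the next whole foot.
    """
    return math.ceil(inches / 12)


coarse = [measure_in_whole_feet(x) for x in lengths_in]

# Four distinct lengths collapse to a single reading at 1-foot resolution.
distinct_fine = len(set(lengths_in))
distinct_coarse = len(set(coarse))
print(coarse, distinct_fine, distinct_coarse)
```

With inch-level resolution the four items are distinguishable. At 1-foot resolution they all produce the same reading.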
3. Allows for easy prediction
Using the probability density function (PDF), you can predict the probability of a value falling within a given range, or of a value being larger or smaller than a value of interest.
For example, given the PDF of some normal distribution, you can calculate the probability that your processing time will be greater than 25, or less than 10, or between 10 and 25. Because the data is continuous, the probability of any single exact value, such as exactly 21, is zero, so you would instead calculate the probability of a narrow interval around it, such as between 20.5 and 21.5.
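As a minimal sketch, these probabilities can be computed with Python's standard-library `statistics.NormalDist`. The mean of 18 and standard deviation of 4 are hypothetical values chosen only for illustration. For a continuous distribution the probability of exactly 21 is zero, so the sketch uses the interval from 20.5 to 21.5 instead.

```python
from statistics import NormalDist

# Hypothetical processing-time distribution (mean and sigma are assumptions).
processing_time = NormalDist(mu=18, sigma=4)

# Probability of a value larger than 25.
p_greater_25 = 1 - processing_time.cdf(25)

# Probability of a value smaller than 10.
p_less_10 = processing_time.cdf(10)

# Probability of a value between 10 and 25.
p_between = processing_time.cdf(25) - processing_time.cdf(10)

# Probability of a value "about 21", taken as the interval 20.5 to 21.5.
p_near_21 = processing_time.cdf(21.5) - processing_time.cdf(20.5)

print(f"P(X > 25)           = {p_greater_25:.4f}")
print(f"P(X < 10)           = {p_less_10:.4f}")
print(f"P(10 < X < 25)      = {p_between:.4f}")
print(f"P(20.5 < X < 21.5)  = {p_near_21:.4f}")
```

The first three probabilities cover every possible outcome, so they sum to 1.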
Why is variable data important to understand?
Understanding the type of data you have is important because it determines the type of analysis you will do.
1. Correct statistical tools
The tools for statistical analysis are different for discrete and variable data. Using the wrong tool will result in misleading conclusions and decisions.
2. Correct statistics
While all data distributions can be described by their center, spread, and shape, you use different statistical descriptors for discrete and variable data.
3. Cost implications
Data costs money. You should consider the type of data you want to collect in order to maximize the value of the information and minimize the cost of obtaining that information. Variable data is generally better than discrete data, if you have a choice.
An industry example of using variable data
Steve, a warehouse manager, was required to take a weekly inventory of the cases of product on hand in the warehouse. As a trained Green Belt, he knew that case count was discrete data. He wanted to construct a control chart to monitor the variation of his inventory but wasn’t sure which one to use, so he asked his Black Belt, Bonnie, what she thought.
Bonnie suggested that Steve think about classifying his data as pseudo-variable, or pseudo-continuous, data rather than as a discrete count. She explained her thinking by pointing out that the counts were quite high (in the thousands) and that they ranged widely from week to week.
Steve agreed and therefore chose the I-MR (individuals and moving range) chart to track his inventory levels.
Shown below is what his control chart looked like. He questioned what happened at point 31, when the inventory level rose above the upper control limit.
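The limits on an individuals and moving range chart (the ImR chart in this example) are commonly computed from the average moving range. The sketch below uses the standard constants for a moving range of two consecutive points (2.66 for the individuals limits, 3.267 for the moving range upper limit). The weekly counts are hypothetical and are not Steve's data, and the function name `imr_limits` is an assumption made for illustration.

```python
from statistics import mean


def imr_limits(values):
    """Compute individuals and moving range chart limits.

    Uses the usual constants for moving ranges of size 2:
    2.66 (= 3 / 1.128) for the individuals chart and 3.267 for the MR chart.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": x_bar,
        "ucl": x_bar + 2.66 * mr_bar,
        "lcl": x_bar - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


# Hypothetical weekly case counts (illustrative only).
weekly_cases = [5200, 5100, 5300, 5250, 5150, 5200,
                5350, 5100, 5250, 5200, 6400, 5150]

limits = imr_limits(weekly_cases)

# Weeks (1-based) whose count falls outside the individuals control limits.
out_of_control = [
    week for week, count in enumerate(weekly_cases, start=1)
    if count > limits["ucl"] or count < limits["lcl"]
]
print(limits)
print("Out-of-control weeks:", out_of_control)
```

In this made-up series, week 11 falls above the upper control limit, which is the kind of signal Steve would investigate.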
3 best practices when working with variable data
Data doesn’t just show up on your desk or on your computer. You have to go out and collect it. Here are a few best practices for collecting and analyzing variable data.
1. Use sample size formulas
There are a number of sample size formulas and calculators that will help you determine the minimum sample size you’ll need to achieve your desired degree of confidence and precision.
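One commonly used formula for estimating a mean from variable data is n = (z·σ / E)², rounded up, where σ is the process standard deviation, E is the acceptable margin of error, and z is the normal quantile for the chosen confidence level. The article does not name a specific formula, so treat this as one illustrative option. The function name and the input values below are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin_of_error: float,
                         confidence: float = 0.95) -> int:
    """Minimum sample size to estimate a mean to within +/- margin_of_error.

    Assumes an approximately normal process with a known (or well-estimated)
    standard deviation sigma.
    """
    # Two-sided z value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical inputs: sigma of 4 minutes, estimate within 1 minute or 0.5 minute.
n_within_1 = sample_size_for_mean(sigma=4, margin_of_error=1)
n_within_half = sample_size_for_mean(sigma=4, margin_of_error=0.5)
print(n_within_1, n_within_half)
```

Halving the margin of error roughly quadruples the required sample size, which is why precision targets should be set with the cost of data collection in mind.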
2. Use statistical software where possible
The days of hand calculations of complex statistical analysis are (fortunately) gone. When using computer software, be sure you select the correct functions that apply to variable data if that’s what you are analyzing.
3. Use graphics to supplement your analysis
Plotting the data gives you a visual view that can be a powerful directional indicator and a preview of what to expect once you do your statistical analysis.
Frequently Asked Questions (FAQ) about variable data
1. What’s the difference between variable and discrete data?
Variable data is something you measure, while discrete data is something you count.
2. What are some examples of variable data?
Dimensions, weights, volumes, speed, and time are all examples of variable data. These are things that are measured by a measurement device and can be logically divided into smaller and smaller units.
3. Is variable data preferred to discrete data?
Yes. You don’t need as much variable data to draw conclusions about your process. In addition, the greater resolution of variable data allows for greater discrimination between things.
Some final thoughts on variable data
We’ve defined variable data as the values derived from measuring things with a measurement device that can take on any value along a continuum of possible values and can be logically subdivided into smaller and smaller units.
If given a choice, you should try to use variable data to understand your processes and make decisions. The ability to gain greater resolution will allow for greater discrimination between the things you’re measuring. The statistical tools you can use for variable data are more powerful than those used for discrete data, and you won’t need as much data to do your analysis.