The world of data consists of things that you measure and things that you count. The terms attribute data and discrete data are similar but distinct enough to warrant a closer look. Let’s explore the differences so you will have a better understanding of attribute data, how to use it, its advantages and disadvantages, and some best practices for collecting and analyzing that type of data.
Overview: What is attribute data?
In the world of data, there are things we measure and things we count. Data that we derive from measuring things is called continuous data. A good definition of continuous data is that it is measurable by some measuring device (e.g., stopwatch, scale, tape measure), it can take on any value across a continuum of possible values, and it can be logically subdivided. For example, your height can be measured with a tape measure, it can take on any value within a range of possible heights, and it can be logically subdivided into feet, inches, quarter inches, eighth inches, etc. Continuous data is valued because of the precision allowed by the logical subdivision of values.
Discrete data comes from things that can be counted. Discrete data can be further refined into discrete numeric data and discrete attribute data. Examples of discrete numeric data might be: the number of errors on your invoice, the number of rejected parts on your manufacturing line, and the number of people on hold waiting for your customer service rep to pick up the phone.
Discrete attribute data is a little different. This type of data will assign a numeric value to some qualitative characteristic. If these qualitative characteristics have a logical order, we can refer to them as discrete ordinal data. A classic example is the Likert scale. Here we can order some attributes such as: Strongly agree, Moderately agree, Neutral, Moderately disagree, and Strongly disagree. In our survey we would assign a numeric value to them such as 5, 4, 3, 2, or 1. We can then count the number in each category.
We can also have some discrete attribute data that is not ordered. For example, we can define our attributes in terms of types of product. A glass company may categorize its products as laminated glass, tempered glass, insulated glass, and coated glass. There is no logical order or preference; they are just different. We can assign a numerical code to each type and can even count the number of each.
Why does it make a difference what kind of data we have? Because the type of analytical tool you use depends on the type of data you have. As noted earlier, continuous data is preferred because it is more flexible and provides finer resolution. Many people are tempted to collect continuous data and then convert it to discrete or attribute data. For example, suppose you are collecting the delivery time for each order. That is continuous data. If you then convert it into binary attribute data consisting of on time/not on time, you lose a lot of information that would likely be useful in analyzing that process.
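To make the information loss concrete, here is a minimal Python sketch. The delivery times, the carrier names, and the 5-day on-time target are all hypothetical values made up for illustration. Both carriers have the same on-time rate, yet the continuous data shows that they behave very differently.

```python
from statistics import mean, stdev

TARGET_DAYS = 5.0  # hypothetical on-time threshold, in days

# Hypothetical delivery times (days) for two carriers; not real data.
carrier_a = [4.8, 4.9, 4.7, 4.9, 5.6, 4.8]
carrier_b = [1.2, 3.5, 2.0, 4.9, 9.8, 2.6]


def on_time_rate(times, target=TARGET_DAYS):
    """Convert continuous times to binary on time/not on time and return the on-time fraction."""
    return sum(t <= target for t in times) / len(times)


for name, times in (("Carrier A", carrier_a), ("Carrier B", carrier_b)):
    print(
        f"{name}: on-time rate = {on_time_rate(times):.0%}, "
        f"mean = {mean(times):.2f} days, std dev = {stdev(times):.2f} days"
    )
```

The binary view reports the two carriers as identical. Only the continuous view reveals that one is consistent and barely misses the target when late, while the other is highly variable and misses by days.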
3 drawbacks of attribute data
While attribute data may seem easier to understand and apply, it has several drawbacks that detract from its usefulness.
1. Requires more data for analysis
You can quickly gain insight into a process with continuous data. You need a considerably larger sample size of attribute data to understand the underlying process.
2. Measurement system
The basis of any good data set is the accuracy and precision of the measurement system capturing the data. Attribute data often relies on a human to judge and record the data. Human judgment is generally less repeatable and reproducible than a reading from a measurement device.
3. Operational definition
Unless there is an agreed-upon definition of the attribute you’re collecting data on, there is a strong likelihood of confusion about what is actually being recorded. What does “Strongly agree” really mean? Different people may define the term differently. That has to be resolved so you can have confidence that everyone is collecting the data the same way.
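The sample size drawback (item 1 above) can be illustrated with a rough calculation. The sketch below uses standard normal-approximation formulas for comparing two groups. It estimates the sample size needed to detect a shift in mean delivery time, then the sample size needed to detect the same shift after the data has been reduced to late/not late. The process values (means, standard deviation, lateness threshold), the 5% significance level, the 80% power, and the function names are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a difference `delta` between two means."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a difference between two proportions."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical process: delivery time is roughly normal, in days.
SIGMA = 2.0
MEAN_BEFORE, MEAN_AFTER = 10.0, 9.0
LATE_THRESHOLD = 12.0  # an order is "late" if it takes longer than this

# Fraction of late orders implied by each mean (the attribute view of the same shift).
p_late_before = 1 - NormalDist(MEAN_BEFORE, SIGMA).cdf(LATE_THRESHOLD)
p_late_after = 1 - NormalDist(MEAN_AFTER, SIGMA).cdf(LATE_THRESHOLD)

n_continuous = n_per_group_means(MEAN_BEFORE - MEAN_AFTER, SIGMA)
n_attribute = n_per_group_proportions(p_late_before, p_late_after)

print(f"Late fraction before: {p_late_before:.3f}, after: {p_late_after:.3f}")
print(f"Per-group sample size using continuous times: {n_continuous}")
print(f"Per-group sample size using late/not late:    {n_attribute}")
```

Under these assumptions the attribute version needs a noticeably larger sample to detect the same improvement. The exact ratio depends on the threshold and the size of the shift.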
Why is attribute data important to understand?
Making the distinction between attribute and continuous data and even attribute and discrete data is critical to collecting and analyzing your data.
1. Attribute vs. discrete data
While these two terms are often used interchangeably, they differ enough that you need to understand the distinction in order to properly define and collect your data.
2. Using the correct analytical tool
Using the wrong analytical tool for the data you’ve collected can result in incorrect conclusions.
3. Correct statistics
Understanding the correct tool to use is one challenge. Using the correct statistics to describe your sample and assumed population is another challenge. The purpose of any data collection is to learn about your process. The type of data you collect and use is the foundation for proper analysis.
An industry example of attribute data
In an effort to better understand how employees feel about the company, the head of human resources distributed a survey to the company’s employees. A number of questions were asked with the possible responses being in the form of a Likert scale of Strongly Agree (5), Agree (4), Neutral (3), Disagree (2), and Strongly Disagree (1). The results were tabulated, and an HR Manager prepared a report to be distributed to senior leadership. Prior to dissemination, the manager asked her Master Black Belt (MBB) to review and comment on the presentation.
Unfortunately, rather than treat the data as pure attribute data, which it was, the manager chose to report out the results as if it were pure continuous data, which it was not. She added up all the numbers and calculated averages for all the categories. For example, she reported that for one critical question, the average response was 3. The MBB pointed out that the 3 could have been calculated with half the values being 1 and half being 5. Or half being 2 and half being 4. Or all the values could have been a 3. The average alone could not tell these very different situations apart, so it said little about how employees actually felt.
In the end, the MBB convinced the manager that it would be better to present the data as attribute data and not try to treat it as continuous. That meant presenting the results as the number of responses for 5, the number of responses for 4, and so forth. The manager could also report the values as percentages. That is, 15% of the responses were for Agree, while 25% were for Strongly Disagree. This was the better way to report out the attribute data.
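The MBB's point can be sketched in a few lines of Python. The three response sets below are hypothetical (20 responses each) and mirror the three scenarios described above. All of them average exactly 3, while their counts and percentages look nothing alike.

```python
from collections import Counter
from statistics import mean

# Likert coding from the survey example.
LABELS = {
    5: "Strongly Agree",
    4: "Agree",
    3: "Neutral",
    2: "Disagree",
    1: "Strongly Disagree",
}


def summarize(responses):
    """Return {label: (count, percent)} for each Likert category, highest score first."""
    counts = Counter(responses)
    total = len(responses)
    return {
        LABELS[score]: (counts[score], 100 * counts[score] / total)
        for score in sorted(LABELS, reverse=True)
    }


# Hypothetical response sets; not data from the survey in the example.
scenarios = {
    "polarized": [1] * 10 + [5] * 10,
    "mild split": [2] * 10 + [4] * 10,
    "all neutral": [3] * 20,
}

for name, responses in scenarios.items():
    print(f"{name}: average = {mean(responses)}")
    for label, (count, percent) in summarize(responses).items():
        print(f"  {label:<17} {count:>2}  ({percent:.0f}%)")
```

Reporting counts and percentages per category keeps the shape of the responses visible, which the single average hides.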
4 best practices when thinking about attribute data
In most cases, you may delegate your statistical analysis to those more experienced and knowledgeable about statistics. In any case, you should be aware of some of the best practices so you can assess whether your experts are doing what you need them to do.
1. Process stability
Process stability, meaning the process shows only common cause variation, is assessed through the use of SPC control charts. If your process is stable, it is then predictable. Therefore, the data that you are collecting should come from a stable process.
2. Plot the data
If possible, always plot the data before embarking upon any complex statistical analysis. A picture is worth a thousand words, so make use of graphs such as frequency diagrams, bar charts, and even control charts.
3. Computer software
The days of doing statistical or graphical analysis by hand are long gone. There is a plethora of software programs, both sophisticated and basic, you can use to do your analysis.
4. Data collection plan
Be sure you have a solid data collection plan that clearly defines what you’re going to collect data on, how you are going to collect it, who is going to collect it, and how much data you need to collect.
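For attribute data, the stability check in the first best practice is commonly done with a p-chart, which tracks the proportion of defective items per sample. The sketch below is a minimal version, assuming a constant sample size and applying only the basic 3-sigma limits (no additional run rules). The daily counts, the sample size of 100, and the function name are hypothetical.

```python
import math


def p_chart_limits(defectives, sample_size):
    """Return (center line, lower limit, upper limit) for a p-chart with constant sample size."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lower = max(0.0, p_bar - 3 * sigma_p)  # a proportion cannot go below 0
    upper = min(1.0, p_bar + 3 * sigma_p)  # or above 1
    return p_bar, lower, upper


# Hypothetical data: defective invoices found in 100 invoices sampled each day.
SAMPLE_SIZE = 100
defectives = [4, 6, 5, 3, 7, 5, 4, 15, 6, 5]

p_bar, lcl, ucl = p_chart_limits(defectives, SAMPLE_SIZE)

# Days (1-based) whose defective proportion falls outside the control limits.
out_of_control = [
    day
    for day, count in enumerate(defectives, start=1)
    if not lcl <= count / SAMPLE_SIZE <= ucl
]

print(f"Center line: {p_bar:.3f}, LCL: {lcl:.3f}, UCL: {ucl:.3f}")
print(f"Days outside the limits: {out_of_control}")
```

A point outside the limits suggests special cause variation worth investigating before the data is used for further analysis.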
Frequently Asked Questions (FAQ) about attribute data
1. Can my attribute data ever be used as continuous data?
Maybe. If you have count data, there are enough observations, and the counts span a large enough range, you can treat them as what some people call pseudo-continuous data. While they may not technically be continuous data, they may behave as such, which can be useful.
2. Is it wise to convert continuous data into attribute or discrete data?
No. Continuous data possesses a lot more information. Taking something like processing time measured in minutes and converting it into a binary category of “less than 10 minutes” and “more than 10 minutes” loses a lot of valuable information and resolution.
3. What is ordinal attribute data?
This is attribute data that has a logical sequence or preference. For example, a college student may be classified as a freshman, sophomore, junior, or senior. There is a logical sequence to that classification. There is no logical order to an attribute like hair color which might be described as black, brown, blonde or gray.
Some final thoughts on attribute data
Attribute data is a form of discrete data. It is represented by counts rather than measurements. It can be numeric, ordinal, non-ordered, and even binary. It’s generally descriptive in nature. If your process is such that it only generates attribute data and that’s all you can collect data on, then that’s what you have to work with. Do not convert continuous data into attribute data.
It is important to properly define and understand the nature of your data so you can utilize the appropriate statistics and analytical tools. Using the wrong tools for your type of data greatly diminishes the value of your analysis and conclusions.