iSixSigma

TaaG Analysis – Fast and Easy for Comparing Trends in Large Data Sets

TaaG (trends at a glance) analysis is a fast way to compare trends of subsets of data across large data sets. It is an ideal tool to use in the Measure and Control phases of DMAIC (Define, Measure, Analyze, Improve, Control) projects.

The value of TaaG analysis is best understood by way of example. Suppose you have data for three distinct groups, tracking the number of negative incidents each group worked on each month over a rolling 13-month period.

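Table 1: Example of TaaG Analysis
| Group | Jan. (Year 1) | Feb. | March | April | May | June | July | Aug. | Sept. | Oct. | Nov. | Dec. | Jan. (Year 2) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Group 1 | 34 | 44 | 77 | 70 | 101 | 117 | 95 | 117 | 114 | 137 | 168 | 180 | 192 |
| Group 2 | 106 | 91 | 118 | 105 | 95 | 111 | 96 | 105 | 114 | 100 | 92 | 110 | 107 |
| Group 3 | 195 | 187 | 195 | 152 | 145 | 116 | 117 | 105 | 101 | 88 | 86 | 93 | 82 |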

One might wish to make three run charts with trend lines representing the linear regression of each group. This type of view provides a quick visual depiction of which groups are trending toward the positive and which ones are not.

Figure 1: Group 1 – Number of Incidents by Month

Figure 2: Group 2 – Number of Incidents by Month

Figure 3: Group 3 – Number of Incidents by Month

What happens, however, if there are 100, 200 or 500 groups that require a similar type of analysis? The task grows dramatically more difficult. Creating 500 run charts with trend lines is a daunting chore, and after the charts are complete there is still the task of comparing all 500 to see which have the most pronounced trends. A much easier and faster solution can be found in TaaG analysis and the construction of a TaaG table.

Building a TaaG Table

As you look at the charts above, notice that each chart can be represented by two key values:

  1. The arithmetic mean of the data points
  2. The slope of the linear regression

Although the averages of the three groups are fairly similar, the slopes of the three trend lines differ: Group 1 has a positive slope, Group 2 has a slope of almost zero and Group 3 has a negative slope.

A TaaG table is simply a table (like Table 1) that lists all the data points for the individual groups, with their arithmetic means and slopes added as extra columns. Although the equation for each of these two numbers is listed below, the easiest way to calculate them is to use the functions already built into your favorite spreadsheet or statistics software.

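Equation for Arithmetic Mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Equation for Slope of Linear Regression

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Here $y_i$ is the value recorded in period $i$, $x_i$ is the period index (1 through $n$, with $n = 13$ in these examples), and $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$.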
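As a minimal sketch, the mean and the least-squares slope can be computed for the three groups in Table 1 with plain Python. The function names `mean` and `slope` are illustrative, and the code assumes the months are evenly spaced and indexed 1 through 13.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def slope(values):
    """Least-squares slope of values against the period index 1..n.

    Assumes evenly spaced periods; needs at least two data points.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a slope")
    xs = range(1, n + 1)
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Data copied from Table 1.
table_1 = {
    "Group 1": [34, 44, 77, 70, 101, 117, 95, 117, 114, 137, 168, 180, 192],
    "Group 2": [106, 91, 118, 105, 95, 111, 96, 105, 114, 100, 92, 110, 107],
    "Group 3": [195, 187, 195, 152, 145, 116, 117, 105, 101, 88, 86, 93, 82],
}

summary = {name: (mean(v), slope(v)) for name, v in table_1.items()}

for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.2f} slope={s:.2f}")
# Group 1: mean=111.23 slope=12.19
# Group 2: mean=103.85 slope=0.08
# Group 3: mean=127.85 slope=-10.30
```

The signs of the three slopes match the run charts: rising for Group 1, nearly flat for Group 2 and falling for Group 3.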

The following is an excerpt of an example TaaG table with 500 groups.

Table 2: TaaG Table with 500 Data Groups
Jan. Feb. March April May June July Aug. Sept. Oct. Nov. Dec. Jan. Mean Slope
Group 1 4 1 3 6 3 2 2 4 2 7 8 8 1 3.92 0.22
Group 2 6 7 5 4 4 4 5 6 5 3 2 6 7 4.92 -0.05
Group 3 4 7 8 3 3 3 1 7 2 2 3.08 -0.49
Group 4 17 8 6 5 6 1 1 7 4 1 4.31 -0.94
Group 5 4 1 2 1 6 1 5 8 4 10 12 4.15 0.77
Group 6 2 1 4 3 3 3 4 7 6 10 15 1 4.54 0.66
Groups 7-496 not shown
Group 497 11 13 7 14 17 7 11 7 14 6 5 6 10 9.85 -0.43
Group 498 11 9 8 6 4 2 4 7 4 5 4.62 -0.66
Group 499 12 8 4 3 5 6 5 2 1 3 1 3.85 -0.71
Group 500 1 1 6 4 2 1 7 3 11 8 3.38 0.45

Once the data is in this format, it is easy to sort by slope to see which groups have the most prominent trends. For example, below is the sorted table showing the 10 most prominently trending groups from the whole data set.
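The sort itself is a one-line operation once each row carries its slope. The sketch below builds a small TaaG table from six groups whose 13 monthly values are fully listed in this article (Groups 1, 2 and 497 from Table 2; Groups 316, 160 and 115 from Table 3) and ranks them by slope. The dictionary layout and variable names are assumptions for illustration, not a prescribed format.

```python
def slope(values):
    """Least-squares slope of values against the period index 1..n."""
    n = len(values)
    x_bar = (n + 1) / 2
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values, start=1))
    den = sum((x - x_bar) ** 2 for x in range(1, n + 1))
    return num / den


# Monthly values copied from Tables 2 and 3.
groups = {
    "Group 1": [4, 1, 3, 6, 3, 2, 2, 4, 2, 7, 8, 8, 1],
    "Group 2": [6, 7, 5, 4, 4, 4, 5, 6, 5, 3, 2, 6, 7],
    "Group 497": [11, 13, 7, 14, 17, 7, 11, 7, 14, 6, 5, 6, 10],
    "Group 316": [34, 44, 77, 70, 101, 117, 95, 117, 114, 137, 168, 180, 38],
    "Group 160": [1, 4, 6, 15, 12, 14, 26, 26, 32, 33, 32, 37, 48],
    "Group 115": [3, 10, 7, 16, 17, 16, 22, 29, 23, 20, 32, 40, 53],
}

# One row per group: the raw values plus the two summary columns.
taag_table = [
    {"group": name, "values": vals,
     "mean": sum(vals) / len(vals), "slope": slope(vals)}
    for name, vals in groups.items()
]

# Largest slope first, so the steepest upward trends rise to the top.
taag_table.sort(key=lambda row: row["slope"], reverse=True)
ranked = [row["group"] for row in taag_table]

for row in taag_table:
    print(f'{row["group"]:<10} mean={row["mean"]:6.2f} slope={row["slope"]:5.2f}')
```

With these six groups the order is Group 316, 160, 115, 1, 2, 497, and the computed means and slopes agree with the values printed in Tables 2 and 3.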

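Table 3: TaaG Table Sorted by Slope
| Group | Jan. | Feb. | March | April | May | June | July | Aug. | Sept. | Oct. | Nov. | Dec. | Jan. | Mean | Slope |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Group 316 | 34 | 44 | 77 | 70 | 101 | 117 | 95 | 117 | 114 | 137 | 168 | 180 | 38 | 99.38 | 7.12 |
| Group 160 | 1 | 4 | 6 | 15 | 12 | 14 | 26 | 26 | 32 | 33 | 32 | 37 | 48 | 22.00 | 3.61 |
| Group 115 | 3 | 10 | 7 | 16 | 17 | 16 | 22 | 29 | 23 | 20 | 32 | 40 | 53 | 22.15 | 3.23 |
| Group 135 | 2 | 4 | 5 | 12 | 11 | 13 | 16 | 15 | 26 | 30 | 35 | 29 | 42 | 18.46 | 3.14 |
| Group 310 | 3 | 5 | 5 | 10 | 15 | 17 | 17 | 23 | 15 | 26 | 28 | 34 | 43 | 18.54 | 2.92 |
| Group 225 | 4 | 3 | 6 | 9 | 11 | 19 | 14 | 18 | 19 | 26 | 22 | 37 | 42 | 17.69 | 2.90 |
| Group 275 | 2 | 5 | 6 | 10 | 8 | 9 | 14 | 15 | 28 | 30 | 23 | 32 | 31 | 16.38 | 2.65 |
| Group 340 | 2 | 5 | 7 | 12 | 9 | 13 | 14 | 20 | 22 | 13 | 30 | 30 | 40 | 16.69 | 2.64 |
| Group 235 | 1 | 3 | 8 | 5 | 17 | 15 | 16 | 14 | 14 | 27 | 33 | 29 | 32 | 16.46 | 2.61 |
| Group 190 | 1 | 8 | 8 | 11 | 13 | 12 | 17 | 14 | 19 | 25 | 17 | 28 | 43 | 16.62 | 2.44 |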

In this view you can see that far and away the most significant trend is from Group 316. The slope of 7.12 signifies that, on average over the past 13 months, the incident count for this group has increased by just over seven incidents each month. Since these are negative incidents, this represents a worsening trend. Why is this group having a greater impact than the others? The insight from a TaaG table will help guide a Six Sigma practitioner toward the subsets of data that are driving the greatest need for improvement.

It is easy to quickly sort the data using normal spreadsheet software. The analysis is easily repeatable if data is gathered on a regular basis (for example, monthly reporting). This makes TaaG analysis ideal for the Control phase of improvement efforts as well as the Measure phase.
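For recurring reports, one possible way to keep the rolling 13-month window current is to drop the oldest month whenever a new one arrives and then recompute the slope. The sketch below does this for Group 1 from Table 1 using a fixed-length `deque`. The new monthly value of 200 is hypothetical and only illustrates the update step.

```python
from collections import deque

WINDOW = 13  # rolling window length in months, as in the examples above


def slope(values):
    """Least-squares slope of values against the period index 1..n."""
    n = len(values)
    x_bar = (n + 1) / 2
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values, start=1))
    den = sum((x - x_bar) ** 2 for x in range(1, n + 1))
    return num / den


# Group 1 from Table 1; maxlen makes the deque discard the oldest month
# automatically when a new one is appended.
window = deque(
    [34, 44, 77, 70, 101, 117, 95, 117, 114, 137, 168, 180, 192],
    maxlen=WINDOW,
)
slope_before = slope(window)

window.append(200)  # hypothetical value for the next month, not real data
slope_after = slope(window)

print(f"slope before update: {slope_before:.2f}")
print(f"slope after update:  {slope_after:.2f}")
```

Running the same update for every group and re-sorting gives a refreshed TaaG table each reporting period.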

Studying Relative Slope

Another aspect of TaaG analysis that can be valuable is the study of relative slope. With large groups of data, it is likely that some subgroups will trend more significantly than others. The trend relative to the normal performance of a subgroup, however, may be better than that of other subgroups.

For example, consider the data in Table 3 above. Group 316 has the most significant trend, with a slope of 7.12 and a mean of 99.38, whereas the second group (Group 160) has a “better” slope of 3.61 and a mean of 22.00. While the overall trend of Group 316 is worse, its trend relative to its mean performance is better. To highlight this point, consider Table 4, where the relative slope equals slope/mean.

Table 4: Subgroups of Relative Slopes
| Group | Mean | Slope | Relative Slope |
| --- | --- | --- | --- |
| Group 316 | 99.38 | 7.12 | 0.0716 |
| Group 160 | 22.00 | 3.61 | 0.1641 |

Group 316 has the “worse” absolute trend, but its slope represents only about 7 percent of its mean monthly performance, whereas the slope of Group 160 represents about 16 percent of its mean. At times it is valuable to sort a TaaG table by relative slope instead of slope to see which subgroups of the whole data set are trending most poorly relative to their own performance. Table 5 is drawn from the same data set but sorted by relative slope; it shows the 10 subgroups out of the 500 that have the greatest relative slopes.

Table 5: Subgroups with Greatest Relative Slopes
Jan. Feb. March April May June July Aug. Sept. Oct. Nov. Dec. Jan. Mean Slope Relative Slope
Group 425 3 1 2 5 3 5 12 9 5 10 4.23 0.86 0.2039
Group 440 2 3 1 4 9 1 7 8 8 10 4.08 0.83 0.2035
Group 5 1 1 1 1 4 3 5 4 7 9 7 12 4.23 0.85 0.2013
Group 465 1 5 7 3 6 11 11 15 17 18 32 26 11.69 2.31 0.1978
Group 245 1 1 2 2 3 3 3 5 5 5 9 12 3.92 0.77 0.1975
Group 80 2 1 5 4 5 2 5 8 8 12 11 4.85 0.93 0.1916
Group 90 1 1 4 4 4 3 7 6 5 7 12 4.15 0.77 0.1865
Group 185 2 5 7 6 5 5 14 9 13 12 22 7.69 1.43 0.1864
Group 320 1 2 1 3 5 3 2 3 5 15 7 10 4.38 0.81 0.1842
Group 210 1 1 3 2 4 4 6 4 7 7 7 3.54 0.65 0.1832

Note that there is no overlap between the two tables of top 10 subgroups. Although the subgroups with the highest relative slopes do not appear to have as significant an effect on the total data set as a whole, taking notice of relative slopes can often bring insight to poorly trending subgroups. That notice, in turn, can result in movements toward improvement.
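To see how the ranking can change, the sketch below computes relative slope (slope divided by mean) for the top three groups in Table 3 and sorts by it. Skipping groups whose mean is zero is an assumption made here to avoid division by zero; the article does not say how such groups should be handled.

```python
def mean(values):
    return sum(values) / len(values)


def slope(values):
    """Least-squares slope of values against the period index 1..n."""
    n = len(values)
    x_bar = (n + 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values, start=1))
    den = sum((x - x_bar) ** 2 for x in range(1, n + 1))
    return num / den


def relative_slope(values):
    """Slope divided by mean; None when the mean is zero (assumed policy)."""
    m = mean(values)
    return None if m == 0 else slope(values) / m


# Top three rows of Table 3.
groups = {
    "Group 316": [34, 44, 77, 70, 101, 117, 95, 117, 114, 137, 168, 180, 38],
    "Group 160": [1, 4, 6, 15, 12, 14, 26, 26, 32, 33, 32, 37, 48],
    "Group 115": [3, 10, 7, 16, 17, 16, 22, 29, 23, 20, 32, 40, 53],
}

relative = {name: relative_slope(vals) for name, vals in groups.items()}

# Rank by relative slope, largest first, ignoring undefined values.
by_relative = sorted(
    (name for name, r in relative.items() if r is not None),
    key=lambda name: relative[name],
    reverse=True,
)

for name in by_relative:
    print(f"{name}: relative slope = {relative[name]:.4f}")
```

Group 316 ranks first by slope but last of these three by relative slope (about 0.0716, against about 0.1641 for Group 160), which is the effect shown in Table 4.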

Trending – Better or Worse

Note that in the examples above, the higher the positive value of the slope, the “worse” the trend, because the data set represents negative incidents. If a subgroup has an increase in incidents, this is an increase in negative impact. If the data set is something like stock prices, however, a higher slope is better because it represents increasing stock value.

Therefore, when considering the use of a TaaG table in analysis, take note of the data set and whether positive slopes are “good” or “bad” and sort the TaaG table accordingly.
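One way to make the sort direction explicit is to pass it as a flag. In the sketch below, the function name `rank_worst_first` and the parameter `higher_is_worse` are hypothetical; the slopes are taken from Table 2.

```python
def rank_worst_first(slopes, higher_is_worse=True):
    """Return group names ordered from worst trend to best.

    higher_is_worse=True suits counts of negative incidents, where a
    rising line is bad; False suits measures such as stock prices,
    where a rising line is good.
    """
    return sorted(slopes, key=slopes.get, reverse=higher_is_worse)


# Slopes copied from Table 2.
slopes = {"Group 1": 0.22, "Group 2": -0.05, "Group 4": -0.94, "Group 5": 0.77}

incidents_ranking = rank_worst_first(slopes, higher_is_worse=True)
prices_ranking = rank_worst_first(slopes, higher_is_worse=False)

print(incidents_ranking)  # ['Group 5', 'Group 1', 'Group 2', 'Group 4']
print(prices_ranking)     # ['Group 4', 'Group 2', 'Group 1', 'Group 5']
```

The same slopes produce opposite rankings, so the direction should be decided before the table is sorted and reviewed.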

Summary

TaaG analysis can be useful if there is a need to quickly and easily evaluate large amounts of data and compare trends across subsets of data. In whatever industry the improvement opportunity may exist, TaaG analysis provides a rapid first look at a data set to guide the practitioner into deeper analysis toward the root causes of issues and then enable corrective actions.

Comments 6

  1. Lew Yerian

    I wouldn’t make a decision on this data without knowing the R-squared values. Assuming linearity could lead to incorrect decisions.

  2. Nik

    Brad,

    Thanks for posting the article. I may not use this so much for analysis, but it might have metric/visual management applications. For example, my company manages hundreds of products of varying size and complexity. At the supervisor level, we were looking at charts of individual product performance (<50 products each), however senior management felt this was too much information for their level (<1000 products each). TaaG could be a way to summarize some key statistics at a higher level so senior management can still see where to focus without looking at each of a thousand charts and identify where to focus.

  3. Filiep Samyn

    So this is really nothing more than a simple linear regression which Excel and Minitab will create without using any formulas. Why do we have to introduce a new name for a standard tool?
    I do not see how this will steer to root causes, possible causes maybe.
    What do you suggest when we have multiple x’s that influence the y? TaaG will only work with a single or two x’s, beyond that graphical analysis breaks down. You stated large data sets. I would definitely want some software to crunch the numbers for me.
    When comparing slopes between regression lines you will need to use hypothesis testing, you cannot compare point estimates for the slopes to determine differences. As a minimum you will need CIs for the betas.
