# Analysing data

Six Sigma – iSixSigma Forums Old Forums General Analysing data

Viewing 4 posts - 1 through 4 (of 4 total)
#43729

Sinnicks
Participant

What is the best way to analyze large amounts of data to identify the key variable?
For example, if I had data for every second over a 24-hour period (temperature, speed, load %, air pressure, flow, etc.)

#139140

Heebeegeebee BB
Participant

I’d generate I/MR or XBAR/R Charts for each variable and then compare/contrast.
Couple that with Capability Six Packs for each and compare/contrast.
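The I/MR limits suggested above can be computed directly. A minimal pure-Python sketch, using the standard n=2 moving-range constants (2.66 and 3.267) and made-up temperature readings:

```python
# Sketch: Individuals/Moving-Range (I-MR) control limits for one variable.
# The constants 2.66 and 3.267 are the usual d2/D4-based factors for
# moving ranges of size 2.

def imr_limits(values):
    """Return the Individuals chart center/limits and the MR chart UCL."""
    mrs = [abs(b - a) for a, b in zip(values, values[1:])]  # moving ranges
    xbar = sum(values) / len(values)
    mrbar = sum(mrs) / len(mrs)
    return {
        "center": xbar,
        "lcl": xbar - 2.66 * mrbar,
        "ucl": xbar + 2.66 * mrbar,
        "mr_ucl": 3.267 * mrbar,
    }

temps = [70.1, 70.4, 69.9, 70.2, 70.6, 70.0, 70.3]  # hypothetical readings
print(imr_limits(temps))
```

Running this per variable and comparing which charts show out-of-control signals is one way to do the compare/contrast step.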

#139144

Participant

You could use data mining techniques.

#139171

Robert Butler
Participant

If we use your example as a talking point then the best way to analyze this data to identify the key variable(s) is to throw it out and start over.
After you have done this sit down with your team and follow the usual steps in the define and measure phases of the effort.  Some of the goals will require an examination of historical data.  This data will probably be of the type cited in your example.  The value of looking at this kind of data during the measure phase of your effort will be to help you better understand what you don’t know.  You may get lucky and actually find a critical variable or two but I wouldn’t count on it.  The reasons are as follows:
1. The data is happenstance data – it is the result of a process which, presumably, was producing an acceptable product.  The fact that such a process exists means any variable which is already known to be important to the final process result is subject to some kind of control.  Control usually manifests itself in the form of severely restricting the range over which a variable is allowed to move.  In other words, the odds of using happenstance data to see the effect of a known critical variable on the process are next to nothing.  This, in turn, means that seeing the effect of the known critical variables in relation to the other process variables is also highly improbable.
2. The data is happenstance data – This means it is very unlikely that changes in known critical variables are independent of one another. If the changes in the critical variables aren’t independent of one another you cannot use the data to identify their effects on the process (we are, of course, assuming that #1 doesn’t apply).
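The independence problem in point 2 can be checked directly: compute the pairwise correlation between the logged variables. A high |r| means the data cannot separate their effects. A minimal sketch with invented speed and load readings:

```python
# Sketch: checking whether two logged process variables moved independently.
# If |r| is near 1, their effects on the output are confounded in this data.
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

speed = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # made-up data
load = [55.0, 56.0, 55.0, 58.0, 60.0, 59.0]         # tracks speed closely
print(pearson_r(speed, load))
```

Applied to every pair of recorded variables, this gives a quick map of which effects are hopelessly entangled before any modeling is attempted.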
3. The data is happenstance data – This means any unknown, uncontrolled critical variable will not only go unrecorded but could very likely have changed at the same time and in the same manner as one or more of the variables whose values were recorded and whose impact is of no consequence.  If it did, and if it was important, and if it moved over a range that did have an impact on the process, your analysis will identify the known variable that trended along with it as the “critical” variable. If you attempt to change your process by changing this known, confounded, and unimportant variable you will go wrong with great assurance and achieve nothing.
What you can do with happenstance data that was gathered as a result of decisions reached in the define and measure phases is to spend a lot of time graphing it, looking for trends not only in the final critical process properties but also in how the recorded process variables were trending with one another during the time you were recording their behavior.
In your case, happenstance data gathered every second is going to present you with an additional problem when it comes to trying to use it to get some sense of the process.  This data is highly autocorrelated, which means any attempt at formal statistical analysis will require you to have a good understanding of the issues surrounding the analysis of repeated measures.
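The autocorrelation point can be illustrated with a quick check of the lag-1 autocorrelation, i.e. how strongly each per-second reading predicts the next one. A pure-Python sketch with an invented slowly drifting series:

```python
# Sketch: lag-1 autocorrelation.  Values near 1 mean consecutive readings
# are far from independent, so ordinary significance tests (which assume
# independent observations) will be misleading.

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient of a list of numbers."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

drifting = [20.0 + 0.1 * i for i in range(60)]  # slow drift, like per-second temps
print(lag1_autocorr(drifting))                  # near 1: heavily autocorrelated
```

A coefficient this close to 1 is typical of one-second sampling of a slow process, and it is exactly why the effective sample size is far smaller than the raw number of readings.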


The forum ‘General’ is closed to new topics and replies.