# Statistically Determine Critical X’s from PI ProcessBook Data


- This topic has 2 replies, 3 voices, and was last updated 7 years, 6 months ago by Robert Butler.

- February 11, 2011 at 10:26 pm #53726
__General Question__:

During an internship I worked in a technical sales role at a paper making plant, and this plant used PI ProcessBook. One of the things I would like to do is look at a particular Unit Op, extract the time series data for various inputs, and then see which of these inputs has the greatest/most significant effect on my Y.

For example, the spreadsheet data would look like:

Column 1: Time (measured in any interval: 1 second, 5 second, 1 minute, 1 day, etc.)

Column 2: X1

Column 3: X2

Column 4: X3

Column 5: Y1

What type of analysis does one perform to determine which X (or X’s) has the greatest effect on Y using the time series data, with the ability to report significance at a chosen confidence level (95% or other)?

__Now here are a lot of related questions__:

Is this simply a regression analysis with a lot of data points? A time series analysis? Is there a more efficient way? What about multiple Y’s? Interacting Y’s?

__My background__:

Currently job seeking, but think this might be a useful tool to extract SOME critical X’s for project development/execution. No experience in Lean Six Sigma, but have been looking into the methodology a little. This statistical technique (whatever it is) would have been very useful in my internship (although it was certainly not required).

Thanks to anyone who can help!

-Mike

May 23, 2012 at 7:27 am #193384

MICHAEL WILLIAMS (Participant, @mlwilliams51)

If this is all continuous data, then do a multiple regression analysis to determine the best fit. This can be done easily in Minitab.
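For readers without Minitab, here is a minimal sketch of the same idea in Python with NumPy. Everything in it is an assumption made for illustration: the helper name `ols_summary`, the synthetic data standing in for the exported spreadsheet, and the use of a normal approximation for the p-values (reasonable only when there are many rows). It also treats every row as an independent observation, which process time series often are not.

```python
import numpy as np
from statistics import NormalDist


def ols_summary(X, y, names):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Returns a dict mapping each term to (coefficient, z-score, p-value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta
    dof = n - (k + 1)                              # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Normal approximation to the t distribution (assumes large n).
    p = [2 * (1 - NormalDist().cdf(abs(v))) for v in z]
    terms = ["intercept"] + list(names)
    return {t: (float(b), float(zv), float(pv))
            for t, b, zv, pv in zip(terms, beta, z, p)}


# Synthetic stand-in for the spreadsheet columns (not real plant data).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)

summary = ols_summary(X, y, ["X1", "X2", "X3"])
for term, (b, z, p) in summary.items():
    print(f"{term:>9}: coef={b:+.3f}  z={z:+.2f}  p={p:.4f}")
```

A term with a p-value below 0.05 would be reported as significant at the 95% level. To compare the size of the effects across X's measured in different units, the X columns would need to be standardized first.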

May 23, 2012 at 11:14 am #193385

Robert Butler (Participant, @rbutler)

The short answer is: it’s going to be a real beast, and at the end of the day you may have nothing at all.

Question: Time interval in column 1. Are we assuming that all X’s and the various Y’s are measured at the same time interval (once a second, once a minute or whatever) or are we assuming different time intervals for different measurements of the X’s and the Y’s?

Assuming the same time interval then the question becomes which time interval?

If your time interval is too “short” (and the definition of “short” will vary depending on the X’s and the Y’s), then what you have is autocorrelated data: your measurements of the X’s and the Y’s will not be independent, so you will have to use repeated measures regression methods to examine your data. This is not trivial.

However, before you go there you will have to examine your X matrix for several things:

1. Are the various X’s independent? – Need to run an eigenvalue/condition index/VIF analysis.

2. Are some or all of the X’s controlled as part of running the process? If they are, then there is an excellent chance that they will not correlate with whatever it is you are measuring.

In summary, if this is production data, you will have serious confounding of the X’s, and many of the X’s will have controls on them which will guarantee that they will not show up as a significant term in any regression analysis. The probability of lack of independence of measurements is high and will therefore require repeated measures methods for analysis.
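The two independence checks described above (correlation among the X's, and correlation of each series with its own past) can be sketched roughly as follows. This is an illustrative sketch, not a full diagnostic: the helper names `lag1_autocorr` and `vif` and the synthetic data are assumptions, and only the lag-1 autocorrelation is examined.

```python
import numpy as np


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation; values near 0 suggest little serial dependence."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[:-1] @ d[1:]) / (d @ d))


def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on all other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot
        out.append(float(1.0 / (1.0 - r2)))
    return out


# Synthetic example (not real plant data).
rng = np.random.default_rng(1)
n = 1000
x1 = np.cumsum(rng.normal(size=n))        # random walk: strongly autocorrelated
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1: collinear
x3 = rng.normal(size=n)                   # independent white noise
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
for name, col, v in zip(["X1", "X2", "X3"], X.T, vifs):
    print(f"{name}: lag-1 autocorrelation={lag1_autocorr(col):+.3f}  VIF={v:.1f}")

# Condition number of the standardized X matrix as a companion check.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print("condition number:", np.linalg.cond(Z))
```

A VIF above roughly 10 is a commonly cited rule of thumb for troublesome collinearity, and a lag-1 autocorrelation far from zero indicates that successive rows are not independent observations.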

Edit:

If you are interested in looking at a process in this manner I would recommend that you first plot the data in time sequence. You will need to get a plotting routine that will allow you to show your Y’s and your X’s of interest as a function of time and either allow you to layer them in the same graph or plot them as a series of stacked graphs. Once you have them in this form, take a look and see what you can see. As part of this exercise, take the time to find out which X’s are controlled and what their respective limits are – put those control limits on the appropriate graph. This should give you a sense of individual X stability when visually comparing the X’s and the Y’s. Also find out about process practices – i.e. things like “when X1 drops below critical value p then we adjust X1 upward and also adjust X3 downward to compensate for the momentary change in X1.” etc.

This won’t be as elegant as a repeated measures regression analysis but it has the potential to provide you with some guidance with respect to process behavior and the kinds of questions you might want to ask concerning process behavior/improvement.
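As a rough sketch of the stacked time-sequence plots described above, the following uses Matplotlib. The function name `plot_stacked`, the synthetic series, and the control limits shown are all hypothetical; real limits would come from the plant's own control documentation.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is required
import matplotlib.pyplot as plt
import numpy as np


def plot_stacked(time, series, limits=None):
    """Plot each series in its own panel on a shared time axis.

    series: dict mapping name -> values
    limits: optional dict mapping name -> (low, high) control limits
    """
    limits = limits or {}
    fig, axes = plt.subplots(len(series), 1, sharex=True,
                             figsize=(8, 2 * len(series)), squeeze=False)
    for ax, (name, values) in zip(axes[:, 0], series.items()):
        ax.plot(time, values, linewidth=0.8)
        ax.set_ylabel(name)
        # Draw the control limits, where known, as dashed horizontal lines.
        for bound in limits.get(name, ()):
            ax.axhline(bound, color="red", linestyle="--")
    axes[-1, 0].set_xlabel("time")
    fig.tight_layout()
    return fig


# Synthetic example (not real plant data).
rng = np.random.default_rng(2)
t = np.arange(600)
series = {
    "X1": 50 + rng.normal(scale=0.5, size=t.size),   # tightly controlled input
    "X2": 10 + np.cumsum(rng.normal(scale=0.05, size=t.size)),
    "Y1": 100 + rng.normal(scale=2.0, size=t.size),
}
fig = plot_stacked(t, series, limits={"X1": (48.5, 51.5)})
```

Calling `fig.savefig("trends.png")` would write the stacked chart to a file for inspection.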
