# Regression as a screening tool

Six Sigma – iSixSigma › Forums › Old Forums › General › Regression as a screening tool

- This topic has 5 replies, 4 voices, and was last updated 11 years, 8 months ago by Lorax.

- May 25, 2009 at 6:06 pm #52406
Folks,

I’ve got a bunch (10) of “potential” Xs (they are potential because their variation has not yet been proven to influence the Y).

In order to identify which ones are the “critical few”, I’m regressing each one in turn against the Y. This tells me several things. First, it tells me whether that particular potential X is significant or not; it also tells me how much of the variation in the Y is attributable to that potential X.
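The one-at-a-time screen described above can be sketched in plain Python. This is a minimal sketch, assuming made-up data and hypothetical X names (`temp`, `speed`): for each X it fits Y = b0 + b1·X and reports the slope, R-sq (the share of the Y variation attributable to that X) and the slope’s t-statistic, which a stats package would turn into a p-value using n − 2 degrees of freedom.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, r_squared, t_slope); t_slope has n - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x or y has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = max(syy - b1 * sxy, 0.0)  # residual sum of squares, clamped against rounding
    r_squared = 1 - sse / syy
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_slope = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return b0, b1, r_squared, t_slope


# Hypothetical data: one Y and two potential Xs, six observations each.
y = [10.1, 12.0, 13.9, 16.2, 18.0, 19.9]
potential_xs = {
    "temp": [1, 2, 3, 4, 5, 6],
    "speed": [3, 1, 4, 1, 5, 9],
}

results = {name: simple_regression(x, y) for name, x in potential_xs.items()}
for name, (b0, b1, r2, t) in results.items():
    print(f"{name:6s} slope={b1:7.3f}  R-sq={r2:.3f}  t={t:6.2f}")
```

Each of these fits ignores the other X’s entirely, so the R-sq values are not additive and say nothing about interactions.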

My question is around the criteria for a regression model. Is it critical at this stage that my DF are greater than 4?
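For reference, the residual degrees of freedom of a least-squares regression follow a standard formula: with n observations and p predictor terms plus an intercept, DF = n − p − 1. The arithmetic below applies it to one X at a time and, purely as an illustration (not from the thread), to a single model with all 10 X’s and all 45 two-factor interactions:

$$
\mathrm{DF}_{\text{residual}} = n - p - 1
$$

$$
p = 1:\quad n - 2 > 4 \;\Rightarrow\; n \ge 7
$$

$$
p = 10 + \binom{10}{2} = 55:\quad n - 56 > 4 \;\Rightarrow\; n \ge 61
$$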

How critical is it that the data follows a normal distribution? Can I get away with “nearly normal”?

Comments on the approach are also very welcome.

Lorax

- May 26, 2009 at 11:08 am #184390
Hi Lorax,

Your method sounds sound, but it isn’t. There are 2 risks that can invalidate your conclusions.

Risk 1: the X’s are not independent.

Risk 2: the X’s have interactions (Y responds to X1 and X2 together differently than to each separately, because together they strengthen their influence on Y).
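Risk 2 can be made concrete with a deliberately extreme, hypothetical case: in a 2x2 design in coded units (−1/+1), let Y depend only on the product X1·X2. Regressing Y on X1 alone or on X2 alone then finds nothing, while the interaction term explains all of it. A minimal sketch in plain Python (R-sq of a one-term regression is the squared Pearson correlation):

```python
def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical 2x2 full-factorial data in coded units.
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
x1x2 = [a * b for a, b in zip(x1, x2)]  # the interaction column
y = list(x1x2)                          # Y depends only on X1 and X2 together

for name, col in [("X1", x1), ("X2", x2), ("X1*X2", x1x2)]:
    print(name, "R-sq =", round(pearson_r(col, y) ** 2, 3))
# X1 R-sq = 0.0
# X2 R-sq = 0.0
# X1*X2 R-sq = 1.0
```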

The recommended method:

1. Check whether the X’s are independent.
2. Make a regression model of Y as a function of all the X’s together (including interactions).

How to:

1. Make a matrix plot of the X’s (this only shows one-on-one dependence), or use Principal Components Analysis (BB-level knowledge).
2. In MINITAB, use GLM or analyse the data as a linear DoE; the p-values tell you which X’s or interactions to remove, and R-sq tells you how much you are still missing.
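Step 1 can also be done numerically. The sketch below is a rough stand-in for the matrix plot, not a replacement: it computes the Pearson correlation for every pair of inputs and flags pairs above a cutoff. The data, the input names and the 0.8 cutoff are hypothetical, and correlation only catches linear, pairwise dependence, the same one-on-one limitation noted for the matrix plot.

```python
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def flag_dependent_pairs(xs, cutoff=0.8):
    """Return (name_a, name_b, r) for every pair of inputs with |r| >= cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(xs.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical inputs: "pressure" was made to track "temp" closely.
xs = {
    "temp": [20, 22, 25, 27, 30, 33],
    "pressure": [1.0, 1.1, 1.3, 1.35, 1.5, 1.65],
    "operator": [1, 2, 1, 2, 1, 2],
}
flagged = flag_dependent_pairs(xs)
print(flagged)  # only the temp/pressure pair should appear
```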

Good luck,

Remi

- May 26, 2009 at 12:23 pm #184393
Thanks Remi, some good thoughts.

You are right about the possible existence of interactions. I don’t know if a matrix plot is going to help get round this, though.

What needs to be done is the reduction of a big list of potential Xs down to a smaller list which includes only the “juiciest” ones (those which have the biggest impact and are conceivably possible to control or manage).

Perhaps I should hold off analyzing anything until I get all the data and then do some sort of analysis of them all against the Y at that point (your GLM idea or a multiple regression).
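Fitting all the X’s against the Y at once can be sketched with ordinary least squares. This is a minimal sketch, assuming hypothetical coded data from a 2x2 design run twice where Y = 5 + 2·X1 + 3·X1·X2 exactly (no noise); it is written in pure Python so it runs without MINITAB or a stats package, and it is not MINITAB’s GLM, which also handles categorical factors and reports ANOVA tables and p-values. It compares R-sq for a main-effects-only model with a model that includes the X1·X2 interaction.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some terms are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Fit y = b0 + b1*col1 + ... via the normal equations; return (coefs, R-sq)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coefs, row)) for row in design]
    my = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return coefs, 1 - sse / sst


# Hypothetical coded data: a 2x2 design run twice, Y = 5 + 2*X1 + 3*X1*X2 exactly.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x1x2 = [a * b for a, b in zip(x1, x2)]
y = [5 + 2 * a + 3 * ab for a, ab in zip(x1, x1x2)]

coefs_main, r2_main = ols([x1, x2], y)
coefs_full, r2_full = ols([x1, x2, x1x2], y)
print("main effects only: R-sq =", round(r2_main, 3))  # 0.308
print("with X1*X2 term:   R-sq =", round(r2_full, 3))  # 1.0
```

The main-effects model leaves most of the variation unexplained even though every X involved is in it; the missing piece is the interaction term.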

Thanks again

Lorax

- May 26, 2009 at 12:37 pm #184395
You are on the right track. Remi’s cautions are technically correct but overly cautious at this point. Instead of just regressing all the x’s against the Y, do a matrix plot of all the variables. You will see strong relationships with the output if they exist. You will also see inputs that are playing together (lack of independence). Anything that looks like a relationship deserves the scrutiny Remi describes. Everything else can be dropped (screened, which is your objective). Make sure your measurements of both the inputs and outputs are credible first. A bad measurement system will mask relationships.

- June 11, 2009 at 1:17 pm #184524
Hi,

I have found that stepwise multiple regression, with all the X’s in the model along with the two-factor interactions of those X’s, is a useful tool to begin screening multiple X’s (some of which may be collinear). Stepwise regression is a natural way to identify and remove collinear X’s from the model. Hope this helps. Have a great day,
Tony
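The stepwise idea can be sketched as a simplified forward-selection loop. MINITAB’s stepwise procedure adds and removes terms using p-value thresholds (alpha-to-enter and alpha-to-remove); this sketch instead adds, at each step, the term that raises R-sq the most, stops when the gain falls below a hypothetical `min_gain` of 0.02, and never removes terms. Candidate terms are all main effects plus all two-factor interactions. The data are hypothetical: a coded 2x2 design run twice with Y = 5 + 2·X1 + 3·X1·X2, plus X3, an exact copy of X1 (perfectly collinear).

```python
from itertools import combinations


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some terms are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R-sq of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    my = sum(y) / n
    sse = sum((yi - sum(b * v for b, v in zip(coefs, row))) ** 2 for row, yi in zip(design, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def forward_select(mains, y, min_gain=0.02):
    """Greedy forward selection over main effects plus all two-factor interactions.

    Returns (selected term names in order of entry, final R-sq).
    """
    candidates = dict(mains)
    for (name_a, a), (name_b, b) in combinations(mains.items(), 2):
        candidates[f"{name_a}*{name_b}"] = [u * v for u, v in zip(a, b)]
    chosen, current_r2 = [], 0.0
    while True:
        best_name, best_r2 = None, current_r2
        for name, col in candidates.items():
            if name in chosen:
                continue
            try:
                r2 = r_squared([candidates[c] for c in chosen] + [col], y)
            except ValueError:  # exactly collinear with terms already in the model
                continue
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        if best_name is None or best_r2 - current_r2 < min_gain:
            return chosen, current_r2
        chosen.append(best_name)
        current_r2 = best_r2


# Hypothetical coded data: 2x2 design run twice; X3 is an exact copy of X1.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = list(x1)
y = [5 + 2 * a + 3 * a * b for a, b in zip(x1, x2)]

selected, final_r2 = forward_select({"X1": x1, "X2": x2, "X3": x3}, y)
print(selected, round(final_r2, 3))  # ['X1*X2', 'X1'] 1.0
```

Once X1 is in the model, X3 adds nothing a least-squares fit can separate from X1, so it is never selected; this is the sense in which stepwise screening drops collinear X’s. With real data the collinearity is usually partial, so the near-duplicate tends to add only a small R-sq gain rather than make the fit singular.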

- June 11, 2009 at 4:44 pm #184530
Thanks folks,

I’m leaning more and more toward a GLM to give me the answer as to which Potential Xs are most influential.

Lorax

The forum ‘General’ is closed to new topics and replies.