iSixSigma

Doubt-Stepwise Regression for a SIX SIGMA project


Viewing 7 posts - 1 through 7 (of 7 total)
  • #39135

    chhabra
    Participant

    I am trying to fit a regression model with some 40 parameters in a chemical plant. I have identified the controllable and the uncontrollable regressors. The problem with doing a stepwise regression is multicollinearity among the controllable variables: some of the controllable regressors are strongly correlated with one another.
    I thought of using ridge regression, but then how would I get the best subset of the regressors? Another option may be principal component regression, but with that, how would I get back to the original regressors after transforming them into principal components?
    Kindly help me. This is urgently needed.
    Thanks and regards,
    Amit
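
For reference, the algebra behind the principal component regression question can be sketched as follows. This is a generic textbook derivation, not something taken from the thread; the symbols X (centered and scaled regressors), V_k (the first k eigenvectors of X^T X), and gamma (coefficients on the component scores) are notation introduced here for illustration.

```latex
% Eigen-decomposition of the cross-product matrix of the standardized regressors
X^{\top} X = V \Lambda V^{\top}, \qquad Z = X V_k \quad \text{(scores on the first $k$ components)}

% Least-squares fit of y on the retained component scores
\hat{\gamma} = (Z^{\top} Z)^{-1} Z^{\top} y = \Lambda_k^{-1} V_k^{\top} X^{\top} y

% Fitted values rewritten in terms of the original (standardized) regressors
\hat{y} = Z \hat{\gamma} = X \left( V_k \hat{\gamma} \right)
\quad\Longrightarrow\quad
\hat{\beta}_{\mathrm{PCR}} = V_k \hat{\gamma}
```

Dividing each element of the resulting coefficient vector by the standard deviation of its regressor returns it to the original units. Note that every original regressor generally receives a nonzero coefficient, so this recovers coefficients on the original variables but does not by itself produce a subset of them.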

    #118418

    Robert Butler
    Participant

      The short answer is you can’t.  The long answer is you need to do a lot more work before you are ready to attempt a regression analysis.  Your post gives the impression that all you have done is put together a shopping list (40 parameters), grabbed some happenstance (i.e. production data), and done nothing more than dump the data into a regression program and press run. 
      After the fact you have discovered by running (?) a simple correlation matrix on the X’s that collinearity is present.  If this is a reasonable summary of your efforts then you need to stop what you are doing and give the situation some more thought.
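
A minimal sketch of the kind of simple correlation check on the X's mentioned above, using only the Python standard library. The variable names, the data values, and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of X's with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical production data: pressure was adjusted together with temperature.
xs = {
    "temperature": [150, 152, 154, 156, 158, 160],
    "pressure": [2.0, 2.1, 2.2, 2.3, 2.45, 2.5],
    "feed_rate": [10, 12, 9, 11, 10, 12],
}

flagged = collinear_pairs(xs)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

A pairwise check like this only finds two-variable relationships; a regressor that is nearly a combination of several others can still slip through.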
      Production data is terrible data to use for model building – by definition it is data from a process that was producing something. Consequently, most of the variables of interest will have been controlled.  The fact that they are controlled will probably mean they were not allowed to vary enough to permit their effects to register – that is, after all, the whole idea of variable control. When your process seems to change for the worse, your methods of control will, historically, probably require simultaneous changes to a number of your X variables.  This kind of change is going to induce correlation between otherwise independent variables.
      I would recommend you sit down with your team and really discuss those 40 parameters and their impact on the process.  If you follow the DMAIC methods you will be able to drastically reduce that number.  You can take a look at your production data with this reduced set if you wish but chances are that even the reduced set will exhibit correlation within your production data for exactly the reasons outlined above.
      You will need to take your reduced set of identified critical variables and run some screening designs – there are any number to choose from.  If all of the variables are continuous I’d recommend a 2 level saturated design with two center points.  The center points will give you your replication and they will also give you a check for curvilinear behavior in the design.
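
As one illustration of a two-level saturated design with two center points, the sketch below builds the 8-run design for 7 factors (a 2^(7-4) fractional factorial) in coded units and adds the center runs. The choice of 7 factors and the generators D=AB, E=AC, F=BC, G=ABC are assumptions made for the example; the post does not specify a particular design.

```python
from itertools import product


def saturated_design_7_factors():
    """8-run, 7-factor two-level design in coded units (-1 / +1).

    Factors A, B, C form a full factorial; the remaining four columns
    are generated as D=AB, E=AC, F=BC, G=ABC.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def with_center_points(runs, n_center=2):
    """Append n_center runs with every factor at its mid level (coded 0)."""
    k = len(runs[0])
    return runs + [(0,) * k for _ in range(n_center)]


def curvature_estimate(design, y):
    """Mean response at the factorial points minus mean at the center points.

    A value far from zero (relative to the spread of the center-point
    replicates) suggests curvilinear behavior.
    """
    factorial = [yi for row, yi in zip(design, y) if any(row)]
    center = [yi for row, yi in zip(design, y) if not any(row)]
    return sum(factorial) / len(factorial) - sum(center) / len(center)


design = with_center_points(saturated_design_7_factors())
for row in design:
    print(row)
```

In practice the run order would be randomized and the coded levels mapped to real low, mid, and high settings for each variable.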
      For a design, measure all of your Y’s of interest and build separate models for each one. An examination of the models will give you some sense of how you might be able to independently control your apparently correlated Y’s.

    #118439

    McD
    Participant

    Robert has some very good points, but he overlooked something very basic.  If this is a chemical plant, then it is governed by the laws of kinetics … get out your chemical engineering texts.
    Chances are, you can calculate almost everything from first principles.  In my experience, which covers dozens of chemical plants, your plant will, in fact, follow the rules of nature, even though the engineers in the plant always claim “my plant is different”.
    Once you have a first-principles model, you might go looking for variations from that model.  That may well lead you to a problem.  There are lots of bits and pieces in a chemical plant, and any one of them might not be quite the bit you think it is, or that it once was.  The trick is using your chemical engineering skills to factor out all the stuff you already know.
    Many times I have done exactly what you have done: taken a whole pile of data and run a regression to see if anything interesting shows up.  Of course there will be lots of correlations; nature dictates that.  The trick is fishing out the correlations that you don’t expect from first principles.
    Regression can be a very useful tool for debottlenecking a plant.  But it is not such a great tool for rediscovering the laws of physics.
    –McD
     

    #118478

    chhabra
    Participant

    Hi,
    Thanks for your suggestions, Robert and McD. But I got a little confused by your replies. Robert, you are saying that I should screen the variables by running some screening designs. Do you mean that I should skip the regression analysis and go directly to design of experiments?
    And McD, how can I fish out the correlations that are not expected from first principles? Is it by the methods I discussed in my first post?
    These are my doubts right now. Kindly help me. I will get back to you in the future.
    Thanks once again.
    Regards,
    Amit

    #118499

    Robert Butler
    Participant

    Amit,
      McD and I are going after two different aspects of the same problem. I think we will need some clarification from you concerning your efforts. When you stated “I am trying to fit a regression model with some 40 parameters in a chemical plant.I have identified the controllable and the un controllable regressors” I took your statement to mean you were attempting to identify those things that changed the properties of the final product.  I think (McD correct me if I’m wrong here) McD’s view was that of looking for variables that would impact things like reaction rates and conversion efficiencies.
     As for the question in your second post – if you are looking at variables impacting final product properties you first need to sit down and think about the 40 parameters with respect to your current understanding of how they might impact these properties.  You can look over production data to get some idea of how product properties might correlate with changes in your variables of choice but, as I said, you will probably have controlled many of the critical variables to the point where they will be confounded with one another and/or they have been so well controlled that they will not show up as significant in a regression analysis. 
      For this phase you should run a lot of scatterplots – one variable at a time – against the product properties.  Such graphs will give you a visual sense of trending or lack thereof and they will help you focus on the actual ranges of the X variables.  Just looking at the ranges of the X’s will help you appreciate how well they have been controlled.
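
A numeric companion to the one-variable-at-a-time scatterplots can be sketched as below: for each X it reports the observed range, the range relative to the mean, and the simple correlation with a product property. The variable names and numbers are hypothetical, and the plotting itself is left out so the example needs only the standard library.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable does not vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def summarize(xs, y):
    """One summary row per X: range, relative range, and correlation with y."""
    rows = []
    for name, x in xs.items():
        lo, hi = min(x), max(x)
        mean = sum(x) / len(x)
        rel_range = (hi - lo) / abs(mean) if mean else math.inf
        rows.append({
            "name": name,
            "min": lo,
            "max": hi,
            "rel_range": rel_range,
            "r_with_y": pearson(x, y),
        })
    return rows


# Hypothetical data: temperature is tightly controlled, catalyst dose is not.
xs = {
    "temperature": [180.0, 180.2, 179.9, 180.1, 180.0, 179.8],
    "catalyst_dose": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
}
viscosity = [5.1, 6.0, 6.8, 8.1, 9.0, 9.9]

summary = summarize(xs, viscosity)
for row in summary:
    print(f"{row['name']}: range {row['min']}..{row['max']} "
          f"({row['rel_range']:.2%} of mean), r = {row['r_with_y']:.2f}")
```

A very small relative range is a hint that a variable was held too tightly for its effect to register in production data, whatever its correlation with the response.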
      Only after carefully evaluating your variables using methods such as those described above to shrink the list should you consider trying a design.  Do not just go out and run a 40 variable saturated design – it will be a real waste of time and effort.

    #118525

    McD
    Participant

    I tend to line up pretty similarly to what Robert says, but in some ways I am more pragmatic/less theoretical, in other ways, less so.
    A 40 parameter model will be useless as a predictive model.  However, it can give you some insight into where to start drawing those scatterplots Robert mentions.
    But stepping back a bit, I find it hard to picture a unit op with 40 parameters.  And in virtually any chemical plant, almost everything is controlled, or should be.  Further, there is no black art going on here.  If you take those things you control, you should be able, from chemical engineering principles, to calculate everything else.  The things of interest are those things where the measurement differs from the calculation, and application of some regressions can help give you some insight into where something in the plant is not as you suspect it to be.
    Also recognize that virtually nothing in the plant will be linearly related to anything else.  Virtually every relationship at play is nonlinear, and you need to understand the fundamental principles to know the form of the nonlinearity.  Without that, you are just shooting in the dark.  Robert’s scatterplots can at least help you get an idea of the shape, but you should already know the shape from kinetics/physics/chemistry that governs whatever it is you happen to be measuring.
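
To illustrate knowing the form of the nonlinearity before fitting, here is a small sketch that uses the Arrhenius rate law, k = A·exp(−Ea/(R·T)), as an example. Taking logarithms gives ln k = ln A − (Ea/R)·(1/T), which is a straight line in 1/T. The Arrhenius example and all numbers are assumptions made for illustration; the thread does not name a specific reaction or relationship.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def fit_arrhenius(temps_k, rates):
    """Fit ln k against 1/T; return (A, Ea) with Ea in J/mol."""
    inv_t = [1.0 / t for t in temps_k]
    ln_k = [math.log(k) for k in rates]
    slope, intercept = fit_line(inv_t, ln_k)
    return math.exp(intercept), -slope * R


# Hypothetical, noise-free data generated from known parameters.
true_a, true_ea = 1.0e7, 50_000.0
temps = [300.0, 320.0, 340.0, 360.0, 380.0]
rates = [true_a * math.exp(-true_ea / (R * t)) for t in temps]

a_hat, ea_hat = fit_arrhenius(temps, rates)
print(f"A = {a_hat:.3e}, Ea = {ea_hat:.1f} J/mol")
```

A straight-line fit of k against T on the same data would be badly wrong; the transformation suggested by the physics is what makes a linear regression appropriate.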
    Of course, while you should know everything, experience can help you identify those places where things might not be as you suspect.  For example, you can expect that over time, packing in a column will become plugged, a heat exchanger will become fouled, a valve will get worn.  But to isolate those bad actors, you need to understand how the unit op of which they are a part is not behaving according to design.
    It might help if you were to describe your problem in a little more detail.  If you don’t want to advertise your process quite so much I would certainly be willing to entertain an email conversation.
    But I would suspect that you want to go after your problem one unit op at a time.  At the level of a unit op you should be able to calculate how the unit op should behave, and then if the actual behavior differs from the predicted, you can then start to understand what it is that isn’t behaving the way you would like.
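
The idea of comparing actual behavior with predicted behavior for one unit op can be sketched as below. The heat exchanger, the simplified duty equation Q = U·A·ΔT, and every number are hypothetical stand-ins chosen for the example; a real first-principles model would come from the design calculations for the unit in question.

```python
def predicted_duty(u, area, delta_t):
    """Simplified design duty in W: Q = U * A * delta_T (illustrative only)."""
    return u * area * delta_t


def residual_trend(times, measured, predicted):
    """Residuals (measured - predicted) and their least-squares slope over time."""
    resid = [m - p for m, p in zip(measured, predicted)]
    n = len(times)
    mt, mr = sum(times) / n, sum(resid) / n
    slope = (sum((t - mt) * (r - mr) for t, r in zip(times, resid))
             / sum((t - mt) ** 2 for t in times))
    return resid, slope


# Hypothetical monthly readings: measured duty falls 1% per month below design,
# as might happen if the exchanger were fouling.
months = [0, 1, 2, 3, 4, 5]
design_q = [predicted_duty(u=500.0, area=20.0, delta_t=40.0) for _ in months]
measured_q = [q * (1 - 0.01 * m) for q, m in zip(design_q, months)]

resid, slope = residual_trend(months, measured_q, design_q)
print(f"residual slope: {slope:.1f} W per month")
```

A residual that drifts steadily away from zero points at the unit op, rather than at the laws of physics, as the place to look.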
    –McD
     

    #118528

    Six Sigma Tom
    Member

    Your problem is fraught with difficulties that go far beyond multicollinearity. To name a few: range restriction in the Xs, happenstance data, the likelihood that you’ll get a significant response from 40 variables due to random variation, non-linearities inherent in chemical processes, etc.
    Having said this, I know it’s a lot of fun to do this kind of knowledge discovery/exploratory data analysis. Social science researchers make frequent use of a fun tool called structural equation modeling (SEM). (Of course, social science hasn’t made a lot of progress either.) If you can keep in mind that you’re just looking for good questions to investigate further (say with DOE) and not really building a cause and effect model, you might find that SEM helps you a lot.
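
The point about getting a significant response from 40 variables through random variation alone can be made concrete with a short calculation. It assumes 40 independent significance tests at alpha = 0.05, which is a simplification (collinear X's are not independent), so the figure is only a rough guide.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent true-null tests looks significant."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of true-null tests that come out significant by chance."""
    return n_tests * alpha


p_any = prob_at_least_one_false_positive(40)
print(f"P(at least one spurious hit among 40 tests) = {p_any:.2f}")  # about 0.87
print(f"Expected spurious hits = {expected_false_positives(40):.1f}")
```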


The forum ‘General’ is closed to new topics and replies.