iSixSigma

Simple Linear Regression Analysis


Viewing 10 posts - 1 through 10 (of 10 total)
  • #31409

    woey
    Member

    I have a simple regression analysis in which the predictor (x) is pressure and the response (y) is flow rate. The data pairs are (5, 5.55), (2.5, 2), (1, 0.202), (0.5, 0.04), (0.4, 0.004), (0.3, 0.004), (0.25, 0), (0.2, 0.006), (0.18, 0.006), (0.15, 0), (0.12, 0), and (0.08, 0). Questions:

    (1) Each response is the mean of 5 data points, because the test is repeated 5 times. However, some of the low-pressure responses look like categorical data. For instance, for the pair (0.18, 0.006), the 0.006 is the average of 0, 0, 0, 0.03, and 0: (0+0+0+0.03+0)/5 = 0.006. Does it make sense to use 0.03 in the regression analysis, or should I use 0.006? Any thoughts?

    (2) At high pressure, x and y seem to correlate pretty well. However, x and y do not correlate well when the pressure is low. If I ignore the high-pressure pairs and run the regression on the low-pressure pairs only, the fit is pretty bad. How should I handle the regression in a case like this? It worries me because I need a good model so that I can extrapolate it to really low pressure and get the corresponding flow rate.

    (3) If I do extrapolate, what kind of statistical analysis can I do to make sure I get a reliable flow-rate value when the pressure is really, really low?

    Any thoughts? Thanks.
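The arithmetic in question (1) can be checked in a couple of lines. This is a minimal sketch using only the five replicate readings quoted for the pressure 0.18; the variable names are illustrative.

```python
# Five replicate flow-rate readings quoted in the post for pressure 0.18.
readings = [0, 0, 0, 0.03, 0]

# The averaged response used as the y value of the pair (0.18, 0.006).
mean_flow = sum(readings) / len(readings)

# Share of replicates that read exactly zero, i.e. sit at the meter's floor.
zero_share = readings.count(0) / len(readings)

print(f"mean flow rate = {mean_flow:.3f}, zero readings = {zero_share:.0%}")
```

The mean is 0.006, while 0.03 is the single non-zero reading; four of the five replicates register nothing at all.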

    #82790

    Michael Schlueter
    Participant

    Woey,

    Interesting question. So your measurement system consists of 2 parts:
    1. the instrument (flow-rate meter)
    2. your algorithm (take 5 samples and average)

    Your data seem to follow a powerfit, y ~ x^2.328. You can find more details in the attached Excel sheet. I leave it up to you to justify that your meter should actually behave almost like x^2 ;-)

    What I did:

    A) I checked your original data (sheet “posted data”) in lin-lin, lin-log, log-lin and log-log views. A linear, an exponential, a logarithmic or a power dependence, respectively, should plot as a straight line in those diagrams. Assuming a power function is the most reasonable thing I can do with your data.

    B) To find the powerfit I had to transform your data just a little bit (sheet “powerfit”). There are 2 steps: 1. map log(y) vs. log(x); 2. shift the curve towards the origin (orange area). The diagram shows you the results.

    C) Those data should now follow a straight line through (0,0). When I go through a little bit of Taguchi’s calculations I find the slope beta = 2.328 and the residual error +-0.335 in the log-log view.

    D) The final fit is [(log(y) + 1.2859) = 2.328 * (log(M) + 0.1961) +- 0.335] (M being the input signal, i.e. the pressure x), which you can boil down to the y ~ x^2.3 equation.

    E) In the lower part of sheet “powerfit” you will see my comparison between your original data and the fit formula. It looks like you should review/exclude (0.4, 0.004) and (0.3, 0.004).

    Best regards,
    Michael Schlueter

    Download: Simple Linear Regression Analysis [Excel file]
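Steps B to D can be sketched in a few lines of Python. This is a minimal sketch, not the spreadsheet's actual calculation: it assumes base-10 logarithms, drops the four pairs with zero flow rate (a zero has no logarithm), and uses ordinary least squares in place of the Taguchi-style calculation mentioned in step C. All variable names are illustrative.

```python
import math

# Pressure / flow-rate pairs from the opening post.
DATA = [(5, 5.55), (2.5, 2), (1, 0.202), (0.5, 0.04), (0.4, 0.004), (0.3, 0.004),
        (0.25, 0), (0.2, 0.006), (0.18, 0.006), (0.15, 0), (0.12, 0), (0.08, 0)]

# Assumption: pairs with zero flow rate are dropped because log10(0) is undefined.
positive = [(x, y) for x, y in DATA if y > 0]
log_x = [math.log10(x) for x, _ in positive]
log_y = [math.log10(y) for _, y in positive]
n = len(positive)

# Centring on the means is the "shift towards the origin" of step B.
mean_log_x = sum(log_x) / n
mean_log_y = sum(log_y) / n

# Ordinary least-squares slope of log10(y) on log10(x): the power-law exponent.
sxx = sum((u - mean_log_x) ** 2 for u in log_x)
sxy = sum((u - mean_log_x) * (v - mean_log_y) for u, v in zip(log_x, log_y))
slope = sxy / sxx
intercept = mean_log_y - slope * mean_log_x

print(f"points used: {n}")
print(f"mean log10(x) = {mean_log_x:.4f}, mean log10(y) = {mean_log_y:.4f}")
print(f"log10(y) = {intercept:.4f} + {slope:.3f} * log10(x)")
```

Under these assumptions the two means correspond, with opposite sign, to the shift constants 0.1961 and 1.2859 in step D, and the slope should land in the neighbourhood of the posted 2.328. The zero-flow pairs play no part in the fit, which is worth keeping in mind when extrapolating towards low pressure.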

    #82791

    Opey
    Participant

    You need a new flow meter that measures in the low range you’re interested in.  Don’t fiddle with stats until you get one, especially if it’s really critical that your prediction is right.

    #82793

    Robert Butler
    Participant

    Metering systems are usually built to respond over some part of the range of possible measurements. From your description it sounds like you have the situation that Opey has cautioned against. In addition to Opey’s comments, I would also recommend against building a regression model on the averages. Your flow measurements will be individual readings, and you will want to know how well your model predicts those individual readings.
    If you run your regression against the individual measurements (we’re assuming the five measurements at each point are replicate and not duplicate measurements) and then plot the fitted line and its associated confidence intervals against the individual points, you will not only better highlight the regions of poor regression fit, you will also highlight the regions where the sensitivity of your metering system begins to fail.
     

    #82794

    Zilgo
    Member

    Since you know the numbers that are averaged to get the y value in each of your ordered pairs, why not just use each of the five numbers as a separate point? You would have more data to fit a regression to in that case.
    Your other concern about the low-pressure readings might suggest that the data are not really linear; maybe a log or exponential transformation is necessary. The powerfit spreadsheet in the earlier post looks reasonable. Minitab would have found that for you by doing a Box-Cox transformation.

    #82795

    Michael Schlueter
    Participant

    Woey,
    The remarks from Opey, Robert and Zilgo are very valid; the limitations of your instrument are now quite obvious to me.
    Why do you need to use it? Why do you run it in its critical range? What do you ultimately want to achieve?

    #82805

    TCJ
    Member

    Sounds like your “Simple Linear Regression” might be quadratic.

    #82832

    woey
    Member

    What I want to achieve is to establish a specification for a medical device that my company is making. Given a pressure value that is acceptable for the average patient population, I need to find the best-fit flow-rate value. From the flow rate, I can work out the relationship between volume and time. Thanks, everyone, for the input.

    #82853

    woey
    Member

    Thanks for all the work. I tried a polynomial regression and got a pretty good relationship between y and x. The R^2 is 0.998. I would like to use the powerfit; however, I have no idea how good the powerfit is at relating y and x.
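One way to judge how well the powerfit relates y and x is to evaluate the posted fit formula against the data in the log-log view. This is a minimal sketch under stated assumptions: base-10 logarithms, the zero-flow pairs left out (they have no logarithm), and a residual standard deviation computed with n - 1 in the denominator, which is a guess at how the quoted +-0.335 was obtained. The function and variable names are illustrative.

```python
import math

# Pressure / flow-rate pairs from the opening post.
DATA = [(5, 5.55), (2.5, 2), (1, 0.202), (0.5, 0.04), (0.4, 0.004), (0.3, 0.004),
        (0.25, 0), (0.2, 0.006), (0.18, 0.006), (0.15, 0), (0.12, 0), (0.08, 0)]

# Constants of the posted fit: (log(y) + 1.2859) = 2.328 * (log(x) + 0.1961).
SLOPE, X_SHIFT, Y_SHIFT = 2.328, 0.1961, 1.2859

def powerfit_log10(x):
    """Predicted log10(flow rate) from the posted powerfit formula."""
    return SLOPE * (math.log10(x) + X_SHIFT) - Y_SHIFT

# Residuals in the log-log view, only for pairs whose flow rate has a logarithm.
residuals = [(x, math.log10(y) - powerfit_log10(x)) for x, y in DATA if y > 0]
sse = sum(r * r for _, r in residuals)

# Assumption: n - 1 degrees of freedom for the residual standard deviation.
resid_sd = math.sqrt(sse / (len(residuals) - 1))

# R^2 on the log scale, relative to the mean of log10(y).
log_y = [math.log10(y) for _, y in DATA if y > 0]
mean_log_y = sum(log_y) / len(log_y)
r2_log = 1 - sse / sum((v - mean_log_y) ** 2 for v in log_y)

worst_x, worst_resid = max(residuals, key=lambda pair: abs(pair[1]))

for x, r in residuals:
    print(f"x = {x:<5} residual in log10(y) = {r:+.3f}")
print(f"residual SD = {resid_sd:.3f}, log-scale R^2 = {r2_log:.3f}")
print(f"largest residual at x = {worst_x}")
```

The log-scale R^2 from this sketch is not directly comparable with the 0.998 from the polynomial fit: the two are computed on different scales and on different subsets of the data.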

    #82877

    Robert Butler
    Participant

    Woey, the tone of your 9 February e-mail worries me. If I’ve misunderstood your post, please accept my apologies in advance. You made the comment that you “tried to do a polynomial regression, and got a pretty good y and x relationship. The R^2 is 0.998.” You leave the impression that you are more enamoured of R^2 than of the propriety of the analysis. R^2, by itself, is of little value, and it is very easy to make it anything you want. Indeed, if you have no replicated measurements of Y at a given X you can, with enough terms, get an R^2 of 1.
    Since you are measuring individual patients you need to run your regression on individual readings. If you build a model on average results, the confidence limits around the regression will be for averages, and they will be much tighter than the confidence limits for individual measurements. If you don’t take this into account you could wind up thinking that the precision of your instrument is much better than it is.
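The difference between the two kinds of limits can be written out for the simplest case. The block below is a sketch for a straight-line fit only, using textbook notation that does not appear in the thread: n fitted points, residual standard deviation s, mean of the x values \bar{x}, and S_{xx} as the sum of squared deviations of x.

```latex
% Straight-line fit \hat{y} = b_0 + b_1 x from n points,
% with S_{xx} = \sum_i (x_i - \bar{x})^2 and residual standard deviation s.
\begin{align*}
\text{mean response at } x_0 &: \quad
  \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{single new reading at } x_0 &: \quad
  \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The extra 1 under the second square root is what makes the limits for an individual reading wider. In both expressions the term (x_0 - \bar{x})^2 grows as x_0 moves away from the centre of the measured pressures, so the limits widen under extrapolation.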
    If you are going to use polynomial regression you need to keep a constant check on the significance of your polynomial terms as you add them, the corresponding reduction in model error, the residual vs. predicted plots, and your lack-of-fit at each stage of the model-building process. It’s very easy to overfit. Finally, your initial missive suggested that you will need to extrapolate outside the ranges that you have measured. You will need to carefully check the behavior of the regression equation and the confidence limits in these regions. It is very easy to get an excellent polynomial fit that falls apart the minute you go beyond the range of actual data.
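Both warnings, that R^2 can be driven to 1 with enough terms and that a polynomial can misbehave outside the data, can be illustrated with the twelve posted pairs. The sketch below is an illustration, not anyone's recommended model: it builds the degree-11 polynomial that passes through all twelve points (in Lagrange form, which coincides with the least-squares polynomial of that degree when the x values are distinct) and then evaluates it at a few pressures outside the measured range. The function names and the extrapolation pressures are assumptions chosen for the example.

```python
# Pressure / flow-rate pairs from the opening post.
DATA = [(5, 5.55), (2.5, 2), (1, 0.202), (0.5, 0.04), (0.4, 0.004), (0.3, 0.004),
        (0.25, 0), (0.2, 0.006), (0.18, 0.006), (0.15, 0), (0.12, 0), (0.08, 0)]

def interpolating_polynomial(points):
    """Return p(x), the degree n-1 polynomial through n points (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            weight = 1.0
            for j, (xj, _) in enumerate(points):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return p

def r_squared(points, model):
    """Coefficient of determination of model(x) against the observed y values."""
    ys = [y for _, y in points]
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - model(x)) ** 2 for x, y in points)
    return 1 - ss_resid / ss_total

poly = interpolating_polynomial(DATA)
r2 = r_squared(DATA, poly)
print(f"R^2 with 12 coefficients on 12 points: {r2}")

# Hypothetical pressures outside the measured range 0.08 to 5.
for x0 in (0.05, 0.01, 6.0):
    print(f"extrapolated flow rate at x = {x0}: {poly(x0):.4g}")
```

The perfect R^2 says nothing about the meter; it only reflects that the model has as many coefficients as there are data points. Whether the extrapolated values are physically plausible is exactly the check recommended above.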

