# Simple Linear Regression Analysis

Six Sigma – iSixSigma Forums Old Forums General Simple Linear Regression Analysis

Viewing 10 posts - 1 through 10 (of 10 total)
• #31409

woey
Member

I have a simple regression analysis where the predictor (x) is pressure and the response (y) is flowrate. The data pairs are (5, 5.55), (2.5, 2), (1, 0.202), (0.5, 0.04), (0.4, 0.004), (0.3, 0.004), (0.25, 0), (0.2, 0.006), (0.18, 0.006), (0.15, 0), (0.12, 0), and (0.08, 0). Questions:

(1) Each response is the mean of 5 data points because the test is repeated 5 times. However, some of the response data at low pressure look like categorical data. For instance, for the pair (0.18, 0.006), the 0.006 is the average of 0, 0, 0, 0.03, and 0: (0+0+0+0.03+0)/5 = 0.006. Does it make sense to use 0.03 in the regression analysis, or should I use 0.006? Any thoughts?

(2) At high pressure, x and y seem to correlate pretty well. However, x and y do not correlate well when the pressure is low. If I ignore the high-pressure pairs and run a regression using only the low-pressure pairs, the fit is pretty bad. In a case like this, how should I handle the regression? It worries me because I need a good model so that I can extrapolate to really low pressure and get the corresponding flowrate value.

(3) If I were to extrapolate, what kind of statistical analysis can I do to make sure I get a reliable flowrate value when the pressure is really, really low?

Any thoughts? Thanks.

#82790

Michael Schlueter
Participant

#82791

Opey
Participant

You need a new flow meter that measures in the low range you’re interested in.  Don’t fiddle with stats until you get one, especially if it’s really critical that your prediction is right.

#82793

Robert Butler
Participant

Metering systems are usually built for response over some part of a range of possible measurements.  From your description it sounds like you have the situation that Opey has cautioned against.  In addition to Opey’s comments I would also recommend against building a regression model on the averages.  Your flow measurements will be individual readings and you will want to know the ability of your model to predict those individual readings.
If you run your regression against the individual measurements (we’re assuming the five measurements at each point are replicate and not duplicate measurements) and then plot the fitted line and its associated confidence intervals against the individual points, you will not only better highlight the regions of poor regression fit, you will also highlight those regions where the sensitivity of your metering system begins to fail.

#82794

Zilgo
Member

Since you know the numbers that are averaged to get the y value in each of your ordered pairs, why not just use each of the five numbers as separate points?  You have more data to fit a regression to in that case.
Your other concern about the low pressure readings might suggest that the data are not really linear; maybe a log or exponential transformation is necessary.  The powerfit spreadsheet in the earlier post looks reasonable.  Minitab would have found that for you by doing a Box-Cox transformation.
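As a rough illustration of the transformation idea, here is a minimal sketch of a power-law fit, y = a·x^b, done as a straight-line fit in log-log space. It uses only the averaged pairs from the opening post. It is not necessarily what the powerfit spreadsheet does, and it is not a Box-Cox transformation; the names `power_fit` and `r2_log` are made up for this example. The pairs with zero flow have to be set aside because log(0) is undefined, which is itself a hint that the meter is not resolving the low range.

```python
import numpy as np

# Averaged (pressure, flowrate) pairs quoted in the opening post.
pressure = np.array([5, 2.5, 1, 0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15, 0.12, 0.08])
flow = np.array([5.55, 2, 0.202, 0.04, 0.004, 0.004, 0, 0.006, 0.006, 0, 0, 0])

# log(0) is undefined, so only pairs with a positive flow can be used here.
positive = flow > 0
log_x = np.log(pressure[positive])
log_y = np.log(flow[positive])

# Straight-line fit in log-log space: log(y) = log(a) + b * log(x).
b, log_a = np.polyfit(log_x, log_y, 1)
a = np.exp(log_a)


def power_fit(x):
    """Hypothetical helper: predicted flow from the fitted power law."""
    return a * np.power(x, b)


# R^2 of the straight line in log space (not comparable to an R^2 on raw y).
resid = log_y - (log_a + b * log_x)
r2_log = 1 - np.sum(resid**2) / np.sum((log_y - log_y.mean()) ** 2)

print(f"pairs used: {positive.sum()} of {len(flow)}")
print(f"a = {a:.4g}, b = {b:.4g}, R^2 (log space) = {r2_log:.4f}")
```

The R^2 reported here lives in log space, so it should not be compared directly with an R^2 from a fit to the raw flow values.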

#82795

Michael Schlueter
Participant

Woey,
The remarks from Opey, Robert and Zilgo are very valid; the limitations of your instrument are quite obvious to me, now.
Why do you need to use it? Why do you run it in its critical range? What do you ultimately want to achieve?

#82805

TCJ
Member

#82832

woey
Member

What I want to achieve is to establish a specification for a medical device that my company is making. Given an acceptable pressure value for the average patient population, I need to find the best-fit flowrate value. From the flowrate, I can find the relationship between volume and time. Thanks, everyone, for the input.

#82853

woey
Member

Thanks for all the work. I tried a polynomial regression and got a pretty good relationship between y and x. The R^2 is 0.998. I would like to use powerfit; however, I have no idea how good the powerfit is at relating y and x.

#82877

Robert Butler
Participant

Woey, the tone of your 9 February e-mail worries me. If I’ve misunderstood your post, please accept my apologies in advance.  You made the comment that you “tried to do a polynomial regression, and got a pretty good y and x relationship. The R^2 is 0.998.”  You leave the impression that you are more enamoured of R^2 than of the propriety of the analysis.  R^2, by itself, is of little value, and it is very easy to make it anything you want.  Indeed, if you have no replicated measurements of Y at a given X you can, with enough terms, get an R^2 of 1.
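The point about R^2 can be checked with a small sketch: fit polynomials of increasing degree to the averaged pairs from the opening post and watch R^2. Because each higher-degree model contains the lower-degree one, R^2 from least squares cannot go down as terms are added, whether or not the extra terms mean anything. The degrees tried (1 through 5) and the helper name `r_squared` are choices made for this illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Averaged (pressure, flowrate) pairs quoted in the opening post.
pressure = np.array([5, 2.5, 1, 0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15, 0.12, 0.08])
flow = np.array([5.55, 2, 0.202, 0.04, 0.004, 0.004, 0, 0.006, 0.006, 0, 0, 0])


def r_squared(degree):
    """Hypothetical helper: R^2 of a least-squares polynomial of this degree."""
    # Polynomial.fit rescales x internally, which keeps the fit well conditioned.
    fit = Polynomial.fit(pressure, flow, degree)
    resid = flow - fit(pressure)
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((flow - flow.mean()) ** 2)
    return 1 - ss_res / ss_tot


# R^2 for each degree; it can only stay the same or rise as terms are added.
r2_by_degree = {d: r_squared(d) for d in range(1, 6)}
for d, r2 in r2_by_degree.items():
    print(f"degree {d}: R^2 = {r2:.6f}")
```

A rising R^2 here says nothing about whether the added terms are significant or whether the curve behaves sensibly outside the measured range.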
Since you are measuring individual patients you need to run your regression on individual readings.  If you build a model on average results, the confidence limits around the regression will be for averages, and they will be much tighter than the confidence limits for individual measurements. If you don’t take this into account you could wind up thinking that the precision of your instrument is much better than it is.
If you are going to use polynomial regression you need to keep a constant check on the significance of your polynomial terms as you add them, the corresponding reduction in model error, the residual vs. predicted plots, and your lack of fit at each stage of the model building process. It’s a very easy matter to overfit. Finally, your initial missive suggested that you will need to extrapolate outside the ranges that you have measured.  You will need to carefully check the behavior of the regression equation and the confidence limits in these regions.  It is very easy to get an excellent polynomial fit that falls apart the minute you go beyond the range of actual data.
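To make the last two points concrete, here is a minimal sketch of the textbook standard errors for a straight-line fit: one for the mean response at a pressure x0 and one for a single new reading at x0. The second is always larger, and both grow as x0 moves away from the centre of the measured pressures. The fit uses the averaged pairs from the opening post only because the individual readings were not posted, so the residual scatter `s` is the scatter of averages and understates what individual readings would show. The straight line, the helper names, and the extrapolation point 0.01 are assumptions for illustration, not a recommended model.

```python
import numpy as np

# Averaged (pressure, flowrate) pairs quoted in the opening post.
pressure = np.array([5, 2.5, 1, 0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15, 0.12, 0.08])
flow = np.array([5.55, 2, 0.202, 0.04, 0.004, 0.004, 0, 0.006, 0.006, 0, 0, 0])

n = len(pressure)
slope, intercept = np.polyfit(pressure, flow, 1)

# Residual standard deviation with n - 2 degrees of freedom.
resid = flow - (intercept + slope * pressure)
s = np.sqrt(np.sum(resid**2) / (n - 2))

x_bar = pressure.mean()
sxx = np.sum((pressure - x_bar) ** 2)


def se_mean(x0):
    """Standard error of the fitted mean response at x0."""
    return s * np.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


def se_individual(x0):
    """Standard error for predicting one new reading at x0."""
    return s * np.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


# Centre of the data, lowest measured pressure, and a point below the range.
for x0 in (x_bar, 0.08, 0.01):
    print(f"x0 = {x0:.3f}: se_mean = {se_mean(x0):.4f}, "
          f"se_individual = {se_individual(x0):.4f}")
```

Multiplying either standard error by the t critical value for n - 2 degrees of freedom gives the half-width of the corresponding confidence or prediction interval.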

