    I have a data set with several outliers. After I deleted the outliers, the R-square decreased instead of increasing. Could anybody give me an explanation? My intuition says you should get a higher R-square value after deleting the outliers.
    Unless you’re just “playing” to see what happens, don’t JUST delete outliers.  You need to investigate why those data points are outliers – that may take some root cause analysis.



    First of all, I agree totally with mpl about removing outliers. Take great care with such things.
    But if you are just playing around (or have some real insight into why these are ‘outliers’), I expect you deleted observations that sit well out at each end of the x-y relationship. If so, the remaining data are more ‘bunched up’ over a narrower range of x. The scatter about the line stays roughly the same, but the line now accounts for a smaller share of the total variation in y, so your R-sq goes down.
    Think of it another way: observations that are way, way out at the ends of the fit line have a lot of influence on the calculation of the line. Keeping them will probably increase the R-sq because, mathematically, the data are spread out over a longer scale and you get a “better fit.”
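    To see the effect for yourself, here is a minimal sketch in Python with NumPy. Everything in it is made up for illustration: the simulated data, the slope of 2, the noise level, and the cutoffs of 2.5 and 7.5 are assumptions, not anything from the original data set. It fits a line to the full data, then drops the points at both ends of the x range and fits again.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple least-squares line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # variation left unexplained
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation in y
    return 1.0 - ss_res / ss_tot

# Simulated data (assumption): a true linear trend plus constant noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=500)
y = 2.0 * x + rng.normal(0.0, 3.0, size=500)

# "Delete the outliers": drop the points at both ends of the x range.
keep = (x > 2.5) & (x < 7.5)

r2_full = r_squared(x, y)
r2_trimmed = r_squared(x[keep], y[keep])

print(f"R-sq, all data:      {r2_full:.3f}")
print(f"R-sq, ends removed:  {r2_trimmed:.3f}")
```

    The noise around the line is the same in both fits. Only the spread of x changes, and that alone is enough to pull the R-sq down.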
    Note that in Minitab you get the little warning statements about observations that have unusually strong influence on the equation.
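    If you want to check which points sit far out in x by hand, one common measure is leverage. For a straight-line fit it is h_i = 1/n + (x_i - xbar)^2 / Sxx. The sketch below is an illustration only: the x values are invented, and the 3p/n cutoff is an assumed rule of thumb, so check your software's documentation for the exact rule it applies.

```python
import numpy as np

def leverage(x):
    """Leverage of each point in a straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev_sq = (x - x.mean()) ** 2
    return 1.0 / n + dev_sq / dev_sq.sum()

# Invented x values (assumption): nine ordinary points and one far out.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
h = leverage(x)

p = 2                      # parameters in the fit: intercept and slope
cutoff = 3 * p / x.size    # assumed rule-of-thumb threshold
flagged = np.flatnonzero(h > cutoff)

print("leverage:", np.round(h, 3))
print("flagged indices:", flagged)
```

    Leverage depends only on the x values, so a point can be flagged even when it falls right on the line.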

