# Handling Null Values in Minitab

Six Sigma – iSixSigma Forums General Forums Tools & Templates Handling Null Values in Minitab

Viewing 6 posts - 1 through 6 (of 6 total)
• Author
Posts
• #254012

gunabalan
Participant

I have a doubt that, how Minitab handling the null values in Worksheet while creating Individual control charts.

For example:
Sample column = [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -100.0, nan, nan, -100.0, nan, nan, nan, 100.0, nan, nan, 33.33333333333333, nan, 0.0, nan, nan, -100.0, -100.0, nan, nan, nan, 0.0, nan, -100.0, 0.0, 100.0, nan, nan, nan, -100.0, nan, 0.0, nan, -100.0, nan, nan, 100.0, nan, nan, 100.0, nan, 100.0, nan, nan, nan, nan, nan, nan, 100.0, nan, nan, nan, nan, nan, 100.0, -100.0, nan, nan, nan, nan, nan, nan, -100.0, 100.0, nan, nan, -100.0, nan, nan, nan, nan, nan, nan, nan, 0.0, -100.0, nan, nan, nan, nan, nan, -100.0, -100.0, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, nan, 100.0, 100.0, nan, nan, nan, -100.0, -100.0, nan, nan, nan, nan, nan, nan, -100.0, nan, nan, nan, 100.0]

for the above column I obtained the following values from minitab(Calculated by minitab control chart)
UCL(2 sigma level) = 124.2
X bar = 13.7
LCL (2 sigma level) = -151.6

when I use the same exact formula used by minitab and replacing the nan values with mean of the column, I obtained these values.(Calculated by Excel)

UCL (2 sigma level) = 53
x bar = -14
LCL (2 sigma level) = -80

So I don’t know what exactly to do with missing values from the column when I manually creating the control charts.

0
#254013

Robert Butler
Participant

A missing value is a missing value – what you have done is construct an entirely different set of data with a lot of average values in place of the original missing values. Under these circumstances you should expect to get something different because the two data sets are not equivalent.

I don’t have Minitab but my guess is that Minitab is using the values that are something other than missing and running the calculation using only the non-missing values.

0
#254022

Robert Butler
Participant

Addendum – I played around with the data you posted and I must admit I don’t see how Minitab generated those estimates. The average of the non-missing data is -14.1 and the standard deviation of the non-missing data is 90.9 which means your 2 sigma limits would be (upper) = -14.1 +2*90.9 = 167.7 and the lower would be -195.9.

If I substitute -14.1 (the average) for all of the missing values I get a mean of -14.1 – as one would expect, and the 2 sigma upper and lower are 79.3 and -107.5 – so I don’t see how you are getting your limits.

The formula I’m using for the 2 Sigma levels is +/-2*sample standard deviation

0
#254034

gunabalan
Participant

This is the formula that I use to calculate UCL (2 Sigma level) using python

UCL2 = statistics.mean(data[‘x’])+2*statistics.mean(data[‘mR’][1:len(data[‘mR’])])/1.128

I framed this formula after referring the minitab formula from this link.

https://blog.minitab.com/en/marilyn-wheatleys-blog/methods-and-formulas3a-how-are-i-mr-chart-control-limits-calculated

It successfully gives me the exact minitab calculation only for non-missing data column, but whereas for the columns having missing values its gives me the wrong calculations.

That’s why I am wondering what is minitab doing with missing values from columns while doing calculation.

1
#254061

Thomas Subia
Participant

Your data set contains nan. I suspect that data set source might have created with R software. In R, NaN (not nan) literally means not a number. ON the other hand, NA in R, means missing value. There will a big difference in ones data analysis if one is not careful using NaN or NA.

Now back to your question with Minitab. If in fact you have missing values in your data, code them with an asterisk. *. Minitab will not use any data whose cell content is an asterisk.

Here is an example of how this works.
I’ve created a variable called Normal_100. Thia is a random sample of size 100 taken from a Normal distribution, An I MR chart was created from that data.

Here is that data summary:
Descriptive Statistics: Normal_100, Normal_100 3 missing

Total
Variable Count N N*
Normal_100 100 100 0
Normal_100 3 missing 100 97 3

I Mr charts for both variables are attached. As expected, one gets different control limits since the data sets are not identical.

As for your problems with Minitab control limits, I suspect the issue is not with the conrtol limit formula but the coding of your data. Without a reproducible data, it’s impossible to see how you obtained your control limits.

You might want to rerun your analysis using R since you have nan data. If you do, rename your nan Values to NA. This will enable R to recognize missing data and deal with the missing data appropriately. To create the I MR chart use the R library, qcc. I’ve not attempted to use a data set containing NA values using qcc. I suspect that it may calculate control limits based on non missing data.

Hope this helps!

###### Attachments:
1. I-MR-Chart-of-Normal_100.mgf
You must be signed in to download files.
2. I-MR-Chart-of-Normal_100-3-missing.mgf
You must be signed in to download files.
0
#254066

MinitabUser1829
Participant

For missing values in Minitab, consider the following situation:

Suppose I have 3 observations, then a missing value, then 3 more observations as shown below:
12.1
13.6
11.9
*
12.7
14.2
13.3

Minitab does not treat 11.9 and 12.7 as if they were adjacent. In other words, when computing the moving ranges for an I-Chart, Minitab does not compute the moving range between 11.9 and 12.7. In the end, there are only 4 moving ranges, not 5.

Keep that in mind when comparing Minitab results to other software.

1
Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic.