Handling Null Values in Minitab
Six Sigma – iSixSigma › Forums › General Forums › Tools & Templates › Handling Null Values in Minitab
- This topic has 5 replies, 4 voices, and was last updated 1 year, 2 months ago by
MinitabUser1829.
-
AuthorPosts
-
May 21, 2021 at 11:22 am #254012
gunabalanParticipant@gunabalanInclude @gunabalan in your post and this person will
be notified via email.I have a doubt that, how Minitab handling the null values in Worksheet while creating Individual control charts.
For example:
Sample column = [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -100.0, nan, nan, -100.0, nan, nan, nan, 100.0, nan, nan, 33.33333333333333, nan, 0.0, nan, nan, -100.0, -100.0, nan, nan, nan, 0.0, nan, -100.0, 0.0, 100.0, nan, nan, nan, -100.0, nan, 0.0, nan, -100.0, nan, nan, 100.0, nan, nan, 100.0, nan, 100.0, nan, nan, nan, nan, nan, nan, 100.0, nan, nan, nan, nan, nan, 100.0, -100.0, nan, nan, nan, nan, nan, nan, -100.0, 100.0, nan, nan, -100.0, nan, nan, nan, nan, nan, nan, nan, 0.0, -100.0, nan, nan, nan, nan, nan, -100.0, -100.0, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, nan, 100.0, 100.0, nan, nan, nan, -100.0, -100.0, nan, nan, nan, nan, nan, nan, -100.0, nan, nan, nan, 100.0]for the above column I obtained the following values from minitab(Calculated by minitab control chart)
UCL(2 sigma level) = 124.2
X bar = 13.7
LCL (2 sigma level) = -151.6when I use the same exact formula used by minitab and replacing the nan values with mean of the column, I obtained these values.(Calculated by Excel)
UCL (2 sigma level) = 53
x bar = -14
LCL (2 sigma level) = -80So I don’t know what exactly to do with missing values from the column when I manually creating the control charts.
0May 21, 2021 at 12:27 pm #254013
Robert ButlerParticipant@rbutlerInclude @rbutler in your post and this person will
be notified via email.A missing value is a missing value – what you have done is construct an entirely different set of data with a lot of average values in place of the original missing values. Under these circumstances you should expect to get something different because the two data sets are not equivalent.
I don’t have Minitab but my guess is that Minitab is using the values that are something other than missing and running the calculation using only the non-missing values.
0May 21, 2021 at 3:15 pm #254022
Robert ButlerParticipant@rbutlerInclude @rbutler in your post and this person will
be notified via email.Addendum – I played around with the data you posted and I must admit I don’t see how Minitab generated those estimates. The average of the non-missing data is -14.1 and the standard deviation of the non-missing data is 90.9 which means your 2 sigma limits would be (upper) = -14.1 +2*90.9 = 167.7 and the lower would be -195.9.
If I substitute -14.1 (the average) for all of the missing values I get a mean of -14.1 – as one would expect, and the 2 sigma upper and lower are 79.3 and -107.5 – so I don’t see how you are getting your limits.
The formula I’m using for the 2 Sigma levels is +/-2*sample standard deviation
0May 22, 2021 at 12:43 am #254034
gunabalanParticipant@gunabalanInclude @gunabalan in your post and this person will
be notified via email.This is the formula that I use to calculate UCL (2 Sigma level) using python
UCL2 = statistics.mean(data[‘x’])+2*statistics.mean(data[‘mR’][1:len(data[‘mR’])])/1.128
I framed this formula after referring the minitab formula from this link.
It successfully gives me the exact minitab calculation only for non-missing data column, but whereas for the columns having missing values its gives me the wrong calculations.
That’s why I am wondering what is minitab doing with missing values from columns while doing calculation.
1May 24, 2021 at 10:45 am #254061
Thomas SubiaParticipant@tgs77mInclude @tgs77m in your post and this person will
be notified via email.Your data set contains nan. I suspect that data set source might have created with R software. In R, NaN (not nan) literally means not a number. ON the other hand, NA in R, means missing value. There will a big difference in ones data analysis if one is not careful using NaN or NA.
Now back to your question with Minitab. If in fact you have missing values in your data, code them with an asterisk. *. Minitab will not use any data whose cell content is an asterisk.
Here is an example of how this works.
I’ve created a variable called Normal_100. Thia is a random sample of size 100 taken from a Normal distribution, An I MR chart was created from that data.Here is that data summary:
Descriptive Statistics: Normal_100, Normal_100 3 missingTotal
Variable Count N N*
Normal_100 100 100 0
Normal_100 3 missing 100 97 3I Mr charts for both variables are attached. As expected, one gets different control limits since the data sets are not identical.
As for your problems with Minitab control limits, I suspect the issue is not with the conrtol limit formula but the coding of your data. Without a reproducible data, it’s impossible to see how you obtained your control limits.
You might want to rerun your analysis using R since you have nan data. If you do, rename your nan Values to NA. This will enable R to recognize missing data and deal with the missing data appropriately. To create the I MR chart use the R library, qcc. I’ve not attempted to use a data set containing NA values using qcc. I suspect that it may calculate control limits based on non missing data.
Hope this helps!
Attachments:
- I-MR-Chart-of-Normal_100.mgf
You must be signed in to download files.
- I-MR-Chart-of-Normal_100-3-missing.mgf
You must be signed in to download files.
0May 24, 2021 at 11:30 am #254066
MinitabUser1829Participant@MinitabUser1829Include @MinitabUser1829 in your post and this person will
be notified via email.For missing values in Minitab, consider the following situation:
Suppose I have 3 observations, then a missing value, then 3 more observations as shown below:
12.1
13.6
11.9
*
12.7
14.2
13.3Minitab does not treat 11.9 and 12.7 as if they were adjacent. In other words, when computing the moving ranges for an I-Chart, Minitab does not compute the moving range between 11.9 and 12.7. In the end, there are only 4 moving ranges, not 5.
Keep that in mind when comparing Minitab results to other software.
1 - I-MR-Chart-of-Normal_100.mgf
-
AuthorPosts
You must be logged in to reply to this topic.