# Need Clarifications about Normal Data


Viewing 15 posts - 1 through 15 (of 15 total)
• #33659

Linga Reddy
Participant

Hi,
I am doing a GB project to reduce the data loading process time from 61 minutes to 10 minutes (I have defined everything in my Define and Measure phases).
Actual data is Normal, But in Analyze phase (Step 4), I found that Data is Not Normal (based on a run chart and a normality test: the P-value should not be less than 0.05, and I am getting P-value = 0). If the data is not normal, how do I proceed to identify the initial process capability?
Thanks in advance.
Thanks&Regards, Linga

#91333

DaveG
Participant

Use subgrouping to solve the normality issue.
If you are reducing a process average, why is normality important?
You may have a one-sided distribution, in which case normality is impossible.
You say the actual data is normal but the data is not. How is that possible?

#91334

Linga Reddy
Participant

Hi Dave, thanks for your reply.
As you said, my data has a one-sided distribution.
As per my operational definition, I am considering all the activities as one operation.
Operational Definition: Time spent on data load process = Time spent on logging into OFA + Time spent on running the Delete Data Loader program + Time spent on running the Data Loader program + Time spent on running Solves + Time spent on Distribute Structures/Data + Time spent on emailing the users to notify them about the data load.
Now, how can I use subgrouping to resolve the normality issue?
Thanks&Regards, Linga

#91335

Fontanilla
Participant

Linga, I would not be concerned with normalizing the data. To determine process capability, assuming that 10 minutes is your upper spec limit (USL), look at the data. Anything greater than 10 minutes is a defect. Divide the number of defects by the total number of data points, then multiply the result by 1,000,000 to get Defects Per Million Opportunities (DPMO). Then you can compute the process capability. Does that help?
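
As a rough sketch of the calculation described above, the following Python snippet counts load times above the 10-minute upper spec limit and scales the defect rate to a million opportunities. The sample times are made up for illustration, and the conversion to a sigma level uses the common 1.5-sigma shift convention, which is an assumption and not something stated in this thread.

```python
from statistics import NormalDist


def dpmo(times_minutes, usl_minutes=10.0):
    """Defects per million opportunities, treating each load as one opportunity."""
    defects = sum(1 for t in times_minutes if t > usl_minutes)
    return defects * 1_000_000 / len(times_minutes)


def sigma_level(dpmo_value, shift=1.5):
    """Sigma level under the conventional 1.5-sigma shift (an assumption)."""
    yield_fraction = 1.0 - dpmo_value / 1_000_000
    # inv_cdf is only defined strictly between 0 and 1.
    if not 0.0 < yield_fraction < 1.0:
        raise ValueError("sigma level is undefined for 0 or 1,000,000 DPMO")
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical load times in minutes, not data from the project.
sample_times = [8.5, 9.2, 12.0, 7.8, 10.5, 9.9, 61.0, 9.0, 8.1, 11.3]

value = dpmo(sample_times)
print("DPMO:", value)
print("Sigma level:", round(sigma_level(value), 2))
```

Because this approach only counts values above the limit, it makes no assumption about the shape of the distribution.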

#91341

DaveG
Participant

1) Dan’s advice is good.
2) A 1-sided distribution is naturally non-normal, so there is no reason to normalize it.  To analyze process variability, you need special statistical tools, one of which is a Pearson Analysis.  Certain software packages can do this for you.  However, I don’t think that’s necessary for you, just follow Dan’s advice.
3) Remember that when X = A + B + C + …, and the components are independent, the VARIANCE of X equals the sum of the variances of A, B, C, etc. Variance is Sigma squared, so Sigma(X) is the square root of that sum. In other words, you can’t add Sigmas in a series, but you can add variances.
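
To make point 3 concrete, here is a small Python sketch that combines per-step standard deviations for the six steps in the operational definition. The sigma values are hypothetical, and the calculation assumes the step times are independent of each other.

```python
import math

# Hypothetical standard deviations (minutes) for each step; not project data.
step_sigmas = {
    "login_to_ofa": 1.0,
    "delete_data_loader": 2.0,
    "data_loader": 4.0,
    "solves": 2.0,
    "distribute": 2.0,
    "email_users": 0.5,
}

# Variances add for independent steps; sigmas do not.
total_variance = sum(sigma ** 2 for sigma in step_sigmas.values())
total_sigma = math.sqrt(total_variance)

# Adding the sigmas directly overstates the spread of the total.
naive_sum = sum(step_sigmas.values())

print("Sigma of total time:", round(total_sigma, 2))
print("Naive sum of sigmas:", naive_sum)
```

The step with the largest sigma dominates the total, which is one way to decide where to look first for variation.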

#91641

Rekhi
Participant

Hi Linga,

I agree with Dan. There are ways to normalize the continuous data by sub-grouping them. There are tools to handle the non-normal data, but in your case you can simply convert the data to discrete: Calculate the defects and the dpmo and thus the capability.
Regards,
Taran

#91698

Alpaslan Terekli
Participant

The question for me is why the actual data is normal but the data is not. Did you check it with a control chart? Is there any shift in the process, or an outlier (such as maintenance or a breakdown)?

#91700

Soumish Dev
Member

Would you please explain the difference between Actual data and Data from your project perspective?

#91707

Rick Pastor
Member

More Clarification!

All the steps in your process appear to be the times required to complete a process step. That does not produce a one-sided distribution. The time needed to complete an event is equivalent to the measurement of a physical dimension. Both measurements, assuming random error, will end up normally distributed. With time you have a one-sided specification limit. You might want a two-sided specification limit if errors are generated when someone works too fast.
To the question of why your data is not normal: perhaps you do not have enough data points. How many data points are in the population? Did the same person take all the data? Are the subgroups normal, and so on? You may find that answering these questions adds insight into how to shorten the process time. For example, if several people were involved in the study, you need to examine the data for each person. Perhaps one person needs training!

#91708

DaveG
Participant

There is no difference.  I just wanted a clarification of this statement:
“Actual data is Normal, But in Analyze phase (Step 4), I found that Data is Not Normal”
from this post:  https://www.isixsigma.com/forum/showmessage.asp?messageID=34755

#91776

Jim Grizzard
Participant

I have run into the same issue when evaluating efficiency problems. Typically, the distributions will be skewed one way due to machine constraints. Bimodal distributions or distributions with outliers often point toward method or man issues. Some of the lean tools, such as process mapping, pace studies and work sampling, often provide good insight into actual capabilities.

#91794

Hemanth
Participant

Hi Reddy
I agree with Jim: your data must be skewed, and hence it is failing the normality test. What intrigues me is that you said your actual data is normal but in the analyse phase your data is not normal…
If I understand this correctly, you had historical data (a huge data bank) which you initially checked for normality and found it passed the test. But when you ran the test on data collected during the measure phase, it failed. If this is true, then it is possible, because you are checking only a subset of the whole population: your population may be behaving normally while the subset (the sample) may not. I tried this in MINITAB by generating random samples of size 50 and 100 from a normal distribution, and I found that not all of them passed the normality test.
Now, coming to your query, I would suggest you don't get bogged down in calculating capability, as your target is to shift the mean from 60 to 10. In the end, all you need to prove is whether your improved process gives you an average processing time of 10 or not (hypothesis testing).
Hope this was helpful.
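
The sampling experiment described above can be approximated in plain Python. This is only a sketch: it swaps in the Jarque-Bera test (skewness and kurtosis based, with an asymptotic chi-square p-value) as a stand-in for whatever normality test MINITAB ran, and the trial count and seed are arbitrary choices. The asymptotic p-value is rough at these sample sizes, so the exact rejection rate should not be read too literally.

```python
import math
import random


def jarque_bera_p_value(xs):
    """Asymptotic Jarque-Bera p-value (chi-square with 2 degrees of freedom)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of chi-square with 2 dof is exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(sample_size, trials=2000, alpha=0.05, seed=1):
    """Fraction of truly normal samples that fail the test at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if jarque_bera_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


for size in (50, 100):
    print(size, rejection_rate(size))
```

Some samples drawn from a perfectly normal population still fail, which is what a test run at the 0.05 level is expected to do. A P-value of 0, as in the original question, is a much stronger signal than that kind of sampling noise.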

#91795

Hemanth
Participant

And if at all you wish to calculate the limits for your average, using Chebychev’s inequality might be of help.
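
As an illustration of that suggestion, Chebyshev's inequality says that for any distribution with a finite variance, at least 1 - 1/k² of the values lie within k standard deviations of the mean. The sketch below turns a desired coverage into limits; the mean and sigma are hypothetical numbers, and for limits on the average itself one would pass the standard error (sigma divided by the square root of the sample size) instead of sigma.

```python
import math


def chebyshev_interval(mean, sigma, coverage=0.95):
    """Limits that contain at least `coverage` of values for any distribution."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    # Solve 1 - 1/k**2 = coverage for k.
    k = 1.0 / math.sqrt(1.0 - coverage)
    return mean - k * sigma, mean + k * sigma


# Hypothetical process mean and sigma in minutes, not project data.
lower, upper = chebyshev_interval(mean=61.0, sigma=12.0, coverage=0.75)
print(lower, upper)
```

These limits are wider than the normal-theory ones because they make no assumption about shape, which is the point when the data fail a normality test.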

#91802

Mikel
Member

Bad advice!
Time data is typically skewed right (any data which is naturally bounded will skew away from the bound). The closer you get to your objective (the natural bound), the more the left side will bunch up. There are distributions known to model time data well, Poisson and exponential for example. Look to queueing theory for advice.
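
A quick way to see the right skew described above is to simulate exponentially distributed times and compare the mean with the median. This is a sketch only: the 10-minute mean, sample size and seed are arbitrary assumptions, and nothing here says the real load times follow an exponential distribution.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical process: exponential times with a mean of 10 minutes.
times = [rng.expovariate(1.0 / 10.0) for _ in range(10_000)]

mean = statistics.fmean(times)
median = statistics.median(times)
sd = statistics.pstdev(times)

# Sample skewness: positive values indicate a long right tail.
skewness = sum(((t - mean) / sd) ** 3 for t in times) / len(times)

print("mean:", round(mean, 2))
print("median:", round(median, 2))
print("skewness:", round(skewness, 2))
```

A mean noticeably above the median, together with positive skewness, is the signature of data bounded on the left at zero.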

#91920

Vivek Shrivastava
Member

Hi Linga,
Looking at your operational definition of the CTQ and your goal statement, I'd say that although you have a one-sided specification, your distribution should be two-tailed and normal, not one-tailed.
Secondly, one of the reasons you are not getting a normal distribution may be the discrimination you are choosing to measure the time. If you are measuring in minutes and your actual process width is not more than 10 minutes, measure in seconds. I expect you'll get a normal distribution. Beyond that, I can comment on the potential causes of your not getting normality only after seeing the histogram and normality graph.
I hope this gives you a direction to think.
vivek


The forum ‘General’ is closed to new topics and replies.