Vaseem
Hi

I am currently working on a project where the data is discrete, i.e. the number of errors. The error data has been collected over the past 7 months.

Can anyone let me know which measure I should use: the mean or the median?
Could you also please explain the reason?

Restagno
I use the median when my data is not normal and, by its nature, has outliers.
Unlike the mean, the median is not affected by outliers (extreme points in the data).
An example of a process with outliers is salary per associate in my company: if someone asks me how much the people who work in my plant are paid and I use the mean, the salaries of a very few people, such as directors, would inflate the metric. Instead, I would use the median, which is not sensitive to outliers, and I would get a more reasonable idea of how much money most of the associates make.
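To make the salary example concrete, here is a small sketch with made-up figures (the numbers are purely illustrative assumptions, not real pay data): eight associates plus two highly paid directors, comparing the mean and the median from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical monthly salaries (illustrative numbers only):
# eight associates plus two directors paid far above everyone else.
salaries = [3000, 3100, 3200, 3200, 3300, 3400, 3500, 3600, 15000, 18000]

avg = mean(salaries)    # pulled upward by the two director salaries
mid = median(salaries)  # average of the two middle sorted values; the top two do not move it

print(f"mean   = {avg:.0f}")
print(f"median = {mid:.0f}")
```

With these invented figures the mean comes out at 5930 while the median is 3350, which is much closer to what most of the associates actually earn.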

Regards
Sergio

Can you really use the mean as a measure of central tendency with discrete data? A mean is used for continuous data.

Gopal
Hi,

The choice of mean, median, mode, or any other statistic depends on the type of data, i.e. whether it is nominal, ordinal, interval, or ratio. Ratio data is data with a true zero. A count of errors has a true zero (it starts at zero), so it is ratio data. For more information on data types, please check the link below.

http://en.wikipedia.org/wiki/Ratio_data

For ratio data all statistics can be used.
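As a sketch of what "all statistics can be used" looks like for count data, the snippet below computes the mean, median, and mode for seven months of error counts. The counts are hypothetical placeholders, not the original poster's data.

```python
from statistics import mean, median, multimode

# Hypothetical error counts for seven months (not the original poster's data).
monthly_errors = [12, 9, 15, 9, 30, 11, 10]

avg = mean(monthly_errors)         # 96 / 7: an average rate, need not be a whole number
mid = median(monthly_errors)       # 4th of the 7 sorted values
modes = multimode(monthly_errors)  # most frequent value(s)

print(f"mean   = {avg:.2f}")
print(f"median = {mid}")
print(f"mode   = {modes}")
```

Here the mean (about 13.7 errors per month) is not a count that can actually occur, but it is still a meaningful average rate; the single high month (30) pulls it above the median of 11.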

One small clarification: I am assuming that your project's BIG Y is the count of errors.
A count works as the metric only when the process volume is constant, for example when you process the same number of documents every month and track data entry errors. Obviously, if the number of documents processed increases, the errors increase too (they are directly proportional). Hence we should take % errors (errors / documents processed) as the metric. Otherwise, when one drills the BIG Y down, say by employee or by document type, the error counts are tied to the number of documents processed in each category, which may lead to a wrong analysis.
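A minimal sketch of the drill-down problem described above, using invented numbers for three hypothetical employees: ranking by raw error count and ranking by % errors (errors / documents processed) point at different people.

```python
# Hypothetical drill-down by employee (illustrative numbers, not real data).
employees = ["A", "B", "C"]
errors = [40, 20, 30]
documents = [2000, 1000, 1000]

# % errors = errors / documents processed, expressed as a percentage.
pct_errors = [100 * e / n for e, n in zip(errors, documents)]

for name, e, n, pct in zip(employees, errors, documents, pct_errors):
    print(f"Employee {name}: {e} errors / {n} documents = {pct:.1f}% errors")

worst_by_count = employees[errors.index(max(errors))]
worst_by_rate = employees[pct_errors.index(max(pct_errors))]
print("Worst by raw count:", worst_by_count)  # A, who simply processed the most documents
print("Worst by % errors: ", worst_by_rate)   # C
```

Employee A looks worst on raw counts only because A processed twice as many documents; normalized, A and B are both at 2.0% and C is the real outlier at 3.0%.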

Just a thought based on experience. Correct me if I am wrong.

Venu

I agree with Venu,
Mean/median/mode are used to measure the central tendency of data, especially data gathered from measurements (continuous data). But if you are dealing with the number of errors (count data), your data will likely follow a Poisson or binomial distribution. In that case, use counts or proportions: if your sample size is constant, use the count of errors; if the sample size varies, use the proportion.
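The post does not name a specific tool, but one common way to apply this rule (an assumption on our part) is an attribute control chart: a c-chart for error counts when every sample is the same size (Poisson model), and a p-chart for the proportion defective when sample sizes vary (binomial model). The sketch below uses the standard 3-sigma limits; all data values are hypothetical. If a single document can contain several errors and volumes vary, the u-chart (errors per unit) is the usual variant instead.

```python
import math

def c_chart_limits(counts):
    """(LCL, center, UCL) for error counts from equal-sized samples (Poisson model)."""
    c_bar = sum(counts) / len(counts)
    spread = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

def p_chart_limits(defectives, sample_sizes):
    """Per-sample (LCL, center, UCL) for proportion defective with varying sample sizes (binomial model)."""
    p_bar = sum(defectives) / sum(sample_sizes)  # pooled proportion
    limits = []
    for n in sample_sizes:
        spread = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - spread), p_bar, min(1.0, p_bar + spread)))
    return limits

# Hypothetical data for illustration only.
monthly_errors = [12, 9, 15, 9, 30, 11, 10]  # same volume every month -> counts
lcl, center, ucl = c_chart_limits(monthly_errors)
print(f"c-chart: LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}")
print("Months above UCL:", [i + 1 for i, c in enumerate(monthly_errors) if c > ucl])

defective_docs = [20, 40, 30, 24]            # volume changes month to month -> proportions
docs_processed = [1000, 2000, 1000, 1200]
p_limits = p_chart_limits(defective_docs, docs_processed)
for d, n, (lower, p_bar, upper) in zip(defective_docs, docs_processed, p_limits):
    print(f"p = {d / n:.3f}  limits [{lower:.3f}, {upper:.3f}]  (n = {n})")
```

Note how the p-chart limits narrow for the month with the larger sample, which is exactly why proportions rather than raw counts are compared when the sample size changes.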

M
As pointed out in earlier posts, do not go for the mean or median here. Instead, check the sigma level of the process. Also, decide what you want to focus on: defects or defectives. Checking the different types of activities within the process may help you narrow down. Use appropriate control charts. Localization is important...
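For the "check the sigma level" suggestion, a minimal sketch assuming a defects focus: compute defects per million opportunities (DPMO) and convert it to a short-term sigma level with the conventional 1.5-sigma shift. The number of error opportunities per document is a hypothetical assumption you would have to define for your own process, and the figures below are invented.

```python
from statistics import NormalDist

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value, shift=1.5):
    """Short-term sigma level using the conventional 1.5-sigma shift.

    Requires 0 < dpmo_value < 1_000_000 (inv_cdf is undefined at 0 and 1).
    """
    process_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(process_yield) + shift

# Hypothetical figures: 96 errors in 7,000 documents, 10 error opportunities per document.
d = dpmo(defects=96, units=7000, opportunities_per_unit=10)
print(f"DPMO = {d:.0f}, sigma level = {sigma_level(d):.2f}")
```

As a sanity check, the familiar 3.4 DPMO maps to roughly 6 sigma and about 66,807 DPMO to 3 sigma under this convention.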

Remember: questions lead, tools follow. Why do you want to look at the mean/median? If this is for gauging the potential or current state of your process, there are other, more appropriate methods for discrete data.

Darth
Katie Barry
Kimmy Burgess
I wish to point out that one of the common complaints against Six Sigma is that it searches for variation and for the significant factors that cause it. It does not address the robustness of a process, which would eliminate the need to search for variation. Therefore, I suggest walking through the process, identifying the weak links, and then applying statistical techniques. You will achieve what you set out to achieve.

Jeremiah Lewis
Statistical tests based on the median are usually nonparametric. Their drawback is that they have less statistical power than their parametric counterparts when the parametric assumptions hold.
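To illustrate a median-based nonparametric test, here is a sketch of the exact two-sided sign test (our choice of example; the post does not name a specific test). It checks whether the population median equals a hypothesized value using only the signs of the deviations. The monthly counts are hypothetical.

```python
from math import comb

def sign_test(data, hypothesized_median):
    """Exact two-sided sign test of H0: population median == hypothesized_median.

    Observations equal to the hypothesized median are dropped, as is conventional.
    """
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical seven months of error counts.
monthly_errors = [12, 9, 15, 9, 30, 11, 10]
print(sign_test(monthly_errors, 8))   # every month is above 8
print(sign_test(monthly_errors, 11))  # three above, three below, one tie dropped
```

Because it discards everything except the signs, the test's resolution is coarse: with seven non-tied observations, the smallest two-sided p-value it can ever return is 2/2^7, about 0.016, which gives a feel for the power limitation mentioned above.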

I side with one of the best of the best (Dr. Doug Montgomery) in saying that, if your data is not normal, it is easier and all-around better to find an appropriate transformation.
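The post does not say which transformation to use. For Poisson-like counts such as errors per month, a square-root-type variance-stabilizing transformation is a common choice; the sketch below assumes the Anscombe form 2*sqrt(x + 3/8). A Box-Cox transformation is another frequently used option, but it requires strictly positive values, so months with zero errors would need special handling. The counts are hypothetical.

```python
import math
from statistics import mean, stdev

def anscombe(counts):
    """Anscombe transform 2*sqrt(x + 3/8): for Poisson counts the result is
    approximately normal with variance close to 1 (when the mean is not too small)."""
    return [2 * math.sqrt(x + 3 / 8) for x in counts]

def inverse_anscombe(values):
    """Plain algebraic inverse, for reading results back on the count scale
    (biased as an estimator of the Poisson mean when counts are small)."""
    return [(y / 2) ** 2 - 3 / 8 for y in values]

# Hypothetical seven months of error counts.
monthly_errors = [12, 9, 15, 9, 30, 11, 10]
transformed = anscombe(monthly_errors)
print([round(y, 3) for y in transformed])
print(f"mean = {mean(transformed):.3f}, stdev = {stdev(transformed):.3f}")
```

Analyses that assume normality (such as a t-test or an individuals chart) would then be run on the transformed values, with results converted back to the count scale only for reporting.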

Also, while the F-test for comparing variances is not robust to non-normality and outliers, the t-test is quite robust to non-normality.

