Can anyone give me some advice/criteria on choosing between the mean and median as measures of central tendency of a data set?
Many have suggested normality of the data set as the deciding factor (i.e. compare means for normally distributed data and medians for non-normal data). I understand that the mean can be greatly influenced by outliers, while the median depends only on the number of data points and the values of the points in the ‘middle’ of the data. But is normality the best and/or only criterion to use when making this decision?
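To make the outlier point concrete, here is a minimal Python sketch. The numbers are invented purely for illustration and the variable names are my own; it only uses the standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical measurements; the values are made up for illustration only.
typical = [10, 11, 12, 13, 14]

# Same data, but the largest value is replaced by an extreme one.
with_outlier = typical[:-1] + [140]

mean_typical, median_typical = mean(typical), median(typical)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

print("without outlier: mean =", mean_typical, "median =", median_typical)
print("with outlier:    mean =", mean_outlier, "median =", median_outlier)
# The mean moves from 12 to 37.2, while the median stays at 12.
```

A single extreme point moves the mean a long way but leaves the median where it was, which is the behaviour I am trying to weigh against how widely the mean is accepted.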
Since the mean is probably one of the more common and accepted ways to look at central tendency (especially outside of Six Sigma), I feel I should have a solid justification and clear selection criteria to back myself up if I need to recommend that the median be used rather than the mean.
Any thoughts / suggestions?
Start by looking at a picture of your data (a histogram).
If it is basically symmetrical, go with the mean.
If it is non-symmetrical, understand why. If the asymmetry makes sense, use the median. If it doesn’t make sense, find out why, and don’t worry about whether to use the mean or the median until you have a better understanding of what you are seeing.
Things that make sense for median – bounded distributions, truncated distributions (if you know why), …
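If you want a quick way to do the "look at a picture" step without a plotting package, something like the rough Python sketch below may help. It is only an illustration: the data are invented, `skewness` and `text_histogram` are helper names I made up, and the moment coefficient of skewness is just one common way to put a number on asymmetry, not a substitute for looking at the shape and asking why.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for symmetric data)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    if m2 == 0:
        return 0.0  # all values identical, so there is no asymmetry to measure
    return m3 / m2 ** 1.5

def text_histogram(data, bins=5):
    """Print a crude histogram with equal-width bins and return the counts.

    Assumes the data are not all identical (otherwise the bin width is zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.1f} | {'#' * c}")
    return counts

# Hypothetical data: bounded below, with a long right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 12, 20]

counts = text_histogram(data)
print("mean:", mean(data))
print("median:", median(data))
print("skewness:", round(skewness(data), 2))
```

For this made-up data the mean sits well above the median and the skewness is positive, which matches the long right tail in the printed histogram. A symmetric set such as `[1, 2, 3, 4, 5]` gives a skewness of zero and a mean equal to the median.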