Estimated Loss After Completing a Sample Audit

iSixSigma Forums (General)

    Michael Logue

    I am doing some project work for an airport, and as part of it I have carried out an audit of the accuracy of passenger numbers provided by airlines. From the data gathered, I am attempting to estimate the potential miscounting of passengers over a month or a year caused by inaccurate passenger data. Because it is just a sample with a spread of different results, I assume I cannot just take an average and multiply it out; rather, I expect I have to apply some sort of statistical theory (or maybe not). Can someone advise on the best method?

    I have included 3 sample spreads of data below (0 means accurate):

    1) Airline A. 1300 flights across the period, 2% audit, so 26 flights. Results:

    2) Airline B. 1100 flights across the period, 2% audit, so 22 flights. Results:

    3) Airline C. 9900 flights across the period, 2% audit, so 198 flights. Results:

    I realise that 2% is a small audit, but there was limited resource to carry out the audit, so we went with that. If the answer is that you cannot determine anything due to how small the audit is, I would still appreciate any steer.

    Thanks in advance


    Robert Tripp

    You pose many questions rolled into one post here, and as usual they all come down to “what do you want to know?”. Judging by the title of your post, you seem to want to estimate the expected number of passengers “lost” from any given flight. But if all you want to do is determine the likelihood that a flight will have an inaccurate passenger count (regardless of how inaccurate the count is), that is a different question. And finally, because you are sampling (and the samples are fairly small relative to your population), a little applied statistics would be useful.
    So I will try to take a stab at this:
    1) Expected # of passengers “lost” from any given flight. Without having analyzed the data other than reading your post, it appears that Airlines A and B had a couple of wacky flights. Maybe a weather thing causing missed connections? Maybe disparity in the size of the airplanes? Given the nature of the occurrences, my sense is that you will have better success estimating the number of miscalculated seats expected in a month or year than predicting the problem for any given flight. That said, I would try to get the total number of reservations on each flight in your sample (the expected number of passengers, i.e. the “opportunities”), sum it across your sample, and use that as the denominator of a ratio whose numerator is the sum of “lost” passengers. Then calculate the percentage of lost seats and the confidence interval around it.
    2) Likelihood of an inaccurate flight. Same idea as #1 except that each flight is an opportunity (number of flights examined is your denominator) and your numerator is just the count of inaccurate flights. Again, another percent calculation with a confidence interval.
    3) Sample size. It really depends on what you are trying to do, what kind of precision you want in your estimates, and how confident you want to be in your decisions. My first concern with your sample size is those weird events on airlines A and B. They call for some investigation to determine what happened in each case and how likely the conditions that caused the weirdness are to recur. Assuming that the events on A and B are not out of the ordinary, your next decision point is what you are going to do with the results. If your intent is to compare A, B, and C, then a chi-square test for percent defective would be fine, but you first need to determine the difference you are trying to detect. Once you know what a significant “practical” difference is, you can run a power and sample size calculation to determine whether your sampling plan gives you the ability to detect a difference if there is one (the power of the test). If all you want to do is estimate the proportion of lost seats or inaccurate flights, then use the confidence interval around your calculated percentage to decide whether your estimate is precise enough.
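The three numbered items above can be sketched in Python with only the standard library. This is an illustrative sketch, not Robert's exact method, and several choices are assumptions: item 1 is implemented as a survey-sampling ratio estimator with a Cochran-style linearized variance (chosen because passengers are clustered within flights, so treating every seat as an independent trial would understate the uncertainty), item 2 as a Wilson score interval, and item 3 as the usual normal-approximation sample size for comparing two proportions pairwise. The function names `ratio_ci`, `wilson_ci`, and `n_per_group` are hypothetical, and the numbers in the usage lines are invented, not the audit results.

```python
import math
from statistics import NormalDist


def ratio_ci(lost, booked, population_flights, conf=0.95):
    """Item 1: lost-seat rate as a ratio estimate, with a linearized interval.

    lost[i] and booked[i] are the lost passengers and reservations on audited
    flight i; population_flights is the number of flights in the period.
    """
    n = len(lost)
    r = sum(lost) / sum(booked)                      # pooled lost-seat rate
    x_bar = sum(booked) / n                          # mean reservations per flight
    s2 = sum((y - r * x) ** 2 for y, x in zip(lost, booked)) / (n - 1)
    fpc = 1 - n / population_flights                 # finite population correction
    se = math.sqrt(fpc * s2 / n) / x_bar
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return max(0.0, r - z * se), r, r + z * se


def wilson_ci(inaccurate, audited, conf=0.95):
    """Item 2: Wilson score interval for the share of inaccurate flights."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = inaccurate / audited
    denom = 1 + z * z / audited
    centre = (p + z * z / (2 * audited)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / audited + z * z / (4 * audited ** 2))
    return max(0.0, centre - half), p, min(1.0, centre + half)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Item 3: flights per airline to tell p1 from p2 (two-sided, normal approx.)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    root = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root ** 2 / (p1 - p2) ** 2)


# Invented numbers, for illustration only (not the audit results):
print(ratio_ci(lost=[0, 3, 0, 1], booked=[100, 150, 120, 80], population_flights=1300))
print(wilson_ci(inaccurate=2, audited=26))
print(n_per_group(0.05, 0.15))  # 141
```

The last line suggests why a 2% audit struggles to separate airlines: telling a 5% inaccuracy rate from a 15% one at the usual 5% significance and 80% power needs roughly 141 audited flights per airline under these assumptions.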
    Honestly, I like the approach of starting with a small sample and then deciding whether it gives you enough of what you want to see, based on the differences you calculate and the power your sample size gives you. You can always go back and get more data if you don’t trust what you find on the first sniff. As I write this, I realize that if you are trying to predict lost passengers (or inaccurate flights) over some timeframe, the first two items really are nothing more than your original hunch: take an average and multiply it out. But if you calculate the confidence intervals for your percentages, you can estimate a range of probable outcomes and build best- and worst-case scenarios. The “statistical theory” comes into play when you recognize that these are estimates and that you will almost never predict a specific outcome or value exactly; your goal should be to predict an acceptable range of outcomes. So hopefully you have some stats software at your disposal. With the software, calculating a result is the easy part; planning and capturing the right amount of data is the hard part. Good luck.
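The “take an average and multiply it out, but with a range” idea can be sketched as below. This swaps in a percentile bootstrap, which is not something proposed in the thread, to get the best- and worst-case range, on the assumption that a few large misses make a normal-theory interval on 26 flights questionable. With so few nonzero values, even the bootstrap range is rough. `project_total` is a hypothetical helper, and the per-flight data in the usage line are invented.

```python
import random
import statistics


def project_total(lost_per_flight, flights_in_period, reps=5000, conf=0.95, seed=1):
    """Scale the per-flight average to a period total, with a bootstrap range.

    Returns (low, point, high) for total lost passengers in the period.
    """
    rng = random.Random(seed)                 # fixed seed so reruns match
    n = len(lost_per_flight)
    point = flights_in_period * statistics.fmean(lost_per_flight)
    # Resample the audited flights with replacement and re-project each time.
    boot = sorted(
        flights_in_period * statistics.fmean(rng.choices(lost_per_flight, k=n))
        for _ in range(reps)
    )
    tail = (1 - conf) / 2
    low = boot[int(tail * reps)]
    high = boot[int((1 - tail) * reps) - 1]
    return low, point, high


# Invented audit of 26 flights: 24 accurate, two misses of 3 and 10 passengers.
print(project_total([0] * 24 + [3, 10], flights_in_period=1300))
```

The point estimate is exactly the “average times flights” figure; the bootstrap only adds the spread around it.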


    Michael Logue

    Robert, thanks very much for your advice. You are right that I am trying to predict lost passengers over a timeframe. My thought was that I would come up with a range of probable outcome values; however, I am not sure what method to use to do that. I do have access to Minitab. What function/method should I apply to the spread of data?

    Appreciate your further help.



    @michaellogue: If you develop a mathematical transfer function for how the predictors (inputs) affect the output (passengers), and the inputs have probability distributions (or are unknown but have a way to describe their behavior), then you would be better off using a Monte Carlo simulation. My preference is Crystal Ball, which integrates into Excel (and other business software, now that it is part of Oracle).
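A Monte Carlo simulation of this kind can also be sketched in Python rather than Crystal Ball. Everything about the model here is an assumption for illustration: the transfer function (total miscounted passengers equals the sum of error sizes on the flights that turn out inaccurate), the flat-prior Beta distribution for the inaccuracy rate, and resampling error sizes from those seen in the audit. `simulate_period_loss` is a hypothetical helper, and the inputs in the usage line are invented.

```python
import random


def simulate_period_loss(flights, n_audited, n_inaccurate, observed_errors,
                         trials=2000, seed=42):
    """Monte Carlo spread of total miscounted passengers over a period.

    Hypothetical transfer function: total = sum of the error sizes on the
    flights that turn out to be inaccurate.  Input distributions (assumed):
      * p, the chance a flight is inaccurate ~ Beta(n_inaccurate + 1,
        n_audited - n_inaccurate + 1), i.e. the audit count with a flat prior;
      * each error size is resampled from the nonzero errors seen in the audit.
    Returns the 5th, 50th and 95th percentiles of the simulated totals.
    """
    if not observed_errors:
        raise ValueError("need at least one observed error size to resample")
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        p = rng.betavariate(n_inaccurate + 1, n_audited - n_inaccurate + 1)
        k = sum(rng.random() < p for _ in range(flights))   # inaccurate flights
        totals.append(sum(rng.choice(observed_errors) for _ in range(k)))
    totals.sort()
    return tuple(totals[min(int(q * trials), trials - 1)] for q in (0.05, 0.50, 0.95))


# Invented inputs: ~110 flights a month, 2 of 26 audited flights off by 3 and 10.
print(simulate_period_loss(110, 26, 2, [3, 10]))
```

The 5th to 95th percentile spread is one way to express the range of probable outcomes asked about above. The per-flight loop is simple but slow for a full year of a large airline, so larger runs may need fewer trials or a vectorized rewrite.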



    As Robert pointed out, you seem to have a couple of points that are very unusual. Statistically speaking, your data is likely not in statistical control, as there is at least one special cause giving you the occasional erratic point. More data would show whether this happens with some frequency, in which case it might average out, but I would investigate to find the cause of those points and then eliminate it.
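One common way to flag such special-cause points is an individuals (I-MR) chart, which Minitab draws directly. The sketch below only computes the individuals limits, using the standard constant 2.66 (3 / d2, with d2 = 1.128 for two-point moving ranges). It assumes the values are in time order; for mostly-zero count data it is a rough screen, and an attribute chart on inaccurate flights may fit better. `imr_out_of_control` is a hypothetical helper and the sequence is invented.

```python
import statistics


def imr_out_of_control(values):
    """Indices of points outside individuals (I-MR) chart limits.

    Limits are mean +/- 2.66 * average moving range, where 2.66 = 3 / d2 and
    d2 = 1.128 for moving ranges of two consecutive points.  `values` must be
    in time (flight) order for the moving ranges to mean anything.
    """
    centre = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    ucl, lcl = centre + 2.66 * mr_bar, centre - 2.66 * mr_bar
    return [i for i, v in enumerate(values) if v > ucl or v < lcl]


# Invented sequence of per-flight errors in flight order.
print(imr_out_of_control([0, 0, 1, 0, 0, 12, 0, 0]))  # [5]
```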

    Following that, it really comes down to what question you want answered. If you only care about performance within any one airline, treat each sample as a separate dataset and get a confidence interval on it, as well as a 1-sample t-test to determine whether its mean differs from zero. Since you have Minitab, using the Assistant (assuming you have Minitab 16) will incorporate power numbers into the output, so you have an idea whether or not you have enough data to detect a difference. If you can’t detect a difference with practical implications and the power is sufficient, you can treat that airline as “no difference” but continue monitoring it.
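For readers without Minitab, a 1-sample t-test and its confidence interval can be sketched with the standard library. Since Python's `statistics` module has no t distribution, this sketch integrates the Student t density numerically (Simpson's rule) and finds the critical value by bisection; that numerical approach is my choice, not the thread's. With mostly-zero data and one or two large errors, the t-test's normality assumption is shaky, so treat the p-value as approximate. All function names are hypothetical, and the sample in the usage line is invented.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density with Simpson's rule (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def t_critical(df, alpha=0.05):
    """Two-sided critical value, found by bisection on t_two_sided_p."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid                          # p still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(data, mu0=0.0, alpha=0.05):
    """Return (mean, t, p, (ci_low, ci_high)) for H0: population mean == mu0."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    if se == 0:
        raise ValueError("all values identical; t is undefined")
    df = n - 1
    t = (mean - mu0) / se
    margin = t_critical(df, alpha) * se
    return mean, t, t_two_sided_p(t, df), (mean - margin, mean + margin)


# Invented audit of 26 flights, errors in passengers (0 = accurate).
print(one_sample_t([0] * 24 + [3, 10]))
```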

    If you want to find out whether there are differences between airlines, then you’ll need to run an ANOVA on the same information from above.
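The one-way ANOVA F statistic behind that comparison can be sketched as follows. This computes only F and its degrees of freedom; the p-value is left to an F table or Minitab's one-way ANOVA. With many zeros and a few large errors per airline, the ANOVA assumptions (roughly normal errors, similar variances) are doubtful, and the chi-square test on inaccurate-flight counts mentioned earlier in the thread may be the sturdier comparison. `one_way_anova_f` is a hypothetical helper and the groups in the usage line are invented.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(values)
    k, n = len(groups), len(values)
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented per-flight errors for three airlines, for illustration only.
print(one_way_anova_f([0, 0, 4, 0], [0, 1, 0, 0, 2], [0, 0, 0, 1, 0, 0]))
```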
