Sampling plan for QC of data entry errors

Six Sigma – iSixSigma Forums Old Forums General Sampling plan for QC of data entry errors

This topic contains 11 replies, has 6 voices, and was last updated by  U 13 years, 10 months ago.

Viewing 12 posts - 1 through 12 (of 12 total)
• Author
Posts
• #40912

Transactional BB
Member

Hello Everyone
I am trying to develop a sampling plan for QC of a data entry process. For example, suppose I have assigned a project to an outside vendor, and the objective of the project is to enter the names, addresses, and phone numbers of all the people in the phone book into an Excel sheet and deliver it to me.
When I receive that file I would rather not QC it 100%, in other words go through the entire phone book to figure out whether the file I received from the vendor has less than a previously determined acceptable error rate.
Does anyone know of a method for calculating the sample size for such a QC process? My confidence level is 95%, the maximum acceptable error level would vary but we can assume 5% for now, and the population is finite.
Thanks.

#127834

AB
Participant

Check these links. The first is an article that will help you understand the calculation. The second is a tool you can use to calculate sample size.
https://www.isixsigma.com/library/content/c000709.asp
http://www.surveysystem.com/sscalc.htm

#127838

Transactional BB
Member

I am aware of these formulas, which are used for surveys. As far as I know they don't work for data entry QC. I checked with the American Statistical Association and some other people; there is something called a context survey calculator that could be used in this case.
I am just not sure how to get more information. I appreciate your response, but I am still wondering if anyone has ever encountered such a problem before.
Thanks

#127842

Haugen
Participant

First you need to determine how you characterize defects: are you counting each character as a defect, each data field, or each document? It depends on what and how you want to control. Then determine your current defect rate: take random samples of the work until you have a good measure of it. Data entry error rates at the field level are often around 2-5%.
Once you know the defect rate and the distribution (data entry can often be characterized as Poisson, but check yours), you can determine your sampling strategy.
We reduce the (random) sample rate on the top performers (why sample good work?) and use a rolling 3-month average that equals the 1-month sample size of the lower performers. This gives incentive on several levels: the lower performers want to get better, so that their work isn't checked as frequently, and the top performers want to stay there because of pride (plus a quality and productivity bonus each month).
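The tiered scheme described above can be sketched in a few lines of Python. This is an illustration only: the function name, the 1/3 reduction factor, and the 3% "top performer" cutoff are assumptions, not figures from the post.

```python
# Hypothetical sketch of a tiered audit-sampling scheme: operators with a
# low rolling 3-month defect rate are sampled at a reduced rate.
from statistics import mean

BASE_SAMPLE_SIZE = 300    # monthly audit sample for lower performers
REDUCED_FRACTION = 1 / 3  # top performers audited at a third of the rate
TOP_THRESHOLD = 0.03      # rolling defect rate below this = top performer


def monthly_sample_size(defect_rates_last_3_months):
    """Return this month's audit sample size for one operator."""
    rolling = mean(defect_rates_last_3_months)
    if rolling < TOP_THRESHOLD:
        return round(BASE_SAMPLE_SIZE * REDUCED_FRACTION)
    return BASE_SAMPLE_SIZE


print(monthly_sample_size([0.01, 0.02, 0.015]))  # 100
print(monthly_sample_size([0.06, 0.05, 0.07]))   # 300
```

The rolling average is what keeps a single bad month from immediately bumping a good operator back to the full sample rate.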

#127843

AB
Participant

Thanks, Transactional BB. You got me curious, so I did a little searching. I did not understand why the sample size calculation for data entry should be any different from that for any other process, and I still do not believe a different calculation approach is needed.
During my search I came across an article that analyzes the data entry quality process. I thought you might find it useful (though I possibly am still far from answering your question). But here it is anyway.
Cheers
AB

#127844

AB
Participant
#127845

Transactional BB
Member

Hi JimH
Thanks for the information.
How I characterize a defect: if the person's name, address, or phone number does not match the information in the phone book, it's a defect. All the fields in a record have to match the source for the record to pass.
Can you explain further how I devise my sampling strategy (formulas, etc., to determine sample size) with this information?
AB: Thanks for your help. I had found that article too, but it didn't answer my question. Thanks for the information anyway.

#127852

Haugen
Participant

Do they have to match 100%, or do you excuse some errors? For example, many data entry contracts allow small errors as long as the data is still searchable: as long as the first 3 letters of the last name are correct, it doesn't count as a defect.
You have the choice of counting each field as a defect. This lets you collect data on root cause: if the highest defect rate is in the address field versus the others, that may give you clues as to root cause. Also, counting each field makes the data entry person (more) aware of (careful with) each field.
We count each defective field as a defect and each data sheet as one opportunity, so the defect rate is # defective fields per sheet. We do that because the sheets vary in their number of fields, and we don't want to count fields.
So, we have a defect rate: we took random samples of everyone's work for a month (which turned out to be 300 data sheets audited per person) and ended up with an average error rate of 5.5%. The best person was at 1%. The initial data entry error rate for "meets requirements" was set as a range of 3-7%; anything above 7% is a "does not meet" and subject to "coaching". Anything below a 3% error rate is an "exceeds" and earns an incentive each month. Everyone is evaluated on a rolling 3-month average, so they don't get whipsawed between meets, doesn't meet, and exceeds.
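The evaluation bands above (exceeds / meets / does not meet, judged on a rolling 3-month average) can be sketched as follows. The band labels and thresholds come from the post; the function itself is an illustration, not the poster's actual tooling.

```python
# Classify an operator from their last three monthly defect rates,
# using the 3% / 7% bands described in the post.
from statistics import mean


def evaluate(monthly_rates):
    """Return the evaluation band for a rolling 3-month average."""
    rolling = mean(monthly_rates)
    if rolling < 0.03:
        return "exceeds"        # below 3%: monthly incentive
    if rolling <= 0.07:
        return "meets"          # 3-7%: meets requirements
    return "does not meet"      # above 7%: subject to coaching


print(evaluate([0.01, 0.012, 0.008]))  # exceeds
print(evaluate([0.055, 0.06, 0.05]))   # meets
print(evaluate([0.09, 0.08, 0.10]))    # does not meet
```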

#127853

BB Transactional
Participant

JimH
What I was originally interested in asking was how you came up with the number 300, i.e. your sample size. Can you explain, please?
Thanks

#127871

Haugen
Participant

Like anything else, it was a combination of what we needed and what was reasonable. In order to get a baseline on everyone, and knowing that historical error rates were running at 1%-8% on average, we originally wanted 500 samples per person over one month. That way we would have a good chance of catching errors from the best performers (1% of 500 = 5) and not have to start with zero as an estimate for them. But the auditors kept getting pulled off for other crises, so we stopped at 300.
Important note: we did a Gage R&R first, and had to fix a couple of things with the auditors (their initial error rates were worse than the data entry people's!).

0
#128303

six sigma hack
Member

Hi Jim
I read how you calculated the number of defects per opportunity, and I think you need to consider each field as a separate opportunity. While the importance of this may not be evident in the example used, if one person had to fill out a more complex sheet than another, counting the number of fields would inherently account for the relative complexity. Subsequently, you could build this up to a DPMO measure.
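The DPMO idea suggested above is a standard Six Sigma calculation: defects divided by (units times opportunities per unit), scaled to a million. A minimal sketch, with illustrative numbers (the 3-field record matches the phone-book example; the defect count is invented):

```python
# DPMO: defects per million opportunities, counting each field
# as a separate opportunity so more complex sheets carry more weight.
def dpmo(total_defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return total_defects / (units * opportunities_per_unit) * 1_000_000


# e.g. 300 audited records, 3 fields each (name, address, phone),
# 45 defective fields found in total:
print(dpmo(45, 300, 3))  # 50000.0
```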

#128336

U
Member

Hi Transactional BB,
Just so that I am clear on your original question:
1- At this stage you are only interested in the number of defective records, not in the number of defects.
2- The definition of a defective record is one where one or more fields are incorrectly entered, as benchmarked against the phone book.
Since we are dealing with discrete data (i.e. a count of defective records), and assuming your population is large, I would suggest the following equation for the sample size:
n = (1.96/Precision) * (1.96/Precision) * P * (1-P)
where "1.96" corresponds to a 95% confidence level,
"Precision" is how precise you want to be (I usually use 5%), and
"P" is the estimated proportion of defectives. I use 0.5 when I don't know what it is, or I do a "sampling test" where I sample until I get 5 defects and divide 5 by the sample size so far to get the estimate.
So, plugging your requirements into the equation, we get:
n = (1.96/0.05) * (1.96/0.05) * 0.5 * (1-0.5)
= 384.16, i.e. 385
or round it up to 400.
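The calculation above is easy to check in code. Since the original question said the population is finite, this sketch also adds an optional finite-population correction; that correction step is my addition, not part of the post.

```python
# Sample size for estimating a proportion: n = (z/E)^2 * p * (1-p),
# with an optional finite-population correction n / (1 + (n-1)/N).
import math


def sample_size(z=1.96, precision=0.05, p=0.5, population=None):
    """Sample size for a proportion at the given confidence and precision."""
    n = (z / precision) ** 2 * p * (1 - p)
    if population is not None:  # finite-population correction
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size())                   # 385
print(sample_size(population=10_000))  # 370
```

With p = 0.5 the formula gives its worst-case (largest) sample size, which is why U suggests it as the default when the defect rate is unknown.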
I would also suggest making the definition of your defect measure really tight. For example, if the data entry operator uses uppercase letters, or "-" instead of "/", etc., would these make a record defective? This is particularly important if you are using a team of measurers, each of whom may have their own idea of what classifies as a defect.
Hope this helps…

