SUNDAY, MARCH 26, 2017
Topic Confidence interval for Attribute Data

This topic contains 2 replies, has 3 voices, and was last updated by Polish Prince 11 years, 2 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)

    Dear Blackbelts,
    I need some expert advice from you regarding our approach to sampling, i.e. how we determine the sample size and the confidence interval (CI).
    Below is a high-level snapshot of our situation, followed by a few detailed questions.
    Many thanks in advance for your support!
    Six Sigma Project Scope
    We have a Six Sigma project focusing on improving the accuracy of our policy data-processing center. We process about 100,000 policies per year in the center. A policy is either accurate or inaccurate based on certain QC-criteria. Currently, the overall accuracy of our center is about x%.
    The data processing is split into four teams: Motor, Fire, Home, and Property. The distribution of policy volume among the four teams/segments is roughly 40%, 30%, 20%, and 10%.
    Issue Faced
    We are unsure how best to determine our sample size. To assist in the analysis stage, we wish to compute weekly accuracies for: (a) Motor accuracy, (b) Fire accuracy, (c) Home accuracy, (d) Property accuracy, as well as (e) the overall accuracy of the center.
    Our resource constraint will limit our sampling to 1000 policies a week.
    We can think of three approaches for determining the sample size:
    (1)   One Big Segment: Each policy has a unique policy number. We can generate a random set of 1000 policy numbers and then ask our QC team to test the selected policies.
    (2)   Divide Policies into Segments, 250 Policies per Segment: We can randomly sample 250 per team.
    (3)   Divide Policies into Proportional Segments: We can randomly sample according to the policy volume distribution, i.e. 400 for Motor, 300 for Fire, 200 for Home, and 100 for Property.
    Pros and Cons of Approach #1
    In our mind, approach #1 is the easiest approach. However, the confidence interval (CI) for a segment accuracy reading (e.g. Home) might fluctuate from week to week, because the number of sampled policies per segment is not fixed. It will then be difficult for us to compare accuracies of Home from week to week. This will be especially true if Home's actual volume proportion is very low, say about 1% of total volume: random sampling will then yield about 10 Home samples per week (sometimes 10 samples, sometimes 7 samples).
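The week-to-week fluctuation described for approach #1 can be roughly quantified. The sketch below is an illustration only: it assumes a binomial approximation (it ignores that the weekly sample is drawn without replacement from a finite batch of policies), and the function name is hypothetical.

```python
import math

def segment_count_spread(total_sample, segment_share):
    """Expected number of sampled policies from one segment, and the
    standard deviation of that number, under a binomial approximation."""
    expected = total_sample * segment_share
    std_dev = math.sqrt(total_sample * segment_share * (1 - segment_share))
    return expected, std_dev

# The post's hypothetical case: a segment holding about 1% of volume,
# with 1000 policies sampled at random across the whole center.
expected, std_dev = segment_count_spread(1000, 0.01)
print(f"expected count: {expected:.0f}, standard deviation: {std_dev:.1f}")
```

With a standard deviation of about 3 around an expected count of 10, counts such as 7 or 13 in a given week would be unremarkable, which is consistent with the concern raised above.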
    Pros and Cons of Approach #2
    In our mind, approach #2 is the next easiest approach. With 250 policies per segment, we can know in advance the CI of our segment accuracy. Note: we are using the formula: CI = Z(alpha) * sqrt[p(1-p)/n]
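As a quick check of the formula quoted above, here is a minimal sketch that evaluates the half-width of the interval for several sample sizes. It assumes a 95% level (z = 1.96) and the worst case p = 0.5, because the post does not state the actual accuracy; the function name is made up for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation CI for a proportion:
    z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes from approaches #2 (250 each) and #3 (400/300/200/100).
# p = 0.5 is an assumed worst case, not a figure from the post.
for n in (100, 200, 250, 300, 400):
    print(f"n = {n}: +/- {margin_of_error(0.5, n):.3f}")
```

For accuracies close to 100% the interval is narrower than this worst case, but the normal approximation also becomes less reliable when very few inaccurate policies are observed.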
    For the overall accuracy, we calculate using a weighted average of the segment accuracies. However, we are not sure if we can use this formula to determine the CI of the overall accuracy.
    Pros and Cons of Approach #3
    In our mind, approach #3 is the most complicated approach. However, this is the approach that we are using now because we think that this approach will lead to the highest CI for the overall accuracy reading. 

    For approach #2, how do we calculate the CI for the overall accuracy? Can we use the formula and just plug in 1000 for n?
    For approach #3, how do we calculate the CI for the overall accuracy? Can we use the formula and just plug in 1000 for n?
    In your opinion, which is the best approach (amongst the three, or a new approach)? For your recommended approach, how do we calculate:
    (i) Accuracies and CI for each of the segments?
    (ii) The overall accuracy and CI?


    Do you intend to pay one of us to spend the time needed to adequately answer this post, or will you settle for a knee-jerk answer?  For future reference, most responders will not even read something this long unless they have nothing better to do.  I am not saying your post is not worthy, just that quick questions requiring quick responses have a higher likelihood of being responded to.


    I would recommend using 250 for each team. It makes comparison easier when doing the tests.
    You can use the following formula to help you determine the CI for the overall accuracy:
    CI = 1.96 * sqrt[ pa(1-pa) / (n1+n2) ]
    n1 – sample size of sample 1
    n2 – sample size of sample 2
    pa (weighted accuracy) = (n1*p1 + n2*p2) / (n1+n2)
    p1 – accuracy of sample 1
    p2 – accuracy of sample 2
    Hope this helps, Cheers
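The formula in the reply above pools the segment samples and weights each segment by its sample size. With 250 policies per team, that weighting differs from the 40/30/20/10 volume split in the original post. The sketch below compares the pooled calculation with the standard stratified-sampling estimator, which weights by volume share; the stratified version is a textbook alternative, not something proposed in the thread. The accurate-policy counts are invented for illustration, and both functions ignore any finite-population correction.

```python
import math

def pooled_ci(accurate, sizes, z=1.96):
    """Pooled estimate as in the reply: all samples treated as one sample."""
    n = sum(sizes)
    p = sum(accurate) / n
    return p, z * math.sqrt(p * (1 - p) / n)

def stratified_ci(weights, accurate, sizes, z=1.96):
    """Volume-weighted estimate: p = sum(W_h * p_h),
    variance = sum(W_h^2 * p_h * (1 - p_h) / n_h)."""
    props = [a / n for a, n in zip(accurate, sizes)]
    p = sum(w * ph for w, ph in zip(weights, props))
    var = sum(w ** 2 * ph * (1 - ph) / n
              for w, ph, n in zip(weights, props, sizes))
    return p, z * math.sqrt(var)

weights = [0.4, 0.3, 0.2, 0.1]      # Motor, Fire, Home, Property volume shares
sizes = [250, 250, 250, 250]        # approach #2: equal allocation
accurate = [240, 225, 230, 200]     # hypothetical counts, not real data

p_pooled, moe_pooled = pooled_ci(accurate, sizes)
p_strat, moe_strat = stratified_ci(weights, accurate, sizes)
print(f"pooled:     {p_pooled:.3f} +/- {moe_pooled:.3f}")
print(f"stratified: {p_strat:.3f} +/- {moe_strat:.3f}")
```

Under proportional allocation (approach #3) the two point estimates coincide, because each segment's share of the sample equals its volume share. Under equal allocation they can differ noticeably whenever segment accuracies differ, so simply plugging n = 1000 into the single-sample formula would not describe the volume-weighted overall accuracy.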

The forum ‘General’ is closed to new topics and replies.
