January 14, 2013 at 10:56 am #54277
JohnsonParticipant
Would it ever be acceptable/correct to use a 2 proportions test for dollars? For example, suppose I wanted to test the percent of ‘leakage dollars’ out of ‘total dollars’ both pre and post improvement. A mean or median test does not provide the percentage perspective that the company is interested in. Although dollars are continuous data, in this case I am categorizing them as leakage or non-leakage.
January 15, 2013 at 7:12 am #194557
JohnsonGuest
Thank you!
January 22, 2013 at 11:28 am #194598
Jeff MarthParticipant
Will it work? Yes. However, a better way is to make each day a data point (percent) and run a standard hypothesis test (ANOVA, 2-sample t, etc.) on the before/after time periods.
Based on what you explained, you could also set up a table with before and after as columns and the number of days with/without leakage (or with acceptable/unacceptable levels of leakage) as rows.
January 25, 2013 at 6:13 am #194629
JohnsonParticipant
When I run the comparisons as percentages, both sets of data are non-normal, so I am left using the Mann-Whitney test for medians. Because a high percentage of the claims in the sample do not have any leakage, the medians become zero.
Regression shows that the leakage varies by size of the claim, but it is not a strong predictor (p-value of .001 and R-Sq (adj) of 29.3%).
The challenge is that there is a large improvement in dollars between the two years, but a median test is not picking it up. That is why I was asking about the 2 proportions test.
I am open to a better way. Thanks!
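To illustrate the zero-median problem described above, here is a minimal sketch with made-up leakage percentages (hypothetical values, not the audit data). When more than half of the claims in each group have no leakage, both medians are zero even though the average leakage differs a lot:

```python
from statistics import mean, median

# Hypothetical leakage per claim, as a percent of claim dollars.
# More than half of the claims in each period have no leakage.
pre = [0, 0, 0, 0, 0, 0, 12.0, 18.5, 25.0, 40.0]
post = [0, 0, 0, 0, 0, 0, 0, 4.0, 6.5, 9.0]

for label, data in (("pre", pre), ("post", post)):
    # The median sits among the zeros; the mean still reflects the leakers.
    print(label, "median:", median(data), "mean:", round(mean(data), 2))
```

A test that compares medians has nothing to work with in this situation, which is consistent with the Mann-Whitney result not picking up the dollar improvement.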
January 25, 2013 at 10:18 am #194631
Robert ButlerParticipant
You might want to go back and take a look at that regression again.
Some thoughts:
What does the relationship between leakage and size of claim look like with just the accounts that have leakage?
What does the residual pattern look like for the regression when run against all of the data (patterns, clusters, banding, etc.)?
Same question for just the leakers.
Is the percentage of nonleakers the same before and after?
Is the kind of leaker account the same before and after?
If the claims have some kind of classification, are there any differences in proportions of claims within classification before and after?
If the claims have some kind of classification, are there any differences in amounts leaking before and after?
If the claims have some kind of classification, what happens to your regression when you include a dummy variable for claim type when running the analysis of leakage vs. claim size?
January 28, 2013 at 11:26 am #194643
JohnsonParticipant
1) Without the files that have no leakage, the p-value is .002 with an R-Sq (adj) of 51.7%, so there is a relationship, but still not a strong predictive capability. The residual pattern is acceptable, with no ‘unusual’ patterns, and the residuals are ‘close’ to a normal distribution.
2) The percentage of files with leakage is lower after the improvement, but the difference is not statistically significant (p-value of .25). That is what is so challenging: there was a 10%+ improvement in the dollar impact, which is big money. Just doing a hypothesis test on the criterion ‘is there leakage or not on a claim’ does not capture the company’s ‘true interest’, which is the dollars of leakage. The company would not be as concerned if 75% of the claims had leakage but the dollar amount were very minor. On the other hand, if 40% of the claims have higher dollar values of leakage, then that leakage significantly impacts loss ratios. This was the rationale for the original question about using the 2 proportions test for leakage vs. non-leakage claim dollars: to focus on the dollars rather than the claims. As indicated above, a median test does not capture a significant improvement because of the number of claims with no leakage.
3) The scope of the project has already been narrowed to one category of claim so there would not be a quick or easy ability to further separate by a type of classification.
I realize that the most highly recommended answer is to continue to collect larger sample sizes over time, but this audit process incurs external cost for each claim reviewed so there is a cost to larger sample sizes.
Thanks so much for the great input and discussions.
January 28, 2013 at 6:01 pm #194644
Robert ButlerParticipant
So what was the result of the test of the two proportions?
January 29, 2013 at 7:22 am #194647
JohnsonParticipant
Using ‘dollars’ in a 2 proportions test (accurately assigned claim dollars vs. leakage dollars), the p-value is .000, showing a 10.5% improvement pre vs. post improvement (valued at $600,000+).
Using the 2 proportions test for ‘claims having leakage or not having leakage’, the p-value is .25, which does not show a statistically significant improvement. But as described above, the issue is not so much whether a claim has leakage as how large the leakage dollars are.
This brings us back to the original question, I think: is a 2 proportions test ever acceptable/correct for continuous data like dollars (pre and post, leakage dollars vs. non-leakage dollars)? I know the 2 proportions test is not optimal, but I am not able to identify another test that truly measures the improvement in leakage dollars. Thanks again!
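One way to see what the 2 proportions test is doing when every dollar is treated as a trial: the test takes its sample size from the number of trials, so the same amounts restated in cents or in thousands of dollars give a different z and p-value even though the leakage percentage is unchanged. The sketch below uses hypothetical totals (not the audit data) and a pooled z-test written out by hand; `two_prop_z` is an illustrative helper, not a Minitab function.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical totals: leakage dollars and total dollars audited (15% vs. 12%).
pre_leak, pre_total = 30_000, 200_000
post_leak, post_total = 24_000, 200_000

results = {}
# Same money, counted in three different units of "one trial".
for unit_name, trials_per_dollar in (("thousands", 0.001), ("dollars", 1), ("cents", 100)):
    z, p = two_prop_z(pre_leak * trials_per_dollar, pre_total * trials_per_dollar,
                      post_leak * trials_per_dollar, post_total * trials_per_dollar)
    results[unit_name] = (z, p)
    print(f"{unit_name:>9}: z = {z:8.2f}, p = {p:.3g}")
```

The z statistic grows with the square root of the unit factor while the proportions stay fixed. If the p-value moves with the currency unit, that suggests a .000 result may reflect the number of dollars counted rather than the number of claims audited, so it is worth treating with caution.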
January 29, 2013 at 7:30 am #194648
JohnsonParticipant
Running a 2 x 2 chi square test, I also get a p-value of .000.
Thanks!
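For what it is worth, agreement between the 2 x 2 chi-square test and the 2 proportions test is expected rather than independent confirmation. Assuming a pooled estimate and no continuity correction (software defaults may differ), the Pearson chi-square statistic for a 2 x 2 table equals the square of the two-proportion z statistic, so the p-values match. A minimal sketch with hypothetical counts and illustrative helper names:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = pre/post, columns = leakage / no leakage.
pre_leak, pre_n = 45, 120
post_leak, post_n = 38, 125

z, p_z = two_prop_z(pre_leak, pre_n, post_leak, post_n)
chi2, p_chi = chi_square_2x2(pre_leak, pre_n - pre_leak,
                             post_leak, post_n - post_leak)
print(f"z^2 = {z * z:.6f}, chi-square = {chi2:.6f}")
print(f"p (z-test) = {p_z:.6f}, p (chi-square) = {p_chi:.6f}")
```

So whatever concern applies to treating dollars as trials in one test applies equally to the other.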
