Compare between labs – Paired t-test





    I’m looking for confirmation that my approach to this problem is correct…

    We have just set up a laboratory in the United States; the intention is that this lab will take over much of the product testing that is currently being done here in the United Kingdom. We need to have confidence that the results coming out of the US lab are in agreement with our measurements.

    I propose to select 20 of our products and ask the US lab to test each product and report their results. We shall do the same here in the UK.

    I should then have two sets of 20 results on which I intend to do a paired t-test. From this I should be able to determine if there is a significant difference between the US mean and the UK mean. I then intend to do an F-test to determine if there is a significant difference between the US std. dev. and the UK std. dev. Finally I can use the total std. dev. to calculate if I need both labs to test more than 20 products to achieve a 95% confidence level.
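A minimal sketch of the two statistics described above, assuming each lab reports one result per product. The function names and the five pairs of numbers are hypothetical and only illustrate the arithmetic; the plan above calls for 20 pairs.

```python
import math
import statistics


def paired_t_statistic(uk, us):
    """Return (t, degrees of freedom) for the paired differences us - uk."""
    diffs = [b - a for a, b in zip(uk, us)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


def variance_ratio(uk, us):
    """Return the F ratio, larger sample variance over the smaller one."""
    v_uk = statistics.variance(uk)
    v_us = statistics.variance(us)
    return max(v_uk, v_us) / min(v_uk, v_us)


# Hypothetical results for five products (illustration only).
uk = [1000.0, 1200.0, 950.0, 1100.0, 1050.0]
us = [1010.0, 1190.0, 970.0, 1120.0, 1040.0]

t, df = paired_t_statistic(uk, us)
f = variance_ratio(uk, us)
print(f"paired t = {t:.3f} on {df} degrees of freedom")
print(f"variance ratio F = {f:.3f}")
```

Each statistic still has to be compared with a tabulated critical value; for 20 pairs, the two-sided 5% critical value of t with 19 degrees of freedom is about 2.093. Note that when the products differ widely, each lab's variance across products mostly reflects product-to-product differences rather than the lab's own measurement spread.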

    This should be enough to satisfy the UK.

    Would you agree that this is a sound approach?
    What if 19 of the 20 pairs of results were in close agreement but one pair differed, is there a danger that I fail to spot the problem pair?
    Is there any need/advantage in doing repeat measurements on one of the products to get the std. dev. of the test method?

    Thanks for your help.


    Robert Butler

    I think you will need to provide more information before anyone can offer anything.

    You said, “I propose to select 20 of our products and ask the US lab to test each product and report their results,” and you followed this statement with “I should then have two sets of 20 results on which I intend to do a paired t-test.” The way this reads is that you have 20 separate products and you are going to take two samples from each of the 20 separate products and ask each lab to test a sample. If this is the case it is unlikely that you will find much of anything.

    1. How do the products differ?
    2. How are the samples taken? Single sample split, two sequential samples from the output, etc.
    3. Unless you have a very odd process you are going to need more than one sample per product. The issues of how these samples are gathered and how they are tested will have to be addressed – you will want independent samples, which means you will have to take steps to ensure independence. You will also want the samples blinded and you will want to have the samples tested across time/operators/lab machines.
    4. Agreement – as a term of analysis – isn’t going to be addressed by a paired t-test. If you really mean agreement then you should consider using the Bland-Altman test.
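As a rough illustration of the Bland-Altman idea, the sketch below computes the mean difference between labs (the bias) and the conventional 95% limits of agreement, bias ± 1.96 standard deviations of the differences. The function name and the data are hypothetical, and the full method also plots each difference against the mean of its pair, which is omitted here.

```python
import statistics


def bland_altman_limits(uk, us, z=1.96):
    """Return (bias, lower limit, upper limit) for the differences us - uk."""
    diffs = [b - a for a, b in zip(uk, us)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired results from the two labs (illustration only).
uk = [1000.0, 1200.0, 950.0, 1100.0, 1050.0]
us = [1010.0, 1190.0, 970.0, 1120.0, 1040.0]

bias, lower, upper = bland_altman_limits(uk, us)
print(f"bias = {bias:.1f}, limits of agreement = ({lower:.1f}, {upper:.1f})")
```

Whether the labs "agree" then depends on whether limits of that width are acceptable for the test in question, which is a practical judgement rather than a significance test.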

    If you can elaborate on the above perhaps I or someone else can offer additional thoughts.



    Hi Robert,

    Thank you for taking the time to look at this. I’ll try to answer your questions. We haven’t started testing yet so I’m open to using an alternative approach to the one I have suggested.

    And yes, I think you have understood what I suggested. The products in question are adhesives. Here in the UK we have samples of Adhesive A, Adhesive B, Adhesive C etc. In the lab in the US they also have samples of these products.

    Adhesive A has very different properties to Adhesive B which has very different properties to Adhesive C and so on. If both the US and UK measured, for example, the tensile strength of these samples there would be no relationship between samples (very different products) but I would expect/ hope for a pair-wise relationship ie. if tensile strength of A is two times tensile strength of B in the UK then it should follow that the US should find the same.

    I also believe that the mean and variance should be similar between labs (am I correct?)

    OK, I’ll try to answer your questions

    1. How do the products differ?
    Quite a lot (see above)

    2. How are the samples taken?
    The samples are batches of our product which are currently out in the market. Both labs retain a small sample of previously manufactured product. I can specify a batch number to ensure we are both measuring the same batch.

    3. Unless you have a very odd process you are going to need more than one sample per product.
    I had hoped to keep testing to a minimum and, since all I’m really interested in is the mean and variance of both labs, I had hoped that 20 samples would be enough. The thing I want to stress here is that I am not interested in “getting an accurate measure of the tensile strength of Adhesive A” per se; rather, I’m interested in quantifying the level of agreement between labs.

    4. You should consider using the Bland-Altman test.
    I hope what I have said gives you a clearer picture of what I’m trying to do. If you still feel my approach is wrong I am open to other suggestions. I would also welcome your explanation as to why a paired t-test is inappropriate. In my simplistic view, when I wish to compare two sets of data that can be ‘paired up’ I automatically think paired t-test. So I’m willing to learn the error of my ways.

    Thank you for your help so far.



    I’ve been thinking about my approach to this a bit more and yes Robert you are right and I am very wrong.

    Let’s assume there is no big difference between the UK lab and the US lab, that is to say all US results are within 5% of the corresponding UK value.

    But products have very different values from each other. So the UK lab might measure Adhesive A to have a tensile strength of 1000 psi and the US lab might get a value of 1050 psi. For Adhesive B the UK might measure 5000 psi and the US might get 5200 psi. Both pairs of results are in reasonable agreement with each other, but the difference in Adhesive A is only 50 psi while the difference in Adhesive B is 200 psi. Clearly Adhesive B will have a greater influence on my calculated t-value. I can imagine a situation where, if all other adhesives were around 1000 psi, the difference between the US and UK measurements of Adhesive B starts to look like an outlier.
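The arithmetic in this example can be checked with a short sketch. It uses the two hypothetical adhesives from the paragraph above and expresses each difference both in psi and as a percentage of the UK value; the helper name is made up for illustration.

```python
def compare_pair(uk_value, us_value):
    """Return (absolute difference, percent difference relative to UK)."""
    abs_diff = us_value - uk_value
    pct_diff = 100.0 * abs_diff / uk_value
    return abs_diff, pct_diff


# (UK result, US result) in psi, taken from the example in the post.
results = {
    "Adhesive A": (1000.0, 1050.0),
    "Adhesive B": (5000.0, 5200.0),
}

summary = {name: compare_pair(uk, us) for name, (uk, us) in results.items()}
for name, (abs_diff, pct_diff) in summary.items():
    print(f"{name}: {abs_diff:.0f} psi difference, {pct_diff:.1f}% of UK value")
```

On the absolute scale Adhesive B differs four times as much as Adhesive A, yet on the relative scale it is the closer pair (4% against 5%), which is why raw differences across very different products can mislead a paired t-test.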

    I now realise I should be a lot more careful about using a paired t-test. Up until now, when data came in pairs I would use a paired t-test. I shall have to read up on Robert Butler’s suggestion of using the Bland-Altman test.

    This leads me to ask one more question: how do I know when it is appropriate to use a paired t-test? Can somebody give me an example of where you would use a paired t-test, and how can you be sure that there aren’t great differences between pairs?



    I agree that a paired t-test comparison is the best tool for this kind of analysis.

    Before doing this test I would suggest doing the following.

    What we are trying to do is gain confidence in the measurement results we will get from both locations. That requires a sound measurement system at both sites, so how do we check that? Asking the right questions leads to the right answers.
    Asking the three questions below at both locations will give you a clue.

    1) Is the measurement consistent?
    Measure the same product repeatedly, 30 times, and plot the results on a control chart; this will show whether the measurement system is consistent.

    2) Is the measurement precise?
    The precision is 0.6745 * mRbar / d2, taken from the range chart of the control chart above.

    3) Is there any detectable bias?
    The repeated measurements give you an average. To check for bias, compute the confidence interval
    Average +/- t0.05 * (std dev / sqrt(n)). If the product can be measured accurately by another, higher-precision method and that value lies within the above confidence interval, there is no detectable bias.
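The precision and bias checks above can be sketched as follows. This assumes an individuals chart with moving ranges of two consecutive readings, for which d2 = 1.128, and it takes the t critical value as an input looked up from a table. The readings, the reference value, and the function names are hypothetical, and only five readings are used here instead of the 30 suggested.

```python
import math
import statistics

D2_N2 = 1.128  # d2 constant for moving ranges of two consecutive values


def measurement_precision(values):
    """Return 0.6745 * mRbar / d2 from the moving ranges of the readings."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    return 0.6745 * mr_bar / D2_N2


def bias_interval(values, t_crit):
    """Return the interval average +/- t_crit * (std dev / sqrt(n))."""
    n = len(values)
    average = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return average - half_width, average + half_width


# Hypothetical repeated readings of one product (illustration only).
readings = [10.0, 12.0, 11.0, 13.0, 12.0]

precision = measurement_precision(readings)
# 2.776 is the two-sided 95% t value for 4 degrees of freedom (n = 5).
low, high = bias_interval(readings, t_crit=2.776)

reference_value = 11.5  # hypothetical result from a higher-precision method
no_detectable_bias = low <= reference_value <= high
print(f"precision = {precision:.3f}")
print(f"interval = ({low:.2f}, {high:.2f}), no detectable bias: {no_detectable_bias}")
```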



    You must be certain that the samples you wish “to test” are dependent; a paired t-test is only used when you wish to test “pairs of dependent data or samples”. A two-sample t-test adds some variation due to the independence of the data samples; a paired t-test does not. As per above, you must establish data dependence prior to selecting your test method.
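To illustrate the point about the extra variation in a two-sample test, the sketch below computes both statistics on the same made-up data, in which the products differ widely but the labs differ only slightly. The two-sample version shown is Welch's unequal-variance statistic, chosen here as an assumption; all names and numbers are hypothetical.

```python
import math
import statistics


def paired_t(uk, us):
    """t statistic computed from the within-pair differences us - uk."""
    diffs = [b - a for a, b in zip(uk, us)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def two_sample_t(uk, us):
    """Welch t statistic, treating the two sets of results as independent."""
    se = math.sqrt(
        statistics.variance(uk) / len(uk) + statistics.variance(us) / len(us)
    )
    return (statistics.mean(us) - statistics.mean(uk)) / se


# Hypothetical results: five very different products, small lab offset.
uk = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
us = [1020.0, 2030.0, 3010.0, 4040.0, 5025.0]

t_paired = paired_t(uk, us)
t_two_sample = two_sample_t(uk, us)
print(f"paired t = {t_paired:.2f}, two-sample t = {t_two_sample:.3f}")
```

With these numbers the paired statistic is far larger than the two-sample one, because pairing removes the product-to-product variation that otherwise swamps the small difference between labs.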

    If I am reading your description correctly, it seems as though you are attempting to establish or perform a measurement system analysis (MSA).

    Best of luck…..



    Thank you guys for your reply.

    Sthothat; I agree it would be good to find out if the test is consistent, precise or biased. I am reluctant, however, to spend too much time on this, especially as there are several tests to compare between US and UK. At the moment I am focused on the agreement between the two labs; perhaps at a later date I can look into the test method itself.

    Optomist1; Yes, I am confident that the pairs of data are dependent. The same sample of each product is tested in both the US and UK. So the only difference in testing Adhesive A is that one test is carried out in the US (US chemist, US equipment) and the other in the UK. It’s the same for Adhesive B, Adhesive C etc.

    I also see you recommend a ‘measurement system analysis’; I guess you are recommending Gauge R&R. I need to look into this.



    Good Day,

    I don’t know much about your test equipment, but generally, yes, unless already quantified, a Gage R&R is essential. One needs to know if the test equipment/fixture/gage has sufficient measurement discrimination, repeatability, stability etc. In addition a gage accuracy or bias study may be needed.

    This is especially relevant given that measurements will be taken or made at different locations, assuming different equipment.




    Save yourself time and angst and do whatever Robert Butler has advised and ignore the other advice. His advice is all you will need. Best of luck.
