Paired T-Test or Independent Sample T-Test



    Adrian Martin Manansala

    Hi there!

    I am doing research on testing. I had the same respondents take a standardized admission test and a self-made test. They took the standardized test first; two months later, once the self-made test was finished, I administered it to them. Now I am confused, since I don’t know which statistical tool to use to compare their means and see whether there is a significant difference between them. Can you please help me decide which t-test I should use? Thank you!


    Robert Butler

    You will need to provide more information before anyone can offer much in the way of suggestions. As written the answer to your question is neither.

    Since the same population is taking both tests the results amount to repeated measures, which means the assumption of independent measures – a requirement of a two-sample t-test – has been violated. This would seem to imply that the correct test is a paired t-test but there are issues…

    1. You have a standardized test and you have a self-made test.
    a. Are the number of questions the same?
    b. Are the counts of the questions within question category type the same?
    c. Is the grading structure of the tests the same?
    d. What guarantees do you have concerning issues of equivalence with respect to question content and type between the two tests?

    2. When you say self-made – what exactly do you mean by that?
    a. Every individual made their own test?
    b. A group of people got together, brought all of their favorite questions to the table and, on the basis of group consensus, chose a subset of the proffered questions and then took a single test containing these questions?
    c. Neither a nor b. If this is the case then how was it built?

    If the answer to #2 is either a or b then how do you separate the questions from the individual who built them? It would seem reasonable to assume that there is going to be a significant bias with respect to individuals recognizing their input and thus knowing the correct answer to the question.

    If equivalence between the tests has not been confirmed then all a paired test is going to show is that the mean of the differences in test scores either is or is not significantly different from 0 and that finding will mean absolutely nothing.
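
    To make concrete what the paired test actually evaluates, here is a minimal sketch, assuming made-up scores for six respondents (the numbers and the function name `paired_t_statistic` are hypothetical, not data or code from this thread). It reduces the two score lists to one difference per respondent and tests whether the mean of those differences is distinguishable from 0, which is all the paired t-test does.

```python
import math
import statistics

# Hypothetical scores (out of 30) for the same six respondents.
# These are illustrative values only, not data from the thread.
standardized = [22, 18, 25, 20, 27, 16]
self_made = [24, 17, 27, 23, 28, 18]


def paired_t_statistic(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(first) != len(second):
        raise ValueError("paired data need one score per respondent on each test")
    # One difference per respondent: second score minus first score.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    # One-sample t statistic for the null hypothesis "mean difference = 0".
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


mean_diff, t_stat, df = paired_t_statistic(standardized, self_made)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

    Turning the t statistic into a p-value requires the t distribution with `df` degrees of freedom, from a table or a statistics package. Nothing in the calculation knows whether the two tests measure the same thing, so the equivalence question has to be settled separately.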

    To provide some perspective on the last comment concerning the test of differences in test scores consider the following: There is a Stanford Math Test which is given to all of the students in the 3rd grade in my school district. The Stanford test is constructed in such a way as to minimize the impact of reading skills in order to guarantee, as much as one can, that the test is actually testing math skills and not reading skills.

    The state test for math, which is first given in the 4th grade, does not make any attempt to distinguish between math and reading skills. If one runs a paired t-test on the 3rd and 4th grade results on a student-by-student basis one can easily demonstrate that, for many segments of the student population, there is a statistically significant and (if you are not thinking about what you are doing) supposedly physically meaningful decline in student competency with respect to mathematical skills.

    However, if you take a look at the Stanford reading test scores (which are also given in the 3rd grade), you will find, almost without exception, that students who scored low on the Stanford reading test also scored low on the state math test. A check for the correlation between 3rd grade Stanford reading scores and state 4th grade reading scores indicates near perfect agreement – poor reading skills in third grade = poor reading skills in 4th grade.

    Thus the decline in math skills between 3rd and 4th grade has nothing to do with math and everything to do with the fact that the state test for math tests reading skills first and math second. What this means is that the decline in math scores for many of the students in my public school system is due to the fact that the two tests are not equivalent and has absolutely nothing to do with the competence of the teachers.

    As you might expect, all of this is lost in the fulminations concerning the quality of public education in my state. Just in case you are wondering about the above statements – I didn’t get this from reading some report – these findings are the result of an analysis I did on district proficiency scores several years ago.


    Adrian Martin Manansala

    Hi @rbutler! Thank you so much for your reply!

    1.a. Yes, both have 30 questions.
    b. Yes.
    c. Yes.
    d. Both are multiple-choice tests, and the learning outcomes they are aligned with come from one curriculum, the K-12 English Curriculum of the Philippines.

    2. Self-made test meaning I made it myself. I did it with help from my adviser, who has a master’s degree in English Literature.


    Robert Butler

    Based on your second post it sounds like there is a chance the two tests will be equivalent. However, the question of equivalence of question structure remains. In order to address this you would need to review the questions with someone who understands all of the issues involved in question construction.

    Since you probably do not have access to someone with this skill set you can go ahead and run a paired t-test on the results but before you do you will need to decide what kind of difference between the two tests would be physically meaningful. If you get a statistically significant difference between the two tests but the score differences do not matter then you will need to report that fact. You will also need to report that you have assumed an equivalence of question structure and, if your work is to be reported out as a paper, you will need to provide both the standardized questions and your questions in an appendix.


    Douglas Brown

    I’m confused too as to the point of the study. If it is to verify the effectiveness of some intervening activity that one group did, then you could have administered the same standardized test or, at worst, another validated version of it. To use your new test you’d have to start by evaluating the two groups at the same point of knowledge and then re-administering it. If you are trying to validate that your new test measures the same as the other one, then have a pool of respondents take both at the same time. Your advisor is a Lit expert but the experiment is a statistics problem.


    Chris Seider

    You can always run the paired test and be aware that IF there’s a statistical difference, the “cause” may be the difference in test design. Hard to tell, as @rbutler stated well.

    Was the test environment different? For example, everyone sitting in a classroom the first time and then online the second? Any number of reasons could cause a change.

    Glad to see you’re using data to understand if there’s a shift.


    Chuck White

    I would add one more complication to your study. In your description you said that all of the participants took the standardized test first, and then took your own custom test two months later. That means any change over time would be confounded with the difference between the tests. In other words, if the participants either learned more or became more comfortable with the subject, or on the flip side if they forgot some of the subject matter during the two-month interval, those differences would appear in the analysis as differences between the two tests even if the tests were identical.

    A better way to conduct a study like this would be to randomly assign the participants to one of two groups. One group would take the standardized test first, and the other group would take your custom test first.

    That would remove any time-based differences, but as Robert and Chris pointed out, there could be many other factors that could cause differences in test scores.
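
    The random assignment of test order described in this post could be sketched as follows. This is only an illustration: the participant IDs, the group labels, and the function name `assign_test_order` are hypothetical, and the fixed seed is there just to make the example repeatable.

```python
import random


def assign_test_order(participants, seed=None):
    """Randomly split participants into two groups that take the tests in opposite order."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "standardized_first": shuffled[:half],
        "self_made_first": shuffled[half:],
    }


# Hypothetical participant IDs P01..P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_test_order(participants, seed=42)

for order, members in groups.items():
    print(order, sorted(members))
```

    With an odd number of participants the second group gets the extra person. Because each test is taken first by about half the respondents, learning or forgetting over the interval affects both tests roughly equally instead of being folded into the difference between them.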


    Leonard D. Stimley

    Sounds more like you want to correlate. Did the high scorers on the standard stay high on the self-made? Did the low scores stay low? Are the tests related to each other in a meaningful way? To answer this question do a correlation. The t-test will not answer your question.
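
    A minimal sketch of the correlation check suggested here, again using made-up scores (the data and the function name `pearson_r` are hypothetical). It computes the Pearson correlation coefficient from its definition, which answers whether high scorers on one test tend to be high scorers on the other, regardless of any shift in the means.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: this divides by zero if either list is constant (no variation to correlate).
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores (out of 30) for the same six respondents.
standardized = [22, 18, 25, 20, 27, 16]
self_made = [24, 17, 27, 23, 28, 18]

r = pearson_r(standardized, self_made)
print(f"r = {r:.3f}")
```

    A value of r near 1 means the two tests rank respondents similarly. It says nothing about whether one test is harder than the other, which is the question the paired comparison of means addresses, so the two analyses answer different questions.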
