Hypothesis Test for Y(%) – Analyze


This topic contains 19 replies, has 4 voices, and was last updated by Dhurkadaas 1 year, 1 month ago.

Viewing 20 posts - 1 through 20 (of 20 total)



    I am working on a BB project aiming at improving OTIF (on time in full deliveries).
    Currently in the Analyze stage, I am struggling to find the proper hypothesis test to confirm the statistical significance of potential Xs for Y.

    Y (%) – OTIF% -> discrete

    Potential Xs -> discrete
    X1 – Inventory difference
    X2 – Operational mistake
    X3 – Missing material
    X4 – Material damaged
    X5 – Overpick
    X6 – Delay caused by too long waiting time
    X7 – Delay caused by too long system transfer

    I have 26 records (for 26 working days).

    Which test should I use?
    I tried chi-square but had some difficulties with writing Y% and the sum of each X in one table…



    Robert Butler

    Question: How many deliveries do you have in 26 days?
    Question: How do you know that a 26 day sample is at all representative?
    Question: Does the existence of any one of the X’s result in a failure or do you have some kind of overlapping where a failure might have one or more of the X’s?

    Based on what you have written I would not recommend trying to run a statistical test. What I would first do is get a sense of what the distribution of errors supposedly connected with a failed on-time delivery looks like.

    To that end you should ask a couple more questions and then graph your counts. I would start with something like the following:

    If they are mutually exclusive (i.e. the occurrence of one failure excludes the possibility of any other) then a simple bean count expressed as a Pareto will identify the X’s you should pursue.

    If failures can carry more than one X then summarizing everything in Pareto chart form will still allow you to quickly identify the most common X’s associated with failures, and you could also quickly determine how much of a change would result if you eliminated the top two or three.
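    The bean count described above can be sketched in a few lines of Python; the category names and counts below are hypothetical placeholders, not the poster's data:

    ```python
    # Minimal Pareto tabulation, assuming each failed line carries exactly one X code.
    # Counts are hypothetical placeholders.
    counts = {"X1": 27, "X2": 52, "X3": 158, "X4": 52, "X5": 5, "X6": 32, "X7": 52}

    total = sum(counts.values())
    cumulative = 0.0
    for x, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += 100.0 * n / total
        print(f"{x}: {n:4d}  cumulative {cumulative:5.1f}%")
    ```

    Reading off where the cumulative column crosses roughly 80% identifies the "vital few" X's to pursue first.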



    Thank you Robert for the prompt reply.

    1. # of deliveries in 26 days?
    About 1,000 lines are delivered per day. In this case the total for 26 days is 25,537 lines.

    2. Are the samples representative?
    Output from Minitab: Individual Distribution Identification

    Descriptive Statistics

    N N* Mean StDev Median Minimum Maximum Skewness Kurtosis
    26 0 0.905509 0.0750262 0.932272 0.665468 0.975425 -1.72130 3.08891

    Box-Cox transformation: λ = 5.00000

    Johnson transformation function:
    -1.72876 + 0.779356 × Ln( ( X – 0.485314 ) / ( 0.978762 – X ) )

    Goodness of Fit Test
    Distribution AD P LRT P
    Normal 1.610 <0.005
    Box-Cox Transformation 0.904 0.018
    Lognormal 1.867 <0.005
    3-Parameter Lognormal 1.615 * 0.048
    Exponential 10.281 <0.003
    2-Parameter Exponential 6.382 <0.010 0.000
    Weibull 0.991 0.011
    3-Parameter Weibull 0.848 0.009 0.163
    Smallest Extreme Value 0.848 0.025
    Largest Extreme Value 2.550 <0.010
    Gamma 1.780 <0.005
    3-Parameter Gamma 33.284 * 1.000
    Logistic 1.221 <0.005
    Loglogistic 1.368 <0.005
    3-Parameter Loglogistic 1.221 * 0.107
    Johnson Transformation 0.124 0.984

    3. Overlapping Xs?
    When an X occurs, the status “problematic line” is assigned, and only one category can be ascribed.

    I am considering building a transfer function for how X occurrences reduce Y, but that follows from the OTIF formula.
    OTIF % = [# delivered lines (on time & in full) / # ordered lines] × 100%.

    I totally agree with Pareto for finding the most-wanted Xs, but how do I show their influence on OTIF?



    Robert Butler

    Since you only assign one X to a given failure the best bet would be to run a logistic regression with the response being pass/fail and the X’s coded as 0/1 (No/Yes) for occurrence. The exponentiated coefficients from a logistic regression are odds ratios, and they express the odds of a success given the occurrence of a particular X.

    If you have the capability you should test your X matrix to make sure that the X’s of interest are independent of one another in the block of data you are using. As I recall Minitab can check the variance inflation factors (VIFs) of the X’s (running eigenvalues and condition indices would be better but I don’t think Minitab has that capability). The usual rule of thumb is that if the VIF is >10 then that variable should be dropped from the analysis because it is too confounded with the other variables.
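    Outside Minitab, the VIF check can be sketched with statsmodels; the 0/1 X columns below are simulated placeholders, not the poster's data:

    ```python
    # Hedged sketch: VIF check on a matrix of 0/1 predictors using statsmodels.
    # The data frame is simulated; substitute the real X columns.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools import add_constant

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.integers(0, 2, size=(26, 3)), columns=["X1", "X2", "X3"])

    X = add_constant(df.astype(float))  # include an intercept so VIFs are meaningful
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    for col, v in vifs.items():
        print(col, round(v, 2))
    ```

    Values above the rule-of-thumb cutoff of 10 would flag a variable as too confounded with the others to keep.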

    The other thing that could occur is you will get a note that you have quasi-separation and the fit of the model is questionable. What that means is that even though your variables pass the VIF check you have a situation where there is almost a perfect match between the 0/1 values of one or more X’s and the corresponding 0/1 values of the Y (fail/pass).

    If you get this warning you will have to add your variables to the logistic regression one term at a time. When you get the message, drop that variable and add the next variable in the list. Repeat this procedure until you have a matrix of X’s that does not result in quasi-separation. Then run a backward elimination regression and generate your reduced model (the model containing only those terms which remain statistically significant; p < .05 is the usual choice).

    Since you want to predict success the reference for the Y will be 0 (failure) and you will be predicting the odds of success given a unit change in a given X. The references for the X’s will also be 0 so the odds ratios will describe the odds of success given that a particular X has occurred relative to its non-occurrence.

    Some of your odds ratios may be < 1 in which case you have a situation where the non-occurrence of a given X leads to a greater chance of success. Some people have a hard time dealing with this idea so if you get this situation just change the reference for that particular X to the opposite and you will get the inverse of the odds ratio which will be > 1. Of course you will have to note the change of reference when discussing that particular X in relation to its impact on a Y success.
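    A minimal sketch of this whole approach, assuming one row per delivery line with a 0/1 pass/fail Y and 0/1 occurrence flags for the X's; the data and effect sizes below are simulated, not the poster's:

    ```python
    # Hedged sketch: logistic regression of pass/fail on 0/1 occurrence flags.
    # Simulated data; exponentiating the fitted coefficients gives odds ratios.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    X = rng.integers(0, 2, size=(n, 2)).astype(float)  # X1, X2 occurrence flags
    logit_p = 2.0 - 1.5 * X[:, 0] - 0.5 * X[:, 1]      # invented true effects
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit_p))).astype(float)

    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    odds_ratios = np.exp(fit.params)  # OR < 1: that X lowers the odds of success
    print(odds_ratios.round(3))
    ```

    Here X1 was simulated with a negative effect, so its fitted odds ratio comes out below 1 – the situation described above where non-occurrence of the X favors success.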



    For clarification I attach an Excel file with the data (I should have done it before).

    During Measure we collected data on a daily basis (OTIF% and the sum of occurrences of each error (Xs)).

    Each X is related to delays (nOT) or missing material (nIF); both affect Y – it follows from the OTIF formula. But I prefer to work with only one Y, so let’s stick to OTIF.

    Is there any other statistical tool (except Pareto) to:
    1. Show how the Xs influence Y.
    2. Prove statistical significance for the key Xs.

    Normally I would stick to Pareto, but it won’t show the relation of the Xs to Y (unless there is some other way).



    Robert Butler

    You are right – seeing the data does make a difference. Given what you have, the approach I outlined in my last post won’t work. If we take your data set and check the X’s for independence, they are independent enough to permit the inclusion of all of them in a multivariable model. If you run backward elimination on a model of the form
    Y = fn(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)

    Then the final model is Y = .933 – .0066*X9 where R2 = .47 and P = .0001.

    While the summary statistics sound great, a simple plot of X9 against Y tells a very different story. The significant correlation between X9 and Y is being driven by three data points. If you drop those values where X9 > 10 you will still get a significant correlation (P = .03) but your R2 drops to .19.

    If you plot Y against all of the X’s and run simple univariate regressions of Y against each X you will get significant correlation between Y and X8,X9, and X10. However, when you build a multivariable model with the three terms only X9 remains significant.

    In short, for the data you have gathered there isn’t much of any kind of a relationship between the various X’s and the Y.
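    The pattern described here – X's that look significant one at a time but turn out redundant jointly – is easy to reproduce when the X's are correlated. In the simulated sketch below (all names and effect sizes invented), two X's are noisy copies of a third, so only one carries independent information:

    ```python
    # Hedged sketch: univariately significant but jointly redundant predictors.
    # x8 and x10 are noisy copies of x9; only x9 carries independent information.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 26
    x9 = rng.normal(size=n)
    x8 = x9 + 0.3 * rng.normal(size=n)
    x10 = x9 + 0.3 * rng.normal(size=n)
    y = 0.93 - 0.066 * x9 + 0.02 * rng.normal(size=n)

    uni_p = {}
    for name, x in [("X8", x8), ("X9", x9), ("X10", x10)]:
        uni_p[name] = sm.OLS(y, sm.add_constant(x)).fit().pvalues[1]
        print(f"{name} univariate p = {uni_p[name]:.4g}")

    joint = sm.OLS(y, sm.add_constant(np.column_stack([x8, x9, x10]))).fit()
    print("joint p-values:", np.round(joint.pvalues[1:], 4))
    ```

    All three pass a univariate screen, but in the joint fit the correlated copies add little information, which is why their p-values inflate.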



    Thank you Robert, I need a few days to come up with new data and Xs.




    I got stuck in Analyse, trying to find a way to test the right hypothesis in the right way.
    My challenges are:

    1. Show the relation between the potential Xs and Y.
    2. Hypothesis verification

    OTIF(%) = Lines delivered (on time in full)/ Lines ordered *100%
    All potential Xs are problematic lines (errors), and according to the OTIF formula they reduce OTIF%.

    The right test:
    Y% is discrete, and the potential Xs (numbers of lines with a specific error) are discrete.
    With both Y and the Xs discrete, my options seem limited to the chi-square test.

    Chi-square test:
    I collected the data for the chi-square test in Excel (in the attachment).
    There is:
    1. Y – stands for the OTIF level
    2. OTIF described as categories A, B and C
    OTIF Mark 2 levels:
    C <0.8944
    B 0.8945–0.9159
    A >0.9160

    3. Ordered lines
    4. Problematic lines (sum of all potential Xs)
    5. List of the 9 potential Xs.
    I ran the test in Minitab and the results are below:
    Chi-Square Test for Association: OTIF Mark 2, Worksheet columns

    Rows: OTIF Mark 2 Columns: Worksheet columns

    X1 X2 X3 X4 X7 X8 X9 All

    C 27 52 158 52 5 32 52 378
    18.13 65.72 155.91 31.50 17.90 34.90 53.94
    2.428 -2.117 0.248 4.338 -3.553 -0.586 -0.324
    4.340 2.864 0.028 13.341 9.299 0.241 0.069

    B 19 72 187 24 26 29 76 433
    20.77 75.28 178.60 36.08 20.51 39.98 61.78
    -0.462 -0.484 0.953 -2.442 1.444 -2.118 2.270
    0.150 0.143 0.395 4.046 1.471 3.014 3.272

    A 34 166 343 63 48 93 110 857
    41.10 149.00 353.49 71.42 40.59 79.12 122.28
    -1.628 2.198 -1.044 -1.492 1.709 2.348 -1.720
    1.228 1.940 0.311 0.992 1.353 2.434 1.234

    All 80 290 688 139 79 154 238 1668

    Cell Contents: Count
    Expected count
    Adjusted residual
    Contribution to Chi-square

    Pearson Chi-Square = 52.165, DF = 12, P-Value = 0.000
    Likelihood Ratio Chi-Square = 53.780, DF = 12, P-Value = 0.000
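    As a cross-check, the Pearson statistic above can be reproduced from the raw counts with SciPy:

    ```python
    # Re-running the chi-square test of association on the 3x7 table of counts.
    # Rows are OTIF categories C, B, A; columns are X1, X2, X3, X4, X7, X8, X9.
    from scipy.stats import chi2_contingency

    table = [
        [27, 52, 158, 52, 5, 32, 52],     # C
        [19, 72, 187, 24, 26, 29, 76],    # B
        [34, 166, 343, 63, 48, 93, 110],  # A
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    print(round(chi2, 3), dof, round(p, 4))
    ```

    This gives the same Pearson chi-square (≈52.165) and 12 degrees of freedom as the Minitab output.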

    After running the test I came up with a few worries:

    1. First of all, did the test run correctly?

    2. Only Y and the potential Xs were taken into the test. Some samples with more Xs (errors) might have a higher OTIF than samples with fewer Xs, due to a larger number of ordered lines (this was not taken into the test).
    Is there any solution?

    3. Having 3 categories for OTIF (A, B and C), there are 16 occurrences of A, 6 of B and 3 of C. Due to this disproportion, category A (highest OTIF) holds 53% of all problematic lines (errors).
    Can the chi-square test cope with that?

    If you have any other suggestion on:
    1. How to show the relation between the potential Xs and Y.
    2. Hypothesis verification for this case.
    Please let me know.



    Robert Butler

    No, your options are not limited to the chi-square test. I realize there is boilerplate out there which insists that percentage measures must be viewed as discrete but, in fact, percentage values can be treated as continuous measures and you can run regressions between the Y and the X’s. By binning your Y values into A,B,C you are throwing away lots of valuable information.

    Before doing anything else – plot your data – that is plot Y against each individual X and see what you see.

    A check of your matrix indicates the VIFs are within reason. If you run a backward elimination regression of Y against all of the X’s you will get a final model of the form
    Y = fn(X3,X6,X8,X9) where all of the X’s have a p-value < .05. The adjusted R2 is .83 and the plot of the residuals against the predicted values does not reveal anything amiss.

    All 4 of the X’s have negative coefficients which means as they increase your Y value decreases.
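    The backward-elimination procedure applied here can be sketched as follows, treating Y% as continuous; the data are simulated placeholders and the stopping rule is the p < .05 criterion mentioned earlier:

    ```python
    # Hedged sketch: backward elimination, dropping the weakest X until every
    # remaining p-value is below .05. Simulated data; "noise" is an irrelevant X.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 26
    X = pd.DataFrame(rng.poisson(5, size=(n, 4)).astype(float),
                     columns=["X3", "X6", "X8", "X9"])
    X["noise"] = rng.normal(size=n)
    y = 0.95 - 0.005 * X[["X3", "X6", "X8", "X9"]].sum(axis=1) + 0.005 * rng.normal(size=n)

    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        worst = fit.pvalues.drop("const").idxmax()
        if fit.pvalues[worst] < 0.05:
            break
        cols.remove(worst)
    print("retained:", cols)
    ```

    Since the simulated effects are all negative, the retained coefficients come out negative: as those X counts increase, the predicted Y decreases, mirroring the model above.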


    Chris Seider

    May I suggest you gather even more data but…. Consider doing a Pareto chart of reasons for not being OTIF (on time in full). However, if multiple reasons ARE possible, make that dual reason its own category.



    Thank you for the reply and suggestions.
    Test results are in attachment:
    1. Matrix plot – showing relation between X’s and Y
    2. Backward elimination regression
    (H0- there is no correlation between X and Y vs H1 – there is correlation between X and Y)
    Confirmation of key Xs.
    3. Pareto – showing the cumulative value of each X (X3, X6, X2, X5 and X9 cause 80% of all problematic lines). In this model multiple reasons are not possible (a line may have only one category of error).
    What we have:
    – Graphical tool showing relation between X and Y
    – Hypothesis verification
    1. Regression requires both variables to be continuous.
    The X’s were counts of error occurrences (problematic lines) – shouldn’t they be discrete?
    I understand that Y in % might be treated as continuous when we have a lot of data.

    2. What type of residual analysis should be run before starting the regression, if any? And what if the p-value is < 0.05?
    During the next few days I will gather some more data.


    Robert Butler

    Question #1:

    Short answer: no, it doesn’t. The statement that regression requires both the X and the Y to be continuous is one of the boilerplate “rules” which are provided during six sigma training. None of the boilerplate rules are wrong but all of them are overly restrictive. The reason they are overly restrictive is to make sure that someone with little or no training in statistics will not do something really stupid and make potentially catastrophic mistakes.

    In the six sigma courses I’ve attended/been party to the instructor has made it a point to emphasize the restrictive nature of these rules and that as a practitioner you need to remember this, question it, and take the time to find out what you really can and cannot do with respect to regression or anything else.

    A check of any good book on linear regression will show you that you can run regressions where Y is continuous, discrete, ordinal, categorical, etc. The same holds true for the X’s. If you have nominal values for the X’s you will need to employ the methods of dummy variables in order to include the nominal variable in your analysis and if you have nominal values for the Y then you will need to consider multinomial regression methods.

    As for your understanding about percentages – again – overly restrictive but you need to know why and you need to know what you have to take into consideration with small data sets.

    Question #2:

    Short answer -none. Residuals are what you get after you have run the regression. They are the difference between the actual measures and their corresponding predicted values.
    As for residual tests – you can run them if you want but what you really need to do is plot them against the predicted values, the various X’s, on a normal probability plot, and, if you have a time order, against time, and see what you can see. Regression analysis is first and foremost a matter of graphical assessment and the better books on regression will devote one or more chapters to this issue.
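    Those graphical checks can be sketched like this; the data and model are invented placeholders, and the plots are written to a file rather than shown interactively:

    ```python
    # Hedged sketch of the graphical residual checks: residuals vs fitted values,
    # residuals vs each X, and a normal probability plot. Data and model invented.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    x = rng.poisson(5, size=26).astype(float)
    y = 0.95 - 0.005 * x + 0.005 * rng.normal(size=26)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    resid, fitted = fit.resid, fit.fittedvalues

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].scatter(fitted, resid); axes[0].set_title("residuals vs fitted")
    axes[1].scatter(x, resid);      axes[1].set_title("residuals vs X")
    stats.probplot(resid, plot=axes[2])  # normal probability plot
    fig.savefig("residuals.png")
    ```

    A patternless cloud in the first two panels and a roughly straight probability plot are what "nothing amiss" looks like.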

    If you need a reference I’d recommend borrowing a copy of Applied Regression Analysis by Draper and Smith and a copy of Regression Analysis by Example by Chatterjee and Price from your library (you can most likely get these through inter-library loan).


    Robert Butler

    I looked over your regression output. I see you put a cutpoint of .1, which means you get to add X4 to the list. I’d recommend re-running the analysis with a cutpoint of .05 and seeing what you get. After running it both ways, put both through the wringer with respect to residual analysis and see if there is any real difference between the two. Given what I saw with respect to the residuals for the four-variable model, I doubt that relaxing the selection criteria will give you much of anything more with respect to a predictive model.


    Chris Seider

    OTIF is actually a melded metric. I’m well aware of it, but you may glean more info if you look at % full vs the X’s separately from On Time vs the X’s.

    From your Pareto chart, assuming the data for OTIF are CORRECT (which isn’t an easy metric to calculate, depending on the supporting systems), it seems you should look at the top 4 and drill down more.




    The project team has followed your advice and drilled down into the main contributor, X1 – inventory discrepancies (responsible for 30.4% of Y according to the Pareto).
    This changes the scope of the project in Define, but the data collected in Measure on OTIF and all possible Xs are still in use.
    The team created a new Ishikawa diagram with all possible root causes of inventory discrepancies at its head. Then, using a funnel, the 3 most probable Xs were selected for 3 hypothesis tests.

    1. Weight indications differ between scales (small material is weighed and the software converts grams into pieces).

    Test conditions: 13 scales were tested with 8 trials each (by different operators).
    Data type: Continuous

    The data were checked with a probability plot and a test of equal variances; both p-values were < 0.05, so we used a non-parametric test.

    In this case we used the Kruskal-Wallis test. The p-value was < 0.05, so Ha – the medians differ among the scales.
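    A sketch of this Kruskal-Wallis comparison (13 scales × 8 readings each) with simulated readings; one scale is deliberately biased so the test has a real difference to detect:

    ```python
    # Hedged sketch: Kruskal-Wallis across 13 scales, 8 readings each.
    # Readings are simulated placeholders, not the project's measurements.
    import numpy as np
    from scipy.stats import kruskal

    rng = np.random.default_rng(5)
    groups = [rng.normal(loc=100.0, scale=0.5, size=8) for _ in range(12)]
    groups.append(rng.normal(loc=103.0, scale=0.5, size=8))  # biased scale

    stat, p = kruskal(*groups)
    print(f"H = {stat:.2f}, p = {p:.4f}")
    ```

    With 13 groups the statistic is referred to a chi-square distribution with 12 degrees of freedom; a small p-value, as here, supports Ha that at least one scale's median differs.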

    2. Discrepancies in the amount of material provided from Production to Warehouse are less than 0.5% of all delivered lines.
    Test conditions: 3,650 lines delivered from Production to Warehouse were checked.
    Data type: Discrete

    In this case we used a one-proportion test; we found 14 discrepancies. The p-value (0.870) is > 0.05, so H0 – discrepancies in material delivered from Production are below 0.5%.

    First question: should the data distribution be checked before a one-proportion test?
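    For what it's worth, a p-value of 0.870 with 14 discrepancies in 3,650 lines is approximately what an exact binomial test gives when the hypothesized proportion is 0.5% with a "greater than" alternative – so the setup below is an inference about the Minitab configuration, not something stated in the post:

    ```python
    # Hedged sketch: exact one-proportion (binomial) test with SciPy.
    # p=0.005 and alternative="greater" are assumptions about the Minitab setup.
    from scipy.stats import binomtest

    res = binomtest(k=14, n=3650, p=0.005, alternative="greater")
    print(f"observed proportion = {14 / 3650:.4%}, p-value = {res.pvalue:.3f}")
    ```

    Failing to reject here is consistent with the discrepancy rate (about 0.38% observed) being at or below the 0.5% threshold.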

    3. Discrepancies among material groups are the same.
    Test conditions: 8 material groups (columns) and 2 possible results, correct or with discrepancies (rows).
    Data type: Discrete

    In this case we used a chi-square test. The p-value (0.653) is > 0.05, so H0 – the discrepancy rate is the same among the material groups.
    Second question: should the data distribution be checked before a chi-square test?

    Looking forward to your reply.


    Robert Butler

    For #1 You will need to provide more details. As it reads it sounds like you took the same 13 things and had a group of operators test these things 8 times. If this is correct then there is a lot more you will need to do but before we go there please provide more details of the test set up.

    For #2 and #3 it sounds like you are just running a yes/no defect count – that is if a lot has any number of defects it is classed as 1 otherwise it is classed as 0. Is this the case?
    If so, then all you are running is simple proportions tests and the issue concerning distributions has already been answered – the distribution of your measures would be binary so there’s nothing to check.


    Chris Seider

    My compliments that you’re starting to gather data.



    Thank you for the inputs.
    #1 I provide the test results in the attachment (with a test description). Here I got a little confused, as testing the scales is similar to Gage R&R in Measure, but the scales are a possible X (Ishikawa), which should be verified with a proper test in Analyse.
    Now I am not sure if Kruskal-Wallis is the best option for comparing the scales (non-normal data with no outliers).

    In #3 the groups are compared with each other using a chi-square test. When do we need to check the distribution before running a chi-square test?


    Rolly Sotelo

    I need to see your Minitab-formatted data for the X’s and Y.




    Your discussions gave me some clarity. Kindly share the project on absenteeism reduction on the shop floor – I have the last 6 months of data. Kindly guide me.

