# Hypothesis Testing – Levene’s Test or Mood’s Test?


- This topic has 11 replies, 2 voices, and was last updated 10 months ago by Robert Butler.

May 25, 2019 at 1:19 am #239403
Hi All,

I need some insight on a problem I am trying to solve statistically. I am quite new to statistics (barely 2 months now).

Background: I have been given a task to solve which involves investigating the report production cycle time with the aim of identifying the special causes of variation in order to reduce them. I have mapped out the process and collected data for 2 main activities within the process for analysis – the cycle times for ‘draft report’ and ‘cleanse data’ (with information on report type, department, error type, etc.). I did a multi-vari analysis which indicates that a large family of variations are from the ‘cleanse data’ activity. So I have decided to look for my variable Xs in the cleanse data activity.

1st Hypothesis Testing: I have developed my statistical question as: I want to know with 95% confidence if the cycle time for the cleanse task process is influenced by the source department that prepared the original data file, with H0: µD1 = µD2 = µD3 = µD4 = µD5 and HA: At least one department is different from the others.

Tests: Based on my A-D normality test done on the cycle time data, I concluded it was not normal and proceeded to test for variance instead of mean by using Levene’s test since I have more than 3 samples (i.e. 5 departments).

Question: Does my logic work and is this the right test to use? Should I be checking for centrality or variance?

Results from A-D test. Descriptive statistics for Time (Days):

| Statistic | Value |
|---|---|
| Count | 126 |
| Mean | 4.444 |
| StDev | 2.240 |
| Range | 8.224 |
| Minimum | 0.205645 |
| 25th Percentile (Q1) | 2.281 |
| 50th Percentile (Median) | 4.880 |
| 75th Percentile (Q3) | 6.230 |
| Maximum | 8.430 |
| 95.0% CI Mean | 4.0486 to 4.8385 |
| 95.0% CI Sigma | 1.9935 to 2.5568 |
| Anderson-Darling Normality Test | 2.391 |
| P-Value (A-D Test) | 0.0000 |
| Skewness | -0.218495 |
| P-Value (Skewness) | 0.3024 |
| Kurtosis | -1.191 |
| P-Value (Kurtosis) | 0.00000 |

May 25, 2019 at 8:39 am #239405

Robert Butler, Participant (@rbutler)

If your post is an accurate description of your approach to the problem then what you are doing is wrong and you need to stop and take the time to actually analyze your data in a proper statistical manner.

The first rule in data analysis is plot your data. If you don’t know what your data looks like then you have no idea what any of the tests you are running are telling you.

So, from the top, I’d recommend doing the following:

You said, “I have … collected data for 2 main activities within the process for analysis – the cycle times for ‘draft report’ and ‘cleanse data’ (with information on report type, department, error type, etc.). I did a multi-vari analysis which indicates that a large family of variations are from the ‘cleanse data’ activity.”

Run the following plots:

First – side by side boxplots of cycle times (be sure to select the option that plots the raw data and not just the boxplots themselves) split by activity. Do the same thing with overlaid histograms of cycle times by activity type.

What this buys you:

1. An assurance that the “large family of variations from the cleanse data” really is a large family of variation and not something that is due to one or more of the following: a small group of influential data points; multi-modalities within cleanse data and/or within draft report whose overall effect is to give an impression of large variation within one group when, in fact, there really isn’t; some kind of clustering where one cluster is removed from the rest of the data; etc.

Second – side by side boxplots within each of the two main categories for cycle time where the boxplots are a function of all of the things you listed (“report type, department, error type, etc.”). What you want to do here is have boxplots on the same graph that are plotted as (draft report (DR) report type A, cleanse data (CD) report type A), (DR report type B, CD report type B), etc.

What this buys you:

1. An excellent understanding of how the means, medians, and standard deviations of reports, department, error type, etc. are changing as a function of DR and CD. I am certain you will find a lot of commonality with respect to means, medians, and standard deviations between DR and CD for various reports, departments, etc. and you will also see where there are major differences between DR and CD for certain reports, departments, etc.

2. The boxplots will very likely highlight additional issues such as data clustering and extreme data points within and between DR and CD. With this information you will be in a position to do additional checks on your data (why clustering here and not there, what is the story with the extreme data points, etc.)

3. If the raw data on any given boxplot looks like it is clustered – run a histogram and see what the distribution looks like – if it is multimodal then you will have to review the data and figure out why that is so.
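As a minimal sketch of what the boxplots summarize, the snippet below computes the quartiles and the usual 1.5 × IQR whisker fences for each group and lists the raw points a boxplot would flag as extreme. The department labels and cycle times are hypothetical placeholders, not the poster's data, and the "inclusive" quartile method is an assumption (different packages use slightly different quartile rules, so the fences may not match a given plotting package exactly).

```python
import statistics

def box_summary(values, whisker=1.5):
    """Quartiles, whisker fences, and the points a boxplot would flag."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    flagged = [x for x in data if x < low_fence or x > high_fence]
    return {
        "n": len(data), "q1": q1, "median": median, "q3": q3,
        "low_fence": low_fence, "high_fence": high_fence, "flagged": flagged,
    }

# Hypothetical cycle times in days; these are NOT the poster's data.
cycle_times = {
    "Dept A": [0.4, 4.9, 5.2, 5.8, 6.1, 6.3, 6.8, 7.1],
    "Dept B": [0.9, 1.3, 2.2, 3.5, 6.2, 6.9, 7.4, 8.2],
}

for dept, times in cycle_times.items():
    summary = box_summary(times)
    print(dept, summary)
```

A numeric summary like this does not replace the plot: two groups with similar quartiles can still differ in clustering or modality, which is why the raw points should be drawn on top of the boxes.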

Third – multivar analysis – I have no idea what default settings you used for running this test but I would guess it was just a simple linear fit. That is a huge and totally unwarranted assumption. So, again – plot the data. This time you will want to run matrix plots. You will want to choose the simple straight line fit command when you do these. The matrix plots will allow you to actually see (or not) if there is trending between one X variable and cycle time and with the inclusion of a linear fit you will also be able to see a) what is driving the fit and b) if a linear fit is the wrong choice.

1. You will also want to run a matrix plot of all of the X’s of interest against one another. This will give you a visual summary of correlations between your X variables. If there is correlation (which, given the nature of your data, there will be) then that means you will have to remember when you run a regression that your variables are not really independent of one another within the block of data you are using. This means you will have to at least check your variables for collinearity using Variance Inflation Factors (eigenvalues and condition indices would be better but these things are not available in most of the commercial programs – they are, however, available in R).
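To make the Variance Inflation Factor check concrete, here is a small sketch that uses the identity that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. It assumes the X variables are numeric columns (categorical variables such as department would first need dummy coding, which is not shown), and the three columns are made-up illustration data.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical numeric predictors (illustration only).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]   # tracks x1 closely, so its VIF is inflated
x3 = [5, 3, 6, 2, 7, 1, 8, 4]
print(vifs([x1, x2, x3]))
```

A VIF near 1 means a predictor is close to uncorrelated with the others; how large a value counts as a problem is a judgment call that depends on the analysis.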

Once you have done all of the above you will have an excellent understanding of what your data looks like and what that means with respect to any kind of analysis you want to run.

For example – You said “I want to know with 95% confidence if the cycle time for cleanse task process is influenced by the source department that prepared the original data file with H0: µD1 = µD2 = µD3 = µD4 = µD5 and HA: At least one department is different from the others. Tests: Based on my A-D normality test done on the cycle time data, I concluded it was not normal and proceeded to test for variance instead of mean by using Levene’s test since I have more than 3 samples (i.e. 5 departments)”

Your conclusion that non-normality precludes a test of department means and thus forces you to consider only department variances is incorrect.

1. The boxplots and histograms will have told you if you need to adjust the data (remove extreme points, take into account multimodality, etc.) before testing the means.

2. The histograms of the five departments will have shown you just how extreme the cycle time distributions are for each of the 5 departments. My guess is that they will be asymmetric and have a tail on the right side of the distribution. All the Anderson-Darling test is telling you is that your distributions have tails that don’t approximate the tails of a normal distribution – big deal.

Unfortunately, there are a lot of textbooks, blog sites, internet commentary, and published papers which insist ANOVA needs to have data with a normal distribution. This is incorrect – like the t-test, ANOVA can deal with non-normal data – see The Design and Analysis of Industrial Experiments, 2nd Edition – Davies, pp. 51-56. The big issue with ANOVA is variance homogeneity. If the variances across the means are heterogeneous then what might happen (this will be a function of just how extreme the heterogeneity is) is that ANOVA will declare non-significance in the differences of the means when, in fact, there is a difference. Since you have your graphs you will be able to decide for yourself if this is an issue. If heterogeneity is present and if you are really being pressed to make a statement concerning mean differences you would want to run Welch’s test.
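For readers who want to see what Welch's test computes, the sketch below implements the standard Welch one-way ANOVA statistic, which weights each group mean by n/s² so that groups with large variance count for less. It returns the F statistic and its two degrees of freedom; the p-value would then come from an F distribution in whatever statistics package is at hand. The three groups are hypothetical, and the function name `welch_anova` is just a label chosen for this example.

```python
import statistics

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's unequal-variance one-way ANOVA."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]

    # Precision weights: large, low-variance groups get more weight.
    w = [ni / vi for ni, vi in zip(n, variances)]
    w_total = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_total

    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2

# Hypothetical cycle times (days) for three departments.
dept_times = [
    [4.1, 4.4, 4.6, 4.9, 5.0, 5.2],
    [1.0, 2.5, 4.0, 6.5, 7.9, 8.3],
    [5.8, 6.0, 6.1, 6.4, 6.6],
]
f_stat, df1, df2 = welch_anova(dept_times)
print(f"Welch F = {f_stat:.3f} on ({df1}, {df2:.1f}) degrees of freedom")
```

With exactly two groups this reduces to the square of Welch's t statistic, which is a convenient sanity check on the formula.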

On the other hand, if you do have a situation of variance heterogeneity across departments, what you should do first is forget about looking for differences in means and try to figure out why the variation in cycle time in one or more departments is significantly worse than the rest and since you have your graphs you will know where to look. I guarantee if heterogeneity of the variances is present and if you can resolve this issue the impact on process improvement will be far greater than trying to shift a few means from one level to another.
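If a formal check of variance heterogeneity is wanted alongside the graphs, one common choice is the median-centered version of Levene's test (often called Brown-Forsythe). The sketch below computes its W statistic from first principles; under the null hypothesis W is referred to an F distribution with (k − 1, N − k) degrees of freedom. The data are hypothetical. If SciPy is installed, `scipy.stats.levene(..., center="median")` computes the same statistic together with a p-value.

```python
import statistics

def brown_forsythe_w(groups):
    """Median-centered Levene statistic W with its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviations of each observation from its own group median.
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(x - med) for x in g])

    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)

    w_stat = (n_total - k) / (k - 1) * between / within
    return w_stat, k - 1, n_total - k

# Hypothetical cycle times (days): the second group is far more spread out.
groups = [
    [4.1, 4.4, 4.6, 4.9, 5.0, 5.2],
    [1.0, 2.5, 4.0, 6.5, 7.9, 8.3],
    [5.8, 6.0, 6.1, 6.4, 6.6],
]
w_stat, df1, df2 = brown_forsythe_w(groups)
print(f"W = {w_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

The statistic only says that spreads differ; the plots are still what show which department is responsible and whether a few extreme points are driving it.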

May 26, 2019 at 4:26 am #239417

Hi Robert,

Thank you so much for this feedback. It is extremely informative and very detailed. I shouldn’t have rushed into the hypothesis testing without really understanding my data files and distribution. I have been reading up on the use and interpretation of box plots.

Just to shed more light on the project deliverable, the objective is to identify the critical input variables within a report production process and also verify that these variables have a statistically significant impact on overall report production time.

However, the data available for me to conduct this analysis is the following:

1) The DR data file – this data file just has cycle time (days) and 376 data points. Thus I can’t do much analysis with it in terms of plotting the variables.

2) The CD data file – this data file has in addition to the cycle time, a breakdown of variables like department type, data type, error type and report type. It has a count of 126.

Starting from the top as you suggested by analyzing my data, I plotted box plots, histograms and multi-vari charts for both DR and CD data files in a bid to analyse both cycle times. Objective: To decide which data to focus on or identify the data file with the dominant source of variation (i.e. where my Xs are). I have attached my findings and interpretations. Would this be a good first step? If yes, how should I proceed in analyzing my CD data file by conducting tests to prove which X has a statistically significant impact on the report production time (which is made up of the DR and CD cycle time)? I have attached my findings with graphical representations for CD and DR cycle time.

###### Attachments:

- Box-Plot_Histogram_Multi-Vari-Chart-DR-and-CD.docx

May 26, 2019 at 10:27 am #239424

Robert Butler, Participant (@rbutler)

The histograms and the boxplots suggest some interesting possibilities. Based on what you have posted I’m assuming your computer package will not let you have a boxplot which includes the raw data. On the other hand, if there is a sub-command to allow this you really need to have a plot that shows both.

If my understanding of your process is correct, namely that the process flow is Draft Report and then Cleanse Data, the histograms suggest a couple of possibilities:

1. The Draft Report (DR) phase is completely independent of the Cleanse Data (CD) phase

2. The folks in the DR phase are ignoring all of the problems with respect to data and are just dumping everything on the CD group.

Of course, if it is the other way around – first you need to cleanse the data and then you draft a report – then what you are looking at is the fact that, overall, the CD group did their job well and DR gets all of the benefits of their work and kudos for moving so quickly.

My take on the histograms and the comment on the Multivar analysis is that for all intents and purposes both DR and CD have reasonably symmetric distributions. What I see is a very broad distribution for CD. I don’t see where there is any kind of skew to the left worth discussing.

To me the boxplots indicate two things:

1. Production Planning has the largest variation in cycle time

2. Production Department has the highest mean and median times and appears to have some extreme data points on the low end of the cycle time range.

My next step would be the following:

1. If you can re-run the boxplots with the raw data do so and see what you see.

2. Plot the data for all 5 of the CD departments on a normal probability paper and see what I can see. These plots will help identify things like multimodality.

3. The boxplots suggest to me that the first thing to do is look at the Production Department and Production Planning.

4. Go back to the Production Department data and identify the data associated with the extremely low cycle times and see if there is any way to contrast those data points with what appears to be the bulk of the data from the Production Department.

a. Remove those extreme data points and re-run the boxplot comparison across departments just to see what you see.

5. Take a hard look at what both the Production Department and Production Planning do – are they stand alone (I doubt this)? Do they depend on output from other departments before they can begin their work? If they do require data from other departments what is it and what is the quality of that incoming work? Do they (in particular the Production Department) have to do a lot of rework because of mistakes from other departments? etc.

The reason for the above statements is that the boxplots suggest those two departments have the biggest issues, the first because of greater cycle time and the second because of the broad range. The other thing to keep in mind is that Production Planning has cycle times all over the map – why? This goes back to what I said in my first post – if you can figure out why Production Planning has such a spread you may be able to find the causes and eliminate them. There is, of course, the possibility that if you do this for the one department what you find may carry over to the other departments as well.

May 26, 2019 at 3:04 pm #239425

Hi Robert,

Thank you for your feedback and taking the time to review this with me.

Yes, you are indeed correct, it’s the other way around as you described. The process starts with the CD activity which ensures the data is cleansed before proceeding to the DR activity so CD is doing a great job and laying the foundation for a fast DR task. But the CD task is consuming a lot of precious time in the overall process. For CD task, in addition to the low p-value, I can see what you mean that it has a wide distribution and I would assume having a wider spread would mean a wider variation in terms of cycle time (I guess I was looking at the p-value because I know that would determine the kind of test I run eventually)

To answer your question, I did plot all the box plots with the raw data (raw data I would presume would be my data points?)

In plotting, I selected the CD cycle time as my numeric data variable ‘Y’ and the specific category of X (report type or department type or data type or error type).

The CD data file has information on the CD cycle time (which I have classified as continuous variable data because of the way the data was recorded, e.g. 0.87 days, 1.3 days, 8.22 days). This CD data also provides me with additional information as to what report is being requested, which department is sending the data, whether the data is dirty, and what kind of error is causing the dirtiness.

Conducting Steps 1 – 3 in your feedback above:

Plotting the histogram and normal probability plot confirms your statements that production planning has the largest variation in terms of cycle time because of the StDev and the shape of the normal plot. The histogram confirms the production department having the highest range too. The histogram also displays multimodality, especially for the production planning and finance departments, with others exhibiting bimodality excluding the production department.

So I want to ask, how can I also statistically explore the linkage of department type with the other variables? When I look at the CD data as a whole, what I see is cycle time for cleaning different types of report, with information on the department that collected the data, the kind of error the data has and the type of report it is being used for. It is possible that the production department has such a large variation in cycle time because the data files from them have specific errors that are taking longer to analyse and clean.

Questions: How can I explore these relationships? Is it possible to use a single box plot or multi-vari chart to show such possible problem linkages/relationships?

Also, in conducting these investigations, though the charts are very informative visually, I am required to prove it using hypothesis testing, which was why I was trying to develop root cause hypothesis questions that would help me prove the X with significant impact. By testing hypothesis questions like: Q1. Does the type of report being produced have a significant impact on the cleansing time? Q2. Is the CD activity significantly influenced by the department that collected the data? Q3. Is the CD activity significantly influenced by the different data error types? etc. And possibly try a combination of 3 variable linkage hypothesis question like Q4. Is the influence on cycle time by data collected by production planning dept with aggregate errors the same as the data with duplicate errors from the same department?

Thank you so much!

###### Attachments:

- Variables_Norm-Prob_-Histograms.docx

May 26, 2019 at 6:08 pm #239427

Robert Butler, Participant (@rbutler)

A few general comments/clarifications before we proceed

1. You said “I guess I was looking at the p-value because I know that would determine the kind of test I run eventually.” I assume this statement is based on something you read – whoever offered it is wrong – p-values have nothing to do with test choices.

2. When I made the comment about the boxplots and raw data what I meant was the actual graphical display – what you have are just the boxplots. Most plotting packages will allow you to show on the graph both the boxplot and the raw data used to generate the boxplots and it was this kind of graphical presentation I was referencing.

3. Cycle time is indeed continuous so no problems there and your departments/report types etc. are the categories you want to use to parse the cycle times to look for connections between category classification and cycle time.

Steps 1-3

Overall Data Type

Your first boxplot of data types – no surprises there – however keep this plot and be prepared to use it in your report – it provides a quick visual justification for focusing on dirty data and not wasting time with DR data

Error Type

The graph says the two big time killers are, in order, duplicated data (median 7.1) and aggregated data (mean and median both look to be about 4).

Report Type

The biggest consumers of time, in order, are Prod Volume (median 5.995), Balances (median 5.79) and Demand Volume (median 4.7). So from a strictly univariate perspective of time these would be the first place to look. However, this approach assumes equality with respect to report volumes. If there are large discrepancies in the report output count then these might not be the big hitters. What you should do is build a double Y graph with the boxplot categories on the X axis, the cycle time on the left side Y axis (as you have it now) and report volume on the right-hand Y axis. This will give you a clear understanding of both time expended and quantity. If you don’t have a double Y plotting capability, rank order the boxplots in order of number of generated reports and insert the count either above or below the corresponding boxplot. Knowing both the median time required and the report volume will identify the major contributors to increased time as a function of report type.

Department Type

Same plot as before and I have nothing more to add as far as this single plot is concerned (more on this later).
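Returning briefly to the report-type ranking suggested under Report Type: when a double Y plot is not available, the same information can be tabulated directly. The sketch below groups cycle times by category and reports count, median time, and total time, sorted by count. The report-type names come from this thread, but the individual records are invented for illustration.

```python
import statistics
from collections import defaultdict

def summarize_by_category(records):
    """Return (category, count, median_days, total_days) rows, largest count first."""
    grouped = defaultdict(list)
    for category, days in records:
        grouped[category].append(days)
    rows = [
        (category, len(values), statistics.median(values), sum(values))
        for category, values in grouped.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical (report_type, cycle_time_days) records; NOT the poster's data.
records = [
    ("Prod Volume", 6.1), ("Prod Volume", 5.9), ("Prod Volume", 7.2),
    ("Prod Volume", 4.8), ("Balances", 5.8), ("Balances", 6.0),
    ("Demand Volume", 4.5), ("Demand Volume", 4.7), ("Demand Volume", 4.9),
]

for category, count, median_days, total_days in summarize_by_category(records):
    print(f"{category:<14} n={count:<3} median={median_days:.2f} total={total_days:.2f}")
```

Total days per category is included because a report type with a modest median but a very high volume can still account for most of the time spent.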

Normality plots

These tell you a lot. With the exception of the Exploration department, all of the other departments have a bimodal distribution of their cycle times. Distribution ranges: Finance: 1-4 and 5-7 days; Prod Plan: <1-4 and 6-8 days; Sales: <1-3 and approx. 4.5-5 days. Prod Department may have a tri-modal distribution: 0-3, 3 to <6, and 6-9 days.
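For anyone building the normal probability plots by hand, the coordinates are simply the sorted data paired with standard normal quantiles at a set of plotting positions. The sketch below uses Blom's positions, (i − 0.375)/(n + 0.25), which is one common convention and an assumption here; packages differ slightly. The helper `largest_gap` is a crude, hypothetical aid for spotting where a break between two clusters might sit, and the data are made up.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pair each sorted observation with its theoretical standard normal quantile."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(data, start=1)
    ]

def largest_gap(values):
    """Return (gap, lower_value, upper_value) for the widest gap in the sorted data."""
    data = sorted(values)
    return max((b - a, a, b) for a, b in zip(data, data[1:]))

# Hypothetical bimodal cycle times (days) for one department.
times = [1.1, 1.8, 2.4, 2.9, 3.3, 3.8, 5.6, 6.0, 6.3, 6.9]

for z, x in normal_plot_points(times):
    print(f"{z:6.2f}  {x:5.2f}")
print("widest gap:", largest_gap(times))
```

Data from a single normal distribution fall close to a straight line on such a plot; two line segments offset from each other, with a jump between them, are what a bimodal distribution looks like.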

Based on your graphs you know the following:

1. For each department you know the data corresponding to the greatest expenditures of time. For Finance it is the 5-7 day data, for Prod Plan it is the 6-8 day data, for sales it is the 4.5-5 day data, and for prod department it is the 6-9 day data.

2. You know the two biggest types of error with respect to time loss.

3. Given your graphical analysis of the report types and volumes you will have identified the top contributors to time loss in this category.

Your first statistical check:

Since you are champing at the bit to run a statistical test here’s your first. Bean count the number of time entries in each of the 4 error types. Note – this is just a count of entries in each category, the actual time is not important.

1. Call the counts from aggregates and duplicate data error group 1 and the counts from categories and details group two. Run a proportions test to make sure that the number of entries in group 1 is greater than the number of entries in error group 2. (This should make whoever is demanding p-values happy.)

Given that there really are more entries in aggregate and duplicate and given that you know from your graphical analysis which of the report types are the big hitters, pareto chart those reports, take the top two or three, and then run your second check by grouping the reports into two categories – the top three (report group 1) and all of the others (report group 2).
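One way to run the first check, comparing the entry counts in error group 1 and error group 2, is an exact one-sided binomial test of whether group 1 holds more than half of the entries. This is a sketch of one reasonable reading of "proportions test", not necessarily the exact procedure intended. The counts used are the ones the original poster reports later in the thread (aggregate 48, duplicate 16, categories 4, details 7).

```python
from math import comb

def binomial_upper_tail(successes, n, p0=0.5):
    """Exact P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
        for i in range(successes, n + 1)
    )

# Counts quoted later in the thread.
group1 = 48 + 16   # aggregate + duplicate
group2 = 4 + 7     # categories + details
total = group1 + group2

# H0: an entry is equally likely to fall in either group (p = 0.5).
p_value = binomial_upper_tail(group1, total)
print(f"group 1 share = {group1 / total:.3f}, one-sided p = {p_value:.3g}")
```

The exact test is used here because it needs no large-sample approximation; with counts this lopsided any sensible proportions test should lead to the same conclusion.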

Now run your second check on proportions – a 2×2 table with error groups vs report groups. This will tell you the distribution of error groups across report groups. You may or may not see a significant difference in proportions.
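A 2×2 comparison of this kind is commonly tested with Pearson's chi-square statistic, which has one degree of freedom for a 2×2 table. The sketch below computes the statistic and its p-value using only the standard library, and also returns the smallest expected cell count because the chi-square approximation becomes unreliable when expected counts are small (a common rule of thumb is at least 5 per cell). The table entries are placeholders, not the poster's counts.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    Returns (chi2, p_value, smallest_expected_count).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, smallest_expected

# Placeholder counts: rows = error group 1 / 2, columns = report group 1 / 2.
table = [[40, 24],
         [5, 6]]
chi2, p_value, smallest_expected = chi_square_2x2(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, min expected = {smallest_expected:.2f}")
```

When the smallest expected count is low, as it may well be for the categories and details errors, Fisher's exact test is the usual alternative.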

Either way – what you now want to do is take the two report groups and the two error groups and look at their proportions for the two regions for Finance, Prod Plan, Sales, and Prod Department. What you are looking for is a differentiation between the short and long times for each department as a function of report type and error type. To appease your bosses you can run proportions tests on these numbers as well.
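The tabulation described here can be sketched as a simple count over the records, keyed by department, time region (short or long), error group, and report group. The error-type and report-type names are taken from this thread; the records themselves and the per-department cut points are hypothetical placeholders that would in practice be read off the normal probability plots.

```python
from collections import Counter

GROUP1_ERRORS = {"Aggregate", "Duplicate"}   # error group 1, as defined in the post

def tabulate(records, cut_days, top_reports):
    """Count records by (department, time region, error group, report group)."""
    counts = Counter()
    for dept, days, error_type, report_type in records:
        region = "long" if days > cut_days[dept] else "short"
        error_group = 1 if error_type in GROUP1_ERRORS else 2
        report_group = 1 if report_type in top_reports else 2
        counts[(dept, region, error_group, report_group)] += 1
    return counts

# Hypothetical records: (department, days, error type, report type).
records = [
    ("Finance", 6.2, "Duplicate", "Balances"),
    ("Finance", 5.8, "Aggregate", "Balances"),
    ("Finance", 2.1, "Details", "Other"),
    ("Sales", 4.7, "Aggregate", "Demand Volume"),
    ("Sales", 1.4, "Categories", "Other"),
]
cut_days = {"Finance": 4.5, "Sales": 3.75}             # placeholder break points
top_reports = {"Prod Volume", "Balances", "Demand Volume"}

for key, count in sorted(tabulate(records, cut_days, top_reports).items()):
    print(key, count)
```

Each department's short and long rows can then be compared directly, or collapsed into 2×2 tables for the proportions tests mentioned above.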

If the error types and the report types are similar across departments for the longer elapsed times then you will know where to focus your efforts with respect to error reduction and report streamlining. If they are quite different across departments then you will have more work to do. If there isn’t a commonality across departments I would recommend summarizing what you have found and reporting it to the powers that be because this kind of finding will probably mean you will have to spend more time understanding the differences between departments and what specific errors/reports are causing problems in one department vs another.

May 27, 2019 at 11:53 am #239435

Hi Robert,

Thank you. Apologies for taking so much of your time, but I am so grateful for the assistance.

You are right, I don’t have that plotting option or double Y plotting capability. So I sorted the data manually and ranked it as you suggested, but unfortunately my box plot does not pick up the ranked file. It defaults to the original setting. I found a workaround. The count tells me that there are issues with the balances report, looking at the quantity (6) versus the time expended.

So far, I am struggling with the 3rd combination, error groups versus report groups. I am not certain I used the right proportion test or that my test result is correct, so I will still work on it.

However, I am confused about the last point on looking at the proportion of error group vs report group vs departments. I am struggling to develop an appropriate hypothesis question. Does this mean I would need to run a Chi Square test with my sub-groups as:

– Error Group 1

– Error Group 2

– Finance

– Report Group 1

– Report Group 2

So Null: Error Grp 1= Error Grp 2 = Finance= Report Grp 1 = Report Grp 2

Interestingly, looking at my data points, I can already see the differentiation between the long and short cycle for this combination: Duplicated Data > Production Volume report > Production Department.

###### Attachments:

- Hypothesis-Testing.docx

May 27, 2019 at 4:59 pm #239440

Robert Butler, Participant (@rbutler)

There’s no need to apologize. You have an interesting problem and I just hope that I will be able to offer some assistance. To that end, I may be the one that needs to apologize. I was re-reading all of your attachments and, in light of what you have in the most recent attachment, I’m concerned that my understanding of your process is incorrect.

My understanding was you had 5 departments in the CD group and that any one of the 5 could produce any one of the 8 types of reports and could expect to deal with any one of the four types of data errors. In reviewing the normal probability plots of the departments it occurred to me that there is a large difference in data points for the probability plot for the production department when compared to all of the others. This, coupled with the large count for Prod Volume reports is the reason for my concern about a dependence between department and report type.

My other assumption was that the counts for the types of error occurrence was “large” which is to say counts in the realm of >25, however, your most recent attachment indicates the boxplots for the types of error are summaries of very small groups of data.

So, before we go any further – are the assumptions I’ve made above correct or are they just so much fantasy and fiction?

In particular:

1. What is the story concerning department and report generation?

2. Do any of the departments depend on output from the other departments to generate their reports?

3. What is the actual count for the 4 error categories?

4. Why is it that production department has so many more data entries on the normal probability plot than any of the other departments?

If I am wrong (and I think I am) please back up and give me a correct description of the relationships between departments, report generation, and types of error a given department can expect to encounter.

One thing to bear in mind, the graphs you have generated are graphs that needed to be generated in order to understand your data so if a change in thinking is needed the graphs will still inform any additional discussion.

May 27, 2019 at 5:34 pm #239441

Hi Robert,

1. Yes, your understanding is correct to a large extent. The CD activity is executed by a special team that prepares reports. The data for the reports is obtained from different departments depending on the type of report being produced. Hence, the departments are the original source of data. Some departments send dirty data, while data from other departments does not require cleansing.

In a bid to help with the investigation, the data file for the CD activity was classified to show which department sent the data file and the type of error they experienced during cleansing.

| Time | Report Type | Dept Type | Data Type | Error Type |
|---|---|---|---|---|
| 4.52 | Demand Volume | Sales | Clean | Details |
| 4.70 | Demand Volume | Sales | Dirty | Details |

So then, these are what I am trying to investigate (the variables, or the cluster of variables, that are leading to the most CD time). I say cluster of variables because the problem might just be that a specific report type is taking a long time to clean because of the errors found within the data received from the department. So I am trying to identify the Xs with the most significant impact on the CD.

2. The special team is the one that cleanses the data to prepare reports. Therefore they get data from the relevant department depending on the report being prepared.

3. Aggregate = 48, duplicate = 16, categories = 4, details = 7.

4. Looking at the data, it’s due to the fact that the production volume report is the most frequent report being produced. Data for this report is received from the production department. Out of this data, 18 were clean while 50 were dirty (with a mix of aggregate and duplicate error types). Aggregate = 34, with the highest time taken to clean being 6.8 days and the lowest 4.9, which explains the wide variation. Duplicate = 16, with the highest time being 8.43 days and the lowest 7.1 days.

May 28, 2019 at 11:22 am #239447

Robert Butler, Participant (@rbutler)

If we couple what you said in your last post with an assessment of the graphs you have generated I think you might have the answer to your question.

As I said previously, the normal probability plots of cycle time days by department is a very important graph. What it shows is that the cycle time data for 4 out of the 5 departments is distinctly bimodal. When you examine the bimodality what you find for Finance, Sales, and Product Plan is a clear break in the distributions of reports with low cycle times and the distribution of the reports with high cycle times. The break occurs around 4-5 days. The Prod Dept, probably by virtue of the large report volume relative to the other 3, does have an obvious bimodal distribution with a break around 5 days. One could argue that the Prod Dept plot is tri-modal with a distribution from 1-3 days, a second from 3-6 days and a third with 6 days or more, however, given the data we have I wouldn’t have a problem with calling the Prod Dept data bimodal with a break between low and high cycle time reports at around 5.

The Exploration Department is unique not only because its report cycle time data is not bimodal but also because it appears that only one report has a cycle time of greater than 5 days.

What the normal probability plots are telling you is whatever is driving long report cycle times is independent of department. This is very important because it says the problem is NOT department specific, rather it is some aspect of the report generating process across or external to the departments.

Now, from your last post you said, “Data for this report is received from the production department. Out of this data, 18 were clean while 50 were dirty…” If I bean count the data points on the normal probability plot for Prod Dept for the times of 5 days or less I get 18. This could be just dumb luck but it bears investigation.

For each department – find the reports connected with dirty data and the reports connected with clean data – data error type does not matter. If the presence of dirty data is predominately associated with reports in the high cycle time distribution (as it certainly seems to be with the Prod Dept data) and if, it turns out that the Exploration Dept data has no connection (or, perhaps a single connection with the 6 day cycle time report) then I think you will have identified a major cause of increased report cycle time.

If this turns out to be true and someone still wants a hypothesis test and a p-value the check would be to take all of the reports for all of the departments, class them as either arising from clean or dirty data, and then class them as either low or high cycle times depending on which of the two distributions within each of the departments they belong to (which is which will depend on the choice of cut point between the two distributions – I would just make it 5 days for everyone – report cycle time <=5 = low and >5 = high). This will give you a 2×2 table with cycle time (low/high) on one side of the table and data dirty/clean on the other. If the driver of the longer cycle times for reports is actually the difference between clean and dirty data then you should see a significant association.
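A minimal sketch of this final check: classify every report as clean or dirty and as low or high cycle time using the 5-day cut point, build the 2×2 table, and test the association. Fisher's exact test is used here as an assumption on my part, since it stays valid when some cells are small; a chi-square test on the same table is the other common choice. The records are invented for illustration.

```python
from math import comb

def build_table(records, cut_days=5.0):
    """Rows: Clean, Dirty. Columns: low (<= cut_days), high (> cut_days)."""
    table = [[0, 0], [0, 0]]
    for days, data_type in records:
        row = 0 if data_type == "Clean" else 1
        col = 0 if days <= cut_days else 1
        table[row][col] += 1
    return table

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

# Hypothetical (cycle_time_days, data_type) records; NOT the poster's data.
records = [
    (1.2, "Clean"), (2.8, "Clean"), (3.5, "Clean"), (4.4, "Clean"), (5.6, "Clean"),
    (4.9, "Dirty"), (5.8, "Dirty"), (6.3, "Dirty"), (6.8, "Dirty"), (7.1, "Dirty"),
    (7.9, "Dirty"), (8.4, "Dirty"),
]
table = build_table(records)
print("table:", table, "two-sided p =", round(fisher_exact_two_sided(table), 4))
```

A small p-value would support an association between dirty data and long cycle times; it does not by itself show which error types or reports are responsible, which is what the earlier plots are for.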

June 7, 2019 at 1:20 am #239652

Hi Robert,

Thank you so much for this detailed feedback and taking the time to respond to my inquiry, really appreciate it.

Sorry for the late response, been off doing exams.

This approach is definitely clear-cut.

I will apply the suggestions above and report back on the outcome.

June 7, 2019 at 5:52 pm #239658

Robert Butler, Participant (@rbutler)

You’re welcome. I hope some of this will help you solve your problem.
