Working with Non-Normal Data


• #47883

Magritzer
Participant

Hi,
I am working on a project to reduce the number of trouble tickets in a specific category. I have a sample of trouble tickets (TTs) per day for a 30-day period. When I plotted a histogram, the data turned out to be non-normal. I transformed the data using a Box-Cox transformation, and the data still turned out to be non-normal (p-value < 0.05). I think this is being caused by an outlier, but I am not sure, and I'm not sure how to analyze the data now. Can anyone offer any suggestions?
Also, what type of control chart would I use?
Thanks

#160180

mcleod
Member

Use a P chart for defectives and a U chart for defects.
Should the data be non-normal? If so, there are a couple of options:
1) Try to fit a distribution using Minitab.
2) Use non-parametric methods for all of your testing.
If there is reason to believe that the data should be normally distributed, then you need to understand why your sample isn’t.
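
As a rough illustration of the U chart mentioned above, the sketch below computes the center line and 3-sigma limits for defects per unit. The ticket counts are hypothetical, not the original poster's data, and the function name `u_chart` is made up for this example. With one "unit" per day, the calculation reduces to a C chart on the raw daily counts.

```python
import math

def u_chart(defects, units):
    """Return (center line, per-subgroup points) for a U chart.

    defects[i] is the defect count in subgroup i; units[i] is the size of
    that subgroup (its area of opportunity). Each point is a tuple
    (rate, lcl, ucl, out_of_control).
    """
    if len(defects) != len(units) or not defects:
        raise ValueError("defects and units must be non-empty and equal length")
    u_bar = sum(defects) / sum(units)  # overall defects per unit
    points = []
    for d, n in zip(defects, units):
        half_width = 3 * math.sqrt(u_bar / n)  # 3-sigma width for this subgroup
        lcl = max(0.0, u_bar - half_width)     # a rate cannot be negative
        ucl = u_bar + half_width
        rate = d / n
        points.append((rate, lcl, ucl, not lcl <= rate <= ucl))
    return u_bar, points

# Hypothetical daily ticket counts (illustration only).
tickets = [12, 9, 15, 11, 30, 10, 8]
center, points = u_chart(tickets, [1] * len(tickets))
flagged_days = [i for i, p in enumerate(points) if p[3]]
print(center, flagged_days)
```

A flagged day is a signal to look for a special cause, not proof that the point should be deleted.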

#160313

hitesh chopra
Participant

Hi,
A very basic question I have is how you were able to plot a histogram for attribute data and then check normality.
Attribute data do not follow a normal distribution; they will follow either a Poisson or a binomial distribution.
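
One quick, informal check of the Poisson idea is the index of dispersion: for Poisson counts the variance is roughly equal to the mean, so the ratio should be near 1. The sketch below uses hypothetical counts and a made-up helper name, `dispersion_index`; it is a screening aid under those assumptions, not a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the index is undefined")
    return variance(counts) / m

# Hypothetical daily ticket counts (illustration only).
counts = [12, 9, 15, 11, 30, 10, 8]
ratio = dispersion_index(counts)
print(round(ratio, 2))
```

A ratio well above 1 suggests extra variation (for example an unusual day or a weekday effect) beyond what a single Poisson rate would produce.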

#160316

Anand
Participant

As your data is the ‘number of tickets’, it is discrete data and will definitely turn out to be non-normal. Treat it as discrete data and use the tools available for that. You don’t need to check normality in this case.

#160317

Prasoon
Participant

How can we say that the number of tickets is discrete data? The number of tickets will be an integer...
I think it is variable data and should not be treated as discrete.
Try removing the outliers from the data and then check for normality. Try to find the special causes for these outliers.
I hope it will work in your case.
Thanks,
Prasoon

#160319

Six Sigma guy
Member

Why are you checking for normality? I assume trouble tickets are like defective tickets?

#160321

Arturo Ruiz Falcó
Participant

Lou:
First of all, what kind of data are in the tickets? I assume you are analyzing continuous data (e.g. length values).
Second, if you are picking only the trouble tickets, you are not taking a random sample of the process. It is likely that you are picking the tails of the parent population. Even if the parent population is normally distributed, you cannot expect such a sample to show normality.
Third, why do you need to transform this data to be normally distributed? It would be justified if you wanted to apply a statistical test or chart that requires normality. Is this the case?  I assume that your real objective is to discover why parts are out of spec and therefore trouble tickets are raised. In that case I would suggest making scatter plots of the final quality variable vs. the process parameters (e.g. CTQ vs. CTP) and then looking for a pattern in the trouble tickets.
I hope it helps. Good luck,
Arturo

#160324

Anand
Participant

Hi Prasoon
What is being measured here is the number of tickets raised, which is a count of the tickets. A count is by definition discrete. To explain further: can there be any day when 10.5 tickets were raised? It will be either 10 or 11, hence it is discrete data.
Anand

#160336

Matthieu
Participant

Lou,
First, do you trust your data?
Have you done an MSA on the data you deal with?
Have you taken into account the “Saturday” and “Sunday”?
Do you have a benchmark to compare your baseline against?
If you suspect an outlier is the cause of the non-normal distribution, find it, understand it, isolate (remove) it from your dataset, and re-run your chart.
This will help you obtain a normal distribution.
This will not give you any indication of why these tickets were created or how to prevent them.
Prasoon, Anand,
The “Discrete Data” definition below can be found at: https://www.isixsigma.com/dictionary/Discrete_Data-226.htm

#160368

Six Sigma BBIT Linda
Member

Hi Lou:
I am also working on a project to reduce ticket count.  My mentor has taught me to do the following:
1.  Collect data
2.  Run the graphical summary in Minitab to determine if the data is normal
3.  If the data is normal, create a control chart (in this case you would use a P chart since your data is discrete)
4.  If your data is not normal, you will need to investigate as the previous messages have stated. If an outlier is causing this and you determine the reason, you could eliminate it and rerun. When you run the graphical summary in Minitab you will be able to identify whether there are outliers: the output gives you a histogram and a boxplot, and you can identify the outliers on the boxplot. Use the paintbrush to highlight the outliers and Minitab will provide you with the row/location of each outlier on your spreadsheet.
5.  Once your data is normal, run the P chart, and if the process is stable you can run process capability using the binomial distribution
Hope this helps—
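
For readers without Minitab, the boxplot outlier check in step 4 can be approximated with the common 1.5 × IQR rule. This is a minimal sketch with hypothetical counts; `find_outliers` is a made-up name, and the quartile method used here (Python's default "exclusive" method) is an assumption that may differ slightly from other packages.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return (index, value) pairs lying beyond k * IQR from the quartiles.

    Indices are 0-based positions in the original list.
    """
    q1, _, q3 = quantiles(values, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Hypothetical daily ticket counts (illustration only).
daily_tickets = [12, 9, 15, 11, 30, 10, 8]
outliers = find_outliers(daily_tickets)
print(outliers)
```

The returned positions play the same role as the row numbers reported by brushing: they tell you which days to investigate before deciding whether removal is justified.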

#160371

Robert Butler
Participant

Let’s back up a bit here.
You said, “I am working on a project to reduce the # of trouble tickets in a specific category. I have a sample of trouble tickets (TTs) per day for a 30 day period.”
What we need are more details.
1. Give us a definition of a trouble ticket – better yet give us an example.  Is this some kind of pass/fail or is it some kind of check list or?
2. Does this specific category have only a single thing that will result in the issuing of a trouble ticket or does it have multiple things that could result in a trouble ticket?
3. If there are multiple things within this specific category that could generate a trouble ticket – where’s your bean count by thing (the pareto chart)?  What does it look like?
4. Since you are working to reduce the number of trouble tickets one would assume you would want to look at frequency of trouble tickets against things like time of day, day of the week, shift changes over time, production lines across time, raw material changes over time, etc. In short – what does this data look like when plotted against things that might show trending or clustering which in turn might suggest possible cause and effect?
Histograms are interesting but, based on what you have posted so far, I don’t see why you would even care about them or any of their characteristics at this point.
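
Point 4 above (looking at ticket frequency against factors such as day of the week) can be sketched in a few lines. The dates and counts below are hypothetical, and `mean_by_weekday` is a made-up helper; the same grouping idea applies to shift, time of day, or any other recorded factor.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def mean_by_weekday(daily_counts):
    """Map weekday name -> mean ticket count, given (date, count) pairs."""
    buckets = defaultdict(list)
    for day, count in daily_counts:
        buckets[WEEKDAYS[day.weekday()]].append(count)
    return {name: mean(values) for name, values in buckets.items()}

# Hypothetical two weeks of data starting on Monday 2024-01-01.
start = date(2024, 1, 1)
counts = [14, 11, 10, 9, 12, 3, 2, 16, 12, 9, 10, 11, 4, 1]
records = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]

by_weekday = mean_by_weekday(records)
for name in WEEKDAYS:
    print(name, by_weekday[name])
```

If weekends or Mondays stand apart in a table like this, that pattern is more informative than the shape of the overall histogram.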

#160372

Taps
Member

Linda,
Thanks for your response. When I ran the graphical summary, the data were not normal. I tried transforming the data (my thought process being that I need normal data in order to do statistical analysis), but the data were still not normal. As folks have already posted, since I am working with attribute data, I should not have expected my data to be normal.
I guess my question then is, if I have attribute data (that does not fit a normal distribution) how can I analyze the data? I wouldn’t be able to construct a control chart to monitor the process, correct?
Thanks again for your response. I’d be curious to know how your project turns out.

#160374

Taps
Member

Robert,
Thanks for your feedback. Being new to SS I guess I was hell bent on having normal data to work with, not realizing that there will be times that I will have non normal data. To answer some of your questions:
1. Give us a definition of a trouble ticket – better yet give us an example.  Is this some kind of pass/fail or is it some kind of check list or?
In this case, the category of trouble tickets is MS Outlook. Some examples: I can’t access my email, I am missing emails, etc.
2. Does this specific category have only a single thing that will result in the issuing of a trouble ticket or does it have multiple things that could result in a trouble ticket?
There could be a number of things wrong with Outlook that would cause a customer to open a TT.
3. If there are multiple things within this specific category that could generate a trouble ticket – where’s your bean count by thing (the pareto chart)?  What does it look like?
When I created the Pareto chart, it clearly highlighted the major problem areas causing TTs to be opened. In this case there were about 5 areas that made up the 80%. We’re beginning to focus on identifying probable causes for each of those areas.
4. Since you are working to reduce the number of trouble tickets one would assume you would want to look at frequency of trouble tickets against things like time of day, day of the week, shift changes over time, production lines across time, raw material changes over time, etc. In short – what does this data look like when plotted against things that might show trending or clustering which in turn might suggest possible cause and effect?
That’s a great question. When I collected my data I was simply concentrating on how many TTs had been opened in a 30 day period. I had a break down of TTs per day for that time frame, but didn’t take into consideration factors such as time of day.
I really appreciate your comments. I am beginning to realize I was probably making this much more difficult than I should have.
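
The Pareto breakdown described in the answer to question 3 amounts to sorting causes by count and accumulating their share of the total. The sketch below assumes hypothetical cause names and counts (only the first two names echo the examples given in this post); `pareto_table` and `vital_few` are made-up helper names.

```python
def pareto_table(counts_by_cause):
    """Return rows (cause, count, cumulative share), largest count first."""
    total = sum(counts_by_cause.values())
    if total == 0:
        raise ValueError("no tickets to summarize")
    rows, running = [], 0
    ordered = sorted(counts_by_cause.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ordered:
        running += count
        rows.append((cause, count, running / total))
    return rows

def vital_few(counts_by_cause, threshold=0.8):
    """Causes, largest first, needed to reach the threshold share of tickets."""
    selected = []
    for cause, _, cumulative in pareto_table(counts_by_cause):
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected

# Hypothetical ticket counts per cause (illustration only).
tickets_by_cause = {
    "cannot access email": 40,
    "missing emails": 25,
    "calendar sync": 15,
    "slow search": 10,
    "other": 10,
}
top_causes = vital_few(tickets_by_cause)
print(top_causes)
```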

#160411

Robert Butler
Participant

Based on your reply I’d say at this stage of the effort there is no need to even think about distributions.  I’d recommend going back to the main areas identified by the pareto and start tearing that data apart.  Some possible areas of investigation could be:
1. Is there a connection between a type and frequency of a particular trouble ticket and things like – level of worker training, level of worker experience, location of worker on a network, location of a worker on a server, etc.?
2. Is there any connection between type of ticket and worker occupation? There is a good chance that some occupations will use various aspects of Outlook more than others.
3. How uniform are the basic programs? (i.e. are some sections still running Windows ME whereas other sections are running Windows XP?). If this kind of split exists is there any connection between this split and the frequency of occurrence of a particular trouble ticket?
..and so on.
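
One way to examine a possible connection like those listed above (for example, ticket type versus operating system) is a cross-tabulation with a Pearson chi-square statistic. This is only a sketch: the table is hypothetical, `chi_square_statistic` is a made-up name, and every row and column is assumed to have a non-zero total.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table is a list of rows; each row holds the counts for one group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows are two OS versions, columns are two ticket types.
table = [[30, 10],
         [10, 30]]
stat, dof = chi_square_statistic(table)
print(stat, dof)
```

For 1 degree of freedom, a statistic above about 3.84 corresponds to p < 0.05, which would suggest the ticket type and the grouping factor are not independent.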

#160434

annon
Participant

A good data collection plan and drilldown will take you from the corporate goal or objective that the project supports all the way through to the process, the process output, a specific characteristic of the output, its operational definition, unit and metric, data type, spec, standard, and DO per unit.
From here, you know precisely what questions need to be answered and thus, what data should be gathered and what tools (generally) you will be using to analyze the data set.
Good luck. And do the work up front in Define and Measure.  Makes things a lot easier.

