iSixSigma

Outliers

Viewing 17 posts - 1 through 17 (of 17 total)
  • #44535

    Rodrigo
    Member

    hi all
    Minitab's boxplot shows 1 outlier. This outlier is “messing up” my data with regard to normality and other tests I want to perform.
    Question: Is it ok to remove this extreme value (outlier) from the population? If so, how do you “tell” minitab to ignore it in further calculations/tests?
    thanks in advance

    #142834

    Sayyed
    Participant

    Rodrigo,
    An outlier on the boxplot is a trigger for further investigation. More often than not data would have an outlier due to some special cause. If you are able to identify the special cause then it will be okay to remove the outlier for better further analysis.
    To your second question, I believe you just need to delete the row containing the outlier from the Minitab data column.
    Hope it helps. However wait to get more responses (esp. from the more knowledgeable folks) before you act.
    Thanks, – Neal.

    #142839

    Rodrigo
    Member

    Cheers Neal
    Unfortunately I am unable to find the root cause (it is the first time this process has been plotted on a graph and this outlier happened ages ago).
    The thing is, in my company, you only look at spec limits. If the result is in, you feel lucky; if it's out, no proper investigation is carried out to determine the root cause. (This drives me nuts!!)
    can you imagine the fire-fighting that goes around here?

    #142840

    Anonymous
    Guest

    Rodrigo,
    There are two types of ‘outliers.’
    The first is unrepresentative of the investigation and can be due to a number of unrelated ‘events’ or situations, such as holidays when there is no output, zeroes due to faulty devices, or some fixed value due to default values in software. All of these can lead to misleading conclusions because outliers tend to have a lot of ‘leverage.’
    The other type of outlier is relevant to the study because it is part of the problem. An example is a program to reduce defects where the cause of the defect is not related to the parametric value of X. In other words, if you’re studying yield as a function of film thickness, but the measuring device can’t measure through a ‘comet’ defect in the film, you obtain a ‘crazy’ value. The real question is how many comets account for a given yield loss and how much is caused by the film being too thick or too thin.
    Hope this helps!
    Andy

    #142845

    Robert Butler
    Participant

      I would urge caution with respect to your view and understanding of “outliers” particularly with respect to a box plot.  In a box plot, if you remove the “outlier” and re-run the plot you will very likely find another point has volunteered for the position and you will be right back where you started.  If you want to visually check to see if a point (or points) is “messing up” your assumption of normality you should plot the data on normal probability paper. 
      If the point appears to be “far” from the line I suppose you could choose to just blindly eliminate it but if you don’t have any other reason for removing it I would leave it alone and press on with the analysis.
      When running the analysis I’d recommend doing it twice – once with the offending point included and once with it excluded.  If there are no significant differences between the two efforts then you are wringing your forehead for no reason.  If there is a significant difference (and here you need to think about significance in both the statistical and physical sense) then the data point in question and the circumstances surrounding its generation bear further investigation.
      If further investigation is not possible, make a note of the point and the significance of its impact when writing your report. Specifically, note how the inclusion/exclusion of the point(s) changes the conclusions which can be drawn from the analysis. It's been my experience that an analysis which results in significantly different conclusions because of the inclusion or exclusion of a small group of data points causes management to develop an intense interest in the process and the resolution of the conflicting conclusions.
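The "run it twice" advice in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values and `suspect_index` are made up, and the summary statistics stand in for whatever analysis is actually being run.

```python
from statistics import mean, stdev

def summarize(values):
    """Return basic summary statistics for one run of the analysis."""
    return {"n": len(values), "mean": mean(values), "stdev": stdev(values)}

def with_and_without(values, suspect_index):
    """Summarize the data once with the suspect point and once without it."""
    trimmed = values[:suspect_index] + values[suspect_index + 1:]
    return summarize(values), summarize(trimmed)

# Hypothetical measurements; the value at index 5 is the suspected outlier.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 14.0, 10.0, 9.9]
full, trimmed = with_and_without(data, suspect_index=5)

for label, stats in (("with point", full), ("without point", trimmed)):
    print(f"{label}: n={stats['n']} mean={stats['mean']:.3f} stdev={stats['stdev']:.3f}")
```

If the two rows lead to the same practical conclusion, the point is not worth agonizing over; if they differ, both rows belong in the report.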

    #142847

    Deep
    Participant

    Man!!!!!!!!!! Robert you are the star of isixsigma………….

    #142848

    Sayyed
    Participant

    I agree with Deep…
    You stand alone…Mr. Butler.
    Respectfully,
    Neal.
     

    #142857

    Robert Butler
    Participant

      Deep and Neal, thanks for the kind words.  I caught up on prior posts this noon hour so thanks also to Eric and Hans.
      Hans, as far as I can remember I haven’t sent anything to Quality Digest but I did have a letter to the editor in Quality magazine in 2003 concerning the role of statisticians.  Would that be the letter you were referring to?

    #142860

    Hans
    Participant

    Robert,
    Yes, and it was extremely well thought through and written.
    Just to close the loop on an issue we discussed a while back about a data set where the median test and the Kruskal-Wallis test showed different results.
    The data was truly ordinal by interval, so an ANOVA could have been run. There were also 30 data points per group, making the total sample size 120. So an ordinal by ordinal measure would definitely not work. Also, when running a run chart on the first group you could see that there were two sets of data points (the histogram confirmed that because it was bi-modal).
    It really shows how careful one has to be with even the simplest hypothesis tests. So back to the point you made in your letter, statistics can be a very rewarding, but also a very dangerous tool.
    Warm regards, Hans

    #142863

    Anonymous
    Guest

    Robert,
    Once again I don’t know if you’ve directed your post at my comments.  All I can tell you is in my own experience failure to remove the outliers I described would lead to the wrong conclusion.
    Regards,
    Andy
     
     

    #142866

    Robert Butler
    Participant

    No Andy, I wasn’t addressing your post. I was addressing the general question posed by Rodrigo.  My understanding of his post was that he was identifying outliers using boxplots – and that is not the way it should be done.
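The concern with boxplot-flagged points can be shown with a small sketch. It assumes the usual 1.5 × IQR fences and Python's "inclusive" quartile convention; Minitab's quartile calculation may differ slightly, and the data are invented for illustration.

```python
from statistics import quantiles

def tukey_outliers(values):
    """Return the points outside the 1.5 * IQR boxplot fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical data set with one obvious extreme value.
data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 18, 30]

first_pass = tukey_outliers(data)
trimmed = [x for x in data if x not in first_pass]
second_pass = tukey_outliers(trimmed)

print(first_pass, second_pass)  # prints [30] [18]
```

Once 30 is deleted the fences tighten, and 18, which was inside them on the first pass, is now flagged in its place.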
    My second point to his post was that, even if you have an outlier as determined by a visual examination of a plot of the data on probability paper, or as identified by any one of the myriad of outlier detection algorithms, you should run the analysis with and without just to see how much of an impact it has.
     If I have an assignable cause I usually choose to report the analysis without the outliers and, when presenting the results, highlight the fact that I did so.  However, if it is only a suspicion or if I don’t have a good reason for eliminating the outlier then, in those situations where the presence or absence of the outlier is significantly changing my results, I always present the results of the analysis with and without the outliers present.
    I’ve found reporting in this fashion gives everyone a better understanding of what is going on in the process and usually results in a plan of action that includes further investigation of the process to find and identify the causes of the offending points.
     

    #142870

    Anonymous
    Guest

    Thanks for the clarification Robert. One of the reasons I wanted to clarify your position is because I too respect your contributions and hold you in high regard.
    Cheers,
    Andy

    #163925

    Kannan
    Participant

    Hi All
    I am very much impressed with the discussion. We are in confusion on the following.
    How much % of Outliers can be allowed for Removal from a data set?
    Regards
    Kannan
     

    #163930

    indresh
    Participant

    As the name suggests, outliers are few and far between. As the experts suggested, before removing even a single outlier you need to investigate how much effect it has on the remaining data. I can't cite a rule of thumb; it depends on how large your data set is and how many outliers you can safely remove after investigating their causes (special causes only). If the outlier comes from a cause within the process itself, removing it might lead to a wrong analysis and to missing some vital information.

    #163931

    Dr. Scott
    Participant

    Rodrigo,
    Having read each of the responses to your original post, I have the following to offer:

    If you have the time at which each data point was collected (those included in your box plot), then use a control chart to identify special causes (if the data are fairly normal then an I-MR chart will suffice).
    Remove the special cause from the data set, then do the control chart once more.
    As you said before, another might take its place; remove it too. Do this no more than three times (my rule of thumb; there is no theory or strong statistical reasoning behind it).
    Then conduct one final control chart.
    Then conduct a capability analysis on the data that are in control.
    Then conduct a capability analysis on all of the original data.
    The difference between the two capability analyses is the “cost” of special causes.
    Then attempt to identify the reason for the special cause(s). 
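The steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: the data, the spec limits `LSL` and `USL`, the use of the standard 2.66 × average-moving-range limits for the individuals chart, and a capability index based on the overall standard deviation. Moving ranges are simply recomputed over the remaining points after each removal.

```python
from statistics import mean, stdev

def individuals_limits(values):
    """Control limits for an individuals chart: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center + half_width

def remove_special_causes(values, max_rounds=3):
    """Drop out-of-control points, re-chart, and repeat at most max_rounds times."""
    kept, removed = list(values), []
    for _ in range(max_rounds):
        lcl, ucl = individuals_limits(kept)
        flagged = [x for x in kept if x < lcl or x > ucl]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [x for x in kept if lcl <= x <= ucl]
    return kept, removed

def capability_index(values, lsl, usl):
    """Capability index using the overall sample standard deviation."""
    m, s = mean(values), stdev(values)
    return min(usl - m, m - lsl) / (3 * s)

# Hypothetical time-ordered measurements and spec limits.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 14.0,
        9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9]
LSL, USL = 9.0, 11.0

in_control, special_causes = remove_special_causes(data)
index_all = capability_index(data, LSL, USL)
index_in_control = capability_index(in_control, LSL, USL)

print("removed:", special_causes)
print(f"capability, all data:   {index_all:.2f}")
print(f"capability, in control: {index_in_control:.2f}")
```

The gap between the two capability figures is the "cost" of special causes described in the post.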
    It is very important to illustrate the difference between stability (control charts) and capability (capability analysis), and to know how the two are related, and what each is costing you.
    This will also help you to deal with the problem of management overreacting to points out of spec instead of out of control. This leads to even more problems (as you seem to already realize). Out of control points are operators’ responsibility. Out of spec (process incapability) is management’s responsibility (I can explain this further later if you wish).
    Feel free to send me your data ([email protected]) and I will illustrate what I noted above (perhaps not so well) and send my analysis back to you with explanations. Maybe this will help you understand my points.
    Good Luck,
    Dr. Scott

    #163932

    Rodrigo
    Member

    Thank you Dr. Scott for your input.
     

    #163953

    Six Sigma guy
    Member

    Since the concept of outliers has been very well answered by others, I will try to answer the second part of your question.
    Asking Minitab to ignore an outlier without removing it is easy. Just choose Brush in the Editor menu and select the outlier. Then click Edit Last Dialog, go to Data Options, select “specify which rows to exclude”, and further down select brushed rows.
    P.S.: An outlier can also occur due to incorrect capturing of data, especially in a manual measurement system.
    Regards
