Elusive Root Causes


    Arthur J. Faske

    I am a GB working on a project to minimize rework in an analytical lab.
    Through the use of Pareto analysis I’ve identified the tests that cause the most reruns. My team has been able to substantially reduce the errors for several of the tests. For one of the tests, I have been unable to definitively identify the root cause(s) of failure. The cause is intermittent, and when it does occur, the results are catastrophic (low by roughly 20% or more, relative). (“Catastrophic” in analytical parlance only means the analysis is an utter failure, not that something blows up.)
    Are there tools available to help identify elusive, intermittent root causes?


    Sigma Singh

    My take on this is that some assignable cause is intermittently troubling the process. You may adopt the following approach to identify Mr. Assignable Cause:
    1. In a team exercise, identify the probable causes leading to the catastrophe (test not analysed properly). These causes can be arranged in the form of a fishbone (with the main bones being man, machine, method…). List as many “actionable causes” as you want.
    2. Now start monitoring test results using control charts (select the type appropriate to your conditions). You have cast your net; now watch for “out of control” conditions. As and when they occur, go back to the fishbone and check how the process operated differently that day with respect to each actionable cause (sub-bone). You should be able to zoom in on the root cause(s).
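    As a rough illustration of step 2, here is a minimal sketch of an individuals (X) control chart in Python. The control-sample values are made up for the example (they are not data from this thread), and the limits use the common moving-range estimate, which is only one of several ways to set them.

```python
from statistics import mean

# Hypothetical control-sample results (% nitrogen); NOT data from this thread.
results = [10.02, 9.97, 10.05, 9.99, 10.01, 9.96,
           10.03, 7.90, 10.00, 9.98, 10.04, 9.95]


def individuals_chart_limits(values):
    """Return (center, lower limit, upper limit) for an individuals chart."""
    # Moving range = absolute difference between consecutive results.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 2.66 is 3 / d2, with d2 = 1.128 for moving ranges of two points.
    half_width = 2.66 * mean(moving_ranges)
    return center, center - half_width, center + half_width


center, lcl, ucl = individuals_chart_limits(results)

# Positions (0-based) of results outside the control limits.
out_of_control = [i for i, x in enumerate(results) if x < lcl or x > ucl]

print(f"center={center:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")
print("out-of-control positions:", out_of_control)
```

    Each flagged position is the cue to go back to the fishbone and record what was different for that run.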
    good luck


    Marc Richardson

    You may be experiencing the effects of an interaction between two or more variables. Interactions can be very difficult to isolate and identify, since both (or all) of the variables have to be active in the process at the same time. About the only thing that I know of that will work, aside from dumb luck, is Design of Experiments. But first we must have a stable process and a minimum of measurement error, or we may mistake non-random variation for DOE signals.
    Let us know what you learn.
    Marc Richardson
    Sr. Q.A. Eng.



    1) “The cause is intermittent, and when it does occur, it occurs with catastrophic results” (Arthur) That’s a special cause of variation.
    2) THEN: The process is not stable.
    3) “You may be experiencing the effects of interaction between two or more variables” (Marc)
    4) “About the only thing that I know of that will work, aside from dumb luck, is Design of Experiments” (Marc)
    5) “first we must have a stable process […] or we may confuse non-random variation for DOE signals” (Marc)
    In short: The process is unstable. If it’s an interaction, the only thing that will work is DOE, for which we first need a stable process.
    Wait a minute: If DOE will not work…, and it was the only thing that could have worked…, then… Oh my God, nothing will work!
    Sorry for the sarcasm. It was just a bit of humor. But trying to be constructive, could you clarify, Marc?
    Note: A common cause of variation cannot be sometimes active and sometimes not active.


    Arthur J. Faske

    The method in question is the determination of nitrogen using the Kjeldahl procedure. In this method, a sample is digested with hot sulfuric acid to convert (almost) all nitrogen to ammonium sulfate. The digestate is then made alkaline (to convert ammonium ions to ammonia) and steam-distilled. The ammonia-containing steam is condensed and collected. And this distillate solution is then finally titrated with standard acid solution.
    From this description, you can see that we have three major sub-processes–digestion, distillation, and titration. Any one, or combination, of the three can cause the problem. The problem, when it occurs, always results in very low numbers. Cause and effect diagramming and brainstorming were done with this result in mind–what can cause low results? Several possible causes are immediately obvious:
    1.  The sample digestion is incomplete.
    2.  Sample is lost in the digestion process via some mechanism, such as spattering or volatilization.
    3.  The distillation efficiency is low: insufficient alkali added such that not all ammonium ion (non-volatile) is converted to ammonia (volatile) or distillation not allowed to proceed long enough.
    4.  Poor trapping efficiency of the ammonia.
    5.  Transfer losses between any of the steps.
    The difficulty is in isolating the step or steps at which the sample loss occurs. We’ve tried increasing the digestion time, increasing the distillation time, and increasing the amount of alkali added (plus other things, as well). The problem still occurs occasionally. Here is a recent history of the number of “good” control-sample results between failures: 7, 7, 6, 20, 12, 2, 3, 7, and 14.
    I have doubts whether DOE will help here. Unless the problem occurs during the DOE experimentation, nothing is gained. (Well… process information is gained, and that is always good, but not necessarily information that solves the problem.)
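    To put a rough number on that doubt, here is a small Python sketch that uses the run lengths quoted above. It rests on two assumptions that are mine, not established facts about the method: that every run length listed ended in a failure, and that failures occur independently with a constant probability per analysis.

```python
# "Good" control-sample results between successive failures, as quoted above.
good_runs_between_failures = [7, 7, 6, 20, 12, 2, 3, 7, 14]

# Assumption: each run of good results was ended by exactly one failure.
failures = len(good_runs_between_failures)
total_runs = sum(good_runs_between_failures) + failures
failure_rate = failures / total_runs


def prob_at_least_one_failure(n_runs, p):
    """Chance of seeing one or more failures in n_runs independent analyses."""
    return 1.0 - (1.0 - p) ** n_runs


print(f"estimated failure rate: {failure_rate:.3f} ({failures} of {total_runs})")
for n in (8, 16, 32):
    chance = prob_at_least_one_failure(n, failure_rate)
    print(f"{n:2d} runs: chance of at least one failure = {chance:.2f}")
```

    If the independence assumption is wrong (for example, if failures cluster on certain days), these figures will be off, which would itself be a useful clue.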



    Sorry, Arthur. I can’t help you with that.
    Just out of curiosity, did you consider “The test result is OK. The nitrogen content is actually too low” as a potential cause for the low values?
    Sometimes, when we don’t like the news, we tend to blame the newspaper. The newspaper can be biased, but also the news can actually be far from what we expected.
    I remember once we had a problem with the diameter of a part. A few parts were found far out of tolerance. Initial investigations showed that, in those parts, another feature deviated from its nominal value. The measuring principle assumed that the part was measured at exactly a certain point. The production and metrology people blamed the measurement system, because that second deviation would cause the part to be measured not exactly at the point where it was supposed to be. That was true, but a theoretical analysis showed that the measurement error in the diameter due to variation in the second feature was negligible. Further investigation also revealed that the “good” parts had the same deviation in that second feature. And a final cross-check with another measuring principle showed that the reason the measurement system gave a diameter far out of tolerance was simply that the diameter was actually far out of tolerance. Then the investigation moved to why the parts were out of tolerance, but that’s another story…


    Karel Burgoyne MBB

    Agreed, you need to get some stability into the method. One thing that might help is to use a nitrogen-15 stable isotope tracer and, of course, a mass spec you might have lying around.
    The point being that you can take all the recovered solutions and do something of a mass balance.
    Off the wall, perhaps, but what I didn’t see is what material you are analysing, as there may already be validated methods out there.
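    A minimal sketch of what such a mass balance could look like in Python. The fraction names and tracer amounts are hypothetical, chosen only to show the bookkeeping; a real balance would use whatever fractions the lab can actually recover and measure.

```python
# Hypothetical 15N tracer amounts (mg N) for one failed run; NOT real data.
tracer_added_mg = 10.0
recovered_mg = {
    "digestion flask residue": 0.1,
    "distillation flask residue": 0.3,
    "distillate (titrated)": 7.4,
}


def mass_balance(added, recovered):
    """Return each fraction's share of the added tracer, plus the shortfall."""
    accounted = sum(recovered.values())
    report = {name: amount / added for name, amount in recovered.items()}
    # Tracer not found in any recovered fraction.
    report["unaccounted"] = (added - accounted) / added
    return report


balance = mass_balance(tracer_added_mg, recovered_mg)
for fraction, share in balance.items():
    print(f"{fraction:28s} {share:6.1%}")
```

    Comparing this breakdown between a good run and a failed run would show which fraction the missing nitrogen ended up in, or that it left the system altogether.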



    I function on the motto: “you must make bad in order to understand what makes bad”.  Thus a series of DOEs will help you if you focus on what leads to failure, not success.  Choose levels of the variables to screen out some causes (screening DOE).  Look at noise factors beyond the processes.  If you have 3 steps, DOE each step as well as the system as a whole.  It may take time and money, but if you are looking to truly understand the process and effects, it should be worth it.
    Because it is a special cause (or so it seems, being unexpected and sporadic), you are looking first to protect your process from the set of circumstances (variables and noise) that cause the failure, and second to begin making the testing process robust.
    Protection= If the process fails every time the wind blows, build a shelter.
    Robust= Design a process that does not fail when the wind blows.
    My two cents.


    Arthur J. Faske

    Yes, we considered that the analysis results were correct and the nitrogen level in the sample was indeed low. However, the evidence does not support that conclusion:
    1. Upon rerunning the samples for which the results were suspect, the results for most (not all) of the reruns were much higher than the original values and were in line with expected values.
    2. The same intermittent phenomenon occurs for our control sample. We know, both from historical data and from another independent method, what the value for the control sample should be.
    I’m not sure I follow regarding using DOE to produce failure. I can certainly establish experimental conditions which guarantee low results, but I don’t need to perform experiments to verify. If I under-digest the sample, for example, I know without experimenting that I will get low results. Or if I intentionally break a joint in the distillation apparatus so that steam escapes, I know I will get low results again.


    Marc Richardson

    As Deming would say, an understanding of the nature of variation is necessary if one is going to improve the process. In the beginning, one must identify and eliminate the causes of non-random variation in the process. Sometimes the causes of non-random variation produce a single, transient signal: a point beyond the +/- 3 sigma control limits. This could be caused by a change in a single factor, such as a change in raw material lot. Sometimes, however, the signal is caused by the interaction of two factors, such as a change in material lot and a change in processing temperature. Only when both factors occur simultaneously is the resultant signal seen on the control chart.
    Another source of process variation is the measurement system. Lack of repeatability can increase the within-subgroup, or random, variation that we perceive to be in the process. Lack of reproducibility can manifest as non-random variation in the process. Take for example the case of two operators, “A” and “B”. “A” measures a part diameter as 10.14mm, which is an unbiased, accurate measurement. “B’s” measurement, however, is biased and gives a result of 10.25mm when the process average has not shifted. The control chart will show a shift in the process average when none has occurred. For this reason, measurement system analysis must be completed, and the contribution of the measurement system to the process variation minimized, before beginning to control chart the process.
    Once the sources of non-random variation have been identified and eliminated, what is left is random variation and a stable process. If the process output is still not satisfactory, further steps must be taken, typically to design and execute an experiment. One identifies the factors one believes affect the process output characteristic of interest and then deliberately varies them to the extent that they produce signals in the process, in other words non-random variation. Once the effect of the process x’s upon the product y of interest has been demonstrated, engineering solutions can be developed to reduce the variation in those process x’s.
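    For illustration, here is a minimal Python sketch of how a two-factor, two-level experiment separates main effects from an interaction. The two factors echo the material lot and processing temperature example; the responses are hypothetical and were chosen so that the result drops only when both factors change together.

```python
from statistics import mean

# Coded 2x2 full factorial: -1 = usual level, +1 = changed level.
# Responses (y) are hypothetical, NOT measurements from any real process.
runs = [
    {"lot": -1, "temp": -1, "y": 10.0},
    {"lot": +1, "temp": -1, "y": 10.0},
    {"lot": -1, "temp": +1, "y": 10.0},
    {"lot": +1, "temp": +1, "y": 8.0},
]


def effect(runs, contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [r["y"] for r in runs if contrast(r) > 0]
    low = [r["y"] for r in runs if contrast(r) < 0]
    return mean(high) - mean(low)


lot_effect = effect(runs, lambda r: r["lot"])
temp_effect = effect(runs, lambda r: r["temp"])
# The interaction contrast is the product of the two coded factors.
interaction = effect(runs, lambda r: r["lot"] * r["temp"])

print("lot effect:        ", lot_effect)
print("temperature effect:", temp_effect)
print("lot x temp effect: ", interaction)
```

    An interaction effect as large as the main effects is the signature of a drop that needs both changes at once; varying one factor at a time from the usual settings would show nothing.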
    Finally, continuing to control chart the process once the improvements have been made will assist in the determination of the long-term effectiveness of the changes made.
    I hope this clarifies things.
    Marc Richardson
    Sr. Q.A. Eng.


    John H.

    Just a thought on reaction interference factors:
    Is there any possibility of reagent or sample contamination by trace amounts of transition metals (e.g., Cr, Fe, Mn)? Lab procedures and/or manufacturing equipment could be an intermittent source of this contamination.
    -John H.



    Marc, I agree with just about everything you said about understanding variation, DOE, interactions, etc.
    But what I wanted clarified was how Arthur is supposed to use DOE in this case, where he has an unknown special cause of variation, which is what he wants to find out, but because it is unknown he cannot stabilize the process before the DOE, which hopefully will help him to discover that cause, but he cannot run it because he didn’t find the cause, so the process is not stable enough to run the DOE, which… Do you see the loop here?
    You proposed a DOE. How do we break that loop? If he finds a way to find the special cause before the DOE, in order to stabilize the process to run the DOE, then he doesn’t need the DOE anymore because the problem is solved.
    Arthur “charted” the process (at least he has a history of results) and found the signal (an average of 1 “crazy” value about every 7 tests), but he could not find the cause of that signal.

