iSixSigma

Bimodal Distribution – Separate Analysis?


Viewing 4 posts - 1 through 4 (of 4 total)
  • #246940

    KCM
    Participant

    A heat-shrink process at work produces a tensile-strength distribution with three or more modes when parameters such as temperature and operator vary within a single batch. Because the LSL (one-sided specification) is well below the values we see, we are still meeting quality requirements.

    We are now repeating TMV for that test, and due to its destructive nature, we must use ANOVA to determine %Tol. We also calculate STD ratio (max/min). 4 operators are required.

    For TMV we limited the build process ranges – one temperature, one operator, etc. – and we have a distinctly bimodal distribution (19 data points between 0.850 and 0.894 and 21 data points between 1.135 and 1.163). The LSL is 0.500. Reduction to a unimodal distribution is not worth the expense from a process standpoint, and we wouldn't know how to do so, since it may be incoming materials causing this distribution. (All are validated, verified, suppliers audited, etc. Huge headache, huge expense to make changes.)

    Is it appropriate to separate the two distinct groups for ANOVA and STD ratio analysis (in this TMV)?

    How would I go about calculating process capability for a bimodal distribution in the future?

    It may be obvious, but the combined data is not normal. Each of the 2 groups is normal (A-D).

    Thanks in advance

    #246952

    Robert Butler
    Participant

    After a Google search I’m guessing that TMV stands for Test Method Validation. If this is the case then I think you need to back up and re-think your situation.

    What follows is a very long-winded discourse concerning your problem so, first things first. The short answer to your question is – no – running a TMV and pretending the bi-modal nature of your results does not matter is a fantastic way to go wrong with great assurance.

    You said you have what appears to be a multimodal distribution of your output but everything meets customer spec.  Based on this and on some other things you said it sounds like the main question is not one of off-spec but just one of wondering why you get multimodal results.

    1. You said, “We are now repeating TMV for that test, and due to its destructive nature, we must use ANOVA to determine %Tol. We also calculate STD ratio (max/min). 4 operators are required.”

    2. You also said, “For TMV we limited the build process ranges – one temperature, one operator, etc. – and we have a distinctly bimodal distribution (19 data points between 0.850 and 0.894 and 21 data points between 1.135 and 1.163). The LSL is 0.500. Reduction to a unimodal distribution is not worth the expense from a process standpoint, and we wouldn't know how to do so, since it may be incoming materials causing this distribution. (All are validated, verified, suppliers audited, etc. Huge headache, huge expense to make changes.)”

    The whole point of test method validation is an assessment of such things as accuracy, precision, reproducibility, sensitivity, specificity, etc. Since your product is all over the map, and since your (second?) attempt at TMV gave results that were also all over the map for reasons unknown, the data resulting from that attempt, as noted in #2 above, are of no value. The data from #2 are not accurate, and you cannot use them to make any statements about test method precision.

    I would go back to #2 and do it again and I would check the following:

    1. Did I really have one temperature?

    2. Was my operator really skilled and did he/she actually follow the test method protocol?

    3. What about my “etc.”? Were all of those things really under control or, if I couldn’t control them, did I set up a study to randomize across those elements I thought might impact my results (shift change, in-house temperature change, running on different lines, etc.)?

    4. It’s nice to know the suppliers of incoming raw material are “validated, verified, suppliers audited etc.” but that really isn’t the issue. The two main questions are:

    1) What does the lot-to-lot variation of all of those suppliers look like (both within a given supplier and, if two or more suppliers are selling you the “same” thing, across suppliers)?

    2) When you ran the TMV in #2 did you make sure all of the ingredients for the process came from the same lot of material and from the same supplier?

    I’m sure you have a situation where not all ingredients come from a single supplier but the question is this – for the TMV in #2 did you lock down the supplies for the various ingredients so that only one supplier and one specific lot from each of those suppliers was used in the study in #2?

    The reason for asking this is because I’ve seen far too many cases where the suppliers had jumped through all of the hoops but when it came down to looking at the process the “same” material from two different suppliers or even the “same” material from a single supplier was not, in fact, the “same”.  The end result for this lack of “sameness” is often the exact situation you describe – multimodal distributions of final product properties.

    A couple of questions/observations concerning point #1:

    1. I don’t see why you think you need ANOVA to analyze the data – nothing in your description would warrant limiting yourself to just this one method of analysis.

    2. You stated you needed 4 operators yet in #2 you said you were using one operator. As written both #1 and #2 are discussing TMV so why the difference in number of operators?

     

    #246988

    KCM
    Participant

    @rbutler

    In regards to your questions –

    The process and test method are two different things:

    • The process of making the test article was performed by a single, qualified operator, using one calibrated piece of equipment set to one temperature and air flow rate. The process monitoring routinely shows distinct categories – goodness knows why multiple operators are allowed to perform the single operation for one lot, but they are (for normal production). I don’t care about production, to be blunt. Production does not care about the distribution because it has a ‘barn door’ specification. The LSL is 0.5, and we never see values approaching that. The build which was conducted to generate TMV samples used uniform incoming materials from one lot each (as is done in normal production, for traceability). If there is variation within those lots, it isn’t enough to raise a flag at incoming inspection.
      TLDR: There was one line, one shift, one operator, one process setting when parts were manufactured.
    • The test method was conducted by four trained, qualified operators, each of whom measured 10 units. TMV is performed according to a procedure which I do not have the power to change, since it is compliant with industry-specific regulations. So yes, I am limited to that kind of analysis for this TMV, but I can support it with additional statistics. In TMV, I am concerned with ensuring the method is repeatable and reproducible. Accuracy is determined by the test equipment calibration; it is calibrated to 0.02.

    Regarding the rest of your response –

    The data was not “all over the map” and I don’t know what you mean by the TMV was “all over the map”.  The data (all measurement values) representing the special lot of the test article has two peaks, each of which is tightly distributed.

    At no time did I suggest I would like to ignore the bimodal distribution. The opposite is true – I would like to ensure that based on this distribution, all analysis is meaningful – no assumptions (like normality) are violated.

    I’m not sure why you believe I cannot make conclusions about method validity – based on your response, it’s fairly clear that I have more experience with TMV than you, so on what basis you are making that remark, I’m not sure. You’ve gone as far as to assume that you know more about the build conditions and study design than I do as well, which I find fairly rude.

    Supplier controls – 

    “The reason for asking this is because I’ve seen far too many cases where the suppliers had jumped through all of the hoops but when it came down to looking at the process the “same” material from two different suppliers or even the “same” material from a single supplier was not, in fact, the “same”.  The end result for this lack of “sameness” is often the exact situation you describe – multimodal distributions of final product properties.”

    I understand fully that the distribution is likely bimodal due to a part coming in which in some way is more than one ‘kind’. It’s possible the supplier is combining batches, or that their incoming material is varied. Any change we make with the supplier requires significant efforts – tens of thousands of dollars, and several weeks of time, in addition to BOM changes throughout the entire assembly drawings and inventory management systems. Considering we do not have a high risk of failure, and the risk category the defect falls into does not warrant that expense, it won’t happen. Due to the size and shape of the parts, we do not have on-site capabilities to measure them in order to perform a ‘sort’ – even for a single lot in order to validate the method for tensile strength.

    Each incoming material is made by a single supplier, and no, lots are not mixed.

    General remarks –

    Everything was checked. This is the best I can afford to do – bimodal. So, @rbutler, do you have any thoughts on how to perform statistical analysis on a bimodal population?

    The basis of my thought process, why I am wondering if separate analysis of the two groups is possible, was that if I were to consider the likelihood that someone would be less than 5 feet tall, I would assess the male and female populations separately, since the distribution of ‘height’ including both sexes is bimodal. I was thinking maybe the same strategy could work here, where we don’t know the difference between the two groups, but we also don’t have overlap at all, and can easily determine which is which.
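To make the height analogy concrete, here is a rough sketch of what assessing the two groups separately could look like for a one-sided LSL. It is only an illustration: the group means and standard deviations are made-up values chosen to sit inside the ranges quoted in the first post, the function names are hypothetical, and the whole calculation assumes each group is normal and that the 19/21 split is a stable proportion, neither of which a single build can confirm.

```python
from statistics import NormalDist


def fraction_below(lsl, groups):
    """Mixture tail area: count-weighted sum of each group's normal CDF at the LSL.

    groups is a list of (count, mean, sd) tuples, one tuple per mode.
    """
    total = sum(count for count, _, _ in groups)
    return sum(count / total * NormalDist(mean, sd).cdf(lsl)
               for count, mean, sd in groups)


def lower_index(lsl, mean, sd):
    """One-sided capability-style index for a single group: (mean - LSL) / (3 * sd)."""
    return (mean - lsl) / (3 * sd)


LSL = 0.500
# Assumed summary statistics (count, mean, sd); NOT the real measurements.
groups = [(19, 0.87, 0.012), (21, 1.15, 0.008)]

p_below = fraction_below(LSL, groups)
worst_index = min(lower_index(LSL, mean, sd) for _, mean, sd in groups)

print("estimated fraction below LSL:", p_below)
print("lowest per-group index:", worst_index)
```

With a ‘barn door’ specification the lower group drives the result, so the per-group index for that group is the number worth watching.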

    The question isn’t about identifying root cause of the bimodal distribution.

    #246995

    Robert Butler
    Participant

    From what I found after a quick Google search for the definition of TMV, the big issue with that process is exactly what you stated in your second post: TMV focuses on repeatability and reproducibility. Since assessing repeatability and reproducibility is the main issue, and since that was what I thought you were asking in your first post, your question collapsed to one of determining a measure of the variability that could be used for test method validation. It was/is this question I am addressing, and it has nothing to do with skill in using, or knowledge about, TMV.

    The kind of process variability you will need for your TMV will have to reflect only the ordinary variability of the process. Ordinary process variation is bereft of special cause variation.  The variability associated with bi-modal or tri-modal data contains special cause variation.  As a result, should you try to use the variability measure from such data for a TMV your results will be, as I stated, “all over the map.”

    I appreciate your explaining the issue with respect to the testing method – mandatory is mandatory and, I agree, there’s no point in worrying about it. I only questioned the methods since your first post left me with the impression you were just trying various things on your own.

    So, taking your first and second posts together I think this is where you are.

    1.       Your process can be multimodal for a variety of reasons.

    2.       You don’t care about any of this since the spec is off in the west 40 somewhere and no one cares about the humps and bumps.

    3.       For whatever reason(s) identifying the sources of special cause variation impacting your test setup is not permitted.

    4.       When you tried to control for some variables you thought might drive multi-modality you still wound up with a bi-modal distribution in your experimental data.

    5.       For the controlled study the two groups of product were distinctly different and both had a narrow distribution.

    6.       You want to find some way to ignore/bypass the bi-modal nature of the controlled series of builds and come up with some way to use the data from the controlled build to generate an estimate of ordinary variation you can use for your TMV.

    The easiest way to use your test data to attempt to get some kind of estimate of ordinary variation suitable for a TMV would be to go back to the data, identify which data points went with which mode, assign a dummy variable to the data points for each of the modes (say the number 1 for all of the data points associated with the first hump in the bi-modal distribution and number 2 for all of the data points in the second), and then run a regression of the measured properties against the two levels of the dummy variable.

    Take the residuals from this regression and check them on a normal probability plot to see if they exhibit approximate normality (fat pencil test).  I’d also recommend plotting them against the predicted values and looking at these residual plots just to make sure there isn’t some additional odd behavior.

    If the normal probability plot indicates the residuals are acceptably normal, and if the residual plots don’t show anything odd, then you will take the residuals and compute their associated variability and use this variability as an estimate of the ordinary variability of your process.

    The reason you can do this is because by regressing the data against the dummy variable you have removed the variation associated with the existence of the two peaks and what you will have left, in the form of the residuals, is data that has been correctly adjusted for bi-modality.
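The dummy-variable steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed TMV calculation: the data are synthetic (two tight clusters with made-up means and spreads, 19 and 21 points), the 1.0 cut-off used to assign modes is an assumption that only works because the clusters do not overlap, and the probability-plot correlation is just a numeric stand-in for eyeballing the plot with a fat pencil.

```python
import numpy as np
from statistics import NormalDist

# Synthetic stand-in data (assumed values, NOT the measurements from the thread).
rng = np.random.default_rng(0)
y = np.concatenate([
    rng.normal(0.87, 0.012, size=19),   # first hump
    rng.normal(1.15, 0.008, size=21),   # second hump
])

# Dummy variable: 1 for points in the first hump, 2 for points in the second.
mode = np.where(y < 1.0, 1.0, 2.0)

# Ordinary least squares fit of y = b0 + b1 * mode.
X = np.column_stack([np.ones_like(mode), mode])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Residual standard deviation, using n - 2 degrees of freedom
# because two coefficients were estimated.
resid_sd = np.sqrt(np.sum(resid ** 2) / (len(y) - X.shape[1]))

# Normal probability plot in numeric form: correlate the sorted residuals
# with normal quantiles at plotting positions (i - 0.5) / n.
n = len(resid)
positions = (np.arange(1, n + 1) - 0.5) / n
quantiles = np.array([NormalDist().inv_cdf(p) for p in positions])
plot_corr = np.corrcoef(np.sort(resid), quantiles)[0, 1]

print("raw SD of all data:", np.std(y, ddof=1))
print("residual SD after removing the modes:", resid_sd)
print("probability-plot correlation:", plot_corr)
```

With only two levels of the dummy variable, the fitted values are simply the two group means, so the residuals are each point's deviation from its own hump. Plotting `resid` against `fitted` gives the residual plot mentioned above.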

    …and now the caveats

    1.       You only have the results from the one controlled build – you are assuming the result of a second build will not only remain bi-modal but that the spread around the residuals from the analysis of data from the new build will not be significantly different from the first series.

    2.       All you have done with the dummy variable regression is back out the bi-modal aspect – you have no idea if the residuals are hiding other sources of special cause variation, you have no idea how those unknown special causes may have impacted the spread of the two modes, and that means you have no guarantees of what you might see the next time you try to repeat your controlled analysis.

    3.       If there is nothing else you can do then, as a simple matter of protecting yourself, I would recommend you insist on running a series of the same controlled build experiment over a period of time (say at three-month intervals) for at least a year and see what you see. My guess is, even if you manage to somehow only have two modes each time you run the experiment, you are going to see some big changes in the residual variability from test to test.

    4.       If you should ever have to face a quality audit armed with only the results based on the above and the auditor is an industrial statistician you will have some explaining to do.
