iSixSigma

Rick Pastor

Forum Replies Created


Viewing 24 posts - 1 through 24 (of 24 total)
  • #93948

    Rick Pastor
    Member

    I have made the same calculations for all three sets of data; the results are below. In JMP, the distributions (all 50 points) for sets 1, 2, and 3 appear different. 
     
    Now, I have a question for you. What rules do you use to determine if a set of data represents a stable process? (A sketch of two of the tests I apply is at the end of this post.)
     
     
    Data Set 1: The data is normally distributed and Cpk ≈ Ppk.
    Distribution analysis: 
    1.  10 subgroup averages: Goodness of fit p-value = 0.8061 (JMP)
    2.  50 individual points: Goodness of fit p-value = 0.8598 (JMP)
    3.  Summary: The data is normally distributed.
    Control Chart:
    4.  Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates that the process is in control. The data passed all eight tests. (JMP)
    Excel Calculations:
    5.  Cpk = 0.644 (Excel)
    6.  Ppk = 0.668 (Excel) (agrees with the value calculated in JMP)
    7.  Summary: Cpk ≈ Ppk
     
    Data Set 2: The data is "not" normally distributed and Cpk > Ppk.
    Distribution analysis: 
    1.  10 subgroup averages: Goodness of fit p-value = 0.0601 (JMP)
    2.  50 individual points: Goodness of fit p-value = 0.0098 (JMP)
    3.  Summary: The data is not normally distributed.
    Control Chart:
    4.  Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates a process that is out of control. (Two out of three points in a row in Zone A or beyond detects a shift in the process average or an increase in the standard deviation. Any two out of three points provide a positive test.) (JMP)
    Excel Calculations:
    5.  Cpk = 0.673 (Excel)
    6.  Ppk = 0.499 (Excel) (agrees with the value calculated in JMP)
    7.  Summary: The process is not stable; Cpk > Ppk.
     
    Data Set 3: The data is not normally distributed and Cpk > Ppk.
    Distribution analysis: 
    1.  10 subgroup averages: Goodness of fit p-value = 0.1890 (JMP)
    2.  50 individual points: Goodness of fit p-value = 0.0492 (JMP)
    3.  Summary: The data is normally distributed when viewed as subgroup averages (the central limit theorem). On the other hand, the 50-point distribution indicates that the data is not normally distributed.
    Control Chart:
    4.  Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates that the process is not in control. Subgroups 1 and 10 failed Test 1. (One point beyond Zone A detects a shift in the mean, an increase in the standard deviation, or a single aberration in the process. For interpreting Test 1, the R chart can be used to rule out increases in variation.) (JMP)
    Excel Calculations:
    5.  Cpk = 0.711 (Excel)
    6.  Ppk = 0.432 (Excel) (agrees with the value calculated in JMP)
    7.  Summary: Cpk > Ppk
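     
    Here is the sketch I mentioned above: a rough Python version of two of the zone tests (one point beyond Zone A, and two out of three consecutive points beyond 2 sigma on the same side), applied to subgroup averages with control limits built from R_bar/d2. The data and the function name are made up, and it is not meant to reproduce JMP's implementation of all eight tests.
     
    import numpy as np

    def xbar_zone_tests(values, subgroup_size=5, d2=2.326):
        # Hypothetical helper; two of the Shewhart/Western Electric tests, not JMP's code.
        x = np.asarray(values, dtype=float).reshape(-1, subgroup_size)
        xbars = x.mean(axis=1)                               # subgroup averages
        r_bar = np.mean(x.max(axis=1) - x.min(axis=1))       # average range
        sigma_xbar = (r_bar / d2) / np.sqrt(subgroup_size)   # sigma of the subgroup averages
        z = (xbars - xbars.mean()) / sigma_xbar              # distance from the center line in sigmas

        test1 = np.abs(z) > 3                                # one point beyond Zone A
        test_2of3 = np.array([                               # 2 of 3 in a row beyond 2 sigma, same side
            np.sum(z[max(0, i - 2):i + 1] > 2) >= 2 or
            np.sum(z[max(0, i - 2):i + 1] < -2) >= 2
            for i in range(len(z))])
        return np.where(test1)[0] + 1, np.where(test_2of3)[0] + 1

    # made-up demonstration data: 10 subgroups of 5
    demo = np.random.default_rng(1).normal(10, 1, size=50)
    print(xbar_zone_tests(demo))
     
    Any subgroup flagged by either test is a reason to question stability before quoting a Cpk.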
     

    0
    #93757

    Rick Pastor
    Member

    Statman:
    I confess that I am a little confused by the statement that it is better than adding a 1.5 sigma shift to process A. Nevertheless, I would say that Montgomery and I would have different points of frustration (I would love to see the exact pages where Montgomery discusses the issue). When I examine the difference between Cpk and Ppk, I just see that Cpk uses a shortcut to calculate sigma. That shortcut requires the data to be normally distributed. 

    0
    #93735

    Rick Pastor
    Member

    You, Gabriel, have provided two excellent sets of data.  The two sets of data show why estimating sigma from R_bar/d2 is so dangerous (d2=2.326 for a subgroup of 5).  Estimating s from R_bar/d2 assumes that the distribution is normal.  The following will show that it is better to use the definition of standard deviation, the formula, to calculate s. 
     
    Before I go through what I have seen in the data, let me state that I used both JMP and Excel for my calculations. Both data sets were examined in two ways: a distribution analysis and a control chart. Furthermore, I made the arbitrary assumption that the USL and LSL equal 12.4 and 7.4, respectively.
     
    Data Set 1: The data is normally distributed and Cpk ≈ Ppk.
    Distribution analysis: 
    1.  10 subgroup averages: Goodness of fit p-value = 0.8061 (JMP)
    2.  50 individual points: Goodness of fit p-value = 0.8598 (JMP)
    3.  Summary: The data is normally distributed.
    Control Chart:
    4.  Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates that the process is in control. The data passed all eight tests. (JMP)
    Excel Calculations:
    5.  Cpk = 0.644 (Excel)
    6.  Ppk = 0.668 (Excel) (agrees with the value calculated in JMP)
    7.  Summary: Cpk ≈ Ppk
     
    Data Set 2: The data is "not" normally distributed and Cpk > Ppk.
    Distribution analysis: 
    1.  10 subgroup averages: Goodness of fit p-value = 0.0601 (JMP)
    2.  50 individual points: Goodness of fit p-value = 0.0098 (JMP)
    3.  Summary: The data is not normally distributed.
    Control Chart:
    4.  Subgroup size of 5 and applied Shewhart's eight (8) tests: The data indicates a process that is out of control. (Two out of three points in a row in Zone A or beyond detects a shift in the process average or an increase in the standard deviation. Any two out of three points provide a positive test.) (JMP)
    Excel Calculations:
    5.  Cpk = 0.673 (Excel)
    6.  Ppk = 0.499 (Excel) (agrees with the value calculated in JMP)
    7.  Summary: The process is not stable; Cpk > Ppk.
     
    When a process is not stable and the data is not normally distributed, the value of Cpk can be larger than Ppk. What this means is that the estimate of s does not pick up the shift in the process mean. In contrast, the calculated value of s captures the variability of the process and gives a smaller value for Ppk. 
     
    Gabriel, I stand corrected for failing to reveal the assumptions behind my statement. Cpk and Ppk are roughly equal when the same set of data is used and the data is normally distributed. In other words, a process that is stable, with normally distributed data, gives Cpk and Ppk values that are roughly equal. Therefore, Cpk for a process that is not stable can make the process appear better than it actually is. That is why reported values of Cpk or Ppk should include the calculation method. 
     
    Because many managers and engineers are barely familiar with Cpk and even less familiar with Ppk, I calculate Ppk while I use the word Cpk in discussions. 
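     
    For anyone who wants to repeat the comparison on their own data, here is a minimal sketch of the two calculations in Python rather than Excel/JMP. The spec limits and d2 are the ones I assumed above (USL = 12.4, LSL = 7.4, d2 = 2.326 for subgroups of 5); the function name and the reshape into consecutive subgroups of 5 are my own choices, not anything taken from either package.
     
    import numpy as np

    def cpk_ppk(values, usl=12.4, lsl=7.4, subgroup_size=5, d2=2.326):
        x = np.asarray(values, dtype=float).reshape(-1, subgroup_size)
        grand_mean = x.mean()
        sigma_est = np.mean(x.max(axis=1) - x.min(axis=1)) / d2   # R_bar/d2 shortcut
        sigma_calc = np.std(x, ddof=1)                            # the calculated s (n-1 denominator)
        cpk = min((usl - grand_mean) / (3 * sigma_est), (grand_mean - lsl) / (3 * sigma_est))
        ppk = min((usl - grand_mean) / (3 * sigma_calc), (grand_mean - lsl) / (3 * sigma_calc))
        return cpk, ppk
     
    On stable, roughly normal data the two numbers come out close; a Cpk noticeably larger than Ppk is the warning sign described above.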
     
     
     

    0
    #93663

    Rick Pastor
    Member

    Kim:
     I am not sure why Montgomery would say that Ppk is a step backward.  I have several of his books but not the one in question.  I would love to know how Montgomery defines Cpk and Ppk and to see the section in question.   
    The fact is, if you calculate Cpk and Ppk using the same set of data, the two values should be very close. The difference between the two lies in how you calculate the standard deviation. In what I call the old days, before everyone had computers, the standard deviation was estimated using a shortcut: R_bar/d2. With computers there is no need to use a shortcut to calculate the standard deviation.
    Short-Term vs Long-Term Capability: Here is where, I think, the confusion gets even worse. Since the same set of data gives nearly the same value for Cpk and Ppk, we have to ask what short-term and long-term capability really mean. 
    Short-Term: Examines the data over a short period of time. For example, last week's data, or for a high-volume process it may be the data collected in a single day.
    Long-Term: Examines the data over a longer period of time, say a month or a year. 
    The long-term capability takes into account the drift in the mean. In contrast, the short term is a snapshot of the process. Clearly, a plot of short-term capability as a function of time shows how the capability changed over time. The long-term capability summarizes the change. It is how you use the data to improve the process that is really important.

    0
    #91761

    Rick Pastor
    Member

    John makes an interesting comment. I never realized that JMP's original platform was the Mac. I only used a Mac for a short period of time, and that is where I first started using JMP, probably version 2.0 or even 1.0. I used 3.0 on a PC and I am now using 4.0 on a PC. I have used PCs since the XT and DOS were around, and I do not feel that the user interface is strange at all. I would be interested in knowing what you find strange about the interface. 
    You are right that any package will take time, whether you train yourself or get training, to use effectively.

    0
    #91707

    Rick Pastor
    Member

    More Clarification! 
     
    All the steps in your process appear to be the times required to complete a process step. That does not, by itself, produce a one-sided distribution. The time needed to complete an event is equivalent to the measurement of a physical dimension; both measurements, assuming random error, will end up normally distributed. With time you typically have a one-sided specification limit. You might want a two-sided specification limit if errors are generated when someone works too fast. 
    As to the question of why your data is not normal: perhaps you do not have enough data points. How many data points are in the population? Did you use the same person to take all the data? Are the subgroups normal? You may find that answering these questions adds insight into how to shorten the process time. For example, if several people were involved in the study, you need to examine the data for each person. Perhaps one person needs training!

    0
    #91647

    Rick Pastor
    Member

    I use JMP, and I have used it for years. JMP will do everything you are asking for and more.  I have used other programs to a limited extent; so I will comment on JMP only.  I have trained others on using JMP, and this is what I have found.  
    I find the program easy to use. The people that I have trained have found the package easy to use once they have become familiar with it. Depending upon the data, I use Excel to rearrange the data into a format that makes it easier to do multiple analyses in JMP. As an example, I have had as many as 10 output variables to examine.  
    The nice thing about JMP is the graphics. The graphs make it easy to compare results. Even though you will find the program easy to use, after some effort to train yourself, here is what you may find to be your biggest problem: understanding the statistics.  
    For example, if you did an experiment at four different temperatures, the graphical output helps to show the differences in the results. Let's suppose that the ANOVA gives a p-value less than 0.05. Now you want to do a pair-wise comparison of the data to see which pairs of temperatures are statistically different from each other. Do you use a Student's t, Tukey-Kramer, etc.? Understanding the statistics is the biggest problem that you will run into with every program that you use.
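     
    To make that decision point concrete, here is a minimal sketch of the ANOVA-then-pairwise-comparison sequence in Python (scipy and statsmodels) with made-up temperature data; it only illustrates the statistical question and is not meant to mirror JMP's output.
     
    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    temps = np.repeat([150, 175, 200, 225], 6)                 # four made-up temperatures, 6 runs each
    response = np.concatenate([rng.normal(m, 1.0, 6) for m in (50, 52, 55, 55.5)])

    groups = [response[temps == t] for t in np.unique(temps)]
    f_stat, p_value = stats.f_oneway(*groups)                  # one-way ANOVA
    print(f"ANOVA p-value: {p_value:.4f}")

    if p_value < 0.05:
        print(pairwise_tukeyhsd(response, temps, alpha=0.05))  # Tukey-Kramer style pairwise comparison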
    The package contains more tools than you will use for the tasks that you outlined.  Nevertheless, they are there if you need them.  I have also found the technical support to be very helpful. 
    Hope this helps if you have not already purchased a package.

    0
    #89633

    Rick Pastor
    Member

    I have been out of the loop on this discussion for roughly a month and changed my e-mail address during that time. Now I am back, and I shall ask two questions. Gabriel, you said, "But with Cpk and Ppk, int's not just to ways of estimating the same things. You are estimating different things (inherent variation vs total variation), and hence 2 different names are needed." 
    You claim that Cpk measures inherent variation: 

    Question: Does s_est = R_bar/d2 estimate the s calculated from the equation that defines s, "the calculated s"?
    Question: What property of the data makes Cpk inherent?
     Assumption: Both Cpk and Ppk are based upon the same subpopulation.

    0
    #88929

    Rick Pastor
    Member

    Matt:
    It has been a while since I have had a chance to follow the discussion. First, it was pointed out that I made a mistake in the calculation of Cpk using R_bar/d2. When I went back and checked the results, I found that I had failed to divide by three. The Cpk and Ppk for a uniform distribution of 100 points were much closer in value. 
    Nevertheless, I know of at least two ways that Cpk is being calculated, and we do not spell out the assumptions. Both methods use a fixed subgroup size to perform the calculation. One uses R_bar/d2 to estimate s. The second method calculates s from the subgroup averages and the grand mean of the population, using the standard formula s = sqrt[ sum(Xi - X_bar_bar)^2 / (n-1) ]. The latter of the two methods is called the short-term sigma. The short-term s matches the sigma in control charts. 
    If a Cpk is reported without specifying the calculation method, I see confusion.      

    0
    #88776

    Rick Pastor
    Member

    Matt:
    What I was suggesting is that we use the term Cpk for all the different calculation methods. We then specify how we did the calculations. If you want to calculate s_est using R_bar/d2, just tell me how you did the calculation. If you are my supplier, I may ask you to calculate Cpk using a different method that I could then specify. 
    Even if I am making a control chart in Excel, I would not calculate s using R_bar/d2. I would use the standard formula for calculating s (which is still an estimate of the population's s). Of course, I would be using the x_bar of the subgroups to calculate s. 
    My original point involved the following question.  What is the probability that a manager picked at random from all the companies in the USA knows the difference between Cpk and Ppk?  My guess is that the probability is well less than 50%.  

    0
    #88775

    Rick Pastor
    Member

    First you have to determine your subgroup size. The size of the subgroup is something that you determine before you start the calculation. It seems that a subgroup size of 5 is used a lot.
     
    A subgroup size of 5 (5 batches with one data point per batch) gives d2 = 2.33. If you want a different subgroup size, you need to let me know and I can look up the number in a statistics book. 
     
    Assuming that you have your 100 data points from 100 batches and the data is in a column with the first data point in A2, you would perform the following:
     

    In cell B2 calculate =MAX(A2:A6)-MIN(A2:A6). 
    Copy and paste cell B2 into cell B7; because the references are relative, B7 becomes the range of the next subgroup. 
    Continue the copy and paste (B12, B17, and so on) until you have 20 subgroup ranges. 
    R_bar is the average of those 20 ranges. 
     
    This may not be the most efficient method of calculating R_bar but it will get you there.  If you have more than one data point per batch, then you need to do things a little differently.
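     
    If you would rather script it than drag formulas, here is a rough Python equivalent of the steps above. The file name is a placeholder, and it assumes the 100 values are in time order so that consecutive groups of 5 form the subgroups.
     
    import numpy as np

    data = np.loadtxt("batches.txt")                         # placeholder: 100 values, one per batch
    subgroups = data.reshape(-1, 5)                          # 20 subgroups of 5 consecutive batches
    ranges = subgroups.max(axis=1) - subgroups.min(axis=1)   # same as =MAX(...)-MIN(...)
    r_bar = ranges.mean()                                    # average of the 20 subgroup ranges
    print(r_bar, r_bar / 2.33)                               # R_bar and the sigma estimate R_bar/d2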
     
    With the 100 batches you may have been running X-bar and R control charts. The subgroup size is normally determined when you establish the control charts. For example, someone may decide to measure 5 random samples per batch. The sample size is determined by balancing several factors:

    Materials cost of the measurement
    Human resources available to make the measurements
    Time to make the measurement
    Consequences (cost/liability) of making a bad batch
    And there are probably more.
     
    Hope this helps.
     
     
     

    0
    #88690

    Rick Pastor
    Member

    Jeff:
    I use SAS Institute's JMP for my statistical analysis, so I do not have a copy of Minitab. I had someone follow the path
    Stat->Quality Tools->Capability Analysis->Help->See Also->Calculations
    and they gave me a copy of what they found. The sigma within is the standard deviation estimate based on within-subgroup variation. I assume that this calculates a sigma using the subgroup averages and the grand mean. JMP calls the Cpk calculated from this sigma the short-term Cpk. I don't think that either JMP or Minitab uses the old method of R_bar/d2 to estimate sigma.

    0
    #88654

    Rick Pastor
    Member

    If your customer requires Cpk, you have no choice but to provide it. On the other hand, I would still track Ppk. Ppk uses the calculated s. When you report Cpk or Ppk, you need to report how s was determined.     
     
    Let me just emphasize one point in the calculation of Cpk or Ppk.  Let’s assume that you examine the same set of data and you calculate Cpk and Ppk.  Both require the calculation of X_bar.  For that set of data, X_bar is the grand mean.  Therefore, the value of X_bar is the same for both Cpk and Ppk.  The difference between Cpk and Ppk is how you determine the value of s.
     
    Minitab's Six Pack gives both Cpk and Ppk. Cpk is called the within-group capability and Ppk is called the overall capability. I tried to find out how Minitab calculates Cpk, but the index in the help menu did not reveal the defining equation for Cpk. 
    The index in the help menu of SAS Institute's JMP shows how JMP calculates a short-term capability for a fixed subgroup size. JMP uses the grand mean of all the subgroups and the average of each subgroup to calculate s. I "think" that JMP's short-term Cpk is Minitab's within-group capability. 
    The only package I know of that calculates Cpk using s_est = R_bar/d2 is PQ Systems' Chart. In retrospect, it is important to know the assumptions made in reporting any metric.

    0
    #88513

    Rick Pastor
    Member

    Ashwin:
     
    Let me clarify a statement that I made.  I wanted to say, “If Cpk goes down you need to explain why!  If Cpk goes up, should I get a bonus? Ok forget the bonus.” 
     
    I believe that we fundamentally agree.  I do not think that we shall have a flame war.  In addition, I do think very highly of statisticians.  Nevertheless, I still believe that Ppk is a bad concept, and the term Cpk should be reported using the standard deviation of individual measurements. 
     
    The use of the term long-term capability for Ppk is misleading. By hiding the fundamental assumptions used in the calculation of Cpk and Ppk, we confuse people. We make them think we can see into the future. I believe that Ppk is the correct way to calculate the weekly snapshot view of a process. Unless someone improves the process, both Cpk and Ppk will fluctuate around some typical value. A bar chart of the weekly results over time will reveal that fluctuation. Get rid of Ppk, use only the term Cpk, and tell your people how you did the calculation.
     
    I will ask you the following question: what real information does the comparison of Cpk and Ppk provide? Here is an example for you to use.
     
    Cpk calculated in Excel:

    Generate 100 random numbers between 0 and 1.  Uniform distribution
    X_bar=0.457
    R_bar=0.641
    d2=2.33 for a subgroup of 5
    s_est = R_bar/d2 = 0.275
    Cpk = Min(1.65,1.97)
    Cpk=1.65 (Central Limit theorem makes things look better)
     
    Ppk calculated in JMP:

    Using the same 100 random numbers generated above
    X_bar = 0.457
    s = 0.284
    Ppk = Min(0.536,0.63)
    Ppk = 0.536
     
    Note: both s_est and s are estimates of the population's standard deviation. In other words, they are both snapshots of the process over the period of time that the data was collected. 
     
    Engineers in the early 1970s produced control chart limits, GR&Rs, Cpk, etc., using estimates of sigma because it could be done quickly. The shortcut made the calculation easier, but it did not make the estimate of the population sigma more accurate. On the other hand, I do not believe that any textbook written then suggested that the estimate was the correct way. It was a trick to get to the needed values. 
     
    Computers made the calculation of process capability easy.  I believe that Ppk is the original definition of Cpk.  I think Ppk was introduced to allow people stuck without computers (being polite rather than saying stuck in the past) to continue their use of statistics.  

    0
    #88465

    Rick Pastor
    Member

    Migs:
     
    I do not like the way the meaning of Cpk and Ppk have evolved.  Therefore, I hope I got your attention by saying that “Ppk is junk.” 
     
    The definition of Cpk is that it uses an estimate of sigma, while Ppk uses the calculated sigma. In this definition of Cpk, the term estimate of sigma means sigma = R/d2, where R is the range and d2 is a value that is automatically calculated by computer programs or found in a table, without the user knowing what it is all about. This Cpk is called the short-term capability. In contrast, Ppk uses the calculated sigma and hence is called the long-term capability. 
     
    This all seems too confusing to me. As an engineer, I would say that the true statisticians have lost contact with the concept of evolution. Old-timers who did not use computers could subtract two numbers much more easily than they could calculate a standard deviation. Therefore, control charts with a subgroup of 5 would be X-bar, R charts. 
     
    Think about it. If you wanted to produce an X-bar, sigma control chart in the old days with a subgroup size of 5, you had to get out your pencil, paper, and slide rule and start to work. Wow, back in the dark ages making a control chart using X-bar and sigma required lots of work. In general, calculating sigma in the dark ages was not much fun unless you had no friends and did not eat breakfast or lunch.  
     
    Engineers did have a life. Since most engineers had some kind of life, they developed shortcuts. Therefore, a table of d2 values, a subtraction, and one division allowed them to quickly calculate the value of sigma. Today we have computers that do sigma calculations in a flash without the need for a shortcut.
     
    What does the lack of evolution bring us today? Two confusing concepts! I say stop the confusion. First, Ppk is not a measure of long-term process capability. If I report a weekly Ppk using the calculated sigma based upon, say, 100 measurements (subgroup size 1, or individual measurements), is that really a measure of long-term capability from a manufacturing point of view? No! It is a snapshot of that process step.
     
    Here is the solution: get rid of Ppk and use Cpk only. More managers have heard of Cpk and fewer of Ppk. Just make sure that you let them know that you calculated sigma (subgroup size 1). To explain how the process behaves as a function of time, a long-term look at the process, I would create a bar chart showing the Cpk on a weekly or monthly basis. If Cpk goes up, you need to explain why. If Cpk goes down, should I get a bonus? Ok, forget about the bonus. 
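     
    A minimal sketch of that weekly bar chart in Python/pandas, assuming a file of individual measurements with a date column; the file name, column names, and spec limits are placeholders, and the Cpk here uses the calculated sigma of the individuals (subgroup size 1), exactly as argued above.
     
    import pandas as pd

    df = pd.read_csv("measurements.csv", parse_dates=["date"])   # placeholder file and column names
    usl, lsl = 10.0, 5.0                                         # replace with your spec limits

    def cpk(x):
        m, s = x.mean(), x.std(ddof=1)                           # calculated sigma, subgroup size 1
        return min((usl - m) / (3 * s), (m - lsl) / (3 * s))

    weekly = df.set_index("date")["value"].resample("W").apply(cpk)
    weekly.plot(kind="bar", title="Weekly Cpk (calculated sigma)")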

    0
    #88378

    Rick Pastor
    Member

    ETAL:
    Statistical indices: Different indices mean different things. Etal said, "…BEFORE yields are impacted?"  
    Cpk: Cpk does not provide information before yields are impacted. The purpose of this statistical index is to provide information regarding the capability of your process. For example, if you have a Cpk of 0.7 and you have been running at that value since the process was started, yield is poor and the process is costing your company money. The higher the value of Cpk, the less scrap you are producing.
    Low Cpk Does Not Mean Poor Quality: Just because you have a small Cpk does not mean that your company produces poor quality products.  If you do 100% inspection, you might catch most of the poor quality before shipment to a customer. Unfortunately, you are now spending money on inspection and scrap when that money could be used for new product development. 
    A process with a large Cpk produces less scrap, and with a large enough value you might be able to move away from 100% inspection.
    Dave said, "ALL STATISTICAL INDICES, LIKE CPK, ARE A JOKE…" Dave, I wish you all the success in the world. For the rest of us, I suggest that we take statistical indices seriously. Of course, we must try to use the tools properly. Statistical indices look into the process and tell us how we are doing. Since there are statistical fluctuations from many sources in a process, indices change as a function of time, and we must track them to know how the process is doing.

    0
    #88372

    Rick Pastor
    Member

    I avoided comments about the integrity of a company that would fudge its data to make itself look good when I read Dan's response. But Eileen, your comment, "…you are already in the crosshairs," was great. While Bullet Boy was a messenger, and it appears that he will be shot, I wonder if his management will suffer the same punishment. 
    Improving the process is the only way to be successful, in my book. I will leave with one more comment. The goal, objective, purpose of technology is to improve the condition of the human race. Company profits are critical to achieving that goal. If the end result is that the human race is marching backward through time so that only a few people reap the rewards, then we are doing something very wrong. The question to me is the following: was Bullet Boy trying to step up in the corporate world without concern for the consequences, or was management pressuring him? A person who robs a bank may steal a few thousand dollars and scare a handful of people. A president, with support from other top managers, who manipulates accounting numbers or facts reaps millions of dollars in their personal accounts. These people should scare the world, which, by the way, they have. 

    0
    #88371

    Rick Pastor
    Member

    Katherine:
    Dave's suggestion of creating a process map is a good one, and understanding what the customer wants (QFD) is also a very important task.  
    In many situations, understanding the process under investigation is reasonably clear from personal experience. Unfortunately, giving specific suggestions that could be useful in the short term requires a better understanding of your process. 
    Please explain what BPO and AHT mean. I think that AHT might mean average handling time, but I have no idea what BPO means. If you want specific suggestions, you need to give more detail. How many people handle calls? Is this a 24-hour/7-day operation or an 8-hour/5-day operation? We also need to know something about the capability of your measurement system, e.g., how accurately do you know the time required to handle a call? With a little more information, specific short-term answers might be available. 

    0
    #88364

    Rick Pastor
    Member

    Just because you have a small Cpk does not imply that you deliver nonconforming parts.  If you are doing 100% inspection prior to shipment, you should catch most of the nonconforming parts. 
    What a small Cpk means is that you are not making as much money, or any money, on the parts that you sell. The question comes down to this: what is your Cp? If it is pretty good, then you need to adjust the mean of the process. If the Cp is not good either, you have to worry about reducing the variation in the process, which typically is harder than shifting a mean. 
    Here is my suggestion:  Improve the process and everyone is happy.

    0
    #88333

    Rick Pastor
    Member

    David:
     
    You may be confusing three things: Cp and Cpk, control charts, and six sigma.
     
    Cp and Cpk: Let me give an example regarding Cp and Cpk. You have a process whose mean is 5.5, standard deviation, s, of 0.15, USL = 10, and LSL = 5. From the example below, you can have a great Cp and a poor Cpk. What this means is that your process has very little variation and therefore has the potential of being a very good process, because the Cp is 5.6. On the other hand, the Cpk is 1.11, which means that the process is not centered very well. Therefore, you are probably producing lots of bad parts. In this example, all you have to do is shift the mean, and that is typically easier than reducing variation.
     
    Example:
    Cp is calculated by
     
    Cp=(USL – LSL)/6s where  Cp=5.6
     
    Cpk is calculated from
     
    Cpk = min value of CPL and CPU  = 1.11
     
    Where CPL=(mean-LSL)/3s = 1.11 and CPU=(USL-mean)/3s =10
     
    As you can see, you can have a great Cp but a poor Cpk. 
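     
    As a quick check of the arithmetic, here is the same calculation as a small Python sketch; the numbers are the ones from the example above.
     
    def cp_cpk(mean, s, usl, lsl):
        cp = (usl - lsl) / (6 * s)
        cpu = (usl - mean) / (3 * s)
        cpl = (mean - lsl) / (3 * s)
        return cp, min(cpu, cpl)

    print(cp_cpk(mean=5.5, s=0.15, usl=10, lsl=5))   # roughly (5.6, 1.11)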
     
    Even so, a good Cp and Cpk do not mean that you have a process that is in control.
     
    Control Charts: Control charts are really different. Even with the above example you could have perfectly good-looking control charts. Control charts calculate lower and upper control limits (LCL, UCL) based upon your standard deviation. Even if the process is stable, you could still be producing bad parts because of the statistical fluctuations in the machining operation.
    Six Sigma: You probably do not have a six sigma process, although the data provided cannot be used to determine that. Six sigma requires knowledge of the number of defects generated and the number of defect opportunities. See the six sigma calculator at the top of the web page.     

    0
    #88302

    Rick Pastor
    Member

     
    What is too perfect is a good question, and I thank the original author for it. I ran a test to see how things might look if I generated a histogram from numbers drawn randomly from a normal distribution and applied a Shapiro-Wilk W test to determine normality. I obtained the following results.
     
    N=   52             p-value = 0.3145
    N=1000            p-value = 0.8300
    N=1999            p-value = 0.9493
    Having examined many distributions and tested them for normality, I could not have visually inspected the histograms and determined whether the data was simulated or experimental, although I have seldom seen p-values as large as 0.9493. One more point: when I tested the population sizes of 1000 and 1999, I had multiple outliers. Assuming that the software packages I used did their job correctly, my experiment was a good reminder that outliers do not indicate that a test was performed incorrectly.  
    I used Excel's random number generator, RAND(), to generate random numbers between 0 and 1, and NORMINV(probability, mean, stdev), where probability is the result from the random number generator, mean = 0, and stdev = 1, to produce the outcomes of the experiment. I then used JMP to analyze the data.
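     
    For anyone without Excel or JMP handy, here is a rough Python equivalent of the same experiment using scipy's Shapiro-Wilk test; the seed is arbitrary, so the p-values will differ from run to run and from the numbers above.
     
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    for n in (52, 1000, 1999):
        sample = rng.normal(loc=0, scale=1, size=n)    # same idea as NORMINV(RAND(), 0, 1)
        w, p = stats.shapiro(sample)
        print(f"N={n:5d}  Shapiro-Wilk W={w:.4f}  p-value={p:.4f}")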
     

    0
    #88226

    Rick Pastor
    Member

     
    Anonymous: Let me tell you what I would do, assuming that the following has not already been done. Measure the 13 points (12 on the perimeter, 1 at the center), keeping track of their (x,y) position in the disk and the (x',y') position on the perimeter, or (r,θ). You could make a series of contour plots that would give you a visual impression of the uniformity across the die. Making these measurements on multiple shots provides a statistical comparison of the locations within the die. (260 measurements per shot, with the selection of shots taking place over a period of time.)
    The point that I am making is that I would want to be sure that there was no systematic behavior in the machining of the die that would cause some disks to go out of control more than others. This is a lot of work, but it is better to improve the process and remove defects than to capture them afterwards. If you cannot improve the die, then you could at least look more carefully at the high-risk disks.
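     
    As an illustration only, here is a minimal Python/matplotlib sketch of one such contour plot, with made-up coordinates and measurements standing in for the real 13 points; the point is just how little work the visual check takes.
     
    import numpy as np
    import matplotlib.pyplot as plt

    # Made-up stand-ins for the 13 measurement locations on one disk and their values.
    angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
    x = np.concatenate(([0.0], np.cos(angles)))
    y = np.concatenate(([0.0], np.sin(angles)))
    z = np.random.default_rng(0).normal(10, 0.2, size=13)

    plt.tricontourf(x, y, z, levels=12)                # contour map from scattered (x, y) points
    plt.colorbar(label="measurement")
    plt.gca().set_aspect("equal")
    plt.title("Uniformity across one disk (illustrative)")
    plt.show()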
     
        

    0
    #88225

    Rick Pastor
    Member

     
    Point 1: At the top right-hand side of this page there is a sigma calculator. 
    Point 2: I "like" Logon Luo's description of what you need to do. I would like to add that you probably have more opportunities for creating a defect than three. Each opportunity is a potential point of improvement.   
    Point 3: The calculation of a sigma level is not very important if all that you ever processed was three replies ("Reply to Customer on Negative Feedback"). Let me extend Logon's example: suppose you processed 100 total replies and you had the 9 defects from the three replies that you used in your example. What would your DPMO and sigma level be? (The arithmetic is sketched after Point 4.) As you map the process per Logon's recommendation, locate other opportunities in which defects can be introduced. 
    Point 4:  Based upon the three steps, you might think about obtaining a yield and sigma level for each step.  Then you have to think about rolled throughput yield.
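     
    Here is the arithmetic for the extended example in Point 3 as a small Python sketch; the 1.5 sigma shift is the usual convention for converting DPMO to a sigma level, so check it against whatever definition your organization uses.
     
    from scipy.stats import norm

    def dpmo_and_sigma(defects, units, opportunities, shift=1.5):
        dpmo = defects / (units * opportunities) * 1_000_000
        return dpmo, norm.ppf(1 - dpmo / 1_000_000) + shift      # usual 1.5-sigma-shift convention

    print(dpmo_and_sigma(defects=9, units=100, opportunities=3)) # about (30000.0, 3.38)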
     
    Have fun!!

    0
    #88170

    Rick Pastor
    Member

     
    BOB WOW Interactions at What Sacrifice? The following will show that a BOB WOW experiment "can" and "cannot" provide information on interactions, but you have to make statistical sacrifices. This sacrifice has not been mentioned in previous replies to the original message. One pair of BOB WOW samples does not allow for an analysis of interaction, while more than one pair allows the experimenter to evaluate interactions. Dan states that 5 to 6 is a good number. An example illustrates why one pair does not include interactions and more than one pair does. At the same time, the example raises another question: how do you perform the experiment with more than one pair?
    BOB WOW Without Interaction: Let's set the experiment up. We will perform a full factorial experiment:
     Let  A and B be a two factor two level full factorial experiment where
     A = your part    B = the subassembly into which your part fits
    WOW:  A = -1      B = -1                 (A,B)=(-1,-1)
    BOB:    A = +1     B = +1              (A,B)=(+1,+1)
    Assume that the customer selects one good and one bad "fully assembled part"; that is, there are no repetitions. The customer decides to use a continuous scale to describe the idea of good (10) and bad (0). The customer performs the BOB WOW experiment with the following results. 
    Pattern     A      B      Outcome
    -+         -1      1         3
    +-          1     -1         8
    --         -1     -1         0
    ++          1      1        10
     
    Running a full factorial using the JMP software package gives
     
    Analysis of Variance
    Source      DF    Sum of Squares    Mean Square    F Ratio
    Model        3    62.750000         20.9167        .
    Error        0    0.000000          .              Prob > F
    C. Total     3    62.750000                        .
     
    By including the interaction A*B in the model
    Outcome = c0 + c1*A + c2*B + c3*A*B + error,
    there are no degrees of freedom left for error. 
    BOB WOW With Interactions: On the other hand, if the experiment performed by the customer used two good and two bad fully assembled parts (one repetition of each pattern), the results might look like the data in the table below.
    Pattern     A      B      Outcome
    -+         -1      1         3
    +-          1     -1         8
    --         -1     -1         0
    ++          1      1        10
    -+         -1      1         4
    +-          1     -1         7
    --         -1     -1         2
    ++          1      1         9
    Running JMP full factorial again gives 
    Analysis of Variance
    Source      DF    Sum of Squares    Mean Square    F Ratio
    Model        3    88.375000         29.4583        33.6667
    Error        4    3.500000          0.8750         Prob > F
    C. Total     7    91.875000                        0.0027
     
    This time we have enough data that we can begin to evaluate the interaction between factor A and factor B.
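     
    For readers without JMP, here is a minimal sketch of the same replicated analysis in Python with statsmodels. It reports term-by-term sums of squares rather than a single Model line, but the Residual row should match the Error row above (SS = 3.5, DF = 4).
     
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # The replicated BOB/WOW data from the second table above.
    df = pd.DataFrame({
        "A":       [-1,  1, -1,  1, -1,  1, -1,  1],
        "B":       [ 1, -1, -1,  1,  1, -1, -1,  1],
        "Outcome": [ 3,  8,  0, 10,  4,  7,  2,  9],
    })

    model = smf.ols("Outcome ~ A + B + A:B", data=df).fit()
    print(sm.stats.anova_lm(model, typ=1))   # term-by-term SS; Residual matches the Error row above
    print(model.params)                      # estimates of c0, c1, c2, c3 from the model above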
     
    What is the Statistical Sacrifice?: In order to calculate averages and standard deviations, we select n samples randomly from a population without replacement. For example, if we want to know the average height of men in Illinois, we do not measure the same man n times and take an average. In contrast, if the components in a BOB WOW experiment are simply exchanged, the same experimental unit is used in different runs, and this does not make for a great DOE; it compromises independence. 
     
    Perhaps a better way is to treat the outcomes (-1,+1) and (+1,-1) as a B vs. C experiment.  In this case, we can say that the BOB WOW does not detect interaction.  Also, the experiment requires multiple runs and the customer might want to do the work.
     

    0