Tips on how to explain normal distribution
This topic contains 22 replies, has 20 voices, and was last updated by Richard Thomas 9 years, 9 months ago.
I need to deliver some training to a group of approx 10 machine operators. The purpose of this is to explain why they must achieve a certain seal strength from the samples they take at first-piece trial.
We state a minimum seal strength in our specification (there is no max value) but I want them to make sure that the first-piece values are not near this minimum. I am suggesting that they apply the mean minus 3 std dev logic and thus would like to explain normal distribution to them.
Their knowledge of basic statistics is minimal/non-existent, so does anyone have any suggestions/examples I can use to explain this simply?
I had thought of using the distribution of their height as an example, but am conscious that I will be working with a small sample size.
Any thoughts?
Thanks
If you have prior seal strength data, take a look at it yourself and select a subset that is approximately normal, then plot it manually on a large sheet of graph paper. (Seal strength is one of those properties that is bounded: the seal strength cannot exceed the strength of the material. So, depending on your current process control, your data may be very non-normal.) If you want them to see the histogram evolve, print your pre-selected set of data in the form it would have after being recorded on the shop floor, and ask someone to read the numbers to you while you plot them in front of the class. It won’t take long to plot 30 or so data points, and it will let everyone see just how the histogram was generated.
Run a simple bean count and show them how many data points of the total population are contained in the regions below 0, 1, 2, and 3 standard deviations from the mean. Take this thought and turn it around and point out to everyone what this would mean if the lower spec limit were located at these points. Remind everyone that when a machine is set up for a run you are trying to hit the target (mean) the first time. Point out that if you have done the best you can and the first piece comes back with a value that is exactly on the minimum, this would suggest that, if your process is behaving as it has in the past, there is an excellent chance that 50% of the run will be rejected. From this it is an easy jump to justifying the requirements for a first piece trial.
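For anyone who wants to check the bean count against theory before the class, here is a minimal sketch. It assumes the data are exactly normal, which real seal strength data may not be, and the function name `fraction_below` is made up for this illustration.

```python
from statistics import NormalDist


def fraction_below(k_sigma: float) -> float:
    """Fraction of a normal population lying below (mean - k_sigma * sd)."""
    # A standard normal (mean 0, sd 1) is enough: the fraction depends only
    # on how many standard deviations the limit sits below the mean.
    return NormalDist().cdf(-k_sigma)


for k in (0, 1, 2, 3):
    share = fraction_below(k)
    print(f"lower spec limit at mean - {k} sd: {share:.2%} of the run falls below it")
```

Under that assumption the shares come out at roughly 50%, 16%, 2.3% and 0.13%, which is the picture behind the mean minus 3 standard deviations rule.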
If you use graphs to illustrate each of the points as you move through them you will probably get your point across. I’ve used this approach with children as young as 9 years of age and it has worked well.
To help get active participation, ask each of the operators to identify a bill they have all paid, e.g. grocery bill, telephone bill, etc. Then agree on a common one, get each to supply their own (each should have a total number of entries of more than 25), and do the maths on the sample of 250.
Ar
First, explain the concept to them from a statistical, technical point of view. If you have a bead box (quincunx), and can demonstrate the concept to them, so much the better. At this point, you will have noticed that their eyes have glazed over. The lights are out. No one is at home. Now, lead your class to the company cafeteria (hopefully you have one). Put a bag of popcorn in a microwave oven and turn it on. As the corn begins to pop, point out that the popping begins slowly, with just a few kernels, builds until it sounds like they are all popping at once, and then tapers off. Bring the point home by explaining that if you make a histogram with cells of 10 second durations, and put all the corn that pops during the first 10 seconds into the first cell, all the corn that pops during the second 10 seconds into the second cell, and so on, you will have a histogram that resembles a normal distribution. (Be careful not to burn the corn.) You and the class may now enjoy the fruits of your labor.
I have used this example for years. Whenever I use it I see the lights go on. The moral: take something from life that most people know about and that also demonstrates the principle. Meet them where they are. Build a bridge from there to where you want them to go. Standing on the opposite shore and yelling at them to come over is an exercise in futility, yours and theirs.
Marc Richardson
Sr. Q.A. Eng.
Have I heard simple?
Get 3 dice, throw them and tell the operators: “Let’s assume that the sum of the three dice is the strength of a seal”. It is easy, fast, and you can make the sample as big as you want. It looks very much like a normal distribution even though it is not (you will never get values under 3 or over 18, which is about ±2.5 sigmas, but you don’t need to tell them that).
You can see the normal shape appear by marking every trial in a histogram chart. The shape will become apparent faster if you group the results (3 and 4 in one bin, 5 and 6 in the next, and so on).
You can also simulate how poor the lower tail strength can be even when they got a good result in one initial sample, and how they can improve on that if they take several samples and average them.
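If real dice are not to hand, the exercise can be rehearsed on a computer. The sketch below is only one way to do it: the seed, the 200 throws, the group size of 5 and the helper names `roll_sum` and `text_histogram` are all assumptions made for the illustration.

```python
import random
from collections import Counter


def roll_sum(rng: random.Random, n_dice: int = 3) -> int:
    """Throw n_dice fair dice and return their sum (the pretend seal strength)."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))


def text_histogram(values, bin_width: int = 2, start: int = 3, stop: int = 18):
    """Group results as suggested above (3-4, 5-6, ...) and draw one '#' per throw."""
    counts = Counter((v - start) // bin_width for v in values)
    lines = []
    for b in range((stop - start) // bin_width + 1):
        lo = start + b * bin_width
        hi = lo + bin_width - 1
        lines.append(f"{lo:2d}-{hi:2d} | " + "#" * counts.get(b, 0))
    return lines


rng = random.Random(1)  # fixed seed so the rehearsal is repeatable (assumption)
rolls = [roll_sum(rng) for _ in range(200)]
print("\n".join(text_histogram(rolls)))

# Averaging several samples: each value is the mean of 5 separate throws.
averages = [sum(roll_sum(rng) for _ in range(5)) / 5 for _ in range(200)]
print("lowest single result:   ", min(rolls))
print("lowest 5-sample average:", min(averages))
```

The last two lines are there for the point about the lower tail: single throws will occasionally land very low, while averages of several throws tend to stay closer to the middle.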
Good luck.
That is perhaps the best example of explaining the ND I have ever heard. Excellent stuff!
Ronan,
It often helps to stay away from technical data when trying to explain statistics to a new group. I have had a lot of success by using shoe sizes of men in the surrounding area. I see you were already thinking along these lines by using the group’s height.
Arm yourself with a flip chart and a marker, and get the group involved in suggesting the average shoe size. You could use this size as the highest scoring, and then smaller scores either side of this size. You can decide your own sample size. Stand back, draw yourself a pretty little normal distribution curve….hey presto!! One group of newly educated employees.
Hope this helps!
I am sure you have already got a lot of interesting examples to collect data.
I suggest it is better to first explain unilateral and bilateral tolerances, then the normal distribution, taking data by any of the methods suggested by our friends in this forum, and then come to Cp and Cpk. Then explain that in your case the minimum value is given as the spec, introduce Cpk lower and Cpk upper, and explain which Cpk needs to be monitored in your case. Now you can make the point about why the first-piece result needs to be a safe margin away from the minimum spec right from the start.
Use a quincunx box to explain. I have a small dBase program that demonstrates the quincunx box in software; get back to me by email and I will forward it to you.
You can also ask everyone to count the number of coins in his or her pocket, to collect data and make it interesting.
I have always found it helpful to think of the normal distribution, and process capability, as a car on a road. Most people can visualize this. Similar to the popcorn analogy (good one by the way), they can visualize driving down a road. Most of the time they are staying toward the center of their lane, but occasionally they start to drift towards the side, or towards the center of the road. If they were to take a reading of where they are on the road every so many yards, they could see that most of the readings would be grouped around the center, and would taper off as they moved towards the side. You can then explain that the tighter they stay to the center of their lane, the less variation their driving (and thus the process) has.
It also helps to explain Cp and Cpk. If they view the edges of the road as the specification limits, then Cp is how many widths of the car (distribution) can fit on the road. Cpk is how close the car is to the edge of the road. A negative Cpk means you aren’t even driving on the road, and it’s telling you how close you are to getting back on the road.
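For the trainer's own preparation, the road picture can be put into numbers with the usual textbook definitions, Cp = (USL - LSL) / (6 sd) and Cpk = the smaller of (mean - LSL) / (3 sd) and (USL - mean) / (3 sd). The sketch below uses those definitions; the road width, car positions and seal strength figures are invented purely for illustration.

```python
def cp(lsl: float, usl: float, sd: float) -> float:
    """How many 'car widths' (6 standard deviations) fit between the road edges."""
    return (usl - lsl) / (6 * sd)


def cpk(mean: float, sd: float, lsl: float = None, usl: float = None) -> float:
    """Distance from the mean to the nearest spec limit, in units of 3 sd.

    With only a minimum spec (as for seal strength) pass just lsl, which
    gives the one-sided 'Cpk lower'.
    """
    sides = []
    if lsl is not None:
        sides.append((mean - lsl) / (3 * sd))
    if usl is not None:
        sides.append((usl - mean) / (3 * sd))
    if not sides:
        raise ValueError("at least one spec limit is required")
    return min(sides)


# Hypothetical road: edges at 0 and 12, car spread sd = 1.
print("Cp:", cp(0, 12, 1))                               # 2.0, two car widths fit
print("centered:", cpk(6, 1, lsl=0, usl=12))             # 2.0
print("near the edge:", cpk(3, 1, lsl=0, usl=12))        # 1.0
print("off the road:", cpk(-1.5, 1, lsl=0, usl=12))      # -0.5, negative Cpk

# Hypothetical seal strength: minimum spec 10, process mean 16, sd 2.
print("one-sided Cpk lower:", cpk(16, 2, lsl=10))        # 1.0
```

A Cpk lower of 1.0 corresponds to the mean sitting exactly 3 standard deviations above the minimum, which ties the analogy back to the mean minus 3 standard deviations logic in the original question.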
I hope this helps
Scott
Ad 1) Use Taguchi’s method to guarantee your seal strength (parameter design / robust design / quality engineering = different names for the same thing). If you need assistance here, I’d be happy to help you along. Input is engineering knowledge, statistics is not required; output is a robust product with fine Cpx’s, when your process provides the capability.
Ad 2) I was tempted proposing throwing dice as well. However, many times they don’t provide a uniform probability distribution. So the chance is low to actually plot a ND from trials.
Better take a bag with 10 balls (or letters or other thingies of the same size). Let, e.g. 3 of them be black, 7 of them be white. Blindly pick 5 together out of it. Make histograms for finding 0, 1, .. 5 white balls. It should peak at 3.5.
The ND is much easier to see here. As an aside you can line out the underlying ‘law’, which makes ND’s appear in reality:
you have a number of independent events (like the popping corns)
each event has the same likelihood to happen
the number of simultaneous independent events is high
=> so in the limiting case we’ll find the continuous ND
I think you can relate those observations to the other examples mentioned here, like size of shoes:
you observe many men
each man’s shoe size is independent of the other ones (what has Jim to do with Frank in terms of shoe size?)
you observe quite many men
the more men you observe, the better the histogram matches the mathematical ND
Finally, point out how convenient it is to >consider< certain events as a high number of independent events with constant probability: using the ND approximation makes it very easy to describe the complex event.
Michael Schlueter
To start explaining normality I prefer not to use data / examples based on their work-environment. I’ve noticed that when using such examples people tend to focus primarily on why individual measurements are what they are, because the process is too familiar to them (so they start looking for explanations).
I’ve also found that using bead boxes and such to start the introduction can be counter-productive. Some people (it depends a bit on the culture and background of the group) are suspicious about these types of ‘theoretical experiments’. It is too unfamiliar for them and does not have enough relation to their idea of reality.
So we need something that is real & tangible: could be shoe-sizes, could be tins of canned food, bags of popcorn, …. I prefer to use something that is clearly the result of a definable process. Preferably they have no actual knowledge of the production-process, so that they don’t start focusing on causes of individual measurements. Along the way you’ll need to explain the process (at least I prefer to do it), so make sure you know a bit about it. Do not over-simplify the process-detail or they might start to doubt the reality of the exercise again; do not over-detail either, or you’ll never get your message across.
(Shoe-sizes are real-life measurements, but did you ever try to explain what the process is that leads to people having different shoe-sizes? ;^)
E.g. you could use bags of popcorn (and if you’re working for top-management => tins of caviar ;^)
Buy 30 bags (of the same advertised weight) of one brand and weigh them all, plot the weights and you’ll see something like a normal curve appearing. Hopefully this curve is centered around the advertised weight, giving you a starting point to talk about averages and deviation.
Buy 30 bags (of the same advertised weight as the first load) of another brand and repeat the exercise; hopefully a normal curve appears again. You can then overlay both curves and hopefully see a difference in centering & width of the curve, giving you starting points to start talking about process capability (and how the process of Brand X is better controlled than the process of Brand Y).
Now you can start to explain the production-process and ask them what could be causing the differences. Or stated otherwise, introduce them to the concepts of special and common causes.
And so on. Before you know it, you’ll have explained everything you need.
Some things to keep in mind:
Whatever product you choose, do some private experiments first to make sure the measurements show the characteristics of normality clearly enough (some manufacturers are already achieving quite high process-capability, which is fine for them, but makes it more difficult for you to show what normality is about). Products of which the production-process is not super-automated are good candidates (people tend to produce larger variations ;^)
Also when choosing different brands, make sure they are from different manufacturers (different brands from the same manufacturer tend to produce very similar distributions, because mostly they are produced by the same production-process).
Try choosing a low-cost brand & a high-cost brand. If you’re lucky you might demonstrate value-for-money. This also gives you a way to start talking about ‘Fitness for Use’ and different perceptions of quality to different people.
It is always nice if you can let the group choose the product that will be used; this makes it even more real for them. Of course you should be steering them in the direction of products/brands you have already experimented with (and away from work-environment examples). This is a good exercise for your own facilitating skills.
And last advice => MAKE IT FUN
Ronan,
I teach a Yellowbelt class in which part of the discussion focuses on the normal distribution curve. I have found that practical real-world examples work best to demonstrate that most data, by nature, tend to follow a normal distribution. Once they gain an understanding of a real-world example, you can close the gap by comparing your real-world example to some actual on-the-job data.
The example I focus on is the distribution of batting averages of major league baseball players in a given year (minus pitchers of course because they skew the data with their pitiful averages). Almost everyone can relate to this. After all, it is America’s favorite pastime (second only to reality TV shows). After some searching on the internet, I came across a database of baseball statistics. I was able to download this data and manipulate it so I could access the data I needed. This data forms a very smooth normal distribution curve that makes discussion effective and easy. Everyone enjoys this example and uses it as a frame of reference anytime they need to remember the definition of a normal distribution.
Good luck!
Oh yeah, one last thing. KEEP IT SIMPLE!!! I can’t stress this enough! Forget Cp, Cpk, Taguchi or any other gaudy statistical association because it will only intimidate and scare those who do not need to understand these concepts. Based on your description of the audience and your purpose for delivering this information, simplicity is all you need. Your goal is to deliver an understanding, not to prepare them for a stats exam. I’m always amazed at how often people like to throw out perplexing statistical jargon in order to explain even the simplest of concepts.
A picture is worth a thousand words, but a hands-on simulator is priceless. I found a shareware program that illustrates SPC. It is a fine tool to demonstrate normal distributions. Take a look at it on http://www.symphonytech.com, and click on the quincunx simulator link.
Steve Clapp
Spike,
That’s a fantastic idea! I have a shift of employees that does nothing but talk about baseball — I’m already thinking of how I can best use your idea :). Thanks!
Phil
Michael, could you further explain the following?
“I was tempted proposing throwing dice as well. However, many times they don’t provide a uniform probability distribution. So the chance is low to actually plot a ND from trials.”
Because of the central limit theorem, the sum of 3 dice will give you a pretty normal-shaped distribution, even if the dice are a bit unfair. I used it in an SPC course and it worked very nicely.
“Better take a bag with 10 balls (or letters or other thingies of the same size). Let, e.g. 3 of them be black, 7 of them be white. Blindly pick 5 together out of it. Make histograms for finding 0, 1, .. 5 white balls. It should peak at 3.5.”
First of all, if you have 10 parts with only 3 not-white and take 5, you will get at most 3 not-white, so at least 2 white. So do not bother making histograms for finding 0 or 1, because you will never find 0 or 1.
The number of white balls in the sample will be hypergeometrically distributed, not normally.
Furthermore, you will have only 4 values (2, 3, 4 and 5), so I don’t think it will be very didactic anyway.
If it was an “infinite” population with 70% white, the occurrence of “white” in a sample of 5 individuals would be binomially distributed, not normally.
Neither the sum of 3 dice nor the number of white balls in a sample (either from a finite or infinite population) gives you a real normal distribution. But as you can see in the following charts, the “sum of 3 dice” is a much better approach.
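The white-ball argument can be checked with a few lines of arithmetic. The sketch below is not the calculation behind the charts mentioned in the post; it simply evaluates the standard hypergeometric and binomial formulas for the bag described there (10 balls, 7 white, 5 drawn), with function names chosen for this illustration.

```python
from math import comb


def hypergeom_pmf(k: int, population: int = 10, white: int = 7, draws: int = 5) -> float:
    """Chance of exactly k white balls when drawing without replacement."""
    non_white = population - white
    # math.comb returns 0 when asked to choose more items than exist, so
    # impossible outcomes (0 or 1 white here) come out as probability 0.
    return comb(white, k) * comb(non_white, draws - k) / comb(population, draws)


def binom_pmf(k: int, n: int = 5, p: float = 0.7) -> float:
    """Chance of exactly k white in n draws from an 'infinite' 70%-white population."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in range(6):
    print(f"{k} white: bag of 10 = {hypergeom_pmf(k):.3f}, "
          f"infinite population = {binom_pmf(k):.3f}")
```

For the bag of 10, only the outcomes 2, 3, 4 and 5 have a non-zero chance, with 3 and 4 equally likely, which is why the histogram has so few bars to show.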
Some years ago I had to explain the normal distribution to shop floor operators without formal training in statistics. I opted not to give a lecture, but to take samples and make graphs with them. This approach got the operators’ attention, and then I started to show the basics of SPC. Only after that did I explain the normal distribution. I found this approach very helpful, even though it is not the normal way we look at the problem.
Another way may be to use a normal distribution game, like a box with black and white beans (look for Statistical Quality Control by Eugene L. Grant, McGraw-Hill Series in Industrial Engineering and Management Science).
I like the analogy you have developed! I would like to use this information for future training sessions. Can you forward the file you have developed for my presentation purposes?
Nice charts. Excel, right?
Would you send me the excel file so I can see how the histograms and the normal curve are done?
Thanks!
Spike,
Would it be possible to have you send me your baseball stats and allow me to use it during Green Belt training that I conduct? I’d appreciate it. Your approach is very simple. We are currently using measurements of the class participants, but it’s getting a little old. One can only measure so many shoe sizes, or deal with the height of participants before getting bored as an instructor!
Thanks!
Steve
Tom,
Even if I had your e-mail address (that you didn’t post) I could not send you the file because I never saved it. I just copied the charts as .gif pictures.
However, I just used the normdist, binomdist and hypgeomdist functions.
For the sum of 3 dice, I listed all the combinations with their sum, took the average and standard deviations of the sums (to apply to the normdist) and counted the occurrences of each result using {sum(if(…)}.
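The same steps can be followed outside Excel. This is a rough Python equivalent, not the poster's spreadsheet: it lists all 216 combinations, takes the mean and standard deviation of the sums, counts each result, and sets the normal density next to the exact share. Using the population standard deviation (`pstdev`) is an assumption, since the post does not say which variant was used.

```python
from collections import Counter
from itertools import product
from statistics import NormalDist, mean, pstdev

# All 6 * 6 * 6 = 216 equally likely combinations of three dice, with their sums.
sums = [sum(faces) for faces in product(range(1, 7), repeat=3)]
counts = Counter(sums)

mu = mean(sums)       # 10.5
sigma = pstdev(sums)  # about 2.96 (population standard deviation, an assumption)
normal = NormalDist(mu, sigma)

print(f"mean = {mu}, sd = {sigma:.3f}")
for total in range(3, 19):
    exact = counts[total] / len(sums)   # exact share of combinations with this sum
    approx = normal.pdf(total)          # normal curve with the same mean and sd
    print(f"{total:2d}: exact {exact:.4f}  normal {approx:.4f}")
```

Because the sums are one unit apart, the normal density at each whole number can be read directly against the exact share, which is what makes an overlay of histogram and curve possible.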
Ronan,
I’m doing an SPC class in a few days. I agree with those who suggest straying away from intimidating statistical jargon and also feel that you should keep it simple and relevant.
As far as the ND goes, in order to show them the difference between approximately normal and definitely non-normal, what I’m doing is having them work in 3 groups to build a histogram by rolling 2 dice 50 times. One group is given loaded dice (one of the two dice is weighted).
Yes, it’s true that rolling two dice 50 times will not be exactly normal, but it gets the point across when compared to the block produced by the loaded dice. You can also use the opportunity to mention that one simply needs an approximately normal curve to do meaningful SPC; most controlled process data will not be normal anyway, because engineers are constantly tinkering with the process in order to improve it or meet customer specs.
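To get a feel for what the two groups' histograms should look like before running the class, the expected shapes can be worked out exactly. The loading used below (the weighted die shows a 6 half the time) is purely a made-up assumption; a real loaded die will behave differently.

```python
from itertools import product

FAIR = {face: 1 / 6 for face in range(1, 7)}
# Hypothetical loading: a 6 comes up half the time, the other faces share the rest.
LOADED = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}


def sum_distribution(die_a: dict, die_b: dict) -> dict:
    """Exact probability of each total when the two dice are thrown together."""
    dist = {}
    for (a, pa), (b, pb) in product(die_a.items(), die_b.items()):
        dist[a + b] = dist.get(a + b, 0.0) + pa * pb
    return dist


def show(title: str, dist: dict, rolls: int = 50) -> None:
    """Print the histogram a group would expect, on average, after `rolls` throws."""
    print(title)
    for total in sorted(dist):
        print(f"{total:2d} | " + "#" * round(dist[total] * rolls))


fair_pair = sum_distribution(FAIR, FAIR)
loaded_pair = sum_distribution(FAIR, LOADED)
show("two fair dice", fair_pair)
show("one fair die, one loaded die", loaded_pair)
```

With this particular loading the fair pair gives the familiar symmetric peak at 7, while the loaded pair is pushed towards the high totals and loses its symmetry.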
How did your session go (if you’ve already given it)?
Thanks,
Rob
Try using small bags of multi-coloured sweets.
Give two or three bags to each candidate (this increases the sample size).
Pick a particular colour and ask them to count how many there are in each packet. Chart this on your whiteboard!
Very easy and they can eat the experiment after!!
Richard