Determine # & Size of Bins for Histogram
Message: 34726 Posted by: Mike Posted on: Tuesday, 21st October 2003
Hi,
I have several thousand rows of data which I need to group into appropriately sized bins for a histogram. The problem is in determining the correct number and size of the bins, i.e. if I had a bunch of test scores and didn't have pre-determined bins (A,B,C,D,F) that were already sized (>= 90, 80-89, 70-79, 60-69, <60), how would I figure out the number and size of the bins?
Thanks!
Message: 34732 Posted by: Gabriel Posted on: Tuesday, 21st October 2003
I learned the following steps some time ago and have used them ever since. They have no theoretical basis as far as I know, but they seem to work fine:
1) Number of bins (first trial): n1=sqrt(N-1)-1, where N is the number of individuals.
2) Bin size (first trial): s1=(max-min)/n1, where max and min are the highest and lowest values in the data.
3) Bin size (definitive): s = s1 rounded UP to a multiple of the precision of the data. For example, if s1=0.32 mm and the data is recorded to 0.1 mm, then s=0.4 mm.
4) The lower limit of the first bin is min-(1/2 of the precision). The upper limit of the first bin is that lower limit + s, which is also the lower limit of the second bin. Add another s to get the upper limit of the second bin, which is also the lower limit of the third bin, and so on.
As said, this is a guideline. If you don't like the result you can increase or reduce the bin size, but always keep the size a multiple of the precision as said in point 3) (if not, some bins will contain more possible results than others, and the bars of those bins will be artificially higher), and always keep the limits of the bins "between" possible readings as said in point 4); if not, the bins will be "unbalanced".
For example, a bin "larger than 10, up to 12" has its center at 11, but if the resolution is 1 the possible results are 11 and 12, which have their center at 11.5. A bin (10.5; 12.5) has its center at 11.5, which matches the center of the possible results. And, by the way, you don't have to bother deciding whether it is "larger" or "larger or equal" than 10.5 and "lower" or "lower or equal" than 12.5, because you will never have a data point "equal" to 10.5 or 12.5 anyway.
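The four steps above can be sketched in Python roughly as follows. This is only an illustration of the guideline as described in this post, not a reference implementation: the function name `gabriel_bins` and its `precision` argument are made up for the example, and the rule used to decide when to stop adding bins (stop once the maximum is covered) is an assumption, since the post does not spell it out.

```python
import math

def gabriel_bins(data, precision):
    """Return (bin_size, edges) following steps 1) to 4) of the post above.

    `precision` is the resolution of the readings (e.g. 0.1 for 0.1 mm data).
    """
    n = len(data)
    lo, hi = min(data), max(data)

    # Step 1: first-trial number of bins, n1 = sqrt(N-1) - 1.
    n1 = math.sqrt(n - 1) - 1
    if n1 <= 0:
        raise ValueError("need more than 2 data points")

    # Step 2: first-trial bin size.
    s1 = (hi - lo) / n1

    # Step 3: round UP to a whole number of precision steps.
    # round(..., 9) guards against float noise such as 3.0000000001.
    steps = max(1, math.ceil(round(s1 / precision, 9)))
    size = steps * precision

    # Step 4: start half a precision step below the minimum, then add
    # one bin size at a time until the maximum is covered (assumed stop rule).
    start = lo - precision / 2
    n_bins = math.floor((hi - start) / size) + 1
    edges = [start + k * size for k in range(n_bins + 1)]
    return size, edges
```

A call such as `size, edges = gabriel_bins(scores, precision=1)` (with `scores` being your own list of whole-number readings) returns limits that all fall halfway between possible readings, so no data point can land exactly on a limit.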
Message: 34733 Posted by: Heebeegeebee BB Posted on: Tuesday, 21st October 2003
Check this link out:
http://www.sytsma.com/tqmtools/hist.html
Message: 34909 Posted by: Mike Posted on: Thursday, 23rd October 2003
Thanks for the replies Gabriel & Heebeegeebee!
The following two are from published studies:
1) bin width = 3.49*σ*N^(-1/3)
2) bin width = 2*(IQR)*N^(-1/3)
where σ = standard deviation; IQR = 75th percentile - 25th percentile; N = number of samples; and the number of bins would be based on dividing the dataset range by the bin width.
This one is a rule of thumb I found on the Internet:
3) number of bins = 1+3.3*ln(N), where the bin width would be the dataset range divided by the number of bins
4) I've also tried Excel's built-in data analysis tools.
5) Gabriel's method
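For anyone who wants to try rules 1) to 3) on their own data, here is a minimal Python sketch using only the standard library. The function names are invented for this example. Two details are assumptions because the post does not specify them: the sample standard deviation is used for σ, and the quartiles come from the default method of `statistics.quantiles` (other quartile definitions give slightly different IQR values). Rules 1) and 2) are commonly attributed to Scott and to Freedman and Diaconis respectively. Rule 3) uses the natural log exactly as written above.

```python
import math
import statistics

def bin_width_sigma(data):
    # Rule 1: 3.49 * sigma * N^(-1/3), with the sample standard deviation (assumed).
    return 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)

def bin_width_iqr(data):
    # Rule 2: 2 * IQR * N^(-1/3), IQR = 75th percentile - 25th percentile.
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

def bin_count_log(data):
    # Rule 3: 1 + 3.3 * ln(N), as written in the post.
    return 1 + 3.3 * math.log(len(data))

def bins_from_width(data, width):
    # Number of bins = dataset range / bin width, rounded up to cover the range.
    return math.ceil((max(data) - min(data)) / width)

def width_from_bins(data, n_bins):
    # Bin width = dataset range / number of bins.
    return (max(data) - min(data)) / n_bins
```

For example, `bins_from_width(values, bin_width_iqr(values))` gives the bin count implied by rule 2) for a list `values`.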
Here is what I get with the test data I'm reviewing (I've left out the small percentages in some bins, so the figures won't total 100%):
1) bin width = 888; number of bins = 338; 97% of items in one bin, 1% in next bin, then 1%
2) bin width = 17; number of bins = 17564; 20% of items in one bin, 13% in next bin, then 11%, 6%, 5%, 5%, 3%, 3%, 3%, 2%, 2%
3) bin width = 9606; number of bins = 31; 99% of items in one bin, 1% in next bin
4) bin width = 3093; number of bins = 97; 99% of items in one bin, 1% in next bin
5) bin width = 3158; that's as far as I took it
All of these give way too many bins because most of the data is clustered below a certain number and the range between the lowest and highest numbers is quite large.
Message: 101928 Posted by: hide Posted on: Sunday, 1st October 2006
This page provides a method for selecting the histogram bin size (or number of bins) for your data:
http://www.ton.scphys.kyoto-u.ac.jp/~hideaki/res/histogram.html
Best,