Do we need to know the type of distribution (e.g. normal, Weibull) of the data before we can build a boxplot? If not, how do we get the 25th and 75th percentiles, etc.?
Percentiles are created by ranking the data and dividing it into 100 equal increments. The 25th and 75th percentiles are really no different from the median, which is the center line (the 50th percentile). Percentiles do not depend on the underlying distribution. If you look at the Minitab nonparametric tools, you will see that they use the median.
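To illustrate that percentiles come only from the ranked data, here is a minimal Python sketch using the standard library's `statistics.quantiles`. It takes no distribution parameters at all, only the values. The sample numbers are made up for illustration, and note that this function's default interpolation convention ("exclusive") differs slightly from the rank formula given later in this thread, so the results may not match it exactly.

```python
import statistics

# Hypothetical sample; the same code works whatever shape the data has
data = [12, 7, 3, 15, 9, 21, 4, 11, 8, 6]

# n=4 returns the three quartile cut points [Q1, median, Q3].
# Internally the values are sorted and interpolated by position;
# nothing about the distribution is assumed.
q1, median, q3 = statistics.quantiles(data, n=4)
print(q1, median, q3)  # 5.5 8.5 12.75
```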
Is this the same for the minimum and maximum values when making the boxplot?
So we may conclude that we do not need to know the distribution of the data to plot a box plot, right?
Dear Hoon,
You are absolutely right. In fact, the box plot is a nonparametric method. Nonparametric methods are those that make no assumption about the type of distribution. A simple way of finding a percentile value is:
Rank = (Percentile * Number of readings) + 0.5
The result is the rank of the reading in the data sorted from lowest to highest. For example, suppose we want the 25th percentile of a set of numbers with 40 observations in total:
(0.25*40)+0.5 = 10.5
This means the percentile value is the average of the 10th and 11th readings, counting up from the lowest. You can use this formula to determine any percentile; when the rank has some other fractional part (say 10.3), interpolate between the two neighboring readings.
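Here is a minimal Python sketch of the rank formula, assuming linear interpolation between neighboring readings for fractional ranks and clamping ranks that fall below 1 or above n (e.g. for very small or very large percentiles). The function name `percentile_by_rank` is hypothetical, chosen for this example.

```python
import math

def percentile_by_rank(values, p):
    """Percentile via rank = p * n + 0.5 (1-based), interpolating linearly."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)          # rank the readings, lowest first
    n = len(data)
    rank = p * n + 0.5             # the formula from the thread
    rank = min(max(rank, 1), n)    # assumption: keep the rank inside 1..n
    lower = math.floor(rank)       # whole-number part of the rank
    frac = rank - lower            # fractional part, used to interpolate
    if frac == 0:
        return data[lower - 1]     # rank lands exactly on a reading
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# 40 hypothetical readings: 1, 2, ..., 40
readings = list(range(1, 41))
print(percentile_by_rank(readings, 0.25))  # rank 10.5 -> 10.5
```

With readings 1 to 40, rank 10.5 gives the average of the 10th reading (10) and the 11th reading (11), matching the worked example.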
Once the median and the first and third quartiles are determined, you can draw the box. The upper whisker extends to the highest value in the data set that does not exceed Q3+1.5(Q3-Q1), and the lower whisker extends to the lowest value that is not below Q1-1.5(Q3-Q1). If no points lie beyond these limits, the whiskers simply reach the maximum and minimum of the data. Any points beyond these calculated limits are identified as outliers. The value Q3-Q1 is called the interquartile range (IQR).
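Putting the pieces together, the sketch below computes the box, the 1.5 x IQR limits, the whisker ends and the outliers. It reuses the rank formula from this thread for the quartiles; the helper names `rank_percentile` and `box_plot_stats` are hypothetical, and the data values are made up (with 40 deliberately far from the rest).

```python
import math

def rank_percentile(data, p):
    """Percentile of already-sorted data via rank = p * n + 0.5 (1-based)."""
    n = len(data)
    rank = min(max(p * n + 0.5, 1), n)   # clamp rank into 1..n
    lower = math.floor(rank)
    frac = rank - lower
    if frac == 0:
        return data[lower - 1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def box_plot_stats(values):
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    q1 = rank_percentile(data, 0.25)
    median = rank_percentile(data, 0.50)
    q3 = rank_percentile(data, 0.75)
    iqr = q3 - q1                        # interquartile range
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),      # lowest reading within the limits
        "whisker_high": max(inside),     # highest reading within the limits
        "outliers": [x for x in data if x < low_limit or x > high_limit],
    }

stats = box_plot_stats([12, 3, 40, 7, 9, 4, 15, 6, 11, 8])
print(stats["outliers"])  # [40]
```

Here Q1 = 6 and Q3 = 12, so the IQR is 6 and the limits are -3 and 21; the upper whisker stops at 15 and 40 is flagged as an outlier.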
In fact, when analyzing a data set, if the histogram is the front view of the data, the box plot is the top view.

I think you have got it. Good luck.

