Using the binomial distribution. The number of defectives found in a sample is binomially distributed IF the defects were produced at random OR the sample is taken at random (or both), as long as the sample is taken with replacement (a sampled part is returned to the lot after sampling, so there is a chance of taking the same part twice) OR the lot is infinitely large. In all these cases the lot is many times larger than the sample, so the “infinitely large lot” is a very good model.

Let’s call p the actual defective rate in the lot (which you don’t know and never will), n the sample size, X the number of defectives in a sample, and X1 the number of defectives actually found in your sample. X will be binomially distributed. Let’s work with P(X<=X1), which is the probability that X1 or fewer defectives are found in a sample of size n taken from a lot with a fraction p defective.

I don’t remember the figures now, but let’s say that in a sample of 1000 I find 5 defectives. The point estimate for p is 5/1000=0.005, or if you prefer 0.5% or 5000PPM. However, it is perfectly understandable that the lot does not NEED to have exactly 0.5% defective. But it is also clear that it is very unlikely that you would find “only” 5 bad out of 1000 if 30% of the lot was bad, or that you would find as many as 5 out of 1000 if the lot had only 4.3PPM. This intuitive feeling can be backed up with the probabilities:

We have n=1000, X1=5.

Let p=0.3 (30%) ==> P(X<=X1)=P(X<=5)=1.5×10^(-144), that's zero point, then 143 zeros before the first non-zero decimal. Will you let me call that impossible?

Let p=0.0000043 (4.3PPM) ==> P(X<=X1)=P(X<=5) is 1 for all practical purposes, so X>5 is impossible.

Clearly these two values of p are far too extreme to be believable.
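If you want to check these two extreme cases yourself, here is a minimal Python sketch. The function names (binom_pmf, binom_cdf) are my own and not from any library, and the pmf is computed in log space only so that the tiny numbers don't cause trouble. It assumes 0 < p < 1.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Computed in log space so that very small probabilities stay representable.
    """
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_cdf(x1, n, p):
    """P(X <= x1): the pmf summed from 0 to x1."""
    return sum(binom_pmf(k, n, p) for k in range(x1 + 1))

n, x1 = 1000, 5

# Lot with 30% defective: chance of finding 5 or fewer in the sample.
p_cdf_30pct = binom_cdf(x1, n, 0.3)

# Lot with 4.3 PPM defective: chance of finding MORE than 5 in the sample.
# Summed directly over the upper tail instead of using 1 - cdf,
# because 1 - cdf would lose precision when the cdf is this close to 1.
p_tail_4ppm = sum(binom_pmf(k, n, 4.3e-6) for k in range(x1 + 1, n + 1))

print(f"p = 30%:     P(X <= 5) = {p_cdf_30pct:.2e}")
print(f"p = 4.3 PPM: P(X > 5)  = {p_tail_4ppm:.2e}")
```

The first figure should come out at roughly the 1.5×10^(-144) quoted above; the second is another vanishingly small number.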

I said the figures were the 90% confidence interval. A 90% confidence interval leaves a 5% chance on each side. That means that we must find:

p such that P(X<=5)=0.05 (the upper end of the interval), and

p such that P(X<=5)=0.95 (the lower end of the interval)

These values of p have to be found by trial and error (or an iterative method). Computers can certainly make your life easier here.

In this example, I found p=0.01 (1%) and p=0.0026 (0.26%), respectively. This is the 90% confidence interval. Compare with the point estimate of 0.5%.

It is interesting to note the effect of the sample size. If you find 50 in 10,000, the point estimate remains 0.005 (0.5%), but the 90% confidence interval narrows to 0.4% ~ 0.6% (compare with 0.26% ~ 1.0%).
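The trial-and-error search can be automated. What follows is only a sketch of one way to do it, using plain bisection; the function names (binom_cdf, solve_p, interval_90) are made up for this example. It follows the definition used in this post, where both ends of the interval come from P(X<=X1), and it relies on the fact that P(X<=X1) goes down as p goes up.

```python
import math

def binom_cdf(x1, n, p):
    """P(X <= x1) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    total = 0.0
    for k in range(x1 + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def solve_p(x1, n, target, tol=1e-12):
    """Find p such that P(X <= x1) = target, by bisection on [0, 1].

    The cdf decreases as p increases, so if the cdf at the midpoint is
    still above the target, the solution lies at a larger p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x1, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval_90(x1, n):
    """90% interval as defined in the post: 5% left on each side."""
    lower = solve_p(x1, n, 0.95)
    upper = solve_p(x1, n, 0.05)
    return lower, upper

for x1, n in [(5, 1000), (50, 10000)]:
    lower, upper = interval_90(x1, n)
    print(f"{x1} in {n}: point estimate {x1 / n:.4f}, "
          f"90% interval {lower:.4f} to {upper:.4f}")
```

Running it for 5 in 1000 and for 50 in 10,000 should reproduce, after rounding, the two intervals quoted above and show how the larger sample narrows the interval around the same point estimate.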

Now, if your question is what the formula is, then the formula you need is the one for the cumulative probability in a binomial distribution. Go to any statistics book or website, or use the BINOMDIST function in Excel.
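For reference, this is the cumulative binomial probability written out with the symbols used in this post (p, n, X1), followed by the two conditions that define the ends of the 90% interval. The names p_upper and p_lower are just labels I am adding here.

```latex
P(X \le X_1) = \sum_{k=0}^{X_1} \binom{n}{k}\, p^{k} (1-p)^{n-k}

P(X \le X_1 \mid p = p_{\text{upper}}) = 0.05
\qquad
P(X \le X_1 \mid p = p_{\text{lower}}) = 0.95
```

In Excel, the same cumulative probability is BINOMDIST(X1, n, p, TRUE), where the last argument asks for the cumulative value rather than the probability of exactly X1.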