September 03, 2002
Statistical Analysis of Spam - Part III
Once more it is time to study the statistical aspects of spam. In Part II I commented on the differences between two independent sets of spam. Today I will try to be slightly more scientific.
Just by looking at the histograms in the previous post it is easy to see that the values are not normally distributed. It took me a while, and a few hours of playing around in Minitab, to discover that they might follow a Weibull distribution.
Identical components subjected to identical environmental conditions will fail at different and unpredictable times. We have seen the role that the gamma and exponential distributions play in these types of problems. Another distribution that has been used extensively in recent years to deal with such problems is the Weibull distribution, [...] - from Probability and Statistics for Engineers and Scientists by Walpole, Myers and Myers
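The Weibull distribution's density function is defined as:
$$ f(x;\alpha,\beta) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}}, & x > 0 \\ 0, & \text{elsewhere} \end{cases} \qquad \alpha > 0,\ \beta > 0 $$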
Minitab describes the distribution with a scale and a shape parameter. The shape is the same as beta, but the alpha in the textbook form of the density is not the scale itself: it equals 1/scale^beta. Put simply, beta controls the bell shape of the curve, while the scale (and thus alpha) controls its stretch along the x-axis.
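To make the relation between the two parameterizations concrete, here is a small Python sketch. It assumes Walpole's alpha/beta form of the density and the common scale/shape form, which I assume matches what Minitab reports; the function names are hypothetical. Both functions should give the same density once alpha = 1/scale^beta.

```python
import math


def weibull_pdf_walpole(x: float, alpha: float, beta: float) -> float:
    """Textbook form: alpha * beta * x**(beta - 1) * exp(-alpha * x**beta) for x > 0."""
    if x <= 0:
        return 0.0
    return alpha * beta * x ** (beta - 1) * math.exp(-alpha * x ** beta)


def weibull_pdf_scale_shape(x: float, scale: float, shape: float) -> float:
    """Scale/shape form: (shape/scale) * (x/scale)**(shape - 1) * exp(-(x/scale)**shape)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def alpha_from_scale(scale: float, shape: float) -> float:
    """Convert a scale parameter to the textbook alpha: alpha = scale**(-shape)."""
    return scale ** (-shape)
```

With the shape of 2.45 and scale of 20.22 from the analysis below, `alpha_from_scale(20.22, 2.45)` comes out at roughly 6.3e-4, and the two density functions agree for any positive x.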
Below is a Minitab Weibull analysis of my 1964 spam messages.

Looking briefly at the numbers, we see a shape of 2.45 and a scale of 20.22. More interesting is the Weibull probability plot (ProbPlot): the points from my data fall very close to the straight line that a Weibull distribution would produce. (There is a deviation from the straight line for values below 5.0, which is to be expected since there are no values below that threshold.)
The upper right corner also shows the fitted Weibull distribution plotted together with the histogram, and it is easy to see that it fits quite nicely.
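The reason a Weibull sample lines up in the probability plot is that the CDF F(x) = 1 - exp(-(x/scale)^shape) becomes a straight line when ln(-ln(1 - F)) is plotted against ln x, with slope equal to the shape. The sketch below computes those plot coordinates and fits a least-squares line through them. The plotting positions use Benard's median-rank approximation, which is one common choice and an assumption here; the least-squares fit is an alternative to maximum likelihood and not necessarily how Minitab arrived at 2.45 and 20.22. Function names are hypothetical.

```python
import math
from typing import List, Tuple


def weibull_plot_points(data: List[float]) -> List[Tuple[float, float]]:
    """Coordinates for a Weibull probability plot: (ln x, ln(-ln(1 - F))). Values must be positive."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        # Benard's approximation of the median rank (an assumed plotting position)
        f = (i - 0.3) / (n + 0.4)
        points.append((math.log(x), math.log(-math.log(1.0 - f))))
    return points


def fit_weibull_least_squares(data: List[float]) -> Tuple[float, float]:
    """Least-squares line through the plot points; returns (shape, scale)."""
    pts = weibull_plot_points(data)
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    shape = sxy / sxx                    # slope of the line is the shape (beta)
    intercept = mean_y - shape * mean_x  # intercept is -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale


if __name__ == "__main__":
    # Hypothetical data placed exactly on a Weibull(shape=2.45, scale=20.22) plot,
    # so the fit should recover those two numbers.
    n = 50
    synthetic = [
        20.22 * (-math.log(1.0 - (i - 0.3) / (n + 0.4))) ** (1 / 2.45)
        for i in range(1, n + 1)
    ]
    print(fit_weibull_least_squares(synthetic))
```

With real data the points scatter around the line, and how far they stray is what the eye judges in a probability plot.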
Tomorrow I will look at the Weibull distribution for Anders' spam together with mine. With the distribution in place it should be possible to calculate the probability of false negatives.
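As a rough illustration of that last step, here is a sketch that evaluates the Weibull CDF at the 5.0 threshold. It assumes that a false negative means a spam value falling below 5.0 and that the fitted curve (shape 2.45, scale 20.22, no threshold parameter) also describes the region below 5.0. That region is exactly where the sample contains no values, so this is an extrapolation rather than a measurement. Function names are hypothetical.

```python
import math


def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """P(X <= x) for a two-parameter Weibull distribution."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def false_negative_probability(threshold: float, scale: float, shape: float) -> float:
    """Assumed definition: probability that a spam value falls below the threshold."""
    return weibull_cdf(threshold, scale, shape)


if __name__ == "__main__":
    # Shape and scale are the estimates quoted in this post; 5.0 is the threshold
    # mentioned for the probability plot.
    p = false_negative_probability(5.0, scale=20.22, shape=2.45)
    print(f"Estimated false-negative rate: {p:.3%}")
```

Under these assumptions the sketch gives roughly 3%. Since the fit was made on data cut off at 5.0, the true lower tail could differ noticeably.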
Posted by ludvig at September 3, 2002 07:00 PM
About WEIBULL and Minitab:
I have done a test with the worksheet called INSULATE.MTW, available in Minitab 13. It is not clear to me why the Weibull distribution is chosen if the plots for the lognormal and normal distributions also look so fine.
Could you help me? I would like to send you the plots I have made to give you an idea of what I am talking about.
Regards, Maria