US 7,051,077: identifying and controlling spam
A method for classifying an e-mail message received over a digital communications network as unwanted junk e-mail or spam, comprising:
accessing an output from a first e-mail classification tool and an output from a second e-mail classification tool differing from the first e-mail classification tool, wherein the outputs are indicative of whether the e-mail message is spam and differ in format;
converting the outputs from the first and second e-mail classification tools into first and second standardized outputs, respectively, having a predetermined standardized numerical format;
generating a single classification output by combining the first and second standardized outputs; and
providing the single classification output to a comparator for comparison with a spam threshold value for determining whether the e-mail message corresponding to the single classification output is spam.
The abstract states:
A method, and corresponding system, for identifying e-mail messages as being unwanted junk or spam. The method includes converting the outputs of a set of e-mail classification tools into a standardized format, such as a probability having a value between zero and one. The standardized outputs of the classification tools are then input to a voting mechanism which uses a voting algorithm based on fuzzy logic to combine the standardized outputs into a single classification result. The use of a fuzzy logic algorithm creates a more useful result as the classifier results are not merely averaged. In one embodiment, the single classification result is itself a probability that is provided to a spam classifier or comparator that functions to compare the single classification result to a spam threshold value and based on the comparison to classify the e-mail message as spam or not spam.
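To make the claimed steps concrete, here is a rough Python sketch of the pipeline: standardize two differently formatted tool outputs, combine them, and compare against a threshold. Everything named here is hypothetical. The function names, the 0-to-10 score range, the 0.9 confidence for a yes/no tool, and the 0.9 threshold are illustrative assumptions. The combining rule is a generic fuzzy OR (the algebraic sum, a + b - ab) standing in for the patent's voting algorithm, which the quoted claim and abstract do not spell out.

```python
from typing import Sequence


def standardize_score(raw: float, low: float, high: float) -> float:
    """Map a raw tool score on [low, high] linearly onto [0, 1], clamping."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (raw - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def standardize_verdict(flagged: bool, confidence: float = 0.9) -> float:
    """Map a yes/no tool output onto [0, 1] using an assumed confidence."""
    return confidence if flagged else 1.0 - confidence


def combine(standardized: Sequence[float]) -> float:
    """Combine standardized outputs with the algebraic-sum fuzzy OR.

    This is NOT the patent's algorithm, only a simple rule that does
    something other than averaging.
    """
    remaining = 1.0
    for p in standardized:
        remaining *= 1.0 - p
    return 1.0 - remaining


def exceeds_spam_threshold(single_output: float, threshold: float = 0.9) -> bool:
    """Comparator: treat the message as spam at or above the threshold."""
    return single_output >= threshold


# Tool A reports a score of 7.5 on an assumed 0-10 scale; tool B says "spam".
first = standardize_score(7.5, 0.0, 10.0)
second = standardize_verdict(True)
single = combine([first, second])
verdict = exceeds_spam_threshold(single)
```

With these made-up inputs the two standardized values are 0.75 and 0.9; the combined value lands above either one, which a plain average (0.825) would not do.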
Wikipedia discusses fuzzy logic:
Degrees of truth are often confused with probabilities. However, they are conceptually distinct; fuzzy truth represents membership in vaguely defined sets, not likelihood of some event or condition. To illustrate the difference, consider this scenario: Bob is in a house with two adjacent rooms: the kitchen and the dining room. In many cases, Bob's status within the set of things "in the kitchen" is completely plain: he's either "in the kitchen" or "not in the kitchen". What about when Bob stands in the doorway? He may be considered "partially in the kitchen". Quantifying this partial state yields a fuzzy set membership. With only his little toe in the dining room, we might say Bob is 99% "in the kitchen" and 1% "in the dining room", for instance. No event (like a coin toss) will resolve Bob to being completely "in the kitchen" or "not in the kitchen", as long as he's standing in that doorway. Fuzzy sets are based on vague definitions of sets, not randomness. [IPBiz: hmmm, Texas Digital and Phillips and patent claim term definition.]
Fuzzy logic allows for set membership values between and including 0 and 1, shades of gray as well as black and white, and in its linguistic form, imprecise concepts like "slightly", "quite" and "very". Specifically, it allows partial membership in a set. It is related to fuzzy sets and possibility theory. It was introduced in 1965 by Prof. Lotfi Zadeh at the University of California, Berkeley. [IPBiz: where else but Berkeley for fuzzy logic?]
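For readers who want to see partial membership in action, here is a small Python sketch of the Bob example using the standard Zadeh operators (AND as minimum, OR as maximum, NOT as one minus the value). The variable and function names are illustrative only.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Zadeh intersection: the smaller of the two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Zadeh union: the larger of the two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Zadeh complement: one minus the membership degree."""
    return 1.0 - a


# Bob in the doorway, with only his little toe in the dining room.
in_kitchen = 0.99
in_dining_room = 0.01

in_both_rooms = fuzzy_and(in_kitchen, in_dining_room)
in_either_room = fuzzy_or(in_kitchen, in_dining_room)
not_in_kitchen = fuzzy_not(in_kitchen)
```

Note that these are degrees of membership, not odds: nothing random is being resolved, which is the distinction the Wikipedia passage draws.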
**
Separately, from SpamButcher:
SpamButcher uses multiple methods to reduce spam including fuzzy logic, customer filters and optionally spam control server lists. SpamButcher also allows end-users to add custom filters to optimize the product to their needs. As advances to filtering technology are introduced into the code-base, free updates will be made available.