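Let $A_1, \ldots, A_n$ be a set of attributes, and let Cl be the
class attribute. Given that an individual takes on the values
$a_1, \ldots, a_n$ for attributes $A_1, \ldots, A_n$,
in a Bayesian approach
the most likely class value c is the one that maximises
$$P(c \mid a_1, \ldots, a_n) = \frac{P(a_1, \ldots, a_n \mid c)\, P(c)}{P(a_1, \ldots, a_n)} \qquad (1)$$
Here we write $P(a_i)$ as an abbreviation for $P(A_i = a_i)$.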
In order to decrease the number of probabilities involved in this
calculation, and to increase the reliability of their estimates,
usually the simplifying naive Bayes assumption is made that
$$P(a_1, \ldots, a_n \mid c) = \prod_{i=1}^{n} P(a_i \mid c),$$
i.e. the values taken
on by the different attributes are conditionally independent given the
class value. The predicted class value c is the one that maximises
$$P(c) \prod_{i=1}^{n} P(a_i \mid c).$$
The classifier which predicts by maximising the above expression is called the naive Bayesian classifier, or Bayesian classifier for short. Essentially, it reads the description of an individual to be classified, and then tries to estimate how likely it is to observe such an individual among each of the possible classes. Thus, the fundamental problem of a Bayesian classifier (naive or otherwise) is to estimate how likely it is to observe an individual satisfying a particular description among given sub-populations. In our case these estimates are obtained from the training set, under the naive Bayes assumption of conditional independence. Even in cases where this assumption is clearly invalid, the Bayesian classifier has been shown to give good results [5].
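The prediction rule described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `train` and `predict`, the toy data, and the Laplace smoothing parameter `alpha` are all assumptions introduced here for illustration. The sketch estimates P(c) and P(ai | c) from counts in a training set of nominal attributes and returns the class maximising their product.

```python
from collections import Counter, defaultdict


def train(examples, labels):
    """Collect the counts needed to estimate P(c) and P(a_i | c)."""
    class_counts = Counter(labels)
    # value_counts[(i, c)][v]: number of class-c examples with attribute i equal to v
    value_counts = defaultdict(Counter)
    # values[i]: distinct values observed for attribute i
    values = defaultdict(set)
    for attrs, c in zip(examples, labels):
        for i, v in enumerate(attrs):
            value_counts[(i, c)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict(model, attrs, alpha=1.0):
    """Return the class maximising P(c) * prod_i P(a_i | c), plus all scores."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # relative-frequency estimate of P(c)
        for i, v in enumerate(attrs):
            # Laplace-smoothed estimate of P(a_i | c); the smoothing is an
            # assumption of this sketch, not something stated in the text.
            score *= (value_counts[(i, c)][v] + alpha) / (n_c + alpha * len(values[i]))
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy training set: two nominal attributes, two classes.
examples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

model = train(examples, labels)
label, scores = predict(model, ("sunny", "cool"))
print(label, scores)
```

The denominator of (1) is omitted because it is the same for every class and so does not affect which class attains the maximum. For descriptions with many attributes, summing log-probabilities instead of multiplying probabilities would avoid numerical underflow.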