Thursday, July 17, 2014

Long Run Relative Frequency of Classification Error

Not everything that is called "Bayes" is Bayesian! Bayesian theory is an approach to probability, one with interesting philosophical and technical consequences. There is no single system that is "Bayesianism", since different folks emphasize different things. That said, Savage and Jaynes are notable references.
But there is Bayes and there is Bayes. Some things, like the Bayes error, are not Bayesian at all (indeed, Bayes' theorem will not even be used in this post). Instead, the Bayes error is best thought of as the long-run relative frequency with which a particular classifier misclassifies observations.


Consider two overlapping uniform probability distributions:

A simple classifier is a vertical line. This separates the plane into two parts; we will label one part green and the other part blue:
The classifier mistakes some of the blue for green and some of the green for blue, so the error is the area under the green density on the blue side of the line plus the area under the blue density on the green side:
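To make the picture concrete, here is a small Python sketch of the setup. The particular supports (green uniform on \([-0.25, 0.75]\), blue uniform on \([-1, 0.25]\)), the position of the vertical line at \(x = 0\), and the convention that the left side is labelled blue and the right side green are assumptions for illustration, not values read off the figures.

# A minimal numerical sketch of the setup above. The supports, the
# threshold, and the "left = blue, right = green" labelling convention
# are assumptions for illustration only.
import numpy as np

G_LO, G_HI = -0.25, 0.75   # assumed support of the green uniform
B_LO, B_HI = -1.0, 0.25    # assumed support of the blue uniform
THRESHOLD = 0.0            # the vertical-line classifier

def p_g(x):
    """Green density: 1 / (G_HI - G_LO) on its support, 0 elsewhere."""
    return np.where((x >= G_LO) & (x <= G_HI), 1.0 / (G_HI - G_LO), 0.0)

def p_b(x):
    """Blue density: 1 / (B_HI - B_LO) on its support, 0 elsewhere."""
    return np.where((x >= B_LO) & (x <= B_HI), 1.0 / (B_HI - B_LO), 0.0)

# The error area: green mass on the blue side of the line plus blue mass
# on the green side, computed with a plain Riemann sum.
xs = np.linspace(-1.0, 1.0, 200001)
dx = xs[1] - xs[0]
green_on_blue_side = np.sum(p_g(xs[xs < THRESHOLD])) * dx
blue_on_green_side = np.sum(p_b(xs[xs >= THRESHOLD])) * dx
print(green_on_blue_side + blue_on_green_side)   # ~ 0.45 for these supports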

If we let \(C_i\) be the event of being labelled with colour \(i\), then the formula for the error is

\[ p(error) = \int_{-1}^0 p_g(x)dx+\int_0^1 p_b(x)dx\\ p(error) = \int_{-\infty}^\infty p_g(x \cap C_b)dx+\int_{-\infty}^\infty p_b(x \cap C_g)dx \]
Since the definition of conditional probability is \(p(A|B) = p(A \cap B) / p(B)\), the above can be written:

\[ p(error) = \int_{-\infty}^\infty p_g(x|C_b)p_g(C_b)dx+\int_{-\infty}^\infty p_b(x|C_g) p_b(C_g) dx \]
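As a sanity check on this formula, here is a quick Monte Carlo estimate under the same assumed supports and threshold as the sketch above. Reading the subscript as the true colour and \(C\) as the assigned label, each term integrates to the probability that a point of one colour receives the other colour's label.

# A Monte Carlo check of the error formula, using the same assumed
# supports and threshold as the earlier sketch.
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
THRESHOLD = 0.0

green = rng.uniform(-0.25, 0.75, N)   # assumed green support
blue = rng.uniform(-1.0, 0.25, N)     # assumed blue support

# Left of the line is labelled blue, right of it green.
green_labelled_blue = np.mean(green < THRESHOLD)    # estimates p_g(C_b)
blue_labelled_green = np.mean(blue >= THRESHOLD)    # estimates p_b(C_g)

print(green_labelled_blue + blue_labelled_green)    # ~ 0.45, matching the area above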

The last displayed expression is the general definition of the Bayes error. In the uniform example above, this integral is a linear function of the position of the vertical line, so moving the line in one direction always lowers the error until it reaches the edge of the taller uniform pdf. This is only true in this trivial example, but it makes the geometric interpretation obvious. One can compute the error for several pdfs and obtain semi-parametric bounds via Chebyshev's inequality.
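To see the linearity claim numerically, the following sketch sweeps the vertical line across the overlap of the two assumed uniforms from above; inside the overlap the error changes linearly with the threshold and is smallest at the edge of the taller (green) density.

# Sweep the threshold across the overlap of the assumed uniforms
# (green on [-0.25, 0.75], blue on [-1.0, 0.25]). Inside the overlap the
# error is linear in the threshold, so sliding the line toward the edge
# of the taller density keeps lowering the error until that edge.
import numpy as np

G_LO, G_HI = -0.25, 0.75
B_LO, B_HI = -1.0, 0.25

def error(t):
    """Error area when x < t is labelled blue and x >= t is labelled green."""
    green_in_blue = np.clip(t - G_LO, 0.0, G_HI - G_LO) / (G_HI - G_LO)
    blue_in_green = np.clip(B_HI - t, 0.0, B_HI - B_LO) / (B_HI - B_LO)
    return green_in_blue + blue_in_green

for t in np.linspace(-0.25, 0.25, 6):
    print(f"threshold {t:+.2f}  error {error(t):.3f}")
# The printed errors rise linearly from 0.400 at t = -0.25 (the edge of the
# taller uniform) to 0.500 at t = +0.25.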

No posts for the next three days as I will be out of town.
