# What is the intuition behind the "correctness" of k fold cross validation?

Intuitively, the error probability of a given classifier is the fraction of misclassified points in a large set of points drawn at random from X × Y. The result of k-fold cross-validation is supposed to be an estimate of this error probability. However, cross-validation uses different training sets, which yield different classifiers. What is the intuition behind the "correctness" of k-fold cross-validation? In other words: why should the computed number, which stems from multiple classifiers, be a good estimate of the error probability of the classifier we obtain when we train on the entire data set? Do we assume that a classifier trained on k - 1 of the k sets behaves similarly to the classifier trained on all k sets?

Hi dimant, thank you for the great question!

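You are right: we do assume that a classifier trained on k - 1 of the k sets behaves similarly to f, the classifier trained on all data. Sadly, the match is not exact. Each fold's classifier sees less training data than f, so CV tends to *over*estimate the error of f, and the true error of f will usually be slightly lower. In this sense, CV gives a conservative estimate: in truth, f is typically a bit more accurate than CV lets us think.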
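To make this a bit more concrete, here is the CV estimate written out. The notation is my own and not from the lecture: D_1, ..., D_k are the folds, and f_{-i} is the classifier trained on all data except D_i.

$$
\widehat{\mathrm{err}}_{\mathrm{CV}} \;=\; \frac{1}{k}\sum_{i=1}^{k}\,\frac{1}{|D_i|}\sum_{(x,y)\in D_i}\mathbf{1}\big[f_{-i}(x)\neq y\big]
$$

Since D_i was not used to train f_{-i}, each inner average is an unbiased estimate of the error probability of f_{-i}. With equal fold sizes, every f_{-i} is trained on n - n/k points, so on average the CV estimate measures the error of a classifier trained on n - n/k points rather than on n points. Learners usually get better with more data, which is where the slight overestimate comes from.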

There is no way to measure the error of f directly from the same data. To measure it, we would need to train on all data, but then no data would be left for testing. The closest we can get is to take:
k := n
This method, called leave-one-out CV (LOOCV), trains each classifier on n - 1 points, so its estimate is almost unbiased for the error of the classifier trained on all data. The price is that it needs n training runs, and the estimate can still have high variance. As far as I know, Billy is preparing an exercise on LOOCV.
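If it helps, here is a minimal sketch of k-fold CV in plain Python, with LOOCV as the special case where the number of folds equals the number of points. Everything in it is made up for illustration: the 1D toy data, the 1-nearest-neighbour classifier, and the function names `k_fold_indices`, `predict_1nn`, `cv_error` and `make_data`. It is not the code from the lecture or the exercise.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict_1nn(train, x):
    """Toy classifier: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def cv_error(data, k, seed=0):
    """Average the held-out error rate over the k folds."""
    folds = k_fold_indices(len(data), k, seed)
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on everything except the current fold.
        train = [data[i] for i in range(len(data)) if i not in held_out]
        wrong = sum(predict_1nn(train, data[i][0]) != data[i][1] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


def make_data(n, seed=1):
    """Hypothetical toy data: class 0 centred at 0.0, class 1 centred at 2.0."""
    rng = random.Random(seed)
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]


data = make_data(60)
err_5fold = cv_error(data, k=5)          # each classifier trained on 48 points
err_loocv = cv_error(data, k=len(data))  # each classifier trained on 59 points
print("5-fold CV error:", err_5fold)
print("LOOCV error:", err_loocv)
```

On a single data set the two numbers can differ in either direction. The claim that k-fold CV is conservative is about the average over many data sets.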