Discovering the truth by conducting experiments
Wouter Koolen
Abstract:
Paul Vitanyi's 2003 Kolmogorov complexity lecture included a computer
exercise in which a polynomial relation had to be learnt from
samples. The following data were provided: a sequence of pairs of
numbers (h_1, d_1), (h_2, d_2), ..., (h_n, d_n), supposedly noisy
measurements of a classical urn, where h_i is the height above the
floor and d_i is the diameter of the urn at that height. The goal was to
infer a polynomial that represented the relation between height and
diameter. For a given degree, this can easily be done using linear
algebra. The crux of the exercise was finding the best degree.
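As an illustration of the fixed-degree step only, the following minimal sketch fits a polynomial by least squares, solving the normal equations with Gaussian elimination. The function names and the sample data are hypothetical and are not taken from the original exercise; the sketch does not address the harder question of choosing the degree.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct heights?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(hs: Sequence[float], ds: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c[0..degree] of d ~ sum_k c[k] * h**k."""
    size = degree + 1
    # Normal equations (V^T V) c = V^T d, where V is the Vandermonde matrix.
    gram = [[sum(h ** (i + j) for h in hs) for j in range(size)] for i in range(size)]
    rhs = [sum(d * h ** i for h, d in zip(hs, ds)) for i in range(size)]
    return solve_linear_system(gram, rhs)


# Hypothetical, noise-free sample: d = 1 + 2h + 0.5h^2.
heights = [0.0, 1.0, 2.0, 3.0, 4.0]
diameters = [1.0 + 2.0 * h + 0.5 * h * h for h in heights]
coeffs = fit_polynomial(heights, diameters, degree=2)
```

The normal equations are used here for brevity; for high degrees or widely spread heights they become ill-conditioned, and a QR-based solver would be the more careful choice.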
To me, learning from given data is only part of a more general concept
of learning, and I started to wonder whether the techniques that I
learnt during my studies could be adapted to an interactive setting,
allowing the learner to perform experiments. For example, when
learning polynomials, the learner could be allowed to choose a point,
and she would then receive the value of the polynomial at that point.
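To make the interactive setting concrete, here is a minimal sketch under strong simplifying assumptions: the oracle is noise-free and the degree is known in advance, neither of which holds in the probabilistic setting this thesis is concerned with. In that idealised case, degree + 1 queries at distinct points determine the polynomial, and Lagrange interpolation predicts its value anywhere else. All names are hypothetical.

```python
from typing import Callable, List, Tuple


def query_points(oracle: Callable[[float], float], degree: int) -> Tuple[List[float], List[float]]:
    """Let the learner choose degree + 1 distinct points and query each one."""
    xs = [float(k) for k in range(degree + 1)]
    return xs, [oracle(x) for x in xs]


def lagrange_predict(xs: List[float], ys: List[float], x: float) -> float:
    """Evaluate the unique interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def hidden(x: float) -> float:
    """Hypothetical polynomial, unknown to the learner: 3 - x + 2x^3."""
    return 3.0 - x + 2.0 * x ** 3


xs, ys = query_points(hidden, degree=3)
prediction = lagrange_predict(xs, ys, 5.5)
```

With noisy answers, no fixed number of queries settles the matter, which is where the choice of experiments becomes a problem in its own right.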
For this thesis, I started working on the interactive polynomial
learning problem, but it turned out to be much too hard. I then
devised the balance scale problem, a toy problem that preserves the
important features of the polynomial learning problem: it is
interactive, probabilistic and model-based, yet finite. I had by then
developed a slight aversion to subjective Bayesian methods, for my
initial work on the polynomial learning problem suggested that they
are not robust. It seemed that a subjective Bayesian learner can be
tricked into assigning high posterior probability to a false
proposition. Worse, great confidence in that proposition breeds great
confidence in the usefulness of experiments that in fact do nothing
to reveal that it is false.
With this in mind, I decided to perform a worst-case analysis of the
balance scale problem, and of similar problems in general. The task
decomposed naturally into two parts: the truth-finding problem, where
we want to find the true model from given data, and the
experiment-design problem, where we must select the experiments whose
outcomes subsequently serve as the data for truth finding.
I have yet to solve the balance scale problem completely, but I have
already learnt and discovered much more than I could have imagined at
the outset. I hope that this thesis will provide inspiration to others.
Keywords: