The Power of Vacillation in Language Learning
John Case
Several extensions of Gold's influential model of language learning by machine from positive data are considered. The criteria of successful learning studied feature convergence in the limit to vacillation between several
alternative correct grammars. The main theorem of this paper is that there are
classes of languages that can be learned if convergence in the limit to up to
(n+1) exactly correct grammars is allowed but which cannot be learned if
convergence in the limit is to no more than n grammars, even when each of these at most n grammars is allowed finitely many mistakes. This contrasts sharply with
results of Barzdin and Podnieks and, later, Case and Smith, for learnability
from both positive and negative data.
A subset principle from a 1980 paper of Angluin is extended to the vacillatory
and other criteria of this paper. This principle provides a necessary
condition for circumventing overgeneralization in learning from positive data.
It is applied to prove another theorem to the effect that one can optimally
eliminate half of the mistakes from the final programs for vacillatory criteria if one
is willing to converge in the limit to infinitely many different programs
instead.
Child language learning may be sensitive to the order or timing of data
presentation. It is shown, though, that, for the vacillatory success criteria
of this paper, there is no loss of learning power for machines which are
insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and the length of the sequence of positive data, not to the actual sequence itself. A machine M is weakly n-ary order
independent iff, by definition, for each language L on which, for some ordering of the
positive data about L, M converges in the limit to a finite set of grammars,
there is a finite set of grammars D (of cardinality ≤ n) such that M converges
to a subset of this same D for each ordering of the positive data for L. The
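The definition just given can be displayed symbolically; the notation below is assumed for this sketch, not taken from the paper: TXT(L) denotes the set of texts (orderings of the positive data) for L, and M(T) denotes the finite set of grammars to which M converges in the limit on text T, when such convergence occurs.

```latex
% A sketch of weak n-ary order independence under the assumed notation above.
M \text{ is weakly } n\text{-ary order independent} \;\iff\;
(\forall L)\,\Big[ (\exists T \in \mathrm{TXT}(L))\,
  \big[\, M(T) \text{ is a finite set of grammars} \,\big]
\;\Rightarrow\;
(\exists D)\,\big[\, \mathrm{card}(D) \le n \;\wedge\;
  (\forall T' \in \mathrm{TXT}(L))\,[\, M(T') \subseteq D \,] \,\big] \Big]
```

Note that the same witnessing set D must serve for every ordering of the positive data for L.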
most difficult-to-prove theorem in the paper implies that machines which are
simultaneously partly set-driven and weakly n-ary order independent do not
lose learning power for converging in the limit to up to n grammars. Several
variants of this theorem are obtained by modifying its proof, and some of
these variants have application in this and other papers. Along the way it is
also shown, for the vacillatory criteria, that learning power is not increased
if one restricts the sequence of positive data presentation to be computable.
Some of these results are nontrivial lifts of prior work for the n = 1 case
due to the Blums; Wiehagen; Osherson, Stob and Weinstein; Schäfer; and Fulk.