A New Probability Model for Data Oriented Parsing (Extended version)

Bonnema, R.; Buying, P.; Scha, R.

Data-oriented parsing systems employ redundant stochastic tree substitution grammars (STSGs) to analyse natural language utterances on the basis of an annotated corpus (a treebank). An important component of such systems is the way in which the substitution probability of a parse tree is estimated from its occurrences in the treebank. In the standard method for doing this, the probability of a parse tree is directly correlated with its occurrence frequency in the collection of all fragments of all corpus trees. We show that this results in undesirable statistical biases. We therefore propose an alternative method, which estimates the substitution probability of a fragment as the probability that it has been involved in the derivation of a corpus tree. We show that this method has more plausible properties.

Keyword(s): linguistics, parsing, probabilistic parsing, data oriented parsing, DOP, statistical methods in linguistics, corpus linguistics