DS-1994-05: Executable Language Definitions, Case Studies and Origin Tracking Techniques

DS-1994-05: van Deursen, Arie (1994) Executable Language Definitions, Case Studies and Origin Tracking Techniques. Doctoral thesis, University of Amsterdam.

	Text (Summary) DS-1994-05.samenvatting.txt - Published Version (9kB)
	Text DS-1994-05.text.pdf (2MB)

Abstract

This thesis is about mathematical descriptions of computer languages. This raises two questions: what are computer languages, and why should they be described mathematically?
A computer language enables people to communicate with an automated system. To name three examples: the programming language Pascal can be used to build entirely new systems; the query language SQL is used to pose queries to a database; and the command language of MS-DOS enables a PC user to copy or delete files. Example sentences in these languages can be found in Figure S.1.
Clearly, a system that receives commands in such a language must know something about that language. This is made possible by language-specific tools, which can parse, interpret, analyze, correct, optimize, translate, etc., sentences in a particular language. A collection of such tools is called an "environment" for that language.
To gain insight into computer languages, a large part of computer science research is devoted to describing and analyzing all kinds of languages. A central tool in this work is a mathematical description of a language, which characterizes the most important properties of the language, including its syntax and semantics (structure and meaning).
Writing such a description takes considerable effort, but it offers at least two important practical advantages:

First of all, it can serve as a definition of a language. This is particularly important when a new language is being designed. The mathematical definition then forms a precise and unambiguous description of the language.
Secondly, and more surprisingly, such a language description can, under certain circumstances, be used to generate an environment for the language automatically (see also Figure S.2). This is possible if the definition is sufficiently detailed and can be read as a kind of "recipe" that can be "executed" in one way or another.

This, too, is of great importance during the design of a new language: a readily available environment makes it possible to experiment with the language at an early stage and thus to investigate its usability.
Such executable language definitions are central to this thesis, as its title expresses: Executable Language Definitions.
In recent years, various methods have been proposed for writing such executable language definitions. This thesis is set in the context of algebraic specifications, which were introduced in the late 1970s and can often be executed by means of term rewriting. In the late 1980s, an algebraic specification formalism called ASF+SDF was developed at the Center for Mathematics and Computer Science (CWI) and the University of Amsterdam. This formalism is particularly suitable for describing languages. With the accompanying system, the ASF+SDF Meta-environment, such language definitions can be executed, which in turn yields automatically generated, language-specific environments.
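To illustrate what "executing a specification by term rewriting" means, here is a minimal Python sketch, assuming a toy algebraic specification of addition on natural numbers. It is not written in ASF+SDF syntax, and the tuple encoding of terms and the names `add_rules` and `normalize` are hypothetical. The two equations are read left to right as rewrite rules and applied innermost-first until no rule matches.

```python
from typing import Callable, Optional, Tuple, Union

# A term is either a constant name (str) or a tuple (function symbol, arg1, ..., argN).
Term = Union[str, Tuple]


# Hypothetical encoding of two equations from a toy algebraic specification:
#   add(zero, Y)    = Y
#   add(succ(X), Y) = succ(add(X, Y))
def add_rules(t: Term) -> Optional[Term]:
    """Return the rewritten term if a rule matches at the root, else None."""
    if isinstance(t, tuple) and t[0] == "add":
        x, y = t[1], t[2]
        if x == "zero":
            return y
        if isinstance(x, tuple) and x[0] == "succ":
            return ("succ", ("add", x[1], y))
    return None


def normalize(t: Term, rules: Callable[[Term], Optional[Term]]) -> Term:
    """Innermost rewriting: normalize the arguments first, then rewrite at the root."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(a, rules) for a in t[1:])
    reduct = rules(t)
    return t if reduct is None else normalize(reduct, rules)


if __name__ == "__main__":
    two = ("succ", ("succ", "zero"))
    print(normalize(("add", two, two), add_rules))
    # prints ('succ', ('succ', ('succ', ('succ', 'zero'))))
```

Innermost rewriting is just one possible strategy; it is chosen here because it keeps the sketch short.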
Part I of this thesis discusses experiences with the use of ASF+SDF. The most striking advantages of ASF+SDF are the simplicity of the formalism (which makes it easy to learn and understand) and the possibility of writing very readable specifications. Its main limitations, however, are less clear. Part I therefore discusses three case studies intended to bring the problems with ASF+SDF to light.
Chapter 3 discusses the design of a language developed by CAP Volmac and the bank MeesPierson. MeesPierson offers its customers a large number of "interest products". To survive in the financial markets, the bank must be able to launch new interest products quickly, each slightly more attractive to the customer than the competitors' products. However, introducing a new product has significant consequences for the bank's automated systems: the financial administration must take it into account, and the management information systems must be able to assess properly the additional interest risks the bank incurs through the new product. Modifying this software is time-consuming. CAP Volmac and MeesPierson therefore decided to design a language, called RISLA, in which the characteristic properties of the various products can be recorded. Given a RISLA description of a product, the software required for processing that product can then be generated automatically. ASF+SDF was used during the design of RISLA.
Chapter 4 shows how ASF+SDF can be used to define the programming language Pascal. The emphasis is on static semantics, i.e., describing which errors in a Pascal program can be detected without executing the program.
Finally, Chapter 5 illustrates the use of ASF+SDF to obtain language-specific tools for a specification language used in the context of action semantics. This last case study is by far the most complicated of the three.
The remaining question is to what extent these case studies have brought problems to light. First of all, in all cases ASF+SDF proved to be a very suitable formalism for describing the language in question. Nevertheless, the design of the language for MeesPierson showed that, for commercial applications, it should be easier to establish a link with the common business language COBOL. The specification of the static semantics of Pascal showed that executing language definitions by term rewriting alone does not yield sufficiently informative tools. Finally, using ASF+SDF to construct an environment for action semantics led to an extensive list of comments and recommendations for improving ASF+SDF.
Part II of this thesis discusses a solution to the second problem, the one encountered in the Pascal case study. That case study defines which errors in a Pascal program can be found by analyzing the program. Executing this definition yields a (type) checker for Pascal programs. Suppose we have written a small Pascal program and run the checker on it. We have probably made a few mistakes, which the checker detects and reports as a list of errors. This tells us what is wrong.
What we also want to know, especially when the program is large, is where in the program each error was made. The observation made in Chapter 4 is that the generated checker cannot provide this information when it is obtained by simply executing the definition via term rewriting. Part II discusses origin tracking, an extension of term rewriting that automatically keeps track of the origins of intermediate results. For every intermediate value produced by a single rewrite step (i.e., for every step in the large computation), a set of "pointers" (the origins) to the relevant parts of the initial value is maintained. These sets are maintained up to and including the final result, where they provide the desired origin information. One of the difficulties is choosing which information to keep (neither too much nor too little). Furthermore, this information should be maintained fully automatically, without any changes to the original language definition.
Chapter 6 discusses the general principles of origin tracking. Chapter 7 shows how the quality of the computed origins can be improved by concentrating on a common form of algebraic language definitions: primitive recursive schemes. Chapter 8 explains how origin tracking can still be realized effectively when the notation for language definitions is extended with powerful higher-order constructs.
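To make the idea of origins concrete, here is a minimal Python sketch under simplifying assumptions: every node of the initial term starts with its own position as origin; when a rule fires, subterms bound to variables are copied with their origins intact, the root of the right-hand side inherits the origins of the redex, and any other newly introduced symbol gets no origin. This is not the definition from Chapter 6, which may differ in its details. The rules form a hypothetical toy checker (the names `tc`, `seq`, `assign`, `mismatch`, `errs`, `cat` and `nil` are invented for illustration) that reports every assignment of `str` as an error.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Path = Tuple[int, ...]  # position in the initial term: 1-based argument indices


@dataclass(frozen=True)
class Node:
    op: str
    args: Tuple["Node", ...] = ()
    origins: FrozenSet[Path] = frozenset()


def is_var(p) -> bool:
    # In patterns, a bare string starting with an uppercase letter is a variable.
    return isinstance(p, str) and p[:1].isupper()


def from_pattern(p, path: Path = ()) -> Node:
    """Build the initial term; every node's origin is its own position."""
    if isinstance(p, str):
        return Node(p, (), frozenset({path}))
    kids = tuple(from_pattern(a, path + (i,)) for i, a in enumerate(p[1:], 1))
    return Node(p[0], kids, frozenset({path}))


def match(p, t: Node, env: dict) -> bool:
    """Linear pattern matching; binds variables to subterms of t."""
    if is_var(p):
        env[p] = t
        return True
    op, pargs = (p, ()) if isinstance(p, str) else (p[0], p[1:])
    return (op == t.op and len(pargs) == len(t.args)
            and all(match(a, s, env) for a, s in zip(pargs, t.args)))


def build(p, env: dict, root_origins: FrozenSet[Path]) -> Node:
    """Instantiate a right-hand side: variables keep their origins, the root
    inherits the redex's origins, other new symbols get no origin."""
    if is_var(p):
        return env[p]
    op, pargs = (p, ()) if isinstance(p, str) else (p[0], p[1:])
    kids = tuple(build(a, env, frozenset()) for a in pargs)
    return Node(op, kids, root_origins)


def normalize(t: Node, rules) -> Node:
    """Innermost rewriting that carries origin sets along every step."""
    t = Node(t.op, tuple(normalize(a, rules) for a in t.args), t.origins)
    for lhs, rhs in rules:
        env: dict = {}
        if match(lhs, t, env):
            return normalize(build(rhs, env, t.origins), rules)
    return t


# Hypothetical toy checker: assigning "str" to a variable is an error.
RULES = [
    (("tc", ("seq", "S1", "S2")), ("cat", ("tc", "S1"), ("tc", "S2"))),
    (("tc", ("assign", "V", "num")), "nil"),
    (("tc", ("assign", "V", "str")), ("errs", ("mismatch", "V"), "nil")),
    (("cat", "nil", "L"), "L"),
    (("cat", ("errs", "E", "L1"), "L2"), ("errs", "E", ("cat", "L1", "L2"))),
]

if __name__ == "__main__":
    program = from_pattern(("tc", ("seq", ("assign", "x", "num"), ("assign", "y", "str"))))
    result = normalize(program, RULES)
    var = result.args[0].args[0]  # the variable inside mismatch(...)
    print(var.op, sorted(var.origins))  # prints: y [(1, 2, 1)]
```

The checker's result is `errs(mismatch(y), nil)`, and the `y` in it points back to position `(1, 2, 1)`, the variable of the second assignment in the input. Under these assumptions the `mismatch` node itself has no origin, a small illustration of why choosing which information to keep is a real design question.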

Item Type:	Thesis (Doctoral)
Report Nr:	DS-1994-05
Series Name:	ILLC Dissertation (DS) Series
Year:	1994
Subjects:	Language
Depositing User:	Dr Marco Vervoort
Date Deposited:	14 Jun 2022 15:16
Last Modified:	30 Apr 2026 15:08
URI:	https://eprints.illc.uva.nl/id/eprint/1968
