Abstract corpora (part 1)

June 6, 2008

In The Logical Structure of Linguistic Theory (written around 1956, although not published until 1975), Chomsky outlined a theory of linguistic form, and suggested from the beginning that “we will try to show how an abstract theory of linguistic structure can be developed within a framework that admits of operational interpretation, and how such a theory can lead to a practical mechanical procedure by which, given a corpus of linguistic material, various proposed grammars can be compared and the best of them selected” (Chomsky 1975, p. 61). In order for such a mechanical procedure to be used, it would be necessary to present an actual collection of linguistic material—utterances recorded in some suitable form—on which it could operate. A grammar, in this context, is construed as a theory (Chomsky 1975, p. 63):

By “the grammar of a language L” we mean that theory of L that attempts to deal with such problems as [projection, ambiguity, sentence type, etc.] wholly in terms of the formal properties of utterances. And by “the general theory of linguistic form” we mean the abstract theory in which the basic concepts of grammar are developed, and by means of which each proposed grammar can be evaluated.

The relationship between a language L and a grammar of L, in early generative theory, is conceived of as follows (Chomsky 1957, p. 13):

From now on I will consider a language to be a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements. … The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences. The grammar of L will thus be a device that generates all of the grammatical sequences of L and none of the ungrammatical ones.

The general proposal here is, to some degree, analogous to filling out tax forms. A person’s actual financial situation is a collection of transactions, with money being received and dispensed at various points in time. In filling out a tax form, they need to deal with certain problems—net income, withholdings, and the like—which are the financial properties of the transactions, and ignore such things as whether the money was earned by clearing clogged plumbing or by managing a team of financial auditors. The financial situation is evaluated based on the tax laws, which define the basic concepts independent of any specific person’s financial situation. The “general theory of linguistic form” is roughly analogous to the pertinent tax laws, and the “grammar of a language L” plays a role similar to the information provided on a tax form. In the tax scenario, all of this description and analysis is performed relative to an actual set of financial transactions. The language L is analogous to these transactions, in that it provides the material to be described and analyzed.
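The idea of a grammar as "a device that generates all of the grammatical sequences of L and none of the ungrammatical ones" can be made concrete with a toy case. The sketch below is only an illustration: the two-rule grammar, the language it targets (one or more a's followed by the same number of b's), and the function names are assumptions introduced here, not anything taken from Chomsky's text.

```python
from itertools import product

# Hypothetical toy grammar: S -> a S b | a b
# Uppercase letters are nonterminals; lowercase letters are the elements of L.
RULES = {"S": ["aSb", "ab"]}

def generate(max_len):
    """Return every sentence of length <= max_len that the rules derive."""
    sentences = set()
    pending = ["S"]
    while pending:
        form = pending.pop()
        if len(form) > max_len:
            continue  # every rule lengthens the form, so it can be dropped
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            sentences.add(form)  # no nonterminal left: a finished sentence
            continue
        for rhs in RULES[form[idx]]:
            pending.append(form[:idx] + rhs + form[idx + 1:])
    return sentences

def is_grammatical(seq):
    """Independent definition of the toy language L."""
    n = len(seq) // 2
    return n >= 1 and seq == "a" * n + "b" * n

def device_matches_language(max_len):
    """Check 'all and none' against every sequence up to max_len."""
    generated = generate(max_len)
    for length in range(1, max_len + 1):
        for letters in product("ab", repeat=length):
            seq = "".join(letters)
            if (seq in generated) != is_grammatical(seq):
                return False
    return True
```

Here `generate` plays the role of the grammar and `is_grammatical` stands in for the language itself. The final check can only be exhaustive up to a length bound, which is one reason the general question is a matter for proof rather than enumeration.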

Copyright © 2008 Michael L. McCliment.


Mathematics and linguistics (part 4)

May 23, 2008

In part 1, I discussed the non-specialist’s experience with mathematics and with linguistics, and suggested that their experience is, in both cases, essentially prescriptivist in nature. Before discussing the relationship between these fields, we needed to move beyond the non-specialist’s perceptions and to understand more about the actual scope of each of these fields. I offered some observations about mathematics in part 2, and addressed linguistics in part 3. In this final part, we’ll look at the relationship between the two fields in light of the preceding discussion.

Now that we have a better understanding of what mathematics and linguistics are, let’s reconsider the question of how they are related to one another. Our initial view was that the two fields were predominantly parallel, and would interact where their respective objects of study happened to coincide (if anywhere). We represented this aspect schematically:

Fields of inquiry and their objects of study

Field of study    Practitioners     Objects studied
linguistics       linguists         language
mathematics       mathematicians    space, number, quantity, and arrangement

Linguistics, as we have now seen, isn’t just an arbitrary study of language. In many cases, philosophers may be said to study language. I’m not convinced that literary criticism can ever avoid studying language. But neither of these is inherently linguistic in nature. (They may be approached from a linguistic perspective; for example, one can apply pragmatics to the study of literature, but one can also study literature without using pragmatics or any other part of linguistics.) Linguistics is the scientific study of language as a principal phenomenon.

Linguistics naturally studies the patterns that arise in languages. Even linguists who strongly reject any notion of an underlying rule-governed cognitive system propose that there are patterns in the languages that they study. Whether we adopt a rule-governed framework or not, the role of a linguistic theory is to propose that there are certain patterns—ranging from fully systematic “laws” through to weak “tendencies”—that arise in the use of language. The process of drawing conclusions from these basic propositions, and hence the entire act of moving from philosophical opinion to empirical science, is inherently an act of mathematics. Language is not mathematics. Linguistic theorization is not mathematics, but uses mathematics as its tool of reasoning. The evaluation of linguistic theories, however, is intrinsically mathematical. Has the theorist constructed a consistent theory? Do the theorist’s predictions follow from the patterns abstracted from their observations? Are there other predictions that follow from the proposed theory that also need to be evaluated empirically? These questions cannot be answered by the linguistic theory, which takes some aspect of language as its object of study, since these questions naturally take the linguistic theory itself as the object of study.

When physicists propose specific theories, those theories are evaluated not only for agreement with the empirical data and for compatibility with generally known physical properties, but also for their mathematical properties. If a theory proposes a set of relationships among the observed patterns that is inconsistent, or if there is no way to construct any object satisfying the properties it proposes, then that theory is rejected. The situation in linguistics is analogous. Just as the study of physical theories is part of physics, the study of linguistic theories is as much a part of linguistics as the study of languages is.
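As a rough illustration of what it means to reject a theory because no object can satisfy it, consider a deliberately tiny case. The sketch assumes a hypothetical "theory" that consists of nothing more than precedence claims about the order of subject (S), verb (V), and object (O). The constraint format and the function name `models` are invented for this example.

```python
from itertools import permutations

def models(constraints, elements=("S", "V", "O")):
    """Return every ordering of `elements` that satisfies all constraints.

    Each constraint (x, y) is read as 'x precedes y'.
    """
    return [
        order
        for order in permutations(elements)
        if all(order.index(x) < order.index(y) for x, y in constraints)
    ]

# Hypothetical theory 1: subject before verb, verb before object.
consistent_theory = [("S", "V"), ("V", "O")]

# Hypothetical theory 2: the same claims, plus object before subject.
# The three claims form a cycle, so no ordering can satisfy them all.
inconsistent_theory = [("S", "V"), ("V", "O"), ("O", "S")]
```

Calling `models(consistent_theory)` returns the single ordering `("S", "V", "O")`, while `models(inconsistent_theory)` returns an empty list. Note that the check examines the theory's own claims and never consults any language data, which is the sense in which it takes the theory itself as the object of study.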

This relationship, in which mathematics is the instrument by which we analyze linguistic theories, is the more fundamental relationship that I hinted at from the outset. It comes with an immediate corollary: mathematical modeling is necessarily a valid research methodology in linguistics. It does not replace empirical studies or the myriad research methodologies associated with them; it complements such studies. Both aspects are important for the health of linguistics as a science. For the formal evaluation of linguistic theory, however, mathematical modeling may well be the only valid methodology.
