Abstract corpora (part 1)

June 6, 2008

In The Logical Structure of Linguistic Theory (written around 1956, although not published until 1975), Chomsky outlined a theory of linguistic form, and suggested from the beginning that “we will try to show how an abstract theory of linguistic structure can be developed within a framework that admits of operational interpretation, and how such a theory can lead to a practical mechanical procedure by which, given a corpus of linguistic material, various proposed grammars can be compared and the best of them selected” (Chomsky 1975, p. 61). In order for such a mechanical procedure to be used, it would be necessary to present an actual collection of linguistic material—utterances recorded in some suitable form—on which it could operate. A grammar, in this context, is construed as a theory (Chomsky 1975, p. 63):

By “the grammar of a language L” we mean that theory of L that attempts to deal with such problems as [projection, ambiguity, sentence type, etc.] wholly in terms of the formal properties of utterances. And by “the general theory of linguistic form” we mean the abstract theory in which the basic concepts of grammar are developed, and by means of which each proposed grammar can be evaluated.

The relationship between a language L and a grammar of L, in early generative theory, is conceived of as follows (Chomsky 1957, p. 13):

From now on I will consider a language to be a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements. … The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences. The grammar of L will thus be a device that generates all of the grammatical sequences of L and none of the ungrammatical ones.
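As a rough illustration of this view of a grammar, here is a minimal Python sketch. It assumes a hypothetical toy grammar written as rewrite rules; the symbols and words in `GRAMMAR` are invented for this example and are not drawn from Chomsky’s text. The language is the set of word sequences that the grammar derives from the start symbol `S`. Deciding whether a sequence is grammatical then amounts to testing membership in that set. The toy grammar has no recursive rules, so its language is finite. A recursive rule, whose right-hand side can lead back to the same symbol, would yield an infinite language, and this naive enumeration cannot handle that case.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to a list of alternative
# right-hand sides. Keys are nonterminals; any symbol that is not a key is
# treated as a terminal (a word). No rule is recursive, so the language is finite.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sleeps"], ["sees"]],
}


def generate(symbol, grammar=GRAMMAR):
    """Return the set of word sequences (tuples) derivable from `symbol`."""
    if symbol not in grammar:  # a terminal derives only itself
        return {(symbol,)}
    results = set()
    for rhs in grammar[symbol]:
        # Combine every derivation of each right-hand-side symbol, in order,
        # and flatten the pieces into a single word sequence.
        for parts in product(*(generate(s, grammar) for s in rhs)):
            results.add(tuple(word for part in parts for word in part))
    return results


def is_grammatical(words, grammar=GRAMMAR):
    """A sequence is a sentence of L exactly when the grammar generates it."""
    return tuple(words) in generate("S", grammar)
```

With these rules, `generate("S")` contains 12 sentences, including `("the", "dog", "sleeps")` and `("the", "cat", "sees", "the", "dog")`. A scrambled sequence such as `("dog", "the", "sleeps")` is not a member, so the grammar generates it only if it is grammatical, which it is not.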

The general proposal here is, to some degree, analogous to filling out tax forms. A person’s actual financial situation is a collection of transactions, with money being received and dispensed at various points in time. In filling out a tax form, they need to deal with certain problems—net income, withholdings, and the like—which are the financial properties of the transactions, and ignore such things as whether the money was earned by clearing clogged plumbing or by managing a team of financial auditors. The financial situation is evaluated based on the tax laws, which define the basic concepts independent of any specific person’s financial situation. The “general theory of linguistic form” is roughly analogous to the pertinent tax laws, and the “grammar of a language L” plays a role similar to the information provided on a tax form. In the tax scenario, all of this description and analysis is performed relative to an actual set of financial transactions. The language L is analogous to these transactions, in that it provides the material to be described and analyzed.

Mathematical logic and biological foundations

June 5, 2008

Last week, we considered generative linguistics as a theory of the faculty of language, and identified four distinct scopes that can be encompassed by the term faculty of language. To keep these meanings distinct, I adopted the notations FLB and FLN, which were proposed by Chomsky, Fitch, and Hauser in a pair of articles, and I introduced FLC and FLG to represent a similar division independently of the evolutionary history of the faculty of language. All of this presupposes a biolinguistic perspective, in which language is treated as a biologically founded cognitive phenomenon rather than as a collection of observable sentences. This view is essentially synchronic, considering only the current state of generative theory. It is also instructive to look at the historical development of the theoretical framework in order to understand why the theory distinguishes between FLC and FLG.

The origins of generative linguistics are often traced to Chomsky’s Syntactic Structures (1957/2002) and The Logical Structure of Linguistic Theory (1975, written ca. 1956). The fundamental idea of generation, however, has a longer history in algebra and in symbolic logic, dating back at least to the end of the 19th century. Moore (1894), for example, defines a particular abstract group in terms of generators and generating relations; these relations generate all of the elements of the group from the generators. A more direct antecedent to Chomsky’s initial work on generative grammar was Emil Post’s work from ca. 1921, by way of Rosenbloom’s The Elements of Mathematical Logic (Chomsky 1975, p. 105 fn 1; Post 1943, p. 215 fn 18; Rosenbloom 1950, p. 206). Rosenbloom even proposed that “one might also expect that many concepts in linguistics which have resisted all attempts up to now at clear and general formulation may now be treated with the same lucidity and rigor which has made mathematics a model for other sciences. The wealth of detail and the manifold irregularities of natural languages have often obfuscated the simple general principles underlying linguistic phenomena” (1950, p. 163). Chomsky’s early works pursued precisely this direction.

Some recent claims notwithstanding, the early literature suggests that generative linguistics was not originally conceived as a theory of the faculty of language, but rather as a theory of language as an abstract corpus of sentences. (I’ll have more to say on this point in a later post.) The initial steps toward treating generative theory as a theory of the faculty of language were evidently taken within a decade of the publication of Syntactic Structures. By the mid-1960s, Chomsky was writing an appendix to Lenneberg’s The Biological Foundations of Language (1967), and had already formulated the separation between competence and performance. By the mid-1980s, a sharper distinction had been drawn between I-language and E-language: E-language treats language “independently of the mind/brain” (Chomsky 1986, p. 20), while I-language “is some element of the mind of the person who knows the language, acquired by the learner, and used by the speaker-hearer” (Chomsky 1986, p. 22). Taking generative grammar, then, to be the study of I-language, we have a clear claim that it is a theory of the faculty of language.

“The” faculty of language

May 30, 2008

When we talked about the specialist’s view of linguistics, I mentioned that the scientific study of language can be approached from a variety of standpoints. Generative linguistics, in its contemporary form, assumes from the outset that there is a “species property, close to uniform across a broad range” (Chomsky 2004, p. 104) that is responsible for the human capacity for language. This faculty of language is “more or less on a par with the systems of mammalian vision, insect navigation, and others” (Chomsky 2005, p. 2). This point of view is often referred to as biolinguistics.

Broadly construed, the human faculty of language is a cognitive system, realized by the brain, that enables the production and consumption of language. Modern generative linguistics is generally conceived of as a theory of the faculty of language, or at least some portion thereof. A more precise characterization would be that generative linguistics is a family of theories of a portion of the faculty of language; theories in this family share some basic assumptions, have a variety of characteristics in common with one another, and partake of a common intellectual tradition.

The distinction between the faculty of language and what we can observe as spoken and written language is often expressed as a distinction between internal language (I-language) and external language (E-language). Intuitively, we might expect that a theory of internal language, being the cognitive component that enables language production and consumption, should provide the underpinnings of a theory of external language, which is the observable result of that cognitive function. However, there is a gap between the two.

The notion of internalized language is taken to be a “‘notion of structure’ in the mind of the speaker ‘which is definite enough to guide him in framing sentences of his own’” (Chomsky 1986, pp. 21-22, citing Otto Jespersen). The cognitive processes that lie between this “notion of structure” and the externally observable phenomena of language are not represented in the division between internal and external language. “The standard assumption in linguistics,” suggests Lyle Jenkins, “has always been that the theory of the language faculty must be embedded in a real-time theory of speech synthesis, perception, parsing, and the like in accordance with the modularity viewpoint” (2000, p. 71). The language faculty to which he refers here is already a relatively constrained conception, corresponding to the notion of I-language, and excluding a number of cognitive functions that must occur in the production and consumption of observable language.

This gap was among the subjects discussed in a 2002 article by Hauser, Chomsky, and Fitch. In this article, they distinguish between broad and narrow senses of the term “faculty of language”. The broad sense of the faculty of language (FLB) “includes an internal computational system (FLN, below) combined with at least two other organism-internal systems, which we call ‘sensory-motor’ and ‘conceptual-intentional’” (pp. 1570-1571). The narrow sense of the faculty of language (FLN), in turn, is “the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces” (p. 1571). This would be a useful distinction, were it not for a later clarification stating instead that “the contents of FLN are to be empirically determined, and could possibly be empty, if empirical findings showed that none of the mechanisms involved are uniquely human or unique to language, and that only the way they are integrated is specific to human language” (Fitch, Hauser, and Chomsky, 2005, p. 181).

When we look at any specific theory of generative grammar, we find that the gap between the internal and external views of language persists, independent of the status of any evolutionary arguments regarding homologues in other species or the evolutionary purpose of an adaptation. In deference to the 2005 clarifications, I will let FLB and FLN denote the distinctions related to biological homologues and evolutionary purpose. I will further distinguish between the generative faculty of language (FLG), which is the constrained sense of “faculty of language” (I-language) referenced by Jenkins, and the cognitive faculty of language (FLC), consisting of all of the cognitive processes realized by the brain that enter into language production and consumption.

Mode of inquiry, object of inquiry

May 29, 2008

My linguistics posts over the past few weeks have dealt with linguistics in very general terms. The purpose of the Mathematics and linguistics posts has been to outline a specific mode of inquiry within theoretical linguistics: the examination of the mathematical properties of a proposed theory. This mode of inquiry is fairly agnostic about specific theoretical details, and is very much in line with Peirce’s contention that mathematics is “the judge over both [induction and hypothesis], and it is the arbiter to which each must refer its claims” (1881, p. 97). Before we can proceed, however, we need to look at some actual linguistic theory. As with any active branch of scientific inquiry, researchers are pursuing multiple theories. At least for the time being, I’m going to focus on generative grammar.

Generative grammar is not a single theory, but rather a family of theories that share a number of common assumptions. Historically, there are three main periods in the development of generative grammar. The first of these saw the development of theories of transformational grammar, the second introduced the principles and parameters framework, and the most recent period focuses on minimalist grammars. The intellectual roots of generative grammar go back further, drawing on mathematical logic and adopting Post’s (1943) notion of productions. Since at least the mid-1970s, there has been a growing trend to consider generative grammar within a biolinguistic context. My goal for the next few linguistics posts is to look at this historical development in more detail, and identify some of the common assumptions that are made in generative theories of language.

