## Multiset operations as binary operations (part 1)

June 9, 2008

Review: Associativity, commutativity, and idempotence

A binary operation $\cdot$ on a set $X$ is a function $\cdot: X\times X \to X$. It is usually written using an infix notation $x\cdot y$ rather than a functional notation such as $\cdot\left(x, y\right)$.

The operation $\cdot$ is associative if and only if $a\cdot\left(b\cdot c\right) = \left(a\cdot b\right)\cdot c$ for all $a,b,c\in X$. It is commutative if and only if $a\cdot b = b\cdot a$ for all $a,b\in X$. It is idempotent if and only if $a\cdot a = a$ for all $a\in X$.

We’ve defined four operations on families of multisets $\left\{\mathcal{M}_i\right\}_{i\in I}$ where $\mathcal{M}_i \in \mathbf{MSet}_X$ for all $i\in I$: sums, products, intersections, and unions. As I’ve already commented in a couple of places, we can restrict our attention to families where $\left|I\right| = 2$. This provides us with four binary operations on $\mathbf{MSet}_X$. Today’s post collects together the properties of these binary operations.

Proposition 1: The operations $\uplus$, $\:\cdot\!\!\!\!\cup$, $\cap$, and $\cup$ on $\mathbf{MSet}_X$ are all associative and commutative.

Proof:

Let $\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3 \in \mathbf{MSet}_X$. The first four parts follow directly from the associativity and commutativity of $+$ and $\cdot$ as binary operations on $\mathbf{Card}$.

Associativity of $\uplus$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \uplus \left(\mathcal{M}_2 \uplus \mathcal{M}_3\right) & \left(X, f_1 + \left(f_2 + f_3\right)\right) \\ & \left(X, \left(f_1 + f_2\right) + f_3\right) \\ & \left(\mathcal{M}_1 \uplus \mathcal{M}_2\right) \uplus \mathcal{M}_3\end{array}$

Commutativity of $\uplus$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \uplus \mathcal{M}_2& \left(X, f_1 + f_2\right) \\ & \left(X, f_2 + f_1\right) \\ & \mathcal{M}_2 \uplus \mathcal{M}_1\end{array}$

Associativity of $\:\cdot\!\!\!\!\cup$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \:\cdot\!\!\!\!\cup\: \left(\mathcal{M}_2 \:\cdot\!\!\!\!\cup\: \mathcal{M}_3\right) & \left(X, f_1 \cdot \left(f_2 \cdot f_3\right)\right) \\ & \left(X, \left(f_1 \cdot f_2\right) \cdot f_3\right) \\ & \left(\mathcal{M}_1 \:\cdot\!\!\!\!\cup\: \mathcal{M}_2\right) \:\cdot\!\!\!\!\cup\: \mathcal{M}_3\end{array}$

Commutativity of $\:\cdot\!\!\!\!\cup$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \:\cdot\!\!\!\!\cup\: \mathcal{M}_2& \left(X, f_1 \cdot f_2\right) \\ & \left(X, f_2 \cdot f_1\right) \\ & \mathcal{M}_2 \:\cdot\!\!\!\!\cup\: \mathcal{M}_1\end{array}$

The remaining parts follow from the following facts:

1. Let $A, B$ be partially ordered sets. Then $\inf \left(A \cup \left\{\inf B\right\}\right) = \inf\left(A\cup B\right)$ and $\sup \left(A \cup \left\{\sup B\right\}\right) = \sup \left(A\cup B\right)$ whenever the suprema and infima exist.
2. The supremum and infimum of every set of cardinals exist.

Associativity of $\cap$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \cap \left(\mathcal{M}_2 \cap \mathcal{M}_3\right) & \left(X, \inf \left\{f_1, \inf\left\{f_2, f_3\right\}\right\}\right) \\ & \left(X, \inf\left\{f_1, f_2, f_3\right\}\right) \\ & \left(X, \inf \left\{\inf\left\{f_1, f_2\right\}, f_3\right\}\right) \\ & \left(\mathcal{M}_1 \cap\mathcal{M}_2\right) \cap \mathcal{M}_3\end{array}$

Commutativity of $\cap$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \cap \mathcal{M}_2& \left(X, \inf \left\{f_1, f_2\right\}\right) \\ & \mathcal{M}_2 \cap \mathcal{M}_1\end{array}$

Associativity of $\cup$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \cup \left(\mathcal{M}_2 \cup \mathcal{M}_3\right) & \left(X, \sup \left\{f_1, \sup\left\{f_2, f_3\right\}\right\}\right) \\ & \left(X, \sup\left\{f_1, f_2, f_3\right\}\right) \\ & \left(X, \sup \left\{\sup\left\{f_1, f_2\right\}, f_3\right\}\right) \\ & \left(\mathcal{M}_1 \cup\mathcal{M}_2\right) \cup \mathcal{M}_3\end{array}$

Commutativity of $\cup$.

$\begin{array}{r@{\:=\:}l} \mathcal{M}_1 \cup \mathcal{M}_2& \left(X, \sup \left\{f_1, f_2\right\}\right) \\ & \mathcal{M}_2 \cup \mathcal{M}_1\end{array}$

Proposition 2: The operations $\cap$ and $\cup$ on $\mathbf{MSet}_X$ are idempotent.

Proof:

Let $\mathcal{M}\in \mathbf{MSet}_X$.

Idempotence of $\cap$.

$\begin{array}{r@{\:=\:}l} \mathcal{M} \cap \mathcal{M}& \left(X, \inf \left\{f, f\right\}\right) \\ & \left(X, f\right) \\ & \mathcal{M}\end{array}$

Idempotence of $\cup$.

$\begin{array}{r@{\:=\:}l} \mathcal{M} \cup \mathcal{M}& \left(X, \sup \left\{f, f\right\}\right) \\ & \left(X, f\right) \\ & \mathcal{M}\end{array}$

The other two operations—sum and product—are not idempotent. One counterexample is the multiset $\mathcal{M} = \left\{a^3\right\}$, for which we have

$\mathcal{M}\uplus\mathcal{M} = \left\{a^6\right\} \neq \mathcal{M}$

and

$\mathcal{M}\:\cdot\!\!\!\!\cup\:\mathcal{M} = \left\{a^9\right\} \neq \mathcal{M}$.

## Abstract corpora (part 1)

June 6, 2008

In The Logical Structure of Linguistic Theory (written around 1956, although not published until 1975), Chomsky outlined a theory of linguistic form, and suggested from the beginning that “we will try to show how an abstract theory of linguistic structure can be developed within a framework that admits of operational interpretation, and how such a theory can lead to a practical mechanical procedure by which, given a corpus of linguistic material, various proposed grammars can be compared and the best of them selected” (Chomsky 1975, p. 61). In order for such a mechanical procedure to be used, it would be necessary to present an actual collection of linguistic material—utterances recorded in some suitable form—on which it could operate. A grammar, in this context, is construed as a theory (Chomsky 1975, p. 63):

By “the grammar of a language $L$” we mean that theory of $L$ that attempts to deal with such problems as [projection, ambiguity, sentence type, etc.] wholly in terms of the formal properties of utterances. And by “the general theory of linguistic form” we mean the abstract theory in which the basic concepts of grammar are developed, and by means of which each proposed grammar can be evaluated.

The relationship between a language $L$ and a grammar of $L$, in early generative theory, is conceived of as follows (Chomsky 1957, p. 13):

From now on I will consider a language to be a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements. … The fundamental aim in the linguistic analysis of a language $L$ is to separate the grammatical sequences which are the sentences of $L$ from the ungrammatical sequences which are not sentences of $L$ and to study the structure of the grammatical sequences. The grammar of $L$ will thus be a device that generates all of the grammatical sequences of $L$ and none of the ungrammatical ones.

The general proposal here is, to some degree, analogous to filling out tax forms. A person’s actual financial situation is a collection of transactions, with money being received and dispensed at various points in time. In filling out a tax form, they need to deal with certain problems—net income, withholdings, and the like—which are the financial properties of the transactions, and ignore such things as whether the money was earned by clearing clogged plumbing or by managing a team of financial auditors. The financial situation is evaluated based on the tax laws, which define the basic concepts independent of any specific person’s financial situation. The “general theory of linguistic form” is roughly analogous to the pertinent tax laws, and the “grammar of a language $L$” plays a role similar to the information provided on a tax form. In the tax scenario, all of this description and analysis is performed relative to an actual set of financial transactions. The language $L$ is analogous to these transactions, in that it provides the material to be described and analyzed.

## Mathematical logic and biological foundations

June 5, 2008

Last week, we considered generative linguistics as a theory of the faculty of language, and identified four distinct scopes that can be encompassed by the term faculty of language. In order to be clear about these different meanings, I adopted the notations FLB and FLN which were proposed by Chomsky, Fitch, and Hauser in a pair of articles, and I introduced FLC and FLG to represent a similar division independently of the evolutionary history of the faculty of language. All of this presupposes a biolinguistic perspective, in which language is treated as a biologically-founded cognitive phenomenon rather than as a collection of observable sentences. This view is essentially synchronic, considering only the current state of generative theory. It is also instructive to look at the historical development of the theoretical framework in order to understand why there is a distinction between FLC and FLG within the theory.

The origins of generative linguistics are often traced to Chomsky’s Syntactic Structures (1957/2002) and The Logical Structure of Linguistic Theory (1975, written ca. 1956). The fundamental idea of generation, however, has a longer history in algebra and in symbolic logic, dating as far back as the end of the 19th century; Moore (1894), for example, defines a particular abstract group in terms of generators and generating relations; these relations generate all of the elements of the group from the generators . A more direct antecedent to Chomsky’s initial work on generative grammar was Emil Post’s work from ca. 1921, by way of Rosenbloom’s The Elements of Mathematical Logic (Chomsky 1975 p. 105 fn 1; Post 1943, p. 215 fn 18; Rosenbloom 1950, p. 206). Rosenbloom even proposed that “one might also expect that many concepts in linguistics which have resisted all attempts up to now at clear and general formulation may now be treated with the same lucidity and rigor which has made mathematics a model for other sciences. The wealth of detail and the manifold irregularities of natural languages have often obfuscated the simple general principles underlying linguistic phenomena” (1950, p. 163). Chomsky’s early works pursued precisely this direction.

Some recent claims notwithstanding, the original literature suggests that generative linguistics was not originally conceived as a theory of the faculty of language, but rather just as a theory of language as an abstract corpus of sentences. (I’ll have more to say on this point in a later post.) The initial steps towards a treatment of generative theory as a theory of the faculty of language were evidently taken within a decade of the publication of Syntactic Structures. By the mid-1960s, Chomsky was writing an appendix to Lenneberg’s The Biological Foundations of Language (1967), and had already formulated the separation between competence and performance. A clearer distinction was drawn between the notions of I-language and E-language by the mid-1980s, where E-language treats language “independently of the mind/brain” (Chomsky 1986, p. 20), and I-language “is some element of the mind of the person who knows the language, acquired by the learner, and used by the speaker-hearer” (Chomsky 1986, p. 22). Taking generative grammar then to be the study of this I-language, we have a clear claim that it is a theory of the faculty of language.

## Properties of multiset union

June 4, 2008

We defined the union of a family of multisets last week. Today we’re going to look at some basic properties of the union operation on $\mathbf{MSet}_X$.

Suppose $\left\{\mathcal{M}_i = \left(X, f_i\right)\right\}_{i\in I}$ is a family of multisets over $X$. Then the following relationships hold:

(i) $\mathrm{support}\left(\bigcup_{i\in I}{\mathcal{M}_i}\right) = \bigcup_{i\in I}{\mathrm{support}\left(\mathcal{M}_i\right)}$.

(ii) $\mathcal{M}_i \subseteq \bigcup_{i\in I}{\mathcal{M}_i}$ for all $i\in I$.

(iii) $\bigcup_{i\in I}{\mathcal{M}_i} \subseteq \biguplus_{i\in I}{\mathcal{M}_i}$.

(iv) $\bigcup_{i\in I}{\mathcal{M}_i} \subseteq \:\cdot\!\!\!\!\!\!\;\bigcup_{i\in I}{\mathcal{M}_i}$ provided that all of the multisets $\mathcal{M}_i$ have the same support.

(v) $\bigcap_{i\in I}{\mathcal{M}_i} \subseteq \bigcup_{i\in I}{\mathcal{M}_i}$.

The proof of part (i), like the similar result for multiset intersections, is a straightforward series of equivalencies:

$\begin{array}{r@{\:\Leftrightarrow\:}l} x\in \mathrm{support}\left(\bigcup_{i\in I}{\mathcal{M}_i}\right) & \sup f_i(x) \neq 0 \\ & \left(\exists i\in I\right)\: f_i\left(x\right)\neq 0 \\ & \left(\exists i\in I\right)\: x\in\mathrm{support}\left(\mathcal{M}_i\right) \\ & x\in\bigcup_{i\in I}{\mathrm{support}\left(\mathcal{M}_i\right)}. \end{array}$

Part (ii) follows directly from the definition of the supremum of a set, since $f_i\left(x\right) \leq \sup f_i\left(x\right)$ for all $x\in X$ and $i\in I$.

To prove part (iii), start by recalling that the sum of a family of cardinals $\left\{\kappa_i\right\}_{i\in I}$ is given by

$\sum_{i\in I}{\kappa_i} = \left|\bigcup_{i\in I}{A_i}\right|$

where the sets $A_i$ are disjoint and $\kappa_i = \left|A_i\right|$ for all $i\in I$. Therefore, letting $A_{i,x}$ be a family of disjoint sets such that $\left|A_{i,x}\right| = f_i\left(x\right)$, we have

$\kappa_x = \sum_{i\in I}{f_i\left(x\right)} = \left|\bigcup_{i\in I}{A_{i,x}}\right| \geq \left|A_{i,x}\right| = f_i\left(x\right)$

for all $i\in I$ and $x\in X$. This establishes that $\kappa_x$ is an upper bound for $f_i\left(x\right)$ in the linearly ordered set $\left(\mathbf{Card}, \leq\right)$, and so $\kappa_x \geq \sup f_i\left(x\right)$. The result in part (iii) follows immediately since this holds for all $x\in X$.

The proof of part (iv) is similar, but relies on the definition of the product of a family of cardinals. The product of a family of cardinals $\left\{\kappa_i\right\}_{i\in I}$ is given by

$\prod_{i\in I}{\kappa_i} = \left|\prod_{i\in I}{A_i}\right|$

where $\kappa_i = \left|A_i\right|$ for all $i\in I$. Letting $A_{i,x}$ be a family of sets such that $\left|A_{i,x}\right| = f_i\left(x\right)$, we have

$\kappa_x = \prod_{i\in I}{f_i\left(x\right)} = \left|\prod_{i\in I}{A_{i,x}}\right| \geq \left|A_{i,x}\right| = f_i\left(x\right)$

for all $i\in I$ and $x\in X$, provided that $A_{i,x} \neq \emptyset$ for all $i\in I$. This establishes that $\kappa_x$ is an upper bound for $f_i\left(x\right)$ in the linearly ordered set $\left(\mathbf{Card}, \leq\right)$ whenever $x\in\mathrm{support}\left(\bigcup_{i\in I}{\mathcal{M}_i}\right)$, which is the common support of all multisets in the family $\left\{\mathrm{M}_i\right\}_{i\in I}$. If $x$ is not in this support, then $\prod_{i\in I}{f_i\left(x\right)} = 0 = \sup f_i\left(x\right)$. In either case, $\kappa_x \geq \sup f_i\left(x\right)$, and part (iv) follows immediately since this holds for all $x\in X$.

The requirement that all multisets in the family have the same support is necessary; you can see this by considering the product and union of the multisets $\left\{a\right\}$ and $\left\{b\right\}$.

Finally, part (v) follows immediately from the fact that

$\inf f_i\left(x\right) \leq f_i\left(x\right) \leq \sup f_i\left(x\right)$

for all $i\in I$ and $x\in X$.

## Transitions: May → June

June 2, 2008

Wow… one month already. (Technically, slightly more than a month, since my first post was published on 21 April.) Pretty much everything on the site is new since that post, but a couple of things are worth mentioning.

• God Plays Dice has been around since June 2007. Isabel Lugo’s posts range from the somewhat whimsical to self-contained discussions of mathematical topics. Always worth reading.
• The only linguistics site I’ve added so far is Language Log, which moved to a new server a couple months ago where the new material is posted. As a bonus, they’ve kept their older material available as well. And yes, this is plural they; several well-known linguists author posts there.
• John Armstrong started The Unapologetic Mathematician back in January 2007. Mathematical topics are explored in some depth over the course of several posts, much like I’m attempting to do. In fact, this is the site that showed me what can be done with a blog and inspired me to start writing here.
• Topological Musings is, for me, the most recent discovery out of these four additions. Vishal Lama started writing this in November 2007, and was recently joined by Todd Trimble. I really like the idea of their Problem of the Week series, and just wish my current schedule allowed me enough time to participate.

Bibliography and glossary.

I’ve added a bibliography (12 entries at the end of May) and a glossary (4 entries at the end of May) to the site. I’d really rather have these in a limited-authorship wiki associated with the blog, but I’m not set up to do that at the moment.

Site design.

As some early readers are aware, I’ve changed the theme to something more $\LaTeX$-friendly. With some of the themes available here, the $\LaTeX$ rendering process gives results that aren’t particularly legible. (Thanks for prompting me to take care of that, Vishal!) I’m still not exceptionally happy with the result (by default, the theme I’m using now puts a little bit of extra padding around each image, including rendered mathematics), so things may yet change again in the future when I have time to deal with the CSS.

One of the things I’d really like to do is distinguish visually between external links, links back to other posts here, and links to the bibliography and glossary. If I ever tackle the CSS project, that will be part of the design. In the meantime, I’ve started making sure that I put title text on every link. (In most browsers, you’ll be able to see it by letting your pointer hover over the link.) External links are now preceded by an arrow (→), and the text should make it obvious when a link leads to the bibliography or the glossary.

## Perspectives 6: Uniform surjective grading?

June 1, 2008

I’m somewhat disconcerted by this recent article that appeared in USA Today. I wouldn’t have noticed it, except for the fact that it was critiqued to some extent by Mark Chu-Carroll and various commenters over at Good Math, Bad Math. I might be somewhat happier if I hadn’t noticed it.

The proposal to impose a 50 point minimum grade for an assignment (on a 100 point scale) is problematic for a number of reasons. Some of these are practical, and can be seen clearly by considering some analogous scenarios. I’m fairly certain that it would be in some people’s interest to impose similar artificial bounds in other situations, but such a proposition would be a complete non-starter. Declaring an NHL player’s plus-minus rating to have a minimum value of 4 would clearly invalidate the descriptive value of the statistic. Declaring that employees can be given no worse than a “satisfactory” review would have interesting repercussions in the corporate environment, since it would effectively eliminate the ability to fire someone for non-performance of their job, or even incompetence.

Apparently, however, mathematical competence is not required in this arena. According to the article, the argument in favor of this artificial minimum is this:

Other letter grades — A, B, C and D — are broken down in increments of 10 from 60 to 100, but there is a 59-point spread between D and F, a gap that can often make it mathematically impossible for some failing students to ever catch up.

“It’s a classic mathematical dilemma: that the students have a six times greater chance of getting an F,” says Douglas Reeves, founder of The Leadership and Learning Center, a Colorado-based educational think tank who has written on the topic.

Last time I checked, the closed interval $S := \left[0,100\right]$ in the integers contains 101 elements, and a standard 90-80-70-60 scale allocates them in such a way that there is a 60 point spread between the minimum scores meriting an F and a D. Feel free to check my arithmetic, but even a hand calculator can usually get $60 - 0 = 60$ correct. Apparently correct arithmetic is not important in an argument supporting how grades should be computed.

I’m also astounded—perhaps I shouldn’t be, but I am—that the head of a think tank would propose that statement as a “classic mathematical dilemma”. It certainly appears to contain two classic mathematical errors, however: one being a fencepost error, and the other being an unsupported assumption that grades are uniformly distributed over $S$. If the grades obtained by students were distributed uniformly over $S$, then the expected value of student grades would be 50. So far as I am aware, neither high school graduation rates nor university retention and graduation rates support such a claim. Moreover, we should expect strictly more values corresponding to an A than to any of the grades B, C, or D on a given assignment or exam, since it contains one more element of $S$ than the others do. Would anyone care to provide the empirical evidence justifying such a result?

As unsettling as these errors are, I find the sidebar to be more disquieting. The examples it provides for calculating grades do not correspond to the general assumption in the article that a letter grade of F is recorded, and then considered to be 0 at some later point. Rather, it shows what happens when we have recorded the scores on a 100-point scale, and then compute the mean with the actual score or with a false minimum of 50. In these comparisons, the recorded failing grade is not in general a zero.

Let’s take a slightly closer look at what is happening here. There are two grading scales in use here: the closed interval $S=\left[0,100\right]$ of integer scores, and the set $L=\left\{\mathrm{A,B,C,D,F}\right\}$ of letter grades. The 90-80-70-60 convention establishes a surjective mapping $\varphi: S\to L$ that preserves the standard order on each of these sets. However, $\varphi$ is not injective, so we cannot define a consistent arithmetic on $L$ in terms of $\varphi^{-1}$. The argument provided in support of this 50-minimum grading scale is, in effect, an argument about how a representative of the preimage $\varphi^{-1}\left(x\right)$ should be selected for each $x\in L$.

Unfortunately, this argument does not go through if we already have scores recorded. We are not free to select any element of the preimage. At least in the sciences, there’s a term for such an operation: it’s called falsification of data. Viewed this way, the implications of the grading policies being adopted by some school districts are, at best, disturbing.