## Perspectives 4: Predictably random

I don’t follow the churn of technology news closely anymore. So, every once in a while, one of the webcomics that I read will send me scurrying to Google one topic or another. The most recent of these was Security Holes at xkcd:

My immediate reaction was “Huh? What fiasco?” This was followed almost immediately by a help-me-please-I’ve-been-living-under-a-rock trip to Google. A quick search not only gave me the answer, but caused me to choke on my coffee. You see, apparently someone actually removed the code that seeded the random number generator!

For those of you who don’t already know, let me explain why I choked on my coffee. Let’s start with a story. Once upon a time, in my very first programming class, we were given a programming assignment that introduced us to screen coordinates. Our goal was to “fill the sky with stars”—in other words, we needed to put a bunch of random white points on a black background. Not particularly difficult, unless either

1. you didn’t get the fact that the origin is at the top-left of the screen, and the positive y-axis points down (forgetting this plotted the point somewhere off the screen), or
2. you didn’t get how to scale a floating-point number in the interval $\left(0, 1\right)$ to get an integer in the interval $\left[1, n-1\right]$ (getting this wrong also plotted points off the screen, often at the origin).

The trickier part was this: I correctly used the random number generator, but every time I ran the program it gave me the exact same pattern of “stars”. The problem is that nearly all random number generators on computers don’t do what they claim; there’s not anything actually random about the process used to generate the numbers.

What computers actually have are pseudo-random number generators (PRNGs). Here’s the thing about a PRNG: it actually returns elements from a fixed, finite sequence of numbers. (There’s a good writeup on how PRNGs work at random.org.) However it happens to be implemented, a PRNG is characterized by a sequence

$\sigma\left(0\right), \sigma\left(1\right), \dots, \sigma\left(s-1\right),$

where $s$ is the length of the sequence. If it has just given you $\sigma\left(i\right),$ it will give you $\sigma\left(i+1\left(\mathrm{mod}\ s\right)\right)$ the next time you call it. Each time that I ran my stars program, the PRNG started at the same point in the sequence, and there was no degree of randomness from one run to the next.

PRNGs generally have one (or more) ways to make the numbers they return appear to be more random than they are. I mentioned above how it decides on the next number to return, but not how to get the first number. This is where the seed enters into the process. When you seed the PRNG, you tell it where to start in $\sigma.$ Seeding it with a constant doesn’t help, because that is also exceptionally predictable. Usually, the PRNG will be seeded with something that is at least hard to guess, like the millisecond portion of the time that it is seeded. (When I fixed my stars program, I set the seed to the current time of day.)

The thing is, this is basic. I’m not entirely sure how you can learn programming and not learn about the consequences of pseudo-random generation, and what you have to do for it to be of any use at all. It comes up in graphics programming, computer simulation, sampling, security, and many other areas. Seeding the PRNG is like putting gas in the car before driving out to a remote camping spot for the weekend. It’s like measuring someone for a bridesmaid’s dress before cutting into the \$40-per-yard material that it’ll be made from. Sure, you can fail to do so; but the consequences of this failure…

So there you have it—that’s why I choked on my coffee.