Computer asks: 'Where y'all from?'

If you have some free time over the holidays, you might try the online dialect quiz in the New York Times. You answer 25 multiple-choice questions based on how you pronounce certain words and what words you use to describe certain things.

For example, you might be asked whether “Mary,” “merry,” and “marry” all rhyme, whether they are all pronounced differently, or whether two of the three rhyme (and if so, which two). Or you might be asked what you call an effort to sell unwanted household goods: “tag sale,” “yard sale,” or “rummage sale,” for example.

I learned a few things while taking the quiz, too. For example, I'm not familiar with the drive-through liquor store, but I've learned that such places exist and go by names such as “brew thru” and “party barn.” I am familiar with the phenomenon of “rubbernecking” but did not know that the resulting traffic jam can be called a “gawk block.” I also learned that a sunshower is often associated with various animals marrying or giving birth.

I tried to give answers based on the dialect I grew up with, in the town of Clearfield in central/western Pennsylvania (Pittsburgh is the nearest large city). So, for example, although I've lived in the Boston area for more than 30 years now, I answered “traffic circle” for what I now call a rotary.

The program generates a heat map showing the probability that the respondent grew up in each location. Over a couple of passes, the computer did a pretty good job with my responses, placing me near Pittsburgh, Philadelphia, Buffalo, or (more improbably) Newark. I was a bit surprised that my preference for calling a sweetened carbonated beverage “pop” rather than “soda” didn't place me more firmly in western Pennsylvania, but maybe my years in Boston have given my accent a more eastern sound. My use of the term “sneakers” for rubber-soled athletic shoes was instrumental in placing me near Pittsburgh or Philadelphia, according to the program. That seems to have outweighed my preference for “hoagie” over “submarine” or “grinder,” which increased the probability of my having grown up in Yonkers.

The blogger Kevin Drum reports much less success. He says he has always lived within a 20-mile radius of Orange County, but the program persistently and erroneously placed him in Northern California. He tried to game the program with answers that he thought would locate him more accurately. He finally concluded that his lack of a preference for the phrase “service road” might have caused the error. He writes, “The truth is that here in Orange County we don't really have roads like this, so I don't call them anything. The only time I see them is when I'm traveling, usually in a car going north on I-5. Once you get up into the San Joaquin Valley, there are signs for these roads all over the place, and they're always called frontage roads. Since that's the only exposure I have to them, I call them frontage roads and thus peg myself as a northern Californian.”

The Times writes, “Most of the questions used in this quiz are based on those in the Harvard Dialect Survey, a linguistics project begun in 2002 by Bert Vaux and Scott Golder.” You can learn more about the project, and help with it, by visiting Dr. Vaux's current website.

The Times continues, “The data for the quiz and maps…come from over 350,000 survey responses collected from August to October 2013 by Josh Katz, a graphics editor for the New York Times who developed this quiz.”

You'll find the Times quiz here.

It would make an interesting carnival game to combine the underlying data with speech-recognition technology in a robot that could converse with a contestant for a few minutes and then guess the contestant's place of birth. Perhaps it's already been done.

As an aside, accents and dialects are generally challenges that speech recognition has to overcome. Slate published a brief article on the issues involved at the time of the debut of Apple's Siri.
