Friday, October 3, 2008

Words

The debate last night showed two quite different styles. Thanks to Wordle for these images comparing Joe Biden’s and Sarah Palin’s word frequencies.


5 comments:

Vardibidian said...

Hee hee. Beat you by, um, well, I'll go back and change the timestamp.

Thanks,
-V.

Michael said...

Hey, no fair!

I was struck by a couple of things as I tooled around with Wordler and the debate transcript. First, the transcript had a lot of errors, including misattributing several sentences from one candidate to the other. Second, Wordler doesn't do phrases by default, nor does it combine word variants.

Michael said...

Well, I was also struck by the way you can adjust variables such as color palette and word orientation to create quite different effects. Even the version that you linked to, which uses an identical font and color scheme for the two sides, still pushes a viewpoint by having Biden's words all horizontal and Palin's words a mix of horizontal and vertical. I went a little further.

Whoops, it's Wordle, not Wordler.

irilyth said...

I was really intrigued by the variable adjustment thing -- the first thing I thought when I saw the two blobs was "huh, why is Biden's all organized, and Palin's looks like an incoherent jumble?".

Do you think that deliberately tuning that sort of thing is fair, when making comparisons like this? Would you be outraged if the blobs had been tuned the other way, making Palin look sharp and organized and Biden look like a jumbly mess?

Michael said...

the first thing I thought when I saw the two blobs was "huh, why is Biden's all organized, and Palin's looks like an incoherent jumble?".

That's exactly the impression I was going for.

Do you think that deliberately tuning that sort of thing is fair, when making comparisons like this?

Well, I think that what I posted was clearly the data-presentation equivalent of a caricature. I didn't adjust the word frequencies, and in fact I tried to correct the underlying data, but I certainly wasn't trying (or claiming) to be fair.

In general, word clouds are an interesting approach to data presentation, but they have rather significant flaws for both individual and comparative scientific analysis. They are based on visualization techniques that are designed to stimulate creative associations and non-linear recall, not logical understanding. Word clouds are neither neutral nor accurate; they are impressionistic. Pretending otherwise would be unfair.