Wednesday, October 31, 2007

Little Is Learned By Triathlon Survey

In yesterday's Capital, Nicole Young wrote a story reporting favorably on the overall reception of the triathlon by the citizens. I am acquainted with Ms. Young, and might have to send her a thank-you card accompanied by a $5 gift certificate to Caffe Pronto for providing me with such good material. The article mixes two of this blog's favorite topics: statistics and the triathlon.

A major obstacle that can frustrate effective analysis is the misinterpretation of statistics. I have said before and shall reiterate--for most people, it's not very important to know how to derive, say, the formula for standard deviation. What is important is the ability to see where data is coming from, who is presenting it, and what certain statistics really mean.

And when the people interpreting the data make claims like these:....
City officials were surprised by the positive results.
But the results speak for themselves. We were expecting the results to be
extremely negative, but they were saying 'We like the event, we'd like to see it
come back, but we'd like to see it done better,' which is OK.

.....it is important to assess whether this is actually the case.

Two surveys were taken of stakeholders in the triathlon, one by a firm called the Minor Group and another by the event itself.

First things first: the survey done by the triathlon's organizers is meaningless. According to the article:

The other (survey) was completed by the triathlon itself, which
surveyed the athletes, volunteers and spectators who participated in the
event.

Those participating in the triathlon responded favorably to the event,
with 83 percent calling the triathlon positive and 84 percent saying they would
participate again, the Annapolis Triathlon Club survey showed.


This is the definition of selection bias. Of course the people who participated in the triathlon are likely to think it was a success. I would bet that these numbers are very different from numbers collected from a random sample.

(Note to readers: in the interest of full disclosure, I had already seen the other numbers when I wrote the last sentence, so I knew I would be right! But it would have been fairly easy to predict.)

Now then:
According to the survey conducted by The Minor Group, businesses and
residents were split on the overall impact of the triathlon, with 34 percent of
businesses and 37 percent of residents saying it had a positive impact. The
number saying it was negative was slightly lower and undecided responses made up the remainder of the survey.

Luckily for us, based on these numbers we can group businesses and residents together, and don't have to analyze them separately. Otherwise this post would be unnecessarily long, and that's not good for anybody. Let's assume that the numbers go something like this:

-35% in favor
-30% against
-35% undecided

That is a lot of undecideds. It is probably accurate that 1/3 of people have no feeling either way. The issue here is the concept of sampling. Since it would be way too expensive and time-consuming to survey all 40,000 city residents and people involved in the event, you have to take a sample. With such a high share of undecided responses, the numbers would likely change significantly if you were actually able to survey everyone, because you would be capturing everyone who has an opinion. And with the survey results so close, the majority faction could easily reverse.
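For anyone who wants to see how shaky a 5-point lead can be, here is a rough sketch of the sampling error on the gap between "in favor" and "against." The article does not say how many people The Minor Group surveyed, so the sample sizes below are purely hypothetical, and the 35/30 split is my rounded assumption from above. It also assumes a simple random sample and ignores the fact that the population is only about 40,000.

```python
import math

def gap_margin(p_for, p_against, n, z=1.96):
    """Approximate 95% margin of error for (p_for - p_against).

    Both shares come from the same sample of size n, so the variance of
    their difference is (p_for + p_against - (p_for - p_against)**2) / n.
    """
    variance = (p_for + p_against - (p_for - p_against) ** 2) / n
    return z * math.sqrt(variance)

# Rounded shares assumed in this post, not the survey's exact figures.
P_FOR, P_AGAINST = 0.35, 0.30

# Hypothetical sample sizes; the real one was not reported.
for n in (100, 400, 1000):
    margin = gap_margin(P_FOR, P_AGAINST, n)
    print(f"n={n}: gap = {P_FOR - P_AGAINST:.2f} +/- {margin:.3f}")
```

Whenever the margin is bigger than the gap itself, the survey cannot tell you which side is actually ahead.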

Furthermore:
"The whole thing was rushed through and didn't take into account all of the
people affected," he (W1RA President Doug Smith) said. "The residents were next
to the bottom of the list and the churches were exactly at the bottom of the
list."

If accurate, this skews the data even more. Residents and churches are the most likely to oppose the event, and if they were underrepresented in the survey, the true 'against' figure is probably higher than reported.
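To show the direction of that skew, here is a back-of-the-envelope sketch. Every number in it is made up for illustration: I have no idea what the actual 'against' rates or group sizes were, and the function name is just my own. The only point is that when the group most likely to object gets less weight in the sample than it has in real life, the overall 'against' share comes out too low.

```python
def against_share(group_weight, against_in_group, against_elsewhere):
    """Overall 'against' share as a weighted average of two groups."""
    return (group_weight * against_in_group
            + (1 - group_weight) * against_elsewhere)

# All values below are hypothetical, chosen only to illustrate the effect.
AGAINST_IN_GROUP = 0.50   # assumed 'against' rate among residents and churches
AGAINST_ELSEWHERE = 0.20  # assumed 'against' rate among everyone else

# Suppose the group made up 1/3 of the sample but 2/3 of those affected.
surveyed = against_share(1 / 3, AGAINST_IN_GROUP, AGAINST_ELSEWHERE)
reweighted = against_share(2 / 3, AGAINST_IN_GROUP, AGAINST_ELSEWHERE)

print(f"against, as surveyed: {surveyed:.0%}")
print(f"against, reweighted:  {reweighted:.0%}")
```

With different made-up inputs the size of the shift changes, but as long as the underrepresented group objects more than everyone else, the shift always goes the same way.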

So, in summary:

1. The survey done by the event itself should be used for kindling.

2. The 'random' survey quite possibly reflects data distortions resulting from sampling difficulties.

3. Since the only survey that matters is in fact too close to call--and not overwhelmingly positive--saying things like "but the results speak for themselves" and declaring resounding support for a future event is inappropriate when based solely on the surveys.
