Anecdotal Evidence versus Statistics

Posted By Pamela on Aug 17, 2008 | 8 comments

One of the running jokes in physics/astronomy departments is that astronomers consider 4 instances of anything statistically significant. In fact, the story goes, two points are enough to define a trend, and one is enough to form a theory.

Take, for instance, our solar system. Until 1995 it was the only planetary system around a normal star that we knew of (a few planets had been found orbiting pulsars a little earlier). Based on it, and it alone, we built an entire, detailed nebular theory of solar system formation that we think is mostly true.

This isn’t the only place in research where a handful of “observations” leads to “understanding.” With observational astronomy we at least have the option to go out and search for new data. And sometimes we even find it. Sometimes. And until that sometimes is realized, most astronomers are more than willing to say “This is based on 1 example – we’re looking for more.”

The Sloan Digital Sky Survey is an example of large science done well. As it chews through large areas of the sky, we’re greatly multiplying the sizes of our samples and the numbers of our examples. Tidal tails were barely understood a dozen years ago, and now we know of many of them wrapping like starry spaghetti around our galaxy. The number of known overlapping galaxies has jumped from under a thousand to many thousands thanks to SDSS and its child project Galaxy Zoo (which sprang out on its own, via spontaneous, pub-fertilized gestation). The science coming out of SDSS is amazing (and it’s currently being live-blogged).

The Sloan Digital Sky Survey is just the starting point. In the coming years, two new large survey telescopes will come online. The first is Pan-STARRS, whose primary mission is to find pretty much every medium- to large-sized rock or chunk of ice in the solar system that threatens to cross Earth’s orbit. It is set to come online late this year. Following on its heels is the LSST, which currently anticipates first light in 2015 and whose science goals include finding the remaining unidentified small bodies in the solar system, as well as transient objects like supernovae.

On their way to accomplishing their core missions, these telescopes will obtain hundreds of images of pretty much every corner of the sky they can each access. By adding these images together, we can probe deeper and deeper into the sky as the months and years pass. In this context deep means faint, and as we survey the faintest fuzzies in the sky, astronomers will find a new definition of what it means to have a statistically significant data set.

Hopefully, in a couple years, the physicists will joke that astronomers consider a mere 40 or 400 data points as being statistically significant.

Statistically significant is a fancy way of saying a result is believable: it is unlikely to have happened by chance alone. If you see me toss a basketball into a hoop from 5 meters once, it might be a fluke (or, more likely, a miracle). If you see me do it 28 out of 30 times, you can say with statistical confidence that I am capable of accurately throwing basketballs (this would never, ever happen).

Statistical significance crops up in many places, and we use it for silly reasons. The house I live in is pink (it’s an old Victorian that was once a farmhouse). When we bought it, my color-blind husband told people it was salmon or mauve or puce. I said it was pink. This led to us surveying every poor soul who came over about what color they perceived our house to be. Based on a statistically significant sample of about 20 people, 80% of whom answered “Pink” or “Pink or Salmon,” my husband now tells the pizza guy we’re the first pink house after the intersection.
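To put rough numbers on the basketball example, here is a minimal Python sketch of a one-sided binomial test. It assumes, purely for illustration, that a lucky shot from 5 meters goes in half the time (`p_null = 0.5`); that baseline and the function name `binomial_tail` are hypothetical, not anything measured. The function adds up the probability of making at least the observed number of baskets if luck alone were at work.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_null: float) -> float:
    """Probability of at least `successes` hits in `trials` independent
    attempts, if each attempt succeeds with probability `p_null`
    (hypothetical helper for illustration)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed null hypothesis: a shot goes in half the time by pure luck.
P_NULL = 0.5

p_one_shot = binomial_tail(1, 1, P_NULL)    # one basket in one try
p_28_of_30 = binomial_tail(28, 30, P_NULL)  # 28 baskets in 30 tries

print(f"1 of 1:   p = {p_one_shot:.2f}")   # 1 of 1:   p = 0.50
print(f"28 of 30: p = {p_28_of_30:.1e}")   # 28 of 30: p = 4.3e-07
```

Under this assumption, a single basket happens by luck half the time, while 28 of 30 has a chance of roughly 4 in 10 million, which is why the second result is believable and the first is not. The same function could be pointed at the pink-house survey, though choosing a sensible null rate for “Pink” is itself a judgment call.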

This idea of statistical significance applies across all non-theoretical astronomy research. Along with the hard-science areas of observational astronomy, high-energy accelerator-based cosmology, and rover- and probe-based planetary science, astronomy/physics departments sometimes also hide a few soft scientists working in astronomy education research, who can do statistically significant research as well.

In the hard-versus-soft-science discussion, I’m a bit half-baked. Different bits of my research like to sit on both sides of that fence.

In Astronomy Education Research we have very few large-scale surveys. There are occasional longitudinal studies. Using them, I can tell you how performance on the GRE varies by ethnicity and gender. I could tell you how the numbers of men and women in graduating classes have changed across the decades. There is even some large-scale surveying of what types of jobs we get, what classes we take, and who does better on exactly which standardized tests.

But these surveys don’t always help instructors like me get inside my students’ heads and understand whether the specifics of what I do improve learning. There is no 10,000-person survey to explain whether one demonstration of diffraction is better than another at explaining spectroscopy. In designing my courses, I rely on my personal experience, and I rely on limited case studies. I do have data sets demonstrating demos do deserve to be done. I have facts and figures fully justifying labs. Some things I know about my formal class thanks to the work of others. But mostly, I have to follow my gut and simply employ what are called best practices (broad concepts like using labs, encouraging student interaction, and keeping my class actively engaged in the content rather than lecturing).

As a new media content provider, I find that the best practices for blogging, podcasting, and even YouTube have yet to be defined. (And I’m doing my part, where I can.) As a trained researcher, I get frustrated making claims that all my experience says are true but that I personally don’t have the data (or know of anyone else who has published the data) to back up. For instance, a quick run through YouTube will find videos that qualify as trash, campy, quality, and OMG why. My experience looking at hit numbers tells me that it is the camp and the why that seem to get the most hits. To test my gut’s interpretation of a data set that is not statistically significant, I’d need to define a way to classify a video into a category like “campy,” then use that metric to classify a few thousand videos, and then document how many hits those videos received in a known period. To help remove bias, I’d want to do things like use only first-time posters, along with similar restrictions, to keep my data as easy to understand as possible. It’s a question that intrigues me. And if I had a fleet of marketing students, I’d probably put them to work answering these questions.

But I don’t have that fleet, so I answer the questions I can, as I can, and I churn out papers when I can.

And until that data can be gathered, I and others talk in anecdotes. In the press room at AAS, we all talk about the seemingly magical combinations of actions we have to perform to get our posts dugg on Digg. Sometimes it feels like our poorly documented anecdotes amount to, “When the moon is full, and I include the word serendipity and bite my tongue while hitting the publish button, I hit the front page.” The same is true of some teaching.

The trick is to remember that all we have are our personal anecdotes, and what works for Phil Plait won’t work for me. And what works for Fraser probably won’t work for me either. And nothing will ever replicate the magic of Astronomy Picture of the Day.

New media is young. Large surveys will come. Just not today. Until then, let the gossip and storytelling begin (just don’t claim statistical significance).


  1. This is something that’s been bothering me for a while. There does seem to be some magic ‘limit’ of exposure, beyond which the frequency of people seeing your post/work/video vs the frequency of people linking to it, or faving it or digging it etc is enough to carry it all the way to the relevant popular page… kind of like an escape velocity. Every time it comes up, I think “there’s got to be some quantifiable statistical mechanism happening here”, but I never find out if there is or not.

  2. I’ve been keeping a daily hit list on the YouTube radio astronomy clips I put there. I tried at first to see if there was a correlation between promotional actions I took and jumps in the hit count, but I just couldn’t find anything significant.

    I do wish I could know more about the viewers, though. Are they stumbling onto the clips haphazardly or accidentally? Did they then look at other astronomy clips like The Pale Blue Dot? Not that I’d stoop to making videos just to appeal to the lowest common denominator or anything, but I’d like to know how to connect with my audience more effectively.

    When I make programs for people (I’m a video producer by trade) I ask a few key questions:
    – What is the budget?
    – Who is the audience?
    – What message do you want the audience to come away with?
    With those answered I can begin to craft a program that will achieve the end result the client is looking for. Often I don’t get an answer to the first one, though.
    Oh well! 😆

  3. The Sloan Digital Sky Survey has many righteous claims to fame, but I’m not sure it’s the “starting point” of rigorous statistical astronomy. It’s probably healthy for new generations to imagine that they’re inventing everything, though sometimes it sounds more like forgetting than inventing.

    In 1967 I studied statistical astronomy with Harold Weaver at UC Berkeley. Even without the brilliant SDSS technology, there were then (and had been for decades) many areas where the number of data points allowed excellent significance, limited of course to areas where the measurements could be made at the time. Likewise, the next decades will be limited to measurements possible even with the great upcoming survey instruments and programs. The faintest fuzzies may become the new backyard, while the next new generation may hope for a high confidence level in counting extragalactic angels on the head of the latest pin.

  4. There’s a big difference between the data of physics and the data of blogland (and YouTubeland). The first are the realm of physicists and people interested in science. The second are the realm of the Average Man. If we believe the results of TV viewing habit surveys, the A.M. is not a philosopher (theoretical or natural). Consider, for instance, the immense popularity of and “reality TV”.

    I think it would be more satisfying to attract 1000 people of like interest (like readers of this blog) than 10000 of the Average Man.

    Back to your first point, now that we have tons of data, would it make sense to revisit the theories that are “based on one example” and see if they still hold up? I don’t think there have been any great revolutions in cosmology. If there haven’t, that says a lot about those “one-note theories” and the people behind them.

  5. “I do have data sets demonstrating demos do deserve to be done.” Lovely alliteration!

    Having read the other comments on this post, I now realize that mine is by far the least useful. Oh well…

  6. I have to admit I’m skeptical about the claim that the SDSS, Pan-STARRS, LSST, or whatever new telescope will change the tendency of astronomers to work with small samples. Uniform surveys definitely do make it possible to do statistics, but there have been uniform surveys taken for many years. The new ones certainly increase the numbers of stars that appear in the surveys, but a big part of astronomy focuses on the weird objects. For example, the Parkes Multibeam pulsar survey discovered over a thousand radio pulsars, but a great deal of the research that happens is done on very small subsets: the binary millisecond pulsars (tens?), for example, the intermittent pulsar (only one discovered so far), the RRATs (about ten known), the anomalous X-ray pulsars (about five known)… Maybe the best example is the double pulsar: exactly one system is known where there are two neutron stars in orbit and where we can see pulsations from both. What’s more, the system is the most highly relativistic binary known, and it’s viewed so nearly edge-on that it’s eclipsing (in spite of the fact that the eclipsing volume is about the size of the Earth). There have been more papers than I can shake a stick at published about this single system, many of them extremely interesting (for example, a very recent Nature paper measuring relativistic precession by fitting the eclipse profiles; sure enough, it matches GR’s predictions).

    There’s no question improving the number of known objects will help, but we are also almost certain to discover another bizarre anomaly that will be extremely interesting.

  7. i love it when you post really long articles like these, then i can unwind and really dig in without fearing that its gonna run out too soon, but i understand you have a very hectic schedule. thanks.

    1.30 a.m

  8. You recognize therefore considerably when it comes to this subject, produced me individually believe it from numerous varied angles. Its like men and women don’t seem to be interested except it’s one thing to do with Girl gaga! Your own stuffs nice. Always deal with it up!
