One of the running jokes in physics/astronomy departments is that astronomers consider 4 instances of anything as statistically significant. In fact, the story goes, two points is enough to define a trend, and 1 is enough to form a theory.
Take for instance our solar system. Up until 1995 it was the only one with a normal sun we knew of (there were some pulsar planets found earlier). Based on it, and it alone, we built an entire detailed nebular theory of solar system formation that we think is mostly true.
This isn’t the only place in research where instances of “observation” lead to “understanding.” With observational astronomy we at least have the option to go out and search for new data. And sometimes we even find it. Sometimes. And until that sometimes is realized, most astronomers are more than willing to say “This is based on 1 example – we’re looking for more.”
The Sloan Digital Sky Survey is an example of large science done well. As it chews through large areas of the sky we’re greatly multiplying the sizes of our samples and the numbers of our examples. Tidal tails were barely understood a dozen years ago, and now we know of many of them wrapping like starry spaghetti around our galaxy. The number of overlapping galaxies has jumped from under a thousand to many thousands thanks to SDSS and its child project Galaxy Zoo (which sprang out on its own, via spontaneous, pub-fertilized gestation). The science coming out of SDSS is amazing (and it’s currently being live blogged over here).
The Sloan Digital Sky Survey is just the starting point. In the coming years two new large survey telescopes will be coming online. The first is Pan-STARRS, which has a primary mission of trying to find pretty much every medium to large sized rock or chunk of ice out in the solar system threatening to cross the orbit of the Earth. It is set to come online late this year. Following on its heels is the LSST, which presently anticipates seeing first light in 2015 and has science goals of finding the remaining unidentified random small objects in the solar system as well as finding transitory objects like supernovae.
On their way to accomplishing their core missions, these telescope projects will obtain hundreds of images of different areas of pretty much every corner of the sky they are each able to access. By adding these images together we can probe deeper and deeper into the sky as more and more months and years pass by. In this context, deep means faint, and as we survey the faintest fuzzies of the sky, astronomers will find a new definition of what it means to have a statistically significant data set.
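To put a rough number on "adding images together": if every exposure is equally good and the noise in each is independent (an idealization I'm assuming here, not a claim about how Pan-STARRS or LSST actually process their data), the signal-to-noise ratio of a stack grows as the square root of the number of images. Here is a minimal Python sketch of what that buys you in magnitudes. The function names are mine, made up for illustration.

```python
import math

def stacked_snr_gain(n_images):
    """Factor by which signal-to-noise improves when n_images equal,
    independent exposures are averaged (idealized sqrt(N) scaling)."""
    return math.sqrt(n_images)

def depth_gain_mag(n_images):
    """How many magnitudes fainter the stack reaches at the same
    signal-to-noise, using the standard 2.5 * log10(flux ratio)."""
    return 2.5 * math.log10(stacked_snr_gain(n_images))

for n in (1, 10, 100, 400):
    print(f"{n:4d} images: S/N x{stacked_snr_gain(n):5.1f}, "
          f"about {depth_gain_mag(n):.2f} mag deeper")
```

The catch is the square root: going from 1 image to 100 buys 2.5 magnitudes, but the next 2.5 magnitudes would take 10,000 images. Real stacks do somewhat worse, since seeing, sky brightness, and systematics vary from night to night.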
Hopefully, in a couple years, the physicists will joke that astronomers consider a mere 40 or 400 data points as being statistically significant.
Statistically significant is a fancy way to say a result is believable. If you see me toss a basketball into a hoop from 5 meters once, it might be a fluke (or a miracle, more likely). If you see me do it 28 out of 30 times, you can say with statistical certainty that I am capable of accurately throwing basketballs (this would never ever happen). Statistical significance crops up in many places, and we use it for silly reasons. The house I live in is pink (it’s an old Victorian that once was a farm house). When we bought it my colour blind husband told people it was salmon or mauve or puce. I said it was pink. This led to us surveying every poor soul who came over about what color they perceived our house to be. Based on a statistically significant sample of about 20 people who answered at the 80% level “Pink” or “Pink or Salmon,” my husband now tells the pizza guy we’re the first pink house after the intersection.
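For the curious, the basketball example can be turned into actual numbers. The sketch below asks: if I were really just a lucky non-athlete, how often would chance alone produce what you saw? The 10% "fluke rate" is a number I made up for illustration, and the helper function name is hypothetical too; it uses only Python's standard library.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent tries,
    each with success chance p (exact binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumed chance that someone with no skill sinks a single 5-meter shot.
# This value is hypothetical; pick your own and rerun.
fluke_rate = 0.10

one_shot = binomial_tail(1, 1, fluke_rate)      # saw 1 basket in 1 try
many_shots = binomial_tail(30, 28, fluke_rate)  # saw 28 or more in 30 tries

print(f"1 of 1 by luck alone:    {one_shot:.2f}")
print(f"28+ of 30 by luck alone: {many_shots:.1e}")
```

One basket in one try is entirely consistent with luck. Twenty-eight in thirty is so wildly improbable under the "lucky" assumption that you're forced to conclude I can shoot. The same sum works for the house survey: swap in 20 guests and however many said pink, along with whatever "no real consensus" rate you think is fair.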
This idea of statistical significance applies across all non-theoretical astronomy research. Along with the hard science areas of observational astronomy, high-energy accelerator-based cosmology, and rover / probe based planetary science, astronomy/physics departments sometimes also hide a few soft scientists working on astronomy educational research who can do statistically significant research as well.
In the hard versus soft science discussion, I’m a bit half-baked. Different bits of my research like to sit on both sides of that fence.
In Astronomy Education Research we have very few large scale surveys. There are the occasional longitudinal studies. Using them I can tell you how performance varies by ethnicity and gender on the GRE. I could tell you how graduating classes matriculate men and women across the decades. There is even some large scale surveying of what type of jobs we get, what classes we take, and who does better on exactly which standardized tests.
But these surveys don’t always help instructors like me get inside my students’ heads and understand whether the specifics of what I do improve learning. There is no 10,000 person survey to explain if one demonstration of diffraction is better at explaining spectroscopy than another demonstration. In designing my courses, I rely on my personal experience, and I rely on limited case studies. I do have data sets demonstrating demos do deserve to be done. I have facts and figures fully justifying labs. Some things I know about my formal class thanks to the work of others. But mostly, I have to follow my gut and simply employ what are called best practices (broad concepts like using labs, encouraging student interaction, and keeping my class actively engaged in the content rather than lecturing).
As a new media content provider, I find that the best practices for blogging, podcasting and even YouTube have yet to be defined. (And I’m doing my part, where I can.) As a trained researcher, I get frustrated with comments that all my experience says are true, but I personally don’t have the data (or know of anyone else who has published the data) to back up. For instance, a quick run through YouTube will find videos that qualify as trash, campy, quality, and OMG why. My experience looking at hit numbers tells me that it is the camp and the why that seem to get hit the most. To prove my stomach’s interpretation of a not statistically significant data set I’d need to define a way to sort a video into a category like “campy,” and then use that metric to classify a few thousand videos, and then document how many hits those videos received in a known period. To help remove bias, I’d want to do things like only use first time posters, and similar restricting factors to keep my data as easy to understand as possible. It’s a question that intrigues me. And if I had a fleet of marketing students, I’d probably put them to work answering these questions.
But I don’t have that fleet, so I answer the questions I can, as I can, and I churn out papers when I can.
And until that data can be gathered, I and others talk in anecdotes. In the press room at AAS, we all talk about the seemingly magical combinations of actions that we have to do to get our posts dugg on digg. Sometimes it feels like our not-well-documented anecdotes amount to, “When the moon is full, and I include the word serendipity and bite my tongue while hitting the publish button I hit the front page.” The same is true of some teaching.
The trick is to remember, all we have are our personal anecdotes, and what works for Phil Plait won’t work for me. And what works for Fraser probably won’t work either. And nothing will ever replicate the magic of Astronomy Picture of the Day.
New media is young. Large surveys will come. Just not today. Until then, let the gossip and storytelling begin (just don’t claim statistical significance).