While poking through arXiv, I came across a fascinating piece of community introspection. Titled “Disentangling Visibility and Self-Promotion Bias in the arXiv:astro-ph Positional Citation Effect”, this paper by J. Dietrich (accepted into the Publications of the Astronomical Society of the Pacific) asks why papers that appear at the top of the daily arXiv listings tend to be cited more often than those that fall to the bottom of the list. (And, if my own blog is representative, the top papers get blogged more often too.)

First, some background: arXiv is where many astronomers post their papers before or at the time of publication, so that readers without subscription access can still read and comment on the results. Papers for a given day can be submitted up until 4pm (Eastern time, BTW); anything submitted after that lands in the next day’s pile. This means that if I submit a paper at 4:00:01, I may be able to land the very top spot on the next day’s list.

As a reader of the astronomy listings (fondly referred to as astro-ph by all the cool kids), I have to admit I typically only make it through the first 20 or so entries before my brain goes numb and my eyes start trying to roll into the back of my skull. As much as I love astronomy, there is just too much content and too little attention span to go around.

So, this brings us to theory 1 on why papers at the top of astro-ph get cited more: they just get read more. In the face of all the data (often well over 50 papers, and way more than that when a conference’s proceedings hit), no one can read everything, and you are more likely to cite things you’ve read (just stating the obvious there). While most of us do our best to run comprehensive literature searches before writing a paper, it is often exceedingly hard to find everything because the words we use in our abstracts vary so much. So all of us pick up random “Ooo neat!” papers while flipping through astro-ph, and we cite them. But probably not if they are the last paper of the day.

But this may not be the only factor. For instance, I had a terrible pit-of-the-stomach feeling the last time I graded tests for a 50+ person class. The longer I sat grading, the better the scores got. I wondered if I was biased, or tired and not grading as harshly, etc. So I stopped and started again from the very bottom of the pile, and lo and behold, I found a magical perfect paper. What I had discovered was that the students who turned their tests in first (and were graded last) were, on average, the A students, and the students who turned them in last had often given up all hope and begun madly splashing physics equations around as they tried not to drown.

While astro-ph doesn’t have the same “A students submit first” effect, it has something similar: authors who really, really want their paper read by a lot of people sit at their keyboards and wait for the astro-ph clock to flip to 4pm. Then they hit SUBMIT so their paper will be at the top of the stack the next day. This type of self-promotion is unlikely when an author is submitting something run of the mill, like the nth paper in a series or a conference proceeding they felt obligated to post. Thus, assuming the self-promoters are the ones with their best work in hand, the best papers will land at the top more often than, well, yet another obligatory conference proceeding.

So, two factors are in play: self-promotion putting the best papers at the top, and the tendency of folks like me to only read what’s at the top. Which is more important? Using a large data sample, and after correcting for papers with few or no citations and papers with an insane number of citations, the study statistically separated papers whose authors presumably tried really hard to get to the top of the list (submitted between 4pm and 4:05pm) from papers whose authors probably just pressed submit when they were good and ready (submitted after 5:30pm). In general, the eager submitters had a statistically higher citation rate than the submissions that randomly landed at the top of astro-ph. Go type-A personalities!
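For the curious, here is a minimal Python sketch of the kind of split described above: label each submission by its time window (4:00 to 4:05pm as “eager”, after 5:30pm as “late”) and average the citation counts in each group. The function names, the inclusive window edges, and the toy records are all my own hypothetical choices, not the paper’s code or data, and the sketch skips the paper’s outlier corrections and significance testing entirely.

```python
from datetime import time
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Submission windows as described in the post, in US Eastern time.
# Treating both ends of the eager window as inclusive is an assumption.
EAGER_START = time(16, 0)
EAGER_END = time(16, 5)
LATE_START = time(17, 30)


def classify(submitted: time) -> Optional[str]:
    """Label a submission time 'eager', 'late', or None (outside both windows)."""
    if EAGER_START <= submitted <= EAGER_END:
        return "eager"
    if submitted > LATE_START:  # same-evening submissions after 5:30pm
        return "late"
    return None


def mean_citations_by_group(records: List[Tuple[time, int]]) -> Dict[str, float]:
    """Average the citation counts of papers in each submission window."""
    groups: Dict[str, List[int]] = {"eager": [], "late": []}
    for submitted, citations in records:
        label = classify(submitted)
        if label is not None:
            groups[label].append(citations)
    # Skip empty groups so mean() never sees an empty list.
    return {label: mean(counts) for label, counts in groups.items() if counts}


# Hypothetical toy records: (submission time, citation count). Not real data.
records = [
    (time(16, 0, 1), 12),
    (time(16, 3), 8),
    (time(18, 15), 5),
    (time(21, 0), 3),
    (time(11, 0), 40),  # outside both windows, so it is ignored
]

print(mean_citations_by_group(records))  # eager averages 10, late averages 4
```

A real analysis would need far more papers, some way to handle the heavy tail of citation counts, and a proper statistical test before declaring the eager group “statistically higher.”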

What I love about this result is that it implies there are days when no eager submitters get their papers in at 4:00:01 on the dot. There is another paper in there somewhere. I’d love to know how submissions ebb and flow over the year (tracking key phases of the academic cycle, like grant deadlines and finals weeks), and which days see the fewest submissions. There could be a whole seminar on “Marketing Your Research 101.” Like picking just the right opening date for a movie, authors want their papers to come out when the most people are reading and when they are least likely to compete with some huge, probably Nobel-worthy announcement. It is certainly an art form.

I strongly recommend reading the conclusions of this 3-page piece. It is an easy read, and it is always fascinating when researchers turn their analytic skills on themselves. (And it was the second paper submitted today.)


    Instead of clumping by days, the papers could be clumped in groups of 20 or so. Then people couldn’t self-promote. But tailoring the listing by reader would be much more useful. I thought one of my professional lists did that, but I usually just skim the headlines. Information overload.

