A Nature News piece is out today featuring comments from me on how retraction rates correlate with impact factors in scholarly journals. However, the piece cherry-picks my comments a little and doesn't go into much depth. Björn Brembs already has a response up, and seeing as I mentioned a piece of research from him and other colleagues when I was contacted for comments, it seems in that spirit to echo his points by publishing my full response to Nature here.
In response to a tweet about an article on retraction rates, I was contacted with the following questions: “Did anything in particular inspire you to share this paper at this time? Can I assume that you feel the paper is still relevant?”
My full response:
“I do feel the paper is still relevant, especially given several recent ‘horror stories’ involving retractions and falsification of results. However, I feel it is missing key discussion points, such as what is creating the pattern. I think there are two ways of reading it. The first is that the ‘publish or perish’ culture, very much alive and prevalent within academia, is still driving academics to chase ‘high impact’ journals, and in doing so they are more likely to produce incorrect results, deliberately or not, in the pursuit of findings deemed substantial enough for ‘top tier journals’ – sensationalism over quality. The second is that journals with higher impact factors generally have higher readerships, which increases the probability that incorrect results are detected. However, I am not aware of any current evidence, beyond anecdote, to support this latter link.
The raw reading, however, is that the higher the impact factor of a journal, the greater the probability that its contents are wrong. Although it is not as black and white as this, I think it is certainly another piece of evidence to caution against ‘impact factor mania’ within academia (something that I still see my colleagues suffer from on a daily basis, and try to engage with).
Perhaps more significant is that it draws attention to the shortcomings of peer review, if incorrect results are not being picked up during this screening process. Further still, it points to the rift between editorial decisions and reviewer recommendations, highlighted in the ‘Bohannon sting’ operation last year, if you recall (i.e., bad results accepted for publication anyway). Either way, it highlights a need for more transparency in the review process – this will ultimately drive down retractions, as mistakes can be detected and dealt with much more quickly.”
Björn Brembs
Only one thing: actually, our paper is not a study, just a review article covering the empirical data around journal rank. One of the main studies we cite is the one by Fang & Casadevall, among about 100 other studies. In fact, the Fig. 1 you mention simply reprints the data from Fang & Casadevall with a log scale.
In our review, we also detail the data distinguishing between attention and paper methodology, and conclude that the evidence swings heavily towards bad methodology being the more important driver of retractions (while emphasizing that both are significant).
Egon Willighagen
Dear Jon, it is important that you bring up peer review in this context. One aspect of papers in higher impact journals is the tendency (anecdotal, too) that the higher the impact, the less detail is given and, more importantly, the more results are presented. It would not surprise me if the time spent on reviews does not scale linearly with the amount of results presented. Seriously, who can expect reviewers to accurately peer review a full genome? I expect reviewers to *assume* that a lot of the methods have been applied properly.
But the amount of results is not the only issue; I think the multidisciplinary scope of the results also makes it unfeasible for two, three, or even four reviewers to accurately review all the results presented in a paper.
In short, I hereby postulate that the underlying mechanism is that higher impact attracts more results, and moreover results of higher complexity, putting increased demands on the skills and time of reviewers and causing an increased error rate.
But surely someone must have proposed this earlier.