Geia sou! (Hi there!). I'm sitting on a balcony overlooking the Aegean, jet-lagged and beaten from the journey, but I'm here in Rhodes, which is really lovely. On the endless flights (four!) to get here, I've been thinking about papers (not actually reading them, because I was so uncomfortable, and I managed to sleep a bit, which is probably why my neck hurts so much!).

But I was thinking about them, and about how it is that we can read a paper and accept most of its conclusions, but often not all of them. And we do (at least, I do, and so do the Molets who present them in our journal club, and I suspect you do, too). These results look convincing, but that result doesn't seem right; I accept this conclusion, but about that other one, I have reservations. I can see why the paper was published, but I have to actually read it to decide whether I accept what the authors claim. Sure, that's good science, but why is it this way?

Before we get into this, I really have to stress the reading part. When I read an article, I often have the sinking feeling that the authors didn't actually read some of the papers they are citing. Okay, there is a lot of literature, and sometimes it's just easier to search for an abstract that seems to say what we want to claim, so that we can get to our own stuff. I'm afraid that this trend is pretty common, and is particularly insidious in review articles, since people will simply assume that the cited studies were carefully read and that the conclusions from those papers are robust. This leads to other authors citing those papers, because they too want to get to their own results. I suspect that there are many ‘highly cited’ papers that few have actually read.

Once upon a time, in a galaxy far, far away, I took a remarkable course (actually, it was required, and it was really hard) in which we read and critiqued ‘landmark’ papers. Our job was to find what was potentially wrong with them. By ‘landmark,’ I mean this in the nautical way – a landmark is a navigational feature that does not move, and therefore can be relied upon to guide a ship through safe waters. For a paper to be a landmark, it has to have comfortably steered research efforts toward further, useful findings. A mountain top cannot be in the wrong place, and for a paper to truly be a landmark, it cannot direct us to false conclusions. So how can such papers be potentially ‘wrong’?

We are not talking about the so-called ‘landmark’ papers that have recently come under scrutiny as not being ‘reproducible.’ This was the conclusion of a paper that launched worldwide initiatives into ‘rigor and reproducibility’ that currently occupy much discussion and sections in our grant applications. (That is another discussion: why we accept this claim, without evidence, but not the claims of the original authors.) But we'll come back to this idea later. We are talking about truly classic works that put us on what seems to be the right track, given where we were and how far we've come since accepting their conclusions.

In our course, we learned, for example, that the classic paper by Avery that demonstrated that DNA is the genetic material (I hope you know what I mean; if not, look it up) had a protein contamination of one percent, which led scientists at the time to question the conclusion. Several years later, the classic paper of Hershey and Chase (again, look it up!) used another approach to show that it was, indeed, DNA, and not protein, that carried genetic information; this convinced the skeptics, and here we are today. However, that paper had a margin of error closer to ten percent. Yet this paper, and not the former, set the stage for the tremendous advances that followed. By the time that Rosalind Franklin's structure, by way of Watson, Crick and Wilkins, gave us the insight into how DNA replicates, we already knew that DNA was the genetic material. But why was a ten percent margin of error acceptable, when an earlier one percent protein contamination was not? And by the way, we now know that both of these papers were ‘correct’ in their conclusions.

When we read a paper, usually it is not without context (papers that present something that is conceptually completely new are welcome, but rare). We consider whether or not the data are consistent with what we know from other papers and, sometimes, from our own work. And the approaches can also influence us; when a finding depends on a new, or recently developed, technology, this can sometimes lend an element of conviction to the conclusions (assuming that we trust the technology, which can be a big assumption). I remember that back in the day, my professors argued that it was the use of a new technique, autoradiography, that helped to promote the experiments by Hershey and Chase. Could be. Certainly, when I read a paper, I notice it when an emerging technique is brought to bear on an interesting question. But it is not only techniques; a well-thought-out experiment, especially with a degree of ‘elegance,’ can sway me.

So, what makes an experiment elegant? It can incorporate innovative internal controls, or it can logically answer a question in a way that seems pretty airtight. Or it can just be something clever I don't think I would have thought of (I know, this is very subjective and self-centered, but when it's me reading the paper, it does have bearing, and besides, I learn something!).

But of course, papers also include experiments that lack the controls I would have liked to see, or whose results, although perhaps statistically significant, do not seem to be biologically significant. In biology, many things happen over time, but often, we only see what happens at one or two particular times. Or the effects seem to be smaller than what we would have expected based on the other findings reported in the paper. There are lots of reasons to doubt a conclusion, and sometimes to doubt the major conclusion of a paper. But even then, there may be findings in the work that do seem to be informative, and even if we (I) might doubt such conclusions, we (I) may still find value in reading, and remembering, the results.

Here's the thing. When we write a paper, we look at the body of data we have generated and frame our conclusions to the best of our ability to interpret these data. And of course, since we did these experiments to test hypotheses and answer questions, we examine the strength of the data to determine how close we have come to addressing these questions and hypotheses, usually with an eye to stating conclusions that are as interesting and compelling as we can support. Then we organize the data to best present our argument (which we detail in the text). We send it for consideration and review, and reviewers point out the deficiencies in our work. Ideally, this is to strengthen our conclusions, but unfortunately, it seems that most of the time reviewers just say what they like and don't like, and what else they would like to see, regardless of how it might affect the conclusions. We argue as much as we can, but generally try to give the reviewers what they want (because, again unfortunately, many journals do not have the sort of editors who will champion a paper against spurious demands for ‘more stuff,’ and even if they do, we want to make the reviewers as happy as possible; sometimes this becomes a treadmill of okay, good, now give us still more stuff). Eventually, if the paper is to be accepted, we wear down these reviewers and they give up, and the work is published, even if the logic we had developed has been a bit lost in the shuffle. Experiments are included to ‘satisfy the reviewers’ that might not be as rigorously performed and repeated as we might have liked (this is often due to the time–money continuum, which has some complex math behind it).
We make conclusions (not the major conclusion, usually, but small, sub-conclusions) based on these new data. A sub-conclusion can be, ‘hey, our results are not due to this artifact,’ which can be important, but it can also be, ‘oh, by the way, other things also work like this,’ because someone else wanted to know what would happen if we tried this experiment. And because biology is usually very complex, many of these sub-conclusions are not necessarily based on particular data.

There is very often a margin of error. That's fine, provided that people carefully read the paper to decide which conclusions are best supported by quality data. But this brings me back to the papers we read now, including so-called ‘landmark’ papers (which are really just papers that get cited a lot). They have to be read and evaluated. Please don't cite papers only because others have cited them, or because you read about them in a review. Don't promote (cite or re-cite) published conclusions that might be weak or suspect. If you cite it, make sure you've read it, and have used your skills to decide if it is worth citing. Because someone else will trust you, and they'll cite it, too.

We can live with a margin of error in the scientific literature. I think we have to. But we don't have to broaden this margin when we cite work in our own papers. As for me? I'm going to the beach. Antío gia tóra! (Goodbye for now!)