Considerable attention has recently been paid to improving replicability and reproducibility in life science research. This has resulted in commendable efforts to standardize a variety of reagents, assays, cell lines and other resources. However, although microscopy is a dominant tool for biologists, comparatively little discussion has been offered regarding how microscopy-relevant details should be reported and documented. Image processing is a critical step of almost any microscopy-based experiment; however, improper or incomplete reporting of its use in the literature is pervasive. The chosen details of an image processing workflow can dramatically affect the outcome of subsequent analyses and, indeed, the overall conclusions of a study. This Review aims to illustrate how proper reporting of image processing methodology improves scientific reproducibility and strengthens the biological conclusions derived from the results.
Image processing and analysis have become an indispensable component of the biology toolbox. There are several important forces that drive this trend, including (1) the increasing complexity and volume of modern imaging data (Ouyang and Zimmer, 2017); (2) the emphasis placed by journals and funding agencies on quantitative, rather than qualitative, data interpretation; and (3) the efforts by the biological community to reliably share data [e.g. the brain connectome (Dance, 2015), or the OMERO project (Allan et al., 2012)]. Collectively, the ubiquitous reliance on quantitative imaging as a biological tool highlights an important problem that has remained inadequately spotlighted until recently – the lack of accurate and sufficient reporting of the processing steps required for image analysis (Marques et al., 2020).
An image file contains information that can yield important insights into a wide range of biological structures and processes. Unfortunately, any image is intrinsically an imperfect representation of the object(s) being studied (Aaron et al., 2019; Sluder and Wolf, 2013). The complexity of a biological specimen and its interaction with light often further degrades the fidelity of how the biological reality is represented in the acquired images (Booth and Patton, 2014; Ji et al., 2010) by distorting the image data, resulting in suppressed image signal and resolution.
Image processing is the vital intermediary step that aims to isolate and/or emphasize the desired signals in a raw acquired image before eventual analysis (see Fig. 1A). In this discussion, we make a distinction between image processing and image analysis, whereby the latter comprises a vast set of diverse approaches to extract biologically meaningful, quantitative measurements from a dataset. Image processing, on the other hand, serves to digitally transform an acquired dataset by enhancing or isolating signals of interest and/or suppressing other signals and noise that will otherwise prevent accurate analysis. Furthermore, as indicated in Fig. 1A, there are often steps that are required to be taken before image data can be properly processed, termed pre-processing. Such steps generally do not serve to enhance particular features in an image per se, but rather correct for imperfections commonly encountered in imaging systems. This may include, for example, corrections for proper color channel registration (Zhang and Carter, 1999) or microscope stage drift (Lee et al., 2012).
The myriad available image processing and analysis software packages enable biologists to perform highly complex digital operations with ease. However, herein lies the ‘double-edged sword’. While easy access to such algorithms is empowering, their foundations are grounded in complex mathematical concepts that may be unfamiliar to many researchers. An ill-advised application of these algorithms can drastically change the underlying image data in unanticipated and counterproductive ways. Yet, many image processing software programs, especially commercial solutions, provide an easy ‘one-click’ route that lets users unwittingly alter images with little regard for the resulting effects on the data, as long as the end appears to justify the means – the background is silenced, the image is sharpened, and the desired objects are segmented. This conceptual opacity, combined with the lack of safeguards in implementation, has perpetuated the notion that image processing requires little thought, and that the underlying details of digital operations can be ignored and go unreported.
It is important to point out that digitally processing raw images for later analyses is not intrinsically unethical; indeed, good use of processing techniques can be integral to achieving the experimental goals (Miura and Sladoje, 2020; Miura and Tosi, 2017). What becomes unacceptable, usually unintentionally, is the failure to document the processing steps taken. Proper reporting would allow possible contradictory findings to be reconciled by retracing the processing steps, even if a published conclusion is erroneous due to an ill-advised processing workflow. Furthermore, many image processing techniques, while entirely justifiable as a means to accurate quantitative analysis, are not appropriate to generate ‘improved’ images for display. Indeed, the displayed images themselves should be presented in a format that is as close as possible to the original data, with any alterations consistently applied and carefully disclosed. Finally, there do exist certain manipulations, such as image ‘cutting and pasting’, the use of lossy compression, or any alteration that is applied to a user-selected sub-region of an image that will nearly always be inappropriate regardless of their application (Cromey, 2010). In any case, proper documentation of all digital alterations serves as a traceable means to identify inappropriate manipulations.
Our goal here is not to exhaustively discuss the many image processing methods available, as this is a topic far too vast for the scope of this paper. Many excellent and comprehensive treatments on the principles behind image processing are available for review (Bhabatosh and Dutta, 2011; Burger and Burge, 2013; Ekstrom, 2012; Gonzalez, 2002; Nixon and Aguado, 2019; Pitas, 2000; Szeliski, 2010; Wu et al., 2010). Likewise, it is not our goal to discourage experimenters from performing any particular technique, so long as such use is well-understood and scientifically justified. We aim to provide instructive examples that illustrate how various image processing operations, as outlined in Fig. 1A, can impact the resulting image data, and how their inaccurate and insufficient documentation impede or even prevent reproducibility. Additionally, we will provide the readers guidance and examples for proper reporting.
Reproducibility in image processing – a simple example
Reporting image-processing methodology inaccurately and/or incompletely can easily result in findings that cannot be reproduced or reconciled – often because of a seemingly trivial omission in detail. This can lead to contradictory conclusions from multiple researchers, when in fact, the true inconsistency is one of data processing. As a simple hypothetical example, consider Fig. 1B, which shows a fluorescence image of bacterial colonies expressing a reporter of interest. Two different experimenters were each interested in reporting the size distribution of these colonies from the image. However, high background signal and noise were deemed likely to bias this measurement. Therefore, both experimenters decided to first employ a processing method, and reported it as, ‘Prior to analysis, images were background-subtracted, followed by Gaussian smoothing, and automatic intensity threshold calculation in FIJI’.
While seemingly a comprehensive description of a reasonable approach, it is in fact insufficient to reproduce the true workflow. To see why, note the bar graph in Fig. 1C that reports the results of average colony size (with s.e.m.) as determined by each experimenter. While both followed the same general steps as reported, the results from each researcher differ by more than three-fold.
As may be surmised, the source of this inconsistency lies in the details. While Experimenter A used a background subtraction radius (Sternberg, 1983) and a Gaussian smoothing (Nixon and Aguado, 2019) kernel size of 5 and 1 pixels, respectively, Experimenter B used 10 and 2 pixels. In addition, Otsu's automatic threshold (Otsu, 1979) was applied in one case, while the triangle method (Zack et al., 1977) was used in the other. Otsu's method chooses the threshold that minimizes the intensity variance within the above- and below-threshold pixel classes (equivalently, it maximizes the variance between the two classes), while the triangle method finds a threshold based on the shape of the intensity histogram between its most common value and its highest value. Either approach may be technically justified; unfortunately, such specifics are often deemed too nuanced or unimportant to report in a manuscript with a tightly controlled word limit. As a result, even relatively small unreported differences in processing methodology can translate into apparently inconsistent results with multi-fold differences. The two experimenters may draw quite different conclusions based on these competing results, each claiming, for example, a significantly different growth rate for the bacteria. However, proper documentation and reporting will allow these two disparate results to be easily reconciled.
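To make the difference between the two threshold rules concrete, the following is a minimal sketch, not the FIJI implementation. It assumes a synthetic 64-bin histogram (a large dim background peak plus a small bright population); the function names and the histogram are illustrative assumptions. The triangle rule is simplified to the case in which the histogram tail extends toward bright values, as is typical for fluorescence images.

```python
import numpy as np

def otsu_threshold(counts):
    """Bin t that maximizes between-class variance for classes <= t and > t
    (equivalent to minimizing the within-class variance)."""
    counts = np.asarray(counts, dtype=float)
    levels = np.arange(counts.size)
    best_t, best_score = 0, -1.0
    for t in range(counts.size - 1):
        w0, w1 = counts[:t + 1].sum(), counts[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (counts[:t + 1] * levels[:t + 1]).sum() / w0
        mu1 = (counts[t + 1:] * levels[t + 1:]).sum() / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t

def triangle_threshold(counts):
    """Bin farthest below the straight line joining the histogram peak to the
    highest non-empty bin (simplified: assumes the tail is on the bright side)."""
    counts = np.asarray(counts, dtype=float)
    peak = int(np.argmax(counts))
    last = int(np.nonzero(counts)[0][-1])
    x = np.arange(peak, last + 1)
    line = counts[peak] + (counts[last] - counts[peak]) * (x - peak) / max(last - peak, 1)
    # The vertical gap is proportional to the perpendicular distance,
    # so both are maximized at the same bin.
    gap = line - counts[peak:last + 1]
    return int(x[np.argmax(gap)])

# Synthetic histogram (assumption): dim background peak + small bright population.
levels = np.arange(64)
histogram = (1000 * np.exp(-0.5 * ((levels - 5) / 2) ** 2)
             + 50 * np.exp(-0.5 * ((levels - 40) / 6) ** 2))

t_otsu = otsu_threshold(histogram)
t_triangle = triangle_threshold(histogram)
fg_otsu = histogram[t_otsu + 1:].sum()
fg_triangle = histogram[t_triangle + 1:].sum()
print("Otsu threshold:", t_otsu, "foreground pixels:", round(fg_otsu, 1))
print("Triangle threshold:", t_triangle, "foreground pixels:", round(fg_triangle, 1))
```

On the same histogram the two rules select different cut-offs, so the set of pixels called foreground differs even though both analyses could be described as using an ‘automatic threshold’.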
The above example describes two common image enhancement steps – background subtraction, and image de-noising. Both techniques are designed to isolate the signal of interest from the rest of the image. To better understand why image processing is often necessary prior to analysis, it is important to understand what factors contribute to image degradation. These factors include: (1) optical diffraction that ‘blurs’ the image of an observed object; (2) extraneous or uninformative signals not associated with the structure of interest, referred to as ‘background’; and (3) random intensity fluctuations, collectively described as noise (Lambert and Waters, 2014). Overall, biological, optical and electronic factors will always work in concert to degrade an image. The choice of strategy for recovering or amplifying the signal of interest, and how it is applied, can substantially affect the outcome of any subsequent analysis. There are image processing techniques designed to address each of these factors, which will be briefly surveyed in the following sections. We will also articulate why complete reporting of their implementation is critical.
Background can refer to any detected but non-informative signal, whose presence may render subsequent analysis inaccurate or otherwise misleading. There are many sources that can contribute to background signals: (1) fluorescence generated from molecules that are not the reporter of interest (e.g. riboflavin in cell culture medium) (Aubin, 1979); (2) out-of-focus signals that may emanate from the fluorescent reporter molecules, but are too far away from the microscope focal plane to form a suitable image; and/or (3) non-specific signals, which emanate from the reporter of interest, and may be located in the focal plane, but are not associated with the biological structure of interest (e.g. non-specific binding of a secondary antibody). Regardless of their source, background signals will often skew the results of many image analysis techniques.
Fortunately, there is a retinue of methods that can estimate and remove some unwanted background signals from an image (Gonzalez, 2002). These estimates, however, almost always rely on one or more limiting assumptions. Thus, it is paramount to clearly describe any background removal technique when reporting results, as illustrated with an image of fluorescently labeled viral particles in a cell (Fig. 2A).
To accurately measure the fluorescence intensity of individual viral particles, it is almost certainly necessary to employ some background subtraction method. There are many viable options for background subtraction, including (1) Fourier domain filtering (Nixon and Aguado, 2019), (2) Gaussian smoothing (Nixon and Aguado, 2019), or (3) ‘rolling ball’ subtraction (Sternberg, 1983).
Fig. 2C–E show the estimated image background signal (as surface plots) calculated from each of these methods, respectively. Note that the Gaussian smoothing and rolling ball filters each had a 5-pixel radius, and the Fourier frequency filter removed the first five frequency components (including the offset). Then, the total pixel intensity within the box indicated in Fig. 2A was summed for each background subtraction scenario as a measure of the total fluorescence signal associated with the encompassed single viral particle. Strikingly, a 75% variation in total intensity among the differently processed images is observed (Fig. 2B). Therefore, simply indicating that an image was ‘background-subtracted’ is not sufficiently specific to render a workflow reproducible; rather, explicitly stating the chosen method with the accompanying parameter values is required.
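The sensitivity to the chosen background method can be sketched on a one-dimensional synthetic profile. This is an illustration under stated assumptions, not the analysis behind Fig. 2: the profile (a linear background ramp plus one narrow spot), the helper names, and the use of a flat morphological opening as a simplified stand-in for rolling ball subtraction are all assumptions made here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows(signal, radius):
    """All sliding windows of width 2*radius+1, with edge-replicated padding."""
    padded = np.pad(signal, radius, mode="edge")
    return sliding_window_view(padded, 2 * radius + 1)

def gaussian_background(signal, sigma):
    """Background estimated as a heavily Gaussian-smoothed copy of the signal."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    return windows(signal, radius) @ kernel

def opening_background(signal, radius):
    """Background estimated by a flat grey-scale opening (minimum filter
    followed by maximum filter); a simplified stand-in for rolling ball."""
    eroded = windows(signal, radius).min(axis=1)
    return windows(eroded, radius).max(axis=1)

# Synthetic profile (assumption): sloped background plus one narrow bright spot.
x = np.arange(200)
ramp = 100 + 0.2 * x
spot = 500 * np.exp(-0.5 * ((x - 100) / 2) ** 2)
image = ramp + spot

box = slice(90, 111)  # measurement region around the spot
true_total = spot[box].sum()
total_gauss = (image - gaussian_background(image, sigma=5))[box].sum()
total_opening = (image - opening_background(image, radius=5))[box].sum()

print("true spot signal:       ", round(true_total, 1))
print("after Gaussian estimate:", round(total_gauss, 1))
print("after opening estimate: ", round(total_opening, 1))
```

Both estimates use a nominal size of 5 pixels, yet the integrated spot intensity they leave behind differs, because the Gaussian estimate absorbs part of the spot itself into the background. The method and its parameter therefore both need to be stated.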
As outlined previously, noise is inevitable in any imaging system. While every effort should be made to maximize the signal-to-noise ratio (SNR) of an image, practical considerations, such as fluorophore choice, imaging speed, duration and phototoxicity, can force experimenters to compromise (Galdeen and North, 2011; Jonkman et al., 2020; North, 2006). This can render subsequent measurements more difficult. Fortunately, image analysis can be aided by denoising methods. While such algorithms never remove noise entirely from an image, they can often substantially reduce its contribution (Fan et al., 2019), albeit often with a loss in image detail. How, and to what extent this is done can vary widely from one technique to another – potentially producing widely variable results, thus reinforcing the importance of proper documentation.
The noise component of an image is often assumed to vary more from one pixel to the next than does the structure itself. Thus, many denoising algorithms force each pixel within an image to be more similar in intensity to its neighbors. As an example, a noisy image of a neuron is shown in Fig. 3A; it was subjected to Gaussian smoothing (Nixon and Aguado, 2019), median filtering (Huang et al., 1979) and non-local means denoising (NLMD) (Buades et al., 2011), with end-results shown in Fig. 3B–D, respectively. The Gaussian smoothing and median filtering each used a filter radius of 6 pixels. NLMD was used as implemented in FIJI, with a smoothing factor of 1 and automatic estimation of sigma parameters. In Fig. 3E, an intensity profile plot is shown for the same region in each image, indicated by white lines, for the original data and for the indicated denoising method. It is useful to compare areas primarily with and without the signal of interest, indicated by the yellow and light blue areas, respectively.
Note that Gaussian smoothing results in less noise than the median filter in areas of low signal (light blue areas). However, it also tends to reduce the intensity and broaden the apparent axon widths by blurring their edges (yellow areas), as denoted by the arrows. The median filter results in greater pixel variation in the dimmer areas of the image, but with better intensity preservation and sharper axon edges.
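The trade-off described above can be reproduced on a synthetic one-dimensional profile. The sketch below is an assumption-laden illustration, not the processing used for Fig. 3: the profile (a 5-pixel-wide bright ‘axon’ on a dim background with seeded Gaussian noise), the filter sizes and the helper names are chosen here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_smooth(signal, sigma):
    """Gaussian smoothing with a kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="edge")
    return sliding_window_view(padded, kernel.size) @ kernel

def median_filter(signal, radius):
    """Median of each window of width 2*radius+1."""
    padded = np.pad(signal, radius, mode="edge")
    return np.median(sliding_window_view(padded, 2 * radius + 1), axis=1)

# Synthetic profile (assumption): background 10, 'axon' of intensity 100
# occupying pixels 300-304, plus Gaussian noise with a fixed seed.
rng = np.random.default_rng(0)
clean = np.full(600, 10.0)
clean[300:305] = 100.0
noisy = clean + rng.normal(0.0, 5.0, clean.size)

gauss = gaussian_smooth(noisy, sigma=2.0)
median = median_filter(noisy, radius=2)

background = slice(0, 250)  # region without signal
print("background s.d.  Gaussian:", gauss[background].std(), " median:", median[background].std())
print("axon centre      Gaussian:", gauss[302], " median:", median[302])
print("2 px outside     Gaussian:", gauss[298], " median:", median[298])
```

In this example the Gaussian filter leaves less residual noise in the empty region but lowers the peak and spreads signal beyond the true edge, whereas the median filter keeps the edge and peak at the cost of noisier background. Changing either filter size changes all three numbers.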
NLMD (shown in Fig. 3D) can often circumvent the limitations of the Gaussian and median filtering methods, as it does not assume that neighboring pixels need to have similar intensity. But because of its ‘non-local’ nature, NLMD can give different results depending on whether an image is cropped – another potential source of irreproducibility.
Background signal and noise can represent formidable foes in image analysis. But diffraction has historically been the ultimate limiter of image quality. Any acquired image will be the convolution of the structure being observed and the point spread function (PSF) of the microscope (Sibarita, 2005). Deconvolution techniques therefore attempt to reverse the effects of diffraction by approximating a real structure, given (1) a captured image and (2) a known or estimated PSF. The desired result is a crisper image whose details may better facilitate image analysis. The sharpened image, however, can only be an estimate. The accuracy of this estimate is further complicated by the extent of image noise and the accuracy of the assumed or measured PSF. In practice, deconvolution algorithms must balance image deblurring and noise – increasing the former will unfortunately increase the latter. Many algorithms, therefore, incorporate one or more user-defined parameters that attempt to optimize that balance.
Deconvolution techniques can be grouped into ‘direct’ or ‘iterative’ algorithms. A common example of a direct method is based on a Wiener filter (Gonzalez, 2002; Sekko et al., 1999). The Richardson–Lucy algorithm, on the other hand, is a common iterative scheme (Lucy, 1974; Richardson, 1972), whereby an experimenter specifies the number of cycles to perform. In any study that makes use of deconvolution, the specific algorithm employed should be cited. In addition, reporting the user-defined parameters is equally vital to ensure reproducibility. To illustrate this point, consider the image in Fig. 4A, which shows mitochondria in several HeLa cells. The images in Fig. 4B–E show an enlargement of the area denoted by the white box in Fig. 4A and indicate the results of four deconvolution attempts. Each image was obtained using the same Richardson–Lucy algorithm. The only variation between each image is the number of user-defined iterations, with Fig. 4B–E corresponding to 5, 10, 25 and 75 iterations, respectively. This example illustrates the marked effect of iteration number on a deconvolved image. Note that, particularly in Fig. 4E, the chosen number of iterations is too large, as unintended structures have become apparent in dim regions of the image. But a complete reporting of this processing step can be vital in identifying such errors. More generally, the chosen number of deconvolution iterations can have significant effects on downstream analyses; in this example, measurements of mitochondrial size, shape, and number can all be expected to vary depending on precisely how the preceding deconvolution was implemented. However, this image processing detail is often omitted in published studies.
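The dependence on iteration number can be sketched with a bare-bones one-dimensional Richardson–Lucy update. This is a minimal illustration under assumptions made here (a noise-free synthetic signal of two point sources, a Gaussian PSF, zero-padded convolution, a flat starting estimate); it is not the implementation used for Fig. 4, and commercial or open source packages typically add regularization and stopping criteria.

```python
import numpy as np

def richardson_lucy_1d(observed, psf, iterations):
    """Plain Richardson-Lucy iteration for a 1D signal and an odd-length PSF."""
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)  # guard against division by zero
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate

# Synthetic data (assumption): two point sources on a weak uniform baseline,
# blurred by a Gaussian PSF with a standard deviation of 2 pixels.
offsets = np.arange(-8, 9)
psf = np.exp(-0.5 * (offsets / 2.0) ** 2)
truth = np.ones(64)
truth[30] += 100.0
truth[36] += 100.0
observed = np.convolve(truth, psf / psf.sum(), mode="same")

# Same iteration counts as quoted for Fig. 4B-E.
results = {n: richardson_lucy_1d(observed, psf, n) for n in (5, 10, 25, 75)}
for n, estimate in results.items():
    print(f"{n:3d} iterations: peak = {estimate.max():.1f}, total = {estimate.sum():.1f}")
```

The total signal is preserved at every iteration count, but the peak height and apparent width of each source keep changing, so any downstream size or intensity measurement depends on the number of iterations that was run.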
Deconvolution can be a conceptually difficult image-processing task for many researchers, in part due to its mathematical complexity. Additionally, commercial implementations may not disclose a rigorous description of the exact algorithm used for proprietary reasons, or a clear explanation of the user-defined parameters. This is compounded in the case of iterative algorithms, where the optimal number of deconvolution iterations may not be clear. Some implementations make use of a particular image quality criterion that, when met, would terminate the operation (Laasmaa et al., 2010). In any case, researchers should obtain as much information about the chosen deconvolution algorithm as possible from their microscopy facility. At a minimum, any utilized commercial software tools should be clearly listed (with version number), along with all the requisite user-defined parameters. While details on commercial implementation of deconvolution may be difficult to obtain, there are other open source options available (Sage et al., 2017; Sun et al., 2009) with more transparent descriptions.
Feature detection and object segmentation
The image enhancement techniques described above can be powerful techniques to reduce unwanted signals and noise, and to amplify signals of interest. However, this may only be an initial step in an image processing workflow. Distinguishing pixels that primarily contain signal of interest – termed foreground – from those that do not is often the next logical processing step toward eventual analyses.
The simplest way to identify foreground pixels is to select a minimum threshold intensity value, although other strategies exist (Maini and Aggarwal, 2009). Manual selection of a threshold value is often used, based on a visual inspection of the resulting foreground pixels, but this approach can be prone to user-bias. To minimize such bias, automated threshold methods, which do not rely on human perception, are common alternatives.
Few topics have seen as wide a variety of implementations as automated intensity threshold calculation. For example, FIJI (Rueden et al., 2017; Schindelin et al., 2015; Schneider et al., 2012) currently offers no fewer than 17 different methods to automatically calculate an image intensity threshold with a diversity in results (Fig. 5). Fig. 5A features an image of fluorescently labeled intracellular vesicles after background subtraction (with rolling ball subtraction, 5-pixel radius) and denoising (using median filtering, 3-pixel radius). Three different automatic thresholding techniques (Kittler and Illingworth, 1986; Otsu, 1979; Prewitt and Mendelsohn, 1966), each included in ImageJ/FIJI, were applied with results displayed in Fig. 5B–D, respectively. White pixels denote foreground, while black pixels correspond to those pixels deemed not to contain signal of interest. Clearly, different thresholding techniques return very different sets of foreground pixels. This is primarily due to the unique assumptions of each method about the underlying pixel intensity distribution.
As suggested before, microscopy facility staff can often be the best initial resource in selecting an appropriate threshold application technique. However, describing the method being used and its associated parameters is of even greater importance. Even if a threshold method is poorly chosen, documenting its use can prompt a manuscript reviewer to suggest a different method prior to publication, and/or resolve any issues of reproducibility after publication.
Object segmentation, in many instances, can be thought of as a final processing step that groups adjacent foreground pixels into discrete objects for analysis. Despite the apparent simplicity of this task, there are important details that will affect the outcome. Most fundamentally, there are multiple ways to define ‘adjacent’ in this context. For example, pixels can be joined into an object if they share a common edge (4-connectivity rule), or if they share either an edge or a corner (8-connectivity rule) (He et al., 2017). While seemingly inconsequential, this difference can have a large impact on the measured number, size, and shape of objects – particularly if those objects are composed of relatively few pixels.
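The effect of the connectivity rule can be checked with a small flood-fill labelling routine. The sketch below uses only the Python standard library; the function name and the 5×5 test mask are illustrative assumptions.

```python
def count_objects(mask, connectivity):
    """Count connected groups of foreground pixels under 4- or 8-connectivity."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows, cols = len(mask), len(mask[0])
    seen = set()
    n_objects = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # New object found: flood-fill everything connected to it.
            n_objects += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return n_objects

# Two 2x2 blobs and a single pixel that touch only at their corners.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
print("4-connectivity:", count_objects(mask, 4))  # 3 objects
print("8-connectivity:", count_objects(mask, 8))  # 1 object
```

The same binary image yields three objects under one rule and a single object under the other, which changes every per-object statistic derived from it.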
Once pixels have been grouped, further refinements are often made. Spuriously bright pixels are often misclassified as objects but tend to be of different size or shape from the objects of interest. Thus, it is common to simply not consider any objects that fall outside acceptable bounds of shape and size. Such operations are not inherently unethical, so long as they are well-justified based on what is known about the biological structure(s), and so long as they are applied in a consistent fashion. But most importantly, such morphological filtering should always be noted with a complete description of all parameters that were considered.
Binary operations can also refine segmented objects. These operations represent image processing techniques that are reserved only for binary images – that is, images with only two possible intensity values, which typically denote whether a pixel is above or below a specified threshold. For example, binary erosion followed by dilation – called ‘image opening’ – can be useful to remove small unwanted objects produced by spurious signals. Similarly, binary dilation followed by erosion – or ‘image closing’ – can be useful to fill holes within objects that happen to have areas of relatively dim signal (Gonzalez, 2002). Watershed operations can separate objects that share a common border (Roerdink and Meijster, 2000). Although each of these processing steps aims to improve upon the initially segmented objects, the results will depend heavily on their precise implementation.
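A minimal sketch of binary opening and closing follows, assuming a 3×3 square structuring element and treating pixels outside the image as background. These choices, the function names and the test mask are assumptions for illustration; libraries differ in their defaults, which is one more detail worth reporting.

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 square; outside the image counts as background."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.ones(mask.shape, dtype=bool)
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + rows, dc:dc + cols]
    return out

def dilate(mask):
    """Binary dilation with a 3x3 square."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.zeros(mask.shape, dtype=bool)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out

def opening(mask):
    return dilate(erode(mask))  # removes objects smaller than the 3x3 element

def closing(mask):
    return erode(dilate(mask))  # fills holes smaller than the 3x3 element

# A 5x5 object, then a copy with a one-pixel hole and an isolated speck.
clean = np.zeros((12, 12), dtype=bool)
clean[2:7, 2:7] = True
noisy = clean.copy()
noisy[4, 4] = False   # dim pixel inside the object
noisy[9, 9] = True    # spurious bright pixel

closed_then_opened = opening(closing(noisy))
opened_then_closed = closing(opening(noisy))
print("closing then opening:", int(closed_then_opened.sum()), "foreground pixels")
print("opening then closing:", int(opened_then_closed.sum()), "foreground pixels")
```

With this mask, closing followed by opening recovers the intact 5×5 object, whereas opening followed by closing erases it entirely, because the hole causes the first erosion to remove every pixel. The choice and order of binary operations, and the structuring element, therefore belong in the methods description.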
This can be seen by continuing to consider the images in Fig. 5. Assuming the threshold technique represented in Fig. 5C, with the enlarged view shown in Fig. 5Ci (Otsu, 1979), is deemed optimal, measuring the average area of each vesicle may seem to be a straightforward task and not subject to hazards of irreproducibility. However, the removal of spurious small objects and apparent vesicle aggregates may be critical to avoid skewing these results. One experimenter may choose to use image opening and a watershed operation to accomplish this task, which we term Segmentation Method 1. Another experimenter may simply apply a maximum object area and minimum object circularity to select putative vesicles, termed Segmentation Method 2. While either approach may be appropriate, the segmented objects derived from each method are notably different, as illustrated in Fig. 5Cii and Ciii. It should then come as little surprise that when computing the average vesicle area, a 75% difference in outcome is found between these two methods (Fig. 5E). Importantly, differing results would be expected even if only individual parameters within each method are varied. For example, varying the acceptable bounds of object circularity in Method 2 is likely to change the quantitative outcome. Thus, even when a pair of images is subjected to identical background subtraction, image denoising and automated threshold application, the analytical result can still vary by a surprising amount due to differences in object segmentation and refinement.
An often overlooked, yet vital, aspect of proper reporting of image processing is the sequence in which multiple tasks are applied to an image. For example, applying a rolling ball background subtraction, followed by median-filter denoising will produce a different image from the one obtained with these same steps performed in the reverse order. Such differences in the processed image will almost certainly affect any later segmentation and, ultimately, the analysis outcome. Thus, when describing any image processing workflow, careful attention should be paid to reporting both the details of all individual steps and the precise order in which they were applied. We have included a user-fillable form (see Table S1) aimed at aiding readers to properly document their image processing workflows.
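That these two steps do not commute can be checked numerically. The sketch below is an illustration under assumptions made here: a noisy one-dimensional synthetic profile with a fixed random seed, a flat morphological opening used as a simplified stand-in for rolling ball background subtraction, and hypothetical helper names and filter sizes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(signal, radius):
    """All sliding windows of width 2*radius+1, with edge-replicated padding."""
    padded = np.pad(signal, radius, mode="edge")
    return sliding_window_view(padded, 2 * radius + 1)

def median_filter(signal, radius):
    return np.median(_windows(signal, radius), axis=1)

def subtract_background(signal, radius):
    """Subtract a background estimated by minimum then maximum filtering."""
    eroded = _windows(signal, radius).min(axis=1)
    background = _windows(eroded, radius).max(axis=1)
    return signal - background

# Synthetic profile (assumption): sloped background, one spot, seeded noise.
rng = np.random.default_rng(1)
x = np.arange(300)
image = (100 + 0.1 * x
         + 200 * np.exp(-0.5 * ((x - 150) / 3) ** 2)
         + rng.normal(0.0, 5.0, x.size))

# Same two steps, same parameters, opposite order.
subtract_then_denoise = median_filter(subtract_background(image, radius=10), radius=2)
denoise_then_subtract = subtract_background(median_filter(image, radius=2), radius=10)

box = slice(135, 166)  # measurement region around the spot
print("subtract, then denoise:", round(subtract_then_denoise[box].sum(), 1))
print("denoise, then subtract:", round(denoise_then_subtract[box].sum(), 1))
```

Both runs would be described identically by a list of steps and parameters that omits the order, yet they return different processed profiles and different integrated intensities.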
The various examples shown here serve to illustrate how seemingly inconsequential details in the implementation of image processing can have dramatic effects on a final image analysis outcome. These details are commonly deemed unimportant because many imaging experiments are comparative in nature. As such, it can be tempting to presume that the overall difference in outcome between experimental and control samples will be preserved, regardless of exactly how each image is processed and analyzed, so long as such workflows are applied consistently. It must be stressed, however, that this presumption cannot be assumed to hold. The preceding examples indicate that varying even one parameter in a single processing step can produce unintuitive and non-linear effects in the final outcome. Extending such variability over several steps in a workflow would therefore render any changes in final results wholly unpredictable.
Perspectives and conclusions
As described here and elsewhere, digital image processing is an integral step toward quantitative image analyses. In fact, judicious utilization of these methods – not avoidance of them out of indiscriminate ethical concern – is often required for the success of accurate quantitative image analysis (Miura and Sladoje, 2020). A more insidious problem, however, is the lack of proper reporting of these essential digital operations in the literature (Marques et al., 2020). The preceding examples demonstrate that inconsistently applied, seemingly insignificant image processing choices can have a significant effect on the analysis outcome. Promoting accurate and sufficient documentation of image processing is not novel. In fact, it has been advocated by many (Lee and Kitaoka, 2018; Limare and Morel, 2011; National Academies of Sciences and Medicine, 2019). Furthermore, there are a number of organized efforts by the imaging science community dedicated to providing resources and guidelines aimed at improving microscopy data quality assurance and transparency, as exemplified by the OME (Allan et al., 2012) and QUAREP-LiMi (Nelson et al., 2021 preprint) initiatives. In a companion Review, we follow a similar approach to discuss how to properly report image acquisition parameters (Heddleston et al., 2021). Indeed, the increasing utilization of public data repositories such as OMERO and Zenodo (Sicilia et al., 2017) can only be truly effective with proper accompanying reporting of both image acquisition and processing/analysis metadata. Unfortunately, advocacy alone has thus far not garnered a sufficient response. In this Review and its companion (Heddleston et al., 2021), we take a different approach to this issue than mere advocacy.
Here, we assume that the widespread problem of under- and mis-reporting of image processing steps in published literature is not primarily driven by ill-intention to conceal pertinent information, but due to the general lack of understanding and appreciation of how these methods could affect the data, often in unintuitive ways. In other words, we presume many end-users simply do not recognize what matters, and therefore do not know precisely what to report. Shaw and Hinchcliffe (2013) observed that the propagation of computer-aided image analysis among life scientists often follows an ‘oral’ tradition, facilitated mainly by colleagues or collaborators. Another alarming occurrence is to see early-career scientists, such as postdocs and graduate students, receiving little or no guidance at all when performing image processing and analysis tasks. To further exacerbate the challenge, many imaging facilities – where such support should be readily available – do not receive sufficient institutional and federal funding to offer image processing and analyses as a core service, or training of their user base (Ferrando-May et al., 2016). This problem frequently leads to a situation wherein the biologist arbitrarily tries to process the image without a logically designed strategy, resulting in either suboptimal results, or in many cases, failure to achieve the experimental goals. Even more consequentially, it may also inadvertently allow unethical image manipulations to go undetected.
It is impractical and unreasonable to expect a full understanding of the theories behind every image processing step. Here, we aim to provide readers with an appreciation of how image data can be affected by these digital operations, and to increase the awareness of why it matters to report them. To help readers toward this goal, we have summarized several major image-processing tasks, examples of common algorithms used for each task, and the corresponding parameters that should be reported (Table 1). While not comprehensive, this table can aid readers in critically approaching a broad range of image-processing algorithms with an eye toward documenting and reporting the necessary elements to maintain reproducibility. More importantly, this is expected to raise awareness of the issues, so that researchers seek expert assistance when in doubt, thereby potentially avoiding costly mistakes.
A myriad of strategies can aid researchers in tracking, properly documenting and, ultimately, standardizing their image processing workflows. While the simplest approaches can merely rely on diligent notetaking and screenshot documentation, there are built-in features in many software packages that can further aid in this regard. For example, the OMERO platform (Allan et al., 2012) can allow storage and tracking of not just imaging data, but extensive annotations and metadata associated with both image acquisition and digital transformations. ImageJ/FIJI (Rueden et al., 2017; Schindelin et al., 2015; Schneider et al., 2012) features the ability to record individual processing steps, together with the associated parameters. This capability can be leveraged further to easily create customized image processing macros that can be applied to large data collections, thus increasing both the transparency and the efficiency of an image processing workflow. Similar functions exist in commercial software packages that allow researchers to store and recall function histories for later reporting. In general, workflow macros and custom software code should always be shared whenever practical. We encourage readers to download a user-fillable form (available as Table S1), based on Table 1, in order to aid in summarizing their image processing workflows for later reporting.
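For researchers who script their analyses, one lightweight option is to keep the workflow as a machine-readable record alongside the data. The sketch below is only a suggestion: the field names and the `methods_sentence` helper are hypothetical and are not part of OMERO, ImageJ/FIJI or Table S1. The parameter values mirror those quoted for Fig. 5A, and the version field is deliberately left as a placeholder to be filled in.

```python
import json

# Ordered record of processing steps (field names are illustrative assumptions).
record = {
    "software": "ImageJ/FIJI",
    "software_version": "FILL IN",
    "steps": [
        {"order": 1, "task": "background subtraction",
         "method": "rolling ball", "parameters": {"radius_px": 5}},
        {"order": 2, "task": "denoising",
         "method": "median filter", "parameters": {"radius_px": 3}},
        {"order": 3, "task": "threshold",
         "method": "Otsu", "parameters": {}},
    ],
}

def methods_sentence(record):
    """Render the record as a methods-style sentence, preserving step order."""
    parts = []
    for step in sorted(record["steps"], key=lambda s: s["order"]):
        params = ", ".join(f"{name}={value}" for name, value in step["parameters"].items())
        detail = f"{step['method']}; {params}" if params else step["method"]
        parts.append(f"{step['task']} ({detail})")
    return (f"Images were processed in {record['software']} "
            f"(version {record['software_version']}) in the following order: "
            + "; then ".join(parts) + ".")

# Serialize for archiving next to the images, and confirm it reads back intact.
serialized = json.dumps(record, indent=2)
restored = json.loads(serialized)

print(serialized)
print(methods_sentence(restored))
```

Because the record stores each method, its parameters and its position in the sequence, the same file can be archived with the raw images, shared as supplementary material, and used to generate the text of the methods section.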
The need for accurate reporting will only become more acute as machine learning techniques become more commonly applied to perform a broad spectrum of image processing tasks (Arganda-Carreras et al., 2017; Smal et al., 2010; Sommer and Gerlich, 2013; Waller and Tian, 2015). This computational field, while often extraordinarily powerful (Ounkomol et al., 2018), presents formidable challenges regarding experimental reproducibility, not least because of its reliance on training datasets. This is further complicated by the fact that the bases upon which machine learning algorithms arrive at decisions are generally not easily conveyed in non-specialist terms.
The end-user community, however, is not our sole target audience. This systemic problem in reporting is, disappointingly, further abetted by the poor implementation of the editorial guidelines set by the journals themselves (Marques et al., 2020). There is unfortunately no fool-proof mechanism that would serve as a perfect remedy to poor reporting. However, it is helpful to identify a series of ‘checkpoints’ that would have to be breached to lead to an irreproducible quantitative analysis – (1) lack of documentation of the image processing workflow by the end-users during the experimentation phase, (2) lack of oversight by someone knowledgeable to identify the mis- or under-reporting during the manuscript preparation phase, and (3) failure of the journals to assign the manuscript to at least one reviewer who could guide the inclusion of this pertinent information during the peer-review phase.
The root problem of under-documentation and under-reporting by the end-users during the experimental phase can only be remedied by better training and is therefore a loftier, longer-term goal. However, there are immediately actionable solutions that journals could adopt to blunt the perpetuation of the problem. Firstly, journals could inquire during manuscript submission whether the authors have altered any of the images for analysis and whether the entire image processing workflow and its parameters have been documented. While this may not completely stem the problem, it would convey the gravity of the matter. More importantly, the emphasis by journals would encourage end-users to keep a record of their image processing workflow that is as detailed as one they would keep for a biochemical or molecular biology assay. Secondly, journal editors should assign the manuscript to at least one reviewer with the relevant expertise when image data analysis either makes up a considerable proportion of the data presented in the manuscript or generates the key findings. Importantly, ensuring proper reporting of image processing workflows not only helps to guard against unintended non-reproducibility, but can also serve as a valuable means of identifying and deterring deliberate malfeasance before publication of a study. Such instances, while comparatively rare, arguably damage public confidence in the scientific method far more than honest mistakes.
A digital image represents an information-rich data source, from which valuable quantitative biological insights can be derived. However, simply demanding that modern biological data be more quantitative rather than descriptive is inadequate without the ability for independent validation. Scientific data are only as credible and valuable as they are verifiable. Overlooking the importance of documenting accurate and sufficient image processing details renders the published results unverifiable. This oversight, therefore, undermines all the time, effort and resources that go into generating the data for a publication, and severely diminishes the value of the scientific discovery.
We thank Wendye Quaye for her design assistance in creating the downloadable supplemental forms.
The Advanced Imaging Center at Janelia Research Campus is generously supported by the Gordon and Betty Moore Foundation and Howard Hughes Medical Institute.
The authors declare no competing or financial interests.