The increasing availability and performance of automated scientific equipment in the past decades have brought about a revolution in the biological sciences. The ease with which data can now be generated has led to a new culture of high-throughput science, in which new types of biological questions can be asked and tackled in a systematic and unbiased manner. High-throughput microscopy, also often referred to as high-content screening (HCS), allows acquisition of systematic data at the single-cell level. Moreover, it allows the visualization of an enormous array of cellular features and provides tools to quantify a large number of parameters for each cell. These features make HCS a powerful method to create data that are rich and biologically meaningful without compromising systematic capabilities. In this Commentary, we discuss recent work that has used HCS to demonstrate the diversity of applications and technological solutions that are evolving in this field. Such advances are placing HCS methodologies at the frontier of high-throughput science and enable scientists to combine throughput with content to address a variety of cell biological questions.
Technological developments in the past years have led to the emergence of high-throughput methods in cell biology (Box 1), where multiple perturbations of a system are automatically repeated in a controlled and identical fashion. These capabilities allow the acquisition of vast amounts of data on a biological system. Evaluating all aspects of such data using computational and mathematical tools can then lead to an unbiased systematic representation of the process studied (Ahn et al., 2006). Technical breakthroughs, often driven by the pharmaceutical industry, have pushed this field forwards and have led to major advances in drug screening (reviewed by Zanella et al., 2010) and numerous discoveries in basic biology.
The need for systematic and unbiased approaches has always been at the core of scientific efforts. However, the technology that is required for such efforts often displays an inverse relationship between throughput (speed) and content (biological information). Sydney Brenner referred to this phenomenon by saying that high-throughput experiments are in danger of creating “low-input, high-throughput, no-output biology” (Brenner, 2008). Therefore, it remains a major goal to develop high-throughput science that will give all the advantages of being systematic, accurate, fast and unbiased without giving up the requirement to provide profound and highly informative data.
Among the variety of high-throughput approaches available to date (Box 1), one of the methods that holds the promise to bridge the gap between these expectations is high-throughput microscopy, often also referred to as high-content screening (HCS). This is mainly owing to the ability of microscopy methods to focus on single cells at a subcellular resolution, in a time-dependent manner and to measure a large number of parameters in each frame. In this Commentary, we focus on the current frontiers of HCS by presenting the wide range of tools that are utilized and the biological questions that can be tackled using such approaches. We also discuss possible ways to increase content while maintaining throughput at all levels of the screen – from sample design and preparation through to the acquisition and analysis capabilities of the high-content imager system.
The power of high-throughput microscopy
High-throughput microscopy can be employed to address a wide spectrum of biological questions. At the basis of this capability lies the ever-growing variety of labeling methods (discussed below) that allow visualization of cellular architecture and function, as well as developmental or behavioral processes (Fig. 1). In addition, the technological advance of biological tools and microscopic platforms now enables screens to be performed in an array of genetic backgrounds, under different growth conditions and at various time points, as well as allowing the comparison of multiple tissues, cell lines or organisms. Combining the richness of visualization approaches with various experimental strategies creates nearly endless options for generating biological insights (Fig. 1). For example, it is now feasible to perform functional genomics screens by capturing a microscopic phenotype of cells in which each gene has been knocked out or knocked down systematically. This can be done, for instance, by RNA interference (RNAi) in cell culture (Brass et al., 2008; Krishnan et al., 2008; Moffat et al., 2006; Neumann et al., 2006; Prudencio et al., 2008) or by using a deletion library, as has been performed in budding yeast (Vizeacoumar et al., 2010). Additionally, deletion libraries have been created for fission yeast (Kim et al., 2010) and for bacteria (e.g. the National BioResource Project E. coli developed at the National Institute of Genetics, Japan), and these could also be utilized for HCS.
Box 1. The repertoire of available high-throughput platforms
Technological developments in the past few decades have led to a flourishing of large-scale studies that have yielded a wealth of data. Different types of such systematic high-throughput studies are geared towards quantifying and analyzing separate aspects of cell biology, such as genomics, transcriptomics and proteomics. One of the main differences between these current technologies is whether they allow readouts at the level of the population or of the single cell.
Currently, a number of platforms that allow readouts on the population level are available. These include plate readers (to study processes such as enzymatic function, protein–protein interactions and expression levels), deep and next-generation sequencing platforms (to study genomes and expression patterns), microarrays (expression chips, intron chips, chromatin immunoprecipitation, single nucleotide polymorphisms and genetic diversity), lipid arrays, protein arrays and whole-cell mass spectrometry (for proteins or metabolites).
By contrast, platforms that allow readouts at the single-cell level are less abundant to date. These include flow cytometry, high-throughput microscopy set-ups, high-throughput single-cell sequencing and DNA methylation assays. Despite their sparsity, these readouts at the single-cell level are essential for assaying diversity within a population.
Visualizing biological samples
Biological samples can be imaged directly by using transmitted light or genetically encoded fluorescent proteins, as well as indirectly by using labeling techniques such as those using antibodies or specific dyes. Depending on the biological question at hand, all of these methods can be used for HCS. The major advantage of direct imaging is that it can be performed using live cells. By contrast, indirect imaging requires sample manipulation but is highly diverse and allows specificity.
Performing label-free HCS
Label-free imaging does not impose manipulations on the biological sample that might cause artifacts. However, it is not specific and is therefore usually used to assess morphological changes, such as measuring fat accumulation in adipocytes (Dragunow et al., 2007), or for analyzing phagokinetic tracks (PKT) (Naffar-Abu-Amara et al., 2008). Label-free imaging can simplify the sample preparation stage because it eliminates the need for specific staining; however, it can also present challenges for image analysis, such as non-homogeneous illumination and the need for contrast enhancement.
Labeling methods for HCS
In order to take full advantage of HCS systems, experiments should be planned to maximize data content. For example, microscopes employing current technologies can resolve up to six synthetic fluorescent colors in a fixed sample and up to three fluorescent colors in live samples. It should be noted that by methods of color decomposition or by unique genetic labeling methods, such as the ‘brainbow’ method, it is possible to separate dozens of combinatorial labeling options (Cachero and Jefferis, 2011; Livet et al., 2007). However, such methods have only been used in low-throughput methodologies to date. By utilizing the entire range of labeling spectra it is possible to multiply the amount of data that can be extracted from a single experiment. This increase in content not only raises the amount of information that can be gained relative to the time and work invested but can also potentially reveal new relationships that might not have been discovered otherwise. One example that stresses the strength of employing a multi-color labeling approach is a study that was aimed at determining the distribution of all yeast organelles relative to the polarization axis during the creation of mating projections (a process referred to as ‘shmooing’). This was only made possible by using a three-color labeling scheme in which organelles were visualized with a red fluorescent protein (RFP), the projection tip of the budding yeast was labeled with green fluorescent protein (GFP) and the DNA in the nucleus was stained with DAPI (Narayanaswamy et al., 2009). Indeed, the past years have brought about a wealth of labeling options for imaging biological molecules, which we will discuss in the following sections (see also Table 1).
Fluorescently tagged antibodies
The use of fluorescently tagged antibodies is the classic method for specifically labeling cellular components, and this technique is also widely used in HCS (Brass et al., 2008; Desbordes et al., 2008; Moffat et al., 2006; Prudencio et al., 2008). The main advantages of immunostaining are that it is specific and that it does not require genetic alteration of the cell. The main shortcomings are that it requires the availability of a good antibody, as well as cell fixation, and therefore cannot be used for live-cell imaging. This type of experimental approach has been used in a human genome-wide RNAi screen that was aimed at identifying genes associated with West Nile virus (WNV) infection. The screen was based on fixation of virally infected cells expressing small interfering RNAs (siRNAs) and immunostaining against the viral envelope (E) protein (Krishnan et al., 2008). Using this approach, 283 host proteins were found to facilitate WNV infection and 22 host proteins were found to reduce WNV infection.
Fluorescently labeled proteins
Genetically encoded fluorescent proteins can be used in a large number of screening methods. One approach that is routinely used is the fusion of a fluorescent protein to a cellular protein, thereby making it possible to monitor the localization and expression level of the cellular protein in a living sample. Fusion-protein libraries are now being developed in a variety of organisms ranging from bacteria (Kitagawa et al., 2005) and yeast (Huh et al., 2003) to human cells (Sigal et al., 2007). These libraries allow widespread use of fluorescently tagged proteins in systematic microscopic screens. In yeast, the generation of custom-made fusion protein libraries has been facilitated by automated mating technologies that allow integration of a genetically encoded probe (or any other genetic modification) into any systematic yeast library in a matter of weeks (Cohen and Schuldiner, 2011; Tong and Boone, 2006).
A demonstration of how both acquisition and analysis can be simplified by utilizing available systematic libraries and by combining several probes emitting different colors is provided by a study on the yeast ubiquitin ligase Grr1. To find new targets for Grr1, the yeast collection of all GFP-tagged proteins (GFP library) was screened for strains with increased fluorescence in a Δgrr1 background relative to that in a wild-type strain; higher expression of GFP-tagged proteins in these strains indicates a reduced degradation rate. Thus such proteins are prime candidates for being Grr1-modified substrates. To increase the accuracy of the method the authors created an internally controlled three-color system that allowed auto-focusing of the microscope using the CellTracker Blue dye and simultaneous measurement of GFP levels in an RFP-labeled population of Δgrr1 strains versus GFP levels in a non-RFP labeled population of normal control cells (Benanti et al., 2007).
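The logic of such an internally controlled comparison can be sketched in a few lines. This is a minimal illustration and not the pipeline used by Benanti et al.; the per-cell record layout, the RFP threshold, the use of the median and the function name are all assumptions made for the example.

```python
from statistics import median

def gfp_fold_change(cells, rfp_threshold):
    """Ratio of median GFP in RFP-positive (mutant) cells to median GFP
    in RFP-negative (control) cells imaged in the same well.

    `cells` is a list of (gfp, rfp) intensity pairs, one per segmented cell.
    """
    mutant = [gfp for gfp, rfp in cells if rfp >= rfp_threshold]
    control = [gfp for gfp, rfp in cells if rfp < rfp_threshold]
    if not mutant or not control:
        raise ValueError("both populations must be present in the well")
    return median(mutant) / median(control)

# Hypothetical per-cell measurements: (GFP, RFP)
well = [(100, 5), (110, 4), (90, 6), (300, 80), (330, 75), (270, 90)]
fold = gfp_fold_change(well, rfp_threshold=50)
```

Because both populations share a well, illumination and focus differences between wells cancel out of the ratio; a fold change well above 1 would flag the tagged protein as a candidate for follow-up.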
In addition, fusion proteins can act as markers for whole organelles in many cases. For example, in order to investigate yeast spindle morphogenesis, GFP-tagged tubulin was inserted into a yeast deletion library (Vizeacoumar et al., 2010). Another example, in human cells, is the use of a GFP-tagged version of core histone 2B, which acts as a marker for chromosomes in all cell cycle stages, to uncover genes involved in chromosome segregation during cell division using a high-throughput RNAi time-lapse screen (Neumann et al., 2006). In both examples mentioned above, the protein that was GFP-tagged was not of interest in itself; rather, it was used as a marker to follow the organelle or cellular structure in which it resides (i.e. spindle or chromosomes).
In some cases, the marker protein for an organelle can itself cause a genetic perturbation with a known phenotype, thus allowing identification of regulatory mechanisms that govern that phenotype. For example, the overexpression of a GFP-tagged version of Hmg2 (GFP–Hmg2) has led to the identification of the pathways that are required for the formation of Hmg2-induced membrane proliferations in the endoplasmic reticulum (ER). By introducing the overexpressed GFP–Hmg2 into a yeast deletion library and screening all 4700 mutants for their ER architecture (which was visualized by the ER localization of GFP–Hmg2 itself) it was possible to assess how overexpression of this protein affects the organelle (Federovitch et al., 2008).
Genetically encoded fluorescent proteins can additionally be used as reporters for promoter activity by simply placing a fluorescent protein under regulation of endogenous promoters. This type of approach has been carried out in the nematode worm Caenorhabditis elegans to map the developmental control and tissue specificity of various promoters (Hunt-Newbury et al., 2007), as well as in bacteria (Zaslaver et al., 2006) and yeast (Bell et al., 1999; Jonikas et al., 2009).
In addition to classical fluorescent proteins, it is also possible to use enzymatically activated fluorescent labels, such as HaloTag. HaloTag is a protein that binds specific fluorescent ligands and has already been shown to be suitable for high-throughput approaches (Benyounes et al., 2011; Los et al., 2008). In the event that a large proteinaceous marker interferes with protein function, it is also possible to fuse the short sequence of the FlAsH (fluorescein arsenical hairpin) receptor peptide (FRP) that binds the FlAsH reagent (Griffin et al., 1998).
Labeling of mRNA and DNA
RNA and DNA are most commonly labeled through hybridization with fluorescently tagged sequence-specific probes. An example of this type of technique is the use of high-resolution fluorescent in situ hybridization (FISH) to comprehensively evaluate mRNA localization dynamics during Drosophila melanogaster embryogenesis (Lecuyer et al., 2007).
In addition, mRNA distribution can be followed by genetically encoding specific binding sites for RNA-binding proteins into RNA molecules. Such methods allow specific mRNA visualization in living cells at a subcellular resolution and hold the promise to open exciting avenues for screening of factors that affect mRNA localization and stabilization using live-cell imaging. For example, MS2 loops are short sequences that serve as a binding site for the RNA-binding MS2 coat protein (MS2-CP). These sequences can be inserted into untranslated regions and coexpressed with a coat protein fused to GFP. GFP–MS2-CP binding to the MS2 loops thereby enables in vivo labeling of the RNA molecule (Haim-Vilmovsky and Gerst, 2011; Lionnet et al., 2011).
Commercially available lipid dyes are widely used to study lipid accumulation and metabolism in many different model organisms. For example, the Nile Red dye, which selectively stains lipid droplets, has been used in live C. elegans to identify genes (Ashrafi et al., 2003) or small molecules (Lemieux et al., 2011) that regulate fat storage. In another attempt to understand the basic biology of lipid droplet formation, a screen was carried out in human liver cells that were stained with a lipid droplet imaging kit to identify microRNAs (miRNAs) that regulate their accumulation (Whittaker et al., 2010).
Lipid dyes have also been used to study various metabolic diseases. For example, the sterol dye filipin has been used to stain cells with lipid storage defects in order to screen for compounds that partially revert their cholesterol accumulation defect (Pipalia et al., 2006). In another example, a fluorescent analog of lactosylceramide (BODIPY–LacCer) was used to follow lipid accumulation in fibroblasts from patients with different lipid-storage diseases (Chen et al., 1999).
Interestingly, lipid dyes can also be used as a simple means to stain organelles (see the section below) or cell membranes. For example, the lipophilic dyes DiI and DiO have been used in a screen for zebrafish mutants with retinotectal projection defects (Baier et al., 1996). Unfortunately, there are many more lipid species than there are available specific conjugated dyes for them. The lack of reagents to study specific lipid moieties and chain lengths means this field lags behind the imaging of proteins in terms of its capacity to be assayed by HCS methods.
Fluorescent analogs of sugars are still scarce, which limits the use of HCS for investigating sugar metabolism in cells. One example of their use is an HCS that measured glucose uptake in cancer cell lines by using a fluorescent 2-deoxyglucose analog. These direct, quantitative measurements allowed the full spectrum of metabolic variability within populations of tumor cells to be dissected at high resolution in vitro (Hassanein et al., 2010).
Cellular metabolite and ion labeling
Fast modulation of metabolites and ion gradients (e.g. of neurotransmitters, calcium ions, cAMP etc.) in the cell has a key role in the regulation of many signal transduction pathways and is essential for maintaining cellular homeostasis. Several approaches have been developed to follow the dynamics of changes in metabolite and ion concentration in vivo with high temporal and spatial resolution. One of them makes use of chemical sensors that become fluorescent upon ion binding. For example, a screen for molecules that affect the cholecystokinin 1 (CCK1) receptor was performed by staining cells with the Ca2+-activated dye Fluo-4AM (Staljanssens et al., 2011). In another approach genetically encoded sensors, such as GCaMP for Ca2+, pHluorins for pH and Grx–GFP for glutathione measurements, can be used to monitor changes inside the cell (reviewed by Okumoto, 2010). For example, the genetically encoded yellow fluorescent protein (YFP)-based Cl− sensor YFP H148Q/I152L has been used in a screen to identify proteins that rescue the phenotype created by a mutated cystic fibrosis transmembrane conductance regulator (CFTR) channel (Trzcinska-Daneluti et al., 2009).
As discussed above, it is possible to use fluorescently tagged proteins that reside in a specific organelle to study organelle morphology and dynamics using high-throughput microscopy. An additional approach to visualizing organelles is the use of commercially available organelle-specific dyes. Organelle-specific dyes can replace the need for the time-consuming creation and integration of fluorescently labeled proteins. Indeed, yeast high-throughput screens to uncover proteins with a role in organelle biogenesis and inheritance have used Rhodamine B hexyl ester for labeling mitochondria (Dimmer et al., 2002), FM4-64 for labeling the vacuole (LaGrassa and Ungermann, 2005) and BODIPY 493/503 for labeling lipid droplets (Szymanski et al., 2007). Furthermore, organelle-specific dyes can be implemented as reporters for different biological processes. For example, LysoTracker, a marker of acidic compartments, has been used in a screen to identify Mycobacterium tuberculosis genes that are involved in phagosome maturation arrest in host macrophages: because the surface area of the acidified compartment marked with LysoTracker was shown to be directly proportional to the amount of bacterial particles within the compartment, the need for additional staining of the bacteria was relieved (Brodin et al., 2010).
HCS for identifying additional fluorophores
Interestingly, HCS can itself be applied in order to identify new fluorophores that recognize specific cellular domains or states. For example, in the search for a fluorophore that allows researchers to distinguish myoblasts from differentiated myotubes, 1606 optically active compounds were screened using murine myoblasts and myotubes. Six compounds with the desired properties were identified in the screen, and one of them was further tested in a pilot screen for myogenesis inhibitors (Wagner et al., 2008). In another example, a combinatorial library of 125 fluorescent styryl molecules was screened to identify RNA probes that were specific to the malaria parasite, and three RNA-binding dyes that revealed the morphology of the live parasite were identified (Cervantes et al., 2009).
In summary, the ever-growing list of fluorescent labeling technologies, and the ability to combine several of them in a single experiment in an expanding number of systems, allows new aspects of cell biology to be explored.
Optimizing the choice of high-content imagers
Apart from sample design and preparation, an important feature that controls both the content and the throughput obtained from a given experimental set-up is the type of microscope or high-content imager that is used. In the past years there has been vast technological progress in the development of fast automated microscopes that are especially designed for HCS. The common denominator of such systems is that they possess sample positioning and fast autofocusing that can be hardware and/or image based. In an image-based autofocus, several images in different z-planes of the object are acquired automatically, followed by an online image analysis that defines the plane of focus. A hardware-based autofocus usually uses optical methods to identify the bottom of the surface on which the cells are plated, and images are subsequently acquired at a fixed offset from that point (Starkuviene and Pepperkok, 2007). Each system provides its own combination of optical features, containing a confocal and/or widefield microscope with various lenses that define the maximal magnification and resolution of the acquired images. These systems are usually provided as part of an HCS platform or are designed to be compatible with options for automation, such as a liquid handler and incubators that enable live- and/or fixed-cell imaging [Table 2, modified from those published in previous reviews (Zanella et al., 2010; Cohen and Schuldiner, 2011)].
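The principle of image-based autofocusing can be sketched as follows: a sharpness score is computed for each z-plane and the plane with the highest score is taken as the plane of focus. This is a minimal sketch under stated assumptions. The gradient-energy score used here is one common choice and is not necessarily what any commercial imager implements, and the function names and tiny example images are hypothetical.

```python
def sharpness(image):
    """Gradient energy: sum of squared differences between horizontally
    neighbouring pixels. In-focus images have stronger local contrast."""
    return sum(
        (row[x + 1] - row[x]) ** 2
        for row in image
        for x in range(len(row) - 1)
    )

def best_focus_plane(z_stack):
    """Return the index of the sharpest image in a list of 2D images."""
    scores = [sharpness(image) for image in z_stack]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical three-plane stack (each image is a list of pixel rows)
blurred = [[4, 5, 6, 5], [5, 6, 5, 4]]
sharp = [[0, 10, 0, 10], [10, 0, 10, 0]]
flat = [[5, 5, 5, 5], [5, 5, 5, 5]]
plane = best_focus_plane([blurred, sharp, flat])
```

Every plane in the stack costs an extra exposure, which is why this strategy is slower and more prone to bleaching than a hardware-based offset from the plate bottom.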
When choosing a system, it is important to remember that image-based autofocusing is relatively slow (in comparison with hardware autofocusing) and can cause bleaching, but that it allows the acquisition of in-focus images even when the cells are irregularly plated (Dragunow, 2008; Starkuviene and Pepperkok, 2007). Because live-imaging depends on the ability to maintain samples in a competent state, the maximal speed of the microscope can dictate the ability to image live or fixed preparations. Therefore, any screen should be designed according to the capabilities and limitations of the system.
Although certain similarities exist between the different high-content imagers, each system is unique, making it more suitable for certain applications than for others. For example, in order to take images of thick samples, such as zebrafish embryos (Vogt et al., 2010), it is advisable to use systems that allow confocal visualization. By contrast, a platform that provides an advanced flow cytometer with imaging capabilities is the method of choice for analyzing images from a large number of cells in suspension. In addition, this system has the advantage of allowing multiparametric analysis of the data. This enables powerful statistical analysis and the ability to work with samples that change their properties upon adhesion, such as cells of the immune system. Indeed, using this type of system means it is even possible to quantify rare events, such as apoptosis or protein translocation [more details are given elsewhere (Zuba-Surma et al., 2007)].
Systems that allow live-cell imaging and are coupled with liquid-handling robots enable the study of highly dynamic processes, such as cell migration and mitosis. The ability to monitor live cells during a timecourse from the moment of perturbation also allows primary defects and secondary effects, which might cause a phenotype, to be distinguished from one another (Neumann et al., 2006). An example of optimal utilization of time-lapse capabilities in such set-ups is a recent study that performed an RNAi screen in human HeLa cells expressing fluorescently labeled chromosomes. The screen was unique in that it used a special workflow that coupled solid-phase transfection by siRNA microarrays to automatic time-lapse microscopy and thereby made it possible to follow the cells throughout the 48 hours following transfection (Neumann et al., 2010). To score such an enormous amount of data, phenotypes, including cell division, proliferation, survival and migration, were analyzed with a computational pipeline (Walter et al., 2010a). The fact that the entire screen was performed using time-lapse microscopy demonstrates the great advancement in the field of high-throughput microscopy, especially when compared with earlier work in which time-lapse imaging was performed as a secondary screen (Kittler et al., 2007; Kittler et al., 2004).
Interestingly, HCS can also be used to detect dynamic processes even in the absence of live imaging. For example, cell migration can be assayed by HCS in fixed samples, such as assaying for cells that migrate across a membrane (Mastyugin et al., 2004), by analyzing phagokinetic tracks (Naffar-Abu-Amara et al., 2008) or in wound healing assays (Yarrow et al., 2004). Alternatively, a large number of cells can be used to identify ‘stages’ of a process across a fixed population, such as in a heterogeneous population of yeast in various cell cycle stages, which can be identified by the size of their bud.
The diversity of available HCS platforms promotes the ability to maximize the content of screens without limiting the throughput achieved, even in cases where sample design and preparation is complicated and requires special attention, as in the experiments that combine siRNA transfections with time-lapse analysis mentioned above or in experiments that examine animal behavior.
Analysis solutions to maximize throughput and data mining
“An image is worth a thousand words”. Although this saying was coined long before automated microscopy was developed, it expresses the notion that HCS can produce enormous amounts and diverse types of data. It is not by chance that the terms ‘high-throughput microscopy’ and ‘HCS’ have been used interchangeably in the field. This is because the main goal of high-throughput microscopy is to reach a state at which highly resolved functional and morphological information (or ‘content’) can be extracted from populations of individual cells. This requires quantitative analysis at a single-cell level to be carried out for the whole of the population being imaged. Therefore one major challenge, which parallels the drive to develop faster and more sensitive acquisition systems, has been to formulate new and accurate methods of data extraction and analysis.
Manual image analysis
Manual image analysis is labor intensive, slow and carries the risk of human errors and bias (Huth et al., 2010). However, probably the greatest shortfall of the human eye is that it is not quantitative and cannot always distinguish subtle effects. Therefore, manual analysis alone cannot capture the entire content that is found in microscopic screens, such as small changes in mean overall intensity and shifts in the distribution of a phenotype across a population (noise), and it cannot assign a confidence score to its findings. For such reasons, it is very important to develop automated image analysis to increase not only throughput but also content.
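The quantities that escape the eye are simple to compute once per-cell measurements exist. The sketch below shows two of them: the coefficient of variation as a measure of cell-to-cell noise, and a z-score against control wells as a basic confidence score. The function names and example values are assumptions made for illustration, and the z-score is only one of several scoring schemes in use.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Cell-to-cell variability (noise) as standard deviation over mean."""
    return stdev(values) / mean(values)

def z_score(sample_mean, control_means):
    """Distance of a sample well from the control wells, expressed in
    units of the control standard deviation."""
    return (sample_mean - mean(control_means)) / stdev(control_means)

# Hypothetical per-cell intensities in one well
noise = coefficient_of_variation([8, 10, 12])

# Hypothetical mean intensities: one sample well against three control wells
score = z_score(18, [10, 12, 14])
```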
Automated image analysis
The primary steps in automated analysis solutions usually include image pre-processing (such as background subtraction), segmentation (for example threshold, watershed and edge-detection-based segmentation) and measurements of the required phenotype in the selected cells by classification (using parameters such as intensity, size or texture) (Walter et al., 2010b; Wollman and Stuurman, 2007). The more quantitative and accurate the acquired data are, the better they serve as a platform for sophisticated data and statistical analysis, which can then be used to identify insightful dependencies between unexpected parameters.
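These primary steps can be illustrated with a deliberately small sketch: background subtraction using the image median, segmentation by a fixed threshold followed by labeling of connected pixels, and measurement of area and mean intensity per object. The choice of the median as background estimate, the fixed threshold, 4-connectivity and the function name are assumptions for the example. Real pipelines typically rely on dedicated image-analysis libraries and more robust methods such as watershed segmentation.

```python
from statistics import median

def segment_and_measure(image, threshold):
    """Return a list of (area, mean_intensity) tuples, one per object."""
    # Pre-processing: subtract the median as a crude background estimate
    background = median(value for row in image for value in row)
    corrected = [[max(value - background, 0) for value in row] for row in image]

    height, width = len(corrected), len(corrected[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or corrected[y][x] <= threshold:
                continue
            # Segmentation: flood fill over 4-connected above-threshold pixels
            stack, pixels = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                pixels.append(corrected[cy][cx])
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and corrected[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Measurement: area in pixels and mean corrected intensity
            objects.append((len(pixels), sum(pixels) / len(pixels)))
    return objects

# Hypothetical image: uniform background of 10 with two bright objects
image = [
    [10, 10, 10, 10, 10],
    [10, 60, 60, 10, 10],
    [10, 60, 60, 10, 10],
    [10, 10, 10, 10, 90],
    [10, 10, 10, 10, 90],
]
objects = segment_and_measure(image, threshold=20)
```

The resulting per-object measurements are the raw material for the classification and statistical analysis steps.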
Most high-content imagers are supplied with analysis software (often termed ‘turn-key’ solutions) (Table 2). The analysis software usually contains a set of readymade solutions for standard applications of the high-content imager such as quantifying the translocation of proteins from the cytoplasm to the nucleus (Borchert et al., 2005; Dull et al., 2010; Granas et al., 2006; Kau et al., 2003; Link et al., 2009; Straschewski et al., 2010; Xu et al., 2008; Zanella et al., 2007; Zanella et al., 2008). The combination of high-content imagers and integrated analysis programs has contributed to many applications of HCS in both basic biology and in drug discovery (reviewed by Dragunow, 2008; Bullen, 2008). For example, an automated imaging platform alongside its inherent software has been used in a screen of small molecules aimed at identifying inhibitors of the p38 [mitogen-activated protein kinase 14 (MAPK14)] pathway. To measure p38 pathway activation, the nuclear (active) fraction of the MAPK-activated protein kinase-2 (MAPKAPK2, also known as MK2) fused to enhanced GFP (MK2–EGFP) was measured and quantitatively reported relative to the cytosolic (inactive) fraction (Trask et al., 2009).
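A translocation readout of this kind reduces to comparing the reporter intensity inside the nucleus with that in the rest of the cell. The sketch below assumes that nuclear and whole-cell masks are already available (for example from a DNA stain and a cytoplasmic stain). The mask representation and function name are hypothetical, and the sketch does not reproduce the algorithm of any vendor software.

```python
def nuclear_to_cytosolic_ratio(intensity, nucleus_mask, cell_mask):
    """Mean reporter intensity in the nucleus divided by the mean intensity
    in the cytoplasm (cell pixels that are not nuclear) for a single cell.

    All three arguments are 2D lists of identical shape; masks hold booleans.
    """
    nuclear, cytosolic = [], []
    for row_i, row_n, row_c in zip(intensity, nucleus_mask, cell_mask):
        for value, in_nucleus, in_cell in zip(row_i, row_n, row_c):
            if in_nucleus:
                nuclear.append(value)
            elif in_cell:
                cytosolic.append(value)
    if not nuclear or not cytosolic:
        raise ValueError("cell must contain nuclear and cytosolic pixels")
    return (sum(nuclear) / len(nuclear)) / (sum(cytosolic) / len(cytosolic))

# Hypothetical 3x3 cell: bright nucleus in the centre, one pixel outside the cell
intensity = [[20, 20, 20], [20, 80, 20], [20, 20, 0]]
nucleus_mask = [[False, False, False], [False, True, False], [False, False, False]]
cell_mask = [[True, True, True], [True, True, True], [True, True, False]]
ratio = nuclear_to_cytosolic_ratio(intensity, nucleus_mask, cell_mask)
```

Using a ratio rather than the absolute nuclear signal makes the readout largely independent of how strongly each cell expresses the reporter.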
A more powerful turn-key solution is provided by cytometry-like data analysis programs, which allow parameters to be viewed in a quantitative manner on a large number of graphs and plots, as well as populations to be gated and population level analysis to be performed. Such analysis programs are supplied, for example, by the ImageStreamx flow cytometer and HCS unit and the ScanR analysis software. In these systems, images and their representation on graphs and plots are linked, which gives extra power to the person who manually inspects the data. Using this software, the manual inspector can visualize the image represented by any data point on a graph at the click of a button. An example that highlights the strength of such single-cell-based analyses is a study that aimed to establish a robust HCS assay to investigate the pharmacological activities of bacterial extracts on eukaryotic cells. In this study, the ScanR microscopy and analysis system was used to determine the ploidy and vitality of ovarian insect cells grown in the presence of crude extracts of ten different Myxobacteria cultures (Jensen et al., 2009).
However, for more sophisticated analysis requirements, new custom-made computational tools must be created. Such custom-made algorithms can be added onto the turn-key analysis platforms or built independently. One example of such an independent complete platform, which integrates both image and data analysis, originated from an HCS that aimed to identify molecular pathways involved in the assembly of focal adhesions (Winograd-Katz et al., 2009). To analyze the results obtained from an siRNA screen, these authors created a data pipeline that included: an image database to store the large amounts of data accumulated in the screen, a visualization module to display selected images, image processing, image segmentation, statistical analysis and multiparametric scoring of changes in treated cells for comparison with those from control cells (Paran et al., 2006). The multiparametric image and clustering analyses allowed identification of different gene families whose perturbation induced similar effects on focal adhesions, thus revealing major correlations that could not otherwise have been described (Winograd-Katz et al., 2009). Another example comes from the drug-profiling field, where multiparametric analysis platforms were tailored to meet the complexity of analyzing HCS for drug effects on single cells (Loo et al., 2007; Perlman et al., 2004).
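The idea of grouping perturbations by the similarity of their multiparametric profiles can be sketched as follows. Each perturbation is represented by a vector of feature scores relative to control, and perturbations whose profiles point in a similar direction are placed together. Greedy grouping by cosine similarity is used here as a simple stand-in and is not the clustering method of the cited studies; the gene names, feature values and similarity cut-off are hypothetical.

```python
from math import sqrt

def cosine_similarity(profile_a, profile_b):
    """Similarity of two feature-score vectors, from -1 (opposite) to 1 (same)."""
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = sqrt(sum(a * a for a in profile_a))
    norm_b = sqrt(sum(b * b for b in profile_b))
    return dot / (norm_a * norm_b)

def group_similar(profiles, min_similarity):
    """Greedy grouping: a perturbation joins the first group whose founding
    member it resembles; otherwise it starts a new group."""
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            if cosine_similarity(profiles[group[0]], profile) >= min_similarity:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical scores for three features (e.g. adhesion area, number, elongation)
profiles = {
    "geneA": [2.0, -1.0, 0.5],
    "geneB": [1.8, -1.2, 0.4],
    "geneC": [-1.5, 2.0, 0.0],
}
groups = group_similar(profiles, min_similarity=0.9)
```

In this toy example geneA and geneB fall into one group because their knockdowns shift all three features in the same direction, whereas geneC shows the opposite pattern.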
A major contribution to the field is that the advanced applications that are published are often also freely distributed. This should allow more general use of both sophisticated image analysis and data analysis. For example, CellProfiler is free open-source software for automatic quantitative measurement of phenotypes from thousands of images (Jones et al., 2008; Kamentsky et al., 2011). The ability to distinguish between a multitude of phenotypes has been made possible by classification methods based on machine learning algorithms. These algorithms allow the computer to learn how to recognize complex patterns in an experimental group on the basis of examples from both positive and negative control groups. Using such approaches, it is possible to label many different objects in the image and assign each of them to one of several possible phenotypes. Freely available programs that perform this task, such as Enhanced Cell Classifier, allow rapid analysis of complex phenotypes (Misselwitz et al., 2010). An additional analysis program, which is available upon request, is CalMorph, a high-throughput image-processing program that was specifically developed for morphological analysis of the yeast Saccharomyces cerevisiae (Negishi et al., 2009; Ohtani et al., 2004). TimeLapseAnalyzer (TLA) is a free program package designed for live-cell image analysis, including applications such as wound healing, multiple cell tracking, cell counting and proliferation quantification (Huth et al., 2011). Yet another example is The Parallel Worm Tracker, a publicly available automated tracking system that was developed to quantitatively measure the locomotion of multiple individual nematode worms in parallel (Ramot et al., 2008).
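The idea of learning phenotypes from positive and negative control examples can be illustrated with a deliberately simple nearest-centroid classifier. This is far simpler than the algorithms used by the tools cited above and is not taken from any of them; the two-dimensional feature vectors and the labels are invented for the example.

```python
from math import dist


def centroid(vectors):
    """Mean feature vector of a group of example cells."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def train(examples):
    """examples: dict mapping a phenotype label to a list of feature vectors."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(model, vector):
    """Assign the phenotype whose centroid is closest in feature space."""
    return min(model, key=lambda label: dist(model[label], vector))


# Hypothetical training cells taken from control wells.
training = {
    "positive": [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0)],
    "negative": [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0)],
}
model = train(training)

# Cells from an experimental well, each assigned to one phenotype.
unknown = [(0.85, 0.9), (0.05, 0.15), (0.7, 0.6)]
labels = [classify(model, v) for v in unknown]
```

Adding a further phenotype only requires another labelled group of examples in `training`, which mirrors how an object can be assigned to one of several possible classes.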
In summary, depending on the phenotype being screened and the microscopy system being used, it is essential to explore the different options for data analysis and to select the one that extracts data of the highest quality from the images, which in turn can lead to new biological insights (Fig. 2).
Sharing the wealth
By their nature, high-throughput approaches generate enormous amounts of data, which are often organized in online databases that are available to the community [e.g. The Stanford Microarray Database (Sherlock et al., 2001)]. The accumulation of so many databases has led to the foundation of meta-databases and designated journals [e.g. Databases (Bader et al., 2006; Wren and Bateman, 2008)].
Accordingly, one of the goals of the HCS community is to develop platforms for data sharing. However, a single HCS generates more data (in terms of the disk space required for storage) than thousands of microarray experiments. This sheer computational weight makes data sharing much more difficult. Nevertheless, it is of great importance to share images in a form that allows them to be retrieved by textual annotations. As has been the case in other high-throughput fields, separating data acquisition from data analysis allows pre-existing data to be re-analyzed when more sophisticated analysis programs become available. This should provide the opportunity for other scientists to use the same images to answer different biological questions, thereby potentially revealing additional phenotypes that were not uncovered in the original analysis scheme. Generating databases that contain the original images also allows separate computational research groups to develop sophisticated image and data analysis software for the existing data. For example, the yeast GFP fusion localization database allowed the development of automated yeast image analysis (Chen et al., 2007).
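Retrieval of shared images by textual annotation can be pictured as a lookup from annotation terms to image identifiers. The sketch below is a minimal, hypothetical illustration and does not describe the design of any existing database; the image identifiers and annotation terms are invented.

```python
from collections import defaultdict


def build_index(records):
    """Map each lower-cased annotation term to the set of image ids carrying it."""
    index = defaultdict(set)
    for image_id, annotations in records.items():
        for term in annotations:
            index[term.lower()].add(image_id)
    return index


def search(index, *terms):
    """Return the sorted ids of images annotated with all of the given terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical annotated images.
records = {
    "img_0001": ["GFP", "mitochondria", "wild type"],
    "img_0002": ["GFP", "nucleus", "wild type"],
    "img_0003": ["GFP", "mitochondria", "deletion strain"],
}
index = build_index(records)
hits = search(index, "gfp", "Mitochondria")
```

Because the index stores only identifiers, the raw images stay untouched and can be re-analyzed later with any program, in keeping with the separation of acquisition from analysis.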
Indeed, some screening efforts have created databases to share the raw data with the entire scientific community [for example (Gregan et al., 2005; Hunt-Newbury et al., 2007)]. Some of the more general screens are even integrated into platforms that offer additional information on the genes and phenotypes of choice. Two useful examples of such databases are the Yeast GFP Fusion Localization Database, which is run by the University of California, San Francisco (Huh et al., 2003) and the database on epithelial cell migration, which is hosted by the Cell Migration Consortium (Simpson et al., 2008). As more screens are published, the HCS community should strive to make such efforts the norm.
HCS platforms provide a rare opportunity to gather functional and morphological information on populations of single cells under various conditions. Despite the temptation to acquire large amounts of data using the available automated technology, throughput must not come at the expense of content. Sacrificing content risks compromising the scientific insights that can be obtained from the screen, thus reducing its relevance. Content can be increased while throughput is maintained at all levels of screen implementation: from sample design and preparation through to the hardware and software characteristics of the high-content imaging system.
The great technological accomplishments in the field bring the promise that the future development of tools will increase the diversity of the biological questions that can be answered using HCS platforms. From the perspective of labeling techniques, for example, as more diverse and specific fluorescence probes are created to measure ions, lipids and sugars, it will be possible to follow more cellular variables than previously possible. From a hardware point of view, it is appealing to speculate that high-resolution microscopes will also be adjusted to high-throughput experimental systems in the future. From the perspective of the available software, more sophisticated modes of data storage, sharing and analysis are being produced daily, which indicates that a vast number of analysis opportunities still exists.
The ability to specifically label almost any cell component and visually follow it spatially and temporally holds great potential. Indeed, HCS can be applied to answer a wide variety of biological questions ranging from purely basic science to pharmacological research, as we have discussed extensively in this Commentary. Most importantly, the real goal of any HCS is to generate hypotheses that can, and should, be followed up using low-throughput hypothesis-driven approaches. Only such in-depth explorations of the mechanisms of the cellular and organismal functions will ensure that HCS efforts indeed become a powerful tool for generating high-input, high-throughput and outstanding output biology.
The authors wish to deeply thank Benjamin Geiger for his profound insights in this field and for fruitful discussions. We wish to thank Tzvi Kam, Galit Cohen, Silvia Gabriela Chuartzman, Tslil Ast, Michal Breker, Ido Yoffe and Oren Schuldiner for critical reading of the manuscript.
Our work is supported by an EU Marie Curie FP7 re-integration grant [grant number 239224].