‘Omics’ methods, such as transcriptomics, proteomics, lipidomics or metabolomics, yield simultaneous measurements of many related molecules in a sample. These approaches have opened new opportunities to generate and test hypotheses about the mechanisms underlying biochemical and physiological phenotypes. In this Commentary, we discuss general approaches and considerations for successfully integrating omics into comparative physiology. The choice of omics approach will be guided by the availability of existing resources and the time scale of the process being studied. We discuss the use of whole-organism extracts (common in omics experiments on small invertebrates) because such an approach may mask underlying physiological mechanisms, and we consider the advantages and disadvantages of pooling samples within biological replicates. These methods can bring analytical challenges, so we describe the most easily analyzed omics experimental designs. We address the propensity of omics studies to digress into ‘fishing expeditions’ and show how omics can be used within the hypothetico-deductive framework. With this Commentary, we hope to provide a roadmap that will help newcomers approach omics in comparative physiology while avoiding some of the potential pitfalls, which include ambiguous experiments, long lists of candidate molecules and vague conclusions.
A goal of comparative physiology is to understand the diverse mechanisms that allow animals to function and survive in their environments. Comparative physiologists (1) inquire vertically across levels of biological organization, (2) connect processes horizontally among organisms within ecosystems (e.g. host–pathogen interactions), and (3) aim to understand how physiological systems adapt in response to changing environments (see Mykles et al., 2010; Somero et al., 2017). A decade (or so) ago, there was a disconnect between the tools available for the non-model organisms favored by the Krogh principle, and the burden of proof that could be shouldered by transferring those questions to established model organisms (Dow, 2007). Now, high-throughput omics is routine in non-model species, genetic manipulation via RNA interference (RNAi) is almost de rigueur in some taxa (e.g. Howlett et al., 2012; Huang et al., 2018; Maori et al., 2019), CRISPR/Cas9 allows genomic manipulation in non-models (e.g. Gui et al., 2020) and genetic models can be established (relatively) quickly (for discussion, see Matthews and Vosshall, 2020). However, the core use of omics (largely transcriptomics, metabolomics and proteomics; see Glossary) in comparative physiology remains exploratory, identifying candidate molecules and processes – and thus allowing researchers to generate hypotheses – rather than revealing the mechanisms that underlie physiological phenomena. We argue that the latter should drive the questions of ‘when’ and ‘why’ to use omics in comparative physiology.
In this Commentary, we draw on our experiences – and especially our own mistakes – to build a road map for incorporating omics into comparative physiology. We do not address the strengths, weaknesses or ‘promise’ of specific methods or technologies – this is covered and criticized elsewhere (e.g. Dow, 2007; Karahalil, 2016; Madr et al., 2017; Suarez and Moyes, 2012), and rapid changes in technology would render our thoughts obsolete before publication. Nor do we presume to preach to the many comparative physiologists already making sterling use of omics. Rather, we aim to provide an entry point for comparative physiologists who are considering incorporating omics into their studies for the first time. We hope that this Commentary will be useful for designing and interpreting experiments, and for tempering expectations: if there is anything we have learned as a discipline, it is that omics seldom yields clear answers. Because it is our area of focus, we center our examples around invertebrate thermal biology, but the principles presented here should be applicable across comparative physiology. Opinions and recommendations are, of course, our own.
What are ‘omics’?
For the purposes of this Commentary, we define omics as any technique that identifies and quantifies a suite of sub-cellular molecules in a high-throughput fashion (Raghavachari, 2011). Although technology has increased the sophistication of these methods, comparative physiologists have long addressed questions using multivariate and integrative techniques. For example, understanding metabolic flux requires the simultaneous measurement of the activity of many enzymes (Darveau et al., 2005; Driedzic and Hochachka, 1976; Fernández et al., 2011) and understanding changes in membrane dynamics during thermal stress requires the characterization of membrane fluidity, lipid composition and activity of membrane-associated proteins (Biederman et al., 2019). Thus, these multi-molecule approaches have a longer history in the field than their recent categorization as ‘omics’.
Omics approaches vary in their functional significance. For example, genomics (see Glossary) can address population-level structure (Kovach et al., 2015) and evolutionary questions (Mock et al., 2017); epigenomics can unveil regulatory mechanisms (Glastad et al., 2019) and transgenerational plasticity (Ho and Burggren, 2010); and microbiomics (usually microbial metagenomics, e.g. Ren et al., 2020) can reveal how microbes influence the phenotypes of their hosts (e.g. Moriano-Gutierrez et al., 2019). However, comparative physiologists tend to focus on functional omics such as transcriptomics, proteomics, lipidomics and metabolomics (Fig. 1), because they most directly connect to organismal and cellular function. Many of these omics tools are available off-the-shelf from private companies or central platforms, most of which will perform some level of analysis (for a fee, of course). We caution that commercial (usually biomedical) laboratories seldom have the resources to optimize their protocols for non-model systems; consequently, the data they return can be unsatisfactory or unusable. Identifying a collaborator with a vested interest in the question or organism is typically more effective, especially for metabolomics and proteomics. Nevertheless, generating some level of omics data for almost any question about an organism is now relatively affordable and accessible, whether it is a traditional model system or not.
Glossary
Gene co-expression network
An undirected graph that shows which genes or gene products have correlated expression profiles. Co-expressed genes are of biological interest because they likely indicate shared regulatory control or functional relationships.
De novo transcriptome assembly
The assembly of short sequences into a transcriptomic sequence without the use of a reference genome.
Differential expression analysis
A method of taking RNA sequencing data and conducting a statistical analysis to evaluate quantitative changes in expression between experimental groups.
Gene ontology (GO)
A system of classification used to characterize gene and gene-product functions across species under a unified vocabulary. This database is often used to categorize groups of genes with similar function during omics analyses. See http://geneontology.org/.
Genomics
The study of the structure and function of all the genetic information encoded in an organism's DNA.
Genome-guided transcriptome assembly
A method of transcriptome assembly that relies on existing genomic DNA sequence to align and assemble RNA sequencing reads.
KEGG (Kyoto Encyclopedia of Genes and Genomes)
A bioinformatics database linking genomic information to higher-level cellular processes such as metabolism, cell cycling and signal transduction. See https://www.genome.jp/kegg/.
Metabolomics
The characterization of a large number of metabolites in a sample. Untargeted metabolomics is the global analysis of all molecules of a similar type (e.g. polyamines) independent of known identities. Targeted metabolomics, in contrast, quantifies a suite of molecules that have already been identified.
Orthology-based approaches
The identification and functional characterization of genes and gene products based on sequence similarity and assumed functional conservation among species.
Phosphoproteomics
A subcategory of proteomics that identifies changes in the phosphorylation state of proteins.
Proteomics
Identification and quantification of the entire set of proteins that is produced or modified by an organism.
Reference genome
A template nucleic acid database that contains all or most of the genomic DNA sequence of an organism.
RNA sequencing (RNA-seq)
A high-throughput sequencing technique used to identify and quantify all expressed genes (i.e. mRNA) of an organism at a given point in time.
Transcriptomics
The study of the complete set of RNA molecules transcribed by an organism using high-throughput methods such as RNA-seq and microarray.
A typical omics workflow, such as a study using RNA-seq (see Glossary), would start with (1) a simple experimental design that will facilitate pair-wise comparisons in the analysis; (2) extraction of the relevant molecules from a sample; (3) data processing [for RNA-seq, this might include genome-guided transcriptome assembly (see Glossary)]; (4) analysis [for RNA-seq, this might include differential expression analysis (see Glossary) to identify transcript expression that varies in response to a treatment]; and (5) data visualization and interpretation (often with the help of functional databases or pathway-mapping tools). These high-throughput techniques can yield long lists of ‘candidate’ molecules or pathways, especially when the experiments are intended to be hypothesis generating. For this reason, omics studies have a reputation as ‘fishing expeditions’, and it can be difficult to publish omics-only studies unaccompanied by additional experiments. How, then, do we effectively use omics in comparative physiology? We argue that a successful omics study in comparative physiology has (1) an experimental design that facilitates inquiry at and across omics-levels, (2) sophisticated interpretation of results that moves past the generation of ‘candidates’, and (3) an emphasis on hypothesis evaluation and/or explicit hypothesis testing.
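To make the analysis step (4) of this workflow concrete, the following is a minimal sketch of a pairwise comparison for two transcripts, assuming counts that have already been normalized. The function names, the pseudocount and the toy counts are all hypothetical, and the sketch stops at an effect size and a test statistic: it omits the dispersion modelling, P-values and multiple-testing correction that dedicated differential expression packages provide.

```python
from math import log2, sqrt
from statistics import fmean, variance


def log2_fold_change(control, treated, pseudocount=1.0):
    """Effect size: log2 ratio of mean treated to mean control abundance."""
    return log2((fmean(treated) + pseudocount) / (fmean(control) + pseudocount))


def welch_t(control, treated):
    """Welch's t statistic on log2(count + 1) values.

    Needs at least two replicates per group and non-zero variance overall.
    """
    c = [log2(v + 1.0) for v in control]
    t = [log2(v + 1.0) for v in treated]
    standard_error = sqrt(variance(c) / len(c) + variance(t) / len(t))
    return (fmean(t) - fmean(c)) / standard_error


# Hypothetical normalized counts: (control replicates, treated replicates).
counts = {
    "transcript_1": ([90, 100, 110], [380, 400, 420]),
    "transcript_2": ([200, 210, 190], [205, 195, 200]),
}

# Rank transcripts by the magnitude of the test statistic.
ranked = sorted(
    ((name, log2_fold_change(c, t), welch_t(c, t)) for name, (c, t) in counts.items()),
    key=lambda row: abs(row[2]),
    reverse=True,
)
for name, lfc, t_stat in ranked:
    print(f"{name}: log2FC={lfc:+.2f}, Welch t={t_stat:+.2f}")
```

With three replicates per group, as here, the test statistic rests on very few degrees of freedom, which is one reason the ranked output is better read as a list of candidates than as a set of conclusions.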
Designing experiments with omics in mind
Elegant experimental design remains a cornerstone of comparative physiology. Omics is now readily affordable, such that cost is increasingly unlikely to constrain experimental design. However, there are practical and design-related decisions that must be made before embarking on an omics experiment in comparative physiology. Here, we break down some of these important decisions.
Availability of existing omics resources
Some non-model systems already have omics tools available. The existence of reference genomes (see Glossary), transcriptomes or metabolite databases diminishes some of the challenges that come with starting from scratch. However, although a high-quality reference genome or transcriptome will simplify RNA-seq workflow, published genomes vary widely in their completeness and quality (for discussions see: Hanschen and Starkenburg, 2020; Mardis et al., 2002; Seppey et al., 2019). If a reference genome is unavailable, de novo transcriptome assembly (see Glossary) is necessary, but relatively straightforward (e.g. Duan et al., 2015; Poelchau et al., 2011; Toxopeus et al., 2018). However, there are drawbacks associated with incomplete (i.e. draft-level) reference genomes and de novo-assembled reference transcriptomes: it is much more difficult to detect the involvement of regulatory elements such as promoters or alternative splicing events (Conesa et al., 2016), and some epigenomics techniques may be of little value without a high-quality reference genome (Cazaly et al., 2019).
Other omics techniques, such as metabolomics and proteomics, also benefit from established resources. Accurate identification of metabolites and proteins (as opposed to metabolite or protein ‘features’, which may simply be peaks in a spectrogram) is critical for informed analysis and interpretation of results (Calvete, 2014; Pinu et al., 2019). Model systems benefit from sequenced genomes and transcriptomes (which facilitate protein identification; e.g. Modahl et al., 2018), and established metabolite libraries and databases (e.g. MetaboLights, a cross-species database that covers metabolite structure and reference spectra; Haug et al., 2020). For non-models, metabolite libraries may need to be developed in-house and protein identification must rely on orthology-based approaches (see Glossary), which can limit conclusions. For example, MacMillan et al. (2016) were limited to 34 identified metabolites in Drosophila melanogaster because of a small reference panel of metabolites associated with their untargeted metabolomics, whereas Hariharan et al. (2014) identified over 1000 metabolites in the same species with a larger reference library. Although a small reference panel decreases inferential power in untargeted metabolomics studies, if the hypothesis being tested predicts specific patterns of response, then identifying all the metabolites is less important than identifying large-scale patterns and the key drivers of those patterns; alternatively, a small reference panel of highly relevant metabolites will suffice. Characterizing the features in proteomics often relies on matching peptide fragments to a reference genome (Yang et al., 2019). In the absence of high-quality reference information, this process depends on orthologous relationships with existing resources, which rapidly becomes less precise as the references are less closely related to the study species (Armengaud, 2016).
Time scales of biological processes influence the choice of omics
Metabolic and gene regulatory networks drive cellular-level responses to stimuli on distinct time scales (Shamir et al., 2016). RNA-seq is widely used by comparative physiologists because it is affordable and sensitive, and turn-key analyses are available. However, biological processes occur over a range of timescales. In eukaryotes, transcriptomic or proteomic responses (requiring ∼5–10 min or tens of minutes, respectively) are too slow to regulate, for example, metabolic flux, which requires control via allosteric interaction and post-translational modification (Milo and Phillips, 2015). In ectotherms, these processes will also be temperature dependent. We suggest that omics choice should be governed by the timescale of the process: biological processes that take minutes might require, for example, phosphoproteomics (see Glossary) or metabolomics, those on the order of tens of minutes to hours are well-suited to transcriptomics and proteomics, generational-scale phenomena might be best addressed in the epigenome, and longer-term (evolutionary) processes might require input from the genome (Fig. 1).
As an example in insects, a brief (minutes) exposure to mild low temperatures increases tolerance to more extreme cold exposures, a process called rapid cold-hardening (RCH; Teets et al., 2020). However, transcriptomic signatures of RCH have been elusive: for example, Zhang et al. (2011) identified only 20 genes upregulated after an RCH-like treatment in D. melanogaster. In retrospect, this is unsurprising. Rapid cold-hardening can be induced in as little as 10 min, a timeframe that is unlikely to be influenced heavily by transcriptional regulation – in fact, RCH does not even require protein synthesis (Misener et al., 2001). However, such rapid changes fall within the timeframe captured by metabolomics and phosphoproteomics, which revealed gluconeogenesis, cryoprotectant synthesis and phosphorylation of cytoskeletal and stress response proteins during RCH in Sarcophaga bullata (Teets and Denlinger, 2016; Teets et al., 2012). By contrast, low-temperature acclimation over 6 days also improves D. melanogaster cold tolerance, but these physiological changes are accompanied by differential expression across one-third of the transcriptome (MacMillan et al., 2016). Thus, transcriptomic-level inquiry, which is a common first step into omics studies, may not be appropriate for all physiological processes.
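The time-scale rule of thumb in this section can be written down as a small lookup, shown here as a sketch rather than a prescription. The function name, the 20 min cut-off between 'minutes' and 'tens of minutes', the 100-generation boundary for 'evolutionary' processes and the 10-day generation time are all assumptions made for illustration; none of these numbers comes from the studies cited above.

```python
MIN_PER_DAY = 24 * 60


def suggest_omics(duration_min, generation_time_min,
                  fast_cutoff_min=20.0, evolutionary_generations=100.0):
    """Map the duration of a process to the omics layers suggested in the text.

    All cut-offs are illustrative assumptions, not published thresholds.
    """
    if duration_min < fast_cutoff_min:
        # Too fast for transcription or translation to contribute much.
        return ["phosphoproteomics", "metabolomics"]
    if duration_min < generation_time_min:
        # Tens of minutes to hours (or days) within one generation.
        return ["transcriptomics", "proteomics"]
    if duration_min < evolutionary_generations * generation_time_min:
        # Generational-scale phenomena.
        return ["epigenomics"]
    # Longer-term (evolutionary) processes.
    return ["genomics"]


# Assumed generation time of roughly 10 days, for illustration only.
generation = 10 * MIN_PER_DAY

print(suggest_omics(10, generation))               # a 10 min hardening treatment
print(suggest_omics(6 * MIN_PER_DAY, generation))  # a 6 day acclimation
```

In ectotherms the cut-offs would themselves shift with temperature, so any real use of such a lookup would need thresholds chosen for the study system.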
Choice of sample material could restrict inference
Animals integrate and partition physiological processes among tissues and cells. Tissue-level resolution in omics is especially important when tissues within a system respond differently to a given stimulus. In insects, for example, water balance is governed by the balance of excretion by the Malpighian tubules and absorption by the hindgut (Nation, 2015). Increased expression or activity of sodium pumps would thus increase active transport in the hindgut but decrease transport across Malpighian tubules (Des Marteaux et al., 2018a), but the antagonistic responses would be masked in a sample that includes both tissues.
While vertebrate biologists have almost always used specific tissues (e.g. the liver, brain or gills; Akashi et al., 2016; Windisch et al., 2014; Zhang et al., 2015), invertebrate comparative physiologists often homogenize entire animals (e.g. Deng et al., 2018; Des Marteaux et al., 2019; Robert et al., 2016; Torson et al., 2017; Zhang et al., 2011). This whole-animal approach is forgivable for very small animals, fast physiological processes (i.e. those that proceed more quickly than the time needed to dissect out tissues) or large experiments, and has been used to successfully generate testable hypotheses (e.g. Gleason and Burton, 2015; MacMillan et al., 2016; Meyer et al., 2011; Poupardin et al., 2015; Teets et al., 2012; Torson et al., 2015). However, we argue that, at best, using whole animals limits the capacity to draw inferences about sub-organismal processes, and at worst, may obscure some processes entirely. When whole-animal sampling is unavoidable, additional care can be taken during data processing. For example, bacterial, fungal and plant contamination included with the gut tissue can be identified through matches to databases and tools such as DeconSeq (Sangiovanni et al., 2019; Schmieder and Edwards, 2011). It is possible to extend this argument further to include among-cell variation within a tissue (e.g. there are distinct cell types in the Malpighian tubules; Halberg et al., 2015), and techniques are available for single-cell omics (e.g. scRNA-seq; Kolodziejczyk et al., 2015), should these be necessary to answer a specific question. However, such single-cell approaches are technically challenging and likely impractical in many non-model systems. As a result, one needs to choose an appropriate tissue or cell type when designing omics experiments. These decisions require some working knowledge of the physiological system in question and careful logistical considerations related to optimizing sample preparations.
Even if the fingerprint of the hypothesized response is observable in whole-organism homogenates, a tissue-specific approach will increase signal and reduce noise. To keep results repeatable, the sample collection also needs to be repeatable. For both single-cell and tissue-specific studies, this might require a formal detailed protocol in the methods, or even stand-alone methods descriptions (e.g. Torson et al., 2020).
More replicates are better, but pooling can reduce costs without losing (too much) signal
Balancing the difficulty of obtaining samples against the expected among-individual variation in a response defines the acceptable sample size for any experiment in comparative physiology (McNab, 2003). Three biological replicates are currently acceptable for comparative physiology omics experiments (Fig. 2A), but this is really an analytical minimum rather than an example of best practice for either statistical or inferential power. The cost of omics has been falling (e.g. genome sequencing; Wetterstrand, 2019), allowing larger sample sizes. Larger samples make outliers easier to identify and, therefore, increase accuracy (Fig. 2B). When performing omics studies, one must consider the relative importance of among-individual versus among-treatment variation. In studies that focus on differences between treatments, small variations in the age or prior experience of individuals (especially if they were field-collected) are noise. By contrast, if differences among individuals are paramount (e.g. where there are precise phenotypic measurements for each individual), then the variation among individuals is signal. The best way to improve the signal-to-noise ratio and statistical power is to increase replication (Fig. 2C). However, pooling (Fig. 2D) can mask inter-individual variation, increasing the signal-to-noise ratio when comparing treatments at the cost of decreased statistical power and the capacity to detect individual differences. Thus, if individual variation is not a primary research objective then pooling is both cost-effective and improves the capacity to detect a treatment effect.
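The effect of pooling on among-replicate noise can be illustrated with a short simulation. This is a sketch under simplifying assumptions that are ours, not the cited authors': every individual contributes equally to its pool, individual values are independent and normally distributed, and technical noise is added once per replicate. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_replicates(n_replicates, pool_size, individual_sd, technical_sd,
                        true_mean, rng):
    """Each replicate is the mean of `pool_size` individuals plus technical noise."""
    values = []
    for _ in range(n_replicates):
        individuals = [rng.gauss(true_mean, individual_sd) for _ in range(pool_size)]
        values.append(statistics.fmean(individuals) + rng.gauss(0.0, technical_sd))
    return values


def expected_replicate_sd(pool_size, individual_sd, technical_sd):
    """Among-replicate SD expected under the assumptions stated above."""
    return math.sqrt(individual_sd ** 2 / pool_size + technical_sd ** 2)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
for pool_size in (1, 5, 10):
    replicates = simulate_replicates(2000, pool_size, 1.0, 0.1, 10.0, rng)
    print(pool_size,
          round(statistics.stdev(replicates), 3),
          round(expected_replicate_sd(pool_size, 1.0, 0.1), 3))
```

Under these assumptions the biological component of the among-replicate variance shrinks in proportion to pool size, whereas the technical component does not. Pooling therefore sharpens treatment comparisons but cannot recover information about individuals.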
Although omics is getting cheaper, cost is still important, especially when the cost is per sample. For example, metabolomics usually requires sequential sample processing through chromatography/mass-spectrometry instruments. In these cases, the cost of replication scales linearly with sample size. In other technologies, such as RNA-seq, analyses are parallel, and many replicates can be sequenced on a single ‘lane’ or ‘cell’ (at the expense of number of reads per sample, i.e. sequencing depth). In this case, there is still a per-replicate cost for sample preparation, but adding more samples to a sequencing run bears no additional cost: the ability to detect differentially expressed genes plateaus after about 10 million reads (Liu et al., 2014), but, of course, this plateau increases if one is interested in isoform-level resolution. Thus, adding biological replicates at the cost of sequencing depth returns the greatest statistical power per unit cost. If there is a per-replicate cost to acquiring omics data, then pooling individuals within biological replicates can increase the signal-to-noise ratio with a small sample size, as discussed above.
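The trade-off between replication and sequencing depth is simple arithmetic, sketched below. The depth floor of 10 million reads per sample follows the plateau reported by Liu et al. (2014); the lane output of 400 million reads is a hypothetical figure chosen only to make the numbers concrete, and real instruments and read-loss rates will differ.

```python
def reads_per_sample(lane_reads, n_samples):
    """Average depth per sample when a lane is shared equally among samples."""
    return lane_reads // n_samples


def max_samples_per_lane(lane_reads, min_reads_per_sample=10_000_000):
    """Largest number of samples that keeps every sample at or above the depth floor."""
    return lane_reads // min_reads_per_sample


LANE_READS = 400_000_000  # hypothetical lane output, for illustration only

print(max_samples_per_lane(LANE_READS))   # samples that fit at the 10 million floor
print(reads_per_sample(LANE_READS, 24))   # depth if 24 samples share the lane
```

A study that needs isoform-level resolution would raise `min_reads_per_sample`, which lowers the number of replicates that a single lane can carry.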
Simple experiments are still best
Analyzing omics output from complex experiments is not trivial. Most omics analysis pipelines such as the Tuxedo Suite for RNA-seq (Trapnell et al., 2013, 2012) and MetaboAnalyst for metabolomics (Chong et al., 2019) are optimized for pairwise comparisons. Experimental designs comparing two treatments or multiple treatments back to a common control (Fig. 3A) are easily analyzed under this framework. Pairwise tools are also appropriate for more complex designs that compare controls and treatments among tissues or populations (Fig. 3B; e.g. Des Marteaux et al., 2017). However, a typical ‘simple’ comparative physiology experiment might be a 2×2 design examining the interaction of two treatments (Fig. 3C). Ordinarily, we would analyze such data using some version of a two-way ANOVA. Pairwise comparisons allow the dissection of main effects, but not the interactions (which may, in fact, be the purpose of the experiment). Such designs can be analyzed (for RNA-seq, in any case) using generalized linear models, which are more complex than the turn-key analyses provided in many packages (McCarthy et al., 2012). Pairwise analyses are also ill-suited to time series data (Fig. 3D), since dividing a time series analysis into discrete pair-wise comparisons fails to account for temporal autocorrelation (Spies et al., 2019). However, there are tools for clustering temporal patterns of gene expression (e.g. maSigPro; Nueda et al., 2014) that can identify sets of genes that respond similarly over time (e.g. Toxopeus et al., 2018). The tools to address these complex designs are still in the early stages of their development and are challenging for those without great familiarity with omics analyses. Thus, there are currently no well-established standards and the most-common methods for complex analyses are often out-performed by pairwise differential expression methods (summarized by Spies et al., 2019). 
Nevertheless, the proverbial statistician's cry still holds true in the face of omics: ‘plan your data analysis before you do the experiment!’
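To show why pairwise comparisons cannot stand in for an interaction term, the sketch below computes the interaction in a balanced 2×2 design as a difference of differences between cell means. In a linear model with two binary factors and their product, this quantity equals the interaction coefficient. The function name and the log-scale expression values are hypothetical, and the sketch gives a point estimate only; it is not a substitute for the generalized linear models cited above, which also model count dispersion and provide tests.

```python
from statistics import fmean


def interaction_contrast(cells):
    """Difference of differences for a 2x2 design.

    `cells` maps (a, b), with a and b each 0 or 1, to replicate values
    (e.g. log2 expression of one transcript).
    """
    means = {key: fmean(values) for key, values in cells.items()}
    effect_of_a_without_b = means[(1, 0)] - means[(0, 0)]
    effect_of_a_with_b = means[(1, 1)] - means[(0, 1)]
    return effect_of_a_with_b - effect_of_a_without_b


# Hypothetical log2 expression of one transcript; factor A could be cold
# exposure and factor B dehydration.
cells = {
    (0, 0): [5.0, 5.2, 4.8],
    (1, 0): [6.0, 6.1, 5.9],
    (0, 1): [5.1, 4.9, 5.0],
    (1, 1): [8.0, 8.2, 7.8],
}

print(round(interaction_contrast(cells), 3))
```

Each of the two inner differences is what a pairwise tool would report; the interaction is only visible when they are compared with one another.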
Interpreting omics: moving beyond the fishing expedition
A key advantage of omics is that when few or no hypotheses exist, omics should allow the generation of testable hypotheses by identifying candidate molecules and pathways underlying physiological processes. However, this approach is often derided as a ‘fishing expedition’ (Ning and Lo, 2010). It may be argued that such omics approaches fall into the ‘hypothesis formulation’ stage of the classic scientific method (Doughty and Kerkhoven, 2020). Nevertheless, because large amounts of omics data can be acquired cheaply and easily, and the analysis can be challenging, an unkind criticism might be that the quality of the hypotheses generated has not generally been high. If this is indeed the case, how do we improve the quality of omics as a hypothesis-generating tool?
Most omics analyses generate long lists of candidate molecules as their output. In some cases, these candidates are themselves direct evidence of a molecule's involvement in a physiological process. For example, high concentrations of hemolymph glycerol in emerald ash borer (Agrilus planipennis) prepupae during winter strongly suggest that this metabolite is a cryoprotectant (Crosthwaite et al., 2011). Differential expression of genes associated with a transcriptionally regulated process (e.g. caspase-8 in apoptosis; Li et al., 2014) could directly imply that pathway's involvement in a phenotype. Alas, we are rarely so lucky in omics analyses. Often the evidence is indirect and requires either more-sophisticated omics-level analyses or additional experiments.
When faced with an overwhelming degree of differential expression from RNA-seq or proteomics experiments, many authors use orthology-based approaches to categorize their candidates into functional classes (using, for example, gene ontology, GO; see Glossary) or biological pathways (e.g. using the Kyoto Encyclopedia of Genes and Genomes, KEGG; see Glossary), and test for enrichment of these functions against a reference. For example, Zhang et al. (2020) evaluated differential expression of proteins using a metabolic pathway analysis to show that fatty acid degradation increased during diapause in rice water weevil, Lissorhoptrus oryzophilus. Because some of the tools used to categorize candidates overlap, there can be integration among omics datasets: for example, pathway outputs from metabolomics and transcriptomics can be compared (e.g. Košt'ál et al., 2016). However, there are caveats to these analyses: (1) not all genes/transcripts are annotated meaningfully, and in most datasets a significant number of genes and transcripts have no annotation or known function, which introduces bias into the processes that will (or can) be identified; (2) gene products could serve purposes that have not yet been functionally annotated (e.g. moonlighting proteins; Jeffery, 2014), which could lead to inaccurate conclusions about the involvement of a process in a phenotype; (3) gene ontology and pathway analyses rely heavily on the assumption of homologous relationships in gene-product function across taxa, which could allow novel functions or processes to be overlooked; and (4) taxonomic coverage is limited in most databases and heavily biased towards biomedical models, which means omics results from the more weird and wonderful Krogh models may miss key processes or functions (Gaudet et al., 2017).
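The enrichment tests mentioned above are commonly framed as an over-representation question, which can be sketched with a one-sided hypergeometric test. The function name and all gene counts below are hypothetical, and the sketch ignores multiple-testing correction and the dependence among nested functional categories that dedicated enrichment tools deal with.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_candidates, n_overlap):
    """P(X >= n_overlap): chance of seeing at least this many category members
    among the candidates if candidates were drawn at random from the background.
    """
    upper = min(n_annotated, n_candidates)
    total = comb(n_background, n_candidates)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_candidates - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 8000 annotated genes in the background, 200 of them in
# the category of interest, 400 differentially expressed candidates, 25 of
# which fall in the category (10 would be expected by chance).
p_value = hypergeom_enrichment_p(8000, 200, 400, 25)
print(f"{p_value:.2e}")
```

The first caveat in the list above shows up directly in `n_background`: genes without annotation cannot appear in any category, so the choice of background determines which processes can be detected at all.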
One way to avoid the bias inherent in these functional analyses is to use clustering and gene co-expression network analyses (see Glossary) to identify associated groups of molecules or processes without a priori expectations about their function or pathway (Horvath, 2011; Scaria et al., 2016; Wu et al., 2003). For example, Riddell et al. (2019) used a co-expression network analysis to identify functional modules of gene expression and show that plasticity in the desiccation tolerance of a montane salamander correlates with suites of genes involved in blood vessel development. Thus, clustering and co-expression analyses can contribute directly to the generation of testable hypotheses about novel gene function, which can then be followed up with functional experiments. The implementation and interpretation of the results generated from these more advanced analytical tools requires a robust bioinformatics skill set.
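At its simplest, a co-expression network links genes whose expression profiles are strongly correlated across samples, without reference to annotation. The sketch below builds such a network with Pearson correlation and a hard threshold. The gene names, expression values and the 0.9 cut-off are hypothetical, and dedicated network packages use more elaborate weighting and module-detection schemes than this.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (each must vary)."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene_a, gene_b, r) for every pair with |r| at or above the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 1.0],
}

for gene_a, gene_b, r in coexpression_edges(expression):
    print(gene_a, gene_b, round(r, 2))
```

Because the edges are undirected and use the absolute correlation, a negatively correlated pair is linked just like a positively correlated one; whether that is desirable depends on the hypothesis.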
Omics in a hypothetico-deductive framework
Designing and testing elegant, efficient and discriminatory hypotheses is the epitome of experimental science (Platt, 1964). Although the emergence of omics led to claims that this moved science to a post-hypothesis world (e.g. Ning and Lo, 2010), most omics still require follow-up ‘small science’ hypothesis-driven studies (cf. Alberts, 2012) to draw conclusions. What, then, is the role of omics in hypothesis-driven comparative physiology, and what opportunities for testing hypotheses do omics technologies afford?
Increasing the sophistication of omics-level analyses will increase the quality of the resulting hypotheses, but if an omics study generates hypotheses, reviewers and editors will probably expect the authors to test those hypotheses. We note that the hypothesis testing need not take place at the same level of organization – indeed, to do so would be overwhelming if there are tens or hundreds of potentially interacting candidate molecules, and there are limited (or no) genetic tools for the organism in question. However, omics-derived hypotheses can be readily tested using the tools with which physiologists are comfortable (Table 1). For example, transcriptomics suggested that cold-acclimated crickets remodel their hindgut cytoskeleton (Des Marteaux et al., 2017), which was then tested using microscopy (Des Marteaux et al., 2018b). Similarly, Whitehead et al. (2012) used transcriptomics to identify aryl-hydrocarbon receptor pathways upregulated after crude oil exposure in killifish and evaluated this response with immunohistochemical measurements of gill cytochrome p450 A1 protein. Thus, in these situations, omics are contributing directly to the integrative goals of the comparative physiology program, but the omics are used to generate, rather than test, the hypotheses (Table 1).
One approach to the use of omics in hypothesis-driven comparative physiology is to use omics data to evaluate a priori hypotheses. For example, MacMillan et al. (2016) articulated six hypotheses associated with the mechanisms underlying insect cold acclimation, and used a combination of metabolomics and transcriptomics to support four of those hypotheses. One of the other hypotheses concerned phospholipid properties (a potential mechanism that is likely to be neither transcriptionally regulated nor identifiable using a metabolomics approach that focused on polar metabolites). The remaining hypothesis was on the regulation of apoptosis: this process is transcriptionally regulated (Franzetti et al., 2012) and should produce a metabolomic signature, so a lack of evidence in the omics datasets suggests that this hypothesis is not well supported. We refer to this approach as ‘hypothesis evaluation’, rather than testing, because an omics approach will rarely provide a direct and definitive test of a physiological hypothesis. To be most effective, we advocate the systematic compilation of hypotheses to yield a priori expectations about candidate molecules and processes. This approach can then facilitate the evaluation of multiple (and possibly competing) hypotheses based on predicted omics-level responses under the strong-inference paradigm (cf. Platt, 1964). Analysis and interpretation can then be two-pronged: (1) hypothesis evaluation and (2) hypothesis generation based on the dataset. Explicit hypothesis evaluation can be applied to data analysis and interpretation after data collection, making it possible to reconcile otherwise overwhelming ‘fishing expedition’ data sets with a more recognizable hypothetico-deductive approach.
Can omics be used to directly test a hypothesis? The short answer to this question is ‘yes’, but designing and interpreting that hypothesis is more challenging. Omics methods should lend themselves well to testing hypotheses that predict patterns of responses. For example, Sinclair et al. (2013) hypothesized that insects have similar cellular responses to cold and dehydration stress (i.e. that they display cross tolerance). This hypothesis yields clear predictions of shared patterns of omics responses to the two stressors, which can be readily evaluated (Fig. 4). Similarly, Harada and Burton (2020) used a transcriptomic approach to test the hypothesis that the heat shock response in the intertidal copepod Tigriopus californicus is regulated by heat shock transcription factor-1 (HSF-1). After exposure to a heat shock, HSF-1 trimers bind to highly conserved sequences in the genome (heat shock elements; HSEs) to promote the expression of genes involved in the heat shock response (Vihervaara and Sistonen, 2014). Thus, this hypothesis predicts that knocking down HSF-1 prior to heat stress should result in a decreased expression of genes with HSEs in their promoter regions and should ultimately hinder the heat shock response. Knocking down HSF-1 using RNAi resulted in decreased survival after heat stress but did not alter the expression of any of the approximately 400 genes in T. californicus containing at least one HSE in their promoter region, not supporting the original hypothesis.
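A hypothesis that predicts shared patterns of response, such as the cross-tolerance example above, can be turned into a quantity to compute before the data arrive. The sketch below scores how often molecules that respond to both stressors change in the same direction. The function name, the fold-change threshold and the toy values are hypothetical, and this is one possible summary rather than the analysis used in any of the studies cited.

```python
def direction_concordance(lfc_first, lfc_second, min_abs_lfc=1.0):
    """Fraction of molecules responding to both stressors that change in the
    same direction. Inputs map molecule name to log2 fold change.

    Returns None when no molecule passes the threshold under both stressors.
    """
    shared = [
        name for name in lfc_first
        if name in lfc_second
        and abs(lfc_first[name]) >= min_abs_lfc
        and abs(lfc_second[name]) >= min_abs_lfc
    ]
    if not shared:
        return None
    same_direction = sum(
        1 for name in shared if (lfc_first[name] > 0) == (lfc_second[name] > 0)
    )
    return same_direction / len(shared)


# Hypothetical log2 fold changes after cold and after dehydration.
cold = {"m1": 2.1, "m2": -1.5, "m3": 1.2, "m4": 0.2, "m5": -2.0}
dehydration = {"m1": 1.8, "m2": -1.1, "m3": -1.4, "m4": 3.0, "m5": -1.6}

print(direction_concordance(cold, dehydration))
```

A concordance near 1 would be consistent with shared responses and a value near 0.5 with independent ones, but what counts as support should be fixed before the experiment, in keeping with the strong-inference approach discussed earlier.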
Omics techniques are now readily accessible for comparative physiology investigations in non-model species, giving researchers new tools to study complex physiological problems. The physiological process under investigation will dictate the most appropriate omics method, the tissue sampling and replication approach, and the experimental design. However, analysis of omics experiments can be challenging, so begin with simple experimental designs. A strength of omics is the ability to generate new hypotheses, but these studies need not be fishing expeditions. The hypotheses generated by omics can be tested (often at another level of biological organization) using more conventional physiological tools, but omics data themselves can also be used to evaluate competing hypotheses. We recommend compiling these hypotheses a priori and developing clear predictions to allow them to be evaluated. With some care, omics can even be used to directly test hypotheses about physiological processes. Ultimately, we hope that this Commentary saves those who are new to omics from falling victim to their many pitfalls and inspires established comparative physiologists to integrate these tools into their research.
We are grateful to members of the Sinclair lab and three anonymous referees for their critical feedback on an earlier draft of the paper. Thanks to Andrew Whitehead, Nick Teets, Dennis Kolosov, Richelle Tanner, Andy Turko and Eric Riddell for suggestions of studies that integrate omics into hypothesis testing.
This work was supported by a Genome Canada (LSARP 10106) grant for the Biosurveillance of Alien Forest Enemies (BioSAFE) project and the National Natural Science Foundation of China (41776135).
The authors declare no competing or financial interests.