We define a quantitative relationship between the affinity with which the intestine-specific GATA factor ELT-2 binds to cis-acting regulatory motifs and the resulting transcription of asp-1, a target gene representative of genes involved in Caenorhabditis elegans intestine differentiation. By establishing an experimental system that allows unknown parameters (e.g. the influence of chromatin) to effectively cancel out, we show that levels of asp-1 transcripts increase monotonically with increasing binding affinity of ELT-2 to variant promoter TGATAA sites. The shape of the response curve reveals that the product of the unbound ELT-2 concentration in vivo [i.e. (ELT-2free) or ELT-2 ‘activity’] and the largest ELT-2/XXTGATAAXX association constant (Kmax) lies between five and ten. We suggest that this (unitless) product [Kmax×(ELT-2free) or the equivalent product for any other transcription factor] provides an important quantitative descriptor of transcription-factor/regulatory-motif interaction in development, evolution and genetic disease. A more complicated model than simple binding affinity is necessary to explain the fact that ELT-2 appears to discriminate in vivo against equal-affinity binding sites that contain AGATAA instead of TGATAA.
During animal development, the transcription of a single gene by RNA Polymerase II is regulated by scores, perhaps hundreds, of different proteins (Carey et al., 2009; Workman and Abmayr, 2014; Peter and Davidson, 2015, 2016; Furlong and Levine, 2018). In this study, we focus on arguably the earliest and most instructive step in this overall process: the binding of a specific activating transcription factor to a cis-acting regulatory motif in the control region of a developmentally regulated tissue-specific target gene. We wish to understand how the affinity of interaction between this transcription factor and its binding site influences the level or rate of target gene transcription. This general problem has been approached multiple times in the past, most often in yeast or cultured cells; however, it has been surprisingly difficult to settle on an unambiguous, let alone universal, answer. Many previous studies have concluded that increased binding affinity of a transcriptional activator does indeed have a positive influence on target gene transcription (Bain et al., 2012); however, other studies have found target gene transcription to be insensitive to transcription factor affinity or even anti-correlated (Meijsing et al., 2009). Experimental limitations have included unknown levels of free (unbound) transcription factors in vivo following induction or transfection, i.e. incompletely defined binding isotherms (Bain et al., 2012). In many studies, transcription factor affinity has been only one parameter among many that determines target gene transcription levels. 
Other parameters include: (1) chromatin accessibility (Grossman et al., 2017), which, in cases in which this has been looked at more closely, can reveal a detailed interplay between transcription factor affinity and nucleosome positioning (Lam et al., 2008; Rajkumar et al., 2013); (2) nearby binding of auxiliary transcription factors (Sasse et al., 2015; Grossman et al., 2017); (3) the form of the embedding regulatory network (e.g. feed-forward loops), especially in time-dependent systems (Sasse et al., 2015); and (4) more elaborate mechanisms, such as proposed allosteric changes in transcription factor conformation dictated by a particular DNA-binding sequence (Meijsing et al., 2009; Weikum et al., 2017).
In this study, we define a quantitative relationship between transcription factor binding affinity and target gene transcript levels for a gene associated with the differentiation of a specific cell lineage within a developing multicellular animal. The Caenorhabditis elegans intestine is a clonally derived and relatively homogeneous set of cells (Sulston et al., 1983), the differentiation of which is largely controlled by a single transcriptional activator, the zinc-finger GATA factor ELT-2 (Hawkins and McGhee, 1995; McGhee et al., 2007, 2009; Dineen et al., 2018). The gene selected as an ELT-2 target is asp-1, which encodes the C. elegans intestinal-specific aspartic acid protease ASP-1 (Tcherepanova et al., 2000), the transcription of which is almost entirely dependent on ELT-2 (McGhee et al., 2009; Dineen et al., 2018). We establish an experimental system that allows unknown parameters (e.g. the influence of ‘chromatin’) to effectively cancel out, thereby allowing us to isolate the transcriptional consequences of normal physiological levels of ELT-2 binding to variable-affinity XXTGATAAXX sites in the asp-1 promoter. We show that: (1) levels of asp-1 transcripts increase monotonically with increasing binding affinity of ELT-2 to variant promoter XXTGATAAXX sites; (2) the shape of the response curve determines an important relationship between the unbound ELT-2 concentration in vivo [i.e. (ELT-2free) or ELT-2 ‘activity’] and the tightest association constant (Kmax) to a TGATAA site; and (3) ELT-2 is able to functionally discriminate in vivo against binding sites that contain AGATAA rather than TGATAA, even though the binding affinity to these two different sequences can be closely comparable, i.e. for non-TGATAA target sites, a more complicated model than simple ELT-2 binding affinity must be invoked.
Genes expressed in the differentiated C. elegans intestine are controlled by extended TGATAA sites
We found 44 examples in which experimental mutation of cis-acting sequence motifs significantly diminished the expression of particular genes in the differentiated C. elegans intestine (details and references are collected in Table S1). Fig. 1 shows the summarizing sequence logo; the predominant site was clearly a TGATAA sequence but with significant information content in the flanking two base pairs, both upstream and downstream. It is well established that TGATAA-like sites are enriched [and (A/C/G)GATAA-like sites are correspondingly depleted] in the regulatory regions of all genes transcribed in the C. elegans intestine, from embryos to adults [Pauli et al., 2006; McGhee et al., 2007, 2009; Dineen et al., 2018; Table S2 reproduces the position frequency matrix from McGhee et al. (2009)]. We have argued that these sites are primarily the direct targets of the intestine-specific GATA-type transcription factor ELT-2: ELT-2 protein binds to similar sites both in vitro (Hawkins and McGhee, 1995; Goszczynski et al., 2016; Wiesenfahrt et al., 2016) and in vivo (Mann et al., 2016; Wiesenfahrt et al., 2016). A subset of these sites is also a direct target of a second intestinal GATA-factor, ELT-7 (Dineen et al., 2018); however, ELT-7 can be removed without overt consequences (McGhee et al., 2007; Sommermann et al., 2010; Dineen et al., 2018) and our in vivo experiments were conducted in its absence.
Additionally, in the 12 base pairs upstream and 12 base pairs downstream of the XXTGATAAXX motif, the information content of these collected sites was essentially at background levels (Fig. 1), consistent with the absence of a co-factor binding in a constant and close relationship to ELT-2. Although there are certainly genes expressed in the C. elegans intestine that are co-regulated by ELT-2 and some other transcription factor (Neves et al., 2007; Sinclair and Hamza, 2010; Romney et al., 2011; Roh et al., 2015; Goszczynski et al., 2016), the relative disposition of the factors varies between different co-regulated promoters. We interpret the sequence logo data (Fig. 1) to be consistent with a model in which the isolated binding of ELT-2 by itself provides the dominant contribution to target gene activation, an important simplification for our analysis.
In vitro binding affinity of ELT-2 to the TGATAA motif is strongly influenced by flanking dinucleotides
An in vitro competitive band shift assay [electrophoretic mobility shift assay (EMSA)] was used to measure the affinity of (full-length) ELT-2 protein binding to a series of XXTGATAAXX sequences, relative to its binding affinity to the preferred sequence ACTGATAAGA (Fig. 1); this preferred sequence will turn out to have the highest affinity but such agreement is not necessary (see below). Experimental details of the binding competition are provided in the Materials and Methods section. The supplementary Materials and Methods describes how the competition data were analysed in order to produce estimates of Krel=KC/KA, i.e. the ratio of the ELT-2 binding affinity to competitor oligodeoxynucleotide C (association constant KC M−1) to the ELT-2 binding affinity to the labelled (and most tightly binding) oligodeoxynucleotide A (association constant KA M−1). Representative gel images are shown in Fig. S1; representative competition isotherms are shown in Fig. 2A; and numerical estimates of Krel for the series of XXTGATAAXX motifs used in this study are presented in Fig. 2B. The primary conclusion from this section is that alterations in two base pairs upstream and downstream of the core TGATAA motif can modulate ELT-2-binding affinity by ∼tenfold.
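The logic of a binding competition can be illustrated with a minimal sketch, which is not the analysis described in the supplementary Materials and Methods. It assumes simple 1:1 binding of protein to each oligodeoxynucleotide, no cooperativity and exact mass conservation; the function names and all concentrations used with them are hypothetical. Free protein is found by bisection, and the fraction of labelled oligodeoxynucleotide A that remains bound is then computed for a competitor C with Krel=KC/KA.

```python
def free_protein(p_total, sites):
    """Solve the mass balance for free protein concentration (M) by bisection.

    sites: list of (association_constant in 1/M, total site concentration in M).
    Assumes independent 1:1 binding to every site (an illustrative assumption).
    """
    def bound(p_free):
        # Total protein bound to all site species at a given free concentration.
        return sum(c * k * p_free / (1.0 + k * p_free) for k, c in sites)

    lo, hi = 0.0, p_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # free + bound increases monotonically with free protein.
        if mid + bound(mid) > p_total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def labelled_fraction_bound(k_a, a_total, k_rel, c_total, p_total):
    """Fraction of labelled oligo A bound in the presence of competitor C.

    k_rel is KC/KA, the relative affinity of the competitor.
    """
    k_c = k_rel * k_a
    p_free = free_protein(p_total, [(k_a, a_total), (k_c, c_total)])
    return k_a * p_free / (1.0 + k_a * p_free)
```

In this sketch, a competitor with a larger Krel displaces the labelled oligodeoxynucleotide at lower competitor concentrations, which is the property that a fit to a competition isotherm exploits.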
ELT-2 binds to TGATAA and AGATAA motifs with comparable affinity
GATA factors in vertebrates bind to a cis-regulatory motif of the general form (A/T)GATA(A/G) (Patient and McGhee, 2002). Indeed, in vitro measurements show that the residue preceding the core GATA-binding sequence of vertebrate GATA factors is an A or T with approximately equal frequency (Khan et al., 2017). In contrast, the functional motifs that regulate intestinal genes in C. elegans show much lower degeneracy (Fig. 1, Table S1) and TGATA appears to be favoured over AGATA by ∼30-fold (see also Table S2). We wished to test whether this increased specificity of the functional GATA motifs in C. elegans is imposed by the intrinsic sequence preferences of ELT-2 binding or by some other feature of the transcriptional process. Fig. 2C shows the results of a competitive EMSA experiment in which the ELT-2-binding affinity to an …ACAGATAAGA…containing double-stranded oligodeoxynucleotide is compared with that of the otherwise identical …ACTGATAAGA…containing oligodeoxynucleotide. Contrary to the implications of the data featured in Fig. 1 and Tables S1,S2, ELT-2 binds in vitro to the oligodeoxynucleotide containing the AGATAA motif with ∼45% of the affinity with which it binds to the otherwise identical TGATAA control motif. This conclusion is validated and extended by an experimental approach in which multiple degenerate double-stranded oligodeoxynucleotides are incubated with ELT-2, and the bound and unbound fractions electrophoretically separated, followed by sequencing [a low resolution implementation of the Spec-Seq procedure (Zuo and Stormo, 2014; Stormo et al., 2015)]. As explained in more detail in the supplementary Materials and Methods, we estimate that ELT-2 binds to an XAGATA sequence with 78±16% or 94±33% of the affinity that it binds to an XTGATA sequence, depending upon whether the identity of X is considered or ignored, respectively. Thus, the intrinsic in vitro sequence preference of ELT-2 appears to be similar to that of vertebrate GATA factors. 
However, we demonstrate below that the in vivo transcriptional potency of an AGATA motif is much lower than that of a TGATA motif in spite of comparable binding affinity to ELT-2.
Quantifying the influence of XXTGATAAXX motif affinity on in vivo transcription rates of a C. elegans intestine-specific gene
In order to measure the transcriptional consequences in vivo of ELT-2 binding to a particular XXTGATAAXX site (or sites) in the promoter of an intestinal gene, we developed an experimental system that we refer to, for shorthand, as SQRIPT (simultaneous quantitation of reporter transcripts). C. elegans is routinely transformed by injecting plasmids into the syncytial gonad of the adult hermaphrodite; the injected plasmids assemble into an extrachromosomal multicopy array that might contain a hundred copies (or more) of the transforming plasmids (Mello et al., 1991; Stringham et al., 1992; Meister et al., 2010), which are passed on to ∼50% of next-generation animals. Most experimental analyses of transcriptional regulation in C. elegans have been performed using these arrays; the general consensus is that genes expressed from these transgenic arrays are correctly regulated, at least to a good first approximation, and reports of misregulation are rare (Hope, 1991; Boulin et al., 2006). The properties of these multicopy transgenic arrays provide the key rationale for the SQRIPT assay: that control and test constructs can be made to differ at only a small number of base pairs (typically fewer than ten). These constructs can then be incorporated in equal stoichiometry into the arrays, such that each experimentally manipulated test construct can be compared with an unperturbed control construct in the same (ideally identical) environment. Additional features of SQRIPT will be noted once more specific properties of the assay are described.
Our current version of SQRIPT is based on the C. elegans asp-1 gene, which encodes a major intestine-specific aspartic acid protease [a homologue of cathepsin D (Tcherepanova et al., 2000)]. asp-1 transcripts are first detected in late embryogenesis, reach peak levels in mid-larval stages and then decline modestly (∼twofold) in adulthood (data from modENCODE assembled in www.wormbase.org). The asp-1 gene has no introns, is highly expressed and transcript levels are reduced 40- to 50-fold in an elt-2 null mutant (measured at the arrested L1 stage) (McGhee et al., 2009; Dineen et al., 2018). There are eight TGATAA sites distributed over 6.5 kb of upstream flanking region but for our experiments, we confined our analysis to the ∼1.4 kb immediately upstream of the ATG start codon, which has been shown previously to drive intestine-specific reporter expression (Tcherepanova et al., 2000). As shown in Fig. 3A, this region contains two TGATAA sites lying just upstream of the asp-1 transcription initiation site; ChIP-Seq experiments detect ELT-2 binding to this region in vivo, with the only significant ELT-2 peak aligning with the two TGATAA sites (Wiesenfahrt et al., 2016).
We produced two variants of the asp-1 coding region by introducing a KpnI site at different positions so that transcripts produced by the two reporters (R1 and R2) in vivo can be distinguished (Fig. 3A). Each reporter differed by one base pair from the wild-type sequence and by two base pairs from each other; the encoded proteins remained unchanged. The basic assay is shown schematically in Fig. 3B. In a typical experiment, a variant of the asp-1 1.4 kb promoter fragment (e.g. with a mutated TGATAA site) is used to drive the expression of asp-1 reporter R2; the wild-type version of the promoter is used to drive the expression of asp-1 reporter R1. Equal amounts of these two constructs, test and control, are mixed with an unc-119-rescuing plasmid (Maduro, 2015) and injected into host strain JM189 [unc-119(ed3) III; elt-7(tm840) asp-1(tm666) V; elt-4(ca16) X]. (Although the ELT-7 and ELT-4 endodermal GATA factors make little or no contribution to asp-1 transcription, respectively, incorporating the null mutations into the host strain removes the possibility that either could act through experimentally introduced variant TGATAA sites.) For each construct being tested, several independent transgenic strains are produced, propagated and harvested. Both RNA and DNA are isolated. RNA is reverse transcribed and amplified by PCR using the asp-1 primers shown in Fig. 3A; the resulting cDNA is digested with KpnI and digestion products are separated by electrophoresis; the full sequence of reactions is performed in triplicate. In order to correct for any inequality in reporter stoichiometry, R1 and R2 copies in the genomic DNA are amplified using the same primers and the relative amounts of KpnI digestion products quantified. As will be shown in the following sections, the SQRIPT assay has a dynamic range of 10- to 20-fold and a precision of ∼10% in measuring the relative transcriptional activity of any particular promoter-modified construct.
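The normalization step of the assay amounts to simple arithmetic, sketched below under stated assumptions: the inputs are band signals already converted to molar equivalents for the test and control reporters, and the function names are hypothetical rather than part of any published analysis pipeline. The test/control ratio measured in cDNA is divided by the test/control ratio measured in genomic DNA, so that unequal reporter copy number in the array does not masquerade as a transcriptional difference.

```python
import statistics


def relative_transcript_level(cdna_test, cdna_control, gdna_test, gdna_control):
    """Test/control transcript ratio corrected for reporter copy number.

    All four inputs are assumed to be molar-equivalent band signals
    (hypothetical units) from the KpnI digests of cDNA and genomic DNA.
    """
    if min(cdna_control, gdna_test, gdna_control) <= 0:
        raise ValueError("control and genomic signals must be positive")
    rna_ratio = cdna_test / cdna_control   # transcripts: test vs control
    dna_ratio = gdna_test / gdna_control   # array copies: test vs control
    return rna_ratio / dna_ratio


def summarize(replicates):
    """Mean and sample standard deviation over replicate measurements."""
    return statistics.mean(replicates), statistics.stdev(replicates)
```

For example, a cDNA ratio of 0.3 obtained from an array carrying 1.5 test copies per control copy would correspond to a corrected relative transcript level of 0.2.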
TGATAA sites act synergistically to activate asp-1 transcription in vivo
Fig. 4A shows the relative transcript levels measured when both reporters (R1 and R2) are activated by the same wild-type promoter; the relative transcript levels were measured as 1.06±0.15 (mean±s.d.), i.e. there was no significant bias in vivo between the two reporters (unpaired, two-tailed Student's t-test P>0.2). Fig. 4A also shows that the destruction of either of the two TGATAA sequences reduced reporter transcript levels to 10 to 20% of the level measured with the wild-type reporter. In other words, these two motifs are acting neither redundantly (in which case, reporter transcript levels would have remained unchanged in the single mutants) nor additively (single mutant reporter transcript levels would have been approximately half of wild-type levels), but rather the two sites appear to be acting synergistically or cooperatively. This synergy is not complete because reporter transcripts were reduced by a further 50-60% if both TGATAA sites were destroyed simultaneously (unpaired, two-tailed Student's t-test P<0.001). The synergistic/cooperative behaviour of the two asp-1 TGATAA sites was qualitatively validated using GFP as a reporter (Fig. S2). We draw the following conclusions from Fig. 4A: (1) the two asp-1 promoter TGATAA sites act largely but not completely synergistically; (2) the five GATA sites in the 1.4 kb asp-1 promoter that are not TGATAA make only minor contributions to promoter activity; and (3) independent transgenic strains produced with the same injection mixture give similar results.
Fig. 4B,C describes two further important features of the SQRIPT assay. To test whether reverse transcription of nascent RNA produced the same estimate of relative transcript levels as did reverse transcription of total RNA, nuclear run-ons were performed according to Kruesi et al. (2013) with nascent mRNA being affinity isolated based on incorporation of bromouridine. As shown in Fig. 4B, relative reporter transcript levels were not significantly different (t-test, P=0.17) whether they were measured using total or nascent RNA, suggesting that the SQRIPT assay measures differences in the rates of transcript initiation. Although an effect on transcript elongation or degradation cannot be ruled out, such an explanation would seem unlikely considering the high degree of similarity between the two transcript sequences and the equivalent results produced when reporters are interchanged. Fig. 4C shows that the relative reporter transcript levels produced by a modified asp-1 promoter changed only modestly from embryo to adult. Supporting this observation, Fig. S3 shows similar data obtained with two different asp-1 variant promoters. A practical consequence of these results is that conclusions will not be strongly influenced by imperfect age-matching of different samples from different strains.
ELT-2 affinity to the XXTGATAAXX promoter motifs controls asp-1 transcription in vivo
XXTGATAAXX sequences with known Krel (Fig. 2) were inserted into the SQRIPT reporters, such that each variant reporter had two copies of the same variant replacing the two TGATAA copies in the wild-type asp-1 promoter. Three independent transgenic strains were produced for each construct and the transcript levels of the variant reporters were measured (at the L4/young adult stage) relative to transcript levels of wild-type control reporters incorporated into the same transgenic array. Fig. 5 plots the relative transcript levels measured for a particular test promoter versus the relative ELT-2 affinity constant (Krel) measured in vitro for the TGATAA variant present (as pairs) in each promoter. The important conclusions are that: (1) transcriptional activity of a variant asp-1 promoter is highest when both XXTGATAAXX sites correspond to the strongest ELT-2 binding sequence, ACTGATAAGA; and (2) transcript levels decrease monotonically as ELT-2 affinity decreases. The shape of the ‘relative transcript levels versus Krel’ response curve has important implications for ELT-2/target gene behaviour in vivo and we therefore explored this more quantitatively.
The two parameters to be derived from the curve shapes of Fig. 5 are: (1) ymax=the maximum relative transcript level that would be obtained at ‘infinite’ ELT-2 concentrations; and (2) the unitless product Kmax×[ELT-2free], where Kmax is the absolute affinity (association constant) of ELT-2 to the most preferred sequence ACTGATAAGA and [ELT-2free] is the normal effective free concentration of ELT-2 protein in vivo (i.e. ELT-2 activity). (Kmax refers to ELT-2 affinity in vivo and may or may not be equivalent to KA used in analyzing the in vitro binding competitions described above. All we are proposing is that relative affinities of different motifs are the same in vivo and in vitro.) Minimizing the sums of the squares of the deviations of the measured data points from the trial-parameterized relationship defined above (see supplementary Materials and Methods) shows that ymax must be in the range of 1.1 to 1.5 but the fit is generally insensitive to choice. The more important conclusion is that the product Kmax×[ELT-2free] must be in the range of 10±5, i.e. clearly greater than 1. Examples of fits to the data are shown in Fig. 5 for the case of ymax=1.3 and the product Kmax×[ELT-2free] chosen as 5, 10 or 15 (solid lines). The implications of these particular parameter values will be discussed below. The dashed lines in Fig. 5 show a similar analysis for a model in which either single or double occupancy of the XXTGATAAXX sites can activate reporter transcription to the extent measured in Fig. 4A (see supplementary Materials and Methods for more details); as expected, this extension produces only a modest change in the calculated binding curve.
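To make the fitting procedure concrete, the following sketch assumes one particular functional form, which may differ from the relationship given in the supplementary Materials and Methods: complete synergy, with transcription proportional to the probability that two identical, independently binding sites are both occupied. Here `product` stands for Kmax×[ELT-2free], and the function names and parameter grids are illustrative assumptions. A grid search then returns the parameter pair that minimizes the sum of squared deviations.

```python
def occupancy(k_rel, product):
    """Single-site occupancy; product is Kmax x [ELT-2free] (unitless)."""
    x = k_rel * product
    return x / (1.0 + x)


def relative_transcript(k_rel, product, y_max):
    """Assumed response: both sites must be occupied (complete synergy)."""
    return y_max * occupancy(k_rel, product) ** 2


def sum_of_squares(data, product, y_max):
    """Sum of squared deviations for (k_rel, measured_level) pairs."""
    return sum((y - relative_transcript(k, product, y_max)) ** 2 for k, y in data)


def grid_fit(data, products, y_max_values):
    """Return (sse, product, y_max) for the best point on a parameter grid."""
    return min(
        (sum_of_squares(data, p, ym), p, ym)
        for p in products
        for ym in y_max_values
    )
```

Under this assumed form, ymax=1.3 and a product of 10 give a relative transcript level close to 1 at Krel=1, consistent with the highest-affinity variant behaving like the wild-type control.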
AGATAA sequences do not obey the relative transcript level versus Krel relation defined for TGATAA sequences
We are now in a position to resolve a potential paradox that emerges from the above analyses. As just discussed, Fig. 5 shows that the transcriptional activity of reporters controlled by XXTGATAAXX sequences is dominated by the affinity of ELT-2 for these variant motifs. Fig. 2C (and supplementary Materials and Methods) shows that ELT-2 binds to AGATA sequences with close to the same affinity that it binds to the matched TGATA sequence. Yet, the sequence logo shown in Fig. 1 indicates that functional cis-acting regulatory motifs in C. elegans intestinal promoters are strongly favoured to be TGATAA rather than AGATAA. We investigated whether an AGATAA sequence replacing a TGATAA in our transcriptional reporters would show the same transcriptional behaviour if these two sequences had the same affinity for ELT-2. We thus compared the behaviour of two different core motifs, AATGATAAGA and ACAGATAAGA, that were chosen as they have the same relative affinity for ELT-2 measured in vitro (Fig. 2). Two copies of the selected ACAGATAAGA motif were inserted into the appropriate SQRIPT reporters replacing the wild-type TGATAA motifs, transgenic animals were produced and relative transcript levels were measured. The AGATAA-dependent relative transcript levels are plotted on Fig. 5 as the red asterisks. We conclude that asp-1 promoters in which a TGATAA sequence is replaced by an AGATAA sequence with the same ELT-2 affinity do not obey the relationship between relative transcript level and relative ELT-2-binding affinity defined for TGATAA sites. In fact, the AGATAA-containing promoter approaches inactivity, whereas the promoter containing the TGATAA equal-affinity counterpart approaches maximum transcriptional activity. We conclude that there must be at least one additional layer of specificity beyond simple ELT-2 affinity that controls the transcription of intestinal genes.
Properties of the core TGATAA motifs that drive intestinal gene expression in C. elegans
Among the collection of cis-acting GATA motifs shown experimentally to influence in vivo expression of C. elegans intestinal genes (Fig. 1, Table S1), the most frequent core motif is TGATAA but with significant additional information present in the two base pairs upstream and the two base pairs downstream. Variations in these flanking dinucleotides can lead to a ∼tenfold variation in the binding affinity to ELT-2 (Fig. 2). Variations in the nucleotides or dinucleotides immediately flanking the core binding motifs of other transcription factors have also been shown to modulate interaction affinities (Levo et al., 2015; Schöne et al., 2016; Rudnizky et al., 2018).
As judged by in vivo functional assays (Fig. 1), as well as by computational identification of over-represented promoter motifs (Pauli et al., 2006; McGhee et al., 2007, 2009; Dineen et al., 2018), the most frequent decameric sequence controlling intestinal genes in C. elegans is ACTGATAAGA. This same sequence turns out to have the highest binding affinity to ELT-2 (Fig. 2) but such correspondence is not necessary; there are both bacterial and eukaryotic examples in which evolution appears to have selected lower affinity ‘sub-optimal’ transcription factor binding sites in gene promoters (Sadler et al., 1983; Crocker et al., 2015; Farley et al., 2015). We thus wished to investigate whether the degree to which a particular extended TGATAA motif (not just ACTGATAAGA) is over-represented in intestinal promoters of C. elegans reflects its binding affinity to ELT-2. Table S2 contains a position frequency matrix [PFM; reproduced from McGhee et al. (2009)] collecting over-represented sequence motifs computationally identified in the promoters of intestine-specific genes expressed in embryos, larvae and adults. Each PFM entry was converted to a log-odds ‘statistical weight’ (see Eqn 7-3 by Stormo, 2013), summed over the ten entries corresponding to the 10 bp binding sequences that had their relative ELT-2-binding affinities measured in Fig. 2A,B. This overall statistical weight was then plotted versus the logarithm of the corresponding Krel (i.e. both variables should then be proportional to a free energy). As seen in Fig. S4, the relationship is satisfyingly linear. We interpret this linearity to suggest that cis-acting TGATAA motifs regulating C. elegans intestinal transcription are selected, at least in part, on the basis of their binding affinity to ELT-2: the higher the binding affinity to ELT-2, the more likely it is that the motif is present in intestinal promoters.
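The conversion of a position frequency matrix into a summed log-odds weight can be sketched as follows. This is an illustration rather than a reproduction of the calculation behind Fig. S4: the pseudocount, the uniform background frequencies and the use of natural logarithms are assumptions, and any matrix passed to the hypothetical function `log_odds_score` would need to be supplied by the reader (Table S2 is not reproduced here).

```python
import math

BASES = "ACGT"


def log_odds_score(sequence, pfm, background=None, pseudocount=0.5):
    """Sum of per-position log-odds weights for a sequence.

    pfm: list of dicts mapping each base to its count at that position.
    background: dict of base frequencies (assumed uniform if omitted).
    pseudocount: added to every count to avoid log(0) (an assumption).
    """
    if len(sequence) != len(pfm):
        raise ValueError("sequence and PFM must have the same length")
    if background is None:
        background = {b: 0.25 for b in BASES}
    score = 0.0
    for base, column in zip(sequence.upper(), pfm):
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        freq = (column[base] + pseudocount) / total
        # Log-odds of observing this base in the motif versus background.
        score += math.log(freq / background[base])
    return score
```

Because each positional weight is a logarithm of a frequency ratio, the summed score is additive across positions in the same way that independent free-energy contributions would be, which is why it can be plotted against the logarithm of Krel.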
We note a potentially interesting feature of the endodermal promoter TGATAA sites in C. elegans: at least a subset of these sites are functional targets of ELT-7 in addition to ELT-2, and possibly, at least in the early embryo, of END-1/END-3 as well (Dineen et al., 2018). Although the current experiments were performed in the absence of ELT-7 and after END-1/3 have decayed, one could imagine that the information-rich gene-controlling sequences shown in Fig. 1 (and Table S1) represent some evolutionary or physiological compromise between different sequence preferences for the four individual endodermal GATA factors. However, we also note that C. elegans endodermal GATA factors appear to possess a remarkable degree of interchangeability; in particular, if placed under the appropriate promoters, both ELT-2 and ELT-7 can individually replace all three of the other endodermal GATA factors (Wiesenfahrt et al., 2016; Dineen et al., 2018). Plausible scenarios have been proposed to explain how this interchangeability could have arisen during evolution (Wiesenfahrt et al., 2016; Maduro, 2020).
Features of SQRIPT
The experimental system that forms the basis of the current study has several advantages over previous methods of defining the relationship between transcription factor binding affinity and target gene activity. These advantages are as follows: (1) outputs of the two reporters are measured directly as transcripts rather than reporter proteins, turnover rates for the two reporter transcripts are likely to be more similar than for two different protein reporters, and overall, the assay measures relative transcription initiation rates, not elongation rates nor transcript stabilities (Fig. 4B); (2) chromatin arrangements over the two highly similar reporter gene sequences can reasonably be expected to be similar, which might not be the case for genes expressing two different protein reporters; (3) the transcription of reporters is regulated at the normal in vivo physiological levels of free ELT-2 protein, an important feature that will be considered below; and (4) the similar treatments and environments of test and control constructs allow reliable normalization. Expanding on this last feature, we suggest that the many unknown parameters associated with the in vivo regulation of transcription, e.g. nucleosome arrangements, histone modifications, biases between in vitro and in vivo affinity measurements, etc., are likely to be the same (or highly similar) for the test and control constructs. The major rationale of the SQRIPT assay is that the effects of these unknown parameters can be assumed to ‘cancel out’, thereby allowing the role of binding affinity in gene transcription to be measured in isolation.
TGATAA motif synergy and activity throughout development
Using the quantitative SQRIPT assay, we showed that the two TGATAA motifs in the asp-1 promoter were neither redundant nor additive but acted in an almost completely synergistic or cooperative manner, i.e. ablation of either one of the two TGATAA motifs lowered reporter expression to a level similar to that observed when both motifs were ablated (Fig. 4A). One molecular mechanism that could explain such synergy is that two (simultaneous) ELT-2/TGATAA-binding events are required to displace a resident inhibitory nucleosome [e.g. Morgunova and Taipale (2017); Zhu et al. (2018)]. Consistent with such a model, the two TGATAA sites in the C. elegans asp-1 promoter are spaced 60 bp apart, well within the span of a single nucleosome core particle. Furthermore, apparently homologous pairs of TGATAA sites, spaced between 54 and 106 bp apart, can be found in asp-1 promoters from related caenorhabditid nematodes (Fig. S5). [We note that, in each of these homologous promoters, one of the TGATAA motifs (ACTGATAAGA) is the sequence that binds most tightly.] Table S1 lists several further examples of C. elegans intestinal promoters with TGATAA sites that have been reported to act synergistically, at least to some degree; the distance between these paired sites ranges from 9 to 65 bp, all well below the size of a nucleosome core. In contrast, we have described the behaviour of the major elt-2 enhancer in which four conserved TGATAA sites are spaced 186, 210 and 235 bp apart and act as if they are largely redundant (Wiesenfahrt et al., 2016). Further experiments will be required to test this cooperative nucleosome displacement model in which TGATAA sites that act synergistically are spaced less than 145 bp apart but TGATAA sites that act redundantly are spaced more than 145 bp apart.
We also used the SQRIPT assay to show that the relationship between promoter TGATAA affinity and reporter transcript levels remains approximately the same between newly hatched larvae and adults (Fig. 4C, Fig. S3). Such invariance suggests that the basic molecular mechanisms relating asp-1 transcription to ELT-2 binding are qualitatively the same in different developmental stages, arguing against a model in which gene transcription later in life adopts a ‘locked-in’ configuration in which individual transcription factors such as ELT-2 have been supplanted by, for example, a stably propagating chromatin structure.
The gene response function relating ELT-2 affinity to asp-1 transcript levels
The most important result in this study is shown in Fig. 5, namely the quantitative relationship between the relative transcript levels produced by a test promoter and the relative ELT-2 association constants (Krel) for the pair of TGATAA sites in this particular promoter. The shape of this curve has important implications for understanding the molecular mechanisms by which ELT-2 interacts with cis-acting promoter motifs to drive intestinal gene transcription. Qualitatively, the free ELT-2 levels in vivo (i.e. [ELT-2free]) cannot be so high that low-affinity sites are saturated (i.e. the curve of Fig. 5 is steep at low Krel). Likewise, the free ELT-2 levels in vivo cannot be so low that high-affinity sites are far from saturation (i.e. the curve of Fig. 5 plateaus at higher Krel). Using a simple thermodynamic model incorporating either complete or partial synergy between the paired TGATAA sites, we estimate that the product Kmax×[ELT-2free] is ∼5-10, in which Kmax is the in vivo ELT-2 association constant to the highest affinity XXTGATAAXX DNA sequence. Both parameters, Kmax (1/M) and [ELT-2free] (M), are difficult to measure individually but we suggest that the dimensionless product Kmax×[ELT-2free] is the more useful parameter to know: it provides a quantitative measure of system responsiveness and the degree to which variants in cis-acting binding sites can be expected to influence associated transcription.
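The thermodynamic argument above can be made concrete with a short numerical sketch. It assumes, for illustration only, that each TGATAA site follows a simple binding isotherm, that both sites of a test promoter carry the same variant, and that complete synergy means transcription is proportional to the probability that both sites are occupied at once. The function names are hypothetical and this is not the fitting procedure used for Fig. 5.

```python
# Illustrative sketch of a two-site thermodynamic response curve.
# Assumptions (not taken from the paper's supplementary calculations):
#   - single-site occupancy follows a simple isotherm, x / (1 + x)
#   - x = Krel * (Kmax * [ELT-2free]), with Krel = K / Kmax
#   - complete synergy: response is proportional to occupancy squared

def occupancy(k_rel, product):
    """Fractional occupancy of one site; product = Kmax * [ELT-2free]."""
    x = k_rel * product
    return x / (1.0 + x)

def relative_response(k_rel, product, synergy=True):
    """Predicted transcript level relative to the highest-affinity site (Krel = 1)."""
    exponent = 2 if synergy else 1
    return (occupancy(k_rel, product) / occupancy(1.0, product)) ** exponent

if __name__ == "__main__":
    k_values = (0.01, 0.1, 0.25, 0.5, 1.0)
    for product in (5.0, 10.0):
        curve = [round(relative_response(k, product), 3) for k in k_values]
        print(f"Kmax*[ELT-2free] = {product}: {curve}")
```

Under these assumptions, the curve rises steeply at low Krel and flattens as Krel approaches 1, which is the qualitative behaviour described for Fig. 5.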
The gene response function shown in Fig. 5 summarizes the manner in which the asp-1 promoter responds in vivo to ELT-2 interaction with the pair of TGATAA sites: binding affinity is paramount. However, ELT-2-binding affinity to a cis-acting motif is not sufficient to determine promoter response because this same relationship does not apply to a core binding site containing AGATAA. Rather, there must be at least one additional criterion applied to ELT-2-binding sites in order to explain promoter behaviour. One candidate for this additional criterion could be an allosteric change induced in ELT-2 conformation by binding to certain sequences (e.g. TGATAA) but not to other sequences (e.g. AGATAA), as suggested for glucocorticoid receptor binding (Meijsing et al., 2009; Schöne et al., 2016). Any free-energy change necessary to drive this postulated conformational change in ELT-2 would be expected to be incorporated into the overall free energy of binding to this particular sequence, and the TGATAA and AGATAA sites being compared were chosen to have equivalent overall affinities. We thus suggest that any additional criterion is more likely to be applied downstream of the initial ELT-2 binding: e.g. a complex of ELT-2 with a TGATAA site might be able to accommodate binding of a particular co-factor but a complex with an equal-affinity AGATAA site cannot. Overall, these considerations reveal the complexities of a sequence logo like that shown in Fig. 1. Within the TGATAA series of core motifs, ‘information’ reflects evolutionary selection for tightness of binding (Fig. S4). However, the sequence logo also incorporates additional information, such as the disfavouring, or essentially vetoing, of AGATAA sites. We note that binding motifs for several additional C. elegans GATA factors, both endodermal (ELT-7 but not END-1/3) and hypodermal (ELT-3, ELT-6 and EGL-18), also appear to be enriched in TGATAA sequences (Shao et al., 2013; Narasimhan et al., 2015); it will be interesting to determine whether these other GATA factors can, like ELT-2, discriminate in vivo against AGATAA sequences independently of binding affinity.
It will be important to define a similar quantitative response curve as shown in Fig. 5 for other transcription factors, both in the C. elegans intestine and elsewhere. Are all transcription factors present at in vivo free concentrations that are ‘above the dissociation constant’ for interacting with their preferred site? Or do different transcription factors operate at different effective in vivo free concentrations that result, in turn, in a different range of in vivo occupancy levels? The Fig. 5 response curve also has implications for attempts to interpret mutations in binding motifs in terms of in vivo consequences, either in the context of evolution or genetic disease. For example, genome-wide association studies regularly identify alterations in candidate transcription factor binding sites associated with human disease (Deplancke et al., 2016; Vockley et al., 2017). Even if, as is customarily assumed, the consequences of such alterations are due to changes in transcription factor binding affinity, and even if, as is often the case, changes in transcription factor binding affinity can be predicted from available position weight matrices, the practical implications for the individual or for the evolving organism are not clear. Whether there are effective changes in the degree of transcription factor occupancy of this mutated site in vivo, with concomitant changes in target gene transcription, will depend upon the free effective concentration (activity) of the particular transcription factor within the affected cells.
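The dependence of a binding-site mutation's effect on the free transcription factor activity can be illustrated with a minimal calculation. The sketch below assumes a single site obeying a simple binding isotherm; the function names and the threefold loss of affinity are hypothetical choices for illustration, not values from this study.

```python
# Illustrative sketch: how much occupancy a site loses when a mutation weakens
# its association constant, as a function of product = K * [TF_free].
# Assumes a single, independent site with a simple binding isotherm.

def occupancy(product):
    """Fractional occupancy for a dimensionless product K * [TF_free]."""
    return product / (1.0 + product)

def occupancy_loss(product, fold_weaker):
    """Fraction of the original occupancy lost when K drops fold_weaker-fold."""
    before = occupancy(product)
    after = occupancy(product / fold_weaker)
    return 1.0 - after / before

if __name__ == "__main__":
    for product in (0.1, 1.0, 10.0, 100.0):
        loss = occupancy_loss(product, fold_weaker=3.0)
        print(f"K*[TF_free] = {product}: fraction of occupancy lost = {loss:.3f}")
```

In this simple picture, the same threefold loss of affinity removes most of the occupancy when the product is well below 1 but very little when the site is close to saturation.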
MATERIALS AND METHODS
Each XXTGATAAXX variant was embedded within the same 26 bp sequence that contained the distal TGATAA site of the asp-1 promoter, followed by four unpaired C nucleotides, followed by the reverse complement of the initial 26 bp sequence, thereby enabling the formation of a double-stranded hairpin. Oligodeoxynucleotides to be used as probes were 5′-labelled with fluorescein amidite (FAM); probe sequences are presented in the supplementary Materials and Methods. Competitive binding reactions were prepared by mixing full-length ELT-2 protein (purified from baculovirus-infected insect cells) into a solution of 250 µM FAM-labelled hairpin oligonucleotide, variable amounts of unlabelled ‘competitor’ hairpin oligodeoxynucleotide (0 to 4 mM), 10 ng/µl poly(dI-dC).poly(dI-dC) and 1× binding buffer [25 mM Tris-HCl (pH 7.5), 50 mM KOAc, 20 µM MgOAc, 1 mM DTT, 1% NP40 and 100 ng/µl bovine serum albumin]. The quantity of ELT-2 protein per reaction was adjusted in order to shift ∼15% of the probe in the absence of a competitor. Binding reactions were incubated in the dark at room temperature for 20 min, then 10× loading buffer (0.25% Orange G and 20% Ficoll) was added (2 µl per 20 µl reaction) and reactions were loaded onto a 6% polyacrylamide gel prepared with 0.5× Tris Borate EDTA (TBE). Electrophoresis was performed in the dark at 100 V in 0.5× ice-cold TBE for 1 h or until the dye reached the end of the gel. Gels were imaged using a SYBR Green filter and the images were exported as unscaled 8-bit TIF files. Band intensities (shifted=bound; unshifted=free) were quantitated using ImageJ. The relative affinities of ELT-2 binding to …AGATA… and …TGATA… sequences were measured using the Spec-Seq method, closely following the protocol provided by Stormo et al. (2015). The production and sequencing of the degenerate libraries, as well as procedures used to extract relative affinities, are described more fully in the supplementary Materials and Methods.
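The band quantitation described above can be sketched as follows. This is a minimal illustration that assumes background-subtracted band intensities exported from ImageJ; the function names and the lane layout are hypothetical, and the conversion of competition curves into relative association constants (described in the supplementary Materials and Methods) is not reproduced here.

```python
# Illustrative sketch: fraction of probe bound per lane of a competition gel,
# normalized to the lane without competitor.
# Assumes intensities are already background subtracted (an assumption).

def fraction_bound(shifted, unshifted):
    """Fraction of labelled probe in the shifted (bound) band."""
    total = shifted + unshifted
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return shifted / total

def relative_binding(lanes):
    """lanes: list of (competitor_conc, shifted, unshifted) tuples.

    The first lane must be the no-competitor control. Returns a list of
    (competitor_conc, fraction bound relative to that control).
    """
    _, shifted0, unshifted0 = lanes[0]
    baseline = fraction_bound(shifted0, unshifted0)
    return [(conc, fraction_bound(s, u) / baseline) for conc, s, u in lanes]

if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units, for illustration only.
    example_lanes = [(0.0, 15.0, 85.0), (1.0, 7.5, 92.5), (4.0, 3.0, 97.0)]
    for conc, rel in relative_binding(example_lanes):
        print(f"competitor = {conc}: relative probe binding = {rel:.2f}")
```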
Production and growth of transgenic C. elegans strains
Site-directed mutagenesis was performed using overlap extension PCR (Ho et al., 1989); two successive rounds were used to mutate the two TGATAA sites in the variant asp-1 promoters. For reporters R1 and R2, respectively, base pair 795 and base pair 840 of the asp-1 coding region (with A of the ATG=1) were changed from A to T; both changes introduced a unique KpnI site without changing protein sequence. Full sequences are provided in the supplementary Materials and Methods.
Transgenic worm strains used in SQRIPT experiments were created by standard gonadal injection of strain JM189 [unc-119(ed4) III; asp-1(tm666) elt-7(tm840) V; elt-4(ca16) X] to produce extrachromosomal multicopy arrays (Mello et al., 1991). Reporter plasmids, as well as the unc-119 rescuing plasmid pDP#MM016B (Maduro, 2015), were each injected at a concentration of 50 µg/ml.
Simultaneous quantification of reporter transcripts (SQRIPT)
Transgenic populations were expanded at room temperature on nematode growth media plates (either 150 mm or 35 mm diameter) covered with a lawn of E. coli OP50. The high transmission rate of unc-119-containing transgenic arrays, combined with the increased health and fecundity of rescued animals, resulted in ∼75% of the harvested animals containing the transgenic array. Worms were washed from unstarved plates and excess bacteria removed either by filtering through Nytex screens or by repeated centrifugations. In a typical experiment, the mass of collected worms, suspended in nuclease-free water, was 100-200 mg. Nuclear run-on transcription (Fig. 4B) was performed as described by Kruesi et al. (2013) but omitting α-32P-CTP from the reaction. Compared with the standard growth procedure just described, worms collected for nuclear run-ons were expanded to ∼threefold greater population sizes and received a final wash with, and were resuspended in, ice-cold nuclear isolation buffer [250 mM sucrose, 10 mM Tris-HCl (pH 7.9), 10 mM MgCl2, 1 mM EGTA, 0.25% NP-40, 1 mM dithiothreitol, 4 U/ml RNAse inhibitor cocktail and protease inhibitors (Roche, used at 1× concentration, as specified by the manufacturer)]. One half of each sample, either worms or nuclei, was used for RNA extraction and the other half was used for DNA extraction. RNA extraction was performed using TRIzol Reagent (Thermo Fisher Scientific, 1556018) following the manufacturer's protocol with minor modifications. DNA extraction was performed by prolonged protease digestion, organic extractions and ethanol precipitation (McGhee et al., 1981).
Reverse transcriptase (RT) PCR was performed using the QuantiTect Reverse Transcription Kit (Qiagen, 205313) following the manufacturer's protocol but using a primer (oBL22, sequence in the supplementary Materials and Methods) specific to the asp-1-coding sequence. Duplicate or triplicate RT reactions were performed for each sample and each reaction contained up to 1 µg of RNA. A 6 µl aliquot of each finished RT reaction was added directly to a PCR reaction (final volume, 60 µl), together with primers oBL21 and oBL22 (sequences in the supplementary Materials and Methods) to amplify the segment from R1 and R2 reporters that contains the inserted KpnI site. The same fragment was amplified in parallel from DNA samples using the same primers. PCR products were purified by spin column and digested for 1 h at 37°C with 4 U of KpnI (up to 500 ng DNA per 10 µl digestion reaction). Digestion products were separated by electrophoresis on an Agilent D1000 ScreenTape in an Agilent 2200 TapeStation (performed by the University of Calgary Core DNA Services). Control experiments showed that a ‘promoterless’ reporter produces only a low level of background transcripts (5-8% of wild-type levels), indicating that introduced plasmids are not massively rearranged upon assembly into the transforming array and that there is minimal ‘readthrough’ transcription from adjacent plasmids incorporated into the array. This ‘no-promoter’ background rate was used to correct all subsequent measurements. Further control experiments (Fig. S6) showed that: (1) there is little PCR amplification bias between the two reporter sequences; and (2) heteroduplexes can form during PCR amplification of the two highly similar reporter sequences but their effect can be easily corrected. Figs S7-S12 illustrate the calculations used to define Krel and the product Kmax × [ELT-2free], as well as the Spec-Seq method used to define ELT-2 binding preferences.
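The normalization logic implied by this procedure can be sketched as below. The sketch assumes that the KpnI digestion fragments diagnostic for each reporter are summed into one signal per reporter, that the ratio measured from DNA is used to correct for the relative copy number of the two reporters in the array, and that the ‘no-promoter’ background is removed by simple subtraction and rescaling. These are illustrative assumptions and hypothetical function names; the calculations actually used are those shown in Figs S7-S12.

```python
# Illustrative sketch: test-reporter transcript level relative to a co-injected
# reference reporter, normalized to the DNA ratio and background corrected.
# The subtraction-based background correction is an assumption of this sketch.

def reporter_ratio(test_signal, reference_signal):
    """Ratio of test to reference reporter signal in one sample."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return test_signal / reference_signal

def relative_transcript_level(rna_test, rna_ref, dna_test, dna_ref, background=0.0):
    """RNA ratio divided by DNA ratio, then corrected for promoterless background.

    background is the promoterless level as a fraction of the reference level
    (the text reports 5-8% of wild-type levels).
    """
    raw = reporter_ratio(rna_test, rna_ref) / reporter_ratio(dna_test, dna_ref)
    return (raw - background) / (1.0 - background)

if __name__ == "__main__":
    # Hypothetical signals in arbitrary units, for illustration only.
    level = relative_transcript_level(50.0, 100.0, 100.0, 100.0, background=0.05)
    print(f"relative transcript level = {level:.3f}")
```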
The authors gratefully acknowledge the expert contribution of Barbara Goszczynski, who performed the Spec-Seq analysis described in the supplementary Materials and Methods. We also thank Dr Erin Osborne Nishimura (Colorado State University) for many helpful discussions.
Conceptualization: B.R.L., J.D.M.; Methodology: B.R.L., J.D.M.; Software: J.D.M.; Investigation: B.R.L., J.D.M.; Writing - original draft: B.R.L., J.D.M.; Writing - review & editing: B.R.L., J.D.M.; Supervision: J.D.M.; Project administration: J.D.M.; Funding acquisition: J.D.M.
This work was supported by operating grants from the Canadian Institutes of Health Research and from the Natural Sciences and Engineering Research Council of Canada (67135 and RGPIN/04133-2017, respectively, to J.D.M.). B.R.L. received salary support from the Alberta Children's Hospital Foundation.
Peer review history
The peer review history is available online at https://dev.biologists.org/lookup/doi/10.1242/dev.190330.reviewer-comments.pdf
The authors declare no competing or financial interests.