How morphogen gradients govern the pattern of gene expression in developing tissues is not well understood. Here, we describe a statistical thermodynamic model of gene regulation that combines the activity of a morphogen with the transcriptional network it controls. Using Sonic hedgehog (Shh) patterning of the ventral neural tube as an example, we show that the framework can be used together with the principled parameter selection technique of approximate Bayesian computation to obtain a dynamical model that accurately predicts tissue patterning. The analysis indicates that, for each target gene regulated by Gli, which is the transcriptional effector of Shh signalling, there is a neutral point in the gradient, either side of which altering the Gli binding affinity has opposite effects on gene expression. This explains recent counterintuitive experimental observations. The approach is broadly applicable and provides a unifying framework to explain the temporospatial pattern of morphogen-regulated gene expression.
INTRODUCTION
In many developing tissues, pattern formation depends on the differential regulation of gene expression by morphogen gradients (Kicheva et al., 2012; Rogers and Schier, 2011). To understand tissue patterning, predictive mechanistic models of morphogen-dependent gene regulation are needed.
A simple model postulates that a morphogen activates target genes by directly activating a latent transcriptional activator (Driever et al., 1989). The increase in the activity of the morphogen-regulated transcription factor (MR-TF) results in increased binding to its cis-regulatory elements in target genes, thereby increasing the probability of target gene expression. This model predicts that genes with few or low-affinity binding sites for the MR-TF would only be induced close to the morphogen source, where MR-TF concentration is at its highest. Conversely, genes expressed at a greater distance would have more or higher affinity binding sites. This relationship has been termed the affinity threshold model (Rushlow and Shvartsman, 2012) (Fig. 1A-C).
Support for the affinity threshold model came from experiments dissecting the molecular mechanism of gene regulation in the early Drosophila embryo (Driever et al., 1989; Jiang et al., 1993,, 1992). Subsequent observations, however, were inconsistent with the model (Ochoa-Espinosa et al., 2005; reviewed by Cohen et al., 2013). Moreover, binding sites in the enhancers of target genes for transcription factors (TFs) other than the MR-TF have been shown to influence gene expression (Hong et al., 2008; Ip et al., 1992; Jaeger, 2004; Kanodia et al., 2012; Porcher and Dostatni, 2010). These additional TFs can be expressed uniformly throughout the tissue or controlled by graded signals. This led to the suggestion that the response to a morphogen is the product of the structure of the transcriptional network comprising the target genes. (Balaskas et al., 2012; Hong et al., 2008; Jaeger, 2004; Kanodia et al., 2012; Manu et al., 2009; Porcher and Dostatni, 2010).
It is also unclear how the affinity threshold model would apply to MR-TFs that are bifunctional, acting as transcriptional repressors in the absence of morphogen and as activators otherwise (Barolo, 2002). If activator and repressor bind similarly to the same DNA binding sites, changing binding affinity would affect the probability of bound repressor or activator equally. In Drosophila, Hedgehog regulates the activity of the bifunctional effector Ci (Basler and Struhl, 1994). In the wing disc, its target gene dpp is induced at low Hedgehog concentrations and contains three low-affinity Ci binding sites (Jiang and Hui, 2008), whereas ptc is induced by high Hedgehog levels and contains high-affinity Ci binding sites (Parker et al., 2011). Besides this counterintuitive allocation of binding site affinity, experimentally increasing the Ci binding site affinity in the dpp enhancer decreased its range of expression, which contradicts the affinity threshold model (Parker et al., 2011). To explain this, a differential cooperativity model was proposed, in which there is self-cooperative binding of the repressor but not activator isoform of Ci. The difference in cooperativity would mean that target genes with higher binding site affinity are more sensitive to repressor than activator.
In vertebrates, Sonic hedgehog (Shh), acting through Gli proteins (the Ci orthologues) (Jiang and Hui, 2008), patterns the neural tube by inducing the nested expression of a set of TFs within neural progenitors (Alaynick et al., 2011). The pan-neural transcriptional activator Sox2 provides neural specificity to these Shh target genes (Bailey et al., 2006; Graham et al., 2003; Oosterveen et al., 2013; Pevny and Placzek, 2005). Both the activator and repressor form of Gli bind to the same consensus Gli binding sites (GBS) in the genome (Hallikas et al., 2006; Muller and Basler, 2000; Peterson et al., 2012; Vokes et al., 2007,, 2008) and analysis of GBSs within enhancers of Shh target genes failed to find a positive correlation between binding site affinity and range of gene induction (Oosterveen et al., 2012; Peterson et al., 2012) (Fig. 1D,E).
Differential cooperativity mechanisms (Parker et al., 2011) are unlikely to explain the behaviour of Shh target genes in the neural tube. There is no evidence that Gli repressor (GliR) isoforms bind more cooperatively than Gli activator (GliA) isoforms (Nguyen, 2005). In addition, many Shh target enhancers in the neural tube have a single GBS, thus precluding cooperative binding (Peterson et al., 2012; Vokes et al., 2007). An alternative is that the dynamics of the transcriptional network, which is composed of Gli proteins, uniformly expressed TFs and TFs downstream of Shh signalling, explains the spatial pattern of gene expression in the neural tube (Balaskas et al., 2012).
Here, we develop a mathematical model to test this idea. This reveals a mechanism that relies on the enhancer integrating input from the MR-TF with the other transcriptional inputs. We first demonstrate the logic of this model in the context of a single gene regulated solely by an MR-TF. We then extend the model to include other TFs, using patterning in the ventral neural tube as an example (Balaskas et al., 2012). The model is parameterised using a Bayesian computational technique (Liepe et al., 2010) and reproduces the experimentally observed expression pattern upon changes in Gli binding affinity. We suggest that this mechanism provides a general strategy for the regulation of morphogen-controlled gene responses.
RESULTS
A thermodynamic ensemble model of gene regulation
To explore the function of a bifunctional MR-TF we describe transcriptional regulation using a statistical thermodynamic formulation (Bolouri and Davidson, 2003; Shea and Ackers, 1985; Sherman and Cohen, 2012). In this approach, the thermodynamic states of the cis-regulatory system represent all possible bound configurations of DNA to polymerase and a specified set of TFs. Transcriptionally active states are those in which polymerase is bound at the promoter; inactive states have no polymerase bound. The probability of gene expression, ϕ, is defined by the ratio of probabilities of all transcriptionally active states to all possible states.
This formulation can recapitulate the more familiar Hill function description (based on Michaelis–Menten kinetics) of transcriptional regulation if binding of a single species is considered (see Eq. 2, representing just the binding of polymerase). However, as we discuss below in the context of the combinatorial effect of multiple TFs, the basal levels of transcription in the presence of ubiquitous regulators cannot be ignored. These effects are typically approximated by multiplying or summing separate Hill functions; however, in contrast to the thermodynamic formalism, this does not accurately represent multiple binding at different sites (Sherman and Cohen, 2012).
A model of Gli binding at a single GBS
The numerator in this function represents all the transcriptional states in which polymerase is bound and transcription can be active: polymerase on its own, activator and polymerase bound, or repressor and polymerase bound. The denominator includes the active and the additional inactive states: free DNA (=1), and activator or repressor bound in the absence of polymerase. The constants KP, KA and KR represent the binding affinities of polymerase, activator and repressor, respectively. The cooperative terms cAP>1 and cAR<1 represent activation and repression, respectively.
If GliA and GliR have equal binding affinities for the GBS, then KA=KR≡K. To illustrate the behaviour, we define gradients of GliA and GliR that represent the transduction of an Shh gradient (Fig. 2B). The solid blue line in Fig. 2C represents the probability of gene expression across this gradient, as determined by Eq. 1, for K=1 and assuming A acts as a strong activator (cAP=10) and R a strong repressor (cRP=0.1). The horizontal grey line in Fig. 2C represents basal gene expression.
Predicting the relationship between Gli binding affinity and gene expression
For an increase in binding affinity, K, to result in an increase in gene expression probability, ϕ, requires dφ/dk>0. This will occur when θ>0. Conversely, if θ<0 then dφ/dk<0, and therefore ϕ will decrease when the binding affinity is increased.
As θ depends only on the concentration of Gli isoforms and the strength of activation or inhibition, θ can be viewed as a function of position within the morphogen gradient (independent of K, KP and [P]). For the case described, where A is an activator (cAP>1) and R is a repressor (cRP<1), θ, and therefore dφ/dk, will be positive only in regions of the gradient close to the source where A is high and R is low. Conversely, dφ/dk will be negative further from the source, where R is high and A is low (Fig. 2E).
Simulations confirm that gene expression probability changes when Gli binding affinity is altered (Fig. 2C; supplementary material Fig. S1). As predicted analytically, an increase in binding affinity increases the probability in the regions closer to the source (dφ/dk>0) and decreases probability in regions further from the source (dφ/dk<0). Moreover, at the position in the gradient where θ=0 and hence dφ/dk=0, no change will occur when K is increased. If we insert this condition back into Eq. 1 we obtain ϕ=(KP[P])/(1+KP[P]), which is the basal level (see Eq. 2) in the absence of any TF binding. Thus, there is a ‘neutral point’ in the gradient at which gene expression is fixed at basal levels (solid grey line in Fig. 2C). Close to the source, where gene expression is above basal levels, an increase in GBS binding affinity causes an increase in the probability of gene expression. Further from the neutral point, where gene expression is below basal levels, increasing binding affinity decreases the probability of gene expression. The position of the neutral point is determined only by the concentrations of GliA and GliR and the strength of their cooperative binding with polymerase (Eq. 6) and is independent of the basal level of gene expression.
The effect of binding affinity on enhancer occupancy is illustrated in Fig. 2D. Increasing the affinity of both GliA and GliR binding at all positions in the gradient (compare left and right columns) results in an overall increase in the probability that a GBS will be bound relative to the unbound state. Close to the source of morphogen, an increase in K causes a net increase in GliA bound and hence polymerase occupancy and gene expression probability increases (compare positions 1 and 2); further from the source, a net increase in GliR bound results from higher values of K and polymerase concomitantly decreases (compare positions 4 and 5). At position 3, the neutral point, there is a balance of activation and repression and gene expression is at the basal level. This is not equivalent to having the same amount of GliA and GliR bound at the enhancer: when K is increased, both isoforms of Gli still increase their binding proportionately but there is no net change in the level of gene expression.
Implicit in the thermodynamic model (Shea and Ackers, 1985) is a reciprocity in the binding cooperativity between a TF and polymerase. In supplementary material Section 2, we review an alternative model in which polymerase does not affect the binding or unbinding of Gli. In this model, there is still a neutral point in the gradient, either side of which gene expression probability will increase or decrease when binding affinity is altered. However, the neutral point changes position within the gradient when the basal level of expression changes (supplementary material Fig. S2). For both models it is possible to represent the gene expression probability in a form that is linearly proportional to the relative abundance of GliA to GliR (see supplementary material Section 3).
A second assumption of the model is that TF binding to DNA is fast relative to changes in TF concentration. Hence, the probability of gene expression given by the model is the probability reached at equilibrium with respect to the TF and polymerase concentrations. To determine the dynamic response of these systems prior to steady state, a more complex model comprising the on/off binding rates for each of the different factors would be required.
Thus far this analysis has described models consisting of a single binding site with equal affinity for the activator and repressor. A qualitatively similar result is obtained if the number of binding sites is altered instead of their affinity (see supplementary material Section 1 and Fig. S1B). More complex regulatory systems, in which there is differential binding affinity or cooperativity among multiple binding sites, can also be applied (see supplementary material Section 4 and Fig. S3) (Parker et al., 2011). The key result still holds for these models; close to the source increasing binding affinity will increase expression probability and far from the source it will decrease. Hereafter, we apply the thermodynamic model comprising a single GBS, but the fundamental findings are consistent among these alternative models.
Transcriptional inputs other than the morphogen can determine boundary positions
The conditions for an increase or a decrease in gene expression probability when the binding affinity of Gli (K≡KA=KR) is increased is still the same as previously defined in the absence of S; however, at the ‘neutral point’ (where dφ/dK=0 at the position θ=0) gene expression is at this new basal level.
Fig. 2G shows an example for two solutions of ϕ, one with KS=0.1 (blue), reflecting a gene with low levels of Sox2 binding, and one with KS=10 (purple), reflecting a gene with high levels of Sox2 binding. Increased binding affinity for S increases the basal level of gene expression. Moreover, in both cases, if the binding affinity for Gli is increased from K=1 to K=5 (solid to dashed line) then the predicted increase or decrease in the probability of gene expression is observed either side of the neutral point. Moreover, if we consider, hypothetically, that a threshold probability at which a gene is regarded as ‘on’ occurs at a fixed level (e.g. gene expression is observed at ϕ∼0.25) then it is easy to see how the level of basal expression will contribute to a gene's expression boundary in the gradient (note that this will not generate a sharp boundary of gene expression; we explore this in detail in the next section). If this threshold position is closer to the gradient source than the neutral point (which is defined only by the Gli gradient) then an increase in GBS affinity will cause an expansion in the range of gene expression. This result is relevant to the transgenic assays in which increasing the GBS affinity in enhancers of Shh targets expressed in ventral regions of the neural tube resulted in an expansion in the range of gene expression (Peterson et al., 2012). Conversely, for a gene that has a threshold level of expression that is further from the gradient source than the neutral point, then a contraction in expression is predicted, as observed in more dorsally expressed genes in the neural tube (Oosterveen et al., 2012). The contraction in expression of these latter genes can be explained by the repressive effect of the MR-TF. For these genes, when the morphogen concentration is below some threshold, the repressor form of the MR-TF represses target gene expression below its basal level. Hence, increasing binding affinity of the morphogen transcriptional effector increases the overall repression in regions beyond the neutral point and thereby contracts the domain of gene expression.
For a bifunctional MR-TF the position of a target gene boundary is prognostic of how it will respond to changes in the affinity for the MR-TF. Genes with boundaries close to the source (before the neutral point) are more likely to undergo an expansion in their expression when MR-TF affinity is increased, whereas gene boundaries further from the source (beyond the neutral point) are more likely to contract if affinity is increased.
Patterning within a gene regulatory network
In many morphogen systems, a network of interactions between TFs that are regulated by the morphogen contributes to the spatial pattern of gene expression (Balaskas et al., 2012; Davidson, 2006; Manu et al., 2009). These morphogen-regulated transcriptional networks provide a spatially and temporally varying input into gene regulation (Bolouri and Davidson, 2003) that is distinct from the morphogen effector and any uniform modulators. To explore how a bifunctional transcriptional effector could affect patterning behaviour in this context, we applied it to the transcriptional network that functions in the ventral neural tube (Balaskas et al., 2012; Briscoe et al., 2000; Novitch et al., 2001) (Fig. 3A,B). We used the thermodynamic formulism to describe a previously documented transcriptional network (Balaskas et al., 2012; Panovska-Griffiths et al., 2013). We incorporated a fourth gene (Irx3) that has been shown to function in this network (Fig. 3B) (Balaskas et al., 2012; Briscoe et al., 2000; Dessaud et al., 2007; Novitch et al., 2001). This allowed us to investigate two distinct gene expression boundaries (for Nkx2.2 and Olig2) that have different responses to perturbations in their Gli binding affinity (Oosterveen et al., 2012; Peterson et al., 2012).
We use the same functional form for ϕ as previously described to determine the probability of polymerase occupancy (and therefore gene expression); however, to simplify the model we incorporate uniformly expressed TFs such as Sox2 into the expression for the basal level of polymerase binding for each gene. We assume that GliA acts a strong activator (cAP=10), that GliR acts as a strong repressor (cRP=0) and that the four TFs (Pax6, Olig2, Nkx2.2 and Irx3) are strong repressors, with zero cooperativity terms. For each gene repressed by one or more of these four TFs we include two binding sites. These act equally and independently, without cooperative binding, to repress polymerase binding. The inclusion of two independent binding sites provides the necessary non-linearity to generate sharp boundaries in gene expression patterns (Balaskas et al., 2012; Manu et al., 2009) and is consistent with the multiple binding motifs in experimental data (Oosterveen et al., 2012). We assume that the probability of gene expression, ϕ, is directly proportional to the rate of transcription.
As many of the parameters are unknown, we used established Bayesian methodology (Toni et al., 2009, Liepe et al., 2010; Liepe et al., 2014) to identify the distribution of parameters sets (posterior distributions) that could qualitatively reproduce the wild-type (WT) expression pattern observed in the ventral neural tube (Fig. 3A) (Alaynick et al., 2011; Balaskas et al., 2012), whereby Nkx2.2 is expressed most ventrally, followed by Olig2 and then Irx3, with dorsal Pax6 expression gradually decreasing across the Olig2 domain. We further constrained the parameter search by requiring the model to reproduce all of the observed null mutant phenotypes (Balaskas et al., 2012; Briscoe et al., 2001; Novitch et al., 2001) (Fig. 3D). This included the effect of simultaneous removal of both GliA and GliR [experimentally simulating Shh−/−;Gli3−/− or Gli2−/−;Gli3−/− compound mutants (Litingtung and Chiang, 2000; Persson et al., 2002)]. In these embryos, Nkx2.2 expression is absent but Olig2 expression is observed in ventral regions of the neural tube, indicating that basal (Gli-independent) activation of Olig2 is sufficient for its expression.
We defined a distance function that compares simulated patterns with idealised representations of the WT and mutant patterns described above. The Bayesian method (see Materials and Methods and Fig. 3C for an overview of the process) enabled us to obtain a representation of the parameter space for which the model can approximate the observed phenotypes (i.e. achieving an appropriately low distance score for each of the targets; supplementary material Fig. S4A). We fixed the production and degradation rates of the TFs (α=β=2) in order to investigate the effect of the other parameters in the model. For each of the unknown parameters, which represent the binding affinities of the components and the concentration of polymerase, we searched over a prior parameter space spanning five orders of magnitude, sampling on a log scale between −3 and 2.
A typical simulation with a parameter set drawn from the full posterior distribution is shown in Fig. 3D. Consistent with the in vivo observations (Balaskas et al., 2012; Dessaud et al., 2007), the genes in the WT case are expressed in the temporal sequence: Pax6+Irx3→Olig2→Nkx2.2 in the future p3 (Nkx2.2-expressing) domain (Fig. 3E). The marginal posterior parameter distributions (Fig. 3F; supplementary material Fig. S4B shows the joint posterior distributions) revealed that different strengths of binding are required among the different TFs. For example, Nkx2.2 binds relatively strongly to Olig2 and Pax6, whereas Olig2 binds weakly to Nkx2.2 and Pax6 (the marginal posterior distributions have mean values of KN_O=57.6, KN_P=39.6, KO_N=3.3, KO_P=1.9). This is consistent with previous observations suggesting that Nkx2.2 acts as a stronger repressor than Olig2 or Pax6 in order to avoid oscillations in gene expression levels (Balaskas et al., 2012; Panovska-Griffiths et al., 2013).
In order to establish the effect of changing different parameters, we analysed the change in position of the gene expression boundaries (Fig. 4; see Materials and Methods). For Olig2, decreasing GBS affinity significantly increased the range of its dorsal (Olig2/Irx3) expression boundary with Irx3, while simultaneously shifting the ventral boundary slightly more dorsally (Fig. 4B,F,G). Increasing GBS affinity significantly decreased the range of expression at the dorsal boundary (Fig. 4C,H,I). These shifts in Olig2 boundaries are consistent with the experimental observation that inhibiting the binding of both activator and repressor Gli proteins resulted in a dorsal shift of the Olig2 domain (Oosterveen et al., 2012). Notably, Olig2 expression, in the model, spanned the neutral position (θ=0 at x∼0.35, as denoted by the solid grey circle in Fig. 4A). Thus, its dorsal boundary was located in the region of the gradient where one would predict an increase in gene expression if GBS affinity were decreased (resulting in a dorsal expansion in the Olig2 domain due to the increased repression of Irx3 and Pax6). Moreover, the Olig2 ventral boundary was located in a region where decreasing GBS affinity would result in a decrease in gene expression (allowing the Nkx2.2 domain to expand dorsally due to the derepression at this boundary caused by the reduction in Olig2).
In the case of Nkx2.2, decreasing the affinity restricted Nkx2.2 expression more ventrally (Fig. 4D,J,K), whereas increasing the affinity expanded its expression dorsally (Fig. 4E,L,M). This is also consistent with the experimental data (Oosterveen et al., 2012; Peterson et al., 2012). Together, this analysis demonstrates that changes in GBS affinity of the genes in the network result in the same behaviour as predicted for a solitary gene.
This system also displays hysteresis (Balaskas et al., 2012), such that the steady state is maintained if the signal is reduced once the network has reached steady state (supplementary material Fig. S5). Both the Nkx2.2/Olig2 boundary and the Olig2/Irx3 boundary are robust to significant reductions (up to 80%) in the input signal of GliA once steady state is achieved (supplementary material Fig. S5A-D). By contrast, both boundary positions are highly sensitive to levels of GliA at the start of a simulation (supplementary material Fig. S5E-H).
We also asked how gene expression is affected if the strength of the uniform input is altered (supplementary material Fig. S6A-J). We considered two cases. First, changing the basal signal for all four genes simultaneously (by changing the level of [P]; supplementary material Fig. S6A,B) had relatively small effects on the pattern. This suggests robustness in the system that preserves the pattern when the concentration of a uniform input parameter is changed. By contrast, altering the basal input for individual genes (by changing either KP_Pax, KP_Irx, KP_O or KP_N; supplementary material Fig. S6C-J) significantly altered the pattern of gene expression. For example, a twofold increase in the basal input to Olig2 dramatically expanded the size of its expression domain, whereas a twofold decrease resulted in a dramatic decrease in expression (supplementary material Fig. S6I,J). Thus, altering the number or strength of uniform inputs modifies the spatial pattern of gene expression.
Moreover, as demonstrated previously (Balaskas et al., 2012; Panovska-Griffiths et al., 2013), changing the binding affinities of the TFs forming the cross-repressive network altered the pattern and behaviour of gene expression (supplementary material Fig. S6K-BB). Together, these mechanisms could be exploited in the neural tube to modify the gene expression domains at different anterior-posterior positions, thereby providing a means to generate and fine-tune different patterns of gene expression without changing the morphogen gradient itself. A similar strategy could be exploited during evolution to alter the patterns of gene expression between species. This offers a level of flexibility to pattern formation that could have favoured the use of this type of morphogen system during the evolution of tissue patterning mechanisms.
DISCUSSION
Taken together, the model provides a straightforward way to capture the idea that the activity of target genes is determined by the combined input of the transcriptional effectors. This view of patterning is consistent with the notion that cis-regulatory elements integrate the activity of TFs to determine the probability and/or rate of transcription (Davidson, 2006). Three distinct classes of inputs can be defined: the MR-TF, the activity of which is determined by the distribution of the morphogen in the tissue; uniformly expressed TFs that are active throughout the tissue; and morphogen-controlled target genes that are dynamically regulated downstream of the morphogen. Each of these inputs can comprise multiple individual TFs with either inhibitor or activator function (Fig. 5). The thermodynamic regulation function provides a method for logically combining these different regulatory factors.
The analysis reveals how the cross-regulatory network is able to establish sharp gene expression boundaries (among target genes that, if expressed in isolation, would have a graded or uniform expression profile). By explicitly including the bifunctional forms of Gli we were able to account for the different boundary shifts that arise in different Shh targets when Gli binding affinity is perturbed. Notably, these differences could occur in genes containing a single GBS with the same affinity for the repressor and activator forms of Gli. In particular, the model predicts that the Olig2 expression domain must overlie the neutral point in the gradient in order to explain the different shifts observed in its ventral and dorsal boundaries. Moreover, by using the Bayesian methodology to explore the parameter space we were able to make specific predictions about the relative strength of binding affinities as well the effects of different parameter perturbations.
Previously, the thermodynamic modelling approach has been used to explain how uniformly expressed transcriptional activators can collaborate with latent MR-TFs that are activated by a morphogen (Kanodia et al., 2012). Here we show that the same framework can be extended to describe transcriptional networks regulated by a morphogen, thereby allowing the function and architecture of cis-regulated elements and the dynamics of a gene regulatory network to be explored simultaneously. Thus, this modelling formalism provides a unifying theoretical framework with which to analyse differential gene expression in developing tissues.
MATERIALS AND METHODS
Approximate Bayesian computation (ABC)
Simulations of the gene expression functions were carried out using Mathematica 9.0 (Wolfram Research). The network model was analysed using ABC-SysBio (Liepe et al., 2010; Liepe et al., 2014), an ABC (Toni et al., 2009) suite designed to run in parallel on graphics processing units (GPUs) using the python package cuda-sim (Zhou et al., 2011). The input gradient of GliA and GliR described in Fig. 2 was discretised into 100 positions. At each position the ODEs describing the TF dynamics were solved to steady state (100 hours) for each parameter set. Mutant patterns were generated by setting the levels of production of the relevant TF to zero or, for Gli–/–, by setting the binding affinity of Gli to zero. A distance function in the software compared the WT and mutant simulated patterns with a set of stereotyped targets (Fig. 3D). A point was accrued if the simulated protein concentration differed from the target by more than 0.2 A.U. at each of the 100 discrete positions in the gradient. The total mean distance score was obtained by summing the score for each gene at each position and dividing by the total number of positions scored. Hence, a simulated pattern that perfectly matched a target would score zero and one that completely mismatched the target would score 1. The posterior parameter distribution was estimated using 1000 particles (parameter sets). It uses a sequential Monte Carlo (SMC) algorithm to efficiently search the parameter space [for details, see Liepe et al. (2010)]. Initially, a pilot run was performed in which the target was the WT pattern. The priors were then updated with the posterior distributions obtained from this fit and then the WT and mutant patterns were used together as targets; this was used to speed up the search process (the total time to obtain the approximate posterior distributions was around 3 weeks running on a single GPU). After this time the mean distance scores illustrated in supplementary material Fig. S4A were obtained. The joint and marginal parameter probability distributions based on the weighted particles (supplementary material Fig. S4B) were plotted using ksdensity and meshgrid functions in Matlab (MathWorks).
To determine the effect of parameter perturbations, gene expression boundaries were compared with the WT pattern (Fig. 4; supplementary material Figs S5, S6). A resampled posterior population of 100,000 parameters sets was derived from the final 1000 weighted particles. For each pattern obtained from this set, the gene expression boundaries were defined using the following algorithm that searched for the maximum or minimum positions at which expression levels of the different TFs were higher or lower than each other; specifically, the Nkx2.2 (N)/Olig2 (O) boundary was defined as: mean[max(N>O in region N>I), min(O>N in region O>I)]. The Olig2 (O)/Irx3 (I) boundary was defined as: mean[max(O<I in region O>N), min(I>O in region I>N)]. If no positions in the pattern satisfied the requirements for both the N/O and O/I boundary, or if they were located in reverse order of the WT pattern, then that parameter set was deemed as non-patterning. For each ‘patterning’ solution the distance between the WT and perturbed boundary was determined. The boxplots in Fig. 4 and in supplementary material Figs S5 and S6 showing the distribution of these distances for each boundary were derived using the Matlab boxplot function. The boxes show the median and 25th (q1) and 75th (q3) percentiles, the whiskers extend to q3+1.5(q3–q1), covering approximately 99% of the normally distributed data (outliers are not shown).
Acknowledgements
We are grateful to Anna Kicheva, Vanessa Ribes and Madhav Mani for constructive comments on the manuscript.
Author contributions
M.C. conceived the project, performed the mathematical analysis and computational modelling and wrote the manuscript. K.M.P. performed the mathematical analysis. R.P.-C. performed the mathematical analysis. C.P.B. performed the mathematical analysis and provided technical advice on the computational modelling. J.B. conceived the project and wrote the manuscript.
Funding
This work is supported by funding from the Medical Research Council [U117560541] and Wellcome Trust [WT098325MA and WT098326MA]. C.P.B. gratefully acknowledges funding from the Wellcome Trust through a Research Career Development Fellowship [097319/Z/11/Z]. Deposited in PMC for immediate release.
References
Competing interests
The authors declare no competing financial interests.