Members of the T box family of transcription factors play important roles in early development. Different members of the family exert different effects, and here we show that much of the specificity of the Xenopus T box proteins Xbra, VegT and Eomesodermin resides in the DNA-binding domain, or T box. Binding site selection experiments show that the three proteins bind the same core sequence, but they select paired sites that differ in their orientation and spacing. Lysine 149 of Xbra is conserved in all Brachyury homologues, while the corresponding amino acid in VegT and Eomesodermin is asparagine. Mutation of this amino acid to lysine changes the inductive abilities of VegT and Eomesodermin to resemble those of Xbra.
We dedicate this paper to the memory of our friend and colleague Rosa Beddington
INTRODUCTION
Members of the T box family of transcription factors are required for formation of the basic vertebrate body plan and for normal development of organs such as the heart and limbs (Kavka and Green, 1997; Papaioannou and Silver, 1998; Smith, 1999). T box genes are also implicated in human congenital malformations such as Holt-Oram syndrome (Basson et al., 1997; Li et al., 1997), ulnar-mammary syndrome (Bamshad et al., 1997; He et al., 1999) and DiGeorge syndrome (Jerome and Papaioannou, 2001; Lindsay et al., 2001; Merscher et al., 2001), and TBX2 proves to be amplified in a subset of human breast cancers (Jacobs et al., 2000). The founder member of the family, Brachyury, or T, encodes a sequence-specific DNA-binding protein that functions as a transcription activator (Conlon et al., 1996; Herrmann et al., 1990; Kispert and Herrmann, 1993; Kispert et al., 1995a). In mouse, Xenopus, zebrafish and chick embryos, Brachyury is expressed throughout the nascent mesoderm and transcripts are then restricted to the tailbud and notochord (Kispert et al., 1995b; Schulte-Merker et al., 1992; Smith et al., 1991; Wilkinson et al., 1990). Lack of Brachyury function, whether through genetic mutation in mouse (Chesley, 1935; Gluecksohn-Schoenheimer, 1938; Herrmann et al., 1990) and zebrafish (Halpern et al., 1993; Schulte-Merker et al., 1994), or by inhibiting the ability of the protein to activate transcription in Xenopus (Conlon et al., 1996), causes loss of posterior mesodermal structures and impairment of notochord differentiation. Furthermore, mis-expression of Brachyury in prospective ectodermal tissue of the Xenopus embryo causes those cells to activate mesoderm-specific genes and to form mesodermal cell types such as muscle (Cunliffe and Smith, 1992; Cunliffe and Smith, 1994; O’Reilly et al., 1995). Together, these experiments indicate that Brachyury is both necessary and sufficient for normal mesoderm formation.
The first clue that Brachyury is a member of a family of proteins came from the observation that the DNA-binding domain of the protein (now referred to as the T box) shows extensive sequence homology with the product of the Drosophila gene optomotor-blind (Pflugfelder et al., 1992). Since then, over 50 such T box genes have been identified throughout the animal kingdom, and they prove to be expressed in, and to play roles in the development of, multiple cell types (see reviews cited above). Of the many issues raised by this work, one of the most important concerns the question of T box specificity. This is illustrated by results obtained with Tbx4 and Tbx5, two of the most closely related members of the T box family. Tbx4 is expressed at high levels in the hindlimb of the developing vertebrate embryo and Tbx5 in the forelimb (Gibson-Brown et al., 1998; Isaac et al., 1998; Logan et al., 1998; Ohuchi et al., 1998). Mis-expression experiments suggest, remarkably, that limb identity is determined by which of the two T box genes is expressed in the developing limb bud (Logan and Tabin, 1999; Rodriguez-Esteban et al., 1999; Takeuchi et al., 1999). How do the different T box proteins exert these different effects?
In this paper, we address the question of T box specificity by studying three genes expressed during early Xenopus development: Xenopus Brachyury (Xbra) (Smith et al., 1991), Eomesodermin (Ryan et al., 1996) and VegT/Antipodean (Horb and Thomsen, 1997; Lustig et al., 1996; Stennard et al., 1996; Zhang and King, 1996). All three genes are expressed in the mesoderm of the early gastrula and the function of each is required for proper patterning of the Xenopus embryo, with VegT likely to act both maternally and zygotically (Conlon et al., 1996; Horb and Thomsen, 1997; Ryan et al., 1996; Stennard et al., 1999; Zhang et al., 1998). The genes are also necessary for the normal development of other vertebrate species, including mouse and fish (Chesley, 1935; Gluecksohn-Schoenheimer, 1938; Halpern et al., 1993; Herrmann et al., 1990; Russ et al., 2000; Schulte-Merker et al., 1994).
Like Xbra, VegT and Eomesodermin are transcription activators and are capable of activating mesoderm-specific genes in isolated animal pole tissue (this work) (Horb and Thomsen, 1997; Ryan et al., 1996; Tada et al., 1998). However, the types of mesoderm induced by each T box protein differ. In particular, Xbra induces posterior mesodermal cell types and activates posteriorly expressed genes while VegT and Eomesodermin can induce virtually the entire spectrum of mesodermal genes and of mesodermal cell types. In this study, we have used a series of chimeric proteins to investigate the basis of this inductive specificity. Our results show that much of the specificity resides within the T boxes of the proteins, but also that the C-terminal region of Xbra is capable of restricting the inductive abilities of the VegT and Eomesodermin T boxes.
The different inducing activities of Xbra, VegT and Eomesodermin suggest that the proteins might recognise different DNA target sequences. To address this question, we have carried out a series of binding site selection experiments. All three proteins prove to recognise the same core sequence of TCACACCT with some differences in flanking nucleotides. Significantly, however, further rounds of selection tend to select repeats of the core sequence, and the spacing and orientation of the repeats are different for each protein. For example, as reported by Kispert and Herrmann (Kispert and Herrmann, 1993), Brachyury selects the palindromic sequence TCACACCTAGGTGTGA while Eomesodermin frequently selects two direct repeats of the core motif separated by four nucleotides. It is possible that differences such as these underlie the different effects of the different T box proteins. Finally, we show that at least some aspects of specificity are associated with an asparagine residue in the T boxes of VegT and Eomesodermin; mutation of this residue to the lysine present in the equivalent position in Brachyury causes the two proteins to behave more like Xbra.
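The relationship between the core motif and the palindromic sequence selected by Brachyury can be checked directly: the palindrome is the core followed by its own reverse complement. The short Python sketch below is an illustration only; the function name reverse_complement is introduced here for the example and is not part of the original analysis.

```python
# Illustrative check: the Brachyury palindrome is the core motif
# followed by the reverse complement of the core motif.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

CORE = "TCACACCT"                      # core motif given in the text
palindrome = CORE + reverse_complement(CORE)

print(palindrome)                                    # TCACACCTAGGTGTGA
print(palindrome == reverse_complement(palindrome))  # True
```

A sequence that equals its own reverse complement reads identically on both strands, which is the sense in which the paired site is described as palindromic.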
MATERIALS AND METHODS
Plasmid constructs and RNA synthesis
VegT was a gift from Mary Lou King (Zhang and King, 1996) and Eomesodermin was a gift from John Gurdon (Ryan et al., 1996). pSP64T-Xbra (Cunliffe and Smith, 1992) and pSP64T-Xbra-HA (Tada et al., 1997) have been described previously. The analogous pSP64T-VegT-HA and pSP64T-Eomesodermin-HA constructs were created by PCR; details are available on request. For T box VP16 fusions, amino acids 1-147 of yeast GAL4 were first fused in frame to the T boxes of VegT (amino acids 47-238), Eomesodermin (amino acids 210-469) or Xbra (amino acids 17-227). Each construct was then fused to the transcriptional activation domain of VP16 (amino acids 413-454) via a lambda linker (Brickman et al., 2000). Constructs were cloned into pSVGVP1 for transient transfections and pGEM-3Zf (Promega) for RNA injections.
For T box ‘swap’ constructs (see Fig. 4) XVX and XEX were generated by replacing the T box of Xbra with that of VegT or Eomesodermin, respectively. VXV was generated by replacing the T box of VegT with that of Xbra. Truncations of Xbra, VegT and Eomesodermin (Fig. 2) occurred at amino acids 232, 375 and 578, respectively. Cloning details are available on request. Constructs were cloned into pcDNA3.1 (Invitrogen) for transient transfections and pCR2.1 (Invitrogen) for RNA injections.
Point mutations in Eomesodermin and VegT were generated by PCR. For both proteins, an asparagine residue in the T box (N155 in VegT and N353 in Eomesodermin) was changed to lysine, the amino acid present in the corresponding position in Xbra. Cloning details are available on request. Constructs were cloned into pcDNA3.1 (Invitrogen) for transient transfections and pCR2.1 (Invitrogen) for RNA injections.
All constructs were sequenced and gave proteins of the correct size after in vitro translation (data not shown). RNA from each construct was generated as described (Smith, 1993).
Embryos, microinjection and dissection
Xenopus embryos were obtained by in vitro fertilisation (Smith and Slack, 1983). They were maintained in 10% Normal Amphibian Medium (NAM) (Slack, 1984) and staged according to Nieuwkoop and Faber (Nieuwkoop and Faber, 1975). Xenopus embryos were injected at the one-cell stage with 0.5 ng RNA in 10 nl water. For animal cap assays embryos were dissected in 75% NAM, and caps were cultured in the same medium until early gastrula stage 10.
RNA isolation and RNAase protection assays
RNAase protection assays were carried out as described (Jones et al., 1995). Each RNAase protection shown is representative of at least two independent experiments. Probes were as follows: Xbra (Smith et al., 1991), Xwnt11 (Ku and Melton, 1993), Bix4 (Tada et al., 1998), goosecoid (Cho et al., 1991), chordin (Sasai et al., 1994), Xwnt8 (Christian et al., 1991; Smith and Harland, 1991), Mix.1 (Rosa, 1989), Pintallavis (Ruiz i Altaba and Jessell, 1992) and Xsox17α (Hudson et al., 1997).
DNA gel-shift assays
Proteins used in electrophoretic mobility shift assays (EMSA) were prepared from DNA using the TNT in vitro translation kit (Promega). Binding reactions contained 1 μl of in vitro translated protein, 1× buffer and 20,000 cpm probe in a total volume of 12 μl. Control reactions (data not shown) contained a 100-fold excess of unlabelled specific or nonspecific oligonucleotide. The 1× buffer was either (i) 50 mM KCl, 1 mM EDTA, 20 mM Hepes pH 7.9, 10% glycerol, 100 μg/ml bovine serum albumin (BSA), 1 mM DTT, 0.3 mM PMSF plus Roche Complete minitabs protease inhibitors; or (ii) 60 mM KCl, 15 mM Tris pH 7.5, 7.5% glycerol, 250 μg/ml BSA, 0.05% NP40, 1 mM DTT, 4 mM spermine, 4 mM spermidine and protease inhibitors as above. Complexes were allowed to form at room temperature for 15-20 minutes after addition of probe. Oligonucleotides used in EMSA were annealed for 10 minutes at 88°C and cooled slowly to room temperature; they were then labelled by 3′ filling with 32P-dCTP (3,000 Ci/mmol) using the Klenow fragment (Promega).
Transient transfection analyses
Transient transfection assays were carried out as described (Conlon et al., 1996). Effector constructs are described above. The CAT reporter construct pBLCAT2 (Luckow and Schutz, 1987) was modified such that the sequence TTTCACACCT was inserted upstream of the promoter region (Fig. 2). MLVlacZ was co-transfected as a control for transfection efficiency (Hill et al., 1993).
PCR binding site selection assays
Binding site selection was carried out as described (Pollock and Treisman, 1990) using in vitro translated protein from pSP64T-Xbra-HA, pSP64T-VegT-HA or pSP64T-Eomesodermin-HA. DNA fragments obtained after five or seven rounds of selection were PCR amplified and cloned into the vector MP19. After five rounds, 62 sequences were examined for Xbra, 60 sequences for VegT and 61 sequences for Eomesodermin. After seven rounds, the numbers were 97, 64 and 63, respectively. Previous work has shown that the sequence TCACACCT interacts with T box proteins (Casey et al., 1998; Casey et al., 1999; Kispert and Herrmann, 1993; Tada et al., 1998), and this motif, or variations of it, was observed in all the selected DNA fragments. Further analysis was carried out manually. This revealed that after seven rounds of selection some of the sequenced clones were identical, such that the numbers of different clones studied for Xbra, VegT and Eomesodermin were 92, 42 and 38, respectively.
RESULTS
Different effects of Xbra, VegT and Eomesodermin
Past studies suggest that the T box genes Xbra, VegT and Eomesodermin (Fig. 1A), all of which are expressed in the marginal zone of the Xenopus early gastrula (Fig. 1B-D), have different mesoderm-inducing activities. For example, VegT and Eomesodermin can induce expression of dorsoanterior markers such as goosecoid, while Xbra cannot (Cunliffe and Smith, 1992; Cunliffe and Smith, 1994; O’Reilly et al., 1995; Ryan et al., 1996). To confirm this finding, we have dissected animal pole regions from embryos previously injected with RNA encoding Xbra, VegT or Eomesodermin, cultured these animal caps to the equivalent of the early gastrula stage, and assayed them for expression of a panel of mesodermal and endodermal markers. Our results confirm that Xbra activates its own expression (data not shown), and that of Xwnt11 and Bix4, but cannot induce goosecoid, chordin, Xwnt8 or Mix.1, and it induces Pintallavis and Xsox17α only weakly (Fig. 1E). By contrast, VegT and Eomesodermin induce the expression of all markers tested (Fig. 1E).
These differences between the T box proteins appear to be qualitative rather than quantitative. We have found no concentration of Xbra RNA, for example, that can induce expression of goosecoid (data not shown) (Cunliffe and Smith, 1992; Cunliffe and Smith, 1994; O’Reilly et al., 1995; Tada et al., 1997).
Xbra, VegT and Eomesodermin are transcriptional activators
The results described above show that the inductive effects of Xbra differ from those of VegT and Eomesodermin. As a first step towards understanding these differences, we sought to confirm, as would be inferred from previous work (Casey et al., 1999; Conlon et al., 1996; Horb and Thomsen, 1997; Ryan et al., 1996; Zhang and King, 1996), that all three T box proteins function as transcription activators. To this end, plasmids encoding Xbra, VegT or Eomesodermin were transfected into COS cells along with a reference plasmid and a reporter construct in which the T box binding site derived from the eFGF promoter, TTTCACACCT (Casey et al., 1998), is positioned upstream of a minimal promoter that drives chloramphenicol acetyl transferase (CAT). All three gene products activate CAT activity (Fig. 2). Levels of activation differ between the three T box proteins, but no significance can be attached to this observation at present because their levels of expression and affinities for the target site may differ.
The activation domain of Xbra is contained within the C-terminal half of the protein (Conlon et al., 1996; Kispert et al., 1995a), and removal of the C termini of Eomesodermin and VegT demonstrated that the same is true of these proteins, although VegT did retain some activity (Fig. 2). It is unlikely that the loss of transcriptional activation is due to instability of the truncated proteins, or to loss of a nuclear localisation signal, because a similar truncated version of Xbra is both stable and nuclear (Walter Lerchner and JCS, unpublished work).
T box protein specificity resides in part in the T box
The different inductive effects of Xbra, VegT and Eomesodermin (Fig. 1E) might derive from differences in the T boxes of these proteins or in domains outside of the T boxes. For example, the proteins might activate different genes because their T boxes bind different DNA motifs or they may do so because they recruit different accessory proteins via non T box sequences. To address this question we have created fusion proteins in which the T boxes of the three proteins are fused to the activation domain of VP16 (Fig. 3A). The fusion proteins also contain, at their N termini, the GAL4 nuclear localisation signal; nuclear localisation of Xbra, and perhaps other T box proteins, requires amino acids within the C terminal half of the protein, which has been removed in these experiments (Kispert et al., 1995a). As predicted, all three VP16 constructs behaved as powerful transcription activators when tested with a reporter construct containing the eFGF T box binding site (data not shown).
The inductive effects of the three VP16 constructs resembled those of their parent proteins. For example, Xbra cannot induce expression of goosecoid or chordin, and nor can Xbra-VP16. VegT and Eomesodermin, however, can induce these genes, and so can VegT-VP16 and Eomesodermin-VP16 (Fig. 3C,D). We note that Xbra-VP16 induces higher levels of expression of Pintallavis, Xwnt11 and Bix4 than does Xbra (Fig. 3B). This suggests that the VP16 activation domain has stronger activity than the endogenous Xbra activation domain, and it reinforces the view that the inability of wild-type Xbra to activate expression of goosecoid represents a qualitative difference between Xbra and the other T box proteins, and that the structural basis of this difference resides in the T box.
The Xbra C-terminal domain restricts the activation of target genes
An alternative explanation for the observation that Xbra-VP16 is a more potent activator of target genes than Xbra is that the C-terminal domain of Xbra somehow restricts target gene activation. To investigate this possibility, we placed the T boxes of VegT and Eomesodermin within the backbone of Xbra, thereby creating XVX and XEX, respectively (see Fig. 4A). Our reasoning was that non T box sequences of Xbra might restrict the activation of VegT and Eomesodermin target genes such as goosecoid, Pintallavis and chordin. Induction of these genes by the two chimeric proteins is indeed reduced, while activation of Xwnt11 and Bix4 is less affected (Fig. 4B). Thus, sequences outside the Xbra T box can restrict the activation of target genes. As might be predicted, insertion of the Xbra T box into VegT creates a protein whose inducing activity resembles that of Xbra-VP16, in that it cannot activate goosecoid, Pintallavis or chordin but can induce Xwnt11 and Bix4 (Fig. 4C).
Together, our results indicate that much of the biological specificity of the T box proteins Xbra, VegT and Eomesodermin resides within the T box, but that sequences outside the Xbra T box also restrict the activation of target genes.
Xbra, VegT and Eomesodermin bind the same core sequence but prefer double sites with different orientations and spacings
Much of the functional specificity of the T box proteins resides in their T boxes. It is possible that the different T boxes recognise different DNA sequences, and we have investigated this idea by carrying out PCR-based binding site selection experiments.
Binding site selection experiments were carried out essentially as described by Pollock and Treisman (Pollock and Treisman, 1990), using HA-tagged versions of Xbra, VegT and Eomesodermin. After five rounds of selection, we found that Xbra, VegT and Eomesodermin selected the same core sequence of TCACACCT with some differences in flanking nucleotides (Fig. 5). Of these differences, the most marked was the frequent selection by Xbra of a guanine nucleotide 5 bases 3′ of the core sequence, and a concomitant preference for a T 5′ of this guanine nucleotide and a T or a C 3′ of the G (Fig. 5A and see Discussion). VegT and Eomesodermin had no preferred nucleotide at this position (Fig. 5B,C). However, we have been unable to design sequences that are specific for particular T box proteins in electrophoretic mobility shift assays.
Many of the sequences identified after five rounds of selection contained two core motifs. To quantitate this observation, we required that both motifs should contain at least six of the eight nucleotides of the core sequence TCACACCT. According to this criterion, double sites occurred in 14.5% of selected Xbra sequences, 38.5% of selected VegT sequences and 53.5% of Eomesodermin sequences. Double sites were observed much more frequently, however, after seven rounds of selection, with the corresponding figures being 39.2, 87.5 and 96.8%, respectively. Analysis of these sequences revealed very strong preferences for particular orientations and spacings of the two core sequences. In agreement with Kispert and Herrmann (Kispert and Herrmann, 1993), double sites selected by Xbra are usually palindromic, with the two core sequences arranged in opposite orientations (Table 1) and with no intervening nucleotides (Table 2). Although double sites selected by VegT were also frequently palindromic, these sites are in the opposite orientations compared with those selected by Xbra (Table 1), and they are almost invariably separated by four nucleotides instead of being immediately juxtaposed (Table 2). Finally, sites selected by Eomesodermin are either in the same orientation as those observed with Xbra, or are arranged as direct repeats (Table 1). The spacing in the former case is usually four nucleotides, but three and five nucleotides are often observed. The spacing in the latter case is usually five nucleotides, but a four nucleotide spacing is also common.
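The double-site criterion and the arrow notation used for orientation and spacing can be expressed as a short script. The Python sketch below is illustrative only, since the selected sequences in this study were analysed manually. The function names, the reading of → as the core motif running 5′ to 3′ on the strand supplied, the choice of the first non-overlapping pair of sites, and the four-nucleotide spacers in the examples are all assumptions made for the illustration.

```python
# Illustrative sketch: find core-like motifs (at least 6 of 8 matches)
# on either strand and describe the first pair as e.g. "←NNNN→".
CORE = "TCACACCT"      # core motif, written "→"
RC_CORE = "AGGTGTGA"   # reverse complement of the core, written "←"

def count_matches(window, motif):
    """Number of positions at which window and motif agree."""
    return sum(a == b for a, b in zip(window, motif))

def find_sites(seq, min_match=6):
    """Return (start, arrow) for every 8-mer resembling the core."""
    sites = []
    for i in range(len(seq) - len(CORE) + 1):
        window = seq[i:i + len(CORE)]
        if count_matches(window, CORE) >= min_match:
            sites.append((i, "→"))
        elif count_matches(window, RC_CORE) >= min_match:
            sites.append((i, "←"))
    return sites

def describe_pair(seq, min_match=6):
    """Describe the first two non-overlapping sites, or return None."""
    sites = find_sites(seq, min_match)
    for i, first in sites:
        for j, second in sites:
            if j >= i + len(CORE):
                spacing = j - i - len(CORE)
                return first + "N" * spacing + second
    return None

# The palindrome from the text, then two examples with arbitrary spacers.
xbra_like = describe_pair("TCACACCTAGGTGTGA")                   # "→←"
vegt_like = describe_pair("AGGTGTGA" + "CATG" + "TCACACCT")     # "←NNNN→"
eomes_like = describe_pair("TCACACCT" + "GGAA" + "TCACACCT")    # "→NNNN→"
```

Because the core and its reverse complement differ at every position, no 8-mer can satisfy the six-of-eight criterion for both orientations at once, so each site receives a single arrow.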
The abilities of Xbra, VegT and Eomesodermin to interact with oligonucleotides containing one or two core motifs were investigated in electrophoretic mobility shift assays. Xbra, unlike VegT and Eomesodermin, interacted only very weakly with oligonucleotides containing just a single motif (data not shown). In this respect, it contrasts with proteins comprising just the Xbra T box, which interact strongly with a single motif (Casey et al., 1998). This apart, we were unable to demonstrate any specificity of the T box proteins for oligonucleotides containing just a single motif.
By contrast, electrophoretic mobility shift assays do suggest that the different T box proteins display preferences for different paired motifs. Typical results are presented in Fig. 6, and the data from over 20 such experiments are summarised in Table 3. The palindromic sequence selected by Xbra (→←) interacted only with Xbra and VegT, with optimum binding of Xbra occurring in the presence of EDTA (data not shown). The higher mobility of the VegT complex suggests that this T box protein may bind to the →← site almost exclusively as a monomer. By contrast, the ←NNNN→ sequence selected by VegT interacts with all three T box proteins, with the existence of lower-mobility forms of VegT and (to a much lesser extent) of Eomesodermin, suggesting that binding may occur as a dimer. It is surprising that Xbra recognises this site in electrophoretic mobility shift assays, because no ←NNNN→ sites were selected during the binding site selection procedure (Table 1). Finally, the two Eomesodermin sites, →NNNN→ and →NNNNN← do not bind Xbra but do interact with both VegT and Eomesodermin (Fig. 6). VegT interacts to form predominantly a high mobility complex, again suggesting that it binds the Eomesodermin sites as a monomer. These results are summarised in Table 3.
Mutation of a single amino acid can change T box protein function
Our data show that the functional specificities of Xbra, VegT and Eomesodermin reside, in large part, in their T boxes. In an effort to identify amino acids that might determine specificity, we examined the sequences of the T boxes of Brachyury, VegT and Eomesodermin from a variety of species (Fig. 7A). Superposition of the VegT and Eomesodermin sequences onto the crystal structure of the Xbra T box (Fig. 7B) revealed only two predicted protein-DNA contact points, positions 149 and 214, at which the sequence of Brachyury differs from that of VegT or Eomesodermin. The amino acid substitution at position 214 is a conserved change replacing the alanine in Xbra with a glycine in VegT and Eomesodermin. However, the amino acid substitution at position 149 is much more dramatic: it replaces a basic lysine residue in Brachyury with the neutral polar residue asparagine in VegT and Eomesodermin. This residue comes at the end of a stretch of highly conserved amino acids that are predicted to form a pleated sheet structure. Lysine 149 is conserved in all Brachyury homologues and contacts the phosphate backbone of the DNA (Fig. 7B).
To investigate the significance of this amino acid in T box functional specificity, the asparagine of VegT and Eomesodermin was mutated to lysine. The effects of these mutant VegT and Eomesodermin constructs more closely resemble those of Xbra (Fig. 7C), although we also note a general reduction in inducing activity. These results suggest that part of the specificity of T box proteins resides in K149 of Xbra, whose equivalent residue in VegT and Eomesodermin is an asparagine.
DISCUSSION
T box proteins are transcription factors that control the specification and morphogenesis of many cell types during vertebrate and invertebrate development (Kavka and Green, 1997; Papaioannou and Silver, 1998; Smith, 1999). In vertebrates, at least three members of the T box family – Brachyury, VegT and Eomesodermin – are involved in the induction and patterning of the mesoderm (Chesley, 1935; Gluecksohn-Schoenheimer, 1938; Griffin et al., 1998; Herrmann et al., 1990; Horb and Thomsen, 1997; Kimmel et al., 1989; Lustig et al., 1996; Russ et al., 2000; Ryan et al., 1996; Stennard et al., 1996; Zhang et al., 1998; Zhang and King, 1996). Although all three proteins contain T box domains and are expressed in the marginal zone of the embryo, previous studies and our present results show that they play different roles in mesoderm induction and patterning. Mis-expression of Xbra in animal pole explants induces expression of the mesodermal markers Xwnt11 and Bix4 but not markers of anterior or dorsal mesoderm such as goosecoid, Pintallavis or chordin. By contrast, mis-expression of either VegT or Eomesodermin is able to induce expression of all these markers. We have used this observation (Fig. 1) as the basis of an in vivo assay to identify determinants of T box specificity.
T box specificity resides in large part in the T box
Our experiments show that all three T box proteins function as activators of transcription (Fig. 2). We have taken advantage of this observation to construct chimeric proteins comprising the Xbra, VegT or Eomesodermin T box fused to the VP16 activation domain. Expression of these constructs in Xenopus embryos reveals that the specificity of the three proteins resides in the T box (Fig. 3).
One significant qualification to this conclusion is that sequences outside of the Xbra T box restrict the inducing activities of VegT and Eomesodermin (Fig. 4). The mechanism by which this occurs is unknown, but it may be significant that full-length Xbra binds DNA rather poorly, while the T box domain alone binds strongly (see below) (Casey et al., 1998).
DNA binding specificity
To investigate the molecular basis of T box specificity, binding site selection experiments were carried out. As described above, after five rounds of selection all three proteins selected predominantly single sites, defined by the core motif TCACACCT. This represents half of the palindromic sequence previously identified by Kispert and Herrmann (Kispert and Herrmann, 1993). There were no dramatic differences between the sequences selected by the three proteins, save the frequent selection by Xbra of a G positioned 5 nucleotides 3′ of the core motif (Fig. 5). The significance of this observation is not clear, although it may represent the first step towards the selection of a palindromic sequence: the →← sequence selected by Xbra contains a G at the same position relative to the first core motif (Fig. 6A). Consistent with this suggestion, we observe that in single Xbra sites such G residues are frequently flanked (in 26% of cases) by two Ts, creating the triplet TGT, which is also present in the palindromic sequence selected by Xbra. In addition, we note that 23% of the G residues are flanked by T and C, giving the triplet TGC. If these observations do provide a clue as to the preference of Xbra for particular DNA sequences, they suggest that the G positioned five nucleotides downstream of the core motif is particularly important, followed by a 3′ T and then a 5′ T. This suggestion does not explain, however, the frequent occurrence (28% of cases) of the triplet CGA; the interaction of Xbra with DNA clearly requires further study.
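The statement that the palindromic site carries the triplet TGT at this position can be verified with a few lines of Python. The sketch below is an illustration rather than the procedure used in this study: it takes the G to lie five nucleotides 3′ (downstream) of the last base of the core motif, and the function names flanking_triplet and tally_triplets are hypothetical helpers introduced for the example. No selected sequences are reproduced here, so only the palindrome from the text is used as input.

```python
# Illustrative sketch: extract the triplet centred on the base that lies
# `offset` nucleotides 3' of the first exact core motif in a sequence.
from collections import Counter

CORE = "TCACACCT"

def flanking_triplet(seq, offset=5):
    """Return the triplet centred `offset` bases 3' of the core, or None."""
    start = seq.find(CORE)
    if start == -1:
        return None                      # no exact core motif present
    centre = start + len(CORE) + offset - 1
    if centre + 1 >= len(seq):
        return None                      # sequence too short to give a triplet
    return seq[centre - 1:centre + 2]

def tally_triplets(seqs):
    """Count the triplets found across a collection of sequences."""
    triplets = (flanking_triplet(s) for s in seqs)
    return Counter(t for t in triplets if t is not None)

# Palindromic site selected by Xbra, as given in the text.
palindrome_triplet = flanking_triplet("TCACACCTAGGTGTGA")   # "TGT"
```

Applied to a set of single-site clones, tally_triplets would give the kind of frequency table from which percentages such as those quoted for TGT, TGC and CGA could be derived.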
A further two rounds of selection resulted in the isolation of a large number of paired T box binding motifs. The results of these experiments are summarised in Table 2 and Table 3, which show that different T box proteins prefer different types of paired motifs and suggest that they bind some sites as dimers and others as monomers. For example, VegT appears to bind the two sites selected by Eomesodermin (→NNNN→ and →NNNNN←) as a monomer, while Eomesodermin appears to bind as a dimer.
These observations provide a basis for T box protein specificity, and it will be of great interest to elucidate the structures of Xbra, VegT and Eomesodermin on their respective sites. We note, however, that the enhancer of no natural T box target gene has yet proved to contain the motifs summarised in Table 3. For example, the enhancers of the Xenopus genes eFGF and Xnr1 contain just the motif TCACACCT (Casey et al., 1998; Hyde and Old, 2000), and although Bix4 contains three tandem motifs TGACACCT, TCACACCT and TCACACGT, the spacings between the motifs are 16 and nine nucleotides respectively (Tada et al., 1998).
T box target genes have also been identified in Ciona intestinalis, where the tropomyosin-like gene responds directly to Brachyury (Di Gregorio and Levine, 1999). Here, three Brachyury recognition sequences have been identified, one of which (Ci-Bra #3) is identical to the sequence identified in the enhancers of Xenopus eFGF and Xnr1. The other two, Ci-Bra Prox and Ci-Bra Dist, comprise two motifs, with the proximal element arranged as inverted repeats and the distal element arranged as tandem repeats. In neither case, however, do the motifs correspond exactly to the sequences isolated in our binding site selection experiments or those of Kispert and Herrmann (Kispert and Herrmann, 1993). Additional experiments are necessary to define the extent to which T box proteins can tolerate departures from the ‘perfect’ sites.
Finally, we note that the properties of the T box proteins Brachyury, TBX1 and TBX2 have recently been studied (Sinha et al., 2000). TBX1, like Brachyury, binds DNA as a dimer, while TBX2 appears to bind the same sequence as a monomer. This observation is reminiscent of the interactions of VegT and Eomesodermin with the →NNNN→ and →NNNNN← sites mentioned above. Also of interest is the fact that TBX2, unlike TBX1 and Brachyury, is a transcriptional repressor. Together with our results, these observations provide further insight into the functional specificities of the T box proteins.
A single amino acid can define the activity of T box proteins
Our data indicate that the different inducing activities of Xbra, VegT and Eomesodermin are mostly defined by their T boxes. Comparison of the presumed protein-DNA contact points of the three proteins, based on the crystal structure of the Xbra T box (Muller and Herrmann, 1997), suggested that lysine 149 of Xbra might be important in defining functional specificity. In support of this idea, mutation of the corresponding asparagine residue in VegT and Eomesodermin to lysine caused the modified proteins to behave more like Xbra, in that they could not induce high levels of Pintallavis or chordin and they could not activate goosecoid at all (Fig. 7C). Interestingly, exchange between a neutral polar residue and a basic amino acid also changes the DNA binding specificity of Drosophila homeodomain proteins (Hanes and Brent, 1989; Treisman et al., 1989). For example, replacing the basic lysine residue at position 9 in the recognition helix of Bicoid with the neutral polar glutamine found in the equivalent position of Antennapedia changes the specificity of Bicoid to that of Antennapedia (Hanes and Brent, 1989).
The mechanism by which a single amino acid substitution might change the specificity of the T box proteins is unclear. This difficulty is compounded because position 149 of Xbra contacts the phosphate backbone of DNA and is not predicted to make a base-specific contact. Indeed, our results show that Xbra, VegT and Eomesodermin select the same core sequence (Fig. 5). One possibility is that position 149 affects the affinity of protein-DNA interactions, but this is unlikely because even the highest levels of Xbra fail to activate anterior markers such as goosecoid (Cunliffe and Smith, 1992; Cunliffe and Smith, 1994; O’Reilly et al., 1995; Tada et al., 1997). Another suggestion is that position 149 of Xbra might alter target specificity through protein-protein interactions, as occurs in Sox proteins (Kamachi et al., 2000) and homeobox proteins (Chariot et al., 1999; Mann, 1999). Consistent with this proposal, it was recently demonstrated that the transcriptional activity of the T box protein Tbr-1 is altered by its association with the guanylate kinase CASK/LIN-2 (Hsueh et al., 2000). Moreover, classical genetic studies carried out on the mouse Brachyury allele TC are consistent with the presence of a Brachyury interacting protein (MacMurray and Shin, 1988). However, no interacting protein has yet been identified for Xbra, VegT or Eomesodermin. We plan to search for such proteins and to carry out structural analyses of T box proteins.
Acknowledgements
This work was supported by the Medical Research Council and the British Heart Foundation. E. S. C. was a Hitching-Elion Fellow. We thank Steve Smerdon for discussions on T box structure, Tim Mohun and Surendra Kotecha for help with binding-site selections, Caroline Hill for advice and help with band-shifts, and Josh Brickman for donation of constructs and help with experimental design. We are also grateful to Bob Duronio for critical comments on the manuscript.