The vertebrate retina is generated by retinal progenitor cells (RPCs), which produce >100 cell types. Although some RPCs produce many cell types, other RPCs produce restricted types of daughter cells, such as a cone photoreceptor and a horizontal cell (HC). We used genome-wide assays of chromatin structure to compare the profiles of a restricted cone/HC RPC and those of other RPCs in chicks. These data nominated regions of regulatory activity, which were tested in tissue, leading to the identification of many cis-regulatory modules (CRMs) active in cone/HC RPCs and developing cones. Two transcription factors, Otx2 and Oc1, were found to bind to many of these CRMs, including those near genes important for cone development and function, and their binding sites were required for activity. We also found that Otx2 has a predicted autoregulatory CRM. These results suggest that Otx2, Oc1 and possibly other Onecut proteins have a broad role in coordinating cone development and function. The many newly discovered CRMs for cones are potentially useful reagents for gene therapy of cone diseases.
During development, retinal progenitor cells (RPCs) divide to give rise to an extremely complex and highly organized tissue. The retina is composed of six major types of neurons that are born in a stereotypical and overlapping order, conserved across vertebrates (Cepko, 2014). How RPCs produce such a diversity is a question of interest, both for our understanding of the development of a complex tissue, and for therapeutic applications. Retinal diseases involving cones are common, with many genetic lesions leading to the dysfunction, and sometimes the death, of these photoreceptors (Campochiaro and Mir, 2018; https://sph.uth.edu/retnet/). As cones are the type of photoreceptor that we use to initiate color and daylight vision, their loss can be devastating. A better understanding of the gene regulatory networks (GRNs) that lead to cone genesis and cone gene regulation may benefit cell or gene therapeutic approaches.
Many RPCs are multipotent, capable of giving rise to combinations of many types of retinal cells, as shown by lineage studies in several species (Livesey and Cepko, 2001). These observations have raised the question of whether all RPCs are equivalent (Cepko, 2014). In mice, RPCs that express the basic helix-loop-helix (bHLH) transcription factor (TF) Olig2 were shown to be terminally dividing, producing only cones and horizontal cells (HCs) embryonically, and only rods and amacrine cells postnatally (Hafler et al., 2012). Zebrafish cones and HCs also were shown to share a common lineage, through live imaging of RPCs expressing a reporter for the cone gene, thyroidhormone receptor beta (thrb) (Suzuki et al., 2013). Similarly, we have previously characterized a cis-regulatory module (CRM) for Thrb, designated CRM1, expressed in chick RPCs biased to the production of HCs and photoreceptors (Emerson et al., 2013; Schick et al., 2019).
We discovered that ThrbCRM1 is directly co-regulated by the TFs Otx2 and Oc1 (also known as Onecut1) (Emerson et al., 2013). Genetic studies in mice showed that Onecut genes (Oc), of which there are three in mice, are required beyond photoreceptors, as Oc1 and Oc2 are involved in retinal ganglion cell and HC development (Sapkota et al., 2014). Null mutations of Otx2 showed that it is required for both rod and cone development in mice (Nishida et al., 2003). Interestingly, Otx2 is also required for bipolar cell (BP) and HC fates (Koike et al., 2007; Nishida et al., 2003; Sato et al., 2007), even though it is not expressed in HCs, and HCs do not share specific features with photoreceptors.
To further investigate the molecular mechanisms that restrict an RPC to the production of cones and HC, we used ThrbCRM1 as an entry point. Genome-wide methods were used to examine the chromatin status and transcriptomes of ThrbCRM1+ RPCs versus those of ThrbCRM1− RPCs. Thousands of DNA sequences were differentially open in ThrbCRM1+ RPCs, with motifs for Otx2 and Oc TFs showing the most differential enrichment within these regions. The activity of many of the predicted CRMs with these TF binding sites (TFBSs) was tested in retinal tissue, which showed them to be active in early cones and/or the cone/HC RPCs. The CRM activity of most of these sequences was found to be dependent on the binding sites of both Otx2 and Oc TFs, and thousands of other predicted binding sites for Otx2 and Oc TFs were validated via chromatin binding assays using antibodies to Otx2 and Oc1. In addition, we searched for the CRMs that regulate Otx2 and Oc1 and found that Otx2 likely regulates itself. These findings show that Otx2 and Oc TFs coordinate the development of cones, likely via the direct regulation of multiple genes important for the development and function of this cell type.
Integrating RNA expression and open chromatin to discover CRMs for cone development
In order to perform a genome-wide search for potential CRMs for developing cones, we used ATAC-seq to profile the open chromatin regions in early embryonic chick retinal tissue. To enrich for RPCs that were producing cones, and for newly postmitotic cells fated to be cones, freshly explanted embryonic day (E) 5 retinas were co-electroporated with the ThrbCRM1-GFP plasmid and the broadly expressed plasmid CAG-mCherry. After 2 days in culture, the electroporated cells were sorted by fluorescence-activated cell sorting (FACS) into ThrbCRM1+/CAG-mCherry+ (referred to as ThrbCRM1+) and ThrbCRM1−/CAG-mCherry+ (referred to as ThrbCRM1−) cell populations. We processed them for ATAC-seq to identify regions of chromatin that were differentially open in ThrbCRM1+ cells relative to those of ThrbCRM1− cells (Fig. 1A). Profiles were highly reproducible among the three and two replicates for the ThrbCRM1+ and ThrbCRM1− cells, respectively, and we combined the replicates for the comparison of their chromatin accessibility.
Overall, open chromatin profiles were highly concordant between ThrbCRM1+ and ThrbCRM1− cells (Fig. 1A; Table S1). We subtracted the aligned ATAC-seq peak reads of ThrbCRM1− from those of ThrbCRM1+ to generate a difference profile (Fig. 1A). Peaks were then called using this profile to nominate regions either enriched in ThrbCRM1+ or ThrbCRM1− cells (Table S1). We intersected these peaks with the original sets of peaks for each condition, to identify them as ThrbPos high, ThrbNeg high or shared peaks (Fig. 1A,B; Table S1). Although ThrbPos high peaks and ThrbNeg high peaks were more open in ThrbCRM1+ and ThrbCRM1− cells, respectively, these regions tended to be accessible in both populations (Fig. 1B).
We compared these ATAC-seq profiles with the RNA-seq data available from the same cell populations (Buenaventura et al., 2018) (Fig. 1A; Table S2). The Binding and Expression Target Analysis (BETA) (Wang et al., 2013) was used to examine the expression of genes within a window of 100 kb of an ATAC-seq peak (Fig. 1C; Table S2). Overall, ThrbPos high peaks were more likely to be open around genes that were upregulated in ThrbCRM1+ cells relative to ThrbCRM1− cells. Conversely, ThrbNeg high peaks were associated with genes that were upregulated in ThrbCRM1− cells. The shared peaks were open around genes that were expressed in both populations at similar levels (Fig. 1C; Table S2).
We then searched for enriched TFBS motifs in differential peaks that might contribute to the regulation of the differentially expressed genes. We identified motifs predicted in both ThrbPos high and ThrbNeg high enriched peaks (Table S3), and filtered out the cognate TFs that were not expressed in these cells (Fig. 1D). Only a few predicted motifs were differentially enriched in each population. Interestingly, motifs for Otx2 and Oc, the TFs regulating Thrb via the ThrbCRM1 enhancer, were among the most differential predicted TFBSs, enriched strongly within ThrbPos high peaks (Fig. 1D). The binding sites for the proneural TF Neurog2 and for the TF Gtf2ird1 were more enriched in ThrbPos high peaks. Gtf2ird1 has been shown to modulate photoreceptor gene expression, to pattern cone opsin expression in cooperation with Thrb and to be crucial for photoreceptor function (Masuda et al., 2014). A motif for the TF Neurod1, important in cone development (Liu et al., 2008) and highly upregulated in ThrbCRM1+ cells, was found in both ThrbPos and ThrbNeg high peaks (Fig. 1D). TFBSs that were enriched in ThrbCRM1− cells included motifs for Sox4, Zic, Sp, Pax2, Nfe2l1, or E2f cell cycle genes. Motifs for the progenitor genes, Vsx2 and Rax, which are expressed at a higher level in ThrbCRM1− cells, were predicted in both cell populations (Fig. 1D).
Identification of Otx2 and Oc1 regulatory elements using ATAC-seq
Multiple differential ATAC-seq peaks were found around genes more highly expressed in the ThrbCRM1+ cells (Fig. 1C). We first compared the open chromatin profiles of the ThrbCRM1+ and ThrbCRM1− cells at the Otx2 and Oc1 loci. We selected peaks of open chromatin near Otx2 that were specifically enriched in ThrbCRM1+ cells (Fig. 1A). To test whether these regions indeed have regulatory activities, we cloned the corresponding regions into the Stagia3 reporter plasmid driving GFP and placental alkaline phosphatase (PLAP) (Billings et al., 2010). The plasmids were electroporated along with a co-electroporation control (CAG-mCherry) into E5 chick retinas, which were then cultured for 2 days. Alkaline phosphatase (AP) staining was performed to assess the activity of the potential CRMs. Two CRMs near the Otx2 gene induced strong AP signal (Fig. 2A) and were designated Otx2 CRM E and F (Figs 1A and 2A).
To assess CRM activity at cellular resolution, GFP expression was examined in tissue sections of electroporated retinas. Otx2 CRM E and F were active in cells on the apical side of the retina, with the morphology and location of developing photoreceptor cells. The majority of these cells also were positive for visinin (72±5% for Otx2 CRM E, n=4, and 69%±15% for Otx2 CRM F, n=5; mean±s.d.) (Fig. 2B,C), a marker of photoreceptors (Hatakenaka et al., 1985; Yamagata et al., 1990) with a slightly later onset of expression than Thrb (Trimarchi et al., 2008). In addition, Otx2 CRM F showed expression in a minor population of cells localized more basally (Fig. 2C), where newborn HCs or cycling RPCs can localize (Baye and Link, 2008; Edqvist and Hallböök, 2004). The majority of the cells expressing GFP from Otx2 CRM E (95±2%, n=3) or CRM F (87±3%, n=3) contained Otx2 protein, as detected using immunohistochemistry (IHC). The most basal Otx2 CRM F-positive cells contained Otx2 protein expressed at lower levels (Fig. 2B,C). Using a Td-Tomato reporter for Otx2 CRM E, we also observed that all Otx2 CRM E-positive cells were positive for Otx2 CRM F (100%, n=2) (Fig. S1A). These observations are consistent with these CRMs acting as enhancers for the Otx2 gene primarily within RPCs and developing photoreceptor cells.
Using deletion analysis, we refined the minimal DNA sequence necessary for Otx2 CRM E and F activity (Fig. S1B). A 161 bp sequence for region E (named Otx2 CRM E2.1) and a sequence of 410 bp for region F (Otx2 CRM F2) were able to drive AP activity (Fig. 2D; Fig. S1B,C). The TRANSFAC prediction algorithm was used to search for potential TFBSs within the Otx2 CRM E2.1 sequence. Four clusters of TFBSs were identified, which were deleted and tested in the reporter assay (Fig. S1D). The deletion of three out of the four regions resulted in a significant decrease in expression compared with the wild-type (WT) sequence (Fig. 2D). One of the regions had a predicted TFBS for Otx2, which lost almost all expression upon deletion (del3) (Fig. 2D). Interestingly, we recently found an enhancer for the murine Otx2 gene, named Otx2 O5, which is active in developing cones (Perez-Cervantes et al., 2020), as well as in mature BPs (Chan et al., 2020). In BPs, it showed a similar autoregulation by Otx2 (Chan et al., 2020). When we blasted the chick Otx2 CRM E2.1 sequence to the mouse genome, it was conserved with the mouse Otx2 O5 sequence, and we found that the crucial TFBS deleted in Otx2 CRM E2.1 (del3) corresponded to the TFBS crucial for the mouse Otx2 O5 activity (Fig. 2E).
Using predictions from the ATAC-seq peaks, we also identified regions with CRM activity at the Oc1 locus (Fig. 3A). Multiple peaks were tested (n=13), with nine of them showing AP activity (Fig. 3B; Fig. S2A). Although Oc1 CRM B, J, K and L have not been previously described as potential CRMs, the other CRMs were recently reported in the chick retina (Patoori et al., 2020). The activity of the regions more open in ThrbCRM1+ cells that showed the strongest AP staining was then examined using GFP expression within tissue sections (Fig. 3C). These CRMs marked cells in various locations along the radial axis, perhaps marking cells at various developmental stages, e.g. possibly leaving M phase and migrating vitreally, as is the case for HCs. Oc1 CRM A- and L-positive cells were mostly found within the basal region of the retina, where HCs are found at this stage (Edqvist and Hallböök, 2004). The activity of the Oc1 CRM A was usually seen in cells expressing the Oc1 protein (77±1%, n=2), and the Lhx1 protein (65±1%, n=2), a marker of HCs (Liu et al., 2000), whereas expression from the Oc1 CRM L colocalized primarily with Oc1 (79±2%, n=2). It also colocalized with Lhx1, but so rarely that quantification was unreliable (n=3). A minority of cells positive for these two Oc1 CRMs had the morphology of early photoreceptors (Fig. 3C). Oc1 CRM B, J and K were active in cells found in the apical region, also resembling photoreceptors. In these cells, we could not detect Oc1 protein (Fig. 3C), consistent with a previous report that the gene is turned off as cells leave the RPC state and become photoreceptors (Wu et al., 2012). We also looked at the cellular activity of the Oc1 CRM G, which showed a similar chromatin accessibility between ThrbCRM1+ and ThrbCRM1− cells. As described recently for an overlapping CRM, Oc1 CRM ACR8 (Patoori et al., 2020), the activity of Oc1 CRM G activity was found excluded from ThrbCRM1+ cells in what could be other types of RPCs (Fig. S2B). In addition, we found expression driven by Oc1 CRM G in HCs and cells morphologically resembling retinal ganglion cells (Fig. S2B). Taken together, this set of CRMs, nominated by ATAC-seq peaks, revealed potential regulatory elements for Otx2 and Oc1, operating at overlapping early stages during the development of cones and HCs.
Identification of CRMs active in developing cones
We then used the differential ATAC-seq profiles to search for CRMs of other genes related to photoreceptor development, which showed enriched expression in the ThrbCRM1+ population (Fig. 4). Choices were made based upon our interest in genes investigated previously in our lab and those of other groups, and/or expression at an early time point in cone development, which might indicate regulation in cells transitioning from the cone/HC RPC state.
We tested several differential ATAC-seq peaks at the Blimp1 locus and three CRMs were found to induce AP activity (Fig. S3A). The chick Blimp1 CRM A peak was able to drive reporter activity and has conservation with the mouse sequence from B108 (Wang et al., 2014), which we found is also active in the chick retina (Fig. S3A). The Blimp1 CRM C also had activity, which was further refined to a fragment of 420 bp, called Blimp1 CRM Cb1 (Fig. S3B). We then looked at the cellular activity of the sequence C and its sub-sequence Cb1, co-electroporating the sequences along with a tdTomato reporter driven by ThrbCRM1 (ThrbCRM1-tdTomato). A strong correlation suggested that both elements were active in cones, as suggested by the presence of visinin in these cells (Fig. 4A; Fig. S3C).
We also identified multiple differential ATAC-seq peaks at the Nr2e3 locus (Fig. S3D). Two of these elements (Nr2e3 CRM C and D) overlapped with recently described potential CRMs (Jean-Charles et al., 2018). Nr2e3 CRM A and F were positive in cones, as shown by their overlap with ThrbCRM1-tdTomato activity and colocalization with visinin protein (Fig. 4A). Nr2e3 CRM A showed broader activity, as it was active in cells negative for ThrbCRM1 and visinin (Fig. 4A).
We tested four ATAC-seq peaks enriched in the ThrbCRM1+ cells that are near the Rbp4 gene, using the AP reporter assay in retinal explants. Three peaks were positive for AP expression (Fig. S3E). Rbp4 CRM A and D led to a strong enrichment of GFP expression in cells of the photoreceptor layer that also labeled with expression of ThrbCRM1-tdTomato and IHC for visinin (Fig. 4A). Cells positive for Rbp4 CRM C were more basal and rarely showed co-localization with visinin (Fig. S3G).
Two regions near Gngt2 were identified as more open in ThrbCRM1+ cells, and were found to drive AP activity (Fig. S3F). A recent study in the mouse retina showed that a reporter for mouse Gngt2 (mGngt2Enh1) showed activity in both cones and rods (Jean-Charles et al., 2018). Gngt2 CRM A in the chick showed conservation with mGngt2Enh1. An overlap between ThrbCRM1-tdTomato and Gngt2 CRM B suggested that the peaks for Gngt2 were active in cells that were cones (Fig. 4A).
Rxrg RNA is also strongly enriched in ThrbCRM1+ cells. Rxrg is a partner of Thrb, and is required for proper cone opsin regulation (Roberts et al., 2005). We identified a potential CRM near the promoter of Rxrg. An enhancer, Rxrg208, active in chick photoreceptors and HCs, was previously identified within this peak (Blixt and Hallböök, 2016) (Fig. 4B). We then asked whether ThrbCRM1 and Rxrg208 were active in the same cells. Two Stagia3 plasmids with these two CRMs were co-electroporated and found to have a strong overlap in their activity, likely in cones and HCs, as well as their progenitor cells (Fig. 4C). Although cells rarely showed expression driven only by ThrbCRM1, a larger number of cells showed expression driven only by Rxrg208. This pattern suggests that these cells might represent early HCs, which may express Rxrg but not Thrb, and/or different subtypes of cones (Enright et al., 2015).
Rxrg and Thrb are part of the same regulatory network
As Rxrg and Thrb CRMs showed activity in an overlapping population, we searched for potential upstream TFs of Rxrg208. TFBSs for both Otx2 and Oc were predicted by TRANSFAC within the Rxrg208 sequence (Fig. 5A). To assess the necessity of the Otx2 and Oc binding sites within the Rxrg208 CRM, we cloned into Stagia3 the region corresponding to the Rxrg208 sequence with or without Oc and Otx2 binding sites (Fig. 5B). Although the WT fragment showed strong AP activity, the absence of the binding site for either Oc or Otx2 led to almost no expression (Fig. 5B). These findings were confirmed by FACS analyses using the GFP readout of the Stagia3 plasmid (Fig. S4A,B). The necessity of these sequences was further tested by mutating the core part of the motif for both TFs and these mutants also exhibited dramatically reduced expression (Fig. 5B). These results suggest that both Rxrg and Thrb might be regulated by the same upstream TFs, which could facilitate their regulation of cone gene expression and patterning, possibly as heterodimers (Onishi et al., 2010; Roberts et al., 2005).
Otx2 and Oc regulate multiple other genes of the GRNs for cones
Because the TFBSs for Otx2 and Oc were enriched within the ATAC-seq ThrbCRM1+ high peaks (Fig. 1D), we asked whether the two TFs might have a broad role in cone development. We analyzed the co-occurrence of the binding sites for both Otx2 and Oc within the differential ATAC-seq peaks genome-wide, and the distance between them. The co-localization of binding sites for both TFs was more common within the ThrbCRM1+ high peaks compared with the ThrbCRM1− high peaks or the shared peaks (Fig. 5C; Fig. S4C,D). We analyzed their co-localization within 100 bp of each other, relative to the closest differentially expressed genes between ThrbCRM1+ and ThrbCRM1− cells, or within a 10 kb or 300 kb distance of their putative targets defined by the BETA analysis (Fig. 1C). Three such differential ATAC-seq peaks associated with the most differentially expressed genes included the CRMs ThrbCRM1, Rxrg208 and a predicted CRM for Sik1. Multiple other candidate CRMs were nominated using this analysis (Table S4). To assess whether these predicted CRMs were active, we tested 12 of these sequences. They were chosen as they were related to retinal development, and/or were associated with differentially expressed genes, and/or had a very short distance between the predicted Otx2 and Oc binding sites. We tested the WT sequence and fragments with a deleted Oc site for their ability to drive AP expression (Fig. 5D). Ten WT sequences showed AP activity. Seven CRMs showed dependence upon the Oc binding site, and three had no reliance upon the Oc binding site (Fig. 5D). We tested the effect of deleting the Otx2 binding site for the fragments showing a dependence on the Oc binding site (Fig. 5E). Compared with the WT sequence (Fig. 5D), we observed an almost total loss of AP expression upon deletion of the Otx2 binding site. FACS analysis also showed reduced expression upon deletion of Oc or Otx2 binding sites for the majority of these elements (Fig. S4E).
As Neurod1 is an important regulator of photoreceptor development, we further analyzed the activity of the Neurod1 CRM A. We co-electroporated Neurod1 CRM A with ThrbCRM1-tdTomato. The activity of Neurod1 CRM A showed a strong co-localization with the activity of ThrbCRM1, likely in cones and HCs (Fig. 5F). In addition, we electroporated the Sall1 CRM A, as Sall1 was shown to be regulated by thyroid hormone and to be expressed in chick cones (Enright et al., 2015; Liu et al., 2007). Sall1 CRM A was active in cones, seen by the morphology of the GFP-positive cells and IHC for the Rxrg protein (Fig. S4F).
To ask whether Otx2 and Oc might regulate the genes identified above, we looked for expression of these potential target genes in cells expressing Otx2 and Oc1. We used single-cell RNA (scRNA-seq) data from the chick retina at a corresponding developmental stage (Ghinia Tegla et al., 2020). We subdivided the cells based upon expression of Otx2 and Oc1, and found that most of the potential target genes with a reasonable detection rate showed expression in at least some of the cells also expressing both Otx2 and Oc1, relative to those cells that did not express Otx2 and Oc1 (Fig. 6).
Together, these data strongly suggest that Otx2 and Oc have a larger role than previously recognized in the regulation of genes in the cone/HC RPC and/or newly postmitotic and differentiating cones and HCs.
Otx2 and Oc1 proteins co-localize on chromatin throughout the genome
In order to test for direct binding by Otx2 and Oc on endogenous sites within chick retinal chromatin, the CUT&RUN assay (Skene and Henikoff, 2017) was carried out using antibodies against Otx2 and Oc1 (Table S5). Antibodies against the repressive histone mark H3K27me3 and non-specific IgG were used as positive and negative controls, respectively (Fig. S5). To validate the CUT&RUN results, a motif analysis using HOMER was carried out for both sets of peaks (Heinz et al., 2010) (Fig. 7). The top predicted motifs for the Otx2 CUT&RUN were for Otx2, GSC and CRX motif (Fig. 7A). These three homeobox TFs share very similar motifs for their binding. Interestingly, motifs for Cux2 and HNF6, which are the expected motifs for Oc, also were found in the most enriched Otx2-bound sequences, further suggesting that Otx2 and Oc are often found together. Other top predicted motifs were for other homeobox TFs Lhx2/3, Nxk6.1 and Isl1, and, interestingly, the bHLH genes Neurod1, Neurog2, Atoh1, Ascl1, Ptf1a and Olig2 (Fig. 7A; Table S6). HOMER de novo predicted motifs confirmed an enrichment for motifs similar to Otx, bHLH and CUT genes (Fig. 7B). In addition, a motif for nuclear receptors, matching Thrb, was predicted (Table S6). When looking at the motifs enriched within the DNA sequences bound by Oc1, Cux2 and HNF6 were the top known predicted motifs (Fig. 7C). Next were the same homeobox TFs detected as enriched in the sites bound by Otx2, which have very similar binding sites as Otx2 (Fig. 7A,C; Table S6), which again suggests that binding sites occupied by Otx2 and Oc are often found together. We could also find enrichment for the bHLHs identified among the Otx2-bound sequences. A major difference between Otx2-bound and Oc1-bound sequences was the higher enrichment for Sox gene motifs for Oc1 (Fig. 7A,C). The de novo motif prediction for Oc1 binding included those related to CUT, Sox, homeoboxes and bHLH genes (Fig. 7D; Table S6).
As predicted from the motif analysis above, Otx2 and Oc1 CUT&RUN peaks showed a very large overlap within the genome (6274 peaks) (Table S5). The list of sites identified as being bound by both Otx2 and Oc1 using CUT&RUN (Table S5) was compared with the list of CRMs suggested to be regulated by these TFs (Table S4). About one-third of them showed actual binding of Otx2 and Oc1. The CUT&RUN binding data and the results from deletion of the binding sites for Otx2 and Oc from the CRM assay (Fig. 5) were compared (Fig. 7E-I; Table 1). The sequence of the ThrbCRM1 element, which is dependent upon both the Otx2 and Oc binding sites, confirmed binding by both Otx2 and Oc1 (Emerson et al., 2013) (Fig. 7E). Otx2 and Oc1 binding also was seen for the CRMs for Rxrg, Sik1, Neurod1 and Esrrg, all of which showed a loss of activity upon deletion of the Otx2 and Oc binding sites (Figs 5B,D,E and 7E). However, some CRMs that partially lost activity upon deletion of the Oc binding site showed no significant enrichment for binding by Oc1, though they did show Otx2 binding (Fig. 7F). For other CRMs, a change in AP activity was not obvious after the deletion of the Oc binding site, though subtle changes cannot be ruled out, but they were bound by both TFs (Figs 5D and 7G). An additional case was shown by the Sall1 CRM A. Although the deletion of Otx2 and Oc predicted binding sites led to a loss of activity (Fig. 5D,E), the binding of these two TFs was not significantly enriched using CUT&RUN (Fig. 7H). Other CRMs that did not show a major change in AP activity when the Oc binding site was deleted, also, predictably, did not show binding by Oc1 (Fig. 7I).
The genome-wide CUT&RUN assay for Otx2 and Oc1 binding expanded considerably our previous list of candidate CRMs with co-occurrence of the TFBSs, defined by known motifs for Otx2 and Oc. When intersecting the binding data with the ATAC-seq ThrbPos high or ThrbNeg high peaks, we identified several additional potential CRMs regulated by Otx2 and Oc at the Thrb, Rxrg, Otx2, Oc1, Sik1, Neurod1 or Sall1 loci, as well as hundreds of other potential CRMs for other genes. The CUT&RUN binding site data also were inspected for the CRMs shown in Figs 2–4 (Fig. S6). All of the regions tested for activity at the Otx2 locus, except for Otx2 CRM G, showed enrichment for Otx2 binding, further suggesting that Otx2 auto-regulates its expression (Fig 2D,E; Fig. S6A) (Chan et al., 2020). Some CRMs for Oc1 also showed Oc1 binding, consistent with auto-regulation (Fig. S6B). CRMs for Rbp4 were bound by Otx2 but not Oc1 (Fig. S6C). Similarly, the Gngt2 CRM A and B showed Otx2 binding only (Fig. S6D). The Blimp1 CRM A and Nr2e3 CRM F were bound by both Otx2 and Oc1 (Fig. S6E,F).
In this study, we explored the molecular underpinnings of the formation of two early retinal cell types, with a focus on cone photoreceptors, a vulnerable cell type that is crucial for color vision. We comprehensively examined the chromatin landscape relative to differential RNA expression, as well as tested many potential CRMs, in cone/HC RPCs and their newly postmitotic progeny. We identified crucial sequences within these CRMs by making mutations of predicted binding motifs for Otx2 and Oc TFs. Binding by these TFs was further confirmed by genome-wide chromatin binding assays. The data from these unbiased approaches show that Otx2 and Oc likely directly regulate a broad range of genes expressed in these early cell types, in many cases acting on sites that are located relatively close together. These data are in keeping with expression analyses and functional tests that show the necessity of Otx2, and in some cases, Oc genes, in human stem cell cultures (Welby et al., 2017), and in mice and chicks (Koike et al., 2007; Nishida et al., 2003; Sapkota and Mu, 2015; Sato et al., 2007) for cone and HC development. The prediction that they directly regulate a broad range of genes important for the development and function of these cell types provides a mechanism for their key roles in development. It is worth noting that Oc2 may contribute as well. Oc2 is expressed in some of the same cells as Oc1 or Otx2 (Ghinia Tegla et al., 2020), and binds to a similar site as Oc1 (Furuno et al., 2008; Harada et al., 1995; Lannoy et al., 1998). Although the CUT&RUN data presented here used an antibody reported to be specific to Oc1 (Klimova et al., 2015), this does not rule out a role for Oc2 as well.
It was interesting to see that the chromatin profiles of ThrbCRM1+ and ThrbCRM1− cells were very similar. The ThrbCRM1− population is presumed to be composed primarily of RPCs and their newly postmitotic daughters, as electroporation preferentially targets these cell types (Matsuda and Cepko, 2004). A fraction of these cells, however, might become ThrbCRM1+, and thus may show some aspects of chromatin structure that overlap with those of ThrbCRM1+ cells. In addition, they might not have activated the ThrbCRM1 reporter at a high enough level to be selected by FACS. High-level expression from this reporter appears to require a multimerized CRM1 sequence (Souferi and Emerson, 2019), which we have noted sometimes suffers deletions during growth of the plasmid in bacteria. The opposite pattern, with peaks that are high in ThrbCRM1− cells also open in ThrbCRM1+ cells, could be due to a lag in the closing of chromatin in certain areas only after cells start to differentiate. Analysis of later time points could address this possibility. Some of the CRMs identified in this study, including those that could serve as negative regulatory elements at some stages/cell types, also could be used to dissect the stages of differentiation of cones and HCs.
Despite the potentially imperfect separation of ThrbCRM1+ and ThrbCRM1− cells, the integration of the differential open chromatin peaks identified by ATAC-seq pointed to potential CRMs for genes preferentially expressed in the ThrbCRM1+ population. This included several new CRMs near the Otx2 and Oc1 genes that complement our recent analyses of their regulation in the mouse retina, where we used non-coding RNA and chromatin profiling to nominate CRMs (Chan et al., 2020; Perez-Cervantes et al., 2020). The ATAC-seq differential peaks also provided for a prediction of TFBS motifs enriched in these CRMs, with experimental tests of activity validating these predictions in most cases. These series of experiments led to the appreciation of a broader role for Otx2 and Oc, potentially as direct regulators of genes involved in early retinal development.
To directly assess the binding of Otx2 and Oc1, a genome-wide CUT&RUN was used. This assay showed that the two TFs indeed often bind within the same regulatory regions. Direct binding of both TFs near multiple genes important in cone development, including Thrb, Sall1, Neurod1 and Rxrg, was found. Interestingly, Rxrg is a well-described partner of Thrb, as the two proteins establish cone opsin patterning, possibly as heterodimers (Onishi et al., 2010; Roberts et al., 2005). We showed that a previously described CRM, Rxrg208, is active in terminally dividing RPCs expressing Thrb. ThrbCRM1 and Rxrg208 were shown to share a much conserved DNA sequence with nearly overlapping binding sites for Otx2 and Oc necessary for their CRM activity. In addition to these two regulators of cone development, a CRM near the Neurod1 gene showed a strong co-localization of its activity with the activity of the ThrbCRM1 and was found to be bound by Otx2 and Oc1. This Neurod1 CRM activity was dependent upon Otx2 and Oc binding sites. Neurod1 is required for the maintenance of expression of Thrb in cones (Liu et al., 2008) and for survival of photoreceptors in mice (Morrow et al., 1999). Furthermore, the motif for Neurod1 binding showed enrichment within the regions bound by Otx2 and Oc1, genome wide, suggesting that Neurod1 likely plays a broad role in photoreceptor development, in collaboration with these homeobox TFs. Additional investigation of functional interactions between Otx2 and Oc TFs, e.g. studies asking whether they bind to each other and/or other TFs, as well as their binding patterns, e.g. the importance of the distance between their binding sites, may lead to a deeper understanding of how they play important roles in the development of multiple, distinct retinal cell types.
We found that Otx2 can bind to several CRMs near the Otx2 gene, and Oc1 similarly can bind to regions near Oc1. The predicted auto-regulation may enable the patterns observed as cells differentiate, where Otx2 stays on and Oc1 goes off in cones, and vice versa in HCs (Bernard et al., 2014; Nishida et al., 2003; Wu et al., 2012). There is likely a role for negative regulation as well. Otx2 auto-regulation appears to be via the CRM E in early chick retina, as we report here, and via Otx2 CRM 05 in mouse BP cells, as we reported recently (Chan et al., 2020), with these two CRMs sharing highly conserved sequences. Otx2 auto-regulation may be a theme for this gene, as it has been proposed in other contexts as well (Matsumoto et al., 2020; Wortham et al., 2014). Otx2 auto-regulation, as well as its direct binding to CRMs near cone genes, might suggest that Otx2 is a terminal selector gene (Hobert and Kratsios, 2019) for photoreceptors. However, the fact that Otx2 is also expressed at a high level in BP cells, and is required for their development, suggest that its role as a terminal selector may require an additional factor(s) for specificity in the genes that it regulates in different cell types.
In addition to the role of Otx2 in directly regulating transcription, it has been proposed to act as a pioneer factor, regulating access to CRMs within chromatin, notably through collaboration with Neurod1 (Boulay et al., 2017; Buecker et al., 2014). Similarly, Onecut TFs have been proposed to act as pioneer factors, as their misexpression led to the generation of neurons from fibroblasts (van der Raadt et al., 2019), as has been shown to follow overexpression of Neurog2 (Zhang et al., 2013). The similar phenotypes of neuron induction following Oc1 and Neurog2, and the collaboration of Otx2 with Neurod1, are interesting findings in light of our CUT&RUN motif analysis, in which binding sites for combinations of these TFs are overrepresented. Moreover, the notion of pioneer activity of Otx2 and Oc factors fits in well with their roles in retinal development. As Oc genes are not expressed in cones, and Otx2 is not expressed in HCs, yet mouse knockouts show a role for both factors in cone and HC development (Nishida et al., 2003; Sapkota et al., 2014; Sato et al., 2007), an early role, perhaps as pioneer factors in RPCs, is supported. As cells exit mitosis to differentiate, the roles of these two TFs may change, and may include some cross-repression (Ghinia Tegla et al., 2020) as well as positive auto-regulation.
Diseases affecting cone photoreceptors, such as retinitis pigmentosa and age-related macular degeneration (AMD), affect the lives of millions of people. The data presented here bring a greater understanding of the processes that might lead to the development of cell-based therapies to treat retinal disease. The engraftment of stem-cell derived cones, the generation of cones from endogenous stem cells, and/or the use of cones for in vitro screens of potential therapies, would all benefit from an efficient production of cones. Cones are also a common target for gene therapy, and vectors that direct expression specifically to cones are beneficial for this approach. The CRMs reported here likely will expand the repertoire of CRMs for cone expression, in terms of levels, timing, and/or specificity. Furthermore, there is quite a bit of heterogeneity among individuals with the same retinal disease gene. As multiple genes characterized in our study have been associated with retinal diseases, their associated CRMs, or the motifs identified here, can form the basis of a study for causal variants that might affect the expression of a nearby disease gene (Cherry et al., 2020).
MATERIALS AND METHODS
Chick embryo procedures were approved by the Institutional Animal Care and Use Committee at Harvard University.
Electroporation & AP staining
E5 chick retinas were dissected and electroporated ex vivo as previously described (Billings et al., 2010; Matsuda and Cepko, 2004), with 5×50 ms 22.5 V pulses and 950 ms intervals, using a NEPA21 type II Nepagene electroporator. The electroporation chamber was modified as previously described (Montana et al., 2011). DNA was diluted in PBS with 6 μg total for the CRM plasmids, and 3-3.75 μg for the control plasmids, in 60 μl. After 2 days in culture in 10% fetal bovine serum (FBS), 45% DMEM, 45% F12 and 100 U/ml penicillin, retinas were fixed at room temperature (RT) for 30 min in 4% formaldehyde and washed three times in PBS. After fixation, at least two biological duplicates were processed for AP staining at RT (Billings et al., 2010), using 20 μl/ml of NBT/BCIP solution in NTM buffer (pH 9.5), and developed for 4-8 h, or processed for IHC. The coordinates (Gallus gallus genome galgal5) or sequences of the regions tested in this study are found in Table S7. Plasmids were constructed using gBlock or PCR to clone the CRM sequences in the Stagia3 backbone (Billings et al., 2010).
IHC and imaging
Fixed retina were frozen in 30% sucrose/PBS as previously described (Cherry et al., 2011) and 20-30 μm sections were prepared for staining (Emerson and Cepko, 2011). Blocking solution was 0.3% Triton X-100 in 1×PBS. Primary antibodies used in this study were: chicken anti-GFP (1:1000, Abcam, AB13970), rabbit anti-mCherry (1:1000, Abcam, 167453), rabbit anti-Otx2 (1:200-400, Proteintech, 13497-1-AP), mouse anti-Rxrg (1:50, Santa Cruz Biotechnology, sc-365252), rabbit anti-Oc1 (1:100, Santa Cruz Biotechnology, sc-13050), mouse anti-Lhx1 [1:30, Developmental Studies Hybridoma Bank (DSHB), 4F2-c], mouse anti-visinin (1:250, DSHB, 7G4), rabbit anti-Blimp1 (1:1000, GenScript, A01647). Retinas were washed with PBS, mounted with Fluoromount-G (SouthernBiotech). Retina explants were imaged on a Leica M165FC microscope. Retinal section images were acquired using a Zeiss LSM780 inverted confocal microscope. Images were processed with ImageJ to adjust brightness and contrast.
FACS sorting for ATAC-seq
E5 retina were electroporated with ThrbCRM1 and CAG-mCherry plasmids and cultured ex vivo for 2 days. Retina were dissociated as previously described (Shekhar et al., 2016) using DAPI as a dead cell stain. Cells from retina not electroporated, or electroporated with only ThrbCRM1 or CAG-mCherry were used to determine background. Cells from two to three retinas were collected and sorted as ThrbCRM1+/mCherry+ and ThrbCRM1−/mCherry+ using a BD FACS Aria machine, and processed for ATAC-seq protocol.
FACS sorting for CRM quantification
Flow cytometry was performed on a Cytek XP11 flow cytometer using its 488 and 561 nm lasers. Cellular debris was gated out from downstream analysis via a FSC-H versus SSC-H plot. Cell doublets and multiplets were subsequently gated out via a FSC-H versus FCS-W plot. Singlet cells were compared between experimental and control conditions to determine the placement of cell populations of interest in fluorescent plots. Cells fluorescing in a particular channel are demonstrated by a shift along the fluorescent axis, compared with non-fluorescing control cells. Gates denoting fluorescently-positive cells were derived from these experimental versus control sample comparisons.
ATAC-seq libraries for each condition were prepared using the standard protocol (Buenrostro et al., 2013). We used ∼50,000 cells per condition, from retina cultured 2 days ex vivo. The first replicate was performed and sequenced separately. The second and third replicates were composed of cells sorted on the same day from the same pool of dissociated retinas, prepared for ATAC-seq and sequenced in parallel.
We performed the CUT&RUN according to the protocol from (Meers et al., 2019) and available at dx.doi.org/10.17504/protocols.io.zcpf2vn. E5 WT retina were dissociated as previously described (Shekhar et al., 2016) to collect ∼1,300,000 cells. After ConA-coated magnetic bead binding to cells, the cells were split into six tubes for antibody incubation at 4°C overnight. Antibodies used were: rabbit anti-H3K27me3 (C36B11) (1:100, Cell Signaling Technology, 9733T), rabbit anti-mouse IgG H&L (1:100, Abcam, ab46540), rabbit anti-Oc1 (1:100, Santa Cruz Biotechnology, sc-13050) and rabbit anti-Otx2 (1:200, Proteintech, 13497-1-AP). The chromatin digestion and release step was following the protocol option 1: Standard CUT&RUN. Each sample was washed in 300 μl Dig-wash buffer, then we added 6 μl of 100 mM CaCl2 for chromatin digestion and release and samples were incubated on ice. After 25 min 100 μl was collected and added to a new tube containing 2× STOP Buffer, 100 μl was removed after 30 min and the remaining 100 μl was digested for 45 min before stopping digestion and proceeding with the protocol. Libraries for cells digested for 25 and 30 min were then prepared using the NEBNext Ultra II DNA Library Prep kit (E7645S; New England Biolabs), following the protocol available at dx.doi.org/10.17504/protocols.io.wvgfe3w, except for samples treated with IgG and H3K27me3 antibodies that were prepared following New England Biolabs protocol, as recommended.
ATAC-seq data analysis
ATAC-seq data sets were aligned to the Gallus gallus genome galgal5 using bowtie2 (Langmead and Salzberg, 2012) with -X 1000. MACS2 (Feng et al., 2012) was used to identify the ATAC-seq enriched regions. Parameters -B --SPMR --nomodel --extsize 146 were used while peak calling. MACS2 bdgcmp (-m subtract) was used to calculate the subtraction between two conditions; Peaks with P-value<=1e−30 that identified by MACS2 bdgpeakcall based on the subtracted ATAC-seq signal between conditions were defined as the differential peaks. This approach was used as we found that some ThrbPos high peaks partially overlapped ThrbNeg high peaks, and within such regions, differential enrichment in sub-regions might not be called differential by MACS2. Preliminary analyses of CRM activity within differential sub-regions of such peaks showed that some of these regions displayed specific CRM activity.
RNA-seq data analysis
Selected peaks activating and repressive function analysis
We used BETA pipeline (Wang et al., 2013) with parameter --d.f. 0.05 to predict different classes of peak activating and repressive function. Regulatory potential for each gene was calculated as . All peaks (k) near the transcription start site (TSS) of the gene (g) within 100 kb are considered. Δ is the exact distance between a binding site and the TSS proportional to 100 kb (Δ=0.1 means the exact distance=10 kb). P-values listed in each graph were calculated by Kolmogorov–Smirnov test to measure the significance of the upregulated genes group or downregulated genes group relative to static genes group.
MDSeqPos (X. Shirley Liu laboratory, Harvard T.H. Chan School of Public Health, MA, USA) was applied to identify binding motifs based on ThrbPos high peaks and ThrbNeg high peaks. All binding sites were trimmed or extended to 600 bp and centered at the center of the peak regions. Motifs were ranked by the z-score that was calculated from MDSeqPos, only the top 70 motifs with expression in at least one condition were collected. The list of motifs identified in each condition can be found in Table S3. The enrichment significance in each class is shown in Fig. 1D. Values shown in Fig. 1D are negative z-score output from MDSeqPos: the higher the value, the more significant the motif enriched, value=0 represents the lack of enrichment.
Motif co-occurrence analysis
FIMO (tool from the MEME suite, version 5.0.3, with parameter --thresh 1e-3 --max-stored-scores 2000000) (Grant et al., 2011) was used to search all hits of Otx2 and Oc across the whole genome. Bedtools (shuffleBed) (Quinlan and Hall, 2010) was used to get the random peaks with the same feature of ATAC-seq peaks from the whole genome. Fisher's-exact test was used to calculate the P-values between ThrbPos/ThrbNeg high peaks and shuffled peaks.
The reads from the CUT&RUN experiments were aligned to the Gallus gallus genome galgal5 using bowtie2 (Langmead and Salzberg, 2012) with the parameters: --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 and processed with SAMtools (Li et al., 2009) to remove duplicates and MT reads. We generated BigWig coverage tracks using deepTools bamCoverage (Ramírez et al. 2016) with the following parameters: -binSize 1 --normalizeUsing RPGC --effectiveGenomeSize 1046932099. BigWig for all datasets were visualized in IGV Genome browser. Peaks were called using MACS2, with the IgG control of the corresponding digestion time as the background, using the parameter −g 1.03e9. Peaks for Otx2 CUT&RUN that were run in parallel with 25 and 30 min digestion were merged, as was done for Oc1 peaks, using the bedtools merge command (Quinlan and Hall, 2010). To compare how many peaks were bound by both Otx2 and Oc1, we used bedtools intersect with parameter −f 0.5 −F 0.5 −e. Motif analyses of the CUT&RUN merged peaks was performed with HOMER (Heinz et al., 2010), using the parameter -size 50 -p 10.
We used single-cell RNA data from the chick retina (GEO Accession number: GSE142244) (Ghinia Tegla et al., 2020) for our analysis. The WT sample, GSM4223661 was used for the analysis. Based on the processed UMI count matrix, the cells were subdivided into Otx2 only, Oc1 only, Otx2 and Oc1 double-positive and double-negative cell populations. Log normalization was applied to the dataset to derive the average expression of each subset.
We thank Bess Miller for her help with the Blimp1 CRM screening, as well as the Microscopy Resources on the North Quad (MicRoN) core, the Department of Immunology's Flow Cytometry Facility, the BWH Flow Cytometry and the Biopolymers Facility cores at Harvard Medical School for their service and support. We thank Nikki Kong for her generous gift of Protein A-MNase. We thank the Cepko and Tabin lab members for valuable discussions and advice, and for sharing reagents.
Conceptualization: N.L., C.C.; Methodology: N.L., S.W., C.C., C.L.; Software: N.L., S.W., C.L.; Validation: N.L., S.W., C.L.; Formal analysis: N.L., S.W., C.L.; Investigation: N.L., M.G., J.C.; Resources: N.L., S.W., C.L.; Data curation: N.L., S.W., C.L.; Writing - original draft: N.L., C.C.; Writing - review & editing: N.L., S.W., C.L., M.G., P.J.P., C.C.; Visualization: N.L., S.W., C.L.; Supervision: N.L., P.J.P., C.C.; Project administration: N.L., C.C.; Funding acquisition: N.L., P.J.P., C.C.
Support was provided by fellowships from the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (151974 and 171521) and the Human Frontier Science Program (to N.L.), the National Institutes of Health (C.C., N.L., C.L.) (EYO 29771 and HD032443) and the Howard Hughes Medical Institute (C.C.). Deposited in PMC for release after 12 months.
The datasets generated during this study are available in the GEO database with the accession numbers GSE151948.
The authors declare no competing or financial interests.