ABSTRACT
The invasive trophoblast cell lineages in rat and human share crucial responsibilities in establishing the uterine-placental interface of the hemochorial placenta. These observations have led to the rat becoming an especially useful animal model for studying hemochorial placentation. However, our understanding of similarities or differences between regulatory mechanisms governing rat and human invasive trophoblast cell populations is limited. In this study, we generated single-nucleus ATAC-seq data from gestation day 15.5 and 19.5 rat uterine-placental interface tissues, and integrated the data with single-cell RNA-seq data generated at the same stages. We determined the chromatin accessibility profiles of invasive trophoblast, natural killer, macrophage, endothelial and smooth muscle cells, and compared invasive trophoblast chromatin accessibility with extravillous trophoblast cell accessibility. In comparing chromatin accessibility profiles between species, we found similarities in patterns of gene regulation and groups of motifs enriched in accessible regions. Finally, we identified a conserved gene regulatory network in invasive trophoblast cells. Our data, findings and analysis will facilitate future studies investigating regulatory mechanisms essential for the invasive trophoblast cell lineage.
INTRODUCTION
Hemochorial placentation is a reproductive strategy used by some mammals, including mouse, rat and human (Roberts et al., 2016). This type of placentation involves establishment of a uterine-placental interface characterized by trophoblast cells of extra-embryonic origin breaching the maternal vasculature (Roberts et al., 2016). Trophoblast cells are the parenchymal cells of the placenta (Red-Horse et al., 2004; Soares et al., 2018; Knöfler et al., 2019). Their origins can be traced to the trophectoderm of the early embryo and the initial cell differentiation event during embryogenesis (Gardner and Beddington, 1988; Rossant, 2001). Trophoblast cells differentiate into a range of specialized lineages (Gardner and Beddington, 1988; Soares et al., 2018; Knöfler et al., 2019). Among the specialized trophoblast cell lineages are invasive trophoblast (generic term) or extravillous trophoblast (EVT, a human/primate-specific term). These cells exit the placenta and enter the uterine compartment where they transform the vasculature and immune environment into a structure that ensures placental and fetal viability and growth (Soares et al., 2018; Knöfler et al., 2019; Turco and Moffett, 2019). Failures in invasive trophoblast/EVT cell differentiation and function result in a range of pregnancy-related diseases such as pre-eclampsia, intrauterine growth restriction and preterm birth (Brosens et al., 2011; 2019). Deep trophoblast cell invasion and uterine transformation are characteristic features of rat and human placentation sites (Pijnenborg and Vercruysse, 2010; Pijnenborg et al., 2011; Soares et al., 2012; Shukla and Soares, 2022). Identification of potential regulatory mechanisms controlling cellular constituents of the rodent and human uterine-placental interface have emerged from single-cell RNA-sequencing (scRNA-seq) (Nelson et al., 2016; Liu et al., 2018; Suryawanshi et al., 2018; Vento-Tormo et al., 2018; Sun et al., 2020; Marsh et al., 2022; Scott et al., 2022). Conserved sets of transcripts have been identified in rat invasive trophoblast and human EVT cells (Scott et al., 2022). These insights have led to the identification of candidate regulators of invasive trophoblast and EVT cell lineages, and dissection of their biological relevance using trophoblast stem (TS) cells and rat models (Muto et al., 2021; Varberg et al., 2021; Kuna et al., 2023). Such experimentation has advanced the field but on its own is an inefficient strategy for defining gene regulatory networks that drive invasive trophoblast/EVT cell lineage development and function.
Gene regulatory networks can be accessed through genome-wide analysis of the chromatin landscape (Ong and Corces, 2012; Tuteja et al., 2014; Yadav et al., 2018; Peñalosa-Ruiz et al., 2019). Indeed, insights into the hierarchical regulation of rodent and human trophoblast cell development have been achieved through deep sequencing of histone modifications defining gene activation and repression states (Rugg-Gunn et al., 2010; Chuong et al., 2013; Tuteja et al., 2016; Schoenfelder et al., 2018; Xu and Kidder, 2018; Kwak et al., 2019; Lee et al., 2019; Starks et al., 2021; Zhang et al., 2021). The integration of transcriptome and chromatin accessibility datasets has also been used as an effective tool to elucidate gene regulatory networks in trophoblast tissue and cells (Nelson et al., 2017; Starks et al., 2019).
In this article, we interrogated the chromatin landscape of invasive trophoblast cells isolated from the uterine-placental interface of the rat using single-nucleus assay for transposase-accessible chromatin-sequencing (snATAC-seq). These datasets were integrated with scRNA-seq datasets from rat and human invasive trophoblast/EVT cells (Scott et al., 2022), as well as ATAC-seq from EVT cells (Varberg et al., 2023 preprint), to identify conserved gene regulatory networks controlling the invasive trophoblast cell lineage.
RESULTS
Identification of chromatin accessibility profiles in cell types of the rat uterine-placental interface
We generated snATAC-seq profiles from gestation day (gd) 15.5 and 19.5 uterine-placental interface tissue of the rat to determine chromatin accessibility of its cellular constituents. These datasets were integrated with scRNA-seq profiles obtained from the same tissues (Scott et al., 2022).
After quality control and preprocessing (Figs S1 and S2), we obtained 25,321 and 14,388 high-quality nuclei in the gd 15.5 and gd 19.5 samples, respectively (Table S1). Next, snATAC-seq data was integrated with scRNA-seq data (Scott et al., 2022) to identify cell populations based on the relationship between accessibility and gene expression profiles (Stuart et al., 2019) (Fig. 1A). Clusters and chromatin accessibility profiles of invasive trophoblast, natural killer, macrophage, endothelial and smooth muscle cells were identified (Table S1).
Chromatin accessibility profiles of cell populations at the uterine-placental interface. (A) UMAP of snATAC-seq profiles at gestation day (gd) 15.5 (n=3 pregnancies) and 19.5 (n=3 pregnancies) showing cell identities obtained by transferring labels from scRNA-seq data. (B) Dot plots showing known markers of cell types generally have higher accessibility within 2000 bp of the transcription start sites (TSS) and in a higher percentage of nuclei than in other cell populations. Dot sizes correspond to the percentage of nuclei in each cell population that were open around the TSS; colors correspond to the levels of predicted gene activity. (C) Stack bar plots showing that cell type-specific open chromatin peaks were most often distal to the TSS. For distribution of distances for each individual cell type, see Fig. S4. (D) Bar plots showing the number of open chromatin peaks specific to a cell population, and the number of nearest genes to cell-specific open chromatin peaks. Cell-specific open chromatin peaks are open chromatin peaks differentially accessible in the cell population compared with all other cell populations [adjusted P≤ 0.05, average log2(fold change)≥log2(1.5)].
Chromatin accessibility profiles of cell populations at the uterine-placental interface. (A) UMAP of snATAC-seq profiles at gestation day (gd) 15.5 (n=3 pregnancies) and 19.5 (n=3 pregnancies) showing cell identities obtained by transferring labels from scRNA-seq data. (B) Dot plots showing known markers of cell types generally have higher accessibility within 2000 bp of the transcription start sites (TSS) and in a higher percentage of nuclei than in other cell populations. Dot sizes correspond to the percentage of nuclei in each cell population that were open around the TSS; colors correspond to the levels of predicted gene activity. (C) Stack bar plots showing that cell type-specific open chromatin peaks were most often distal to the TSS. For distribution of distances for each individual cell type, see Fig. S4. (D) Bar plots showing the number of open chromatin peaks specific to a cell population, and the number of nearest genes to cell-specific open chromatin peaks. Cell-specific open chromatin peaks are open chromatin peaks differentially accessible in the cell population compared with all other cell populations [adjusted P≤ 0.05, average log2(fold change)≥log2(1.5)].
These analyses are based on an assumption that there is a significant correlation between gene expression level (scRNA-seq data) and chromatin accessibility (snATAC-seq data) (Stuart et al., 2019). Therefore, as a quality control step for the snATAC-seq cluster labeling, we calculated the Spearman correlation between gene expression and chromatin accessibility profiles. We obtained moderate but significant correlations (0.44≤ρ≤0.54, P<2.2e-16) in all cell populations (Fig. S3), which agrees with previous studies carried out at both single-cell and tissue levels (Starks et al., 2019; Pervolarakis et al., 2020; Merrill et al., 2022). Moreover, we observed that established marker genes for each cell population are generally more accessible in the respective cell population (Fig. 1B), demonstrating we have obtained high-quality clustering and cluster annotation.
We further performed differential accessibility analysis on both gestation days to identify the most accessible peaks in each cell type (defined as cell type-specific peaks). The distance distribution of cell type-specific peaks to the nearest gene transcription start site (TSS) showed that, in general, most of the cell type-specific peaks are distal to the TSS (>5 kb) (81.65% at gd 15.5; 75.16% at gd 19.5) (Fig. 1C, Fig. S4). Moreover, we observed that the invasive trophoblast cell population had the highest number of cell type-specific peaks of the major cell types analyzed, despite being of lower abundance than some other cell types (Fig. 1D). Related to this, the invasive trophoblast cell population had the most gene-associated accessible chromatin among the cell types identified.
Identification of invasive trophoblast cell-regulated genes using cell type-specific chromatin accessibility profiles
After the observation that invasive trophoblast cells had the most cell type-specific peaks, we next checked the number of peaks associated with each gene at each gestation day. At both gestational days, there were many genes associated with at least two open regions (808 and 349 genes at gd 15.5 and 19.5, respectively) (Fig. 2A). Next, we investigated the differences in expression levels of transcripts linked to two, three, four or five peaks using the average expression level obtained from the scRNA-seq data. In general, we observed an increasing trend of expression level when a transcript is associated with more peaks. Furthermore, we observed that although gd 15.5 expression levels were significantly different as the number of associated peaks increased, at gd 19.5, transcript expression profiles were not significantly different when more peaks were associated with a transcript after a cut-off of 3 (Fig. S5). Therefore, we partitioned transcripts into two groups for the next analyses: at least three peaks or fewer than three peaks. At both gestation days, genes with more than three peaks had significantly higher expression than genes with fewer than three peaks (P=7.029e-09 and 1.374e-08 at gd 15.5 and 19.5, respectively) (Fig. 2B), suggesting that, in general, genes with at least three trophoblast-specific peaks are more active within the cell population and could have important functional roles in trophoblast cells. However, there are notable exceptions, including Prl7b1 (Table S2), which has fewer than three peaks but is expressed specifically and at one of the highest levels in the invasive trophoblast cell lineage.
Analysis of chromatin accessibility profiles can identify regulatory regions for genes defining the invasive trophoblast cell population. (A) Histograms of the number of invasive trophoblast-specific (iTB-specific) peaks per gene showing that many genes had at least one peak. The x-axis shows the number of peaks per gene; the y-axis shows the number of genes. (B) Boxplots of transcript expression associated with iTB-specific peaks showing that genes with at least three peaks had significantly higher expression than genes with fewer than three peaks. Expression is plotted in a log10(average expression+10−5) scale. Statistical analyses were performed using Wilcoxon rank sum tests at a significance level of 0.05. Boxes indicate interquartile range, lines indicate median, and whiskers range from the minimum to the maximum values. (C) Examples of iTB-specific genes with at least three associated peaks at both gd 15.5 and 19.5. For each subplot, the first row is composed of five tracks of normalized accessibility, corresponding to five cell types. The right-most columns show the predicted gene activity using chromatin accessibility within 2000 bp of the TSS. The middle and bottom rows include two tracks corresponding to gene location (middle) and open chromatin peak locations (bottom). norm., normalized. (D) Boxplots of the number of conserved ATAC-seq peaks in EVT cells and rat invasive trophoblast cells. Rat genes with at least three invasive trophoblast cell-specific peaks had significantly more EVT cell ATAC-seq peaks than rat genes with fewer than three invasive trophoblast cell-specific peaks. Boxes indicate interquartile range, lines indicate median, and whiskers range from the minimum to the maximum values. ATAC-seq peaks in EVT cells were obtained from Varberg et al. (2023 preprint). Statistical analyses were performed using Wilcoxon rank sum tests (*P<0.05).
Analysis of chromatin accessibility profiles can identify regulatory regions for genes defining the invasive trophoblast cell population. (A) Histograms of the number of invasive trophoblast-specific (iTB-specific) peaks per gene showing that many genes had at least one peak. The x-axis shows the number of peaks per gene; the y-axis shows the number of genes. (B) Boxplots of transcript expression associated with iTB-specific peaks showing that genes with at least three peaks had significantly higher expression than genes with fewer than three peaks. Expression is plotted in a log10(average expression+10−5) scale. Statistical analyses were performed using Wilcoxon rank sum tests at a significance level of 0.05. Boxes indicate interquartile range, lines indicate median, and whiskers range from the minimum to the maximum values. (C) Examples of iTB-specific genes with at least three associated peaks at both gd 15.5 and 19.5. For each subplot, the first row is composed of five tracks of normalized accessibility, corresponding to five cell types. The right-most columns show the predicted gene activity using chromatin accessibility within 2000 bp of the TSS. The middle and bottom rows include two tracks corresponding to gene location (middle) and open chromatin peak locations (bottom). norm., normalized. (D) Boxplots of the number of conserved ATAC-seq peaks in EVT cells and rat invasive trophoblast cells. Rat genes with at least three invasive trophoblast cell-specific peaks had significantly more EVT cell ATAC-seq peaks than rat genes with fewer than three invasive trophoblast cell-specific peaks. Boxes indicate interquartile range, lines indicate median, and whiskers range from the minimum to the maximum values. ATAC-seq peaks in EVT cells were obtained from Varberg et al. (2023 preprint). Statistical analyses were performed using Wilcoxon rank sum tests (*P<0.05).
In addition, we compared transcripts that have at least three open regions with transcripts that have invasive trophoblast cell cluster-specific expression (invasive trophoblast cell marker transcripts), previously determined from the scRNA-seq data (Scott et al., 2022) at each gestation day. At gd 15.5, 57 of the 274 genes with at least three peaks were also markers of the invasive trophoblast cell cluster (P=5.29e-08); at gd 19.5, 39 of the 103 genes with at least three were markers of the invasive trophoblast cell cluster (P=6.79e-06). These markers included genes with known trophoblast functions [Tfap2c (Auman et al., 2002; Werling and Schorle, 2002; Kuckenberg et al., 2012), Ets2 (Yamamoto et al., 1998) and Cited2 (Withington et al., 2006; Imakawa et al., 2016; Kuna et al., 2023)], and genes known to be prominently expressed in invasive trophoblast cells (Prl5a1; Ain et al., 2003) (Fig. 2C). Of note, although some of these markers (Cited2 and Ets2) have similar activities around their promoter regions in all cell types, they have multiple associating peaks that are specific to the invasive trophoblast cell cluster.
To determine whether transcripts that have multiple associated peaks in rat invasive trophoblast cells also possess multiple associated peaks in human EVT cells, we incorporated open regions (ATAC-seq peaks) identified in EVT cells into our analysis (Varberg et al., 2023 preprint). First, we associated the EVT cell open regions to genes. Then we compared the number of EVT cell peaks associated with genes that have either at least three or fewer than three peaks in rat invasive trophoblast cells (Table S2). We observed that, at both time points, genes with at least three peaks in rat invasive trophoblast cells had significantly more peaks in human EVT cells than genes that had fewer than three peaks in rat invasive trophoblast cells (P<2.2e-16) (Fig. 2D).
Identification of TFs enriched in invasive trophoblast cell-specific peaks
We first defined multiple peak sets that TFs could bind: invasive trophoblast cell-specific peaks identified at both gestation days (named ‘common peaks’, consisting of 1242 peaks) (Table S3); peaks differentially accessible at gd 15.5 (named ‘gd 15.5-specific peaks’, 51 peaks); and peaks differentially accessible at gd 19.5 (named ‘gd 19.5-specific peaks’, 194 peaks) (Table S4). Using gene ontology (GO) analysis, we observed that only common peaks were enriched for relevant biological functions such as ‘cell-cell adhesion’ (FDR=1.16e-04), ‘positive regulation of cell migration’ (FDR=0.012) and ‘female pregnancy’ (FDR=0.017) (Fig. 3A, Table S3), whereas differentially accessible peaks at both stages were not enriched for any processes. This result suggests the following: in the invasive trophoblast cell population, regulatory elements are consistently accessible at gd 15.5 and 19.5 to regulate biologically important genes; using both stages to select common peaks may increase confidence; and stage-specific peaks may be noise. We therefore proceeded with motif enrichment analysis in the common peaks.
Motif analysis identifies transcription factor (TF) combinations regulating invasive trophoblast cell functions. (A) Gene ontology analysis using genes associated with common open chromatin peaks. A term is enriched if its FDR <0.05, its enrichment rate ≥1.5 and the number of observed genes ≥5. Selected terms are shown; for a full list of enriched terms, see Table S3. (B) Representative motifs for enriched TF families found in common open chromatin peaks. Motifs for the top two most highly expressed TFs in each family are shown. If multiple motifs are enriched that correspond to the same TF, then motifs with the highest fold change are shown. See the mapping of motifs to TFs in Table S3. A motif is considered enriched if its hypergeometric adjusted P≤0.05 and it fold change ≥1.5. The P-values were adjusted with the Benjamini-Hochberg procedure. Only motifs corresponding to genes with expression levels of at least 0.5 at both gd 15.5 and gd 19.5 were used in the downstream analysis. (C) Heatmap of hypergeometric adjusted (adj.) P-values showing that some TF family pairs share a significant number of target genes and binding locations. The P-values were adjusted with the Benjamini-Hochberg procedure. Representative motif names (as in B) were used for TF family names. Significance level was 0.05. Blue scale, adj. P-values when testing for significance of shared genes; dark red scale, adj. P-values when testing for significance of shared binding locations. (D) Histogram for number of TF families per common open chromatin peak showing that most open chromatin peaks had at least two TF families predicted to be bound, while there were some common open chromatin peaks with only one TF family predicted to be bound. The x-axis shows the number of TF families per peak; the y-axis shows the number of peaks.
Motif analysis identifies transcription factor (TF) combinations regulating invasive trophoblast cell functions. (A) Gene ontology analysis using genes associated with common open chromatin peaks. A term is enriched if its FDR <0.05, its enrichment rate ≥1.5 and the number of observed genes ≥5. Selected terms are shown; for a full list of enriched terms, see Table S3. (B) Representative motifs for enriched TF families found in common open chromatin peaks. Motifs for the top two most highly expressed TFs in each family are shown. If multiple motifs are enriched that correspond to the same TF, then motifs with the highest fold change are shown. See the mapping of motifs to TFs in Table S3. A motif is considered enriched if its hypergeometric adjusted P≤0.05 and it fold change ≥1.5. The P-values were adjusted with the Benjamini-Hochberg procedure. Only motifs corresponding to genes with expression levels of at least 0.5 at both gd 15.5 and gd 19.5 were used in the downstream analysis. (C) Heatmap of hypergeometric adjusted (adj.) P-values showing that some TF family pairs share a significant number of target genes and binding locations. The P-values were adjusted with the Benjamini-Hochberg procedure. Representative motif names (as in B) were used for TF family names. Significance level was 0.05. Blue scale, adj. P-values when testing for significance of shared genes; dark red scale, adj. P-values when testing for significance of shared binding locations. (D) Histogram for number of TF families per common open chromatin peak showing that most open chromatin peaks had at least two TF families predicted to be bound, while there were some common open chromatin peaks with only one TF family predicted to be bound. The x-axis shows the number of TF families per peak; the y-axis shows the number of peaks.
After the enrichment tests, filtering and TF family grouping, we identified 11 TF families that were enriched in the common peaks (Fig. 3B, Table S3). We observed that some of the TF families have known roles in regulating trophoblast biology. For example, TFAP2C motifs were enriched with the highest fold change in the common peaks. TFAP2C is a member of the AP-2 TF family and is a known regulator of the trophoblast cell lineage in both mouse and human (Kuckenberg et al., 2012; Kaiser et al., 2015; Soares et al., 2018). We further confirmed the enrichment of the TFAP2C binding sites by comparing the rat open regions containing TFAP2C motifs with TFAP2C chromatin immunoprecipitation (ChIP)-seq peaks from differentiated mouse TS cells (Lee et al., 2019). We found that out of the 439 rat peaks with TFAP2C motifs, 208 (47.38%) overlapped TFAP2C peaks in differentiated mouse TS cells, which was significant (P=0.009). Additionally, all 11 TF families enriched in the rat peaks were enriched in EVT cell ATAC-seq peaks (Varberg et al., 2023 preprint) (Table S3). These comparisons provide evidence for the validity of the computationally based binding site predictions.
To determine whether TF functions could be predicted using the binding sites, we carried out functional enrichment analysis on the genes associated with common peaks where the binding sites of TF families were found. We observed four families with at least one term enriched (Table S3), two of which were enriched for important invasive trophoblast functions: ‘NR2F6, Pparg::Rxra’ (thyroid hormone receptor-related factors, RXR-related receptors family and nuclear receptors with C4 zinc-fingers class), which is enriched for ‘positive regulation of cell migration’ and ‘vasculature development’; and ‘TFAP2C’ (the AP-2 family and basic helix-span-helix factors class) enriched for ‘cell-cell adhesion’, ‘positive regulation of cell motility’ and ‘vasculature development’. Many of these observed terms agree with previous findings about roles of the families in trophoblast cell functions (Barak et al., 1999; Auman et al., 2002; Werling and Schorle, 2002).
Next, we investigated which TF families were associated with the same target genes. We observed multiple pairs of TF families that shared a significant number of overlapping target genes, such as: ‘TCF4’ (E2A-related factors family and basic helix-loop-helix factors class) and ‘SNAI1’ (more than three adjacent zinc finger factors family and C2H2 zinc finger factors class) (adjusted P=3.64e-55); ‘JUNB, FOSL2::JUN’ (FOS-related factors, JUN-related factors family and basic leucine zipper factors class) and ‘CREB3, Creb5’ (CREB-related factors family and basic leucine zipper factors class) (adjusted P=9.35e-41); and ‘TFAP2C’ (AP-2 family and basic helix-span-helix factors class) and ‘TCF4’ (E2A-related factors family and basic helix-loop-helix factors class) (adjusted P=7.88e-06) (Fig. 3C, blue scale). Overall, this analysis highlights TF families that share common target genes.
We also checked whether TF family pairs occurred in the same common peaks more than expected by chance. We found six pairs of TF families significantly over-represented together, including ‘TCF4’ (E2A-related factors) and ‘SNAI1’ (more than 3 adjacent zinc finger factors family) (adjusted P=3.09e-58), ‘CREB3, Creb5’ (CREB-related factors family) and ‘JUNB, FOSL2::JUN’ (FOS-related factors and JUN-related factors family) (adjusted P=1.36e-45), and ‘TFAP2C’ (AP-2 family) and ‘TCF4’ (E2A-related factors family) (adjusted P=4.67e-04) (Fig. 3C, dark red scale). Each TF family that is part of the over-represented pairs has been individually connected to the regulation of trophoblast cell function. For example, TCF4 and SNAI1 are regulators of trophoblast cell differentiation and motility (Meinhardt et al., 2014), and trophoblast invasion (E. Davies et al., 2016), respectively. Moreover, most of the common peaks were bound by at least two TF families (Fig. 3D). This analysis suggested that TF families can bind in the same locations to interact and regulate cell type-specific functions. TFs can also bind individually to act in their regulatory roles.
Identification of conserved, invasive trophoblast cell-specific regulatory regions using network analysis
We aimed to create a network that would provide information on TFs that activate gene expression through conserved cis-regulatory regions. We first determined that 264 of the common peaks in rat invasive trophoblast cells were conserved in human EVT based on overlap with EVT ATAC-seq data (Table S5). Of note, a significantly higher proportion of common peaks was conserved compared with gd 19.5-specific peaks (P=1.039e-05), and only 11 gd 15.5-specific peaks were conserved, further justifying the use of common peaks in the network analysis. Next, because of the moderate correlation between ATAC-seq and RNA-seq signal (see ‘Identification of chromatin accessibility profiles in cell types of the rat uterine-placental interface’ section above), we determined whether conserved common peaks also had signal for H3K27ac, a histone modification associated with active enhancers, by using ChIP-Seq data from Varberg et al. (2023 preprint). Indeed, we found there was a significant overlap (P<2.2e-16) of conserved common peaks and EVT H3K27ac peaks (Table S5), indicating that the conserved common peaks are likely to contribute to activating associated gene expression. Finally, we established the network by compiling several datasets: (1) conserved common peaks, (2) motifs enriched within these regions and (3) conserved genes that exhibited invasive trophoblast cell-specific expression, according to the scRNA-seq analysis (Scott et al., 2022), at both gd 15.5 and 19.5. The resulting network had 11 source nodes, corresponding to 11 TF families, and 34 target genes (Fig. 4A, Table S5).
Network analysis predicts candidate genes and their distal regulatory elements that govern invasive trophoblast cell functions. (A) Analysis of a network of TF families and target genes, highlighting candidate genes and their distal regulatory elements underlying invasive trophoblast cell functions. Rectangular nodes: TF families with representative motif names (as in Fig. 3A). Round nodes: target genes. The darker the color, the higher the node in-degree centrality. Directed edges are symbolized by arrows and indicate that peaks with the predicted TF families regulate the target genes. (B) Chromatin accessibility tracks of a candidate invasive trophoblast (iTB) cell-specific distal element associated with the Cited2 gene in the rat genome (rn6). A region of interest is highlighted in light blue. norm., normalized. (C) Locations of the candidate region, ATAC-seq peaks in EVT cells and TFAP2C ChIP-seq peaks in the human genome (hg38). A region of interest is highlighted in light blue.
Network analysis predicts candidate genes and their distal regulatory elements that govern invasive trophoblast cell functions. (A) Analysis of a network of TF families and target genes, highlighting candidate genes and their distal regulatory elements underlying invasive trophoblast cell functions. Rectangular nodes: TF families with representative motif names (as in Fig. 3A). Round nodes: target genes. The darker the color, the higher the node in-degree centrality. Directed edges are symbolized by arrows and indicate that peaks with the predicted TF families regulate the target genes. (B) Chromatin accessibility tracks of a candidate invasive trophoblast (iTB) cell-specific distal element associated with the Cited2 gene in the rat genome (rn6). A region of interest is highlighted in light blue. norm., normalized. (C) Locations of the candidate region, ATAC-seq peaks in EVT cells and TFAP2C ChIP-seq peaks in the human genome (hg38). A region of interest is highlighted in light blue.
In this network, there are six genes with an in-degree centrality of at least five. These six genes were associated with invasive trophoblast cell-specific peaks predicted to be bound by TFs connected to at least five TF families. These genes were Plk2 (linked with five TFs), Scap (linked with five TFs), AABR07027306.1 (PHACTR1 human ortholog, linked with six TFs), Pcdh12 (linked with six TFs), Galnt6 (linked with six TFs) and Col4a1 (linked with seven TFs) (Fig. 4A). Pcdh12, Plk2, Scap and Col4a1 have previously been linked to the regulation of placental development (Ma et al., 2003; Bouillot et al., 2006; 2011; Schenke-Layland et al., 2007; Oefner et al., 2015; Okae et al., 2018; Li et al., 2021). Although Phactr1 and Galnt6 have not been directly implicated in trophoblast cell biology, they have been shown to regulate migration and invasion of cancer cells (Song et al., 2020; Herman et al., 2021; Gao and Zheng, 2022). Despite some known or implicated roles of these genes in placenta, little is known about the mechanisms by which they are regulated in specific cell types, including their associated cis-regulatory elements and the transcription factors that might bind to them. Further analysis of the involvement of these genes in the regulation of the invasive trophoblast cell lineage, as well as the regulatory mechanisms governing these genes in the invasive trophoblast cell population, is merited. Regulatory elements and the enriched motifs associated with these genes as well as all other target genes in the network can be found in Table S5.
Moreover, other target genes in the network and their distal elements could also be important for regulating invasive trophoblast cell functions. For example, Cited2, a gene required for trophoblast cell differentiation, placental development and regulation of invasive trophoblast/EVT cells (Withington et al., 2006; Moreau et al., 2014; Imakawa et al., 2016; Kuna et al., 2023), was predicted to be regulated by a distal peak where TFAP2C and STAT3 motifs were found (Fig. 4A,B). This peak (chr1:12808761-12809434) also overlapped with a TFAP2C ChIP-seq peak from differentiated mouse TS cells (Lee et al., 2019) (Fig. 4C), suggesting that it may be bound in vivo.
Together, the target genes, regulatory elements and TFs we identified will be candidates for future experiments to interrogate gene regulatory networks controlling invasive trophoblast cells.
DISCUSSION
The invasive trophoblast cell lineage is an evolutionary adaptation that facilitates viviparity in mammals possessing hemochorial placentation (Pijnenborg et al., 1981). Invasive trophoblast cells acquire migratory behavior, penetrate the uterine parenchyma and serve a transformative role on cellular constituents, ensuring a successful pregnancy outcome (Soares et al., 2018; Knöfler et al., 2019; Turco and Moffett, 2019). The root cause of many obstetric complications is predicted to be a failure in invasive trophoblast cell-guided uterine transformation (Brosens et al., 2011; 2019). Surprisingly, existing knowledge of gene regulatory networks controlling development and function of the invasive trophoblast cell lineage is modest. In this article, we sought to provide new insights into the regulation of the invasive trophoblast cell lineage. Our efforts focused on the rat, a species possessing deep intrauterine trophoblast cell invasion with similarities to human placentation and amenable to testing hypotheses pertaining to the invasive trophoblast cell lineage in vivo (Soares et al., 2012; Shukla and Soares, 2022). We integrated snATAC-seq and scRNA-seq (Scott et al., 2022) datasets from the rat uterine-placental interface with the goal of gaining insight into gene regulatory networks controlling the invasive trophoblast cell lineage. Chromatin accessibility profiles for each of the cellular constituents of the uterine-placental interface were determined. An in-depth analysis of invasive trophoblast cells led to the identification of invasive trophoblast cell specific genes, TFs and TF target genes. A correlation was established between the presence of invasive trophoblast cell-specific open chromatin and gene expression. Using DNA motif binding enrichment and network analysis, we predicted TF pairs and cis-regulatory elements linked to invasive trophoblast cell genes. The efforts led to the recognition of conservation between rat and human invasive trophoblast cell lineages and predictions of distal regulatory elements within the invasive trophoblast cell lineage.
Our approach of relating open chromatin to gene expression profiles is not perfect. Gene regulatory regions can regulate multiple genes (McLean et al., 2010) and can be located considerable distances from the gene they regulate (Lin et al., 2022). We observed that most open chromatin regions were distal to genes. Moreover, the open chromatin-gene association rule we used, together with the stringent requirement for conserved regulatory regions and genes, contributed to the inference of a relatively small and manageable network of TFs and target genes. This contributed to a straightforward network analysis that enabled the prediction of relevant interactions. Other computational methods, such as co-accessibility analysis, which employs chromatin accessibility profiles to predict interactions of cis-elements (Pliner et al., 2018), represent a complementary approach. Although our network construction method involved using only conserved open regions and conserved target genes, this does not negate the merits of investigating TFs and target genes inferred with species-specific elements.
Candidate TFs driving gene regulation in invasive trophoblast cells were identified through their expression in invasive trophoblast cells and through the presence of corresponding TF DNA-binding motifs associated with invasive trophoblast cell-specific genes. The most striking TF families linked to the invasive trophoblast cell lineage exhibit conservation in human EVT cells (Varberg et al., 2023 preprint) and have been previously implicated in trophoblast cell biology (Rossant et al., 2001; Hemberger et al., 2020). Most interestingly, many of the invasive trophoblast cell relevant TFs are implicated in early phases of trophoblast cell lineage development or the differentiation of other trophoblast cell lineages. For example, mouse mutagenesis has demonstrated indispensable roles for Tfap2c, Cdx2, Ets2 and Pparg in trophoblast cells and placentation that precede the appearance of the invasive trophoblast cell lineage (Chawengsaksophak et al., 1997; 2004; Yamamoto et al., 1998; Barak et al., 1999; Auman et al., 2002; Werling and Schorle, 2002). Some of these TFs were predicted to regulate the same genes based on the motif enrichment analysis, and all of these TFs had a high degree of connectivity with each other in the network we present. Previous studies have determined that TFs can work in combination to regulate trophoblast cell lineages, but different TF partnerships are implicated in the regulation of distinct processes (Latos et al., 2015; Latos and Hemberger, 2016; Hemberger et al., 2020). Re-use of trophoblast lineage associated TFs in the regulation of invasive trophoblast cells is intriguing but creates experimental challenges. Future in vivo investigation will necessitate the establishment of conditional mutagenesis rat models specific to the invasive trophoblast cell lineage. Such efforts will be facilitated by the integration of single-nucleus chromatin accessibility and single-cell gene expression profiles reported here. Unique TF combinations at gene regulatory domains and/or the recruitment of unique sets of co-regulators may prove crucial to invasive trophoblast cell biology.
We identified genes with high in-degree centrality through network analysis, some of which have been suggested to be involved in placental development. However, little is known about their roles and regulatory mechanisms in the invasive trophoblast cell lineage. For example, Plk2−/− mice have a diminished labyrinth zone due to decreased cell proliferation (Ma et al., 2003). Scap knockout in pericytes of the mouse placental labyrinth layer leads to embryonic lethality and abnormal placenta structure (Li et al., 2021). Nevertheless, the specific functions of these genes in invasive trophoblast cells are unknown. The gene candidates, regulatory elements and TFs identified with the network analysis in this article will serve as a useful resource for forming and testing of hypotheses related to the regulation of invasive trophoblast cell lineage.
The uterine-placental tissue used in generating the snATAC-seq and scRNA-seq data contains invasive trophoblast cells that have exited the placenta and entered the uterus, and thus represent a differentiated cell type. We did not observe any evidence for multiple types of differentiated invasive trophoblast cell types, nor did we detect evidence for invasive trophoblast cell progenitors. This latter population of progenitor cells should reside in the junctional zone of the rat placenta or the EVT cell column of the human placenta. Thus, the present analysis is biased towards characterization of a mature invasive trophoblast cell population. Consequently, the invasive trophoblast cell gene signature, including TFs, may best represent requirements for maintenance of the invasive trophoblast cell state. Comparisons of these rat invasive trophoblast cell chromatin and gene expression profiles with human EVT cell populations isolated from first trimester tissues (Liu et al., 2018; Suryawanshi et al., 2018; Vento-Tormo et al., 2018; Sun et al., 2020; Marsh et al., 2022; Scott et al., 2022; Varberg et al., 2023 preprint) or derived from human TS cells (Okae et al., 2018; Varberg et al., 2023 preprint) have some inherent limitations. Elucidation of single-cell multi-omic profiles for the junctional zone will provide valuable information regarding derivation of the invasive trophoblast cell lineage and further insights into conservation of this important developmental process.
The datasets and analyses presented in this article represent a framework for constructing hypotheses relevant to establishing a gene regulatory network that controls the invasive trophoblast cell lineage. A research approach can now proceed involving identification of candidate conserved regulatory pathways, evaluating the importance of the regulators using TS cell models and testing critical hubs within the pathways using relevant in vivo rat models.
MATERIALS AND METHODS
Animals
Holtzman rats were originally purchased from Envigo. Rats were maintained on a 14 h light/10 h dark cycle with open access to food and water. Timed pregnancies were obtained by mating adult males (>10 weeks of age) and adult females (8-12 weeks of age). Pregnancies were confirmed the next morning by the presence of sperm in a saline vaginal lavage and defined as gd 0.5. Protocols for research with animals were approved by the University of Kansas Medical Center (KUMC) Animal Care and Use Committee.
Cell isolation from tissue
Uterine-placental interface tissue (also called metrial glands) was dissected from gd 15.5 (n=3 pregnancies) and 19.5 rat placentation sites (n=3 pregnancies) as previously described (Ain et al., 2006; Scott et al., 2022) and put in ice-cold Hank's balanced salt solution (HBSS). Tissues were minced into fine pieces with a razor blade and digested in Dispase II (1.25 units/ml, D4693, Sigma-Aldrich), 0.4 mg/ml collagenase IV (C5138, Sigma-Aldrich) and DNase I (80 units/ml, D4513, Sigma-Aldrich) in HBSS for 30 min. Red blood cells were lysed using ACK lysis buffer (A10492-01, Thermo Fisher Scientific), rotating at room temperature for 5 min. Samples were washed with HBSS supplemented with 2% fetal bovine serum (FBS, Thermo Fisher Scientific) and DNase1 (Sigma-Aldrich), and passed through a 100 μm cell strainer (100ICS, Midwest Scientific). Following enzymatic digestion, cell debris was removed using MACS Debris Removal Solution (130-109-398, Miltenyi Biotec). Cells were then filtered through a 40 μm cell strainer (40ICS, Midwest Scientific) and cell viability was assessed, which ranged from 90 to 93%.
Nuclei isolation, library preparation and sequencing
Cells were isolated from gd 15.5 and 19.5 uterine-placental interface tissue as described above, and nuclei were isolated from the cell suspension according to the 10X Genomics Nuclei Isolation protocol. Briefly, cells were washed with HBSS supplemented with 2% FBS (Thermo Fisher Scientific) and cell number determined. Approximately 500,000 cells were centrifuged, and 100 µl 10X Genomics Nuclei Isolation Lysis Buffer was added. The suspension was incubated for 3 min, then 10X Genomics Nuclei Isolation Wash Buffer was added. Cells were passed through a 40 µm cell strainer and centrifuged. Cells were resuspended in 50 µl chilled 10X Genomics Nuclei Isolation Buffer. Single nuclei were captured using the Chromium Controller into 10X barcoded gel beads. Libraries were generated using a Chromium Next GEM Single Cell ATAC Library & Gel Bead Kit v1.1 (10X Genomics) and sequenced in a NovaSeq6000 sequencer at the KUMC Genome Sequencing Core.
snATAC-seq preprocessing
Read alignment to the rat genome (Rnor 6.0, Ensembl 98; Cunningham et al., 2019), primary peak calling and feature quantification were performed using Cell Ranger Software (version 4.0.0). Quality control steps and downstream analyses were performed using the R package Signac (version 1.1.1) (Stuart et al., 2019). Unless otherwise reported, default parameters were used. We identified accessible regions using the CallPeaks() function in Signac, which uses model-based analysis for ChIP-seq (MACS) (Zhang et al., 2008). Parameters used for the analyses were nuclei with a total number of fragments in peaks ranging from 1000 to 20,000, greater than 15% of reads in peaks and an enrichment ratio higher than 1.5 at transcription start sites (Fig. S1). We normalized across samples and across peaks using term frequency-inverse document frequency, which is implemented through RunTFIDF() in Seurat. We used method =3, which computes log(term frequency)×log(IDF), due to great sparsity in the feature matrix and strong count outliers (Fig. S2). All features are retained to perform dimension reduction with singular value decomposition (SVD). Normalization with term frequency-inverse document frequency followed by SVD is also known as latent semantic indexing (LSI) (Cusanovich et al., 2015). We also investigated the correlations between sequencing depth and LSI components (using the DepthCor() function) as well as ranking the LSI components using the percentage of variance [using the ElbowPlot() function]. As a result, we kept LSI components 2 to 20 for gd 15.5 replicates, and LSI components 2 to 10 for gd 19.5 replicates (Fig. S2). Replicates for each time point were then merged using the Merge() function in Seurat.
snATAC-seq clustering
To identify cell clusters for each time point, we used K-nearest neighbor (KNN) graphs with retained significant LSI components and the smart local moving algorithm (Waltman and Van Eck, 2013), which was implemented through the Seurat functions FindNeighbors() and FindClusters(). The clusters were then visualized with uniform manifold approximation and projection (UMAP).
scRNA-seq and snATAC-seq integration – label transferring
To transfer cluster labels from our corresponding scRNA-seq data, we used the FindTransferAnchors() and TransferData() functions in the Seurat package (version 4.1.0) (Butler et al., 2018). Briefly, this process uses canonical correlation analysis for initial dimension reduction, then identifies cell neighborhoods with KNNs and mutual nearest neighbors (MNNs). The correspondences between cells were referred to as ‘anchors’. Next, the anchors were given scores and weights to eliminate incorrect correspondences and to define the association strengths between cells and anchors. Finally, anchor classification and anchor weights were used to transfer labels from scRNA-seq to snATAC-seq data.
To check the correlation between snATAC-seq and scRNA-seq profiles in each cell population, we first estimated the chromatin accessibility profiles around transcription start sites, referred to as the gene activity, using the Signac function GeneActivity(). Then Spearman correlation and its statistical significance were calculated using the R function cor.test() (stats package version 4.0.2; http://www.r-project.org).
Analysis of cell population-specific peaks
The FindAllMarkers() function was used with cell identities transferred from scRNA-seq data and the fragment counts in peaks, to compare chromatin accessibility profiles between cell types for each gd. We used a logistic regression framework with a latent variable of the total number of fragments in peaks to account for the difference in sequencing depths. A peak is considered more accessible in a cell population (and hence specific) if it has an adjusted P≤0.05 and an average log2(fold change) ≥log2(1.5).
Rat peaks were associated with the nearest gene (according to the start position) on the same chromosomes using the Signac function ClosestFeature() with the underlying genome annotation from Ensembl 98 (Cunningham et al., 2019). This association rule was also used when the distance distribution of peaks to transcription start sites was calculated with the R package ChIPseeker (Yu et al., 2015). ATAC-seq peaks in EVT cells (Varberg et al., 2023 preprint) were associated with the single nearest genes with the maximum distance of 1000 kb around the TSS using GREAT (Genomic Regions Enrichment of Annotations Tool) (McLean et al., 2010). Rat genes were mapped to their one-to-one human orthologs using gene mapping from Ensembl 98.
To assess changes in the expression level of transcripts with different numbers of associated peaks or differences in the numbers of EVT peaks between two gene groups, we used the Wilcoxon rank sum test, implemented with the R function wilcox.test() (stats package version 4.0.2; http://www.r-project.org). To test the significance of overlap between genes with at least three peaks and invasive trophoblast cell markers, we used the hypergeometric test with the R function phyper() (stats package version 4.0.2; http://www.r-project.org) using options lower.tail=TRUE. In all tests, the significance level used was 0.05.
Common peaks, differentially accessible peaks between gestational days, peak mapping across species and conserved peaks
Common invasive trophoblast cell-specific peaks between the two gd were obtained using bedtools intersect (version 2.27.1) (Quinlan and Hall, 2010). Regions between the two gd were considered common if at least 50% of the base pairs overlapped.
Differential accessibility tests of invasive trophoblast cell-specific peaks between the two gd were carried out using Seurat function FindMarkers() with peaks in the invasive trophoblast cell clusters at each gd and the fragment counts in peaks. A logistic regression framework with a latent variable of the total number of fragments in peaks was used to account for the difference in sequencing depths. A peak was considered differentially accessible at a gd (and hence gd-specific) if it had an adjusted P≤0.05 and an average log2(fold change) ≥log2(1.5).
To analyze peak functions, we associated the peaks with the nearest genes using the Signac function ClosestFeature() (described above). The gene lists were then used to carry out gene ontology analysis with Webgestalt (version 2019) (Liao et al., 2019) with the rat genome. A term was considered enriched if its FDR<0.05, its enrichment rate≥1.5 and the number of observed genes≥5.
To compare peaks across species (rat, mouse and human), all peak sets were converted to human coordinates (hg38) using LiftOver (default settings) (Hinrichs et al., 2006). Bedtools intersect (version 2.27.1) (Quinlan and Hall, 2010) was used to identify conserved peaks, which were defined as peaks that overlapped with ATAC-seq peaks in EVT cells (Varberg et al., 2023 preprint) by at least 1 bp. To compare the proportion of conserved peaks between common and gd 19.5-specific peak sets, we used prop.test() (stats package version 4.0.2; http://www.r-project.org) with option alternative=“greater”.
To compare ATAC-seq peak sets with peaks from H3K27ac data (Varberg et al., 2023 preprint), bedtools intersect (version 2.27.1) (Quinlan and Hall, 2010) was used so that ATAC-seq peaks overlap with H3K27ac peaks by at least 1 bp. Next, the significance of the overlap was determined using Fisher's exact test, with the option alternative=“greater” and a significance level of 0.05.
Motif analysis with common peaks
To identify enriched motifs in common peaks, we used the Homo sapiens, Mus musculus and Rattus norvegicus motif databases from JASPAR (version 2020) (Fornes et al., 2020). A BSgenome object for Rattus norvegicus, which is necessary to add motif information to Seurat objects, was built using the Bsgenome R package (version 1.58.0) (Hervé, 2020) and genome sequences were obtained from Ensembl 98 (Cunningham et al., 2019). We used the gd 19.5 coordinates of the common peak sets as input, then generated a set of 50,000 background sequences with matched length and GC content distribution using the Seurat function MatchRegionStats(). For each motif, we calculated a fold change as the percentage the motif is observed in the input sequences divided by the percentage it is observed in the background. A motif is considered enriched if its hypergeometric adjusted P≤0.05 and fold change ≥1.5. The P-values were adjusted with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
To identify motif groups, we first mapped enriched motifs for all three organisms to their corresponding TFs using TF-motif mapping information from the JASPAR database, then retained only TFs with an expression level of at least 0.5 at both gd using the scRNA-seq data. Next, we grouped TFs according to their protein families, also obtained from the JASPAR database.
To compare the predicted binding sites of the protein TFAP2C with previously published data from Lee et al. (2019), we accessed the TFAP2C ChIP-seq data generated from differentiated TS cells through the GEO ID GSM3019344. A rat peak with TFAP2C motifs was defined to agree with mouse TFAP2C ChIP-seq peaks if they overlapped by at least 1 bp, as assessed with bedtools intersect (version 2.27.1) (Quinlan and Hall, 2010). The significance of the overlap was determined using Fisher's exact test, with the option alternative=“greater” and a significance level of 0.05.
To carry out functional enrichment of target genes of the enriched TF families, we used Webgestalt (version 2019) (Liao et al., 2019) with the rat genome. A term was considered enriched if its FDR<0.05, its enrichment rate≥2 and the number of observed genes≥5.
To test for over-representation of shared genes and shared binding locations, we used hypergeometric tests with the R function phyper() (stats package version 4.0.2; http://www.r-project.org) using options lower.tail=FALSE. Correction for multiple testing was carried out using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). Significance level was set at 0.05.
Network inferences and analyses with conserved common peaks
In our networks, an edge between a TF family and a gene means the gene is the nearest one to conserved common peaks with the enriched motifs of the family. Source nodes in the network were TF families named with representative motifs. Target genes were marker genes of the invasive trophoblast cell clusters at both gd and were conserved in EVT cells according to the scRNA-seq data (Scott et al., 2022). The network was visualized and analyzed with Cytoscape (Shannon et al., 2003).
Acknowledgements
We thank the Research IT group at Iowa State University (http://researchit.las.iastate.edu) for providing servers and IT support, and members of the Tuteja and Soares laboratories for their valuable discussions.
Footnotes
Author contributions
Conceptualization: H.T.H.V., R.L.S., K.I., M.J.S., G.T.; Formal analysis: H.T.H.V., R.L.S., K.I.; Investigation: H.T.H.V., R.L.S., K.I., M.J.S., G.T.; Writing - original draft: H.T.H.V., R.L.S., K.I., M.J.S., G.T.; Writing - review & editing: H.T.H.V., R.L.S., K.I., M.J.S., G.T.; Supervision: M.J.S., G.T.; Funding acquisition: M.J.S., G.T.
Funding
This study was supported by the National Institutes of Health National Research Service (HD104495 to R.L.S.), by the National Institutes of Health (HD020676, ES029280, HD099638 and HD105734 to M.J.S.; HD104033 to M.J.S. and G.T.; and HD096083 to G.T.) and by the Sosland Foundation (M.J.S.). G.T. is a Pew Scholar in the Biomedical Sciences, supported by The Pew Charitable Trusts. The views expressed are those of the author(s) and do not necessarily reflect the views of the funding agencies. Open Access funding provided by Iowa State University of Science and Technology. Deposited in PMC for immediate release.
Data availability
The snATAC-seq datasets we generated have been deposited in GEO under accession number GSE227943. All code used for the analyses are available in Zenodo (doi:10.5281/zenodo.8125124) and at https://github.com/Tuteja-Lab/MetrialGland-snATAC-seq/tree/snATAC_v1.0.0.
References
Competing interests
The authors declare no competing or financial interests.