The morphology of the flowering plant is established during early embryogenesis. In recent years, many studies have focused on transcriptional profiling in plant embryogenesis, but the dynamic landscape of the Arabidopsis thaliana proteome remains elusive. In this study, Arabidopsis embryos at 2/4-cell, 8-cell, 16-cell, 32-cell, globular and heart stages were collected for nanoproteomic analysis. In total, 5386 proteins were identified. Of these, 1051 proteins were universally identified in all developmental stages and a range of 27 to 2154 proteins was found to be stage specific. These proteins could be grouped into eight clusters according to their expression levels. Gene Ontology enrichment analysis showed that genes involved in ribosome biogenesis and auxin-activated signalling were enriched during early embryogenesis, indicating that active translation and auxin signalling are important events in Arabidopsis embryo development. Combining RNA-sequencing data with the proteomics analysis, the correlation between mRNA and protein was evaluated. An overall positive correlation was found between mRNA and protein. This work provides a comprehensive landscape of the Arabidopsis proteome in early embryogenesis. Some important proteins/transcription factors identified through network analysis may serve as potential targets for future investigation.

Flowering plants repeat their diploid phase beginning from double fertilization when one sperm fuses with the central cell to develop into endosperm, while another one fuses with the egg cell to develop into embryo (Dresselhaus et al., 2016; Goldberg et al., 1994). In Arabidopsis, the fertilized egg cell, also called the zygote, asymmetrically divides into an apical cell that serves as the origin of the embryo proper, and a basal cell as the origin of the suspensor. The apical cell undergoes one round of longitudinal division to form the 2-cell embryo proper, and one more longitudinal division to form the 4-cell embryo proper. After that, transverse divisions occur to produce the 8-cell embryo proper. With periclinal divisions, eight outer protodermal cells are produced. This stage is called 16-cell or dermatogen. The ground tissue initials form at the early globular (32-cell) stage and subsequent divisions generate endodermis and cortex at the late globular stage (referred to here as the globular stage), during which the root apical meristem and the shoot apical meristem are initiated. At the heart stage, root apical meristem and shoot apical meristem have been established, and shoot apical meristem is flanked by the cotyledon primordia. Therefore, the heart-stage embryo is bilaterally symmetrical. Cellular development from the zygote to the heart stage is defined as early embryogenesis (also called morphogenesis) (Armenta-Medina and Gillmor, 2019; Dresselhaus and Jurgens, 2021; Dresselhaus et al., 2016; Goldberg et al., 1994; Lau et al., 2012; Palovaara et al., 2016; Yoshida et al., 2014).

High-throughput omics studies have enriched our knowledge of gene expression regulation in plant embryogenesis. Regulation of gene expression may happen at various levels. At the chromatin level, Bouyer et al. (2017) and Kawakatsu et al. (2017) found that DNA methylation changes dynamically and that RNA-directed DNA methylation is crucial to plant embryogenesis. At the transcription level, Nodine and Bartel (2012) unveiled the equal contribution of the maternal and paternal genomes to the transcriptome during the initial stage of embryogenesis. Zhao et al. (2019) described a two-step maternal-to-zygotic transition process and the equal contribution of parental genomes at the elongated zygote stage. In addition, Zhao et al. (2020) demonstrated that equal parental contribution to the transcriptome may also result in different cell fate. By contrast, an earlier study by Autran et al. (2011) showed that the maternal genome is the primary contributor to the transcriptome even at the globular stage, probably as a result of maternal contamination in the sample. In rice, zygotic genome activation was initiated soon after fertilization, although with asymmetric parental contribution (Anderson et al., 2017). Similarly, zygotic genome activation occurs shortly after fertilization in maize (Chen et al., 2017). Using transgenic labelling and affinity purification of nuclei, Palovaara et al. (2017) generated a cell type-specific expression atlas of the early Arabidopsis embryo. Very recently, Kao et al. (2021) described gene expression variation at single-nucleus resolution in Arabidopsis globular stage embryos. Beyond the transcriptome, the dynamics of microRNAs in Arabidopsis embryogenesis (Plotnikova et al., 2019) and a 24-nt siRNA in rice zygotes (Li et al., 2022) have been unveiled. However, at the translation level, the proteome in plant early embryogenesis remains unexplored.
One major challenge for proteome studies in plant early embryogenesis is the difficulty of isolating embryos, which are embedded deeply within the seed coat and surrounded by maternal tissue. Although a method for efficient isolation of Arabidopsis early embryos has been established in recent years (Raissig et al., 2013; Zhou et al., 2019a), other challenges remain because mass spectrometry generally requires a large amount of protein sample for detection. Encouragingly, advances in instruments and improvements in sample preparation techniques have made nanoproteomics and even single-cell proteomics feasible (Labib and Kelley, 2020; Yi et al., 2017). As executors of biological processes, proteins are more biologically relevant to the phenotype than mRNA. The mRNA level does not always accurately reflect the protein level (Fortelny et al., 2017). Researchers have found that gene regulatory networks constructed by co-expression of mRNAs were rather different from those constructed by co-expression of proteins (Walley et al., 2016). Therefore, it is necessary to investigate the dynamic proteome during plant early embryogenesis in order to understand this important process at the translation level.

In this work, we conducted nanoproteomics analysis using Arabidopsis embryos at different stages to uncover dynamic changes in the proteome during early embryo development. Through a combined analysis of transcriptome and proteome data, the relationship between mRNA and protein during plant early embryogenesis is revealed.

Isolation of quality Arabidopsis embryos for nanoproteomics

To prepare the samples, we manually collected Arabidopsis embryos at 2/4-cell, 8-cell, 16-cell, 32-cell, globular and heart stages under the microscope. Stages were classified based on hours after pollination, as well as the embryo morphology (Fig. 1A, Table S1). The absence of contaminating maternal tissue was confirmed using a brightfield microscope (Fig. S1). The viability of embryos was validated by fluorescein diacetate staining (Fig. 1A). Isolated embryos were used to perform nanoproteomics with three biological replicates for each stage; the number of embryos used can be found in Table S1. We adopted the ‘match between runs’ strategy whereby a spectral library was prepared and then nanoproteomics samples were run together with the spectral library sample (Tyanova et al., 2016). There were 11,597 proteins and 92,926 peptides in the spectral library (Fig. S2). To examine the data quality of nanoproteomics samples, principal component analysis (PCA) was conducted (Fig. 1B). Biological replicates of each stage clustered together in the PCA plot, indicating the reliability of our nanoproteomics data. This reliability was further confirmed by Pearson's correlation coefficient (PCC), which was over 0.86 between biological replicates (Fig. S3). In addition, the PCA plot showed distinct developmental trajectories.

Fig. 1.

Overview of the proteome in Arabidopsis early embryogenesis. (A) Confocal images illustrating embryos collected at the 2/4-cell, 8-cell, 16-cell, 32-cell, globular and heart stages. Fluorescein diacetate staining confirmed the viability of embryos. BF, brightfield. Scale bar: 20 µm. (B) PCA illustrating the reproducibility of the proteomics analysis. The same colour represents biological replicates at each stage. Blue arrows illustrate the developmental trajectory. (C) The overlap of proteins that were identified in each stage. The red ellipse represents the common proteins identified among all stages. Red rectangles represent SEPs at each stage. (D) Bar graph showing the number of SEPs identified in C and identified by chance. Red bars represent the number of proteins identified in our experiment and grey bars represent the median of the number of proteins identified by chance (permutation test). The P-values were generated from the permutation test.

Qualitative and quantitative analysis of proteins during Arabidopsis early embryogenesis

Even with limited embryo samples, 7649 protein groups with 28,466 peptides were identified in total. To improve accuracy, only those proteins identified in at least two out of three replicates were selected. In total, 5386 proteins were identified in at least one stage, and 1051 proteins were identified in all stages (Fig. 1C). Detailed information can be found in Table S1 and Fig. S4A. Intensity-based absolute quantification (iBAQ), which considers all the peptides that a gene can theoretically produce, resulting in fewer missing values, was used to quantify the identified proteins (Tyanova et al., 2016). The iBAQ value was normalized by dividing by the total intensity within each sample to avoid the bias caused by different sampling amounts. Consistent with previous research (Mergner et al., 2020), protein abundance spanned a large range, for instance from a minimum intensity of 3.08×10³ to a maximum intensity of 3.03×10⁹ at the 2/4-cell stage (Fig. S4B). Previous research (Bassal et al., 2020) indicated that proteins with a higher number of transcripts and with gene length between ∼1500 bp and ∼3500 bp are most likely to be identified. This was also true for our data (Fig. S4C,D).
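The replicate filter and total-intensity normalization described above can be illustrated with a minimal Python sketch. The function names, the treatment of missing values as None, and the toy iBAQ values are assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
def detected_in_enough_replicates(intensities, min_replicates=2):
    """True if the protein has a non-missing, positive intensity
    in at least `min_replicates` replicates of one stage."""
    return sum(1 for v in intensities if v is not None and v > 0) >= min_replicates


def normalize_by_total(sample):
    """Divide each protein's iBAQ value by the summed iBAQ of the sample."""
    total = sum(sample.values())
    if total == 0:
        raise ValueError("sample has zero total intensity")
    return {protein: value / total for protein, value in sample.items()}


# Hypothetical toy data: protein -> iBAQ values in three replicates of one stage.
stage_replicates = {
    "P1": [4.0e6, 3.5e6, None],   # detected in 2 of 3 replicates: kept
    "P2": [None, None, 2.0e5],    # detected in 1 of 3 replicates: dropped
    "P3": [1.0e6, 1.5e6, 1.2e6],  # detected in 3 of 3 replicates: kept
}

kept = {p: v for p, v in stage_replicates.items() if detected_in_enough_replicates(v)}

# Normalize the first replicate so that its intensities sum to 1.
replicate_1 = {p: v[0] for p, v in kept.items() if v[0] is not None}
normalized = normalize_by_total(replicate_1)
```

After normalization, each value is a fraction of the total sample intensity, which makes samples prepared from different numbers of embryos comparable.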

Different features of stage-exclusive proteins and universal proteins

As shown in Fig. 1C, in total 80, 27, 31, 58, 126 and 2154 proteins were only identified at 2/4-cell, 8-cell, 16-cell, 32-cell, globular and heart stages, respectively (Table S2). We called these proteins ‘stage-exclusive proteins’ (SEPs). We performed a permutation test to estimate whether the number of SEPs in each stage could have been detected by chance. The result showed that the number of SEPs in the heart stage was significant, whereas the number of SEPs in other stages was not (Fig. 1D). Gene Ontology (GO) analysis was conducted to characterize these SEPs further. At the 2/4-cell stage, regulation of chromatin organization (ELF8, VEL1, AT5G50780), lipid modification (LOX2, RHD4, SPP1) and cellular response to DNA-damage stimulus (RPA70B, LINC2, REV7) were enriched among genes encoding SEPs. GO terms enriched for genes encoding SEPs at the 16-cell, 32-cell and globular stages were related to regulation of transport, organelle localization, microtubule organization, etc. Consistent with the phenomenon that embryos turn green at the heart stage, chlorophyll metabolic process (e.g. HEMD, NTRC, SOX, FLU), plastid organization (e.g. EMB93, EMB2458, EMB269) and thylakoid membrane organization (e.g. CAS1, GP ALPHA 1), etc., were enriched for genes encoding heart-stage SEPs. Meanwhile, starch metabolic process (e.g. NTRC, SS2, MFP1) was found to be enriched for genes encoding heart-stage SEPs, corresponding to the need for nutrient storage for seed maturation, and the GO term of embryo development ending in seed dormancy (e.g. EMB2761, EMB2762, ALE2) was also enriched (Fig. 2A).
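The text does not specify the null model behind the permutation test, so the following Python sketch shows only one plausible scheme and should not be read as the procedure used here. It assumes that the stage labels of each protein's detection pattern are shuffled independently, which preserves the number of stages in which each protein is detected, and it compares the observed SEP count with the median of the permuted counts, as in Fig. 1D. All names and the toy detection matrix are hypothetical.

```python
import random
from statistics import median


def count_exclusive(detection, stage):
    """Number of proteins detected in `stage` and in no other stage."""
    return sum(1 for row in detection if row[stage] and sum(row) == 1)


def permutation_test(detection, stage, n_perm=1000, seed=0):
    """Return (observed SEP count, median null count, one-sided p-value).

    Assumed null model: each protein's detection vector is shuffled
    across stages independently of the other proteins.
    """
    rng = random.Random(seed)
    observed = count_exclusive(detection, stage)
    null_counts = []
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in detection]
        null_counts.append(count_exclusive(shuffled, stage))
    exceed = sum(1 for c in null_counts if c >= observed)
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, median(null_counts), p_value


# Hypothetical toy matrix: rows are proteins, columns are six stages (1 = detected).
n_stages = 6


def one_hot(stage):
    return [1 if s == stage else 0 for s in range(n_stages)]


detection = (
    [one_hot(5) for _ in range(30)]      # detected only in the last stage
    + [one_hot(0) for _ in range(2)]     # detected only in the first stage
    + [[1] * n_stages for _ in range(10)]  # detected in every stage
)

last_stage = permutation_test(detection, stage=5)
first_stage = permutation_test(detection, stage=0)
```

Under this assumed null model, a stage with many more exclusive proteins than the permuted median receives a small p-value, whereas a stage with only a few exclusive proteins does not.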

Fig. 2.

The characteristics of proteins identified during Arabidopsis early embryogenesis. (A) GO analysis of SEPs in 2/4-cell, 8-cell, 16-cell, 32-cell, globular and heart stages by Metascape (P<0.01). No enrichment GO terms were identified for 8-cell SEPs. (B) GO network of proteins that were identified in all stages. GO terms with a similarity >0.3 are connected by edges. Terms were selected with the best P-values from each of the 20 clusters [−log10(P-value) are given at the end of selected terms]. Each node represents an enriched term and is coloured according to its cluster ID. Node size is scaled by the number of genes in each term. (C) The percentage of protein intensity for the ten most abundant proteins in each stage. Gene symbols are marked beside the bar and ordered by the percentage of protein intensity from the bottom to the top. All the black gene symbols were annotated as ribosome-related proteins.

By contrast, the universal proteins were more related to maintaining fundamental cellular activities, for example ribonucleoprotein complex biogenesis and assembly, ATP metabolic process, and cellular amino acid biosynthesis. In addition, DNA conformation change (e.g. MCMs, GRPs) was identified as a GO term for universal proteins, indicating that transcriptional regulation may play important roles in plant early embryogenesis (Fig. 2B).

ZAR1 is strongly expressed in early embryogenesis

Interestingly, the ten most abundant proteins represented ∼40% of the identified proteome in the 2/4-cell, 8-cell and 16-cell stages, ∼30% in the 32-cell and globular stages, but only 13.5% in the heart stage (Fig. 2C). ZYGOTIC ARREST 1 (ZAR1) is a plasma membrane-localized leucine-rich repeat receptor-like kinase (LRR-RLK). Homozygous mutation of ZAR1 is embryo lethal, and the mutant zygote fails to properly make the first asymmetric division (Yu et al., 2016). ZAR1 showed the highest abundance in 2/4-cell, 8-cell, 16-cell, 32-cell and globular stages, accounting for 30.74%, 28.21%, 29.17%, 18.35% and 13.15% of the proteome, respectively (Fig. 2C, Fig. S5A). This indicates that ZAR1 may play an important role in Arabidopsis early embryogenesis after the first division of the zygote. High abundance of ZAR1 protein was detected during the 2/4-cell and globular stages, whereas the transcript of ZAR1 dramatically increased at the 8-cell stage. Although the mRNA remained at a relatively high level, the ZAR1 protein decreased sharply at the heart stage (Fig. S5A). The dynamic change of ZAR1 transcript and ZAR1 protein suggested that post-transcriptional and post-translational mechanisms may regulate ZAR1/ZAR1 expression during Arabidopsis embryogenesis.
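The share of the proteome attributed to the most abundant proteins is a simple ratio of intensities. As a minimal sketch, assuming intensities are held in a dictionary keyed by protein name (the function name and the toy values below are hypothetical and are not data from this study):

```python
def top_n_share(intensities, n=10):
    """Return the n most abundant proteins with their fraction of total intensity."""
    total = sum(intensities.values())
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:n]]


# Hypothetical toy proteome: three abundant proteins and 61 background proteins.
proteome = {"ZAR1": 30.0, "RP_a": 5.0, "RP_b": 4.0}
proteome.update({f"background_{i}": 1.0 for i in range(61)})

top3 = top_n_share(proteome, n=3)
combined_share = sum(share for _, share in top3)
```

Because a single dominant protein inflates the combined share, it is useful to report both the per-protein fractions and their sum, as done in Fig. 2C.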

The heterogeneity of ribosomal proteins during early embryogenesis

We then explored the 100 most abundant proteins in each stage. In total, 148 proteins were identified, with 59 (40%) proteins found in all stages (Fig. S5B, Table S3). Previous research showed that ribosomal proteins were most commonly identified and that stem cells retain a high level of ribosome biogenesis activity (Bassal et al., 2020; Saba et al., 2021; Weis et al., 2015). Therefore, it seems reasonable that 105 out of 148 proteins identified in this work were annotated as ribosome-related proteins (Fig. 2C, Table S3). Two distinct clusters were identified in the heatmap, one at higher abundance and another at lower abundance at the heart stage, suggesting heterogeneity of ribosomal proteins during early embryogenesis (Fig. S5C). However, the corresponding transcripts showed higher expression in the 8-cell or 16-cell stages, indicating an asynchrony between transcription and translation across developmental stages (Fig. S5C). For the 100 most abundant transcripts in each stage, 120 were identified in total. In addition, 82 (68%) out of the 120 transcripts identified were commonly found at all stages (Fig. S5E, Table S3), suggesting that highly expressed transcripts were more stable than proteins during Arabidopsis early embryogenesis. Previous research (Mergner et al., 2020) showed a relatively low overlap between the 100 most abundant proteins and transcripts, whereas we found that this overlap was 40-50% in all stages (Fig. S5D).

The dynamics of the proteome during early embryogenesis

To investigate dynamic changes in the proteome during Arabidopsis early embryogenesis, the soft clustering R package Mfuzz (Kumar and Futschik, 2007) was used to generate eight clusters (Fig. 3A); the number of proteins in each cluster is given in Fig. 3B.
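Mfuzz is an R package; the Python sketch below is not its implementation but a minimal NumPy illustration of the fuzzy c-means idea behind soft clustering, in which each profile receives a graded membership in every cluster. The fuzzifier m = 2, the fixed iteration count, the random initialization and the toy profiles are all assumptions; the parameters used in this study are not stated in the text.

```python
import numpy as np


def standardize(profiles):
    """Scale each protein profile to mean 0 and standard deviation 1 across stages."""
    mean = profiles.mean(axis=1, keepdims=True)
    sd = profiles.std(axis=1, ddof=1, keepdims=True)
    return (profiles - mean) / sd


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns (cluster centres, membership matrix)."""
    rng = np.random.default_rng(seed)
    membership = rng.random((data.shape[0], n_clusters))
    membership /= membership.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means of the profiles.
        weights = membership ** m
        centers = (weights.T @ data) / weights.sum(axis=0)[:, None]
        # Memberships are proportional to distance^(-2/(m-1)), normalized per profile.
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))
        membership = inv / inv.sum(axis=1, keepdims=True)
    return centers, membership


# Hypothetical toy profiles over six stages: three rising and three falling.
rising = np.array(
    [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 13], [1, 2, 2, 4, 5, 6]], dtype=float
)
profiles = np.vstack([rising, rising[:, ::-1]])

centers, membership = fuzzy_c_means(standardize(profiles), n_clusters=2)
labels = membership.argmax(axis=1)
```

The membership values play the role of the colour scale in Fig. 3A: a profile close to a cluster centre has a membership near 1 for that cluster.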

Fig. 3.

Dynamics of the proteome during Arabidopsis early embryogenesis. (A) Fuzzy C Means clustering. After excluding SEPs, the remaining normalized 2911 proteins were clustered by Fuzzy C Means. Membership value: red>purple>green>yellow. (B) GO enrichment of the eight clusters shown in A in biological processes (P<0.05). The number of proteins in each cluster is given next to the cluster number.

Based on Fuzzy C-means clustering, GO analysis was carried out to characterize the identified proteins. Each cluster showed distinct features in terms of biological process, molecular function, and cell component (Fig. 3B, Table S4). Arabidopsis embryos produce auxin, which plays crucial roles in the formation of ground and vascular tissue at the 32-cell or globular stages, from the outermost-layer epidermis precursor cells at the 16-cell stage. Consistent with this, plant epidermis morphogenesis was enriched in cluster 7, in which protein abundance increased from the 32-cell stage. The auxin-activated signalling pathway was also enhanced from the 16-cell to the 32-cell stage and an increasing trend was observed in cluster 2, indicating that auxin signalling plays important roles during embryogenesis (Fig. 3B). Some proteins, for example class III HD-ZIP transcription factors related to adaxial-abaxial polarity, were expressed at the globular stage. Similarly, adaxial-abaxial pattern specification was enriched in cluster 4, in which expression of these proteins peaked at the globular stage. Interestingly, the abundance of these adaxial-abaxial-related proteins decreased, although adaxial-abaxial polarity becomes obvious in the heart stage (Fig. 3B). Likewise, the root and shoot apical meristem are initiated before the heart stage, whereas the embryonic meristem initiation was enriched in cluster 3, in which the abundance of related proteins remained constant until they were significantly downregulated at the heart stage (Fig. 3B). Notably, proteins related to the maintenance of shoot apical meristem were enriched in the heart SEPs (Table S2). In addition, proteins in cluster 3 were also associated with mitochondrion organization and chloroplast maturation (Fig. 3B). The observation of enriched GO terms relating to chromatin remodelling in cluster 1, positive regulation of chromatin organization in cluster 4, and chromatin organization in heart SEPs (Fig. 2C) suggested that chromatin was highly dynamic, especially during the globular-to-heart stage transition. Furthermore, ribosome biogenesis was highly enriched in clusters 5 and 7, an indication of active translation during embryo development.

Reshaping of the proteome in subcellular categories during early embryogenesis

Proteins translated by ribosomes are either retained in the cytosol or transported within or outside the cell to exert their functions. To investigate the subcellular location of the proteins identified, we first employed SUBA4 (Heazlewood et al., 2007), a web-based tool, to annotate these proteins. As some proteins had multiple locations, only unambiguously annotated proteins were used for further analysis. The total intensity of proteins in the same category was divided by that of all proteins to obtain the percentage of each subcellular protein. The highest percentage of protein intensity was found in the cytosol, with around 50% of total proteins for all stages. The percentage was higher in 32-cell (59.1%), globular (55.2%) and heart (60.6%) stages than that in 2/4-cell (46.6%), 8-cell (50.7%) and 16-cell (48.6%) stages (Fig. 4A, Table S4). The percentage of protein intensity in mitochondrion was similar for the 2/4-cell and globular stages (∼2-3%), but increased at the heart stage (7.1%), a pattern similar to chloroplast protein. Although a small percentage of protein intensity was detected in the nucleus, there was a growing trend from the 2/4-cell stage (4.4%) to the heart stage (10.7%). Surprisingly, a high percentage of protein intensity was found in the plasma membrane before the heart stage, especially in the 2/4-cell (31.4%), 8-cell (28.5%) and 16-cell (29.4%) stages. Only 1% of plasma membrane protein intensity was found in the heart stage (Fig. 4A, Table S4). The high plasma membrane protein intensity at the 2/4-cell to the globular stage can be explained by the extremely high abundance of ZAR1 protein, which was unambiguously annotated as a plasma membrane-located protein (Figs 2C, 4C, Table S5).
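The percentage of intensity per subcellular category can be sketched in Python as follows. Two points are assumptions rather than statements from the text: a protein counts as unambiguously annotated when it has exactly one listed compartment, and the denominator is the summed intensity of the retained proteins only. The function name and the toy values are hypothetical.

```python
def category_intensity_percent(intensity, location):
    """Percentage of summed intensity per compartment.

    `location` maps protein -> list of annotated compartments; proteins with
    zero or several compartments are skipped (assumed meaning of 'ambiguous').
    """
    totals = {}
    for protein, value in intensity.items():
        compartments = location.get(protein, [])
        if len(compartments) != 1:
            continue
        totals[compartments[0]] = totals.get(compartments[0], 0.0) + value
    retained_total = sum(totals.values())
    return {c: 100.0 * v / retained_total for c, v in totals.items()}


# Hypothetical toy values, not data from this study.
intensity = {"ZAR1": 30.0, "RPL_x": 50.0, "H2A_x": 5.0, "dual_x": 15.0}
location = {
    "ZAR1": ["plasma membrane"],
    "RPL_x": ["cytosol"],
    "H2A_x": ["nucleus"],
    "dual_x": ["cytosol", "nucleus"],  # ambiguous, excluded
}

percent = category_intensity_percent(intensity, location)
```

In this toy case a single abundant plasma membrane protein accounts for the whole plasma membrane share, which mirrors the effect attributed to ZAR1 above.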

Fig. 4.

The dynamics of proteins in subcellular categories. (A) The percentage of total protein intensity of each subcellular category. Proteins' subcellular locations were annotated by SUBA4. Only unambiguously annotated proteins were retained to calculate the percentage. (B) The dynamic change of the number of proteins identified in the cytosol, plasma membrane, nucleus, mitochondrion and plastid. *P<0.05; **P<0.01 (Z-test). (C) The intensity of proteins in plasma membrane and plastid. *P<0.05; **P<0.01 (Mann–Whitney U-test). ZAR1 from the 2/4-cell to the heart stage is marked with red. iBAQ values were normalized to total protein intensity in the sample. The lower edge of the box represents the first quartile (Q1), the horizontal line the median, and the upper edge of the box the third quartile (Q3). The whiskers extend from the edges of box to show the range of the data. They extend no more than 1.5*IQR (IQR=Q3−Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.

However, the percentage of protein number (the number of proteins in each subcellular location out of the total number of proteins) at each stage is different from the percentage of protein intensity. For example, a significant decrease of cytosol protein numbers was found in the heart stage compared with the globular stage. The percentage increased from the 2/4-cell stage to the 16-cell stage in the nucleus. After that, it declined gradually, although a gradual increase was observed from the 16-cell stage to the heart stage in the plastid (Fig. 4B, Fig. 5C). Although the percentage of protein number in the plasma membrane significantly increased in the heart stage, the intensity of plasma membrane proteins became smaller compared with earlier stages (Fig. 4C). The dynamic percentage change of protein intensity and protein number suggests that a reshaping of the proteome likely takes place during plant early embryogenesis (Fig. 4, Fig. S6). In addition, the significant change of proteins in the cytosol, plasma membrane, nucleus and plastid during the globular-to-heart stage transition, together with the separation by PCA of the proteome profile at the heart stage, suggest that a dramatic molecular and cellular change may occur during the globular-to-heart stage transition (Figs 1B, 4B).

Fig. 5.

Identification of TFs and TRs during Arabidopsis early embryogenesis. (A) TFs that were identified at each stage. In total, 143 TFs were identified, of which 16 were identified at all stages (red), 26 were not identified in the heart stage but were identified in some of the other stages (light green), 38 were identified in the heart stage and some of the other stages (purple) and 63 were only identified in the heart stage (khaki). The order of gene symbols is the same as in the heatmap (and the same as in B). Proteins discussed in the text are marked in blue. The iBAQ value was normalized (as in B). (B) TRs that were identified in each stage. In total, 103 TRs were identified, of which ten were identified in all stages (orange), nine were not identified in the heart stage but were identified in some other stages (pink), 27 were identified in the heart stage and some of the other stages (grey) and 57 were only identified in the heart stage (black). (C) The STRING protein-protein interaction network of TFs identified in the heart stage with high confidence. The size of the nodes was scaled by degree and a deeper colour means that the node (protein) connects with more other nodes (same in D). (D) The STRING protein-protein interaction network of TRs identified at the heart stage with high confidence.

Transcriptional regulation during Arabidopsis early embryogenesis

Transcription factors (TFs) and transcription regulators (TRs) are vital to the transcriptional regulation of gene expression. We therefore set out to examine the profiles of TFs and TRs during Arabidopsis early embryogenesis.

In total, 143 TFs were identified. Among them, 16 were universally found in all stages. The well-known BBM (an AP2-domain transcription factor, also named PLT4, a member of the PLT family, whose members specify and maintain the stem cell niche during embryogenesis; Aida et al., 2004) was identified in all stages, whereas PLT1 was identified in the 32-cell and heart stages, and PLT2 was identified in the 2/4-cell and heart stages (Fig. 5A). A recent study described that nucleolar histone deacetylases (HDTs) regulate plant reproductive development and that mutation of HDT1 and HDT2 resulted in defects of embryogenesis at the heart stage (Luo et al., 2022). Coincidentally, high levels of HDA3 (also named HDT1) and HD2C (also named HDT3) were identified in all developmental stages in our work (Fig. 5A). LEAFY COTYLEDON 1 (LEC1), which is necessary and sufficient for embryo processes (Tian et al., 2020), was identified from the 8-cell to the heart stage.

Because a large portion (82%) of TFs were identified in the heart stage, we used these TFs to construct a TF network. LEC1 was found to be the most important in the network, where it directly interacted with many other proteins, such as BBM, NF-YCs and REF6 (Fig. 5C). NF-YB6 (also named LEC1-like, L1L) was also a very important TF in the network, with interacting partners that were similar to those of LEC1. By contrast, EMB2746, a chloroplast-localized RNase J, mutation of which results in defective embryos after the globular stage owing to impaired chloroplast development (Chen et al., 2015), was linked to CPSF30 (also a member of the RNase J family) and ASIL1/ASIL2. Mutations of ASIL1/ASIL2 have been shown to result in early chlorophyll fluorescence at the globular stage, although no morphological change was observed (Willmann et al., 2011).

Ten out of the 103 identified TRs were universally found in all stages (Fig. 5B). However, TRs have been less studied in embryogenesis compared with TFs. The most highly expressed universal TR is AT4G11400, described as ARID/BRIGHT DNA-binding, ELM2 domain, and myb-like DNA-binding domain-containing protein in Araport11 (Cheng et al., 2017). Similarly, over 90% of the TRs were identified in the heart stage. We then constructed a network using the TRs identified in the heart stage (Fig. 5D). Two ARID-HMG proteins (AT1G76110 and AT1G04880), both containing an AT-rich DNA-binding domain (ARID) and high mobility group (HMG) box motif to modulate chromatin structure (Roy et al., 2016), are potential hub proteins because they were frequently connected with other TRs, including the SWI/SNF subfamily (BRM, SYD, CHR23), the ISWI subfamily (CHR11, CHR17), the CHD subfamily (PKL) and the INO80 subfamily (INO80) (Kang et al., 2020) (Fig. 5D). However, their roles in early embryogenesis need to be further investigated.
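Candidate hub proteins in such a network are those with the highest degree, that is, the largest number of distinct interaction partners. A minimal Python sketch of this count is given below; the node names and edges are hypothetical placeholders, not STRING data, and the function name is an assumption.

```python
from collections import Counter


def node_degrees(edges):
    """Degree of each node in an undirected network.

    Duplicate edges (in either direction) and self-loops are ignored.
    """
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    return degree


# Hypothetical placeholder edges.
edges = [
    ("TR_A", "TR_B"),
    ("TR_B", "TR_A"),  # duplicate of the first edge
    ("TR_A", "TR_C"),
    ("TR_A", "TR_D"),
    ("TR_C", "TR_D"),
]

degrees = node_degrees(edges)
hub, hub_degree = degrees.most_common(1)[0]
```

Degree is only one possible measure of importance in a network; it matches the node scaling described for Fig. 5C,D.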

High positive across-gene correlation between mRNA and protein

To unravel the relationship between mRNA and protein in early embryogenesis, RNA-sequencing (RNA-seq) libraries were constructed using the same embryo samples as used for nanoproteomics. The quality of RNA-seq data was confirmed by PCA plot and PCC heatmap (Fig. S3A,B). Genes with transcripts per million (TPM)>1 in three biological replicates in each stage and proteins identified in all stages were used to calculate the across-gene correlation (correlation between mRNA and protein of different genes at the same stage; Buccitelli and Selbach, 2020). Because noise from biological and technical variance may occur in the detection of proteins and mRNAs, we used Spearman's correction to reduce the noise when calculating the correlation between protein and mRNA (Csárdi et al., 2015). The average corrected Spearman correlation coefficient (SCC) was 0.75, with a minimum of 0.710 at the globular stage and a maximum of 0.779 at the 16-cell stage (Fig. 6A,B, Fig. S7A). We considered this a high correlation compared with those observed in other species, such as mouse (<0.2, PCC) (Gao et al., 2017), Drosophila (0.41, SCC) (Becker et al., 2018; Casas-Vila et al., 2017) and Xenopus (0.42, SCC) (Peshkin et al., 2015). Similar to Drosophila embryogenesis, we also observed a trend of a slightly higher transcriptome-proteome correlation when calculating the correlation between earlier-stage mRNAs and later-stage proteins (Fig. 6A,B). For example, the SCC was 0.728 for mRNAs and proteins at the 2/4-cell stage, whereas it was 0.786 for mRNAs at the 2/4-cell stage and proteins at the 32-cell stage. This phenomenon may be partially explained by the delay of protein translation.
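The across-gene calculation can be illustrated with a minimal sketch. The text does not give the exact estimator, so the code below assumes the textbook form of Spearman's correction for attenuation (observed correlation divided by the square root of the product of the replicate reliabilities) and, for brevity, two replicates per measurement rather than three. All function and variable names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


def corrected_spearman(mrna_a, mrna_b, prot_a, prot_b):
    """Across-gene correlation at one stage, corrected for attenuation.

    Each argument holds one replicate's values for the same ordered gene list.
    Assumes both replicate-to-replicate correlations are positive.
    """
    # Observed mRNA-protein correlation: mean over the four replicate pairings.
    cross = [spearman(m, p) for m in (mrna_a, mrna_b) for p in (prot_a, prot_b)]
    observed = sum(cross) / len(cross)
    # Reliability of each measurement: correlation between its replicates.
    reliability = spearman(mrna_a, mrna_b) * spearman(prot_a, prot_b)
    return observed / sqrt(reliability)
```

Because the observed correlation is divided by a reliability term that is at most 1, the corrected value is never smaller in magnitude than the observed one, and with very noisy replicates it can exceed 1.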

Fig. 6.

High positive correlation between mRNA and protein. (A) Heatmap of the across-gene correlation between mRNA and protein; n=1035±5. The correlation was calculated by corrected Spearman's correlation coefficient (SCC) (as in B,C). The red rectangle marks the highest correlation. (B) Scatter plots showing examples of across-gene correlation. Top: mRNA and protein at the same stage; bottom: mRNA and protein at different stages. The coloured bar represents the density of points. (C) Protein and mRNA SCC value distribution for individual genes (n=1023) during embryogenesis. SCCs are ranked from the lowest to the highest in the x-axis. The GO terms for genes with significant positive (SCC>0.5, P<0.05 by two-tailed t-test) and negative (SCC<−0.5, P<0.05 by two-tailed t-test) correlation are given in blue and green text, respectively. (D) Heatmap of the significant positively (left) and negatively (right) correlated genes from C. The values of mRNA and proteins were normalized to their respective z-scores. The order of the mRNA heatmaps corresponds to that of the protein heatmaps.

Multiple factors, e.g. mRNA or protein degradation and transportation, may affect the transcriptome/proteome correlation. Previous research found that the protein/transcript ratio (PTR) of some genes was relatively stable [assessed by median absolute deviation (MAD)] across tissues (Mergner et al., 2020). We then examined whether the transcriptome/proteome correlation differs between genes with stable PTR and genes with unstable PTR. Here, genes with MAD<1 were defined as stable and the rest were defined as unstable. We divided the proteins identified in all stages into these two groups and found that the across-gene correlation of stable genes was significantly higher than that of the unstable genes in most cases (Fig. S7B). This result was in agreement with Mergner et al. (2020), whose data show that the correlation for stable genes and unstable genes was 0.64 and 0.54, respectively (Fig. S7C).
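The stable/unstable split can be sketched as follows. The MAD<1 threshold is taken from the text; computing the MAD on the log2-transformed protein/transcript ratio across stages is an assumption, as are the function names.

```python
from math import log2
from statistics import median


def ptr_mad(protein, mrna):
    """MAD of the log2 protein/transcript ratio of one gene across stages.

    protein and mrna are equally long sequences of positive abundances,
    one value per stage (the log2 scale is an assumption of this sketch).
    """
    log_ptr = [log2(p / m) for p, m in zip(protein, mrna)]
    centre = median(log_ptr)
    return median([abs(v - centre) for v in log_ptr])


def is_stable(protein, mrna, threshold=1.0):
    """Genes with MAD below the threshold are called stable."""
    return ptr_mad(protein, mrna) < threshold
```

For a gene whose ratio is constant across stages the MAD is 0 and the gene is called stable; a gene whose ratio rises fourfold at every stage has a MAD of 3 on the log2 scale and is called unstable.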

Within-gene correlation between mRNA and protein

We also calculated the within-gene correlation (correlation between the protein and transcript of an individual gene across a time series or developmental stages; Buccitelli and Selbach, 2020). To estimate individual gene correlations during Arabidopsis early embryogenesis, we first selected proteins for which the corresponding mRNA had TPM>1 in three biological replicates at all stages. This resulted in 1023 genes in total. We then defined a gene as significantly positively correlated if the SCC was between 0.5 and 1 with P<0.05 by two-tailed t-test. Those with SCC between −1 and −0.5 and with P<0.05 were defined as significantly negatively correlated. The distribution of correlation between mRNA and protein is shown in Fig. 6C. Only 64 genes showed a significant positive correlation, and 40 genes showed a significant negative correlation. As expected, the heatmap of protein and mRNA was in accordance with both positively and negatively correlated genes (Fig. 6D). GO analysis showed that positively correlated genes were mainly enriched in metabolic processes and chromatin assembly, whereas negatively correlated genes were enriched in ribonucleoprotein complex biogenesis and assembly (Fig. 6C, Fig. S8).
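The classification of individual genes can be sketched as below. The SCC cut-offs of ±0.5 and the two-tailed t-test at P<0.05 follow the text. The sketch assumes six stages per gene, so that the test statistic t = r·sqrt((n−2)/(1−r²)) has four degrees of freedom and a two-tailed 5% critical value of 2.776; the function names are hypothetical.

```python
from math import sqrt

# Two-tailed 5% critical value of Student's t with 4 degrees of freedom
# (six developmental stages minus two); an assumption of this sketch.
T_CRIT_DF4 = 2.776


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson coefficient of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def classify_gene(mrna, protein, t_crit=T_CRIT_DF4):
    """Label one gene from its per-stage mRNA and protein values."""
    r = spearman(mrna, protein)
    denom = 1 - r * r
    if denom <= 0:
        significant = True  # perfect monotone relationship
    else:
        t = r * sqrt((len(mrna) - 2) / denom)
        significant = abs(t) > t_crit
    if significant and r >= 0.5:
        return "positive"
    if significant and r <= -0.5:
        return "negative"
    return "not significant"
```

With only six stages, the significance criterion is stricter than the SCC cut-off: under these assumptions an SCC of roughly 0.81 or more in magnitude is needed to reach P<0.05, which is consistent with only a small fraction of the 1023 genes passing the filter.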

In this study, we conducted comprehensive nanoproteomics analysis for Arabidopsis early embryogenesis spanning from the 2/4-cell stage to the heart stage. Owing to the difficulty in isolating embryos, especially at early developmental stages, proteomics for Arabidopsis early embryogenesis has not been reported before. In total, over 5300 proteins were identified and quantified in our study, with ∼20% found to be universal at all stages. However, we noticed that the identified proteins mostly corresponded to genes with higher transcript levels and with lengths between ∼1500 and ∼3500 (Fig. S2). Still, a large portion of the proteome remained unidentified, either because of low protein expression or limitations of the protein extraction techniques. As an alternative to proteomics studies, ribosome profiling can identify transcripts that are being translated (Ingolia et al., 2009). Very recently, this method was adapted to study the single-cell translatome in a human cell line (VanInsberghe et al., 2021). We speculate that single-cell ribosome profiling could be a promising method to unravel a more comprehensive proteome profile in plant early embryogenesis given the advantage of small sample requirements.

In this work, we identified ZAR1 as the most abundant protein from the 2/4-cell to the globular stage. However, the role of ZAR1 in early embryogenesis remains unclear, although it is known that ZAR1 is involved in the first asymmetric division of the zygote (Yu et al., 2016). It is reasonable to speculate that ZAR1 may also play an important role in morphology establishment at later stages of embryogenesis when asymmetric divisions occur. Interestingly, among the 100 most abundant proteins and transcripts, those with adaxial/abaxial pattern specification function (AT2G19730, PGY1, ATL5, STV1, RPS13A and RPS10B) were found at all stages (Table S5). It is important to investigate why these proteins are constantly highly expressed during all stages of embryogenesis. Proteins that are related to ribosomes, ribosome biogenesis and rRNA processing were also highly enriched in the 100 most abundant proteins. However, it was noted that most ribosome-related proteins either declined or increased dramatically at the heart stage (Fig. S5C). For example, the intensity of AT1G26880, annotated as ribosomal protein L34e superfamily protein in Araport11 (Cheng et al., 2017), declined from ∼10^7 to ∼10^4. Another protein, AT2G27720, annotated as 60S acidic ribosomal protein family in Araport11, increased from ∼10^7 to ∼10^8. The dynamic change of highly abundant ribosome-related proteins in early embryogenesis also agreed with the previous finding that ribosome species vary with developmental stage and mutations in ribosomal proteins can cause severe developmental defects (Bassal et al., 2020; Byrne, 2009; Weis et al., 2015). In addition, the dynamics of ribosomal proteins may partly explain the dramatic changes in the proteome during the globular-to-heart transition (Figs 1B, 5). The dramatic decline of ribosomal proteins at the heart stage may be important for earlier stage translation and vice versa.

Previous research found that stem cells are characterized by low rates of global protein synthesis but high ribosome biogenesis in mammals (Saba et al., 2021). As we also observed a high level of ribosome biogenesis in plant early embryos, this raises the question of whether plant embryo cells show similarly low rates of global protein synthesis to mammalian stem cells, and when the rates of protein synthesis start to increase.

Although the mRNA level poorly predicts protein abundance in other species (Casas-Vila et al., 2017; Gao et al., 2017; Peshkin et al., 2015), the correlation between mRNA and protein was relatively high in our data. The mRNA and protein correlation can be affected by many biological factors, such as translation rates, translation rate modulation, modulation of a protein's half-life, protein synthesis delay and protein transport (Liu et al., 2016), as well as technical factors, such as sequencing depth, RNA and protein library preparation and mass spectrometry sensitivity. Unlike the embryos of mouse, Xenopus and Drosophila, the Arabidopsis embryo is surrounded by cuticles, which limit the communication among endosperm, embryo and maternal integuments (Ingram, 2010; Lafon-Placette and Kohler, 2014). In other words, mRNA and protein transport among embryo, maternal integuments and endosperm is limited. Therefore, the transport of mRNA and protein has a limited effect on the correlation of mRNA and protein within the embryo. In addition, the maternal-to-zygotic transition of Arabidopsis occurs at a very early stage (before the first division of the zygote) according to a recent study by Zhao et al. (2019), compared with cycle 1 in mouse, cycle 14 in Drosophila and cycle 13 in Xenopus (Schulz and Harrison, 2019). Therefore, maternal mRNA and protein may affect the correlation of mRNA and protein in these species at early embryo stages. Indeed, the correlation of mRNA and protein in Xenopus and Drosophila at later stages was found to be higher compared with earlier stages (Becker et al., 2018; Peshkin et al., 2015).

In summary, through nanoproteomics analysis, a dynamic landscape of the Arabidopsis proteome during early embryogenesis was profiled. Some important proteins identified in this research could be potential targets for future investigation. The analysis of the correlation between transcripts and proteins will enhance our understanding of the early embryogenesis of flowering plants.

Plant materials

Arabidopsis thaliana Col-0 wild type was used for the proteomics study. Plants were grown in a Panasonic growth chamber (MLR-325) at 22°C with a 16/8 h light/dark cycle. Embryos were isolated as described previously (Raissig et al., 2013; Zhou et al., 2019a). Briefly, flowers were emasculated before opening and manually pollinated after 24 h. Siliques at various developmental stages were collected (Table S1). For samples used for protein library construction, embryos at the heart and later stages were collected. Seeds were scraped off from the septum with needles under a stereoscopic microscope (Olympus SZX16) and transferred to a 1.5 ml tube (Axygen, MCT-150-L-C) containing 50 μl enzyme solution [0.1% cellulase R-10 and 0.08% Macerozyme R-10 (Yakult Pharmaceutical Industry Co.), dissolved in 80 mM D-mannitol (Sigma-Aldrich, M4125) in 10% glycerol]. After enzyme treatment for 15 min at room temperature, seeds were washed twice with wash buffer [80 mM D-mannitol, 10% glycerol (Sangon Biotech, A501745-0500), 0.058% MES (Sigma-Aldrich, M3671) (pH 5.8)], and were resuspended in wash buffer. Next, seeds were gently crushed using a plastic pestle to release the embryo (after the embryo was released from the seeds, the remaining procedure was finished within 2 h). The mixture was transferred onto a sterilized microscope slide. A home-made glass pipette was used to isolate the embryo from the mixture under a Carl Zeiss PALM Inverted Microscope; morphology was used to confirm the developmental stage. Embryos at the same stage were transferred to a 1.5 ml tube containing 100 μl wash buffer and temporarily kept on ice. Collected embryos were washed three times by recollecting embryos under the microscope into 100 μl fresh wash buffer until the solution was clear to prevent contamination from endosperm and maternal tissue. The embryo sample was stored in a −80°C freezer until use.

Nanoproteomics and data processing

Protein extraction and digestion

All nanoproteomics were performed by BGI (Shenzhen, China) microproteomic technology (https://www.bgi.com/global/mass-spec-services/nanoproteomics/). For protein library construction, lysis buffer (7 M urea, 2 M thiourea, 0.2% SDS, 20 mM Tris-HCl, pH 8.0) was added to the sample for sonication. The lysed sample was centrifuged (25,000 g for 15 min at 4°C) to obtain the supernatant for reduction and alkylation. Protein extract quality control was confirmed by SDS-PAGE. For protein identification, 50 μg of protein was dissolved in solution (7 M urea, 2 M thiourea, 20 mM Tris-HCl, pH 8.0) for trypsin digestion. After digestion, samples were desalted and separated into 20 components with high-pH reversed-phase (RP) chromatography. For micro-samples, lysis buffer was added to the sample before sonication. Reduction and alkylation were then performed, followed by trypsin digestion.

Mass spectrometry analysis

Trypsin-digested samples were separated by chromatography using a Thermo UltiMate 3000 UHPLC. The sample was first injected into the trap column for enrichment and desalting, then separated with a self-packed C18 column at a flow rate of 500 nl/min. The peptides separated by the liquid phase were ionized by the nanoESI source and entered the tandem mass spectrometer Orbitrap Fusion™ Lumos™ Tribrid™ (Thermo Fisher Scientific) for data-dependent acquisition mode detection.

Protein identification and quantification

The raw data of micro-samples and library samples were subjected to MaxQuant's (v.1.5.3.30) (Cox and Mann, 2008) integrated Andromeda engine with ‘match between runs’ mode for identification. Parameters were set as follows: enzyme was set as trypsin; minimal peptide length was set as 7; PSM-level FDR was set as 0.01; protein-level FDR was set as 0.01; match between runs was chosen; iBAQ was chosen; LFQ was chosen; fixed modification was carbamidomethyl (C); variable modifications were oxidation (M), acetyl (Protein N-term), deamidation (NQ) and Gln->pyro-Glu; and the search database was Araport11_genes.201606.pep.fasta, which was downloaded from TAIR.

Data processing

The MaxQuant output table was filtered for contaminants and reversed sequences. Protein intensity was normalized by dividing by the sum of total intensity in each sample and multiplying by 10e9 to avoid small values. The normalized values were log2-transformed for further analysis. Only proteins with at least one ‘Razor+unique peptides’ identified in two out of three replicates were regarded as identified.
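The normalization and identification rule described above can be sketched as follows. The scaling constant is written exactly as in the text; the data layout (one dictionary of raw intensities per sample, one list of peptide counts per protein and stage) and the function names are assumptions made for illustration.

```python
from math import log2

SCALE = 10e9  # scaling constant as written in the text


def normalise(intensities, scale=SCALE):
    """Total-intensity normalization followed by log2 transformation.

    intensities maps protein ID -> raw intensity in one sample.
    Proteins with zero intensity are skipped because log2(0) is undefined.
    """
    total = sum(intensities.values())
    return {p: log2(v / total * scale) for p, v in intensities.items() if v > 0}


def is_identified(peptide_counts, min_peptides=1, min_replicates=2):
    """Apply the 'two out of three replicates' rule for one protein and stage.

    peptide_counts holds the 'Razor+unique peptides' count in each replicate.
    """
    passing = sum(1 for c in peptide_counts if c >= min_peptides)
    return passing >= min_replicates
```

Dividing by the per-sample total makes intensities comparable between samples that differ in the amount of loaded material, and the constant factor only shifts all log2 values by the same amount.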

RNA-seq and data processing

Library preparation

mRNA was isolated using the Dynabeads mRNA DIRECT™ Micro Kit (Life Technologies) according to the manufacturer's instructions. cDNA synthesis and amplification were performed using a SMARTer Ultra Low RNA Kit for Illumina Sequencing (Clontech). cDNA was amplified for 9-16 cycles of long-distance PCR. The amplified cDNA was purified using VAHTS DNA Clean Beads (Vazyme, N411-01). RNA-seq libraries were prepared using VAHTS Universal DNA Library Prep Kit for Illumina V3 (Vazyme, N607-01, China) according to the manufacturer's instructions and sequenced on an Illumina HiSeq X Ten system.

Data processing

Raw RNA-seq reads were trimmed to remove poor-quality base calls and adaptor contamination using Trim Galore (version 0.6.6) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with default parameters. Trimmed RNA-seq reads were mapped to the Araport11 (Kawakatsu et al., 2016) transcriptome with Kallisto (v0.44.0) (Bray et al., 2016) using default parameters.

Data analysis

PCA plots

For proteomics PCA analysis, the R package pcaMethods (Stacklies et al., 2007) was used with missing values less than 20%. ggplot2 was used for visualization (https://ggplot2.tidyverse.org/). For RNA-seq, the plotPCA function of the R package DESeq2 was used (Love et al., 2014).

Permutation test

There were 18 samples in our experiment. We randomly assigned three samples to each stage without replacement and then the number of stage-exclusive proteins was recalculated. This process was repeated 10,000 times, resulting in a collection of the number of stage-exclusive proteins for each stage. The P-value was evaluated by how frequently the number of stage-exclusive proteins was more than or less than the number of stage-exclusive proteins identified in our experiment. For instance, there were 126 stage-exclusive proteins identified at the globular stage and the frequency that the number of stage-exclusive identified proteins in the permutation test was larger than 126 was 0.33.
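The permutation procedure can be sketched as follows. The sketch assumes that each sample is represented by the set of proteins detected in it, that a protein counts as identified at a stage when it is present in at least two of that stage's replicates, and that all stages have the same number of replicates. The function names are hypothetical.

```python
import random


def stage_exclusive_counts(sample_proteins, groups, min_replicates=2):
    """Number of stage-exclusive proteins for each group of samples.

    sample_proteins: list of sets, one set of protein IDs per sample.
    groups: list of lists of sample indices, one list per stage.
    """
    identified = []
    for members in groups:
        counts = {}
        for idx in members:
            for protein in sample_proteins[idx]:
                counts[protein] = counts.get(protein, 0) + 1
        identified.append({p for p, c in counts.items() if c >= min_replicates})
    exclusive = []
    for i, found in enumerate(identified):
        others = set().union(*(s for j, s in enumerate(identified) if j != i))
        exclusive.append(len(found - others))
    return exclusive


def permutation_frequencies(sample_proteins, groups, n_perm=10000, seed=0):
    """Observed counts and the frequency with which a random assignment
    of samples to stages yields more stage-exclusive proteins."""
    rng = random.Random(seed)
    observed = stage_exclusive_counts(sample_proteins, groups)
    size = len(groups[0])
    indices = list(range(len(sample_proteins)))
    larger = [0] * len(groups)
    for _ in range(n_perm):
        rng.shuffle(indices)  # random assignment without replacement
        shuffled = [indices[k:k + size] for k in range(0, len(indices), size)]
        permuted = stage_exclusive_counts(sample_proteins, shuffled)
        for i, (perm, obs) in enumerate(zip(permuted, observed)):
            if perm > obs:
                larger[i] += 1
    return observed, [c / n_perm for c in larger]
```

With 18 samples and six stages, `groups` would hold six lists of three sample indices each. A frequency such as the 0.33 reported for the globular stage indicates that random assignments produce a larger number of stage-exclusive proteins about a third of the time.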

Gene ontology

Metascape (Zhou et al., 2019b) was used with default parameters, except for the fuzzy c-means clusters, for which the compareCluster function of the R package clusterProfiler (v3.18.1) was used with a P<0.05 cutoff (Yu et al., 2012).

Proteome dynamics analysis

The R package Mfuzz was used (Kumar and Futschik, 2007). Standardization was carried out using the ‘standardise’ function. Fuzzifier was determined using the ‘mestimate’ function. Clustering was performed using the Mfuzz function with ‘cmeans’ method.

Subcellular location of the proteome

SUBA4 (Heazlewood et al., 2007), a web-based tool, was used to annotate these proteins. As some proteins had multiple locations, only unambiguously annotated proteins were retained. The total intensity of proteins in the same category was divided by that of all proteins to obtain the percentage of each subcellular protein.

Network construction

The classification of TFs and TRs was downloaded from iTAK (Zheng et al., 2016). Then, STRING was used to construct the network with a minimum required interaction score of 0.7 (Jensen et al., 2009). The network was visualized using Cytoscape (Shannon et al., 2003).
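How hub proteins emerge from such a network can be illustrated with a minimal sketch. It assumes the interactions are available as (protein A, protein B, score) triples with scores on a 0 to 1 scale, keeps edges at or above the 0.7 threshold used here, and ranks proteins by the number of retained partners. Ranking by degree is an assumption of this sketch, and the function name and example identifiers are hypothetical.

```python
from collections import Counter


def hub_ranking(edges, min_score=0.7):
    """Rank proteins by the number of high-confidence interaction partners.

    edges: iterable of (protein_a, protein_b, score), score between 0 and 1.
    Returns a list of (protein, degree) sorted by decreasing degree.
    """
    # Keep each undirected pair once, ignoring self-interactions.
    kept = {
        frozenset((a, b))
        for a, b, score in edges
        if score >= min_score and a != b
    }
    degree = Counter()
    for pair in kept:
        for protein in pair:
            degree[protein] += 1
    return degree.most_common()


# Placeholder identifiers and scores, invented purely for illustration.
example_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.80),
    ("P1", "P4", 0.95),
    ("P4", "P2", 0.50),  # below the threshold, discarded
    ("P2", "P1", 0.90),  # duplicate of the first pair, counted once
]
ranking = hub_ranking(example_edges)
```

In this toy input, P1 retains three partners and ranks first, which is the sense in which proteins are described as hubs in the networks above.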

We thank Professor Mengxiang Sun (Wuhan University, China) who shared with us the method for isolating the embryos. We also thank Dr Longjian Niu who provided valuable suggestions on the experiment; Dr Mingkun Huang who provided valuable suggestions on writing the manuscript; and Drs Yan Zhang and Yunzhen Wei who provided some useful suggestions on statistical analysis.

Author contributions

Conceptualization: C.H., D.G.; Methodology: Y.H., L.Z.; Formal analysis: Y.H., L.Z.; Resources: Y.H., L.Z.; Writing - original draft: Y.H.; Writing - review & editing: C.H., D.G.; Supervision: D.G.; Funding acquisition: D.G.

Funding

This research was funded by the Transformation project of Hong Kong and Macao scientific and technological achievements (6905891) from Guangdong province, China; the Shenzhen Science and Technology Innovation Council (20200925153547003 to C.H.); and the State Key Laboratory of Agrobiotechnology, Chinese University of Hong Kong (8300052), in Hong Kong, China.

Data availability

The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al., 2022) partner repository with the dataset identifier PXD031922. The RNA-seq data have been deposited in the GEO database under accession number GSE197556.

Aida
,
M.
,
Beis
,
D.
,
Heidstra
,
R.
,
Willemsen
,
V.
,
Blilou
,
I.
,
Galinha
,
C.
,
Nussaume
,
L.
,
Noh
,
Y.-S.
,
Amasino
,
R.
and
Scheres
,
B.
(
2004
).
The PLETHORA genes mediate patterning of the Arabidopsis root stem cell niche
.
Cell
119
,
109
-
120
.
Anderson
,
S. N.
,
Johnson
,
C. S.
,
Chesnut
,
J.
,
Jones
,
D. S.
,
Khanday
,
I.
,
Woodhouse
,
M.
,
Li
,
C.
,
Conrad
,
L. J.
,
Russell
,
S. D.
and
Sundaresan
,
V.
(
2017
).
The zygotic transition is initiated in unicellular plant zygotes with asymmetric activation of parental genomes
.
Dev. Cell
43
,
349
.
Armenta-Medina
,
A.
and
Gillmor
,
C. S.
(
2019
).
Genetic, molecular and parent-of-origin regulation of early embryogenesis in flowering plants
.
Curr. Top. Dev. Biol.
131
,
497
-
543
.
Autran
,
D.
,
Baroux
,
C.
,
Raissig
,
M. T.
,
Lenormand
,
T.
,
Wittig
,
M.
,
Grob
,
S.
,
Steimer
,
A.
,
Barann
,
M.
,
Klostermeier
,
U. C.
,
Leblanc
,
O.
et al.
(
2011
).
Maternal epigenetic pathways control parental contributions to Arabidopsis early embryogenesis
.
Cell
145
,
707
-
719
.
Bassal
,
M.
,
Abukhalaf
,
M.
,
Majovsky
,
P.
,
Thieme
,
D.
,
Herr
,
T.
,
Ayash
,
M.
,
Tabassum
,
N.
,
Al Shweiki
,
M. R.
,
Proksch
,
C.
,
Hmedat
,
A.
et al.
(
2020
).
Reshaping of the Arabidopsis thaliana proteome landscape and co-regulation of proteins in development and immunity
.
Mol. Plant
13
,
1709
-
1732
.
Becker
,
K.
,
Bluhm
,
A.
,
Casas-Vila
,
N.
,
Dinges
,
N.
,
Dejung
,
M.
,
Sayols
,
S.
,
Kreutz
,
C.
,
Roignant
,
J. Y.
,
Butter
,
F.
and
Legewie
,
S.
(
2018
).
Quantifying post-transcriptional regulation in the development of Drosophila melanogaster
.
Nat. Commun.
9
,
4970
.
Bouyer
,
D.
,
Kramdi
,
A.
,
Kassam
,
M.
,
Heese
,
M.
,
Schnittger
,
A.
,
Roudier
,
F.
and
Colot
,
V.
(
2017
).
DNA methylation dynamics during early plant life
.
Genome Biol.
18
,
179
.
Bray
,
N. L.
,
Pimentel
,
H.
,
Melsted
,
P.
and
Pachter
,
L.
(
2016
).
Near-optimal probabilistic RNA-seq quantification
.
Nat. Biotechnol.
34
,
525
-
527
.
Buccitelli
,
C.
and
Selbach
,
M.
(
2020
).
mRNAs, proteins and the emerging principles of gene expression control
.
Nat. Rev. Genet.
21
,
630
-
644
.
Byrne
,
M. E.
(
2009
).
A role for the ribosome in development
.
Trends Plant Sci.
14
,
512
-
519
.
Casas-Vila
,
N.
,
Bluhm
,
A.
,
Sayols
,
S.
,
Dinges
,
N.
,
Dejung
,
M.
,
Altenhein
,
T.
,
Kappei
,
D.
,
Altenhein
,
B.
,
Roignant
,
J. Y.
and
Butter
,
F.
(
2017
).
The developmental proteome of Drosophila melanogaster
.
Genome Res.
27
,
1273
-
1285
.
Chen
,
H. Y.
,
Zou
,
W. X.
and
Zhao
,
J.
(
2015
).
Ribonuclease J is required for chloroplast and embryo development in Arabidopsis
.
J. Exp. Bot.
66
,
2079
-
2091
.
Chen
,
J.
,
Strieder
,
N.
,
Krohn
,
N. G.
,
Cyprys
,
P.
,
Sprunck
,
S.
,
Engelmann
,
J. C.
and
Dresselhaus
,
T.
(
2017
).
Zygotic genome activation occurs shortly after fertilization in maize
.
Plant Cell
29
,
2106
-
2125
.
Cheng
,
C.-Y.
,
Krishnakumar
,
V.
,
Chan
,
A. P.
,
Thibaud-Nissen
,
F.
,
Schobel
,
S.
and
Town
,
C. D.
(
2017
).
Araport11: a complete reannotation of the Arabidopsis thaliana reference genome
.
Plant J.
89
,
789
-
804
.
Cox
,
J.
and
Mann
,
M.
(
2008
).
MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification
.
Nat. Biotechnol.
26
,
1367
-
1372
.
Csárdi
,
G.
,
Franks
,
A.
,
Choi
,
D. S.
,
Airoldi
,
E. M.
and
Drummond
,
D. A.
(
2015
).
Accounting for experimental noise reveals that mRNA levels, amplified by post-transcriptional processes, largely determine steady-state protein levels in yeast
.
PLoS Genet.
11
,
e1005206
.
Dresselhaus
,
T.
and
Jürgens
,
G.
(
2021
).
Comparative embryogenesis in angiosperms: activation and patterning of embryonic cell lineages
.
Annu. Rev. Plant Biol.
72
,
641
-
676
.
Dresselhaus
,
T.
,
Sprunck
,
S.
and
Wessel
,
G. M.
(
2016
).
Fertilization mechanisms in flowering plants
.
Curr. Biol.
26
,
R125
-
R139
.
Fortelny
,
N.
,
Overall
,
C. M.
,
Pavlidis
,
P.
and
Freue
,
G. V. C.
(
2017
).
Can we predict protein from mRNA levels?
Nature
547
,
E19
-
E20
.
Gao
,
Y.
,
Liu
,
X.
,
Tang
,
B.
,
Li
,
C.
,
Kou
,
Z.
,
Li
,
L.
,
Liu
,
W.
,
Wu
,
Y.
,
Kou
,
X.
,
Li
,
J.
et al.
(
2017
).
Protein expression landscape of mouse embryos during pre-implantation development
.
Cell Rep
21
,
3957
-
3969
.
Goldberg
,
R. B.
,
De Paiva
,
G.
and
Yadegari
,
R.
(
1994
).
Plant embryogenesis - zygote to seed
.
Science
266
,
605
-
614
.
Heazlewood
,
J. L.
,
Verboom
,
R. E.
,
Tonti-Filippini
,
J.
,
Small
,
I.
and
Millar
,
A. H.
(
2007
).
SUBA: the Arabidopsis subcellular database
.
Nucleic Acids Res.
35
,
D213
-
D218
.
Ingolia
,
N. T.
,
Ghaemmaghami
,
S.
,
Newman
,
J. R. S.
and
Weissman
,
J. S.
(
2009
).
Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling
.
Science
324
,
218
-
223
.
Ingram
,
G. C.
(
2010
).
Family life at close quarters: communication and constraint in angiosperm seed development
.
Protoplasma
247
,
195
-
214
.
Jensen
,
L. J.
,
Kuhn
,
M.
,
Stark
,
M.
,
Chaffron
,
S.
,
Creevey
,
C.
,
Muller
,
J.
,
Doerks
,
T.
,
Julien
,
P.
,
Roth
,
A.
,
Simonovic
,
M.
et al.
(
2009
).
STRING 8-a global view on proteins and their functional interactions in 630 organisms
.
Nucleic Acids Res.
37
,
D412
-
D416
.
Kang
,
H. J.
,
Wu
,
D.
,
Fan
,
T. Y.
and
Zhu
,
Y.
(
2020
).
Activities of chromatin remodeling factors and histone chaperones and their effects in root apical meristem development
.
Int. J. Mol. Sci.
21
,
771
.
Kao
,
P.
,
Schon
,
M. A.
,
Mosiolek
,
M.
,
Enugutti
,
B.
and
Nodine
,
M. D.
(
2021
).
Gene expression variation in Arabidopsis embryos at single-nucleus resolution
.
Development
148
,
dev199589
.
Kawakatsu
,
T.
,
Huang
,
S. S. C.
,
Jupe
,
F.
,
Sasaki
,
E.
,
Schmitz
,
R. J.
,
Urich
,
M. A.
,
Castanon
,
R.
,
Nery
,
J. R.
,
Barragan
,
C.
,
He
,
Y. P.
et al.
(
2016
).
Epigenomic diversity in a global collection of Arabidopsis thaliana accessions
.
Cell
166
,
492
-
505
.
Kawakatsu
,
T.
,
Nery
,
J. R.
,
Castanon
,
R.
and
Ecker
,
J. R.
(
2017
).
Dynamic DNA methylation reconfiguration during seed development and germination
.
Genome Biol.
18
,
171
.
Kumar
,
L.
and
Futschik
,
M.
(
2007
).
Mfuzz: a software package for soft clustering of microarray data
.
Bioinformation
2
,
5
-
7
.
Labib
,
M.
and
Kelley
,
S. O.
(
2020
).
Single-cell analysis targeting the proteome
.
Nat. Rev. Chem.
4
,
143
-
158
.
Lafon-Placette
,
C.
and
Köhler
,
C.
(
2014
).
Embryo and endosperm, partners in seed development
.
Curr. Opin. Plant Biol.
17
,
64
-
69
.
Lau
,
S.
,
Slane
,
D.
,
Herud
,
O.
,
Kong
,
J. X.
and
Jürgens
,
G.
(
2012
).
Early embryogenesis in flowering plants: setting up the basic body pattern
.
Annu. Rev. Plant Biol.
63
,
483
-
506
.
Li
,
C.
,
Gent
,
J. I.
,
Xu
,
H.
,
Fu
,
H.
,
Russell
,
S. D.
and
Sundaresan
,
V.
(
2022
).
Resetting of the 24-nt siRNA landscape in rice zygotes
.
Genome Res.
32
,
309
-
323
.
Liu
,
Y.
,
Beyer
,
A.
and
Aebersold
,
R.
(
2016
).
On the dependency of cellular protein levels on mRNA abundance
.
Cell
165
,
535
-
550
.
Love
,
M. I.
,
Huber
,
W.
and
Anders
,
S.
(
2014
).
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol.
15
,
550
.
Luo
,
Y.
,
Shi
,
D.-Q.
,
Jia
,
P.-F.
,
Bao
,
Y.
,
Li
,
H.-J.
and
Yang
,
W.-C.
(
2022
).
Nucleolar histone deacetylases HDT1, HDT2, and HDT3 regulate plant reproductive development
.
J. Genet. Genomics
49
,
30
-
39
.
Mergner
,
J.
,
Frejno
,
M.
,
List
,
M.
,
Papacek
,
M.
,
Chen
,
X.
,
Chaudhary
,
A.
,
Samaras
,
P.
,
Richter
,
S.
,
Shikata
,
H.
,
Messerer
,
M.
et al.
(
2020
).
Mass-spectrometry-based draft of the Arabidopsis proteome
.
Nature
579
,
409
-
414
.
Nodine
,
M. D.
and
Bartel
,
D. P.
(
2012
).
Maternal and paternal genomes contribute equally to the transcriptome of early plant embryos
.
Nature
482
,
94
-
97
.
Palovaara
,
J.
,
de Zeeuw
,
T.
and
Weijers
,
D.
(
2016
).
Tissue and organ initiation in the plant embryo: a first time for everything
.
Annu. Rev. Cell Dev. Biol.
32
,
47
-
75
.
Palovaara
,
J.
,
Saiga
,
S.
,
Wendrich
,
J. R.
,
van ‘t Wout Hofland
,
N.
,
van Schayck
,
J. P.
,
Hater
,
F.
,
Mutte
,
S.
,
Sjollema
,
J.
,
Boekschoten
,
M.
,
Hooiveld
,
G. J.
et al.
(
2017
).
Transcriptome dynamics revealed by a gene expression atlas of the early Arabidopsis embryo
.
Nat. Plants
3
,
894
-
904
.
Perez-Riverol
,
Y.
,
Bai
,
J.
,
Bandla
,
C.
,
Garcia-Seisdedos
,
D.
,
Hewapathirana
,
S.
,
Kamatchinathan
,
S.
,
Kundu
,
D. J.
,
Prakash
,
A.
,
Frericks-Zipper
,
A.
,
Eisenacher
,
M.
et al.
(
2022
).
The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences
.
Nucleic Acids Res.
50
,
D543
-
D552
.
Peshkin
,
L.
,
Wühr
,
M.
,
Pearl
,
E.
,
Haas
,
W.
,
Freeman
,
R. M.
, Jr.
,
Gerhart
,
J. C.
,
Klein
,
A. M.
,
Horb
,
M.
,
Gygi
,
S. P.
and
Kirschner
,
M. W.
(
2015
).
On the relationship of protein and mRNA dynamics in vertebrate embryonic development
.
Dev. Cell
35
,
383
-
394
.
Plotnikova
,
A.
,
Kellner
,
M. J.
,
Schon
,
M. A.
,
Mosiolek
,
M.
and
Nodine
,
M. D.
(
2019
).
MicroRNA dynamics and functions during Arabidopsis embryogenesis
.
Plant Cell
31
,
2929
-
2946
.
Raissig
,
M. T.
,
Gagliardini
,
V.
,
Jaenisch
,
J.
,
Grossniklaus
,
U.
and
Baroux
,
C.
(
2013
).
Efficient and rapid isolation of early-stage embryos from Arabidopsis thaliana seeds
.
J. Vis. Exp.
76
,
50371
.
Roy
,
A.
,
Dutta
,
A.
,
Roy
,
D.
,
Ganguly
,
P.
,
Ghosh
,
R.
,
Kar
,
R. K.
,
Bhunia
,
A.
,
Mukhopadhyay
,
J.
and
Chaudhuri
,
S.
(
2016
).
Deciphering the role of the AT-rich interaction domain and the HMG-box domain of ARID-HMG proteins of Arabidopsis thaliana
.
Plant Mol. Biol.
92
,
389
-
390
.
Saba
,
J. A.
,
Liakath-Ali
,
K.
,
Green
,
R.
and
Watt
,
F. M.
(
2021
).
Translational control of stem cell function
.
Nat. Rev. Mol. Cell Biol.
22
,
671
-
690
.
Schulz
,
K. N.
and
Harrison
,
M. M.
(
2019
).
Mechanisms regulating zygotic genome activation
.
Nat. Rev. Genet.
20
,
221
-
234
.
Shannon
,
P.
,
Markiel
,
A.
,
Ozier
,
O.
,
Baliga
,
N. S.
,
Wang
,
J. T.
,
Ramage
,
D.
,
Amin
,
N.
,
Schwikowski
,
B.
and
Ideker
,
T.
(
2003
).
Cytoscape: a software environment for integrated models of biomolecular interaction networks
.
Genome Res.
13
,
2498
-
2504
.
Stacklies
,
W.
,
Redestig
,
H.
,
Scholz
,
M.
,
Walther
,
D.
and
Selbig
,
J.
(
2007
).
pcaMethods—a bioconductor package providing PCA methods for incomplete data
.
Bioinformatics
23
,
1164
-
1167
.
Tian
,
R.
,
Paul
,
P.
,
Joshi
,
S.
and
Perry
,
S. E.
(
2020
).
Genetic activity during early plant embryogenesis
.
Biochem. J.
477
,
3743
-
3767
.
Tyanova
,
S.
,
Temu
,
T.
and
Cox
,
J.
(
2016
).
The MaxQuant computational platform for mass spectrometry-based shotgun proteomics
.
Nat. Protoc.
11
,
2301
-
2319
.
VanInsberghe
,
M.
,
van den Berg
,
J.
,
Andersson-Rolf
,
A.
,
Clevers
,
H.
and
van Oudenaarden
,
A.
(
2021
).
Single-cell Ribo-seq reveals cell cycle-dependent translational pausing
.
Nature
597
,
561
-
565
.
Walley
,
J. W.
,
Sartor
,
R. C.
,
Shen
,
Z. X.
,
Schmitz
,
R. J.
,
Wu
,
K. J.
,
Urich
,
M. A.
,
Nery
,
J. R.
,
Smith
,
L. G.
,
Schnable
,
J. C.
,
Ecker
,
J. R.
et al.
(
2016
).
Integration of omic networks in a developmental atlas of maize
.
Science
353
,
814
-
818
.
Weis, B. L., Kovacevic, J., Missbach, S. and Schleiff, E. (2015). Plant-specific features of ribosome biogenesis. Trends Plant Sci. 20, 729-740.
Willmann, M. R., Mehalick, A. J., Packer, R. L. and Jenik, P. D. (2011). MicroRNAs regulate the timing of embryo maturation in Arabidopsis. Plant Physiol. 155, 1871-1884.
Yi, L., Piehowski, P. D., Shi, T. J., Smith, R. D. and Qian, W. J. (2017). Advances in microscale separations towards nanoproteomics applications. J. Chromatogr. A 1523, 40-48.
Yoshida, S., de Reuille, P. B., Lane, B., Bassel, G. W., Prusinkiewicz, P., Smith, R. S. and Weijers, D. (2014). Genetic control of plant development by overriding a geometric division rule. Dev. Cell 29, 75-87.
Yu, G., Wang, L.-G., Han, Y. and He, Q.-Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284-287.
Yu, T. Y., Shi, D. Q., Jia, P. F., Tang, J., Li, H. J., Liu, J. and Yang, W. C. (2016). The Arabidopsis receptor kinase ZAR1 is required for zygote asymmetric division and its daughter cell fate. PLoS Genet. 12, e1005933.
Zhao, P., Zhou, X., Shen, K., Liu, Z., Cheng, T., Liu, D., Cheng, Y., Peng, X. and Sun, M. X. (2019). Two-step maternal-to-zygotic transition with two-phase parental genome contributions. Dev. Cell 49, 882-893.e5.
Zhao, P., Zhou, X., Zheng, Y., Ren, Y. and Sun, M.-X. (2020). Equal parental contribution to the transcriptome is not equal control of embryogenesis. Nat. Plants 6, 1354-1364.
Zheng, Y., Jiao, C., Sun, H., Rosli, H. G., Pombo, M. A., Zhang, P., Banf, M., Dai, X., Martin, G. B., Giovannoni, J. J. et al. (2016). iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 9, 1667-1670.
Zhou, X., Shi, C., Zhao, P. and Sun, M. (2019a). Isolation of living apical and basal cell lineages of early proembryos for transcriptome analysis. Plant Reprod. 32, 105-111.
Zhou, Y. Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., Benner, C. and Chanda, S. K. (2019b). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523.

Competing interests

The authors declare no competing or financial interests.

Supplementary information