ABSTRACT
Defects in ciliary signaling or mutations in proteins that localize to primary cilia lead to a class of human diseases known as ciliopathies. Approximately 10% of mammalian genes encode cilia-associated proteins, and a major gap in the cilia research field is knowing which genes to prioritize and which in vivo vertebrate mutant alleles and reagents are available to study them. Here, we present a unified resource listing cilia-associated human genes cross-referenced to available mouse and zebrafish mutant alleles and their associated phenotypes, as well as kidney expression data and functional data for vertebrate Hedgehog signaling. This resource enables researchers to sort and filter genes based on their own expertise and priorities, cross-reference them with newly generated -omics datasets, and quickly find the in vivo resources and phenotypes associated with a gene of interest.
INTRODUCTION
The Human Genome Project aimed to revolutionize our understanding of human health and disease by revealing the function of every gene (U.S. Department of Health and Human Services and Department of Energy, 1990). Fulfilling this goal requires both a list of all genes and tools to manipulate and interrogate each gene's function. The list of genes arrived with the first draft human genome sequence in 2001 (Lander et al., 2001; Venter et al., 2001), and tools such as RNA interference (RNAi) with small interfering RNAs (siRNAs) enabled any gene product to be depleted, facilitating systematic interrogation of gene function. Propelled by the promise and achievements of the Human Genome Project, subsequent international efforts aimed to generate functional null alleles of every gene in multiple models including mouse, zebrafish and yeast (Austin et al., 2004; Auwerx et al., 2004; Giaever and Nislow, 2014; Dickinson et al., 2016; Bradford et al., 2022; Giaever et al., 2002; Groza et al., 2022; Ju et al., 2022). In mouse, mutations in 18,270 protein-coding genes are currently available, covering roughly three-quarters of the known protein-coding genes in the mouse genome, and a subset of these are being systematically phenotyped (Dickinson et al., 2016; Groza et al., 2022). Similarly, coordinated efforts in zebrafish now catalog mutations in 75% of the protein-coding genes in the zebrafish genome, with 19,853 mutant/morphant fish available (Bradford et al., 2022). Although some genes have attracted the attention of many studies, others have yet to be interrogated. A key obstacle is deciding which of these genes to prioritize. Individual investigators are motivated to focus on certain genes by a myriad of factors and should continue to do so.
Additionally, with the abundance of data provided today by various -omics approaches, we can now prioritize unstudied genes based on features including their cell-type expression pattern, timing of expression, subcellular localization or binding partners. We built a tool that exemplifies how such prioritization can work, focusing on genes that encode proteins associated with primary cilia.
Primary cilia are microtubule-based organelles that project from nearly every cell in the body (Pazour and Witman, 2003; Hoey et al., 2012). Once thought to be vestigial organelles, primary cilia became a focus of studies in the past two decades because they were linked to several signaling pathways important for development and disease states (Pan and Snell, 2002; Pazour et al., 2002; Yoder et al., 2002; Huangfu et al., 2003; Han et al., 2009; Wong et al., 2009; Menezes and Germino, 2009; Goetz and Anderson, 2010). Primary cilia are structurally composed of nine microtubule doublets that form the backbone of the projection, known as the axoneme. The axoneme is covered by a phospholipid bilayer that is contiguous with the cellular plasma membrane. Proteins enriched in the ciliary membrane regulate its phospholipid composition, imparting a distinct membrane fluidity and protein composition compared with that of the plasma membrane (Bae et al., 2009; Chavez et al., 2015; Garcia-Gonzalo et al., 2015). At the base of the cilium is a ciliary transition zone that controls protein traffic into and out of the cilium (Gonçalves and Pelletier, 2017; Mercey et al., 2024). Together, the transition zone and ciliary membrane create a privileged compartment allowing primary cilia to function as a fundamental cellular organelle.
There is a pressing need to know the function of cilia-associated proteins, because defects in primary cilia structure or in proteins that localize to primary cilia lead to a class of genetic disorders known as ciliopathies (Hildebrandt et al., 2011; Anvarian et al., 2019; McConnachie et al., 2021). Some ciliopathies are rare, including Bardet–Biedl syndrome, orofaciodigital syndrome, Joubert syndrome and Meckel syndrome, whereas others are more common, such as polycystic kidney disease and Von Hippel–Lindau disease. Ciliopathies often present with a multitude of symptoms affecting the eyes, brain, skeleton, kidneys and liver, among other organs (Reiter and Leroux, 2017). Understanding primary cilia signaling and defects that lead to ciliopathies will be crucial for developing targeted therapies. In our collective drive to understand these ciliary signaling pathways, a comprehensive understanding of the proteins that make up the specialized ciliary protein milieu is critical.
To aid in these efforts, we developed an Excel-based spreadsheet with key information that investigators can use to sort and prioritize subsets of cilia-associated genes, depending on their research interests. We first compiled an inventory of genes encoding proteins associated with primary cilia. We next examined the in vivo vertebrate resources available for each of these genes in mouse and zebrafish, cataloging alleles that have and have not been analyzed. Lastly, we provide gene-driven, disease-driven and process-driven examples of how we envision this tool being useful to researchers. We demonstrate the utility of this tool by applying it to one of the most common monogenic diseases, polycystic kidney disease. Because primary cilia have such a large impact on kidney function, we included kidney expression data for each of the ciliary genes in our spreadsheet (Calvet, 2002; Dell, 2015; Ma et al., 2017; Devlin and Sayer, 2019). Vertebrate Hedgehog (Hh) signaling requires intact, functional primary cilia and potentially influences cystic kidney phenotypes (Huangfu et al., 2003; Maezawa et al., 2012; Silva et al., 2018; Ma et al., 2019; Hsieh et al., 2022). We overlaid results from two independent Hh screens on our tool to show how ciliary signaling data can inform which genes are prioritized (Breslow et al., 2018; Pusapati et al., 2018). This spreadsheet will supplement individual investigators’ intuition, facilitating prioritization of cilia-associated genes and their roles in signaling pathways, and highlighting available vertebrate resources to further their studies in vivo.
RESULTS
Compiling a ‘parts list’ for primary cilia
The first step in creating our ciliary tool was compiling an inventory of proteins that localize to or are enriched in primary cilia. Over the past two decades, researchers using discovery-based approaches such as RNA sequencing (RNA-seq), siRNA screens or proximity labeling have identified numerous putative ciliary proteins. Although successful, this diversity of approaches presents challenges: some datasets provide only a list of proteins, others link the ciliary gene to disease, and still others investigate the gene's role in known ciliary signaling pathways. Many datasets also use different cell types or tissues, and growing evidence suggests that different cell types and tissues have specialized ciliary functions and likely unique ciliary compositions. We combed through existing datasets and compiled a unified database of 1999 unique ciliary genes, representing ∼10% of all known human genes (see Table S1 and Materials and Methods) (Arnaiz et al., 2009, 2014; van Dam et al., 2013, 2019; Reiter and Leroux, 2017; Vasquez et al., 2021; Mehta et al., 2022; Elliott et al., 2023). In building this unified database, we overcame hurdles such as inconsistent identifiers across the source databases, and we manually curated missing or incorrectly cross-referenced data.
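Conceptually, merging the source datasets is a union over gene identifiers that also records which datasets reported each gene, so the spreadsheet can show where a protein was (and was not) identified. The sketch below is a minimal illustration, assuming each dataset has already been reduced to a set of human gene IDs; the dataset names and IDs are hypothetical placeholders, not entries from Table S1.

```python
from collections import defaultdict


def unify(datasets: dict[str, set[str]]) -> dict[str, list[str]]:
    """Merge per-dataset gene ID sets into one deduplicated table.

    Returns a mapping from each normalized gene ID to the sorted list of
    datasets that reported it, so every gene appears exactly once.
    """
    sources: dict[str, set[str]] = defaultdict(set)
    for name, gene_ids in datasets.items():
        for gene_id in gene_ids:
            # Normalize whitespace and case so trivially different IDs collapse.
            sources[gene_id.strip().upper()].add(name)
    return {gene: sorted(names) for gene, names in sorted(sources.items())}


# Hypothetical toy input; these are placeholder IDs, not real ENSEMBL IDs.
toy = {
    "dataset_A": {"ENSG_TOY1", "ENSG_TOY2"},
    "dataset_B": {"ENSG_TOY2", "ENSG_TOY3"},
    "dataset_C": {"ensg_toy3 "},
}
unified = unify(toy)
print(len(unified))          # 3
print(unified["ENSG_TOY3"])  # ['dataset_B', 'dataset_C']
```

A dataset keyed only by mouse gene names would first need to be mapped to human IDs; the normalization here handles only case and whitespace.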
Mouse alleles and zebrafish mutants and resources, and their respective phenotypes
We next asked which of these ciliary genes have available in vivo vertebrate models (Fig. 1). Mouse and zebrafish are powerful and complementary organisms for studying cilia-related phenotypes, and we focused on them because primary cilia are linked to Hh signaling specifically in vertebrates. One factor distinguishing zebrafish and mouse is the timing of zygotic transcription. In mice, zygotic genome activation begins at the two-cell stage, so embryos rely on their own genome almost from the outset; many ciliary mutants are therefore embryonic lethal, precluding their analysis in specific tissues without the use of conditional alleles. In zebrafish, early embryonic development relies on maternally supplied gene products, which circumvents the early lethality seen in mouse ciliary mutants and allows straightforward analysis of phenotypes, such as cystic kidneys, that would require conditional alleles in mice. However, as mammals, mice share relevant physiology with human development and disease states. Together, mouse and zebrafish have numerous alleles available in their respective databases and serve as complementary systems for understanding the function of cilia-associated genes.
Sankey chart of ciliary genes, resources and phenotypes in zebrafish and mouse. Sankey plot of the 1999 compiled human ciliary genes shows the zebrafish and mouse orthologs and reagents available as well as reported lethality phenotypes, and non-lethal phenotypes across species.
Of the 1999 identified human ciliary genes, 1993 (>99%) have mouse orthologs (Fig. 1, right). Mouse Genome Informatics (MGI) and MouseMine indicated that 2075 mouse alleles are available for 1960 (98%) of the identified human ciliary genes (Baldarelli et al., 2024; https://www.mousemine.org). Of these, 1780 are available as conditional alleles and 180 as constitutive mutants. No alleles are currently available for the remaining 33 mouse genes. We mined MGI's cataloged information on each gene for reported phenotypes and categorized them as embryonic lethal, peri-/post-natal lethal, non-lethal, or no reported phenotype (Fig. 1, right). Over 40% (832) of the analyzed alleles present with embryonic (368) or peri-/post-natal (464) lethality in mice. Of the 832 genes with lethal phenotypes, 789 have conditional alleles available, allowing researchers to bypass lethality with spatial and temporal control. An additional 782 human ciliary genes have mouse orthologs and alleles that present with non-lethal phenotypes. Lastly, 346 mouse alleles are currently available with no reported phenotypes; closer examination of these alleles will likely yield cilia-associated phenotypes.
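Because the allele-type and phenotype categories each appear to partition the 1960 genes with mouse alleles, the reported counts can be checked with simple arithmetic. The snippet below only re-adds figures from this paragraph; treating the categories as mutually exclusive is an assumption implied by their sums.

```python
# Counts as reported in the text (human ciliary genes, mouse resources).
total_human = 1999
with_ortholog = 1993
with_allele = 1960
conditional, constitutive = 1780, 180
embryonic_lethal, perinatal_lethal = 368, 464
non_lethal, no_phenotype = 782, 346

# Allele types partition the genes that have alleles.
assert conditional + constitutive == with_allele

# Phenotype categories also partition the genes that have alleles.
lethal = embryonic_lethal + perinatal_lethal
assert lethal + non_lethal + no_phenotype == with_allele

no_allele = with_ortholog - with_allele
print(no_allele)                                     # 33
print(lethal, round(100 * lethal / with_allele, 1))  # 832 42.4
print(round(100 * with_ortholog / total_human, 1))   # 99.7
print(round(100 * with_allele / total_human, 1))     # 98.0
```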
We used the Zebrafish Information Network (ZFIN) to assess the availability of fish mutants for the 1999 unique human ciliary genes (Fig. 1, left). ZFIN identified 2321 zebrafish orthologs corresponding to 1828 (91%) of the identified human ciliary genes. Of these, 972 genes have knockdown or gene-editing reagents available, including CRISPR, morpholinos or transcription activator-like effector nucleases (TALENs). An additional 197 ciliary genes are available as transgenic insertions, and 452 as other mutants (e.g. indels, point mutants or complex substitutions). In total, zebrafish mutants and resources are available for 1621 (81%) of the identified ciliary genes. Of the total identified ciliary genes, 207 have corresponding zebrafish orthologs but no reagent or mutant available. Roughly half of the 1621 disrupted zebrafish genes display embryonic lethality (46), non-embryonic lethality (85) or non-lethal (620) phenotypes (Fig. 1, left); these phenotypes correspond to 751 of the human ciliary genes. The low frequency of embryonic lethality (∼2%) likely reflects the strong maternal contribution during early development in fish. Finally, 870 mutants or morphants exhibit no reported phenotype.
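The zebrafish counts in this paragraph can also be checked with simple arithmetic. The embryonic-lethality percentage depends on the denominator chosen (all 1999 ciliary genes, or the 1621 with zebrafish resources), so the snippet computes both; it uses only figures stated above.

```python
# Counts as reported in the text (human ciliary genes, zebrafish resources).
total_human = 1999
with_ortholog = 1828
knockdown, transgenic, other_mutant = 972, 197, 452
embryonic_lethal, later_lethal, non_lethal = 46, 85, 620

with_resource = knockdown + transgenic + other_mutant           # genes with any resource
ortholog_no_resource = with_ortholog - with_resource            # ortholog, but no reagent
with_phenotype = embryonic_lethal + later_lethal + non_lethal   # any reported phenotype
no_phenotype = with_resource - with_phenotype                   # resource, no phenotype

print(with_resource, ortholog_no_resource, with_phenotype, no_phenotype)  # 1621 207 751 870
print(round(100 * with_ortholog / total_human, 1))       # 91.4
print(round(100 * with_resource / total_human, 1))       # 81.1
print(round(100 * embryonic_lethal / total_human, 1))    # 2.3 (of all ciliary genes)
print(round(100 * embryonic_lethal / with_resource, 1))  # 2.8 (of disrupted genes)
```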
In total, 1594 of the 1999 identified human ciliary genes have alleles and knockdown reagents in both mouse and zebrafish (Fig. 2). Mouse alleles are exclusively available for 364 of the human ciliary genes with either no corresponding ortholog (162) or with no allele (202) in zebrafish. Conversely, 26 of the human ciliary genes have resources available only in zebrafish, with no current mouse alleles. At least one ortholog in either species corresponds to every gene identified in this human cilia-associated dataset, and alleles from either model cover 1984 (99%) of the unified list of human ciliary genes. Together, mouse and zebrafish provide excellent vertebrate systems for the study of ciliary genes, covering the majority of ciliary genes meriting investigation.
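The species-coverage categories are set operations over human gene IDs. The sketch below assumes two hypothetical input sets holding the human genes with at least one allele or reagent in each species, and uses toy placeholder IDs; the final lines re-add the reported counts.

```python
def coverage(human: set[str], mouse: set[str], zebrafish: set[str]) -> dict[str, set[str]]:
    """Partition human genes by which species has alleles or reagents.

    `mouse` and `zebrafish` hold human gene IDs with at least one allele
    or reagent in that species (hypothetical inputs).
    """
    return {
        "both": human & mouse & zebrafish,
        "mouse_only": (human & mouse) - zebrafish,
        "zebrafish_only": (human & zebrafish) - mouse,
        "neither": human - mouse - zebrafish,
    }


# Toy sets with placeholder gene IDs.
parts = coverage({"G1", "G2", "G3", "G4", "G5"}, {"G1", "G2", "G3"}, {"G1", "G4"})
print({k: sorted(v) for k, v in parts.items()})

# Reported counts: both species + mouse only + zebrafish only.
covered = 1594 + 364 + 26
print(covered, round(100 * covered / 1999, 1))  # 1984 99.2
```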
Zebrafish and mouse orthologs and alleles of the compiled ciliary genes. An UpSet plot showing intersections of the compiled human ciliary genes with their zebrafish and mouse orthologs and alleles.
Ciliary regulators of Hh signaling
The primary cilium is crucial in vertebrate Hh signaling, regulating the balance of pathway activation and repression (Goetz and Anderson, 2010; Briscoe and Therond, 2013; Bangs and Anderson, 2017). Suppressed Hh signaling can result in developmental defects, whereas over-activation can lead to pathologies such as medulloblastoma (Jiang and Hui, 2008). We included data from two independent high-throughput CRISPR-based Hh screens in NIH 3T3 cells, allowing a researcher to quickly determine which of the ciliary genes are positive regulators, negative regulators or attenuators of Hh signaling in mouse fibroblast-based assays (Breslow et al., 2018; Pusapati et al., 2018). The researcher can also see which in vivo vertebrate models are available for studying these genes. Both screens identified 52 ciliary genes as positive regulators of Hh signaling. Mutant mouse alleles are available for all 52, with 49 available as conditional alleles. In mice, 47 of the alleles present with embryonic or peri-natal lethality, and only two mutants currently have no noted phenotypes. Similarly, zebrafish mutants or knockdown reagents are available for 41 of these positive Hh regulators. Mutations in three of them (gli2a, cc2d2a and smo) result in lethality in zebrafish, and mutations in 17 currently have no reported phenotypes. These functional data provide additional insight into one known signaling pathway linked with primary cilia. Because this resource is an editable spreadsheet with multiple points for cross-referencing datasets, a researcher can easily customize it with other datasets of interest in a similar manner.
Kidney phenotypes and expression data
The kidney is especially sensitive to loss of function of cilia-associated genes, and many ciliopathies present with cystic kidney phenotypes (Devlin and Sayer, 2019; Santoni et al., 2020). We identified 594 genes in the dataset that displayed kidney-associated phenotypes in either zebrafish or mouse. We categorized these as kidney cysts or other kidney phenotypes; in mouse, we subcategorized the other kidney phenotypes as direct (e.g. enlarged kidney or abnormal kidney morphology) or indirect [e.g. altered renal function, such as abnormal blood urea nitrogen (BUN) or changes in urine content]. MGI and MouseMine reported cystic kidney phenotypes in 90 alleles, corresponding to 72 human genes (Fig. 3) (Baldarelli et al., 2024; https://www.mousemine.org). In addition, 268 alleles corresponding to 200 human genes displayed direct kidney phenotypes, although renal cysts may not have been examined. We categorized a further 96 alleles as having indirect kidney phenotypes, without any reported cysts or direct kidney phenotypes. These 96 alleles are strong candidates for presenting with renal phenotypes if their kidneys are examined more thoroughly.
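The kidney categories are applied with a precedence: an allele counts as indirect only if it has no cyst or direct phenotype. A minimal keyword-based sketch of that logic follows; the term lists are hypothetical, and a real curation would more likely map Mammalian Phenotype Ontology terms than match substrings.

```python
from typing import Optional

# Hypothetical keyword lists, for illustration only.
CYST_TERMS = ("cyst",)
DIRECT_TERMS = ("enlarged kidney", "small kidney", "abnormal kidney morphology")
INDIRECT_TERMS = ("blood urea nitrogen", "urine")


def kidney_category(phenotypes: list[str]) -> Optional[str]:
    """Assign one kidney category per allele, with cysts > direct > indirect."""
    lowered = [p.lower() for p in phenotypes]

    def has(terms: tuple[str, ...]) -> bool:
        # True if any phenotype string contains any of the terms.
        return any(term in p for p in lowered for term in terms)

    if has(CYST_TERMS):
        return "cysts"
    if has(DIRECT_TERMS):
        return "direct"
    if has(INDIRECT_TERMS):
        return "indirect"
    return None  # no kidney-related phenotype recognized


print(kidney_category(["Renal cyst", "Enlarged kidney"]))        # cysts
print(kidney_category(["increased blood urea nitrogen level"]))  # indirect
```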
Kidney phenotypes caused by ciliary mutants in mouse and zebrafish models. An UpSet plot showing intersection of kidney cysts and other kidney phenotypes reported in zebrafish and mouse mutants.
Including zebrafish phenotypes complements what is known in mice, and together the two models cover more human genes than either system alone. ZFIN described 114 zebrafish mutants/morphants, corresponding to 109 human ciliary genes, as displaying kidney cysts. Another 78 zebrafish alleles presented with other kidney phenotypes, although cystic kidneys were not reported explicitly. In mice, kidney-specific deletion of a conditional allele may reveal cystic phenotypes for a gene whose null allele is embryonic lethal; without a conditional allele, such a gene cannot be studied in the kidney. Because zebrafish mutants bypass this early lethality, they can reveal renal cystic phenotypes and highlight genes worth pursuing in mice with kidney-specific Cre drivers and conditional alleles to examine their roles in cystogenesis. To this point, we identified 73 genes that presented with cystic kidney phenotypes exclusively in zebrafish, and an additional 51 genes with other kidney phenotypes exclusively in zebrafish. None of these genes had a corresponding report of cystic phenotypes or of direct/indirect kidney phenotypes in mice.
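Finding genes with kidney phenotypes only in zebrafish is a set difference against every kidney report in mouse. In this sketch, which assumes hypothetical sets of human gene IDs, cyst genes take precedence over other kidney phenotypes, so a gene is counted in at most one of the two groups, mirroring the "additional" wording above.

```python
def zebrafish_only_kidney(
    zf_cyst: set[str], zf_other: set[str], mouse_kidney: set[str]
) -> tuple[set[str], set[str]]:
    """Split zebrafish kidney genes that lack any mouse kidney report.

    `mouse_kidney` should hold every gene with a cyst, direct or indirect
    kidney phenotype in mouse. Genes with fish cysts are removed from the
    'other' group so each gene is counted once.
    """
    cyst_only = zf_cyst - mouse_kidney
    other_only = zf_other - zf_cyst - mouse_kidney
    return cyst_only, other_only


# Toy sets with placeholder gene names.
cyst_only, other_only = zebrafish_only_kidney(
    {"GENEA", "GENEB", "GENEC"}, {"GENEC", "GENED", "GENEE"}, {"GENEB", "GENEE"}
)
print(sorted(cyst_only), sorted(other_only))  # ['GENEA', 'GENEC'] ['GENED']
```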
Knowing which genes are expressed in specific segments of the nephron, and their relative expression patterns, is key to understanding their contribution to kidney phenotypes. We overlaid two independent RNA-seq datasets onto our spreadsheet, allowing quick analysis of expression patterns in the kidney (Ransick et al., 2019; Chen et al., 2021). The first dataset comes from single-cell RNA-seq of adult mouse kidneys, whereas the second comes from full-length RNA-seq of microdissected mouse renal tubules. Together, these datasets reveal expression patterns of the cilia-associated genes in different segments of the mouse nephron.
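Overlaying an expression table onto the gene list is essentially a left join: every ciliary gene is kept, and genes without data get an empty value. Because both kidney datasets are from mouse, the sketch assumes a human-to-mouse ortholog map serves as the join key; the dataset labels, segment label and expression value are hypothetical.

```python
from typing import Optional

Expression = dict[str, float]  # nephron segment label -> expression value


def overlay_expression(
    human_genes: list[str],
    human_to_mouse: dict[str, str],
    datasets: dict[str, dict[str, Expression]],
) -> dict[str, dict[str, Optional[Expression]]]:
    """Left-join per-segment expression (keyed by mouse symbol) onto human genes."""
    table: dict[str, dict[str, Optional[Expression]]] = {}
    for gene in human_genes:
        mouse = human_to_mouse.get(gene)  # None if no mouse ortholog is known
        table[gene] = {
            name: (by_gene.get(mouse) if mouse else None)
            for name, by_gene in datasets.items()
        }
    return table


# Hypothetical inputs for illustration only.
table = overlay_expression(
    ["AURKA", "GENEX"],
    {"AURKA": "Aurka"},
    {"single_cell": {"Aurka": {"PT": 0.4}}, "tubule": {}},
)
print(table["AURKA"])  # {'single_cell': {'PT': 0.4}, 'tubule': None}
```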
Use cases
We envision a variety of uses for this tool. Below we provide several examples of how a researcher might use it to prioritize genes in a project. First, we show how to find the in vivo resources available for a gene of interest. Second, we show how to investigate a ciliary disease of interest. Lastly, we show how to investigate a specific signaling pathway.
Use case 1: in vivo resources available for ciliary genes
The most straightforward use of this tool is to identify the in vivo mouse and zebrafish models available for studying a ciliary gene, simply by searching the database by gene or protein name. For example, if a researcher wants to know more about aurora kinase A (Aurka), they can search the database for Aurka or use Microsoft Excel's filter function on the gene name column and type in ‘AURKA’. This filtered view shows the researcher in which datasets AURKA protein was identified as ciliary (as well as the datasets that did not identify it). The researcher can also see that this gene was found to be an attenuator of Hh signaling with a false discovery rate of less than 20%. They can then follow a link to the International Mouse Phenotyping Consortium (IMPC) to see the current phenotypes characterized in IMPC's International Mouse Phenotyping Resource of Standardised Screens (IMPReSS) pipeline (https://www.mousephenotype.org/impress/). The researcher can see that Aurka is associated with embryonic lethality in mice, but that conditional alleles are available, providing an opportunity to investigate deletion of this gene in the researcher's tissue of choice. In addition, the ZFIN ID links to all phenotypes associated with the aurka gene reported in zebrafish. Should the researcher decide to pursue further investigations in vivo, IMPC provides ordering information for mouse alleles, and ZFIN provides ordering information for zebrafish mutants as well as CRISPR, morpholino and TALEN sequences (Bradford et al., 2022; Groza et al., 2022).
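The same gene-name filter can be scripted against a CSV export of the spreadsheet. This minimal sketch matches case-insensitively across the symbol columns, so ‘AURKA’, ‘Aurka’ and ‘aurka’ all hit; the column headers and the GENEX row are hypothetical.

```python
import csv
import io

# Hypothetical column headers for the species-specific gene symbols.
SYMBOL_COLUMNS = ("Human gene", "Mouse gene", "Zebrafish gene")


def filter_by_gene(csv_text: str, query: str, columns: tuple[str, ...]) -> list[dict[str, str]]:
    """Return rows whose symbol in any of `columns` matches `query` (case-insensitive)."""
    q = query.strip().lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    # `or ""` guards against missing cells, which DictReader fills with None.
    return [row for row in reader if any((row.get(c) or "").strip().lower() == q for c in columns)]


export = """Human gene,Mouse gene,Zebrafish gene,Mouse phenotype
AURKA,Aurka,aurka,embryonic lethal
GENEX,Genex,genex,no reported phenotype
"""
hits = filter_by_gene(export, "Aurka", SYMBOL_COLUMNS)
print(hits[0]["Mouse phenotype"])  # embryonic lethal
```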
Another example might involve a researcher wanting to prioritize a gene among many hits in a screen. Perhaps they performed RNA-seq and want to identify the differentially expressed ciliary genes. They can compare their hits with the list of 1999 human cilia-associated genes; inclusion of ENSEMBL IDs, HUGO Gene Nomenclature Committee (HGNC) IDs, and mouse, human and zebrafish gene names in the spreadsheet facilitates cross-referencing the datasets. To identify new areas of research, they might want to know which differentially expressed ciliary genes have no reported phenotypes. Filtering the dataset by ‘No reported phenotypes’ in mice provides a shortlist of follow-up candidates, which they can narrow further with additional filters. For example, if they are interested in positive Hh regulators, they might select the gene Tubd1, which was found to be a positive regulator in both Hh screens, and pursue further investigations (Fig. 4A) (Breslow et al., 2018; Pusapati et al., 2018). This tool also shows that conditional and knockout alleles of Tubd1 are available in mice, as are transgenic insertions and point mutants in zebrafish, to further advance their studies (Fig. 4B). Even without hits from a screen, researchers can find value in knowing that over 400 ciliary genes are listed with no noted phenotypes in mice, yet with nearly 300 conditional mouse alleles available. Filtering the dataset based on reported zebrafish phenotypes or mutant availability can help prioritize or de-prioritize genes that a researcher may wish to investigate further in mice.
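The cross-referencing and filtering steps described here can be expressed as a single pass over the gene records. The field names, values and rows below are hypothetical placeholders chosen for illustration, not the spreadsheet's actual columns.

```python
def shortlist(hits: set[str], genes: list[dict]) -> list[str]:
    """Return screen hits that are on the ciliary list, have no reported
    mouse phenotype, and were called positive Hh regulators in both screens.

    All record fields are hypothetical names used for illustration.
    """
    wanted = {h.upper() for h in hits}  # case-insensitive match on human symbols
    return sorted(
        g["human"]
        for g in genes
        if g["human"].upper() in wanted
        and g["mouse_phenotype"] == "none"
        and g["hh_screen_1"] == "positive"
        and g["hh_screen_2"] == "positive"
    )


# Hypothetical ciliary-gene records with placeholder names.
genes = [
    {"human": "GENEA", "mouse_phenotype": "none", "hh_screen_1": "positive", "hh_screen_2": "positive"},
    {"human": "GENEB", "mouse_phenotype": "lethal", "hh_screen_1": "positive", "hh_screen_2": "positive"},
    {"human": "GENEC", "mouse_phenotype": "none", "hh_screen_1": "positive", "hh_screen_2": "not hit"},
]
# Differentially expressed genes from a hypothetical RNA-seq experiment;
# GENEZ is not on the ciliary list and therefore drops out.
rnaseq_hits = {"genea", "GENEB", "GENEC", "GENEZ"}
print(shortlist(rnaseq_hits, genes))  # ['GENEA']
```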
Identifying positive Hh regulators with no reported mouse phenotype. Screenshots of the ciliary tool. (A) Data filtered based on positive Hh regulators identified in the Pusapati et al. (2018) dataset. (B) Mouse and zebrafish alleles and reagents available for the positive Hh regulators with no reported mouse phenotype.
Use case 2: prioritizing for genetic interactors in polycystic kidney disease
Another way we envision this resource being valuable is in the study of cilia-related diseases such as polycystic kidney disease. Mouse models indicate that ciliary proteins function within cilia to inhibit a pro-cystic pathway (Ma et al., 2013; Lee and Somlo, 2014; Ma et al., 2017). Any gene on our compiled ciliary gene list that is also expressed in the kidney could therefore be a candidate driver of kidney disease. Knowing which genes have and have not been analyzed for kidney phenotypes is essential for guiding future studies. A researcher can sort by known phenotypes (or lack of phenotypes) and then filter for ciliary genes whose expression exceeds a chosen threshold in renal epithelia or in specific segments of the nephron. In this way, a researcher can refine the list into a shortlist of candidate genes and immediately see which mouse and zebrafish mutant alleles are already available.
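A minimal sketch of this disease-driven filter, assuming hypothetical fields for the kidney phenotype category and per-segment expression: keep genes with no reported kidney phenotype whose expression in a chosen segment meets a threshold, then rank them by that expression. The threshold, segment labels and values are illustrative only.

```python
def kidney_candidates(genes: list[dict], segment: str, threshold: float) -> list[str]:
    """Genes with no reported kidney phenotype and expression >= threshold
    in `segment`, ranked from highest to lowest expression."""

    def expr(g: dict) -> float:
        # Missing segments are treated as zero expression.
        return g["expression"].get(segment, 0.0)

    pool = [g for g in genes if g["kidney_phenotype"] is None and expr(g) >= threshold]
    return [g["human"] for g in sorted(pool, key=expr, reverse=True)]


# Hypothetical records: gene names, segment labels and values are placeholders.
genes = [
    {"human": "GENEA", "kidney_phenotype": None, "expression": {"PT": 12.0}},
    {"human": "GENEB", "kidney_phenotype": "cysts", "expression": {"PT": 50.0}},
    {"human": "GENEC", "kidney_phenotype": None, "expression": {"PT": 30.0, "DCT": 1.0}},
    {"human": "GENED", "kidney_phenotype": None, "expression": {"DCT": 40.0}},
]
print(kidney_candidates(genes, "PT", 10.0))  # ['GENEC', 'GENEA']
```

Ranking after filtering keeps the shortlist ordered by the evidence the researcher cares most about; swapping the sort key would rank by another column instead.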
Use case 3: Hh signaling
Further analysis of the commonalities and differences between the two independent Hh screens reveals potential areas of focus for Hh studies. Only three ciliary genes, Ptch1, Sufu and Gsk3b, were identified as negative Hh regulators in both screens, and all three have conditional mouse mutants and zebrafish resources available (Fig. 5A). Gsk3b appears in this shortlist twice because zebrafish possess two orthologs, gsk3ba and gsk3bb; however, no zebrafish reagents or mutants are currently available for gsk3bb. Interestingly, eight genes (Rab23, Tulp3, Kif7, Rbx1, Edc4, Hgs, Xpo7 and Tsc2) were identified as positive Hh regulators in one screen but as negative Hh regulators in the other (Fig. 5B). All eight genes present with embryonic or perinatal lethality, and all have conditional mouse mutants and zebrafish resources available for further study. The results of the two screens clearly intersect, and this tool can serve as an anchoring point for identifying such similarities and differences among studies.
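Comparing two screens reduces to set logic over their calls. The sketch below assumes each screen is a hypothetical mapping from gene to ‘positive’ or ‘negative’ and returns the genes called negative in both and the genes with opposite calls; the toy genes are placeholders, not the screens' actual results.

```python
def compare_screens(
    screen_1: dict[str, str], screen_2: dict[str, str]
) -> tuple[set[str], set[str]]:
    """Return (negative in both screens, opposite calls between screens)."""
    shared = screen_1.keys() & screen_2.keys()  # genes called in both screens
    both_negative = {g for g in shared if screen_1[g] == screen_2[g] == "negative"}
    discordant = {g for g in shared if {screen_1[g], screen_2[g]} == {"positive", "negative"}}
    return both_negative, discordant


# Toy calls with placeholder gene names.
screen_1 = {"GENEA": "negative", "GENEB": "positive", "GENEC": "negative", "GENED": "positive"}
screen_2 = {"GENEA": "negative", "GENEB": "negative", "GENEC": "positive", "GENEE": "negative"}
neg, disc = compare_screens(screen_1, screen_2)
print(sorted(neg), sorted(disc))  # ['GENEA'] ['GENEB', 'GENEC']
```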
Commonalities and differences from two independent Hh screens. (A) List of ciliary genes identified in two independent screens as negative regulators of Hh signaling. (B) List of ciliary genes identified as positive regulators of Hh signaling in one screen and negative regulators of Hh signaling in another screen.
DISCUSSION
This compiled spreadsheet of ciliary genes serves both as a resource highlighting the vertebrate genetic models available in mice and zebrafish and as a prototype for organizing information so that researchers can prioritize which genes to study. For this tool, we identified mouse alleles and zebrafish reagents for ∼98% and ∼81% of the 1999 human ciliary genes, respectively. Roughly 83% of the ciliary genes have phenotypes reported in either mouse or zebrafish, leaving 311 mouse alleles and 232 zebrafish mutants with no reported phenotype to date. Because the majority of analyzed alleles present with notable phenotypes, the alleles with no reported phenotype are likely to present with phenotypes when carefully analyzed by experts.
Our hope is that investigators will edit and customize this tool to their research needs. In addition to mouse and zebrafish, there are many model organisms used in the study of primary and motile cilia, including Chlamydomonas reinhardtii, Caenorhabditis elegans, Xenopus laevis and Drosophila melanogaster. We chose to focus here on mouse and zebrafish models as we were motivated to concentrate on vertebrate primary cilia biology. Similar approaches could be performed analyzing orthologs in these other models. Although our resource's curation specifically focused on kidney expression and phenotypes, it is not exhaustive; other tissues or cell types of interest can be incorporated using parallel logic. For example, many ciliopathies affect the eyes and brain, so expression data from either tissue would be important for researchers interested in these tissues. Similarly, although we focus here on primary cilia, a similar approach could be applied to other organelles. The CilioGenics database provides integrated analysis of cilia-related data and contains 1996 of the 1999 cilia-associated genes we identified (Pir et al., 2024). We included the CilioGenics score and ranking, and investigators could incorporate additional data such as protein interactions or evolutionary conservation. Lastly, as additional ciliary proteins are identified, they, along with their mouse and zebrafish resources, can be manually added to this resource.
MATERIALS AND METHODS
Dataset acquisition
We created an all-inclusive list of ciliary genes from seven datasets: (1) SysCilia v1 (van Dam et al., 2013); (2) a list of established and candidate genes underlying ciliopathies (Reiter and Leroux, 2017); (3) CiliaCarta (van Dam et al., 2019); (4) SysCilia v2 (Vasquez et al., 2021); (5) the National Heart, Lung, and Blood Institute (NHLBI)’s primary cilium proteome database (Mehta et al., 2022); (6) the Embryonic Ciliome (Elliott et al., 2023); and (7) CilDB (Arnaiz et al., 2009, 2014). For CilDB, we restricted the included genes to those meeting three conditions: (1) both human and mouse orthologs exist; (2) the number of ciliary evidences (number of studies in which the gene appears) in Homo sapiens is greater than or equal to three; and (3) the number of ciliary evidences in Mus musculus is greater than or equal to two. All databases referenced the human ENSEMBL ID for each gene or protein, except for NHLBI's primary cilium proteome database, which referenced only the mouse gene name. Many databases had significant overlap as well as distinct information for each gene. We anchored all genes using the Alliance of Genome Resources and removed duplicates, resulting in a unified database of 1999 unique ciliary genes (Bult and Sternberg, 2023; Alliance of Genome Resources Consortium, 2024). For each unique ciliary gene, we included the CilioGenics score and rank from the Explore data section of the CilioGenics database (https://ciliogenics.com; Pir et al., 2024).
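The three CilDB inclusion conditions can be written as a single predicate. The record fields below are hypothetical names for the quantities described above, not CilDB's own column names, and the example rows are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CilDBRecord:
    """Hypothetical record holding the fields the inclusion filter needs."""
    human_ortholog: Optional[str]
    mouse_ortholog: Optional[str]
    human_evidences: int  # number of studies reporting the gene in Homo sapiens
    mouse_evidences: int  # number of studies reporting the gene in Mus musculus


def include(record: CilDBRecord) -> bool:
    """Apply the three inclusion conditions described in the text."""
    return (
        record.human_ortholog is not None
        and record.mouse_ortholog is not None
        and record.human_evidences >= 3
        and record.mouse_evidences >= 2
    )


records = [
    CilDBRecord("GENEA", "Genea", 3, 2),  # meets all three conditions
    CilDBRecord("GENEB", None, 5, 5),     # no mouse ortholog
    CilDBRecord("GENEC", "Genec", 2, 4),  # too few human evidences
]
print([r.human_ortholog for r in records if include(r)])  # ['GENEA']
```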
A major gap in many of the databases centered on the lack of orthologs between species. For example, ENSEMBL does not currently list human GAPDH as an ortholog for mouse Gapdh. Therefore, any dataset that used the ENSEMBL database to identify orthologs may falsely indicate no orthologous genes found between species. To fill this gap, we manually curated missing orthologs in our database.
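Manual curation of missing orthologs can be modeled as filling gaps in an automatically derived mapping without overwriting entries that already exist. This sketch assumes a human-to-mouse symbol mapping in which a missing ortholog is stored as None; the GAPDH case comes from the text, and the function name is hypothetical.

```python
from typing import Optional


def fill_orthologs(
    automatic: dict[str, Optional[str]], curated: dict[str, str]
) -> dict[str, Optional[str]]:
    """Fill missing (None or absent) orthologs from a manually curated table.

    Existing automatic assignments are left untouched.
    """
    filled = dict(automatic)  # copy so the input mapping is not modified
    for human, mouse in curated.items():
        if filled.get(human) is None:
            filled[human] = mouse
    return filled


automatic = {"GAPDH": None, "AURKA": "Aurka"}
curated = {"GAPDH": "Gapdh"}
print(fill_orthologs(automatic, curated))  # {'GAPDH': 'Gapdh', 'AURKA': 'Aurka'}
```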
Mouse alleles and phenotypes
To better understand mouse models and phenotypes corresponding to our gene list, we used the Alliance of Genome Resources (https://www.alliancegenome.org), MGI (https://www.informatics.jax.org) and MouseMine (https://www.mousemine.org). We first generated a list of orthologs using the Alliance of Genome Resources’ SimpleMine tool. We entered all of the unique HGNC IDs and used the output for mouse orthologs to obtain a list of all mouse orthologs for those genes, resulting in 2075 unique MGI numbers corresponding with 1993 HGNC IDs. We performed a batch query through MGI's website using the unique MGI numbers. For available mouse allele information, we exported the resulting batch query table to MouseMine. In MouseMine, we clicked Manage Columns and added information on Alleles, including Symbols, Primary Identifier, Allele Type and Allele Attributes. We used the resulting information from Allele Attributes to determine the availability of mouse alleles such as conditional, nulls or other mutant alleles. In MGI, we also exported mammalian phenotypes associated with each of the MGI Numbers. We queried this list based on lethality, cystic kidneys, direct or indirect kidney phenotypes, or any other noted phenotypes.
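Deriving allele availability from the exported Allele Attributes column might look like the sketch below. We assume, for illustration only, that each allele's attributes arrive as a ‘|’-separated string containing values such as ‘Conditional ready’ or ‘Null/knockout’; the exact vocabulary and separator in a MouseMine export should be checked before relying on this.

```python
def allele_availability(alleles: list[str]) -> set[str]:
    """Summarize which kinds of alleles exist for one gene.

    Each element is one allele's attribute string (assumed '|'-separated).
    An allele that is both conditional-ready and null is counted as
    conditional, since a conditional allele can also generate a null.
    """
    kinds: set[str] = set()
    for attrs in alleles:
        parts = [a.strip().lower() for a in attrs.split("|")]
        if any("conditional" in p for p in parts):
            kinds.add("conditional")
        elif any("null" in p or "knockout" in p for p in parts):
            kinds.add("null")
        else:
            kinds.add("other")
    return kinds  # empty set means no alleles were listed


print(sorted(allele_availability(["Conditional ready|Null/knockout", "Null/knockout"])))
# ['conditional', 'null']
```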
Zebrafish reagents and phenotypes
We generated a list of zebrafish orthologs using the Alliance of Genome Resources SimpleMine tool, which identified 2321 zebrafish orthologs for 1828 of the genes in our list. We used ZFIN to determine available fish mutants and morphants, including transgenic insertions, CRISPR, TALENs, morpholinos and other reagents (Bradford et al., 2022). We also used ZFIN to analyze phenotypes associated with the ciliary dataset. We found that 846 zebrafish orthologs have reported phenotypes, corresponding with 751 human genes in our list.
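Ortholog counts exceed gene counts (2321 zebrafish orthologs for 1828 human genes) largely because the teleost genome duplication often left two zebrafish paralogs per human gene, as with gsk3ba and gsk3bb for GSK3B. A minimal sketch of collapsing ortholog-level data back to human genes follows; the input pairs and reagent set are toy examples built from genes mentioned in the text.

```python
from collections import defaultdict


def group_orthologs(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collapse (human gene, zebrafish ortholog) pairs into one entry per human gene."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for human, fish in pairs:
        if fish not in grouped[human]:  # skip duplicate pairs
            grouped[human].append(fish)
    return dict(grouped)


pairs = [("GSK3B", "gsk3ba"), ("GSK3B", "gsk3bb"), ("AURKA", "aurka")]
with_reagents = {"gsk3ba", "aurka"}  # toy set; gsk3bb has no reagents per the text

grouped = group_orthologs(pairs)
# A human gene counts as covered if any of its zebrafish orthologs has a reagent.
human_with_reagents = sorted(h for h, fs in grouped.items() if any(f in with_reagents for f in fs))
print(len(pairs), len(grouped))  # 3 2
print(human_with_reagents)       # ['AURKA', 'GSK3B']
```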
UpSet visualization of intersecting alleles and phenotypes
UpSet visualization was performed using UpSetR Shiny App (https://gehlenborglab.shinyapps.io/upsetr/) by inputting HGNC IDs corresponding to specific orthologs, alleles and phenotypes (Lex et al., 2014).
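The bars of an UpSet plot count elements in each exclusive combination of sets, that is, elements belonging to exactly those sets and no others (as we understand UpSetR's default mode). The sketch below computes those counts from named sets of IDs; the set names and IDs are toy placeholders, not HGNC IDs from the dataset.

```python
from collections import Counter


def exclusive_intersections(sets: dict[str, set[str]]) -> Counter:
    """Count elements by the exact combination of named sets that contain them."""
    universe = set().union(*sets.values())
    counts: Counter = Counter()
    for item in universe:
        combo = frozenset(name for name, members in sets.items() if item in members)
        counts[combo] += 1
    return counts


toy = {
    "mouse_allele": {"toy1", "toy2", "toy3"},
    "zebrafish_reagent": {"toy2", "toy3", "toy4"},
}
for combo, n in exclusive_intersections(toy).items():
    print(sorted(combo), n)
```

Because each element is assigned to exactly one combination, the bar heights sum to the size of the union, which makes the plot a partition of the input genes.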
Acknowledgements
The authors are grateful to David Shaw (The Jackson Laboratory, Bar Harbor, ME, USA) from Mouse Genome Informatics (MGI) Support for guidance extracting mouse data. The authors thank members of the Department of Human Genetics including the Caspary laboratory and Quinn Eastman for helpful discussions and edits.
Footnotes
Author contributions
Conceptualization: R.E.V.S., T.C.; Methodology: R.E.V.S., T.C.; Data curation: R.E.V.S., T.C.; Writing - original draft: R.E.V.S., T.C.; Writing - review & editing: R.E.V.S., T.C.; Visualization: R.E.V.S., T.C.; Supervision: T.C.; Funding acquisition: R.E.V.S., T.C.
Funding
This work was supported by the National Institute of General Medical Sciences [K12GM000680, R35GM148416], National Institute of Diabetes and Digestive and Kidney Diseases [F32DK127848, R01DK128902] and National Science Foundation [BIO DFG 2329634]. Open Access funding provided by National Institute of Diabetes and Digestive and Kidney Diseases. Deposited in PMC for immediate release.
Data availability
All relevant data can be found within the article and its supplementary information.
References
Competing interests
The authors declare no competing or financial interests.