The GenitoUrinary Development Molecular Anatomy Project (GUDMAP) is an international consortium working to generate gene expression data and transgenic mice. GUDMAP includes data from large-scale in situ hybridisation screens (wholemount and section) and microarray gene expression data of microdissected, laser-captured and FACS-sorted components of the developing mouse genitourinary (GU) system. These expression data are annotated using a high-resolution anatomy ontology specific to the developing murine GU system. GUDMAP data are freely accessible at www.gudmap.org via easy-to-use interfaces. This curated, high-resolution dataset serves as a powerful resource for biologists, clinicians and bioinformaticians interested in the developing urogenital system. This paper gives examples of how the data have been used to address problems in developmental biology and provides a primer for those wishing to use the database in their own research.
Mammalian development is complicated; each of the thousands of embryological events that operate in sequence to make a mature body typically involves the concerted action of many cells in response to a multitude of simultaneous environmental signals that regulate hundreds of different genes. Traditional gene-by-gene analyses have served developmental biology well in the era when its main goal was discovering basic developmental principles from whatever organism might illustrate a principle best. Now that the focus of many developmental biologists has moved on to understanding in detail the development of specific, clinically relevant body systems with a view to preventing or repairing malformation, the gene-by-gene approach is no longer sufficient and it is necessary to consider the genome as a whole and to take a systems biology approach (Bard, 2007; Davidson, 2009; Rumballe et al., 2010). This requires comprehensive databases of the expression of thousands of transcripts, annotated at high resolution and made easily accessible to manual and automated queries.
The development of the genitourinary (GU) system is attracting much attention for two reasons. First, humans suffer from a wide range of congenital abnormalities of urogenital organs, which can damage this system directly (kidney failures and infertility syndromes) (Schedl, 2007) and also cause indirect damage to other organs, for example because failing kidneys can raise systemic blood pressure (Hoy et al., 2005). Second, damage to the GU tract from non-developmental causes, for example infection, is repaired poorly and there is an urgent need to develop novel approaches to GU tract organ regeneration (Unbekandt and Davies, 2009). A comprehensive description of gene expression during urogenital development will be central to understanding the mechanisms of congenital urogenital abnormalities and to the development of better ways to treat them.
GUDMAP aims to provide a fundamental description of gene expression in the developing mouse GU system, generating gene expression data that will ‘enable, facilitate and stimulate research’ into understanding the GU system (McMahon et al., 2008). In addition, microarray analyses and the generation of transgenic mouse strains with genetic markers will bolster the overall aim of defining molecular and cellular anatomy through developmental time. The specific focus on the GU system combined with the range of data (in situ and microarray across a range of developmental time points) distinguishes GUDMAP from other gene expression databases.
Some gene expression data for the developing murine GU system are already publicly available, but they are dispersed across various resources that cover a range of species, organ systems and data types [GEO (Barrett et al., 2009), ArrayExpress (Parkinson et al., 2009), GXD (Smith et al., 2007), EMAGE (Richardson et al., 2010)]. The EuReGene database (http://www.euregene.org) holds in situ gene expression data for the developing [17.5 days post-coitum (dpc)] and adult mouse kidney, but not for other components of the GU system or time points of development. Although funding for the EuReGene project ended in 2009, there are plans to assimilate the data into the GUDMAP database and make it available once again. Additionally, the Kidney Development Database (Davies, 1999) (http://golgi.ana.ed.ac.uk/kidhome.html) holds text descriptions from published developmental studies that have a bearing on kidney development.
Here, we present an overview of the GUDMAP database. Our goals are to describe how the data are generated by expert GUDMAP consortium laboratories, how the quality of the data and its annotation are maintained by a dedicated editorial team and how the data are made available via the GUDMAP website (www.gudmap.org).
MATERIALS AND METHODS
Description and generation of GUDMAP gene expression data
The database currently holds in excess of 8700 in situ gene expression entries, covering nearly 3200 different genes. The vast majority is RNA in situ hybridisation (ISH) data, both from wholemount preparations of intact GU organs and from histological sections. Currently, 2894 unique genes have been analysed by wholemount in situ hybridisation (WISH) and 663 genes by section in situ hybridisation (SISH). Further information about the number of data entries and categories of information in the GUDMAP database can be found at www.gudmap.org/Website_Reports/Stats/gudmap_stats.html. The database also contains data from a small number of immunohistochemistry (IHC) assays, as well as in situ expression data from a small number of transgenic reporter strains.
GUDMAP currently holds ~300 individual microarray entries, which comprise samples representing specific kidney and urogenital subcompartments. These samples were isolated using manual microdissection, laser capture microdissection (LCM) or fluorescence-activated cell sorting (FACS)-based techniques from fluorescently tagged transgenic animals (Brunskill et al., 2008). The list of samples will soon include a 90-array timecourse on all the major sorted cell populations from the gonad spanning the period when its fate is decided (B. Capel and S. Jameson, Duke University, NC, USA, personal communication; data currently under curation). Microarray data files deposited on the website include both raw data (.CEL files) and normalised data.
GUDMAP gene expression data are being generated by a cycle of progressively more detailed studies. One approach taken has been to begin with complete gene sets (e.g. transcription factors; secreted molecules and their receptors) and perform an initial low-resolution analysis of expression in the entire GU tract using wholemount in situ screens. This is followed by prioritisation of subsets of genes for re-analysis at higher resolution using SISH, usually on the basis of an expression pattern that suggests compartment specificity in a particular organ (e.g. kidney, bladder, gonad). Another approach has been to begin with microarray analyses of specific time points or GU tissues followed by bioinformatic prioritisation of genes enriched in key temporospatial compartments for validation and further characterisation using SISH. Information from these detailed studies is used to focus further microarray assays on cells from new and existing transgenic mice carrying appropriately expressed reporters. An important component of this approach is to ensure that in situ and microarray data are fully integrated for interrogation and analysis. As the data have grown, the ontology has evolved with them and our capacity to annotate entries accurately has increased. As a result, existing entries are being re-analysed and annotated to a much higher level of detail.
Organisation of GUDMAP data in the database
The GUDMAP database is organised around genes, anatomical parts of the GU system and developmental stages. The primary identifiers are as follows: for genes, Mouse Genome Informatics (MGI) gene IDs; for anatomical parts, GUDMAP ontology terms with eMouse Atlas Project (EMAP) IDs; for stages, Theiler stage (TS) numbers (Theiler, 1989). Gene symbols and gene names are updated every few days from MGI.
Each in situ entry in the database contains the expression data for a single gene assayed with a defined probe in a single sex at a single Theiler stage.
A full description of the probes used in database entries is essential to enable replication by the community. Therefore, the methodology for probe design and the sequence of each probe used or generated are given in each entry in the database. The probe section of an entry page is highlighted in Fig. 4C. A large number of WISH and SISH entries have been hybridised with PCR-generated riboprobe sequences, as previously described (Georgas et al., 2009; Georgas et al., 2008). A link is available under ‘Resources’ to ‘UQ GUDMAP – Probe Design’ (http://uqgudmap.imb.uq.edu.au/tools.html) (Thiagarajan et al., 2011). ‘UQ GUDMAP Tools’ is provided and maintained by GUDMAP consortium members at the University of Queensland, Australia, and provides access to the tools used by this group for the design of PCR primers for the generation of locus-specific riboprobes for ISH analysis.
Each microarray entry contains gene expression data for tissues or cell populations sampled at a single developmental stage. Hence, there are multiple entries for many genes, reflecting differences in tissue/organ stage or type. The GUDMAP microarray data follow the same MIAME-compliant conventions as used in GEO (Barrett et al., 2009) and all entries are secondarily submitted to GEO. Microarray entries are called ‘samples’ and are grouped into ‘series’ that are essentially experiments that comprise groups of samples in a study.
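The organisation described above can be sketched as a small data model. This is an illustrative sketch only, not the GUDMAP database schema: the class names, field names and all identifier values (such as `EX:1` and `MGI:0000001`) are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InSituEntry:
    """One in situ entry: one gene, one probe, one sex, one Theiler stage."""
    entry_id: str        # hypothetical entry accession
    mgi_gene_id: str     # MGI gene ID (placeholder values below)
    probe_id: str        # hypothetical probe identifier
    sex: str
    theiler_stage: int   # Theiler stage number, e.g. 23 for TS23


@dataclass(frozen=True)
class MicroarraySample:
    """One microarray entry ('sample'): one tissue at one stage."""
    sample_id: str
    tissue: str
    theiler_stage: int


@dataclass
class MicroarraySeries:
    """A 'series' groups the samples that belong to one study."""
    series_id: str
    samples: List[MicroarraySample] = field(default_factory=list)


def entries_by_gene(entries: List[InSituEntry]) -> Dict[str, List[InSituEntry]]:
    """Group in situ entries by gene; one gene can have many entries."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.mgi_gene_id].append(entry)
    return dict(grouped)


# Placeholder data for illustration only.
entries = [
    InSituEntry("EX:1", "MGI:0000001", "probe-1", "female", 21),
    InSituEntry("EX:2", "MGI:0000001", "probe-1", "male", 23),
    InSituEntry("EX:3", "MGI:0000002", "probe-2", "male", 23),
]
series = MicroarraySeries(
    "SERIES:1", [MicroarraySample("SAMPLE:1", "ureteric tip", 23)]
)
```

Because each in situ entry is tied to a single sex and stage, a gene assayed at several stages appears as several entries, which is why grouping by gene is the natural first step when summarising expression.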
GUDMAP ontology and annotation
GUDMAP has developed a high-resolution ontology to describe the anatomical structure of the developing murine GU tract (Little et al., 2007). Written as a partonomic, text-based, hierarchical ontology for both the embryonic and postnatal stages, it has been developed as an expansion of the existing EMAP ontology (Baldock et al., 2003; Bard et al., 1998; Davidson and Baldock, 2001) (http://www.emouseatlas.org/). Encompassing TS17-28 of development (representing ~10 dpc to sexually mature adult), the ontology denotes structures that can be identified histologically (http://www.gudmap.org/Resources/Ontologies.html). Anatomical structures are displayed as part of one or more structural parents. For example, at TS23, the early proximal tubule is part of the renal cortex, which in turn is part of the metanephros (kidney). This organisation allows easy and flexible use of the ontology for annotation and searching.
The anatomy ontology is based on the standard Theiler developmental stages. The use of stages (e.g. TS23) rather than developmental time (e.g. 15 dpc) gives more precision to the ontology and thus to annotated gene expression entries in the database, although for the majority of entries developmental time is also specified. This is because embryos, even those in the same litter, can vary in developmental rate so that at the same developmental time several developmental stages can be represented. The GUDMAP ontology is currently being incorporated into the existing EMAP ontology, which is used by other resources such as the mouse Gene Expression Database (GXD) (Smith et al., 2007) and EMAGE (Richardson et al., 2010). To this end, each term in the GUDMAP ontology is given a unique EMAP ID.
Each data-producing laboratory uses the anatomy ontology to annotate gene expression. For a given gene, expression in anatomical structures is annotated as present, not detected, uncertain or not examined. Pattern information can be applied to refine the annotation using a controlled vocabulary of terms such as regional, spotted, graded and ubiquitous (for definitions, see the GUDMAP glossary at http://www.gudmap.org/Help/Glossary.html#Expression_Pattern). The hierarchical structure of the ontology can be used to infer expression between terms with ‘part_of’ relationships (http://www.obofoundry.org/ro/#OBO_REL:part_of) (Smith et al., 2005). Thus, annotations of differing resolution can be easily accommodated and when searches are performed inferences can be made up or down the hierarchy (http://www.gudmap.org/Help/Glossary.html#Inferred_Annotation).
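Inference over ‘part_of’ relationships can be illustrated with a minimal sketch. It assumes the common convention that ‘present’ in a part implies ‘present’ in every structure that contains it, and that ‘not detected’ in a structure implies ‘not detected’ in all of its parts; the exact rules applied by GUDMAP are those given in its glossary. The hierarchy uses the TS23 example terms from the ontology description, and the function names are hypothetical.

```python
# child term -> list of parent terms (a term may have more than one parent)
PART_OF = {
    "early proximal tubule": ["renal cortex"],
    "renal cortex": ["metanephros"],
}


def ancestors(term, part_of=PART_OF):
    """All structures that contain `term`, directly or indirectly."""
    found = []
    for parent in part_of.get(term, []):
        found.append(parent)
        found.extend(ancestors(parent, part_of))
    return found


def descendants(term, part_of=PART_OF):
    """All structures that are part of `term`, directly or indirectly."""
    found = []
    for child, parents in part_of.items():
        if term in parents:
            found.append(child)
            found.extend(descendants(child, part_of))
    return found


def infer(direct, part_of=PART_OF):
    """Extend direct annotations with inferred ones.

    Direct annotations always take precedence; if two inferences
    disagree, the first one encountered is kept (a simplification).
    """
    result = dict(direct)
    for term, strength in direct.items():
        if strength == "present":
            for parent in ancestors(term, part_of):
                result.setdefault(parent, "present (inferred)")
        elif strength == "not detected":
            for part in descendants(term, part_of):
                result.setdefault(part, "not detected (inferred)")
    return result
```

With this convention, a search for genes expressed in the metanephros would also return a gene annotated only at the resolution of the early proximal tubule, which is how annotations of differing resolution can be accommodated in one query.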
The GUDMAP Editorial Office (EO) provides assistance with the data submission process and checks the data for completeness and compliance, including definitive checking of the identity of probes used in ISH. Full details of the EO protocols can be found at http://www.gudmap.org/Research/Protocols/EO.html. The EO ensures that data are immediately made publicly available and easy to access on the web interface. Additionally, the EO helps to maintain open networks of communication between consortium members and users. For example, as far as time allows, editors are on-hand to assist users with searches that are more complex than those provided on the standard interface.
The starting point for using GUDMAP is the gene expression database homepage (http://www.gudmap.org/gudmap/pages/database_homepage.html) (Fig. 1). The database is organised around genes, anatomical terms and developmental stages (see Materials and methods, GUDMAP entries) and can be queried by gene (via gene symbol), by anatomy (via anatomy term), by molecular function (via Gene Ontology annotation) and by accession ID for genes, probes and entries (MGI, Entrez, Ensembl, etc.). More complex anatomical queries can be made using a ‘Boolean anatomy search’. Researchers can also browse through the in situ, transgenic and microarray data in the database. The website provides tutorials that describe how the GU system develops in mice and organ summary pages that provide an overview of gene expression in individual structures in the GU system. It also contains a series of demonstrations (http://www.gudmap.org/Help/Demos.html) and a step-by-step tutorial (http://www.gudmap.org/Help/Using_GUDMAP_Tutorial.html), which serve as user guides and illustrate how the GUDMAP website and database can be used to answer biological questions.
Finding expression data for a given gene
To find where a gene is expressed during development, a simple gene query can be used to return the ‘expression summary’ for that gene (Fig. 2). The gene expression summary gives access to microarray expression profiles (Fig. 3), in situ expression entries (Fig. 4), in situ image galleries (Fig. 5) and disease associations for that gene. Figs 1-5 illustrate how to execute a simple query and introduce the main features/pages of the GUDMAP expression database.
Using the Boolean anatomy search to find genes that mark a structure of interest
From the expression database homepage (Fig. 1), it is possible to make a simple ‘query by anatomy’. This searches for entries that have either a direct or inferred annotation to a particular anatomical component of interest. To perform a more complex anatomical query, the Boolean anatomy search can be used. This tool can search for database entries or genes across multiple anatomical components, across a range of developmental stages and with particular patterns and locations of expression. For example, it can be used to search for the co-expression of genes in components of interest or to identify genes that might mark a particular structure. In the latter case, to find genes with expression restricted to the ureteric tip, a Boolean search would look for genes with ‘present’ expression in the ureteric tip, but ‘not detected’ expression in adjacent structures (ureteric trunk, cap mesenchyme, etc.). A demonstration of the use of the Boolean anatomy search in this way is given on the website (http://www.gudmap.org/Help/Demos.html#Demo_2; http://www.gudmap.org/Help/Using_GUDMAP_Tutorial.html#Boolean).
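The logic of the ureteric tip example can be sketched in a few lines. This is not the GUDMAP query engine: the function name, the data layout and the gene names (`GeneA` and so on) are hypothetical, and the annotations are invented purely to show the filter.

```python
def boolean_anatomy_search(annotations, present_in, not_detected_in):
    """Return genes annotated 'present' in every structure in `present_in`
    and 'not detected' in every structure in `not_detected_in`.

    `annotations` maps gene -> {structure: annotation}. A structure with
    no annotation (not examined) does not satisfy either condition.
    """
    hits = []
    for gene, by_structure in annotations.items():
        is_present = all(
            by_structure.get(s) == "present" for s in present_in
        )
        is_absent = all(
            by_structure.get(s) == "not detected" for s in not_detected_in
        )
        if is_present and is_absent:
            hits.append(gene)
    return sorted(hits)


# Invented annotations for three hypothetical genes.
annotations = {
    "GeneA": {"ureteric tip": "present",
              "ureteric trunk": "not detected",
              "cap mesenchyme": "not detected"},
    "GeneB": {"ureteric tip": "present",
              "ureteric trunk": "present",
              "cap mesenchyme": "not detected"},
    "GeneC": {"ureteric tip": "present",
              "ureteric trunk": "not detected"},  # cap mesenchyme not examined
}

markers = boolean_anatomy_search(
    annotations,
    present_in=["ureteric tip"],
    not_detected_in=["ureteric trunk", "cap mesenchyme"],
)
```

Requiring an explicit ‘not detected’ annotation, rather than treating missing data as absence, is the conservative choice here: a gene that was never examined in an adjacent structure cannot be claimed as a specific marker.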
GUDMAP transgenic mouse strains
The project has generated (and continues to add to) a resource of novel transgenic mouse strains carrying genetic markers; these strains are either currently available or will become so in due course (Fig. 6). Strains currently posted are: Id3, Upk1b, Klf3, Osr2, Sox18, Cyp11a1, Ifitm3, Akr1b7, Upk3a, Tmem100 and S100b. These transgenic strains have been chosen mainly on the basis of in situ gene expression results in order to facilitate more detailed gene expression analyses as well as functional studies in the GU system. Verification, preliminary characterisation and information about availability are given on the website (http://www.gudmap.org/Resources/MouseStrains/index.html). All transgenic mouse strains generated by the GUDMAP consortium that are currently available can be ordered (http://jaxmice.jax.org/index.html).
Tutorials of genitourinary development and organ summary pages
To help interpret GUDMAP data, a description of GU development in the mouse is given on the tutorial pages (http://www.gudmap.org/About/Tutorial/Overview.html). In addition, the specific development of individual components of the GU system is described on the organ summary pages (http://www.gudmap.org/Organ_Summaries/index.html), along with links to expression data for these components. These pages are supplemented with schematic diagrams (Kylie Georgas, The University of Queensland, Australia) that serve to illustrate the developing components of the mouse GU system across different stages. For users seeking to identify genes that have expression patterns that are similar to those of their own interest, these schematics provide a simple visual representation that relates the spatial distribution of gene expression to anatomical ontology terms that can be readily searched in the database.
Disease and phenotype data
Further development of the GUDMAP resource has involved obtaining disease-gene associations and building these data into the GUDMAP database architecture. This enables disease data to be integrated with the existing gene expression data, enhancing GUDMAP as a research tool for GU development and disease. A separate part of the web interface (http://www.gudmap.org/gudmap_dis/index.jsp) allows expression data to be accessed by searching for genes that are associated with a disease or phenotype of interest, or by finding genes that share a similar pattern of phenotype or disease association with a gene of interest. Disease-gene associations are taken primarily from Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/omim) (Amberger et al., 2009), published by the National Center for Biotechnology Information (NCBI), and phenotype associations are taken from MGI (http://www.informatics.jax.org/phenotypes.shtml). Disease and phenotype associations are determined for all genes that have in situ data in GUDMAP. Disease-gene associations are obtained in two ways: directly from NCBI and by text-mining OMIM entries for orthologous human gene symbols. If a gene symbol is present, an association between the disease and the gene is assumed and the entry can be assessed manually to confirm the nature of the association. Full details of the methods used to obtain these associations are described on the GUDMAP website (http://www.gudmap.org/gudmap_dis/Dis_Info.jsp).
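The text-mining step can be illustrated with a minimal sketch that scans entry text for whole-token occurrences of gene symbols. It is an assumption-laden illustration, not the GUDMAP pipeline: the function name, the entry identifiers, the entry text and the symbols `GENEA` and `GENEB` are all hypothetical.

```python
import re


def mine_associations(entries, symbols):
    """Return sorted (entry_id, symbol) pairs where the symbol occurs
    in the entry text as a whole token.

    Lookarounds reject matches embedded in a longer alphanumeric token,
    so the symbol GENEA does not match inside GENEA1.
    """
    patterns = {
        symbol: re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(symbol) + r"(?![A-Za-z0-9])"
        )
        for symbol in symbols
    }
    pairs = []
    for entry_id, text in entries.items():
        for symbol, pattern in patterns.items():
            if pattern.search(text):
                pairs.append((entry_id, symbol))
    return sorted(pairs)


# Invented entry text for illustration only.
entries = {
    "ENTRY:1": "This disorder is caused by mutation in the GENEA gene.",
    "ENTRY:2": "GENEA1 is not involved; variants in GENEB were reported.",
}
candidate_pairs = mine_associations(entries, ["GENEA", "GENEB"])
```

A match of this kind establishes only that the symbol is mentioned, which is why each candidate association is then available for manual assessment.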
Submission of data to GUDMAP
Until recently, only data from the GUDMAP consortium have been accepted for publication in the database. Now it is possible to enter data from other sources. If you would like to submit data, please contact the GUDMAP EO (email@example.com); high-resolution SISH data will be given priority.
Key outcomes of GUDMAP data analysis
The data contained within the GUDMAP database have already proven valuable in advancing the understanding of morphology and morphogenesis within the GU tract. Analyses have led to the identification of key markers of specific time points or compartments (Brunskill et al., 2008; Thiagarajan et al., 2011), the identification of previously unidentified subcompartments (Chiu et al., 2010; Georgas et al., 2009; Mugford et al., 2009) and the association between gene networks and key developmental processes (Brunskill et al., 2008; Chiu et al., 2010). It has also been referred to, or utilised by, a number of other studies investigating gene expression and development (Combes et al., 2009; Dallosso et al., 2009; Gerber et al., 2009; Parreira et al., 2009; Shah et al., 2010; Surendran et al., 2010).
Generation of an atlas of gene expression in the developing kidney
Brunskill et al. generated microarray gene expression data for 15 separate subcompartments of the developing kidney collected using either LCM of anatomical compartments based on lectin staining or FACS from fluorescently tagged transgenic mice (Brunskill et al., 2008). Analysis of these data enabled the mapping of changes in expression profile during successive stages of nephron development. Employing precise sampling of tissues for microarray expression analysis, the authors showed that developmentally related components display a high degree of correlative gene expression and that certain structures, for example the early proximal tubule, show highly restricted gene expression patterns. Furthermore, the bioinformatic identification of transcription factor binding sites in the proposed minimal promoters of genes enriched in specific compartments of the developing kidney allowed the prediction of key genetic pathways crucial for the development of renal subcompartments. This included the identification of downstream targets of HNF1B in the early proximal tubule-enriched gene set and overlapping sets of Tcf/Lef targets in other compartments of the kidney. As such, this study represents the most comprehensive atlas of temporospatial gene expression for any developing organ. Microarray data were validated using ISH, the results of which are available from the GUDMAP database; for example, Entry ID GUDMAP:9112 (Prnp) (http://www.gudmap.org/gudmap/pages/ish_submission.html?id=GUDMAP%3A9112). The results of the analysis reported by Brunskill et al. (Brunskill et al., 2008) can be accessed on the GUDMAP website (http://www.gudmap.org/gudmap/pages/genelist_folder.html) and researchers can also download raw microarray data for their own analyses (see http://www.gudmap.org/Help/Download_Help.html).
Subcompartment-restricted anchor genes for the prioritisation of reporter mouse strains
The microarray data generated by Brunskill et al. (Brunskill et al., 2008) will allow continued bioinformatic interrogation for a number of purposes. Already, it has facilitated a key aim of the GUDMAP consortium – the identification of genes that specifically mark single developmental compartments within the GU tract. Such genes will serve to prioritise the development of transgenic reporter mice of value to the research community. Using a stringent bioinformatic filter, Thiagarajan et al. (Thiagarajan et al., 2011) selected ~250 putative anchor genes, which are defined as genes with expression restricted to one of 11 subcompartments within the developing mouse kidney. Two hundred of these genes were validated using high-resolution SISH, thereby identifying 37 anchor genes across six compartments (early proximal tubule, medullary collecting duct, ureteric tip, renal vesicle, loop of Henle and renal corpuscle). Five anchor genes were identified for the medullary collecting duct (Gsdmc4, Upk3a, Fam129a, Clmn and AI836003), four in the renal corpuscle [Gpsm3, Tdrd5, RIKEN clone C230096N06 (MGI 2416283) and Vip] and one each in the ureteric tip (Slco4c1), renal vesicle (Npy) and loop of Henle (Umod). Reflecting the initial observation of a strong cluster of proximal tubule genes, 25 of these anchor genes showed specific expression in the early proximal tubule.
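The idea of selecting genes with expression restricted to a single subcompartment can be sketched as a simple fold-change filter. This is not the filter used by Thiagarajan et al., whose criteria are not described here; the function name, the `min_fold` threshold, the gene names and the expression values are hypothetical and chosen only to illustrate the principle.

```python
def candidate_anchor_genes(expression, min_fold=5.0):
    """Return {gene: compartment} for genes whose highest compartment
    value is at least `min_fold` times the second-highest value.

    `expression` maps gene -> {compartment: non-negative expression value}.
    """
    candidates = {}
    for gene, by_compartment in expression.items():
        ranked = sorted(
            by_compartment.items(), key=lambda item: item[1], reverse=True
        )
        if len(ranked) < 2:
            continue  # restriction cannot be judged from one compartment
        (top_name, top_value), (_, second_value) = ranked[0], ranked[1]
        if top_value > 0 and top_value >= min_fold * second_value:
            candidates[gene] = top_name
    return candidates


# Invented expression values for two hypothetical genes.
expression = {
    "GeneA": {"ureteric tip": 100.0, "renal vesicle": 10.0,
              "cap mesenchyme": 8.0},
    "GeneB": {"ureteric tip": 50.0, "renal vesicle": 40.0,
              "cap mesenchyme": 45.0},
}
putative = candidate_anchor_genes(expression)
```

Any such computational filter yields only putative anchor genes, consistent with the validation by high-resolution SISH described above.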
The fully annotated expression patterns, probe details and ISH images for this complete set of validated genes are available on the GUDMAP website. For example, Spp2 in the early proximal tubule can be found in Entry ID GUDMAP:9147 (http://www.gudmap.org/gudmap/pages/ish_submission.html?id=GUDMAP%3A9147) and Umod in the loop of Henle can be found in Entry ID GUDMAP:9104 (http://www.gudmap.org/gudmap/pages/ish_submission.html?id=GUDMAP%3A9104). Precise, curated descriptions of the molecular probes used for in situ assays are given on the GUDMAP site. This information can be used to determine which transcripts are being assayed, for example in the context of jointly analysing in situ and microarray results. Microarray and in situ data for a given gene are best accessed from the gene expression summaries, where there are links to extensive information on the genes, genomic locations, phenotypes, functions and pathway information at the MGI, University of California at Santa Cruz (UCSC) Genome Browser and Kyoto Encyclopedia of Genes and Genomes (KEGG) sites. Researchers can use GUDMAP data, in the context of information from these resources, to manually explore relationships within gene lists produced by their own analyses or to explore the key results from bioinformatics analyses carried out in silico.
Use of GUDMAP data to interrogate a key developmental process
The availability of microarray data for individual anatomical compartments is a key strength of GUDMAP. Further analysis of genes enriched in a specific compartment based upon microarray data has assisted in the subdivision of these anatomical compartments into previously unidentified molecular subcompartments. A key example of this was reported by Georgas et al. (Georgas et al., 2009), who used GUDMAP microarray expression to identify genes with enriched expression in the renal vesicle and decreased expression in Wnt4 mutants. Sixty-three genes were identified and subjected to high-resolution SISH, the result of which indicated a subdivision of this structure into distal renal vesicle (closest to the adjacent ureteric tip; e.g. Papss2, GUDMAP:8960, http://www.gudmap.org/gudmap/pages/ish_submission.html?id=GUDMAP%3A8960) and proximal renal vesicle (farthest from the adjacent ureteric tip; e.g. Tmem100, GUDMAP:8888, http://www.gudmap.org/gudmap/pages/ish_submission.html?id=GUDMAP%3A8888). From their SISH analysis, the authors went on to show that the distal part of the late renal vesicle, marked by Lhx1 and Bmp2, fuses with the adjacent part of the collecting system, the ureteric tip, and that this process involves the insertion of cap mesenchyme-derived cells that express renal vesicle markers into the ureteric tip at the late renal vesicle stage. This is earlier than had previously been proposed.
Mugford et al. used WISH data contained in the GUDMAP database as the starting point for an investigation into the compartmentalisation of the nephron progenitor population (Mugford et al., 2009). Using the results of the genome-wide low-resolution WISH screen of transcription factors available on the GUDMAP site, the authors identified 45 genes expressed in the cap mesenchyme. These genes were selected for further analysis by high-resolution SISH and annotated using the GUDMAP anatomy ontology (http://www.gudmap.org/Resources/Ontologies.html). Based on the annotations of the in situ results, the authors revealed ten distinct categories of gene expression patterns, indicating a complexity of cell states that had not previously been appreciated. From this categorisation, Mugford et al. were able to begin to dissect the roles of transcription factors and signalling pathways in spatially and molecularly distinct cell populations in the cap mesenchyme during the earliest stages of nephron induction and differentiation.
A final example comes from the lower urinary tract, for which microarray analyses comparing its different regions at embryonic days 13 and 14 and subsequent WISH analyses have been undertaken. These analyses revealed novel domains of gene expression on the dorsal genital tubercle. Bioinformatics interrogation of this gene cluster predicted a network of Wnt7a-associated genes involved in the epidermal development of the genital tubercle (Chiu et al., 2010).
The GUDMAP website provides free access to the most comprehensive gene expression dataset of the GU system in the developing mouse. Combining both in situ and microarray data, it serves as a powerful resource for developmental biologists, clinicians and bioinformaticians alike. GUDMAP data have already provided insight into the progression of gene expression during nephrogenesis, the genetic regulatory mechanisms of kidney development and gene expression patterning in the early nephron. The illustration of the resource that we provide here serves as a primer for researchers with an interest in the developing GU system.
We acknowledge help from members of the GUDMAP consortium whose data are represented on the GUDMAP website, including Blanche Capel, Samantha Jameson, Kevin Gaido, Sean Grimmond, Peter Koopman, Jim Lessard, Chad Vezina and Pumin Zhang and members of their groups. We especially thank M. Todd Valerius, Deborah Hoshizaki and Elizabeth Wilder for their contributions to the website. M.H.L. is a Principal Research Fellow of the National Health and Medical Research Council, Australia. This work is supported by the National Institutes of Health via DK070136 (M.H.L.), DK070200 (D.D.) and DK070181 (A.P.M.).
Competing interests statement
The authors declare no competing financial interests.