The Simons Foundation launched its autism research initiative (SFARI; http://sfari.org) in 2003 to generate new insight into the causes of autism spectrum disorder, and to advance diagnosis and treatment. For readers of this journal, perhaps the most relevant foundation project is SFARI Gene (http://gene.sfari.org/), which is described more fully below. This evolving database, funded by the foundation and created by Mindspec, Inc., houses comprehensive information on the human genetics of autism, as well as on relevant mouse models of the disorder.
In addition to SFARI Gene, the foundation currently funds approximately 90 principal investigators in three main areas: gene discovery, molecular mechanisms, and cognition and behavior. SFARI’s flagship project, the Simons Simplex Collection (https://sfari.org/simons-simplex-collection), engages 12 academic medical centers across North America to recruit 3000 ‘simplex’ families with autism. In each such family, one child is affected with autism, and both parents and at least one sibling are unaffected. Biospecimens are currently being banked and, upon completion of the recruitment effort in 2011, a complete database (https://sfari.org/sfari-base) will house exhaustive phenotypic information on each of the probands, as well as information generated by genome-wide surveys of copy number variation. Additional genetic and other data will be imported as new research projects on the simplex collection are completed. This repository of human data is complemented by SFARI Gene, which collects information from the published literature on other human genetic studies, linked to corresponding animal models of autism.
Overview of SFARI Gene
It is becoming increasingly clear that autism is linked to many more genes than previously anticipated. The complex genetic architecture of neuropsychiatric disorders makes a genetic database both complicated to develop and necessary for keeping track of the numerous susceptibility genes uncovered by recent high-throughput methods. Several types of genetic variation can contribute to autism, including common variants of small effect (single nucleotide polymorphisms, SNPs) and rare single-gene mutations of large effect. Structural variations in the genome, such as microdeletions or duplications, are also associated with the disorder. In SFARI Gene, we have developed an integrative model and built a publicly available web portal for the ongoing collection, curation and visualization of genes linked to the disorder (Basu, 2009). The content of this resource originates entirely from the published scientific literature.
One important and unique feature of SFARI Gene is that it provides detailed annotation of the candidate genes to show their relevance to autism. All supporting studies are also included in the gene display page, with links to the abstracts of the original articles in PubMed. Additionally, to provide a panoramic view of the molecular role of the autism-related genes, the reference section also includes citations that are (1) highly cited by the scientific community, or (2) recently recommended studies on gene function. A dedicated team of scientists at Mindspec continuously updates the annotations of the SFARI Gene entries with selected citations from the primary scientific literature. A panel of external advisors will soon provide additional annotations on the strength of the evidence implicating each gene in autism.
A new module: animal models of autism
SFARI Gene provides a comprehensive collection of animal models linked to autism, notably including models that were generated even before the gene in question was implicated in autism. As in the gene module of the resource, the content of the animal model module is extracted from the published scientific literature and manually annotated by expert biologists. The attributes of the animal models in SFARI Gene include a detailed description of the type of genetic construct (knockout, knock-in, knockdown, overexpression, conditional, etc.), together with the wide spectrum of phenotypic features reported in the scientific literature.
To describe the various animal models of autism in a common annotation platform, we built an additional repository of standardized terms (a controlled vocabulary) for mouse behavior and other molecular features that are relevant to the biology of autism. The core behavioral features of autism involve higher-order human brain functions, such as social interaction and communication, and can only be approximated in animal models; therefore, our annotation strategy includes other autism-associated traits, such as seizures and circadian rhythms, which are heritable and more easily quantified in animal models. This data model attempts to capture and organize mouse phenotypic data in the clinically relevant domains that are used to define autism. To this end, we developed PhenoBase, a look-up table of controlled vocabulary for annotating and describing animal models in a systematic fashion. One important feature of our annotation model is the inclusion of the experimental paradigm in the phenotypic profile page. This feature of SFARI Gene is crucial for assessing the strength and specificity of the animal models. An example of an annotated animal model entry in SFARI Gene is shown in Fig. 1.
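The look-up-table approach to annotation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `LOOKUP`, `Annotation` and `AnimalModel`, the vocabulary entries, the paradigm labels and the gene symbol `GENE_A` are hypothetical placeholders and do not reflect actual PhenoBase content or the SFARI Gene schema. The construct types are those listed in the text, which ends its list with "etc."

```python
from dataclasses import dataclass, field
from enum import Enum


class Construct(Enum):
    # Construct types named in the text; the real list is longer ("etc.").
    KNOCKOUT = "knockout"
    KNOCK_IN = "knock-in"
    KNOCKDOWN = "knockdown"
    OVEREXPRESSION = "overexpression"
    CONDITIONAL = "conditional"


# Hypothetical look-up table: reported phrase -> (standard term, category).
# These entries are illustrative placeholders, not PhenoBase content.
LOOKUP = {
    "spontaneous seizures": ("seizures", "seizure"),
    "convulsions": ("seizures", "seizure"),
    "reduced social approach": ("social interaction", "social behavior"),
    "altered sleep-wake cycle": ("circadian rhythms", "circadian"),
}


@dataclass(frozen=True)
class Annotation:
    term: str       # standardized term from the look-up table
    category: str   # phenotypic domain the term belongs to
    paradigm: str   # experimental paradigm used to measure the phenotype


@dataclass
class AnimalModel:
    gene: str
    construct: Construct
    annotations: list = field(default_factory=list)

    def annotate(self, reported: str, paradigm: str) -> Annotation:
        """Map a reported phrase to its controlled term and record it."""
        key = reported.strip().lower()
        if key not in LOOKUP:
            # Unknown phrases are rejected rather than stored as free text.
            raise KeyError(f"no controlled term for {reported!r}")
        term, category = LOOKUP[key]
        annotation = Annotation(term, category, paradigm)
        self.annotations.append(annotation)
        return annotation


# Placeholder gene symbol and paradigms, chosen only for illustration.
model = AnimalModel("GENE_A", Construct.KNOCKOUT)
model.annotate("Convulsions", "EEG recording")
model.annotate("reduced social approach", "three-chamber test")
```

In this sketch, two differently worded reports ("convulsions" and "spontaneous seizures") resolve to the same standardized term, and each annotation carries its experimental paradigm, so that models of different genes can be compared on like terms.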
Searching SFARI Gene for animal models
The animal model module is seamlessly integrated within the gene portal so that the data can be searched and retrieved using a single search engine. This configuration essentially links two different types of datasets: genes and animal models. From the search page, users select the dataset and navigate based on their requirements (Fig. 2). The information can be searched and displayed in several ways, including complex Boolean queries. Animal model entries are displayed at three levels. At the first level, in summary row format, each entry is annotated with gene symbol, gene name, model species, synteny, total number of model reports, and total number of animal models, together with a primary PubMed reference reporting the generation of the model for the candidate gene. The summary line also provides a link to the human study for the corresponding gene. At the second, detail level, each entry shows (1) the gene summary, with links to external databases such as Mouse Genome Informatics (MGI; http://www.informatics.jax.org) and Allen Brain Atlas (http://www.brain-map.org/); (2) references; and (3) a list of animal models. Finally, at the third level, the extended phenotypic profile of each model is organized under 16 categories that are relevant to the biology of autism. Within each category, the animal model phenotype is shown as changes observed, no changes, or not reported (Fig. 1).
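The relationship between the summary level, the phenotypic profile and a Boolean query can be sketched in a few lines of Python. This is an assumed, simplified representation and not the portal's implementation: the record layout, the function names `summary_row`, `profile` and `search`, the gene symbols `GENE_A` and `GENE_B`, and all values are hypothetical. Only four placeholder categories stand in for the 16, and the second (detail) level is omitted.

```python
from enum import Enum


class Status(Enum):
    # The three phenotype states described in the text.
    CHANGED = "changes observed"
    UNCHANGED = "no changes"
    NOT_REPORTED = "not reported"


# Placeholder categories standing in for the 16 used by the resource.
CATEGORIES = ["social behavior", "communication", "seizure", "circadian"]

# Hypothetical records; gene symbols and values are invented for illustration.
ENTRIES = [
    {
        "gene": "GENE_A",
        "species": "mouse",
        "models": [
            {"construct": "knockout",
             "phenotype": {"seizure": Status.CHANGED,
                           "social behavior": Status.UNCHANGED}},
            {"construct": "conditional",
             "phenotype": {"social behavior": Status.CHANGED}},
        ],
    },
    {
        "gene": "GENE_B",
        "species": "mouse",
        "models": [
            {"construct": "knock-in",
             "phenotype": {"circadian": Status.CHANGED}},
        ],
    },
]


def summary_row(entry):
    """First level: one summary line per gene entry."""
    return {
        "gene": entry["gene"],
        "species": entry["species"],
        "n_models": len(entry["models"]),
    }


def profile(model):
    """Third level: a status for every category; absent ones are 'not reported'."""
    return {c: model["phenotype"].get(c, Status.NOT_REPORTED) for c in CATEGORIES}


def search(entries, predicate):
    """Return (gene, model) pairs whose model satisfies the predicate."""
    return [(e["gene"], m) for e in entries for m in e["models"] if predicate(m)]


def query(model):
    """(knockout AND seizure changed) OR circadian changed."""
    p = profile(model)
    return (
        (model["construct"] == "knockout" and p["seizure"] is Status.CHANGED)
        or p["circadian"] is Status.CHANGED
    )


hits = search(ENTRIES, query)
hit_genes = [gene for gene, _ in hits]
```

Filling unlisted categories with an explicit "not reported" status keeps a phenotype that was never tested distinct from one that was tested and found unchanged, which is the distinction the three-state display preserves.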
To enable direct participation by the scientific community, SFARI Gene offers an ‘Edit’ function that permits researchers to add new annotations or comments to an entry. Upon approval by a moderator, the new annotation becomes part of the entry. We view the interactive tools that accompany this modular database as essential in the effort to create a site that fully addresses the complexities of autism.