ABSTRACT
Many genes that regulate development share a 180 bp DNA sequence, called the homeobox, encoding a 60 amino acid DNA-binding domain (McGinnis et al., 1984c; Scott and Weiner, 1984). Because the homeobox is long enough to hybridize to related, but different, genes, it has been a powerful tool for discovering developmental regulators. This year is the 40th anniversary of the first homeobox report. Here, I describe work carried out at Indiana University that led to the discovery of the homeobox. The accompanying Perspective from McGinnis and Levine describes the independent discovery made at the Biozentrum in Basel (McGinnis and Levine, 2024). At the time, the competition was lively but, as we all met each other – and realized that no one cares more about your work than competitors – we fortunately became friends and have enjoyed many years of following and respecting each other's work.
Homeotic genes
The historical roots of the homeobox discovery lie with one person, Professor Edward Lewis of the California Institute of Technology, a 1995 Nobel prizewinner. He became fascinated with Drosophila mutations that cause one part of the body to develop aberrantly, in the form of a different part – caused by mutations in homeotic genes. Such substitutions and conversions were first described in 1894 by William Bateson (Bateson, 1894). Lewis worked not with the occasional animal oddity, but with mutant flies that stably and predictably had transformations of thoracic and abdominal body parts. The most famous was the four-winged fly (Lewis, 1978). Normal flies have one full pair of wings in the second thoracic segment and, in the third segment, two vestigial wings (halteres) that assist balance during flight. The mutant four-winged flies had their third thoracic segment partly converted into another second thoracic segment, thus adding a second pair of wings.
Lewis spent more than half a century investigating homeotic genes that control the identities of adult body segments (Lewis, 1998) and found that some cluster on one chromosome in a region he named the Bithorax Complex (BX-C). BX-C genes were fascinating not only for their functions, but for the diverse types of mutations that affect them, and for remarkable chromosome ‘transvection’ behaviors discovered by Lewis, where somatic chromosome pairing affects gene activities (Lewis, 1954; Kaufman et al., 1973; Fukaya and Levine, 2017). Other homeotic genes are not clustered or co-localized but act at a distance to control the BX-C genes, such as genes of the Polycomb group, which encode chromatin structure regulators; the founding gene, Polycomb, was discovered by Lewis' wife Pamela Lewis (Lindsley and Grell, 1968). The BX-C genes Ultrabithorax (Ubx), abdominal-A (abd-A) and Abdominal-B (Abd-B) encode three transcription factors (Sanchez-Herrero et al., 1985).
It remains astonishing how Drosophila genetic analysis allowed a master like Lewis to discover so many fascinating gene properties and functions, well before anything was known about DNA structure or molecular gene regulation. Researchers working with bacteria, yeast, Caenorhabditis elegans and Drosophila enjoyed the Awesome Power of Genetics (APOG), and Lewis' work is a prime example. Lewis could isolate and combine multiple mutations that, we now know, affected gene control elements. His clever combinations of cis-regulatory mutations led to dramatic adult transformations that avoided embryonic or larval death. The impression that there are eight BX-C genes came from observing reduced-function alleles, and the resolution of the BX-C into three genes came from looking at recessive lethal mutations (Sanchez-Herrero et al., 1985).
Lewis discovered that BX-C genes are organized peculiarly: the order of the genes along the chromosome corresponds to the order of the body segments affected by the genes (Lewis, 1978; Duncan and Lewis, 1981). Most gene clusters, such as ribosomal RNA genes or hemoglobin genes, consist of multiple copies of the same kind of gene. In contrast, some families of closely related genes, such as tubulins, have members scattered about the genome. The BX-C, Lewis suggested, might be a clustered set of related genes playing analogous roles in different body parts but conferring different morphology. Mutations, he showed, could reduce the function of a gene that acted in a more posterior region and the consequent transformation would be posterior to anterior, as we saw with the haltere-to-wing example (Lewis, 1982). Mutations that activated a normally more posterior-acting gene in a more anterior place caused the opposite kind of change (e.g. wings to halteres). Evidently, the more posterior gene is at least partly dominant over the more anterior gene. All of this was inferred without any knowledge of what ‘activation’ meant. It could have been a ubiquitous protein differentially turned on by a small molecule, or it could have been differential transcription of the genes in space, as indeed it turned out to be (Levine et al., 1983).
Lewis believed that he was studying ancient gene systems, the kind of system that would facilitate an evolutionary transition from, say, a worm with identical segments to a fly with highly diverse segments. It did not escape his attention that homeotic genes were key to certain types of evolution. He wrote: ‘During evolution a tandem array of redundant genes presumably diversified by mutation to produce this [BX-C] complex’ (Lewis, 1978).
What was missing from Lewis' BX-C were genes for more anterior body structures, a gap filled by pioneering work from Thomas Kaufman at Indiana University (Kaufman et al., 1980). Kaufman, like many biologists, was fascinated with one of the most spectacular ‘classic’ homeotic mutations: Antennapedia (Antp). Instead of antennae, these mutants have legs growing out of their heads. The gene is normally active in part of the thorax (Lewis et al., 1980a; Struhl, 1981, 1982) and the antenna-to-leg conversions are due to aberrant activation of Antp in the head.
Dominant gain-of-function mutations may tell you what a damaged gene can do rather than what the gene normally does. Kaufman and colleagues pursued experiments to find out what Antp loss-of-function does. He had recognized that homeotic mutations other than Antp were located near Antp, possibly very near, so there could be a complex. He and his lab members blasted away at flies with mutagens and identified recessive mutations that mapped in or near Antp (Kaufman, 1978; Denell et al., 1981; Wakimoto and Kaufman, 1981). The screens provided a treasure trove of loss-of-function mutations that identified at least seven different genes. Five of them turned out to be homeotic genes that act in regions mostly anterior to the places in the fly body where BX-C genes act. Antp was joined by Sex combs reduced (Scr), Deformed (Dfd), labial (lab) and proboscipedia (pb) (Lewis et al., 1980b). They were clustered closely, and their order along the chromosome corresponded well to the order, head to thorax, of body parts they affect. The Antennapedia Complex (ANT-C) was discovered, and it was even richer in genes than the BX-C (Box 1). I note without details that some in the field, even at Indiana University, viewed the relation of the new complex to BX-C with, in my opinion, excessive skepticism.
The ANT-C also contained a segmentation gene – not a homeotic gene – which Kaufman and colleagues named fushi tarazu (ftz; ‘segment deficient’). The two people with whom Lewis shared the 1995 Nobel, Christiane Nüsslein-Volhard and Eric Wieschaus, published a seminal paper in 1980 describing three major classes of segmentation genes (Nusslein-Volhard and Wieschaus, 1980). Mutations in the ‘gap’ genes caused multiple segments to be missing from the dying homozygous embryos, the ‘pair-rule’ mutations caused alternate body segments to be missing, and the ‘segment polarity’ mutants were missing part of the pattern within each body segment, with partial duplication of the remaining pattern. The ftz gene, oddly embedded in the ANT-C, is a pair-rule gene (Wakimoto et al., 1984). Two partially redundant zerknullt genes (zen1 and zen2; ‘crumpled’), involved in dorsal-ventral patterning, were also found in the ANT-C. Finally, bicoid (bcd) is also in the ANT-C and is active in females during oogenesis (Berleth et al., 1988). Bcd RNA and protein at one end of the egg cause anterior-type segments to form there.
The emerging idea from all these surprises was that there is progressive formation of pattern in fly embryos (Nusslein-Volhard and Wieschaus, 1980; Nusslein-Volhard et al., 1985; Carroll et al., 1988). Maternal factors lay out general anterior-posterior and dorsal-ventral orientations; then, gap genes broadly define anterior, middle and posterior regions. Pair-rule genes then form the proper number of segments, and segment-polarity genes control patterning within each segment. Once the genes were cloned it became clear that the pattern of transcription of each type of gene usually fits with where it has an effect. Earlier-acting genes (e.g. gap genes) affect the spatial patterns of the transcription of later-acting genes. Homeotic genes of the ANT-C and BX-C are also transcribed in spatial patterns that are influenced by segmentation genes.
Cloning the ANT-C
In 1980, I was fortunate to become a postdoctoral fellow in a collaboration between the labs of Thomas Kaufman and Barry Polisky, intending to isolate molecular clones of Antp. Barry's expertise in DNA replication and recombinant DNA manipulation was crucial because I had never worked with DNA. Thom had recruited a superb group of students to do genetic studies, and I am grateful to Thom and his lab members for all that they patiently tried to teach me. At that time, the full glory of the ANT-C was just emerging from the lab, so excitement grew about obtaining genomic DNA clones of Antp and maybe its neighbors. But what to do?
A stroke of fortune came when Welcome Bender drove across the country from Stanford, where he had been a postdoctoral fellow with David Hogness, to Harvard Medical School where he would set up his lab. Welcome, with Mike Akam and Francois Karch, was the first person to clone homeotic genes, and they had done a spectacular job of it (Bender et al., 1983). He interrupted his trip to learn more about the ANT-C from Thom Kaufman, and was extremely generous in teaching me how he had used ‘chromosome walking’ to clone the BX-C.
The early days of gene cloning took advantage of sources of highly abundant mRNAs, such as those for ovalbumin, ribosomal RNA, histones and hemoglobin. If a tissue made mainly one kind of RNA, the RNA from it could be used to create or find DNA clones and thus get to the genes. David Hogness had realized that most genes would not be so abundantly transcribed in any tissue. He noted that geneticists often knew the location of a gene, and its function, but nothing about its RNA or protein products. What was needed was a way to isolate DNA from genes based upon their chromosome location. In 1972, Hogness wrote a grant in which he proposed to use a library of fragmented chromosomal DNA, contained in a bacterial vector system. Starting from a DNA clone somewhere near a gene of interest, the library of pieces could be screened to isolate overlapping clones, which would extend in both directions from the original clone. The ‘chromosome walking’ process could be repeated to gradually assemble clones representing a long stretch of a chromosome. The libraries that were needed to try this approach did not exist until a few years later, but by then the Hogness lab was ready to apply this idea to some exceptionally interesting genes, including the BX-C. Welcome Bender at the Hogness lab at Stanford collaborated with Ed Lewis at Caltech to isolate the BX-C, and succeeded. In situ hybridization to chromosomes, invented by my Massachusetts Institute of Technology thesis advisor Mary Lou Pardue while she was a graduate student with Joseph Gall at Yale (Gall and Pardue, 1969; Pardue and Gall, 1969), allowed the cloning progress to be followed by hybridizing genomic or cDNA clones to polytene chromosomes.
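The logic of a chromosome walk can be sketched in a few lines of Python. This is only an illustrative toy, not the procedure used in any of the labs mentioned: the clone coordinates, the library contents and the function names are all hypothetical, and overlap between clones is decided from known coordinates, whereas at the bench it was detected by hybridization without knowing where anything lay.

```python
from typing import List, Tuple

# A clone is modelled as a (start, end) interval in kb on the chromosome.
# Hypothetical coordinates: a real walker does not know them in advance.
Clone = Tuple[int, int]


def overlaps(a: Clone, b: Clone) -> bool:
    """Stand-in for a positive hybridization signal between two clones."""
    return a[0] < b[1] and b[0] < a[1]


def chromosome_walk(library: List[Clone], start: Clone, max_steps: int = 50) -> Clone:
    """Extend a cloned region in both directions, one screening round at a time."""
    left, right = start
    for _ in range(max_steps):
        # One round of screening: every library clone that overlaps what we have.
        hits = [c for c in library if overlaps(c, (left, right))]
        new_left = min([left] + [c[0] for c in hits])
        new_right = max([right] + [c[1] for c in hits])
        if (new_left, new_right) == (left, right):
            break  # no overlapping clone extends the region: the walk has stalled
        left, right = new_left, new_right
    return left, right


# Hypothetical library with a gap between 130 kb and 150 kb.
library = [(0, 40), (30, 70), (60, 100), (95, 130), (150, 190)]
walked_region = chromosome_walk(library, start=(60, 100))
```

In this toy library the walk starting at the 60-100 kb clone grows outward over two rounds and then stops at the gap, which is one reason real walks benefited from several independent entry points.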
To clone the ANT-C, I followed the Hogness idea and the Bender methods, and was fortunate to have two people join in that effort: graduate student Amy Weiner and technician Robert Laymon. Both worked extremely hard and effectively for the years it took to isolate the ANT-C. Tulle Hazelrigg, then a graduate student in the Kaufman lab, isolated a new mutation that joined a part of the ANT-C near the Dfd gene to a previously cloned part of the genome (Hazelrigg and Kaufman, 1983) – an astonishing coincidence, as so little of the genome had been cloned. As word spread, people would jokingly submit a requisition to Tulle for a mutation for cloning their favorite gene. We made a recombinant genomic library from her mutant, isolated the junction piece, and ‘walked’ from it. In a second approach, we collaborated with Vincenzo Pirrotta at EMBL, who had invented a way to microdissect tiny regions of Drosophila polytene chromosomes and clone the DNA (Scalenghe et al., 1981). The third approach was to extend a chromosome walk that had been initiated at a nearby α tubulin gene (Mischke and Pardue, 1982), remarkably, in the lab where I had done my graduate work. All three approaches eventually worked and gave us entry points into different parts of the ANT-C (Scott et al., 1983). Meanwhile, Richard Garber, a postdoctoral fellow in Walter Gehring's lab in Basel, cloned the ANT-C using other methods (Garber et al., 1983).
At the time there was an expectation, a mystique almost, that genes such as homeotic genes would have rare transcripts that could not be detected by normal means (e.g. northern blots). So, when we began to find transcripts without any fuss, we were hesitant to believe they were actually homeotic transcripts. Recall that, at this point, we did not know whether homeotic genes encoded transcripts, or proteins, at all. Happily, the isolation of cDNAs began to solidify the findings. Remarkably, Antp emerged as one of the largest genes then known, with a transcription unit of about 103 kb (Garber et al., 1983; Scott et al., 1983).
Finding the homeobox
After making blots of restriction enzyme-digested cloned DNA that encompassed the ANT-C, or at least a good chunk of it, I hybridized cDNAs to the clones of genomic ANT-C DNA to find out where exons were located. During this process, I saw weak signals in addition to strong ones, which could have meant very short exons, background noise or some kind of weak DNA sequence similarity. This moment happens with many discoveries: ‘Either it is wrong or it is quite interesting’. I narrowed down those ‘extra’ signals to small bits of DNA. We had mapped the ftz gene, and I was finding cross-hybridization with it, as well as with other less-characterized places. I got a Ubx cDNA from the Bender lab and it cross-hybridized too. It could have been transposons contained within transcribed regions, or some sort of simple-sequence DNA, satellite-like.
It seemed worthwhile to sequence the cross-hybridizing small DNA fragments, but I had never done any sequencing. Across the corridor was Tom Blumenthal's lab. They were doing DNA sequencing using 14″ by 17″ acrylamide gels and 32P-labeled nucleotides. They helped me take my smallest cross-hybridizing fragments and do some sequencing. I exposed my first-ever sequencing gel to X-ray film, developing the film at around midnight. The sequence looked ugly, with lanes curved and some bands smeared, but parts looked possibly readable. There was one lane for each nucleotide. I read what I could and compared sequences from different genes, and I saw some short similarities – very exciting, but what were they? The most important question was whether they were protein-coding.
All I needed was a copy of the genetic (codon) code. But I did not have one and it was 1 am. I went to the department library but no textbooks were to be found. Then I found a volume of the Cold Spring Harbor Symposia on Quantitative Biology that had papers about the genetic code. Table 3 of one paper (Nirenberg et al., 1966) showed the code, with some question marks because initiating and stop codons were not firmly established. Back to my sequences and, yes, there was protein-coding similarity. At that moment, I knew that homeotic and segmentation genes encode proteins, and that at least one segmentation gene has a protein related to at least two homeotic genes.
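The check described here, translating DNA with the codon table and asking whether the similarity holds at the protein level, can be illustrated with a minimal Python sketch. It uses the modern standard genetic code rather than the 1966 table, and the two short sequences are invented for illustration; they are not homeobox sequences, and the function names are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1 or 2), ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions that match (no gaps, compared over the shorter length)."""
    pairs = list(zip(a, b))
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical sequences: different at the DNA level, identical as protein.
gene_a = "CGCAAACGC"
gene_b = "AGGAAGAGA"

dna_identity = identity(gene_a, gene_b)
protein_identity = identity(translate(gene_a), translate(gene_b))
```

Because several codons specify the same amino acid, two sequences can differ at many nucleotide positions yet encode the same peptide, so similarity that improves on translation in one reading frame is a hint that the region is protein-coding. A 180 bp stretch read in a single frame gives 60 codons, matching the 60 amino acids of the homeodomain.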
At the time, ‘box’ (used as in TATA box, which had been discovered in histone gene sequences by David Hogness and never formally published) meant a binding sequence for a DNA-binding protein. So ‘homeobox’ was a bit confusing as it was not a binding site, but ‘homeodomain’ worked as a designation of the 60 or so amino acids forming the DNA-binding domains of Hox and other homeodomain-class transcription factors (Desplan et al., 1985; Kissinger et al., 1990). Strikingly, the products of the magnificently studied yeast mating type regulators (Nasmyth and Tatchell, 1980; Strathern et al., 1981) were homeodomain proteins (Laughon and Scott, 1984; Shepherd et al., 1984; Wolberger et al., 1991).
Antp, Scr, Dfd, lab, pb, ftz, bcd, zen1 (zen), zen2, Ubx, abd-A and Abd-B all contain homeoboxes (Regulski et al., 1985). Quickly, many researchers found that numerous genes outside the ANT-C and BX-C also have homeoboxes, usually with rather different sequences but still clearly encoding homeodomain transcription factors. The literature can be confusing because Hox is sometimes used to describe any homeobox gene but is more usefully restricted to the homeotic clusters. The Hox homeobox sequences, sometimes called Class 1, are more similar to each other than to other types of (non-clustered) homeoboxes (Class 2).
The discovery of homeobox sequences in vertebrates, by searching for weakly cross-hybridizing DNA sequences (Levine et al., 1984; McGinnis et al., 1984a,b), was very exciting, especially when some of them turned out to be clustered – and, in fact, were in homologous Hox complexes (Boncinelli et al., 1988). Our current understanding of Hox genes as half-billion-year-old regulators that govern the body from head to tail, and control morphogenesis of many tissues and organs, is a wonderful outcome of those early stumbling days. The differences between our fingers (Morgan et al., 1992), our hindbrain segments (Lufkin et al., 1991) and our vertebrae (Kessel et al., 1990; Condie and Capecchi, 1993) are governed by Hox genes. Hox genes have been implicated in developmental abnormalities (Goodman and Scambler, 2001) and several types of cancer (Shenoy et al., 2022). The changes in numbers and types of Hox genes during evolution are a powerful tool for inferring steps that may have happened as altered morphogenesis led to speciation (Gaunt, 2022). Regulation of Hox genes has brought remarkable insights into long- and short-range regulation of transcription (Deschamps and Duboule, 2017; Gentile and Kmita, 2020).
But job security remains for those who study Hox, because full understanding is not yet here. How does the clustering benefit the organism and why does it sometimes break down, as in (ironically) Drosophila with its two complexes? How do transcription factors that have such similar DNA-binding domains nonetheless direct formation of distinct morphology (e.g. different vertebrae)? How do Hox proteins act in combinatorial fashion? What genes are transcriptionally controlled by Hox proteins, and how is specificity attained (or is it)? Partial answers exist for these questions, but much remains to be learned. Hox studies will have reached a more polished stage when someone can use Hox proteins to engineer the growth of a specific shape of tissue or organ, presumably first in culture or an animal model. Being able to shape a growing vertebra in a patient would be a boon to spine repair.
Evolutionary conservation of development-regulating genes
Homeobox genes have taken their place among evolutionarily conserved genes that control development, reflecting the common ancestry of animals, as well as fundamental structures of proteins that go all the way to yeast. Before the 1980s, it was apparently beyond imagination that similar body shape-regulating genes might be conserved across the animal kingdom and, to my knowledge, no one ever said they would be. I learned of two exceptions from the accompanying paper (McGinnis and Levine, 2024): Ed Lewis proposed in a 1980 grant that Hox genes would be in humans, and Geoffroy St. Hilaire proposed commonalities between arthropod and mammalian body plans in 1822 (Geoffroy St. Hilaire, 1822), i.e. well before Darwin's Origin of Species. But dominant public views focused on the extreme variations among animals, both in adult form and in the processes of development, and many people expected largely distinct developmental gene systems. In retrospect, it is remarkable that, for a long time, the conservation of development-regulating genes was not emphasized. Why? Was there not at least a curmudgeon somewhere who would say it, without evidence, to be outrageous? In many cases of remarkable findings, one can look back in the literature and find, somewhere, an insightful idea that was dismissed or mocked at the time, but in this case, I do not know of a published claim of this sort. Perhaps a reader will correct me.
In fact, in developmental biology the idea of genes that control body form being ‘conserved’ across the vast spectrum of animal forms should have seemed likely. Otherwise, a situation would have had to arise where genes controlling body structures in some ancient organism were discarded in favor of a new system. The referee's whistle would blow, out would go a system that built bodies during, say, the Precambrian, and in would come two new systems, one leading to insects and one to mammals. Although perhaps not impossible – partly redundant systems could have bridged the transitions – it seems simpler that systems that are working fine would be tweaked by mutations to alter body form without wholesale abandonment of some really terrific genes. A reasonable metaphor would be the ease of changing software compared with changing hardware.
Naming the Hox complexes
A final anecdote is about how the mammalian Hox complexes came to have a logical nomenclature. In 1992, a homeobox conference was held in Ascona, Switzerland. Many laboratories had isolated parts of many Hox (and other homeobox) genes using homeoboxes as probes, but the names given to the genes were often random. Different labs gave different names to the same genes. However, the arrangement of the genes, one of Lewis' greatest discoveries, offered a simple resolution. In mammals, the four complexes could be called A-D, and the genes within them numbered from 1 to 13, with the lower numbers indicating genes active in more anterior parts of the body.
The conference offered a near-complete gathering of the relevant scientists, but getting them to agree during a group discussion potentially resembled herding cats. An afternoon hike provided an unusual opportunity for sequential persuasion. A few of us who agreed on the new nomenclature began the hike after most others had headed up the mountain. Hiking as rapidly as we could, we overtook one person or group after another, persuaded them of the new plan while they were puffing and had less energy for arguing, then proceeded uphill to the next group. At each encounter, we could say that all the people lower on the mountain already agreed. By the time we reached the beautiful beech forest at the top of the ridge, we had the agreement of nearly all the scientists who had been cloning mammalian genes. The new nomenclature was announced at the conference that night. Everyone helped to sort out the many synonyms for the genes, and later the nice simple plan was published as a letter to Cell. I signed it: Matthew Scott, Ad Hoc Chairman, Beechwood Ridge Nomenclature Committee, Ascona, Switzerland (Scott, 1993). Later, the resort was puzzled to receive reprint requests.
This article is part of the collection ‘40 years of the homeobox’. See related articles in this collection at https://journals.biologists.com/dev/collection/10249/40-years-of-the-homeobox.
Acknowledgements
I hope I have made clear how much homeobox research has benefitted from collaboration, generosity, substantial communication between competitors, and a community-wide sense of excitement. I have been fortunate to work in a field that has that ethos. I thank Margaret Fuller for comments on the manuscript, and Bill McGinnis and Mike Levine for sharing our papers and thoughts before publication. Great thanks to my postdoctoral advisors, Thomas Kaufman and Barry Polisky, for teaching me so much, and to Amy Weiner and Bob Laymon for their excellent work and team spirit. Thanks to all the people in my laboratories at the University of Colorado and Stanford who contributed mightily to homeobox-related research, and to lab members who worked on other projects but contributed expertise and scientific excitement that helped all of us. I am particularly grateful to Sean Carroll and Allen Laughon, who took the risk of joining a new lab and did beautiful work to make it succeed. Allan Spradling has been an inspiration to me since graduate school and I am deeply grateful for his teaching and friendship. It has been an honor and joy to work with all of you and learn from you. I am grateful for support of the lab by the National Institutes of Health, the American Cancer Society and the Howard Hughes Medical Institute.