ABSTRACT
Accounting for 10-20% of breast cancer cases, triple-negative breast cancer (TNBC) is associated with a disproportionate number of breast cancer deaths. One challenge in studying TNBC is its genomic profile: with the exception of TP53 loss, most breast cancer tumors are characterized by a high number of copy number alterations (CNAs), making modeling the disease in whole animals challenging. We computationally analyzed 186 CNA regions previously identified in breast cancer tumors to rank genes within each region by likelihood of acting as a tumor driver. We then used a Drosophila p53-Myc TNBC model to identify 48 genes as functional drivers. To demonstrate the utility of this functional database, we established six 3-hit models; altering candidate genes led to increased aspects of transformation as well as resistance to the chemotherapeutic drug fluorouracil. Our work provides a functional database of CNA-associated TNBC drivers, and a template for an integrated computational/whole-animal approach to identify functional drivers of transformation and drug resistance within CNAs in other tumor types.
INTRODUCTION
Breast cancer is the second most common cause of cancer deaths among women in the USA (Siegel et al., 2018). The most aggressive subtype, triple-negative breast cancer (TNBC) makes up ∼15% of breast cancers. TNBC is molecularly heterogeneous with few currently identified druggable molecular targets, poor therapeutic response and low survival rates (Bauer et al., 2007; Dent et al., 2007). Less than 40% of women with metastatic TNBC survive 5 years (Bauer et al., 2007). Standard-of-care treatment of TNBC is limited to chemotherapy, including therapies targeting DNA-damage repair and, for some patients, immunotherapy, including atezolizumab (Marra et al., 2019; Harbeck and Gnant, 2017). Meanwhile, advances in sequencing technology have opened new opportunities for understanding the mechanisms of tumorigenesis and drug response (Hyman et al., 2017; Al-Lazikani et al., 2012). Such studies may improve breast cancer survival by improving predictions of progression in individual patients, identifying novel therapeutic targets and improving the utility of our preclinical models.
Developing genetic models for TNBC is challenging. Most TNBCs contain mutations in TP53, but some other genes are also commonly mutated (Cancer Genome Atlas Network, 2012; Jiang et al., 2019; Nagahashi et al., 2018). However, computational work (Cancer Genome Atlas Network, 2012; Jiang et al., 2019; Mermel et al., 2011; Sanchez-Garcia et al., 2014; Curtis et al., 2012) has identified extensive genomic copy number alterations (CNAs) that define regions that commonly overlap between different patients. For example, the MYC locus – a key regulator of basal-like tumor biology (Cancer Genome Atlas Network, 2012; Chandriani et al., 2009) – is commonly amplified in TNBC. Understanding the role of CNAs in TNBC would benefit from a comprehensive functional study of putative driver genes and their interactions in a whole animal context.
Recently, the Drosophila field has developed multigenic models of cancer to capture aspects of tumor complexity; these models have been used to explore drug responses, including as a screening platform to treat patients with cancer also diagnosed with a resistant disease (Vidal et al., 2005; Markstein et al., 2014; Gladstone et al., 2012; Willoughby et al., 2013; Bangi et al., 2016, 2021, 2019). Although flies lack orthologs of some human tissues, epithelial tissues in Drosophila, such as the eye and wing, have proven useful for modeling cancer networks and identifying candidate therapeutics (Vidal et al., 2005; Markstein et al., 2014; Gladstone et al., 2012; Willoughby et al., 2013; Bangi et al., 2016, 2021, 2019; Read et al., 2005; Dar et al., 2012; Hirabayashi et al., 2013; Levinson and Cagan, 2016; Levine and Cagan, 2016).
In this study, we established a Drosophila p53-Myc platform as a tool for exploring TNBC genomic complexity. To leverage this platform, we first used a computational approach to rank genes within common TNBC CNA regions based on their likelihood to promote tumor progression. We then used our Drosophila p53-Myc platform to assess the functional relevance of many of the most highly ranked candidate genes within each CNA region, based on their ability to enhance transformation in a whole-animal platform. The result is a functional database of TNBC-driving genes within common CNAs. Finally, we used this database to build a library of more-complex Drosophila TNBC models. In contrast to a p53-Myc model, these more-complex lines failed to respond to fluorouracil – which is clinically relevant for TNBC – demonstrating that increased genetic complexity can lead to drug resistance and identifying candidate resistance factors. As an integrated approach, this work provides a path towards functionally, deconvoluting the role of CNAs in tumor progression and drug response.
RESULTS
TP53 and MYC are the most common driver genes in TNBC
To determine the number of mutated gene loci driving TNBC, we performed an analysis using the MutSigCV algorithm (Lawrence et al., 2013; https://www.genepattern.org/modules/docs/MutSigCV#gsc.tab=0). We found that 81% of TNBC tumors in the 2012 The Cancer Genome Atlas (TCGA) dataset (Cancer Genome Atlas Network, 2012) include a mutation predicted to alter TP53 function, which is consistent with the reported 80% of basal-like tumors (Cancer Genome Atlas Network, 2012). No other driver genes were commonly mutated (Fig. 1A; Table S2). This is consistent with previous work showing that breast cancer is primarily driven by CNAs rather than point mutations (Jiang et al., 2019; Ciriello et al., 2013).
The location and size of a CNA is not influenced solely by selective pressure, but also by chromatin architecture and the location of recombination hotspots (Pfeifer et al., 2006; Taylor et al., 2008; Li et al., 2014). As a result, a small number of genes within a given CNA region are responsible for the selective pressure and are therefore ‘drivers’, while neighboring genes are included due to proximity to the driver and mostly represent ‘passengers’ that do not appreciably contribute to disease progression.
As a first step towards exploring the genes within CNAs from the TCGA data, we began with 186 CNAs previously identified by GISTIC 2.0 and ISAR (Cancer Genome Atlas Network, 2012; Sanchez-Garcia et al., 2014, Mermel et al, 2011). We performed hierarchical clustering using Euclidean distance on the 186 CNAs and driver gene mutations for the 72 primary TNBC tumors with both types of data available. We found that the tumors did not fall into discrete clusters, and tumors from patients with shorter or longer overall survival did not cluster together (Fig. 1A). This suggests that in the aggregate, these genomic alterations do not explain differences in clinical outcome. This may be because some CNAs have more impact on outcomes than others or because region-level analysis does not provide the resolution to understand the impact of individual driver genes. Therefore, a key step in using this genomic data to model TNBC is to identify drivers within the common CNAs.
The most common TNBC CNA was amplification of 8q24.1, a small region that – as identified by GISTIC 2.0 – solely contains the oncogene MYC (Fig. 1A). Excluding combinations of CNA that overlap, the most frequently identified combination of two events was mutation of TP53 and amplification of MYC, occurring in 72% of TNBC tumors. We, therefore, used both TP53 and MYC as the basis of a Drosophila platform designed to identify additional drivers.
p53 and Myc promote cancer-like phenotypes in Drosophila
To characterize phenotypes that are due to mutations in p53 and Myc, we generated individual fly lines containing transgenes that provide targeted expression of (i) Drosophila Myc (UAS-Myc) plus (ii) two different RNA-interference-mediated knockdown constructs targeting endogenous P53 (UAS-p53lh or UAS-p53sh) (see Fly stocks section in Materials and Methods for details). Overexpression of Myc and strong loss of P53 (∼80% in the presence of UAS-Myc) was confirmed by western blotting (Figs S1A, B). To confirm activity, we used a GMR-GAL4 driver to express UAS-p53lh plus UAS-Myc (yielding GMR>p53lh;Myc flies) in the developing eye field. GMR>p53lh;Myc flies exhibited enlarged eyes with normal ommatidial patterning (Fig. 1B), presumably reflecting Myc-mediated cell enlargement.
We also tested the p53sh and Myc transgenes individually and in combination with a patched-Gal4 driver (ptc) that directs discreet expression of upstream activating sequence (UAS)-fused transgenes during several developmental stages. ptc>Myc pupal survival (the percentage of pupae that survive to adulthood, also known as the eclosion rate; see Lethality analyses in Materials and Methods) was reduced to 85.8%. Further reduced survival (81.5%) was exhibited by ptc>Myc,p53sh (Fig. 1C). Reducing P53 alone (ptc>p53sh) had no effect on survival. Similar results were obtained in the background of another fly strain (yhsf) (see Fly stocks in Materials and Methods), and with p53lh (Fig. S1C,D).
Using an inducible UAS-GFP to visualize transformed cells, we observed expansion of the ptc domain at the anterior/posterior boundary of developing ptc>Myc wing epithelia (‘wing discs’; Fig. 2A,D, Fig. S2B). In contrast, the ptc domain was smaller with knockdown of p53 by p53sh (Fig. 2A,D) and p53lh (Fig. S2B). In the presence of ptc>Myc, p53sh restored the ptc domain area to normal size (ptc>Myc,p53sh; Fig. 2D). We have previously observed that p53 knockdown can reduce transformation phenotypes in the presence of an oncogene, although it enhances overall transformation by reducing senescence (Bangi et al., 2016). Area reduction by p53lh was not significant (Fig. S2B).
Specific targeting of transgene expression to the anterior/posterior boundary of the wing disc provides a useful assay for measuring aspects of transformation in Drosophila cancer models (Levinson and Cagan, 2016; Vidal et al., 2006; Sonoshita et al., 2018). In confocal images of both ptc>Myc and ptc>Myc,p53sh wing epithelia, we observed transformed cells delaminating into the basal region of the epithelium (Fig. 2B). These delaminating cells showed high levels of cleaved, activated caspase and matrix metalloproteinase (Fig. 2C). Some of the delaminating cells were seen migrating away from the ptc domain (Fig. 2E). Delaminating, caspase-positive cells (Fig. S2A) and migrating cells (Fig. S2C) were also seen with ptc>p53lh;Myc. The co-occurrence of caspase activation with translocation of cells away from the ptc domain is consistent with studies in other Drosophila cancer models (Geisbrecht and Montell, 2004; Rudrapatna et al., 2013; Gorelick-Ashkenazi et al., 2018), in which this translocation was preceded by aspects of transformation and epithelial-to-mesenchymal transition. Altering additional cancer genes enhanced this phenotype (below), further suggesting these are, indeed, aspects of transformation. However, more-detailed studies will be required to confirm this. Based on the multiple aspects of transformation exhibited by our ptc>Myc,p53sh line, we concluded that this Drosophila line provides a useful genetic platform for identifying functional candidate driver genes within regions of CNA.
Prioritizing candidate driver genes from CNAs
Most of the 186 CNA regions identified by ISAR and GISTIC 2.0 contain dozens or hundreds of genes. Fig. 3A provides a flowchart to summarize our approach for prioritizing these genes for functional testing. To curate likely driver genes, we first eliminated genes that (i) occur in known, common copy number variants, (ii) were not differentially expressed, (iii) were associated with an increase in expression when copy number was reduced or, (iv) do not have a clear ortholog in Drosophila. To identify genes that are specifically relevant to TNBC, we analyzed the candidate genes for significance within the TNBC subset using a mild probabilistic cutoff. These steps reduced an original list of 12,621 candidates to 6694 (see Analysis of TCGA data in Materials and Methods; Fig. 3A).
When analyzing the TNBC dataset at this step, we found that some genes are deleted more frequently in TNBC than they are amplified, even though the CNA had been identified as an amplification by ISAR or GISTIC 2.0, and vice versa (see Analysis of TCGA data Materials and Methods). We examined these genes separately in functional testing (below) as Group 2. Indeed, several cancer genes have recently been found to have paradoxical, context-dependent roles (Walerych et al., 2012; Lobry et al., 2014; Shen et al., 2018). Because this study focuses on TNBC, we assigned each gene a CNA type according to our TNBC-specific analysis.
Of the 6694 genes with CNAs, 4455 had a clear fly ortholog, meaning they were available for functional testing. The genes were further stratified into groups based on: (i) the Helios score of each gene (for genes identified by ISAR) resulting from machine learning on TCGA and functional data in breast cancer (Sanchez-Garcia et al., 2014), (ii) whether copy number of a gene influenced its expression, (iii) whether the gene was altered more frequently than other genes in the region, and (iv) the size of the region (see Analysis of TCGA data Materials and Methods). Our analyses resulted in a set of prioritized genes for each common TNBC CNA (Table S2).
For our functional studies, we focused first on the top ISAR genes (Group 1I; Fig. 3A), small GISTIC 2.0 regions (Group 1G; Fig. 4A), select genes with ambiguous classification of CNAs (Group 2), and three lower ranked genes for which we had fly lines on hand (Group 4G). We then tested Drosophila lines representing these genes in functional assays, in many cases testing multiple lines for each gene (Fig. S3A). Control lines included three with transgenes unrelated to human genes, and one p53 null allele – expected to have no effect when p53 was knocked down using RNA interference (RNAi). In general, we determined how to analyze genes based on their status in our TNBC-specific CNA analysis: genes comprising deletions were treated as tumor suppressors, whereas those comprising amplifications were treated as oncogenes.
Driver genes enhance p53/Myc transformation phenotypes
Our computational analysis and ranking protocol identified a set of candidate driver genes within TNBC-associated regions of CNA. To functionally assess candidate genes, we placed candidate transgenes in trans to ptc>Myc,p53sh, creating a ‘3-hit’ model by standard genetic crossing. Amplified genes were assessed by overexpression (ptc>Myc,p53sh/UAS-candidate); deleted genes were assessed by RNAi-mediated knockdown (ptc>Myc,p53sh/UAS-RNAi[candidate]) or by removing one functional copy (ptc>Myc,p53sh/mutant−/+).
To validate our approach, we tested four well-known tumor drivers: oncogenes Dp110 (officially known as Pi3K92E) and Egfr were overexpressed, and tumor suppressors Pten and Rb were reduced by knockdown in the context of ptc>Myc,p53sh. In each case (4/4), adding the cancer-associated gene to a ptc>Myc,p53sh background decreased survival to adulthood (eclosion rate; see Lethality analyses in Materials and Methods; Fig. 3C); adding Pten also decreased larval survival (pupariation rate; Fig. S3B). These genes also significantly increased translocation of cells within the wing disc (Fig. 3D) compared to ptc>Myc,p53sh alone. For Pten, an example of a driver that showed only mild translocation and overgrowth of transgenic tissue was significantly increased (Fig. 3E). In contrast, 0/4 Group 4 lines and 1/4 control lines showed significant reduction in eclosion (Fig. S3C). Three lines of Group 4 and three control lines showed significant reduction in pupariation, perhaps due to variation in genetic background (Fig. S3D). However, 0/4 Group 4 lines and 0/4 control lines showed significant change in cell translocation (Fig. S3E). Furthermore, 0/3 Group 4 lines and 0/2 control lines showed significant increase in transgenic tissue overgrowth (Fig. S3F). In summary, well-known drivers enhanced aspects of transformation in the ptc>Myc,p53sh model while non-drivers did not.
As a first step in identifying functional driver genes within regions of CNA, we functionally tested 222 fly lines, covering 47 of the 50 genes in Group 1, 40 of the 53 genes in Group 2, three of 2023 genes in Group 4G, and three control genes. Of the 222 lines tested over all groups, 100 exhibited increased lethality when placed in trans to ptc>Myc,p53sh, compared to ptc>Myc,p53sh alone, defining 69 separate genes (Tables S4, S9, S10). To determine whether genes that decreased survival also increase aspects of transformation, we tested 66 of these genes in a ptc>Myc,p53sh compared to ptc>Myc,p53sh alone (multiple lines in some cases; Tables S4, S8-S10) for changes in cell translocation. For genes that did not show a significant change in translocation in an initial trial, we also measured tissue overgrowth. Together, 48/66 genes (63/116 lines) – when placed in a ptc>Myc,p53sh background – significantly increased translocation (Fig. 4A,C and Fig. S4A; Table S12) or overgrowth (Fig. 4B,C and Fig. S4B; Table S13) phenotypes; these 48 genes were judged to be functional drivers in this context.
We identified some differences in the percentage of drivers identified (hereafter referred to as ‘hit rate’) within each computationally defined group. Most tested Group 1 genes were drivers, including 56% of genes in Group 1I and 75% of genes in Group 1G. The higher hit rate for Group 1G suggests that genes in small GISTIC 2.0 regions are especially likely to be regulators of tumor progression (‘hereafter referred to as ‘drivers’). In Group 2G, where listings within at least one database were in agreement with our TNBC-specific analysis for each gene, 76% genes were drivers, a similarly high hit rate compared to that in Group 1G. However, only 38% genes tested in Group 2I – i.e. regions identified as amplifications by ISAR but deletions by our TNBC-specific analysis – were drivers. This suggests that algorithms, such as GISTIC 2.0 and ISAR are useful for identifying functional amplifications and deletions. That is, genes appearing in regions identified only as amplifications are generally unlikely to function as tumor suppressors. We opted not to test the remainder of the genes in Group 2I.
Altogether, in addition to MYC our analyses identified 48 identified drivers. These 49 genes define a functional set of CNA-associated putative driver genes (Fig. 5A; Table S1) and account for the observed copy number aberration of 66 partially overlapping, computationally defined regions (Table S5). Of note, several of these driver genes have been reported in in silico or in vitro screens for drivers (Tables S1 and S6) (Ciriello et al., 2013; Aure et al., 2013; Vogelstein et al., 2013; Forbes et al., 2015; Marcotte et al., 2016; Griffith et al., 2017). Furthermore, four of the genes were determined to have been mutated at a significant rate (Table S2). Finally, we assessed these genes for effects on survival in the TCGA breast cancer dataset (Liu et al., 2018). Twelve genes showed a trend towards increased progression or decreased survival (Fig. 5B) in patients (Table S7; see Lethality analyses Materials and Methods). Most (31/49) of these genes showed evidence of confirmation of the driver status when using at least one of these methods (Table S1). This analysis demonstrates the strength of our computation/genetics approach to identifying functional drivers.
Genetic complexity abrogated drug response in TNBC models
Work by a variety of laboratories including ours suggest that one source of resistance to therapeutic drugs is genetic complexity of the tumor. To test the effects of genetic complexity on drug response, we used our database of candidate CNA-associated driver genes to generate six 3-hit models that reflect some of the genetic complexity in TNBC. Specifically, we paired ptc>Myc,p53sh with overexpression of oncogenes Dp110, Hey, Myb, Ppcs or aPKC, or knockdown of the tumor suppressor Rop.
To test the effect of genetic complexity on drug response, we screened 72 FDA-approved cancer drugs, JQ1 (targeting Myc-pathway activity), and five novel drugs that have shown activity in other Drosophila cancer models (Fig. 6A). Drugs were tested at 27°C to provide an optimal level of lethality, allowing us to use rescue as a quantitative assay. Drugs were mixed into the culture medium of ptc>Myc,p53sh flies to be consumed orally. Fluorouracil, a chemotherapy drug used in the treatment of TNBC (Muss et al., 2009), provided the strongest rescue of ptc>Myc,p53sh-induced lethality at both 27°C (Fig. 6A,B) and 29°C (Fig. S5A), mirroring activity in patients with TNBC. In contrast, all six selected 3-hit TNBC lines built from our functional database failed to show significant rescue by fluorouracil at either 27°C (Fig. 6C) or 29°C (Fig. S5B).
We examined the impact of fluorouracil on transgenic tissue overgrowth on two selected 3-hit lines. ptc>Myc,p53sh showed a trend toward decreased overgrowth in response to fluorouracil (10 or 50 µM; Fig. 6D), consistent with rescue from lethality (Fig. 6A-C). The response was attenuated in response to overexpression of Dp110, such that no response to 50 µM was observed. The addition of Myb produced a strong dose-response in reduction of tissue overgrowth, yet notably, this did not translate into whole-animal rescue (Fig. 6C). We conclude that, similar to other tumor types (Bangi et al., 2016), increased genetic complexity can lead to drug resistance in models of TNBC.
DISCUSSION
Producing useful genetic models of cancer and designing targeted therapies require an understanding of the genes that drive tumor progression. This is especially challenging with TNBC, a disease in which most driver genes emerge from copy number aberration rather than mutation, and carry many passengers with them. Approaches to identifying driver genes in this context have included statistical analyses of breast tumor-sequencing data (Aure et al., 2013), pan-cancer analyses (Ciriello et al., 2013; Vogelstein et al., 2013; Forbes et al., 2015), crowdsourcing (Griffith et al., 2017), machine learning (Sanchez-Garcia et al., 2014; Schroeder et al., 2014), cell culture screening approaches in transgenic lines (Kessler et al., 2012) and breast cancer cell lines (Marcotte et al., 2016; Koedoot et al., 2019; Patel et al., 2018), and a recent forward-genetics approach in a mouse model of BRCA1-deficient TNBC (Miao et al., 2020). Our study complements this body of literature by providing a systematic characterization of putative TNBC driver genes in a whole animal model, using a genetic background representative of a majority of patients with TNBC.
Our data are consistent with previous evidence, i.e. that mutation of TP53 (Cancer Genome Atlas Network, 2012; Jiang et al., 2019; Nagahashi et al., 2018) and amplification of MYC (Cancer Genome Atlas Network, 2012; Jiang et al., 2019; Curtis et al., 2012) are the two most common genetic aberrations in TNBC, and frequently occur in combination (Fig. 1). In Drosophila, Myc promoted aspects of transformation including tissue overgrowth and cell translocation (Fig. 3). Knockdown of p53 enhanced Myc lethality but not overgrowth or translocation (Fig. 2). These data indicate that p53 is likely to affect other processes in the targeted tissue, such as senescence or metabolism, which can impact survival in Drosophila (Bangi et al., 2016; de la Cova et al., 2014). Our ‘base’, i.e. the Myc,p53sh transgenic line, constitutes a simple genetic animal model of TNBC, exhibiting activation of matrix metalloprotease and caspase cleavage, previously validated markers of cell translocation and metastasis-like behavior in fly cancer models (Levinson and Cagan, 2016; Vidal et al., 2006; Sonoshita et al., 2018).
Many fly lines showed increased lethality when placed in trans to ptc>Myc,p53sh, including all known cancer drivers tested and several negative controls (Fig. S3C,D), suggesting that rescue of lethality provides a sensitive but not specific assay. However, only altering cancer-related genes – increasing ortholog expression of oncogenes or decreasing activity of tumor suppressors – altered transformation phenotypes in the Drosophila wing disc, providing a more-specific second-line assay. Based on these functional assays, we identified or confirmed the cancer driver activity of 49 genes (Fig. 5A). In some cases, multiple driver genes are present within the same CNA region (Table S1). This phenomenon was suggested by computational predictions, such as Helios (Sanchez-Garcia et al., 2014), and confirmed in our functional experiments. The impact of multiple driver genes in the same amplification or deletion, including whether those genes interact, remains an area open for investigation. We should also note that genes without clear Drosophila orthologs were not tested using our approach.
Some drivers, such as CCNE1, are well-established cancer drivers. Others, such as TRIO, are less well-studied but have previously appeared in the literature and databases of cancer driving genes. DNAJB6 has the highest Helios score in its region but is not considered a driver based in an assay of anchorage-independent growth (Sanchez-Garcia et al., 2014). Its observed effects in a translocation assay (Koedoot et al., 2019) and our overgrowth assay highlight the importance of testing for drivers in multiple contexts, such as an intact epithelium, in which competition with wild-type tissue can be observed. To our knowledge, PPCS, TM2D1, INADL, RBM34, C6orf203 and STXBP1 have not been linked to breast cancer. The variety of driver genes identified in this study underscores the potential of an integrated computational/experimental approach.
The high hit rates in Groups 1 and 2G (57-76%) indicate a significant enrichment in driver genes based on computational work alone. None of the Group 4 genes tested were identified as drivers. This suggests computational features can enrich for driver genes over the initial identification of CNA regions by GISTIC 2.0 and ISAR. Some genes exhibited paradoxical properties. A small number of genes were amplified in the TCGA data but demonstrated decreased expression. For one such gene, MYB, we tested for both tumor suppressor and oncogene activity, and found that knockdown led to increased cell translocation (Fig. 4A), while overexpression led to tissue overgrowth (Fig. 4B). Similarly, our study helped deconvolve genes in ‘ambiguous’ Group 2 CNA regions. For genes identified as amplified in one TNBC database and deleted in another, 76% exhibited driver effects when tested according to whichever CNA occurred more frequently in TNBC.
In contrast, genes in regions only identified as amplifications by using ISAR but more frequently deleted in the TNBC set (Group 2I) were less likely to be drivers (38% hit rate). Our assays demonstrated that the driver genes identified in this group function as tumor suppressors, despite appearing in amplified regions (Table S10). One example, MSRA, has been found to be a tumor suppressor in other functional studies (Lei et al., 2007; Luca et al., 2010). Together, these data suggest that, when altered by different CNAs, the same gene can become either an oncogene or a tumor suppressor, depending on subtype and biological context.
Developing a functional database allowed us to then ask a question significant to the development of therapeutics: Does genetic complexity alter drug response, when the ‘base’ set of common drivers is the same? We found that, while fluorouracil rescued our Myc,p53sh base mode from lethality and mitigated transgenic tissue overgrowth, introducing an additional transgene variously enhanced or prevented the effect of fluorouracil on tissue overgrowth in a dose- and gene-dependent manner. Yet in all cases tested – including examples of both oncogenes and tumor suppressors, and genes from both Group 1 and 2 – the additional transgene abrogated fluorouracil-induced rescue from lethality. This suggests that increasing genetic complexity reduces treatment response, confirming findings from work in other tumor models (Bangi et al., 2016). Our data indicate that genetic complexity may play a role in the poor outcomes for patients with TNBC (Dent et al., 2007), and suggest genes that are candidates to mediate chemoresistance. Our system also provides a platform to study the combinatorial effect of genetic aberrations on multiple tumor-like phenotypes. In the future, this system could be used to examine the mechanisms by which specific genetic combinations confer resistance to therapy.
Using a functional approach, we have identified multiple new driver genes that help explain specific amplifications and deletions found in patients with TNBC. Some of these genes are recently discovered or poorly understood cancer drivers that merit further research into the specific roles they play in TNBC. Further, we provide data that some of these genes can mitigate the response of a Myc,p53sh model to the chemotherapeutic fluorouracil. Further understanding the functional impact of these driver genes on drug response may help guide prognosis and drug selection based on genotype, and suggest new avenues for therapeutic development.
MATERIALS AND METHODS
Analysis of TCGA data
Data from the invasive breast carcinoma dataset were downloaded from The Cancer Genomic Atlas (TCGA) data portal (National Cancer Institute, 2018, https://www.cancer.gov/ccg/research/genome-sequencing/tcga ). The dataset contains 1100 cases.
The somatic mutation annotation file (MAF) for this dataset contains 771 cases and 14,375 genes. We uploaded this file to the MutSigCV public server (Lawrence et al., 2013) to retrieve predicted mutated driver genes for breast cancer. According to Lawrence et al. (2013), genes with q<0.1 were considered significant. We also considered genes from COSMIC (Forbes et al., 2015), the TCGA Pan-Cancer Analysis (Ciriello et al., 2013), the Vogelstein dataset (Vogelstein et al., 2013) and CIViC as of April 2016 (Griffith et al., 2017) to be driver genes that might be mutated. Mutations in any of these genes are shown in Fig. 1A. Known predisposing germline variants were found in 52 cases (Cancer Genome Atlas Network, 2012). Mutations in any of these ten genes were included in Fig. 1A.
where OCN=observed copy number in the primary tumor sample; GCN=germline copy number; and TP=tumor purity.
In the minority of cases where a tumor-purity estimate or germline copy number was unavailable, a value of 1 was substituted for each of these parameters.
Thus, an adjusted copy number value of 1 represents normal in this study. Similar to other studies of CNAs (Aure et al., 2013; Akavia et al., 2010), cutoffs of 2^0.3 and 2^-0.3 were used for amplifications and deletions, respectively, representing the gain and loss, respectively, of one copy in ∼40% of the tumor. The copy number for each region shown in Fig. 1A represents the average of all genes in the region. The 72 TNBC cases with both copy number data and somatic mutation data are shown in Fig. 1A.
Neither GISTIC 2.0 nor ISAR was applied to this dataset with TNBC specifically in mind. Thus, following these analyses, genes irrelevant to TNBC biology might appear in CNA regions, such as the estrogen receptor 1-encoding gene ESR1. GISTIC 2.0 was applied to the basal-like gene expression subtype, which only partially overlaps with TNBC. ISAR was applied to the ER/progesterone receptor-negative subgroup the authors refer to as ‘basal’, although this would also include CNAs that are relevant to HER2-positive tumors. These partially overlapping datasets resulted in some discrepancies. Some regions are referred to as amplifications by GISTIC 2.0 in the total breast cancer set but as deletions in the basal subtype set. Similar discrepancies also appeared between the ISAR and GISTIC 2.0 results (and were referred to as ‘ambiguous CNAs’ in this current study). As ISAR has been explicitly designed to pick up amplifications even when they appear in the context of a larger deletion (Sanchez-Garcia et al., 2014), this might explain some of the discrepancies.
To select genes within CNA regions that are specifically relevant to TNBC biology and to determine the direction of their effect, we applied an additional filter. Following the study by Aure et al. (2013), for each gene, we calculated the binomial probability of seeing the number of TNBC tumors with an amplification or deletion of that gene that appear, out of the 110 TNBC tumors in the dataset. We retained genes with a probability <0.05 for further analysis. Since this step represents an extra filter on top of an already rigorous algorithm to detect CNAs, we did not use multiple hypothesis correction. By using this filter, 2321 genes were removed. If more deletions of the gene appeared in the dataset than amplifications, we marked the gene as a deleted and treated it as a putative tumor suppressor; when more amplifications than deletions appeared, we treated it as amplified and a putative oncogene.
Next, we required that candidate genes within CNA regions be differentially expressed. We downloaded the RNA-seq data for all samples available and extracted the transcript quantification values for each gene, which had been generated by TCGA using RSEM (Li and Dewey, 2011; Ciriello et. al, 2015); 2693 genes did not have RNA data available and so were removed at this step. As described by Akavia et al. (2010), we retained genes with a standard deviation of greater than 0.25 for further analysis, thus removing 306 genes. We then converted the transcript quantification values to z-scores for each gene. For each amplified gene, we performed a statistical test comparing the z-scores of primary tumors with that amplification and those without. We performed the equivalent analysis for deleted genes. Where the two distributions had equal variance, we used a Student's t-test, and where they did not, we used the Welch’s test. All distributions were either normally distributed or had a sample size of ≥30, so we judged parametric statistics to be valid. We then applied the Bonferroni correction to all resulting P-values.
For further analysis, we retained genes whose copy number had a direct positive relationship or a non-significant relationship with expression. Because amplifications could conceivably cause chromatin changes that reduce the expression of a tumor suppressor, we also retained genes that had reduced expression when amplified. However, as no theoretical mechanism could cause increased expression of a cancer driver gene when it is deleted, we removed genes that were associated with increased expression when deleted from further analysis (15 genes).
For genes appearing in the ISAR dataset, we selected the top 1-3 genes by Helios score for Group 1I (Table S3). The remaining genes in the top 3 (for regions with 12 or fewer genes) or top quartile (for regions with more than 12 genes) of Helios scores made up tier 2. Because the Helios score is based in part on the frequency of CNAs in the dataset as well as the relationship between copy number and expression, we did not directly consider these parameters to define Group 1I.
From the GISTIC 2.0 genes that did not appear in the ISAR data, we selected genes that 1) have a significant, positive relationship between copy number and expression, 2) appear in the top 3 (for regions with 12 or fewer genes) or top quartile (for regions with more than 12 genes) of genes in the region by the frequency of amplification or deletion occurring in the dataset, and 3) occurred in small (<10 genes) regions. These genes comprise Group 1G (Table S3).
Genes marked for inclusion in Group 1 were moved to Group 2 when there were discrepancies between databases. In Group 2G, each gene belongs to at least one region identified by GISTIC 2.0, and some also belong to amplifications identified by ISAR. Genes in this group belong to at least one amplification and at least one deletion, and were analyzed as tumor suppressors or oncogenes according to the result of our TNBC-specific analysis. In Group 1I, genes belong to amplifications in ISAR but appeared to be deleted more frequently in our TNBC-specific analysis. Select genes in this group were analyzed as tumor suppressors. The remaining ISAR and GISTIC 2.0 genes comprise Group 3 and 4 (Fig. 3B).
Finally, we converted all human genes in these five tiers to fly genes by using homologs compiled from the Drosophila Interactions Database (DroID) (Murali et al., 2011; downloaded 1/2014); DRSC Integrative Ortholog Prediction Tool (DIOPT) (Hu et al., 2011; downloaded 10/2014), HomoloGene (https://www.ncbi.nlm.nih.gov/homologene), Ensembl (Flicek et al., 2014) and OrthoDB (Waterhouse et al., 2013) (the latter three downloaded 7/2014). For genes with multiple possible fly orthologs, we performed the search manually in DIOPT and performed a tBLASTn (Mount, 2007; Johnson et al., 2008) search of the human protein sequence against the fly genome. The top scoring orthologs from each of these methods were used for testing (Table S11).
Fly stocks
Experiments with overexpression of Myc used stock #9675 (Bloomington Drosophila Stock Center) with genotype hs-FLP y w; UAS-Myc. Initial experiments with p53 knockdown used a long hairpin under upstream activating sequence (UAS) control VDRC Id 38235 (Vienna Drosophila Resource Center). This is represented as p53lh. However, although the long hairpin siRNA produced an effective knockdown, it also produced a faster migrating species on western blots, possibly representing expression of a different P53 isoform. Due to this and the convenience of using a hairpin on the third chromosome, a short hairpin with guide sequence 5′-TGCTGAAGCAATAACCACCGA-3′ under UAS control on the third chromosome, generated in our lab, was used. We generated a recombinant third chromosome with this UAS-p53sh and UAS-Myc. This line has the genotype w; UAS-Myc UAS-p53sh; flies in Fig. 1B are in a yellow hairy singed forked (yhsf) background, chromosome markers that should not affect the transgene phenotypes. Both hairpins produce an ∼80% knockdown in the presence of UAS-Myc, but p53sh does not exhibit the shift in band size (Fig. S2B).
In initial lethality experiments crossing Myc p53sh to ptc-Gal4, low numbers of progeny resulted in survival variability after the first two technical replicates (flips). Therefore, only the first two technical replicates from each of seven different independent experiments (14 replicates total) were included in the analysis shown in Fig. 1C.
We combined this line with patched-Gal4 on the second chromosome, the linked balancer SM5(Gal80)-TM6B, and a hs-hid construct on the Y chromosome to generate (hs-FLP) (y) w /hs-hid; (ptc-Gal4; UAS-Myc UAS-p53sh)/SM5(Gal80)-TM6B. This line was used for lethality experiments. We also generated a version with UAS-GFP on the second chromosome. hs-FLP was not detected in a PCR of fly genomic DNA from >30 flies from this line, indicating it likely lost the floating hs-FLP construct on the X chromosome, and so has genotype (y) w /hs-hid; ptc-Gal4 UAS-GFP; UAS-Myc UAS-p53sh/SM5(Gal80)-TM6B. This line was used for the overgrowth and cell translocation assays.
To generate virgins of both these lines, bottles were incubated for 1-3 h at 37°C twice for 1-5 days after lay, activating the hid protein and resulting in death of all fertile males before eclosion. Virgins from heat shocks were only used for experiments and never put back into the stock.
Genotypes and stock numbers for lines used in the genetic screen are listed in Tables S9, S10. Genotypes and stock numbers related to Fig. 4A,B and Fig. S4 are in Tables S12, S13. Stock numbers related to the genotypes shown in additional figures are as follows, in order (exclusive of w):
Fig. S3D (and related panels): BL20108, DGRC203601, BL17396, BL14265
Fig. S3E (and related panels): BL6659, BL6660, BL33623, BL6815
Fig. 4C: BL20713, VDRC10696, d10248, F001574, F000566.
Western blotting
765-Gal4 was crossed to lines carrying p53 knockdown or UAS-Myc to express the transgenes in the entire wing disc. Wing discs from L3 larvae were dissected in cold PBS. Ten discs were placed in RIPA buffer with protease inhibitor and phosphatase inhibitor. LDS sample buffer was added, and the mixture was heated to 70°C for 5 min, then frozen at −80°C until further use. Samples were run on a Thermo Fisher 4–12% bis-tris NuPAGE gel as per manufacturer’s instructions. Protein was transferred onto a PVDF membrane, which was then probed against p53 (DSHB p53-H3 s 1:1000), syntaxin (DSHB 8B-3, 1:1000) or Myc (Santa Cruz Myc d1-717, 1:200) overnight at 4°C. Detection was performed with the Thermo Fisher Pierce™ ECL Western Blotting Substrate as per manufacturer’s instructions. Quantification was performed in ImageJ using the Gel Analyzer tool.
Lethality analyses
To measure survival with two or fewer genes of interest, ptc-Gal4 was crossed to the genotype of interest (i.e. UAS-Myc UAS-p53sh) and the percentage eclosion rate was calculated as: 100×[number of empty pupal cases/(number of uneclosed pupae+number of empty pupal cases)]. Crosses for the genetic screen were set up at 25°C as (hs-FLP) (y) w/hs-hid; ptc-Gal4; UAS-Myc UAS-p53sh/SM5(Gal80)-TM6B ×UAS-gene or UAS-gene/FM7c 2xTb-RFP or UAS-gene/CyO 2xTb-RFP, or UAS-gene/TM6B.
For genes of interest that were homozygous, or heterozygous on the first, second or third chromosomes, respectively (Pina and Pignoni, 2012). For second and third chromosome genes of interest, males of the stock were used and virgins of the stock we generated as described above.
Because chromosomes of the SM5(Gal80)-TM6B balancer are genetically linked by a reciprocal translocation, all progeny in the cross had either ptc-Gal4; UAS-Myc UAS-p53sh or the balancer, which produces a tubby phenotype. Similarly, because of the balancers used, all progeny that did not have the gene of interest had a tubby phenotype. Eclosion was calculated as above only for the non-tubby pupae. Where the gene of interest was homozygous, a relative pupariation rate was also calculated as 100×[number of non-tubby pupae/number of tubby pupae]. This was not calculated where the gene of interest was homozygous lethal and used with a balancer.
To account for the effect of variation in the genetic background of the stocks on survival, two biological replicates were set up for each gene of interest in each experiment. These were flipped four times, for a total of eight replicates for each genotype. Eclosion and relative pupariation for each gene of interest were compared to control with a Student's t-test where distributions were normal or a Mann–Whitney u test where distributions were not normal. When either eclosion or relative pupariation was lower than control (P<0.05), the experiment was repeated. Genes that significantly lowered lethality in at least two independent experiments were considered candidates for tissue phenotype analysis. For simplicity, the aggregate of repeat experiments is shown in Fig. 3C and Fig. S3B-D.
Wing disc analyses
To visualize transformed tissue, we performed the equivalent crosses at 25°C as described above with a UAS-GFP construct on the second chromosome. Wing discs from non-tubby L3 larvae were dissected, fixed, and mounted in Vectashield with DAPI. Apical-basal and anterior-posterior axes were determined by locating the patched-expressing region of the peripodial membrane (Wu et al., 2004). Cleaved caspase and matrix metalloprotease were visualized by immunostaining the fixed tissue (Cell Signaling antibody #9661, 1:200; DSHB antibody 3B8D12, 1:10). To assess translocation, the discs were visualized at 40× magnification on a Leica DM5500Q confocal microscope and the number of GFP-expressing cells in the posterior compartment was counted.
Genes of interest that did not have a significant effect on cell translocation were tested with the more laborious overgrowth assay. Each disc was imaged at 10× magnification, using identical settings within each imaging session, and the red and green channels were exported to separate tiff files. The outline of the disc was selected using the magnetic lasso tool in Photoshop, and the number of pixels in the enclosed areas was measured. The number of green pixels within the disc was measured in ImageJ using the threshold tool at the default setting and then the analyze particles tool. The relative amount of transformed tissue in each disc was then calculated as: number of green pixels in disc/number of pixels in disc.
For each of these two calculations (cell counts for translocation, and disc area ratios for tissue overgrowth), values for each set of discs of a genotype of interest were compared to control, using Student's t-test when distributions were normal or Mann–Whitney u test when distributions were abnormal. For Fig. 2 and Fig. S2, two-tailed statistics were used. For the genetic screen, only samples with a mean greater than control were tested, and one-tailed statistics were used. We then calculated the false discovery rate (FDR)-adjusted P-values using the Benjamini-Hochberg method available in the Python package Statsmodels (https://doi.org/10.25080/Majora-92bf1922-011).
In the genetic screen, to account for variation in the baseline characteristics of the ptc>Myc,p53sh line over time (due to genetic drift) and slight variation in imaging settings from experiment to experiment, these statistical tests were first performed using only the controls from each individual experiment. However, as some experiments contained only a small number of samples for each genotype, and in order to calculate an FDR (above), we also performed the comparisons after aggregating the data for all experiments. Fly lines resulting in enhancement over control with P<0.05 in the individual experiment and FDR<0.1 in the aggregate were considered significant.
Drugs
We used a library of drugs approved by the FDA to treat cancer, several drugs typically used to treat TNBC, as well as novel compounds developed in our laboratory, which showed anti-cancer activity in other fly thyroid and colorectal cancer models. Doses were approximations of the maximum tolerable dose, based on previous experience with other fly models, with lower doses of some drugs based on preliminary experiments on p53/Myc (Table S8). Drugs were dissolved in either water or DMSO, and then diluted 1:1000 by volume in the fly food. Water-soluble drugs were compared to a ‘water’ control for which no drug was added to the food. Drugs dissolved in DMSO were compared to food with 0.1% DMSO by volume.
Drug experiments
We crossed ptc-Gal4 to w; Myc,p53sh at 27°C, where the rate of eclosion on DMSO was 56%, lower than at 25°C. Flies were allowed to lay on four replicates of fresh drug food for each drug for 24 h. They were then flipped into another set of four replicates each and allowed to lay for 24 h, giving a total of eight replicates for each drug condition. The eclosion rate in percent was calculated as: 100×number of empty pupal cases/[number of uneclosed pupae+number of empty pupal cases].
The crosses used for experiments shown in Fig. 6 were performed similarly to those for the genetic screen. Virgin females of hs-FLP/hs-hid; ptc-Gal4; Myc,p53sh/SM5(Gal80)-TM6B were crossed with a control line comprising either genotype w or genes of interest identified in the screen. The eclosion rate in percent was calculated as described above only for non-tubby pupae. The relative pupariation rate in percent was also calculated: 100×[number of non-tubby pupae/number of tubby pupae]. Drug and food conditions were the same as above.
Statistics
Fig. 1A includes both copy number and mutation data; copy number calculation is described above. For the purpose of clustering, mutation data were converted to numerals ranging from 0 to 2, with values <1 indicating likely loss of function and values >1 indicating likely gain of function, as follows: Missense Mutation=2.0, In-frame Insertion=1.9, In-frame Deletion=1.8, Nonstop Mutation=1.7, RNA Alteration=1.6, Splice Site mutation=0.3, Frameshift Insertion=0.2, Frameshift Deletion=0.1, Nonsense Mutation=0.0. Hierarchical clustering was then performed in Python 3 with the graphing package Seaborn (https://zenodo.org/record/883859#.Y3BVeuRBw2w) using the ‘clustermap’ function with the default Euclidean distance metric to generate Fig. 1A. To account for the effect of mixed data types on this clustering, we repeated this analysis using the original categorical designations for mutation type (i.e. ‘Missense Mutation’) and the same copy number data, and performed clustering using Gower distance (Gower, 1971) with the Python package Gower (https://zenodo.org/record/883859#.Y3BVeuRBw2w). This method also did not produce any discernible clusters. Code to reproduce both versions of the figure is available at https://github.com/jennifereldiaz/fly-tnbc.
Statistical tests were performed either in Python 3 with the statistics packages SciPy (http://www.scipy.org/) and Statsmodels (Seabold and Perktold, 2010) or Prism. In Fig. 6D, 2-way ANOVA and multiple t-tests (unpaired) functions in Prism 8 were used.
Survival analyses utilized the breast cancer portion of the TCGA Pan-Cancer clinical endpoints database (Liu et al., 2018) and were performed using the built-in log-rank test, Cox proportional hazard model and Kaplan–Meier curve function in the Python package Lifelines (CamDavidsonPilon/lifelines: v0.14.3, https://zenodo.org/record/1252342). Raw P-values from the log-rank test are shown in the Table S1 rather than false discovery rates, as this analysis is intended to complement the larger body of computational and functional studies on these genes rather than serve as a point of conclusion on its own.
Acknowledgements
We thank R.L.C.'s Laboratory for important discussions and, particularly, Sindhura Gopinath for technical assistance. We thank Avi Ma'ayan (Icahn School of Medicine at Mount Sinai, New York, USA), Giovanni Ciriello (Department of Computational Biology, UNIL, Lausanne, Switzerland) and Yasin Şenbabaoğlu (Genentech, San Francisco, USA) for advice, Benjamin Neel (NYU Langone Health, New York, USA) for sharing data, and Dana Pe'er (Memorial Sloan Kettering Cancer Center, New York, USA) for sharing ISAR data. We thank Peter Smibert (10x Genomics, Pleasanton, USA) for designing key knockdown sequences, and Masahiro Sonoshita (Institute for Genetic Medicine, University of Hokkaido, Japan) for generating and providing the P53 knockdown line.
Footnotes
Author contributions
Conceptualization: J.E.L.D., R.L.C.; Methodology: J.E.L.D.; Investigation: J.E.L.D., V.B., C.B., C.H., T.K.D.; Writing - original draft: J.E.L.D., R.L.C.; Writing - review & editing: R.L.C.; Supervision: J.E.L.D., R.L.C.; Project administration: R.L.C.; Funding acquisition: R.L.C.
Funding
This work was supported by grants from the National Institutes of Health (NIH) (grant nos: U54 OD020353, R01 CA109730, T32 GM007280), and the Mary Kay Ash Foundation. Deposited in PMC for immediate release.
Data availability
All data and accession numbers needed to evaluate the conclusions in the paper are present in the paper and/or the Supplemental Materials. Code is available at github.com/jennifereldiaz/fly-tnbc. Additional data related to this paper may be requested by contacting the corresponding author.
References
Competing interests
The authors declare no competing or financial interests.