Mapping genes to pathways and to hallmarks
We created a map that connects 1384 genes to 343 pathways and to the 10 cancer hallmarks (Fig. 1). On average, 200 genes signal to a hallmark via 25 pathways. However, there is a large variation in the number of genes and pathways connected to each of the hallmarks. For example, the hallmarks “resisting cell death,” “sustaining proliferative signaling,” and “tissue invasion and metastasis” are connected to, and thus possibly enabled by, 50 or more pathways and more than 300 genes. In contrast, the hallmarks “replicative immortality” and “reprogramming energy metabolism” are connected to 1 and 2 pathways, respectively, and only a handful of genes. Additionally, many GO processes in the PID could not be linked to any of the 10 cancer hallmarks. This resulted in 385 genes and 119 pathways that were solely connected to the category “Other” and were not considered further in this work.
Some well-known cancer genes are hubs in the map, i.e., they appear in many pathways and influence multiple hallmarks. For example, phosphatidylinositol-4,5-biphosphate 3-kinase, catalytic subunit alpha (PIK3CA) appears in 39 pathways and signals to 8 hallmarks, and tumor protein p53 (TP53) is found in 9 pathways and links to 4 hallmarks. However, the majority of genes are part of 1 or 2 pathways and link to 1 or 2 hallmarks. See Additional file 2: Figure S1, Additional file 3: Figure S2, Additional file 4: Figure S3, Additional file 5: Figure S4, Additional file 6: Figure S5 for a detailed graphical overview of the connectivity in this map.
The multilevel mutational landscape
We projected the gene mutation data of the 2740 TCGA tumor samples onto the map. Grouping of the samples by their cancer types allowed us to investigate the average MI per cancer type at the level of genes, pathways, and hallmarks (Fig. 2a). At the level of genes, there was a large variation in mutation frequency among cancer types. This is not surprising because it is well known that cancer types are characterized by specific gene mutation signatures [12]. For example, adenomatous polyposis coli (APC) mutations are mostly specific to colorectal adenocarcinomas (CORE); 77% of these samples have an APC mutation, much more than the frequency for any other cancer type. At the level of pathways, there was a smaller variation in MI among cancer types. For example, the coefficient of variation (CoV) of the average MI score of the p53 pathway across cancer types was smaller than the CoV of the mutation status of the gene TP53. The CoV is defined as the SD divided by the mean. In this case, it is the SD of the 10 average MI scores divided by the mean of these 10 MI scores. At the level of hallmarks, the average MI scores across cancer types were even more similar to each other. Many of them were between 75 and 100%, indicating that for almost all tumor samples across cancer types there are gene mutations that link to each of the hallmarks. The two aforementioned sparsely connected hallmarks are exceptions. The greater similarity across cancer types at the level of hallmarks is also shown by the decrease of the average CoV from 0.93 to 0.53 to 0.25 when moving from genes to pathways to hallmarks (Fig. 2b).
We compared the CoVs derived from this map with those derived from 1000 randomly rewired multilevel maps (see “Randomly rewired map” in the Additional file 1 section). For these randomly rewired maps, genes were connected to randomly selected pathways, and pathways were connected to randomly selected hallmarks. However, the in-degree and out-degree of the two bipartite graphs in this map, i.e., from genes to pathways and from pathways to hallmarks, were maintained. We observed substantially higher CoVs for several pathways for the actual map compared with the randomized map (Additional file 7: Figure S6). This indicates that cancer types are characterized by different MI scores at the levels of not only genes but also pathways. Although the CoVs were small at the level of hallmarks, we found that for some hallmarks they were slightly yet consistently larger than those for randomly rewired maps, hinting that cancer types might have different MIs in hallmarks.
In conclusion, the multilevel map populated with mutation data establishes the already intuitive answer to our question of how genetically different tumors can share the same hallmark characteristics; mutations in different genes impinge on the same or functionally related pathways and ultimately deregulate the same biological processes.
Mutual exclusivity and co-occurrence of gene mutations that enable hallmarks
Our strategy of propagating the binary mutation calls to the levels of pathways and hallmarks is based on the concept of “mutual exclusivity.” We investigated ME and CO genes in the context of the multilevel map.
The results of the statistical analyses showed that the largest number of significant ME and CO associations was discovered for pathway pairs (22% of all pathway pairs tested) followed by hallmark pairs (8%) and then control pairs (6%) (Fig. 3a). For all categories, we observed more ME associations than CO associations. However, the ratio of significant ME to CO associations differed substantially among the three categories (P = 0.03, Chi square test). Importantly, at the level of pathways, we observed more ME associations than CO associations (the ME:CO ratio was 11:1 for pathway pairs compared with an ME:CO ratio of 2:1 for control pairs), confirming the notion that mutual exclusivity is more prevalent in pathways. At the hallmark level, the ME:CO ratio was 3:1, which means that relative to the ME:CO ratio of 11:1 for pathways, there were more CO associations for the hallmark pairs. This observation hints that enabling hallmarks might require the concerted deregulation of distinct biological functions, yet not providing overwhelming evidence for this hypothesis.
This pattern was similar across most cancer types except for ovarian serous cystadenocarcinoma (OV), which was dominated by CO associations at the hallmark level (Fig. 3b). An overview of all significant associations is shown in Additional file 8: Table S1.
To investigate CO and ME associations further in pathways and hallmarks, we employed a complementary statistical analysis. Instead of looking only at the significant associations, we analyzed the complete distribution of P values using the TS statistic [20]. The TS and its confidence interval (CI) were obtained for the P values derived from the ME and CO tests for pathway, hallmark, and control pairs, separately (Fig. 3c). A high TS indicates that there are more small P values than is expected by chance. Interestingly, for hallmark and control pairs, the TS was significantly larger for CO associations than for ME associations. Thus, although there were more significant individual ME associations (Fig. 3a), the overall distribution of P values was skewed towards low P values for CO associations (Fig. 3c). Other observations of the TS were in line with the previous analysis. Specifically, the TS for ME associations was the highest in pathway pairs, followed by hallmark pairs, for which the TS was significantly lower. Control pairs had the lowest TS. For CO associations, in contrast, there was no significant difference in the TS between pathway and hallmark pairs. This provides further evidence for the hypothesis that ME is more prevalent in pathway pairs than in hallmark pairs.
The role of infrequently mutated genes in pathway deregulation and hallmark activation
We employed the multilevel map to elucidate the role of genes that are not SFM in a cancer type.
First, we observed that many tumor samples (approximately 20% on average across all cancer types and hallmarks), which lacked mutations in SFM genes that were linked to hallmarks, had mutations in other genes that were linked to hallmarks and could potentially play a role in enabling them (Additional file 9: Figure S7).
Recently, ME and CO analysis was used to illuminate the role of these genes through associations with other genes in sub-networks [10]. Specifically, Leiserson et al. [10] identified sub-networks of ME and CO associations between genes using a protein–protein interaction network as a scaffold. In the present study, we used the multilevel map as a scaffold and identified many associations among genes that are not SFM. For example, the ME and CO analysis for head and neck squamous cell carcinoma (HNSC) revealed several associations between SFM and non-SFM genes (Fig. 4a). In general, almost half (45%) of all significant pairs at the pathway and hallmark levels included at least one gene that was not SFM in the tumor type (Fig. 4b). The distribution of pairs containing or consisting only of SFM genes was significantly skewed across ME and CO associations (P < 0.001, Chi square test) as well as across the three categories (P < 0.001, Chi square test). In addition, we found a large number of CO pairs at the hallmark level containing one or no SFM genes. This is a clear indication that the concerted effect of mutations in genes that are not SFM could play an important role in establishing cancer hallmark characteristics.