Multi-cohort analyses revealed two prognostic 2OG-dependent oxygenases signatures
We analyzed the prognostic significance of 61 2OG-dependent oxygenase genes (Additional file 2) in 19,781 tumor samples from multiple TCGA cohorts [14] covering 25 cancer types (Additional file 1). Prognostic genes were defined as those whose expression levels were significantly correlated with patients’ OS. Pancreatic cancer is difficult to treat. Since the highest number of prognostic genes (29 genes) was observed in the pancreatic cancer cohort (PAAD; 178 samples), two additional pancreatic cancer cohorts from ICGC (PACA-AU and PACA-CA; 269 and 234 samples) were used in combination as training cohorts (Fig. 1a). We defined two gene signatures (signatures 1 and 2) as favorable and unfavorable prognostic factors by taking into consideration genes that were significant in univariate Cox regression analyses in 2 out of 3 pancreatic cancer cohorts (Fig. 1a, b). Signature 1 included KDM8, KDM6B, P4HTM, ALKBH4, and ALKBH7. Likewise, KDM3A, P4HA1, ASPH, PLOD1, and PLOD2 made up signature 2 (Fig. 1a).
Patients were median-dichotomized based on mean expression scores of signatures 1 and 2 genes. Cox regression analyses revealed that patients with high expression of signature 1 genes had significantly better OS in 6 cancer types (Additional file 3): bladder urothelial carcinoma (BLCA: HR, 0.662; 95% confidence interval [CI] 0.450–0.974; P = 0.036), renal papillary cell carcinoma (KIRP: HR, 0.370; 95% CI 0.157–0.871; P = 0.023), liver cancer (LIHC: HR, 0.656; 95% CI 0.424–0.915; P = 0.048 and LIRI-JP: HR, 0.490; 95% CI 0.259–0.938; P = 0.031), lung adenocarcinoma (LUAD: HR, 0.625; 95% CI 0.443–0.879; P = 0.007), pancreatic adenocarcinoma (PAAD: HR, 0.454; 95% CI 0.278–0.741; P = 0.002), and uterine corpus endometrial carcinoma (UCEC: HR, 0.401; 95% CI 0.229–0.702; P = 0.002). Similar results were obtained using log-rank tests, consistent with the fact that signature 1 was a marker of good prognosis (Fig. 2a). In contrast, patients with high expression of signature 2 genes had significantly worse prognosis in 9 cancers: bladder urothelial carcinoma (BLCA: HR, 1.459; 95% CI 1.096–2.137; P = 0.042), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC: HR, 1.972; 95% CI 1.003–3.877; P = 0.045), head and neck squamous cell carcinoma (HNSC: HR, 1.479; 95% CI 1.056–2.072; P = 0.023), renal clear cell carcinoma (KIRC: HR, 1.483; 95% CI 1.096–2.007; P = 0.011), renal papillary cell carcinoma (KIRP: HR, 3.862; 95% CI 1.565–9.526; P = 0.003), liver cancer (LIRI-JP: HR, 5.271; 95% CI 2.429–11.440; P < 0.001 and GSE14520: HR, 2.285; 95% CI 1.458–3.580; P < 0.001), lung adenocarcinoma (LUAD: HR, 1.562; 95% CI 1.116–2.188; P = 0.009), pancreatic adenocarcinoma (PAAD: HR, 1.969; 95% CI 1.217–3.186; P = 0.006), and gastric adenocarcinoma (STAD: HR, 1.725; 95% CI 1.142–2.605; P = 0.009) (Fig. 2b and Additional file 3).
Cross-platform subgroup and multivariate analyses confirmed the validity of signatures 1 and 2 as independent prognostic factors
To assess the independence of signatures 1 and 2 over current tumor staging systems, we performed subgroup analyses of their prognostic effects in patients with early (stages I and/or II), intermediate (stages II and/or III), and late (stages III and/or VI) cancer stages. Kaplan–Meier analyses revealed that signature 1 successfully identified high-risk (low-expression score) and low-risk (high-expression score) patients with early (bladder urothelial carcinoma, liver cancer, lung adenocarcinoma, pancreatic adenocarcinoma, and uterine corpus endometrial carcinoma), intermediate (liver cancer and uterine corpus endometrial carcinoma), and late (renal papillary cell carcinoma) disease stages (Fig. 3a and Additional file 4A). Signature 2 was also independent of disease stage as it successfully predicted survival in early (liver cancer, lung adenocarcinoma, and pancreatic adenocarcinoma), intermediate (bladder urothelial carcinoma, liver cancer, and gastric adenocarcinoma), and late (bladder urothelial carcinoma, head and neck squamous cell carcinoma, renal papillary cell carcinoma, liver cancer, and gastric adenocarcinoma) stages (Fig. 3b and Additional file 4B).
ROC analyses were employed to determine the predictive performance (sensitivity and specificity) of signatures 1 and 2 on 5-year OS. Signature 1 performed the best, as measured by area under the curve (AUC), in pancreatic adenocarcinoma (PAAD: AUC = 0.754) followed by renal papillary cell carcinoma (KIRP: AUC = 0.738), liver cancer (LIRI-JP: AUC = 0.652 and LIHC: AUC = 0.613), bladder urothelial carcinoma (BLCA: AUC = 0.645), uterine corpus endometrial carcinoma (UCEC: AUC = 0.635), and lung adenocarcinoma (LUAD: AUC = 0.625) (Fig. 3c and Additional file 4C). Signature 2 performance in renal papillary cell carcinoma (KIRP: AUC = 0.810) was the best, followed by cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC: AUC = 0.692), liver cancer (GSE14520: AUC = 0.675 and LIRI-JP: AUC = 0.625), pancreatic adenocarcinoma (PAAD: AUC = 0.668), renal clear cell carcinoma (KIRC: AUC = 0.665), head and neck squamous cell carcinoma (HNSC: AUC = 0.632), lung adenocarcinoma (LUAD: AUC = 0.623), gastric adenocarcinoma (STAD: AUC = 0.618), and bladder urothelial carcinoma (BLCA: AUC = 0.605) (Fig. 3d and Additional file 4D). Performance of both signatures 1 and 2 was superior to current tumor-node-metastasis (TNM) staging except for the following: signature 1 in liver cancer, lung adenocarcinoma, and uterine corpus endometrial carcinoma and signature 2 in bladder urothelial carcinoma, renal clear cell carcinoma, liver cancer, and lung adenocarcinoma (Fig. 3c, d and Additional file 4C, D). Remarkably, when used in combination with TNM staging, both signatures consistently outperformed each of the individual classifiers, reinforcing their incremental prognostic values (Fig. 3c, d and Additional file 4C, D). Significantly, while TNM staging could not predict outcome in cervical squamous cell carcinoma and endocervical adenocarcinoma patients (CESC: AUC = 0.455), signature 2 sufficiently served as an adverse prognostic factor (CESC: AUC = 0.692) (Fig. 3d).
Univariate Cox regression analyses revealed that TNM stage was associated with patient survival in different cancer types except for cervical squamous cell carcinoma and endocervical adenocarcinoma, and pancreatic adenocarcinoma (Additional file 3). This was expected given the low AUC values for TNM stage in both pancreatic adenocarcinoma (PAAD: AUC = 0.593) and cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC: AUC = 0.455), suggesting that current TNM staging system for these cancers are inadequate (Fig. 3c, d). Multivariate Cox regression analyses after adjusting for TNM stage showed that signatures 1 and 2 remained significantly associated with survival (Additional file 3). For 2 liver cancer cohorts, we considered additional clinicopathological features. The GSE14520 cohort consisted of Chinese patients with hepatitis B-associated hepatocellular carcinoma [17], whereas LIRI-JP was a Japanese-based cohort of mixed etiology [29]. Tumor size, cirrhosis, TNM stage, Barcelona Clinic Liver Cancer (BCLC) stage, and alpha-fetoprotein (AFP) levels were all significantly associated with survival in the GSE14520 cohort; tumor size could also predict survival in the LIRI-JP cohort (Additional file 3). When these significant covariates along with signatures 1 or 2 were included in multivariate Cox models, the signatures remained significant risk factors: signature 1 (LIRI-JP: HR, 0.541; 95% CI 0.283–0.904; P = 0.043) and signature 2 (LIRI-JP: HR, 4.539, 95% CI 2.055–10.029; P < 0.001 and GSE14520: HR, 2.012; 95% CI 1.267–3.195; P = 0.003) (Additional file 3). These results highlight the potentially superior prognostic ability of our signatures: signatures 1 and 2 identified high- and low-risk patients in 8 and 12 independent cohorts covering 10 cancer types (Fig. 1a).
Significance of somatic mutations in risk-stratified patients
Patients were risk stratified into low- and high-risk groups using signatures 1 and 2. For signature 1, high-risk patients had significantly lower expression levels of good prognosis genes ALKBH4, ALKBH7, KDM8, KDM6B, and P4HTM (Additional file 5A). In contrast, high-risk patients as stratified by signature 2 had significantly higher expression levels of adverse prognosis genes ASPH, KDM3A, P4HA1, PLOD1, and PLOD2 (Additional file 5B). To ascertain the relationship between tumor hypoxia and expression of signature genes, hypoxia scores were computed for each patient as mean expression values (log2) of 52 hypoxia signature genes [19]. Signature 1 expression scores in patients negatively correlated with hypoxia score (Additional file 6A). Since tumor hypoxia is associated with distant metastasis, recurrence, and reduced therapeutic response [30], high expression of signature 1 genes (low hypoxia score) was correlated with less advanced disease states consistent with it being a marker of good prognosis (Additional file 6A).
Conversely, signature 2 scores positively correlated with tumor hypoxia and hence poor survival outcomes (Additional file 7A). We anticipated that patients’ individual risks of death, as determined from signatures 1 and 2, would positively correlate with tumor hypoxia. Indeed, the risk score for each patient, as calculated by taking the sum of Cox regression coefficient for each of the individual genes multiplied with its corresponding expression value [31], was correlated with the hypoxia score (Additional files 6B, 7B). Hence, high-risk patients had more hypoxic tumors, suggesting that our gene signatures are efficient and adequate in predicting death.
To ascertain the association between patients’ risks, as determined by our gene signatures, and somatic mutations, we retrieved the five most commonly mutated genes for each cancer. Mutations in PCDHA1, a cell adhesion gene from the cadherin superfamily, were associated with short survival in bladder urothelial carcinoma (BLCA: HR, 1.649; 95% CI 1.058–2.569; P = 0.027) and gastric adenocarcinoma (STAD: HR, 1.525; 95% CI 1.007–2.307; P = 0.046) but with prolonged survival in uterine corpus endometrial carcinoma (UCEC: HR, 0.516; 95% CI 0.272–0.978; P = 0.042) (Additional file 3). Mutations in another gene from the protocadherin alpha cluster, PCDHA2, were also associated with adverse outcomes in gastric adenocarcinoma (STAD: HR, 1.604; 95% CI 1.061–2.427; P = 0.025) (Additional file 3). Mutations in TTN and the tumor suppressor TP53 were associated with short survival in bladder urothelial carcinoma (BLCA: HR, 1.610; 95% CI 1.091–2.376; P = 0.016) and uterine corpus endometrial carcinoma (UCEC: HR, 1.780; 95% CI 1.025–3.090; P = 0.041) (Additional file 3). Interestingly, another tumor suppressor PTEN, when mutated, was linked to better outcomes in uterine corpus endometrial carcinoma (UCEC: HR, 0.427; 95% CI 0.234–0.781; P = 0.006) (Additional file 3). Similar observations were made for a lipid kinase gene PIK3CA in uterine corpus endometrial carcinoma (UCEC: HR, 0.362; 95% CI 0.190–0.689; P = 0.002) (Additional file 3). Likewise, MUC4 mutations prolonged survival in renal clear cell carcinoma patients (KIRC: HR, 0.570; 95% CI 0.370–0.880; P = 0.012) (Additional file 3), an observation that is consistent with another study [32].
Multivariate Cox regression analyses on signatures 1 and 2 while controlling for significant somatic mutation variables revealed that the gene signatures were independent survival predictors for bladder urothelial carcinoma (signature 1: HR, 0.686; 95% CI 0.466–0.912; P = 0.047 and signature 2: HR, 1.411; 95% CI 1.062–2.070; P = 0.048), renal clear cell carcinoma (signature 2: HR, 1.520; 95% CI 1.123–2.056; P = 0.007), gastric adenocarcinoma (signature 2: HR, 1.800; 95% CI 1.184–2.737; P = 0.006), and uterine corpus endometrial carcinoma (signature 1: HR, 0.519; 95% CI 0.293–0.920; P = 0.024) (Additional file 3). Signatures 1 or 2 and mutation status were collectively associated with OS (Fig. 4). In bladder urothelial carcinoma, high-risk patients (low signature 1 score) harboring mutant alleles of PCDHA1 had ~ 50% increased mortality at 5 years compared to low-risk patients (high signature 1 score) with wild-type PCDHA1 (P = 0.016; Fig. 4). Although results were less dramatic for PCDHA1 and signature 2, we still observed a ~ 25% elevated mortality at 5 years for these two patient groups with bladder urothelial carcinoma (P = 0.040; Fig. 4). In gastric adenocarcinoma, high-risk patients (high signature 2 scores) with mutant PCDHA1 had the worst outcomes (P = 0.002; Fig. 4). Conversely, PCDHA1 mutation was associated with good prognosis in uterine corpus endometrial carcinoma, hence high-risk patients with wild-type PCDHA1 had the lowest survival rates while survival was prolonged by ~ 20% in low-risk patients with mutant PCDHA1 (P = 0.003; Fig. 4). PIK3CA (P < 0.001) and PTEN mutations (P = 0.001) were associated with good outcomes in uterine corpus endometrial carcinoma (Fig. 4). Mutations in another cadherin gene PCDHA2 when considered alongside signature 2 were also associated with survival in gastric adenocarcinoma (P < 0.001; Fig. 4). Survival rates were reduced by ~ 37% in high-risk patients with mutant PCDHA2 (Fig. 4). Joint relation between TP53 mutations and signature 1 significantly influenced survival in uterine corpus endometrial carcinoma (P = 0.002; Fig. 4). Since MUC4 mutations were associated with good outcomes, survival rates were the lowest in high-risk patients (high signature 2 scores) with wild-type MUC4 (P = 0.003; Fig. 4).
Tumor suppressive roles of KDM8 through cell cycle regulation and cell adhesion maintenance
Of all the signature genes, KDM8 was identified as one of the most down-regulated genes in tumors (Fig. 5a). Patients with high KDM8 levels had a significantly lower risk of death in pancreatic and liver cancer cohorts (Fig. 5e). Prognostic significance of KDM8 was also independent of tumor stage (Fig. 5e). KDM8 expression decreased as tumor malignant grade increased in that stage 1 tumors had the highest median KDM8 values (Fig. 5b). Moreover, KDM8 expression was negatively correlated with hypoxia score, indicating that patients with low levels of KDM8 had more hypoxic tumors and poorer survival outcomes (Fig. 5c). Together, these observations suggest that KDM8 may function as a tumor suppressor. This hypothesis is corroborated by an independent report on the role of KDM8 in cell cycle regulation [33]. Indeed, we observed that KDM8 expression was negatively correlated with the expression levels of canonical cell cycle genes: cyclins (CCNA2, CCNB1, CCNB2, CCND1, CCNE1, and CCNE2) and cyclin-dependent kinases (CDK1, CDK2, CDK4, CDK6, CDK7, and CDK8), which were consistent across all liver and pancreatic cancer cohorts (Fig. 5d). This implied that KDM8 is required for tight control of the cell cycle machinery and its reduction may lead to aberrant proliferation commonly seen in cancer cells.
To ascertain the biological consequences of deregulated KDM8 expression, we conducted differential expression analysis on liver cancer patients categorized into KDM8-low and -high groups. A total of 745 genes were differentially expressed (DEGs) between the two groups (fold change > 2 or < − 2, P < 0.05) (Additional file 8). Significant enrichments of biological pathways involved in metabolism, immune regulation, VEGF production, inflammation, and cell adhesion were observed (Fig. 5f and Additional file 9). Furthermore, DEGs were overexpressed as targets of HNF4A, HNF4G, FOXA1, FOXA2, and NR2F2 transcription factors (TFs) (Fig. 5g). These TFs play central roles in cell polarity maintenance and epithelial differentiation [26, 34, 35], hence down-regulation of KDM8 may drive epithelial–mesenchymal transition (EMT) and tumor progression. HNF4A is a key TF responsible for regulating a myriad of hepatic functions including cell junction assembly [26, 36]. Pathway analysis of KDM8 DEGs revealed enrichment of processes related to cell adhesion, suggesting potential crosstalk between KDM8 and HNF4A. Of the 745 DEGs, analysis on a hepatoma-based HNF4A chromatin immunoprecipitation-sequencing dataset demonstrated that 148 genes were directly bound by HNF4A [37]. To further reinforce the interplay between HNF4A and KDM8, we observed that 110 of the 745 DEGs were overrepresented in HNF4A-null mice [26] and 45 of these genes were direct HNF4A targets (Fig. 5h). Differential expression analysis between HNF4A-deficient and wild-type mice showed that a majority of the 45 genes were down-regulated, as expected, suggesting that HNF4A directly activates their gene expression, many of which are involved in a multitude of cell adhesion processes (Fig. 5i).