In our recent article entitled “Genomic Characterization of Esophageal Squamous Cell Carcinoma Reveals Critical Genes Underlying Tumorigenesis and Poor Prognosis,” published in the American Journal of Human Genetics, we reported a high-throughput genomic sequencing study of esophageal squamous cell carcinoma (ESCC). By detecting and characterizing somatic variants, we comprehensively analyzed the effects of copy number variations (CNVs), mutations, and relative gene expression on patients’ overall survival (OS). Key genes were further investigated through biological experiments in vitro and in vivo. Our goal was to identify critical mutations underlying the poor prognosis of ESCC and to find potential prognostic markers and therapeutic targets [12].
In the 67 samples comprising the sequencing cohort, 19,434 mutations were found in exonic regions of the ESCC genome. Using MutSigCV [13], we identified tumor protein p53 (TP53), cyclin-dependent kinase inhibitor 2A (CDKN2A), notch homolog 1, translocation-associated (Drosophila) (NOTCH1), and nuclear factor, erythroid 2 like 2 (NFE2L2) as significantly mutated genes. These four genes were then subjected to OS analysis, which showed that patients harboring NOTCH1 mutations survived longer after surgery than those without NOTCH1 mutations. In addition, in an independent cohort of 321 samples, individuals with lower NOTCH1 expression had a higher 5-year OS rate than those with higher NOTCH1 expression. Multivariate Cox regression analysis indicated that NOTCH1 expression remained significantly associated with OS after adjustment for age, sex, tumor stage, smoking, and alcohol consumption. These findings suggest that NOTCH1 might play an essential role in ESCC progression.
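Five-year OS rates such as those compared above are usually read off a Kaplan-Meier curve. The article does not describe its survival-analysis software, so the following is only a minimal, dependency-free sketch of the standard Kaplan-Meier estimator; the function names `kaplan_meier` and `survival_at` and the follow-up data in the usage lines are hypothetical and do not come from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as a list of (time, survival) steps.

    times  -- follow-up time per patient (e.g. months after surgery)
    events -- 1 if the patient died at that time, 0 if censored (alive at last contact)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at this same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Read the step-function estimate at time t (1.0 before the first death)."""
    s = 1.0
    for step_time, value in steps:
        if step_time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    # Hypothetical follow-up data in months; not from the study.
    mutant = kaplan_meier([12, 30, 45, 60, 72, 80], [1, 0, 1, 0, 0, 0])
    wild_type = kaplan_meier([6, 10, 18, 24, 40, 65], [1, 1, 1, 0, 1, 0])
    print("5-year OS, mutant:   ", round(survival_at(mutant, 60), 3))
    print("5-year OS, wild type:", round(survival_at(wild_type, 60), 3))
```

In practice, a dedicated package such as `lifelines` in Python or `survival` in R would also provide confidence intervals and the multivariate Cox model mentioned above.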
Compared with mutations, CNVs occurred at much higher frequencies in ESCC. To explore their influence on survival, we measured both the CNVs and the expression of the associated genes by quantitative polymerase chain reaction (qPCR). A number of CNVs and genes were found to be associated with poor patient outcomes.
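The article does not specify how qPCR signals were converted to copy numbers. One common approach, assumed here rather than taken from the study, is the comparative Ct (2^-ΔΔCt) method: the target locus's cycle threshold is normalized to a reference locus and then to a calibrator of known copy number, such as normal diploid DNA. The sketch below also assumes roughly 100% amplification efficiency for both assays; the function names and Ct values are hypothetical.

```python
def relative_quantity(ct_target: float, ct_reference: float,
                      ct_target_cal: float, ct_reference_cal: float) -> float:
    """Fold difference of a target locus versus a calibrator (2^-ddCt).

    Assumes ~100% PCR efficiency for both the target and reference assays.
    """
    delta_sample = ct_target - ct_reference              # normalize sample to reference locus
    delta_calibrator = ct_target_cal - ct_reference_cal  # same normalization for the calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


def copy_number(ct_target: float, ct_reference: float,
                ct_target_cal: float, ct_reference_cal: float,
                calibrator_copies: float = 2.0) -> float:
    """Estimated copies per cell, assuming the calibrator carries `calibrator_copies` copies."""
    return calibrator_copies * relative_quantity(
        ct_target, ct_reference, ct_target_cal, ct_reference_cal)


if __name__ == "__main__":
    # Hypothetical Ct values: tumor DNA versus normal diploid DNA as calibrator.
    # The target amplifies one cycle earlier (relative to the reference) in the tumor.
    print(copy_number(24.0, 22.0, 25.0, 22.0))  # -> 4.0
```

The same 2^-ΔΔCt ratio, without the calibrator copy-number factor, is the usual fold-change estimate for expression measured by RT-qPCR.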
MYB proto-oncogene like 2 (MYBL2), a reported cell cycle regulator, showed elevated copy numbers in 70% of the tumors subjected to whole-genome sequencing. Its transcript and protein levels were also higher in ESCC than in adjacent normal tissues. Both increased copy number and high expression of MYBL2 were associated with poorer survival. In vitro, overexpression of MYBL2 increased the proliferation of ESCC cell lines.
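Survival differences between patients with high and low levels of a marker such as MYBL2 are commonly assessed by dichotomizing the cohort and applying a two-group log-rank test. The article does not state its cutoff or test, so this sketch assumes a median split and the standard log-rank statistic with a one-degree-of-freedom chi-square p-value; `split_at_median`, `log_rank`, and the example data are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple


def split_at_median(values: List[float]) -> List[int]:
    """Label each patient 1 (above the median) or 0 (at or below it)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def log_rank(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df).

    events -- 1 = death, 0 = censored; groups -- 1 = group of interest, 0 = comparison group
    """
    data = list(zip(times, events, groups))
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e == 1}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(at_risk)
        d = sum(e for tt, e, _ in data if tt == t)
        d1 = sum(e for tt, e, g in data if tt == t and g == 1)
        # Deaths expected in group 1 if both groups shared one hazard.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this time point.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical MYBL2 expression, follow-up (months) and vital status.
    mybl2 = [2.1, 0.4, 3.3, 1.0, 2.8, 0.7, 1.9, 0.2]
    months = [14, 60, 9, 48, 20, 55, 30, 62]
    died = [1, 0, 1, 1, 1, 0, 1, 0]
    chi2, p = log_rank(months, died, split_at_median(mybl2))
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```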
Non-coding RNAs play important roles in tumorigenesis and tumor development. However, previous comparative genomic hybridization studies mainly focused on coding genes. In our study, miR-4707-5p, a microRNA located in a CNV region, was significantly overexpressed in tumors, and individuals with high miR-4707-5p levels had a worse prognosis than those with low levels. An in vitro pilot experiment revealed that miR-4707-5p strongly promotes cell migration and invasion, and its pro-metastatic ability was confirmed in two different mouse tumor metastasis models. Furthermore, mechanistic studies showed that miR-4707-5p reduces E-cadherin levels by targeting adenosine deaminase, RNA specific B1 (ADARB1), in turn promoting metastasis. Therefore, the CNV-miR-4707-5p-ADARB1-E-cadherin axis might be a therapeutic target in ESCC.
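The article does not describe how ADARB1 was first nominated as a miR-4707-5p target. Canonical microRNA target sites are often screened by seed complementarity, so the sketch below, offered only as an illustration of that general idea, searches a 3' UTR for 7mer-m8 sites, that is, stretches perfectly complementary to miRNA nucleotides 2-8. The miRNA and UTR sequences are invented placeholders, not the real miR-4707-5p or ADARB1 sequences, and a seed match alone does not establish functional targeting, which in the study rested on mechanistic experiments.

```python
from typing import List

# Watson-Crick pairs in the RNA alphabet.
_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(_PAIRS[base] for base in reversed(seq))


def find_7mer_m8_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` that pair perfectly with miRNA nucleotides 2-8.

    DNA input is accepted: T is converted to U before matching.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                      # nucleotides 2-8 (1-based) of the miRNA
    site = reverse_complement_rna(seed)    # sequence the target mRNA must contain
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


if __name__ == "__main__":
    # Invented placeholder sequences, NOT the real miR-4707-5p or ADARB1 3' UTR.
    mirna = "ACGGAUCCAGUUACGUAGCAAU"
    utr = "AAGGAUCCGUUUUGGAUCCGAA"
    print(find_7mer_m8_sites(mirna, utr))  # -> [2, 13]
```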
Interestingly, VANGL planar cell polarity protein 1 (VANGL1), a novel frequently mutated gene identified in our study, showed no association with ESCC prognosis. VANGL1 mutation might therefore play a role only at early stages of the neoplastic process.
Although we identified several prognostic predictors, we cannot rule out the possibility that low-frequency mutations also affect prognosis, at least in particular individuals. However, because of the limited sample size, the association between these low-frequency mutant genes and ESCC prognosis could not be analyzed in this study. Since cancer initiation and progression involve multiple genes, a multi-gene prognostic signature would be more convincing, although this is difficult to achieve given the limited sample size, the low frequency of mutations, and the complexity of genomic variants.
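A multi-gene prognostic signature of the kind envisioned here is often built as a linear risk score, that is, a weighted sum of gene-level features with weights taken from a fitted Cox model, after which patients are split into high- and low-risk groups. The sketch below assumes that construction; the gene names `GENE_A` and `GENE_B`, their coefficients, and the expression values are placeholders, not results from this study.

```python
from statistics import median
from typing import Dict, List


def risk_scores(expression: List[Dict[str, float]],
                coefficients: Dict[str, float]) -> List[float]:
    """Linear predictor per patient: sum over signature genes of coefficient * feature value."""
    return [sum(coef * sample[gene] for gene, coef in coefficients.items())
            for sample in expression]


def stratify(scores: List[float]) -> List[str]:
    """Median split into 'high' and 'low' risk groups (ties go to 'low')."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


if __name__ == "__main__":
    # Placeholder genes and Cox coefficients; positive weights raise predicted risk.
    coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
    patients = [{"GENE_A": 1.0, "GENE_B": 2.0},
                {"GENE_A": 3.0, "GENE_B": 0.0},
                {"GENE_A": 0.0, "GENE_B": 1.0}]
    scores = risk_scores(patients, coefficients)
    print(scores, stratify(scores))  # -> [0.0, 1.5, -0.25] ['low', 'high', 'low']
```

With few patients and rare mutations, such weights are easily overfit, which is one reason an independent validation cohort is normally needed before a signature is trusted.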