Quality control of sequence data
We performed WES in six cases of DSRCT that, on the basis of previously published results of transcriptome profiling, microRNA (miRNA) in situ hybridisation (ISH), and IHC assays, were divided into three groups that recapitulated the traits of mesenchymal–epithelial reverse transition (MErT), hybrid/partial EMT, and EMT [9]. After alignment of raw reads, we evaluated the quality of data in terms of coverage and sample purity. Analysis with Conpair showed that each tumor-normal pair was correctly matched: a sample concordance > 99% and negligible levels of cross-sample contamination among tumor-normal pairs were observed (Additional file 1: Figure S2a, b). The mean coverage, after removal of duplicate reads, ranged from 109.4× to 123.9× for tumor samples and from 52.3× to 71.7× for normal samples (Additional file 1: Figure S2c). We also estimated the percentage of target bases covered at least 50× that ranged from 71.4% to 88.8% for tumor samples and from 46.0% to 71.9% for normal samples (Additional file 1: Figure S2d).
After variant calling and filtering to remove possible artefacts and false positive variants (see criteria in Additional file 1: Figure S1), we identified 137 somatic mutations affecting 135 genes (listed in Additional file 2: Table S1).
Mutation spectrum across the cases
Each case was characterized by a sizeable number of mutations: from 8 to 33 mutations per case (mean, 23) (Fig. 1a). The number of mutations for each case according to chromosomal location indicated a non-preferential pattern of distribution (Fig. 1a) and a positive correlation between the total number of mutations/chromosome and chromosome length (Pearson’s r = 0.67). A consistent proportion of mutations were categorized as missense or intronic, followed by silent mutations (Fig. 1b).
Figure 2 shows the inter-patient heterogeneity of mutated genes across the six DSRCT cases. Among them, 135 were case-specific and only two genes, mucin 19 (MUC19) and glucuronidase, beta pseudogene (GUSBP1), were found mutated in two cases but in different positions.
Comparison between the mutational profile and COSMIC mutational signatures
Before surgery, all patients had received 3–6 cycles of multi-drug chemotherapy including one or two alkylating agents [9]. To evaluate whether the identified mutations may have been due to the dominant effect of chemotherapy, we tested 30 mutational signatures from COSMIC in the aggregated list of our 137 somatic mutations. The comparison between the observed (Fig. 3a) versus the reconstructed (Fig. 3b) mutational profiles with an irrelevant error (Fig. 3c) indicated that the DSRCT profile essentially derived from the contribution of three mutational signatures denominated in COSMC as 1, 3 and 29 (respective weights: 0.36, 0.26, and 0.10); the mutational signature associated with treatment with alkylating agents (signature 11, weight = 0) did not contributed to mutational profile.
Mutated genes, associated pathways, and manual curation
Functional analysis by using IPA indicated that the 135 genes affected by damaging mutations in DSRCTs caused enrichment in the following biological pathways: DNA damage response (DDR) of kidney cell lines, DNA damage, delay in repair of DNA, repair of DNA, DNA damage checkpoint, and DDR of epithelial cell lines (all P < 0.001) (Fig. 4a).
Manual curation of the list of mutated genes was based on the recent evidence that genes encoding RNA-binding proteins (RBPs) and DNA/RNA-binding proteins (DRBPs) as well as other genes related to RNA machinery are strictly connected to DDR [23] and on our previous data [9]. According to this section, we found that 27% of the 135 mutated genes belonged to the DDR network (Fig. 4b) or to MErT/EMT process (Fig. 4c).
Categorization in these two biological processes showed that each case harboured mutation in at least one gene for each category. Details regarding each gene hereafter described, such as full name, a summary from NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/gene) and PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and their presence in COSMIC and in the Cancer Census are reported in Additional file 2: Table S2.
Mutations in the DDR network
The DDR network was affected by 26 mutated genes that can be grouped in three subsets: core genes, genes encoding RBPs, and other genes related to RNA machinery.
Core genes
A damaging missense mutation in ATR serine/threonine kinase (ATR), one of the two core genes of DDR (the other is serine/threonine kinase, ATM), was found in DSRCT4 of the MErT group. The function abrogation of this gene could allow tumors to escape checkpoint control and could lead to uncontrolled replication (see also references in Additional file 2: Table S2).
Tumor protein p53 (TP53), even not a canonical core gene of DDR, being a master regulator of transcription and clearly associated with DDR, was included in the subset of core genes. The mutation here found in DSRCT5 of the hybrid EMT group was a deleterious missense mutation that was localized in the canonical hotspot disabling the encoded protein. The allele frequency of this mutation was low (12.9%, Additional file 2: Table S1), but this value may be considered in agreement with the strong p53 nuclear immunostaining with a patchwork pattern in half of the tumor cells (data not shown).
Genes encoding RBPs
All six cases had at least one mutation affecting RBP genes. Among the 12 identified mutations, 2 were intronic, 1 was an in-frame deletion, and 6 were missense (Fig. 4b), being mutations in BCL2-associated transcription factor 1 (BCLAF1) and cleavage and polyadenylation factor subunit (PCF11) genes predicted to be damaging.
Beside BCLAF1 and PCF11, three other mutated genes [CWC22 spliceosome-associated protein homolog (CWC22), debranching RNA lariats 1 (DBR1), and adenosine deaminase domain containing 1 (ADAD1)] are involved in pre-mRNA production, defective DNA repair, miRNA and snoRNA regulation with prevalence in the early step of spliceosome editing. In particular, ADAD1 function has not yet been deciphered but, on the basis of the hypothesis that the adenosine deaminase acting on RNA (ADAR) enzyme family controls RNA editing, an alteration in one of its members may affect a wide range of RNA processing activities [24, 25]. The function of HIV-1 Tat specific factor 1 (HTATSF) gene is also largely unknown, but its mutation has been reported to parallel the decreased expression of many genes [26].
Zinc finger proteins (ZNFs) belong to a number of different structural families. Three out of ZNF mutated genes (ZNF254, ZNF600, and ZNF225) were annotated among the RBP subset because they belong to the class of Cys2-His2 (C2H2) ZNFs and thus are expected to act consistently [27].
Finally, two additional genes, U2 small nuclear RNA auxiliary factor (U2AF1) and RNA binding motif protein 45 (RBM45), were found mutated at an intron site, and such mutations might have a negative impact on translational efficiency, as reported [28, 29].
Other genes related to RNA machinery
This group included nine genes with a missense mutation, four of which with damaging effect, and two genes with a damaging nonsense mutation (Fig. 4b).
The genes affected by a damaging mutation were ribonucleotide reductase regulatory subunit M2 (RRM2), whose deregulation was reported to impair a key step in DNA synthesis; ARID1A, whose wild type-encoded protein acts as an epigenetic regulator; eukaryotic translation initiation factor 4 gamma 1 (EIF4G1), whose deregulation could negatively affect the correct mRNA circularization and/or translation activity downstream of serine/threonine-protein kinase mTOR; ZNF708, that, like RBPs, is reported to act as interaction module with DNA, RNA, proteins, and other molecules and whose deregulation could affect gene transcription, translation, and mRNA trafficking [30].
The remaining two genes [ubiquitin specific peptidase 9 X-linked (USP9X) and WW and C2 domain containing 1 (WWC1)], harbouring damaging mutations, are described in the following subsection since they are also involved in MErT/EMT.
Mutations in MErT/EMT genes
Eight mutated genes are reported to be directly involved in MErT/EMT. Four genes [transgelin (TAGLN1), ubiquitin specific peptidase 9, X chromosome (USP9X), WW and C2 domain containing 1 (WWC1), and transducing beta like 1 X-linked receptor 1 (TBL1XR1)], harbouring missense damaging mutations, were found in the MErT group, strongly supporting the hypothesis that such mutations induced changes consistent with an epithelial-like phenotype. In particular, the two mutated genes (USP9X and WWC1) present in both biological categories were found in the DSRCT2 case, paradigm of MErT being, according to IHC, the most enriched in epithelial-related molecules [9].
Two damaging mutations were also found in the actin like 8 (ACTL8) and glutamate metabotropic receptor 7 (GRM7) genes, with each one recorded in one of the cases of the Hybrid EMT group. Interestingly, aberrant expression of ACTL8, a cancer/testis antigen gene, is reported to associate with stem cell-like enrichment and an EMT signature [31], both of which are characteristics of DSRCT. As GRM7, it has been proposed that its silencing may provide a further mechanism that regulates MErT/EMT by inhibiting TGFbeta/SNAIL via AMPK activation.
A non-damaging missense mutation was detected in dehydrodolichyl diphosphate synthase subunit (NUS1) gene in a case belonging to the EMT group. High expression of this gene is reported to associate with EMT, and its silencing with MErT. Furthermore, missense non-damaging mutations were observed in two genes, deasmoglein2 (DSG2) and calcium-responsive transcription factor (CARF), whose involvement in MErT is indirectly suggested: for DSG2, by its belonging to a cadherin cell adhesion molecule superfamily; for CARF, by its contribution to WNT signalling activation.
Regarding the non-damaging splice-site mutation in TYRO3 protein tyrosine kinase (TYRO3), the aberrant SNAIL-mediated expression of TYRO3 is reported to be associated with EMT.
CNA landscape of DSRCT
The pattern of somatic CNAs showed several regions amplified or lost in a case-specific fashion in addition to large amplifications of the long arm of chromosome 1 recurring in more than 50% of cases and complete or partial loss of chromosome 6 present in 50% of cases (Fig. 5a). All CNA events, as details of category, chromosome location, gene in the region and cytoband, are reported in Additional file 2: Table S3.
CNAs in DDR and MErT/EMT genes
Focusing on gain and loss of copy number (Additional file 2: Table S4), we found that a sizable number of CNA-affected genes were recurrent across patients and affected both DDR and MErt/EMT; details for each of the genes cited below are reported in Additional file 2: Table S2.
Among the DDR category, the following genes displayed gain/amplification (Fig. 5b, left part): TATA box-binding protein-associated factor (TAF7), amplified in two cases, and reported to be involved in very early steps of transcription; high mobility group nucleosome-binding domain 1 (HMGN1) and bromodomain and WD repeat domain containing 1 (BRWD1), playing a role in chromatin structure and remodelling. A homozygous deletion of ataxin 2 (ATXN2), codifying for a RBP, was observed; it has been reported that silencing of ATXN2 gene, which is known to affect neurodegenerative diseases, led to disturbance in RNA transcription.
Overall CNAs mostly segregated in the MErT/EMT category (Fig. 5b, right part). In particular, in DSRCT2 we found high amplification of many genes associated to squamous or terminal squamous differentiation, late cornified envelope (LCE, 18 genes) gene family, small proline-rich protein family (SPRR, 11 genes), and cysteine rich C-terminal 1 (CRCT1), to sulfur hair keratin (keratin-associated protein, KRTAP, 33 genes) family, and brain-specific cadherin-like adhesion molecules (protocadherin, PCDH, 54 genes). Many of these genes showed recurrent gains (three copies) in DSRCT5 and 6, and genes of the PCDH family were also present in DSRCT6.
ETS proto-oncogene 2, transcription factor (ETS2) amplification was present in DSRCT4 of the MErT group. It has been reported, at preclinical level, that ETS2 is a specific master factor, able to promote hybrid EMT by directly binding (and thus preventing) miRNA-200 transcription.
FOXQ1 and FOXF2 genes on chromosome 6 were present in homozygous deletion in DSRTC5 of the hybrid EMT group and in heterozygous deletion in DSRTC2, 3, and 7.
It has been demonstrated that FOXQ1 is a critical mediator of EMT, and, in our cases, the null (DSRCT5) or attenuated (DSRCT2, 3, and 7) profiles dictated by the defective status of the gene were in line with both the gain/amplification of the epithelial-related genes found here (DSRCT2, 5, and 7) and the previously reported expression of E-cadherin/miRNA-200 module in DSRCT2 and 5 [9].
In vitro FOXF2 loss acts as an EMT-suppressing transcription factor whose deficiency induces EMT through twist family bHLH transcription factor 1 (TWIST) up-regulation, a finding consistent with zinc finger E-box binding homeobox 1 (ZEB1) expression observed in DSRCT3, 5, and 7 [9].
Imbalances of chromosomes 1 and 6
A recurrent gain of the long arm of chromosome 1 was identified in four cases (DSRCT2, 3, 5, and 7) (Fig. 6a), loss of the entire chromosome 6 in two cases (DSRCT3 and 7), and loss restricted to the long arm of chromosome 6 in one case (DSRCT5) (Fig. 7a and Additional file 1: Figure S3). Given the high frequency of these CNAs across the six cases, we hypothesized that these regions could contain genes relevant to DSRCT biology, and we performed a functional analysis through IPA.
Regarding chromosome 1, 410 genes were commonly amplified in the four cases (Additional file 2: Table S5). Remarkably, the majority of amplified genes was found to be associated with “Cell movement” (77 amplified genes) and “Cell migration” (68 amplified genes, also present in Cell movement) (both P < 0.001) (Fig. 6b and Additional file 2: Table S6). Among these genes, laminin subunit gamma 2 (LAMC2), paired related homeobox 1 (PRRX1), and regulator of G protein signaling 4 (RGS4) are reported to be greatly involved in the dynamic and reversible MErT/EMT process.
Among the 526 genes mapping on chromosome 6q and lost in half of the cases (Additional file 2: Table S7), the category “cellular assembly and organization, DNA replication, recombination, and repair” most significantly enriched in lost genes was involved in the DDR network with 21 gene members of the histone H1 family and related to formation of nucleosomes (P < 0.001) (Fig. 7b).
Finally, since the short arm of chromosome 6 carries the major histocompatibility complex (Additional file 2: Table S7) which encodes HLA class I antigens, mandatory for tumor cells to be recognized by cytotoxic T cells, the two monosomic cases (DSRCT3 and 7) are expected to show characteristics of immunoescape [32]. Further strength to a deficient immune response in these two cases is given by the loss of the following genes located on short arm of chromosome 6: proteasome subunit beta 8 (PSMB8) and proteasome subunit beta 9 (PSMB9), transporter 1, ATP-binding cassette subfamily B member (TAP1) transporter 2, ATP-binding cassette subfamily B member (TAP2), and TAP-binding protein (TAPBP) associated with antigen presentation; and interferon regulatory factor 4 (IRF4, also named MUM1) recently demonstrated to be fundamental in generation of Type 1 T helper (Th1) response [33].