Cell culture
The human breast cancer cell line MDA-MB-231 was obtained from the American Type Culture Collection (ATCC). The cell line was authenticated at ATCC before purchase by the standard short tandem repeat DNA typing and cultured in its standard medium as recommended by ATCC.
Sphere formation assay
The single-cell suspension was obtained by trypsinization. Clumped cells were excluded with a 40-μm sieve. Single cells were plated in ultra-low attachment 6-well plates (Corning, NY, USA) at a low density (1000 cells/well). The cells were maintained in serum-free Dulbecco’s modified eagle medium/nutrient mixture F12 (DMEM/F12, Gibco, Waltham, MA, USA) supplemented with B27 (Invitrogen, Waltham, MA, USA), 20 ng/mL epidermal growth factor (EGF, Sigma, Darmstadt, Germany), and 20 ng/mL basic fibroblast growth factor (bFGF, BD Biosciences, Franklin Lakes, NJ, USA) for 7–10 days. For sphere passage, the spheres were collected by centrifugation (1000 rpm, 5 min), dissociated with trypsin-ethylene diamine tetraacetic acid (EDTA), and mechanically dispersed. The resulting single cells were then centrifuged (1000 rpm, 5 min) to remove the enzyme and re-suspended in serum-free medium. The spheres were passed every 7–10 days, and only spheres bigger than 50 μm in diameter were included in the analyses.
Serial sphere formation assay
The single-cell suspension of MDA-MB-231 cells was obtained by trypsinization. Cells were seeded in an ultra-low attachment 96-well plate (1 cell/well). The cells were maintained in serum-free DMEM/F12 supplemented with B27, and 20 ng/mL EGF, 20 ng/mL bFGF. Only wells that initially contained a single cell were used for subsequent studies. For sphere passage, the single cell-derived sphere was sucked up by a micro-pipette, dissociated with a small amount of trypsin-EDTA, and mechanically dispersed. The resulting single cells were then re-suspended in serum-free medium. The spheres were passed every 7–10 days, and only spheres bigger than 50 μm in diameter were included in the analyses.
ALDEFLUOR assay by fluorescence activated cell sorting (FACS)
The ALDEFLUOR kit (STEMCELL, Vancouver, British Columbia, Canada, Cat. 01700) was used for isolating the cell population with high aldehyde dehydrogenase (ALDH) enzymatic activity. Cells were suspended in an ALDEFLUOR assay buffer containing ALDH substrate bodipy aminoacetaldehyde (BAAA, 1 mol/L per 1 × 106 cells) and incubated for 45 min at 37 °C. As negative controls, for each example of cells, an aliquot was treated with 50 mmol/L diethylaminobenzaldehyde (DEAB), a specific ALDH inhibitor. The ALDH-positive subpopulation was isolated by FACS.
Transwell invasion assay
Parental cells (MDA-MB-231) were obtained by trypsinization and resuspended in pure DMEM. Spheres bigger than 50 μm were obtained through a 40-μm cell strainer (Meilun Biotechnology, Dalian, Liaoning, China). Then the spheres were centrifuged (1000 rpm, 5 min), dissociated with trypsin-EDTA, and mechanically dispersed in pure DMEM. For every chamber, 30,000 cells were placed onto 1% matrigel (BD Biosciences)-coated membrane in the upper chamber (24-well insert, 8 μm, Corning, Cat. 3422). Medium with 10% fetal bovine serum (FBS, Gibco) was used as an attractant in the lower chamber. After being incubated for 24 h, cells that invaded through the membrane were fixed with 4% paraformaldehyde and stained with 0.1% crystal violet. The stained cell images were captured under a microscope (Olympus, Tokyo, Japan), and cells were counted for five random fields at ×10 magnification. Results are presented as mean ± standard deviation from at least three independent experiments.
RNA extraction, reverse transcription-PCR, and quantitative real-time PCR
Total RNA was extracted by using TRIzol reagent (Life Technologies, Waltham, MA, USA). cDNA was generated by reverse transcription-PCR using EasyScript One-Step gDNA Removal and cDNA Synthesis SuperMix Kit (TransGen Biotech, Beijing, China) according to the manufacturer’s instructions. Quantitative real-time polymerase chain reaction (qRT-PCR) was performed by using the chamQ Universal SYBR qPCR Master Mix (Vazyme, Najing, Jiangsu, China) in a MX3000p cycler (Stratagene, La Jolla, CA, USA). Changes of mRNA levels were detected by the 2−ΔΔCT method using Actin for internal crossing normalization. Detailed primer sequences for qRT-PCR are listed in Additional file 1: Table S1.
Western blot analysis
Samples were lysed on ice in RIPA buffer (50 mmol/L Tris [pH 8.0], 150 mmol/L sodium chloride, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate, and 1% NP-40) supplemented with protease inhibitors [1 mmol/L Na3VO4, 1 μg/mL leupeptin, and 1 mmol/L phenylmethanesulfonyl fluoride (PMSF)]. The protein concentration was detected by the Coomassie brilliant blue dye method. In all, equal amounts of protein per lane were run in 10% sodium dodecyl sulfate-polyacrylamide electrophoresis gels and subsequently transferred to a nitrocellulose membrane (Millipore, Darmstadt, Germany) via submerged transfer. After blocking the membrane with 5% milk at room temperature for 1 h, the membrane was incubated overnight at 4 °C with various primary antibodies. After incubation with peroxidase-conjugated secondary antibodies (Thermo Scientific, Waltham, MA, USA) for 1 h at room temperature, the signals were visualized using an enhanced chemiluminescence western blot detection kit (K-12045-D50; Apgbio, Beijing, China) according to the manufacturer’s instructions. The blots were developed using the Bio-Rad Molecular Imager instrument (Bio-Rad, Berkeley, CA, USA). The information regarding the antibodies used are listed as follows: mouse anti-human monoclonal ACTB antibody (Proteintech, Chicago, IL, USA, 66009-1-Ig), rabbit anti-human monoclonal NANOG antibody (Abcam, Cambridge, England, ab109250), mouse anti-human monoclonal SOX2 antibody (Santa Cruz Biotechnology, Santa Cruz, CA, USA, sc-365823).
Whole-genome sequencing and data processing in bulk cells
DNA extraction, library preparation, and sequencing
The genomic DNA of bulk cells (1 × 106 cells) was extracted using the ALLPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany, Cat. 80204) according to the manufacturer’s manual. Quantified 50 ng genomic DNA was used to prepare the paired-end library using the TruePrep DNA library Prep Kit V2 for Illumina (Vazyme, Cat. TD-501). The quality and concentration of DNA fragments in the DNA libraries generated were assessed using High-Sensitivity Bioanalyzer (Agilent, Santa Clara, CA, USA). The prepared library was then subjected to Illumina HiSeqXten Sequencer (San Diego, CA, USA) with the paired-end 150 bp read option.
Reads mapping and variants calling
The Feb. 2009 human reference sequence (GRCh37) was used in this study, and it was produced by the Genome Reference Consortium. BWA MEM (version 0.7.12) was used to align all paired-end reads to the Hg 19 reference genome with default parameters. We performed base quality score recalibration and local realignment using the Genome Analysis Toolkit (GATK, version 3.6). The duplicated reads were marked using the function “MarkDuplicates” of Picard Tools (version 1.126) and then removed. Following this, variants were called using SAMtools mpileup with the following parameters: -Q 30 –q 10. The variants [single nucleotide variant (SNV) and indel] were identified by VarScan (version 2.3.7) mpileup2cns with the following parameters: --min-coverage 2 --min-reads2 1 --variants 1 --P value 0.05 --min-var-freq 0.1.
Variant filtering
In each sample [the parental cells (2D) and derived spheres of the first generation (SP1) and fourth generation (SP4)], putative SNVs and indels were filtered with the following criteria: (1) calls falling on the mitochondria genome, Y chromosome, unknown chromosome, genomic SuperDups, and RepeatMasker regions (available on the download page at the University of California Santa Cruz website (http://www.genome.ucsc.edu/) were removed and (2) depth range from 10 to 200. Finally, we obtained 1,628,063 variant sites, including SNV and indel, existing in at least one sample.
Further variant selection for hotspot calling
We assumed that sample 2D possess the lowest proportion of BCSCs, while SP4 the highest; then, the proportion of the cell population carrying variants were increased from 2D to SP4, which was quantified by variant allele frequency (VAF). Thus, to select significantly increased sites based on VAF between 2D and SP4, we performed the Fisher’s exact test on the read counts supporting the reference and variation in each site. A total of 30,797 sites were considered significantly increased from 2D to SP4 and were selected for calling hotspots following the two conditions: P value less than 0.1 and the VAF of 2D less than the VAF of SP4.
Target deep DNA sequencing and data processing in bulk cells
Amplicon primer design
We selected 54 hotspots for the multi-PCR target validation design. ION AmpliSeq Designer (http://www.ampliseq.com) was able to successfully design amplicon primers for 97% of the targets. For hotspots with a length less than 500 bp, amplicon primers were designed to cover the whole regions, otherwise, amplicon primers were designed covering the SNV sites in the hotspots. According to this principle, we designed 128 amplicons to target these 54 hotspots (Additional file 2: Table S2).
Library preparation and sequencing for amplicons
The detailed protocol of target deep DNA sequencing was as follows:
-
A.
Multiplex PCR amplification: The 128 amplicons were amplified by multiplex PCR on Veriti 96-well Thermal Cycler (Applied Biosystems, Waltham, MA, USA), which was performed using 30 ng genomic DNA, 15 μL Primer mix/pool (2 pools in total), 10 μL Q5 reaction buffer (NEB, Ipswich, MA, USA, Cat. B9027S), 10 μL Q5 high GC enhancer (NEB, Cat. B9028A), 1.5 μL dNTPs mix (NEB, Cat. N0447S), 0.5 μL Q5 high-fidelity DNA polymerase (NEB, Cat. M0491L), and ddH2O to make the final reaction volume to 50 μL. The reaction system was incubated initially at 98 °C for 30 s. Fifteen cycles of PCR were performed at 98 °C for 10 s and 62 °C for 4 min. Then, the reaction was held at 4 °C.
-
B.
Column purification of PCR product: All PCR products were purified using DNA Purification Kit (TIANGEN, Beijing, China, Cat. DP214-03).
-
C.
End repair and A-tailing of DNA fragments: The mixture of 37.5 μL DNA, 5 μL Cut Smart (NEB, Cat. B7204S), 5 μL Adenosine 5′-Triphosphate (NEB, Cat. P0756L), 0.5 μL of 100 mmol/L dATP solution (NEB, Cat. N0440S), 1 μL T4 Polynucleotide Kinase (NEB, Cat. M0201L) and 1 μL 5 units Klenow exo-DNA polymerase (NEB, Cat. M0212L) was incubated at 37 °C for 1 h on Veriti 96-well Thermal Cycler.
-
D.
Column purification: The PCR products were purified with Universal DNA Purification Kit (TIANGEN, Cat. DP214-03).
-
E.
Adapter ligation: The mixture of 25 μL A-tailed DNA, 1 μL of 50 μmol/L multiplexing adapter, 3 μL of 10× T4 DNA ligase buffer (NEB, Cat. B0202), and 1 μL of 400 units/μL T4 DNA ligase (NEB, Cat. M0202L) was used in adapter ligation step followed by incubation at 16 °C overnight.
-
F.
Ampure cleanup of adapter-ligated reaction: We added 1 × volume (30 μL) of Agencourt AMPure XP DNA beads (BECKMAN, Brea, CA, USA, Cat. 15604000) and incubated at room temperature for 5–10 min and then placed on magnetic stand. We discarded the supernatant which contained primer dimers. Beads were washed twice with 200 μL of 80% ethanol for 30 s at room temperature and dried at room temperature for 5–10 min. Then, 25 μL Buffer EB was added to the beads, mixed up and down for ten times, incubated for 2 min at room temperature, and put on magnetic stand at room temperature for about 5 min. After that, 22 μL of supernatant was transferred to a new PCR tube.
-
G.
PCR amplification: PCR enrichment was conducted by using 22 μL Adapter-ligated DNA, 25 μL of 2 × NEB Next high-fidelity PCR master buffer (NEB, Cat. M0541L), 1.5 μL of 10 μmol/L MUP primer, 1.5 μL of 10 μmol/L barcode primer. The reaction was incubated initially at 98 °C for 3 min. Fifteen cycles of PCR were performed at 98 °C for 20 s, 65 °C for 15 s, and 72 °C for 20 s. The reaction was then held at 4 °C.
-
H.
Extraction and purification of the final library: electrophoresis was conducted in agarose gel with the PCR products from the last step and then extracting and purifying DNA from agarose gel using the gel extraction kit (TIANGEN, Cat. DP214-03).
-
I.
Quality control and sequencing of the final library: The quality and concentration of DNA fragments in the DNA libraries generated were assessed using High-Sensitivity Bioanalyzer. The prepared library was then subjected to Illumina HiSeqXten with the paired-end 150 bp read option.
Data processing
The sequence of primer regions was first trimmed off from the fastq data by PrimerTrim. (available at http://github.com/DMU-lilab). BWA MEM (version 0.7.12) was then used to align primer-removed reads to the Hg 19 reference genome with default parameters. We removed secondary alignments and alternative hits by SAMtools with the following parameters: x XA –F 0x100. Then, variants were called using SAMtools mpileup with the following parameters: -Q 30 -q 10. The variants (SNV and indel) were identified by VarScan (version 2.3.7) mpileup2cns with the following parameters: --min-coverage 60 --min-reads2 1 --variants 1 --P value 0.05 --min-var-freq 0.1.
Single-cell target deep DNA sequencing
Characterization of the cancer hotspot mutation (CHM) panel and amplicon design
The targeted genes and mutations are listed in Additional file 3: Table S3, including the whole exonic region of 48 cancer hotspot genes (from 50 cancer hotspot genes identified by the Mayo Clinic) and 1513 mutations. Of the 1513 mutations, 224 were identified from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) data, and 1286 were identified from the Catalogue Of Somatic Mutations In Cancer (COSMIC, https://cancer.sanger.ac.uk/cosmic/) data, and 3 were identified from both databases. ION AmpliSeq Designer (Thermo Fisher, http://www.ampliseq.com) was used to design 128 amplicons covering the WGS hotspot mutation (WHM) panel (Additional file 2: Table S2) and 3124 amplicons covering the hotspots in the CHM panel (Additional file 4: Table S4).
Single-cell isolation, genomic DNA extraction, and multiple displacement amplification (MDA)
Single cells or single spheres were sucked up by a micro-pipette. Whole genome amplification (WGA) was performed to these cells using the Discover-sc Single Cell Kit (Vazyme, Cat. N601-01) according to the manufacturer’s manual, and a reaction of human tissue genomic DNA was marked as a positive control. The amplified DNA products were then stored at − 20 °C.
Quantification and genome-integrity assessment of WGA products
The DNA concentration of the WGA products was measured using the Qubit Quantitation platform (Life Technologies, Invitrogen, Waltham, MA, USA). Ten housekeeping genes located on different chromosomes were selected to check the coverage of amplified products. WGA products of best performance in relation to housekeeping PCR (> 8/10) and Qubit assays (> 60 ng/μL) were selected for downstream experiments. All of the above steps were performed with a sample of genomic DNA from human tissue as a positive control.
Library preparation and sequencing for amplicons
Target regions were amplified by multiplex PCR in WGA products. Choosing the correct number of cycles for the multiplex PCR is critical based on the starting amount and coverage of WGA products. The quality and concentration of the DNA libraries generated was assessed using High-Sensitivity Bioanalyzer. The prepared target libraries were then subjected to Illumina HiseqXten with the paired-end 150 bp read option.
Single-cell RNA sequencing (scRNA-seq)
Generation of scRNA-seq libraries
The generation of single-cell cDNA libraries was implemented by the Discover-sc WTA Kit (Vazyme, Cat. N711-01) according to the manufacturer’s manual for single cell-derived spheres and single cells. Quantified 1 ng amplified cDNA was then used to prepare the paired-end library using TruePrep DNA library Prep Kit V2 for Illumina (Vazyme, Cat. TD-503). The quality and concentration of DNA fragments in the cDNA libraries generated was assessed using High-Sensitivity Bioanalyzer. Massively parallel RNA sequencing (RNA-seq) was performed on the Illumina HiSeqXten platform with paired-end 150-bp read-length by Berry Genomic Corporation (Beijing, China).
Data analysis
TopHat2 was used to align reads according to the University of California Santa Cruz hg19 reference genome, and the corresponding gene annotation format file from GENCODE was fed to the TopHat2 for defining transcript coordinates. Gene-level expression abundance (fragments per kilobase of exon per million fragments mapped) and the results of differential gene expression analysis were obtained from the Cufflinks package. By the comparison between BCSCs and non-BCSCs, we identified the differentially expressed gene set according to the fold change (FC) and false discovery rate (FDR). Genes with FDR < 0.05 and log2-transformed FC > 1 were considered to be highly expressed in BCSCs.
Gene set enrichment analysis
The gene expression dataset containing gene symbols and gene expression values of 8 single-cell RNA-seq samples were submitted to gene set enrichment analysis (GSEA) (version v2.2.1) software [14] according to the GSEA user guide. GSEA was performed with the gene set of MSigDB: Gene Ontology Biological Process. The nominal P value < 0.01 and FDR < 0.25 were used to investigate significantly enriched gene sets.
Kaplan–Meier (KM) survival analysis
To investigate the association between BCSC highly expressed genes and patient survival, we evaluated the relapse-free survival after surgery in all patients available in the Kaplan–Meier plotter online database [15] (http://kmplot.com/analysis/index.php?p=background). A user-selected probe set was chosen, and patients were grouped according to the optimized cut-off.
Interaction analysis in the STRING database
A newly identified BCSC marker gene set was submitted to the STRING database [16] (https://string-db.org/cgi) to identify associated genes and pathways. Interactions including curated databases and experimentally determined gene neighborhoods, gene fusions, gene co-occurrence, text mining, co-expression, and protein homology were investigated.
Survival analysis in pan-cancer
The table of survival z scores collapsed by cancer/cancer subtype was downloaded from the PREdiction of Clinical Outcomes from Genomic Profiles (PRECOG) database (https://precog.stanford.edu/index.php) [17]. For the z scores of the BCSC marker gene set in pan-cancer, we made a hierarchical clustering analysis in R (version 3.3.2) using the hclust function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html).
Biomarker validation by SurvExpress
Nine breast cancer relapse datasets were analyzed for the BCSC marker gene set in the SurvExpress online database [18] (http://bioinformatica.mty.itesm.mx:8080/Biomatec/SurvivaX.jsp). A Cox regression model was used to generate 2 risk groups by splitting the samples at the median after ranking by their prognostic index, which were estimated using beta coefficients multiplied by gene expression values. The box plot obtained as the results of SurvExpress visualized the expression levels of each gene in the risk groups generated. The P value was obtained from a t test for two groups.
Analysis of differential gene expression between cancer and normal tissues from the TCGA dataset
Gene expression data of 38 cancer types were downloaded from FireBrowse (http://firebrowse.org). To ensure sufficient statistical power, the number of either normal or cancer samples was at least 5, and 22 of 38 cancer types meet the requirement and were used for the following analysis. Differential gene expression between normal and cancer samples were evaluated by t test.
Statistical analyses
Hotspot calling
After potential BCSC-associated SNV sites were identified by WGS in bulk cells, a statistical model was established to call hotspot regions where variants (30,797 variants in all) were densely distributed. When the mutations in a given length of DNA were considered as Poisson distributed, and the distance between two adjacent mutations followed an exponential distribution (Additional file 7: Fig. S1a). Then, the probability of a given distance could be calculated as follows:
$$P\left( {\text{x}} \right) = 1 - e^{{ -\uplambda{\text{x}}}} \left( {\uplambda = \frac{1}{{\bar{x}}}, {\text{x}} > 0} \right)$$
where x refers to the distance between two adjacent SNV sites and \(\bar{x}\) refers to the average distance of all two adjacent SNV sites in the genome.
Thereafter, hotspot regions were detected using Run-length encoding (RLE) algorithm (Additional file 7: Fig. S1b). Hotspots with length longer than 100,000 bp were removed, and 54 hotspots were obtained (Additional file 5: Table S5). Specifically, the median length of the hotspots was 318 bp, and most hotspots overlapped with intronic and intergenic regions (Additional file 7: Fig. S1c and S1d). Subsequently, we calculated the P value evaluating the significance of the hotspot regions as follows:
$$P\left( {X \ge m} \right) = 1 - P\left( {X < m} \right) = 1 - \mathop \sum \limits_{i = 0}^{m - 1} \frac{{\left( {\begin{array}{*{20}c} b \\ i \\ \end{array} } \right) \times \left( {\begin{array}{*{20}c} {a - b} \\ {n + m - i} \\ \end{array} } \right)}}{{\left( {\begin{array}{*{20}c} a \\ {n + m} \\ \end{array} } \right)}}$$
where a stands for the total number of distances of two adjacent SNV sites, b stands for the number of distances whose exponential distribution P value (Pexp) < 0.01 (in our case a = 30,774, b = 4003), n stands for the number of distances allowed within a hotspot whose Pexp ≥ 0.01, and m stands for the number of distances whose Pexp < 0.01. X is the random variable representing the number of distances with Pexp < 0.01 in a region. Under the minimum requirement (m = 5, n = 1) of our hotspot-calling algorithm, P was equal to 1.988 × 10−4.
Computation of the genetic distance between every two samples (single cells)
At a given position, the genetic distance between C1 and C2 was exemplified as follows. We defined C1 = (C1WA, C1WT, C1WG, C1WC) and C2 = (C2WA, C2WT, C2WG, C2WC), where C1WA refers to the weight (i.e., proportion) of read counts supporting base “A” of sample C1, and C2WT refers to the weight of read counts supporting base “T” of sample C2, as an analogy). Then, the genetic distance (d) between C1 and C2 was calculated by the Pythagorean formula:
$$d\left( {C1,C2} \right) = \sqrt {\left( {\mathop \sum \limits_{n = A,T,G,C} \left( {C_{1} W_{n} - C_{2} W_{n} } \right)} \right)^{2} }$$
Analysis of base-position differences between BCSCs and non-BCSCs (single cells)
-
Step 1. Acquiring for binary alignment (BAM) files.
BAM files were generated using the pipeline identical to the target deep DNA sequencing of bulk cells, as described under “Data processing” in the subsection “Target deep DNA sequencing and data processing in bulk cells”.
-
Step 2. Count data.
Nucleotides were counted from recalibrated BAM files using Rsamtools (http://bioconductor.org/packages/release/bioc/html/Rsamtools.html). Only positions in the target regions and covered by all samples were kept for further analysis. Here, the following was denoted for the considered position:
Total count (TC) = sum of A, T, C, and G counts
Major count (MC) = the highest nucleotide count
Background count (BC) = TC − MC
Subscript c = BCSCs
Subscript n = non-BCSCs
-
Step 3. Position error rate [19] (PER) of non-BCSCs (the two single cells).
At each position, we estimated the PER of non-BCSCs as follows:
$$PER_{n} = \frac{{\sum BC_{n} }}{{\sum TC_{n} }}$$
-
Step 4. Binomial analysis of BCSCs (the three spheres).
At each base position, we calculated the probability of BCc (PBC) from a binomial distribution with the parameter PER (corrected). PBC represents the statistical probability to observe the specific number of background allele at a position. PER was obtained as in Step 3 from the two non-BCSC single cells. The following is the PBC calculation formula:
$$P = {\text{binom}} . {\text{test}} \left( {\frac{{\sum BC_{c} }}{3},\frac{{\sum TC_{c} }}{3},PER_{n} + c\sqrt {\frac{{PER_{n} }}{{TC_{n} }}} ,{\text{alternative}} = {\text{"greater"}} }\right)$$
where c represents the quantile of order 1-alpha of the standard Gaussian, with c equaling to 1.64 (quantile of order 95% for the Gaussian) Binomial test was the R function to obtain an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment (https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/binom.test).
-
Step 5. Permutation.
The case group denoted the group of 3 BCSCs versus 2 non-BCSCs, and the other 9 (i.e., C
25
− 1) random arrangements were defined as permutation groups (Additional file 6: Table S6). The general approach for calculating the P values of permutation groups was the same as in Step 4.
-
Step 6. Case-permutation ratio (CPR).
After the computation of the P value of each position in each group, we counted the number of positions (NP) with P value less than the specific P value threshold (ranging from 0 to 0.1) in each group. To assess the difference in NP between the case group and each permutation group, we defined CPR as follows:
$${\text{CPR}} = \frac{{NP_{case} }}{{NP_{permutation} + NP_{case} }}$$
Pearson correlation analysis, Fisher’s exact test, and Student’s t test
Pearson correlation analysis and Fisher’s exact test were performed in R-2.3.2 using “cor()” and “fisher.test()” command, respectively. Student’s t test was performed in GraphPad Prism 5.0 (GraphPad Software, La Jolla, CA, USA).