### Patients

This is a multicentre population study using data from a high-quality tumour sample registry included in the European cancer registry-based study on survival and care of cancer patients (EUROCARE study) [1]. The registry data used covered the period from January 2004 to December 2007. We included patients with colon cancer who were treated with curative-intent surgery and lymphadenectomy and who had a complete anatomopathological report and a clear clinical status at their last follow-up. We excluded patients with cancer of the rectum or caecal appendix, metastases at diagnosis, scheduled surgery with palliative intention without lymphadenectomy, scheduled surgery without resection, incomplete anatomopathological reports, an uncertain vital status at the last follow-up visit, or insufficient or no monitoring. The study was approved by the institutional review board of the Hospital General de Castellon (PIC: 2013/2/CIR). All participating patients provided written informed consent.

### Variables

The study variables were age, sex, tumour location, histology, differentiation grade, tumour size, number of analysed LNs, number of positive LNs, TNM classification, condensed T and N stages, chemotherapy, FEP, OS, disease-free survival (DFS), overall recurrence, locoregional recurrence, metastasis, and follow-up time.

Because all the data in the tumour registry are coded according to the sixth edition of the Union for International Cancer Control (UICC) TNM classification, we had to adapt them to the guidelines of the seventh edition. The N category was easily adapted, but the T category could not be adapted to the new classification because the tumour registry contained insufficient data. As in other population studies, we used condensed TNM stages to minimise the effects of possible misclassifications. The recurrence variable included patients who presented locoregional recurrence and those who presented distant metastases.

### FEP

We used a well-known mathematical model based on Bayes’ theorem to calculate the various diagnostic test parameters (sensitivity, specificity, and predictive values). The FEP is the probability of LN involvement (N+) given a negative test result (n−), in other words, p(N+|n−). According to Bayes’ theorem, it can be deduced from the following formula [8]:

$$p(N+ \mid n-) = \frac{p(N+)\,p(n- \mid N+)}{p(N+)\,p(n- \mid N+) + p(N-)\,p(n- \mid N-)}$$

In the Bayes’ theorem formula, p(N+) is the prevalence of pN1 cases in the series and p(N−) is its complement, 1 − p(N+). The term p(n−|N+) is the probability of a false negative (1 − sensitivity) and is calculated as the hypergeometric probability defined by (1) the total number of LNs analysed from all the patients in the series; (2) the total number of positive LNs obtained in the series; (3) the number of positive LNs in a specific patient (equal to 0 for pN0 cases); and (4) the number of LNs analysed in a specific patient. The term p(n−|N−) is the specificity and equals 1 because false-positive LNs are considered impossible.
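The calculation described above can be illustrated with a minimal Python sketch. It assumes the false-negative probability is the hypergeometric probability of finding zero positive LNs among those analysed in a given patient, and it fixes the specificity at 1 by default. The function names and every number in the usage lines are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
from math import comb


def false_negative_probability(total_lns, positive_lns, lns_examined, positives_found=0):
    """Hypergeometric probability of finding `positives_found` positive LNs
    when `lns_examined` LNs are drawn from a pool of `total_lns` LNs,
    of which `positive_lns` are positive."""
    return (
        comb(positive_lns, positives_found)
        * comb(total_lns - positive_lns, lns_examined - positives_found)
        / comb(total_lns, lns_examined)
    )


def fep(prevalence, total_lns, positive_lns, lns_examined, specificity=1.0):
    """Bayes' theorem: p(N+|n-) for a patient with `lns_examined` negative LNs.

    `prevalence` is p(N+); `specificity` is p(n-|N-), assumed to be 1.
    """
    p_false_negative = false_negative_probability(total_lns, positive_lns, lns_examined)
    numerator = prevalence * p_false_negative
    return numerator / (numerator + (1.0 - prevalence) * specificity)


# Illustrative values only (not from the registry):
# 30% prevalence, 1000 LNs analysed in the series, 100 of them positive.
fep_5_nodes = fep(0.30, 1000, 100, 5)
fep_12_nodes = fep(0.30, 1000, 100, 12)
```

Under these assumptions the FEP falls as more LNs are analysed, and with no LNs analysed it reduces to the prevalence p(N+).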

Given that there is a substantially greater probability of patients misclassified as pN0 (because they had an insufficient number of LNs analysed) having pN1 rather than pN2 or pN3 tumours, we decided to calculate the FEP of pN1 incorrectly being classified as pN0. Thus, all the FEP calculations refer to this adjusted FEP, set to pN1. Once the FEP was obtained, Cumulative Sum (CUSUM) [9] curves were used to calculate the optimal cut-off points following the method described by Barrio et al. [10], to obtain three incorrect pN0 classification risk groups.

### Statistical analysis

Quantitative variables are expressed as the mean ± standard deviation (SD). Categorical variables are reported as frequencies and percentages. For the univariate analysis, the chi-square test (or Fisher's exact test in small samples) was used to compare two qualitative samples; Student's t-test was used to compare two quantitative samples; and ANOVA was used to compare more than two quantitative samples.

Follow-up time was measured from the date of surgery until the date of death or, in patients who did not die, the last day of follow-up. The date of surgery was used because the tumour registry did not contain any clear definition of the date of diagnosis. Survival analysis was performed using the Kaplan–Meier method, and the log-rank test was used to compare OS and DFS between groups. Values of *P* < 0.05 were considered statistically significant. Statistical analysis was carried out with IBM SPSS Statistics® version 22 (IBM®, Armonk, New York, USA). The CUSUM curves were calculated using STATA® version 14 (StataCorp LP®, College Station, Texas, USA).
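As an illustration of the Kaplan–Meier method named above, the product-limit estimator can be sketched in a few lines of Python. This is not the SPSS procedure used in the study; the function name and the toy data are hypothetical. Times are assumed to be measured from surgery, with an event flag of 1 for death (or recurrence, for DFS) and 0 for censoring at last follow-up.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `times` are follow-up times; `events` are 1 (event observed) or 0 (censored).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Toy data: follow-up in months; the second patient is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the choice of the time origin (here, the date of surgery) matters.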