COMPUTATIONAL REDISCOVERY OF PROSTATE-SPECIFIC ANTIGEN AS A PAN-CANCER TRANSCRIPTOMIC DISCRIMINATOR: A VARIANCE-BASED ANALYTICAL FRAMEWORK
Abstract
Background: RNA sequencing studies from The Cancer Genome Atlas (TCGA) have shown that tumor gene expression is largely determined by tissue of origin. Most computational studies, however, prioritize classification accuracy over interpretability, and statistical validation of the genes that drive inter-cancer variability remains limited. Although KLK3 is a well-established tissue-specific biomarker, it has not been formally shown, within a hypothesis-driven framework with verified statistical assumptions, to be the most variable and discriminative gene across cancer types.
Methods: Gene expression data for 801 tumor samples (PRAD, n = 136; BRCA, n = 300; LUAD, n = 141; KIRC, n = 146; COAD, n = 78) were obtained from UCSC Xena. Variance was computed for each of 20,531 genes without reference to cancer-type labels. Shapiro-Wilk and Levene tests confirmed violations of normality and homogeneity of variance, so the Kruskal-Wallis H test was used as the primary omnibus test. Secondary analyses comprised Dunn's post-hoc test for pairwise comparisons and Welch's t-tests for directional comparisons of PRAD against each other cancer type, both with Bonferroni correction. Sex-linked genes (RPS4Y1, XIST) were excluded.
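The Methods steps (label-free variance ranking, then a rank-based omnibus test with Bonferroni-adjusted pairwise comparisons) can be illustrated with a short, standard-library-only Python sketch. It is not the authors' code and rests on assumptions: the expression data are taken to be a hypothetical dict `expr` mapping gene symbols to per-sample values, variance is the sample (n - 1) variance pooled over all tumors, and the Kruskal-Wallis and Dunn statistics use the textbook tie-corrected formulas. Real analyses would more typically call `scipy.stats.kruskal`, with `scipy.stats.shapiro` and `scipy.stats.levene` for the assumption checks.

```python
import math
from collections import Counter
from statistics import variance

# Sex-linked genes the abstract reports excluding (RPS4Y1, XIST).
SEX_LINKED = {"RPS4Y1", "XIST"}


def top_variable_genes(expr, exclude=SEX_LINKED, top=3):
    """Rank genes by sample variance pooled over all tumors; no cancer labels are used.

    expr: hypothetical dict mapping gene symbol -> list of per-sample expression values.
    """
    scores = {g: variance(v) for g, v in expr.items() if g not in exclude}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of (t^3 - t) over each group of t tied values."""
    return sum(t ** 3 - t for t in Counter(values).values())


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and the mean rank of each group."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, total, start = [], 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        mean_ranks.append(sum(r) / len(g))
        total += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_sum(pooled) / (n ** 3 - n)), mean_ranks


def dunn_bonferroni(groups):
    """Two-sided Dunn z-tests for every pair of groups, Bonferroni-adjusted and capped at 1."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    _, mean_ranks = kruskal_wallis(groups)
    # Dunn's rank variance, corrected for ties.
    sigma2 = n * (n + 1) / 12 - tie_sum(pooled) / (12 * (n - 1))
    pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
    adjusted = {}
    for i, j in pairs:
        se = math.sqrt(sigma2 * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
        adjusted[(i, j)] = min(1.0, p * len(pairs))
    return adjusted


# Toy usage with three tie-free groups (not TCGA data).
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, _ = kruskal_wallis(toy)
print(round(h, 4))  # 7.2
```

Because five cancer types give 10 pairwise comparisons, each raw Dunn p-value is multiplied by 10 and capped at 1, which is consistent with an adjusted p-value being reported as exactly 1.000 for KIRC versus LUAD.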
Results: KLK3 was the most variable gene (variance = 44.76), followed by KLK2 (36.36) and SFTPB (34.50). The Kruskal-Wallis test showed highly significant differential expression of KLK3 across the five cancer types (H = 495.12, p = 7.62 × 10⁻¹⁰⁶, η² = 0.617). Welch's t-tests confirmed KLK3 overexpression in PRAD relative to every comparator (adjusted p < 0.001; Cohen's d = 9.59 to 22.01). Dunn's post-hoc test was significant for 9 of the 10 pairwise comparisons; the only non-significant comparison was KIRC versus LUAD (adjusted p = 1.000).
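The summary statistics in the Results can be cross-checked with a second small sketch. It assumes the common rank-based effect size for Kruskal-Wallis, η² = (H - k + 1)/(n - k), a χ² reference distribution with k - 1 = 4 degrees of freedom for the p-value, and the pooled-standard-deviation form of Cohen's d; the abstract does not say which variants were used. With the reported H = 495.12, n = 801 and k = 5, these assumptions give η² ≈ 0.617 and p ≈ 7.6 × 10⁻¹⁰⁶, in line with the reported values.

```python
import math
from statistics import mean, variance


def kw_eta_squared(h, n, k):
    """Rank-based eta squared for Kruskal-Wallis: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled)


# Reported values from the Results: H = 495.12 over n = 801 samples in k = 5 groups.
H, N, K = 495.12, 801, 5
print(round(kw_eta_squared(H, N, K), 3))  # 0.617
print(f"{chi2_sf_even_df(H, K - 1):.2e}")  # close to the reported 7.62e-106 (H is rounded)
```

The Welch helper returns only the statistic and degrees of freedom; the p-value requires the t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.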
Conclusion: This study provides the first assumption-verified computational validation that KLK3 is the most variable and discriminative gene across a five-cancer panel. Because KLK3 was recovered by a fully unsupervised, variance-based approach, the findings support interpretable, data-driven frameworks for pan-cancer biomarker discovery and tissue-of-origin classification.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.