INTERPRETABLE DIMENSIONALITY REDUCTION USING PCA AND DEEP UNDERCOMPLETE AUTOENCODERS FOR BIOMARKER DISCOVERY IN HEAD AND NECK SQUAMOUS CELL CARCINOMA
Abstract
Background:
Head and neck squamous cell carcinoma (HNSC) is one of the most common cancers worldwide, and gene expression analysis can help identify the key genes involved in its pathogenesis. However, the high dimensionality and complexity of high-throughput cancer datasets make it difficult to isolate the most important genes. In this study, we applied principal component analysis (PCA) as a baseline dimensionality reduction method and compared its results with those of our deep under-complete autoencoder (DUAE) model. We then applied clustering analysis to the reduced expression data produced by PCA and DUAE to find common trends and identify the key genes involved in HNSC.
Method:
This study used the head and neck cancer gene expression dataset (TCGA-HNSC) from The Cancer Genome Atlas (TCGA). We first applied principal component analysis and then built a deep under-complete autoencoder (DUAE) to reduce the number of genes. Next, we applied clustering analysis to identify groups of genes with similar expression patterns. Finally, we examined the reduced gene sets identified through PCA and DUAE using gene ontology analysis.
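The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a small synthetic expression matrix in place of TCGA-HNSC, a single tanh bottleneck layer rather than a deep network, and k-means as the clustering step (the abstract does not name the clustering algorithm). All function names, layer sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # (n_components, n_genes)
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return Xc @ components.T, components, explained

def train_autoencoder(X, n_hidden, epochs=300, lr=0.01, seed=0):
    """Under-complete autoencoder (one tanh bottleneck, linear decoder),
    trained with full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # bottleneck codes
        err = (H @ W2 + b2) - X                    # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        gR = 2.0 * err / err.size                  # dLoss/dReconstruction
        gZ = (gR @ W2.T) * (1.0 - H ** 2)          # backprop through tanh
        W2 -= lr * (H.T @ gR); b2 -= lr * gR.sum(axis=0)
        W1 -= lr * (X.T @ gZ); b1 -= lr * gZ.sum(axis=0)
    codes = np.tanh(X @ W1 + b1)
    return codes, losses

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; an emptied cluster keeps its previous center."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Synthetic stand-in data (NOT TCGA-HNSC): 60 samples x 30 genes, 3 latent factors.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 30))
X += 0.1 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each gene

scores, components, explained = pca_reduce(X, n_components=5)
codes, losses = train_autoencoder(X, n_hidden=5)
gene_labels = kmeans(X.T, k=3)                     # cluster genes by expression profile
```

Here `scores` and `codes` are the five-dimensional sample representations from PCA and the autoencoder, and `gene_labels` assigns each gene to one of three clusters. How genes are ranked or discarded from either representation is a separate step that the abstract does not specify, so it is left out of this sketch.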
Result:
DUAE reduced the number of genes from 20,503 to 17,932 (a reduction of about 12.5%), whereas PCA reduced it only from 20,503 to 20,236 (about 1.3%). Clustering and gene ontology analysis showed that the retained genes were involved in crucial cellular processes, such as signal transduction and gene regulation pathways, that may be important for HNSC oncogenesis.
Conclusion:
Our study highlights the potential of DUAE as a dimensionality reduction technique that, coupled with gene expression analysis, can be used to identify the genes most strongly associated with HNSC oncogenesis. These findings provide a stepping stone for future investigations aimed at uncovering the underlying molecular mechanisms of this disease.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.