Machine learning-based meta-analysis of colorectal cancer and inflammatory bowel disease
Abstract
Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Despite extensive research efforts, the mechanism of CRC remains poorly understood, and the genetic biomarkers discovered thus far have not provided adequate insight into the dynamics of CRC. One reason might be that most analysis methods are univariate and do not investigate the combinations of genes that lead to disease. To fill this gap, we employ SVFS (Singular-Vectors Feature Selection), as well as several other machine learning algorithms, to identify genes associated with CRC. To validate our findings, we developed an ensemble classifier model that uses the identified genes to distinguish CRC tumour samples from adjacent normal samples. We validated our findings on 13 independent datasets and achieved significant results on all of them (correctly diagnosing 1755 cases out of 1807 and 115 controls out of 119). Several genes identified by our methodology have previously been reported to be associated with CRC, while others are novel and should be further researched. Furthermore, the same pipeline was applied to inflammatory bowel disease (IBD), since patients with IBD are at substantial risk of developing CRC. Following significant results on IBD validation sets using the identified genes (correctly classifying 212 IBD cases out of 231 and 51 healthy controls out of 54), we examined IBD-related genes in conjunction with CRC-related genes to gain better insight into suspected genes. A Python implementation of our pipeline can be accessed publicly at https://github.com/AriaSar/CRCIBD-ML.
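
The validation counts quoted in the abstract can be restated as sensitivity, specificity, and overall accuracy. The following is a minimal sketch of that arithmetic only; the function name `rates` and the dictionary layout are illustrative assumptions and are not taken from the linked repository.

```python
# Restates the validation counts from the abstract as standard rates.
# `rates` is a hypothetical helper, not part of the authors' pipeline.

def rates(true_pos, total_cases, true_neg, total_controls):
    """Return sensitivity, specificity and accuracy from raw counts."""
    return {
        # Fraction of disease cases that were correctly identified
        "sensitivity": true_pos / total_cases,
        # Fraction of controls that were correctly identified
        "specificity": true_neg / total_controls,
        # Fraction of all samples that were correctly classified
        "accuracy": (true_pos + true_neg) / (total_cases + total_controls),
    }

# Counts as reported in the abstract
crc = rates(1755, 1807, 115, 119)  # CRC cases vs. controls
ibd = rates(212, 231, 51, 54)      # IBD cases vs. healthy controls

for name, result in (("CRC", crc), ("IBD", ibd)):
    summary = ", ".join(f"{k}={v:.1%}" for k, v in result.items())
    print(f"{name}: {summary}")
```

Pooled over the validation sets, these counts correspond to roughly 97.1% sensitivity and 96.6% specificity for CRC, and roughly 91.8% sensitivity and 94.4% specificity for IBD. Per-dataset figures cannot be recovered from the abstract alone.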
