Boosting Cancer Dataset Performance with Mutual Information-Based Feature Prioritization

Fung Yuen Chin
Yong Kheng Goh

Abstract

Mutual information is a central and widely used concept in statistical modelling, suited to tasks such as selecting the most important features and classifying data into categories. Feature selection addresses the challenge of building effective predictive models from high-dimensional data by identifying relevant attributes and mitigating the curse of dimensionality. Previous studies have benchmarked the effectiveness of statistical models against established results. To improve on this, a new benchmark method is proposed that ranks features by their mutual information scores. The mutual information score is used to understand the relationship between the underlying data and the variables. Classification performance depends on the information content of the selected features, which directly affects the performance of the statistical model. The technique simultaneously determines the optimal number of features, which guides the feature selection process. The selected features are validated using Z-score graphs. Experimental results show that the method can identify feature subsets that perform better than the full feature set. This advance promises to improve cancer analysis, enabling more sophisticated diagnostic and prognostic methods.
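
As a rough illustration of ranking features by mutual information, the sketch below estimates the score between each discretized feature and the class label, then sorts the features by that score. It is not the authors' implementation: the function names, the equal-width binning, the plug-in estimator, and the toy data are all assumptions made for this example. Choosing the optimal number of features and the Z-score validation described in the abstract are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


def equal_width_bins(values, n_bins=3):
    """Discretize a numeric column into n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_features(columns, labels, n_bins=3):
    """Return (feature name, MI score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(equal_width_bins(col, n_bins), labels)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one feature separates the classes, one does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "noise": [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice, one would likely take the top-ranked features in order and evaluate a classifier on subsets of increasing size to decide how many to keep; that step is left out here because the abstract does not specify how it is done.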

Article Details

How to Cite
Chin, F. Y., & Goh, Y. K. (2024). Boosting Cancer Dataset Performance with Mutual Information-Based Feature Prioritization. Journal of Statistical Modeling & Analytics (JOSMA), 6(1). https://doi.org/10.22452/josma.vol6no1.2
Section
Articles