Early Detection of Breast Cancer: Comparative Analysis of Machine Learning and Deep Learning Algorithms
Jashanpreet Singh1, Syed Raahat2, Aditya Sharma3, Sourabh Dellu4, Saksham Arora5
1Jashanpreet Singh, Department of Computer Science and Engineering, Chandigarh University, Chandigarh (Punjab), India.
2Syed Raahat, Department of Computer Science and Engineering, Chandigarh University, Chandigarh (Punjab), India.
3Aditya Sharma, Department of Computer Science and Engineering, Chandigarh University, Chandigarh (Punjab), India.
4Sourabh Dellu, Department of Computer Science and Engineering, Chandigarh University, Chandigarh (Punjab), India.
5Saksham Arora, Department of Computer Science and Engineering, Chandigarh University, Chandigarh (Punjab), India.
Manuscript received on 07 November 2024 | First Revised Manuscript received on 14 November 2024 | Second Revised Manuscript received on 01 December 2024 | Manuscript Accepted on 15 January 2025 | Manuscript published on 30 January 2025 | PP: 1-8 | Volume-5 Issue-2, January 2025 | Retrieval Number: 100.1/ijpmh.B104705020125 | DOI: 10.54105/ijpmh.B1047.05020125
© The Authors. Published by Lattice Science Publication (LSP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Breast cancer classification remains a critical challenge in medical diagnostics because available datasets are imbalanced: the minority (malignant) class is often overshadowed by the majority (benign) class. This study proposes a hybrid model based on logistic regression, enhanced with class-balancing techniques and ant search optimization, to improve identification of the malignant class. The model is compared with Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN) classifiers across three stages: prediction before diagnosis, during diagnosis and therapy, and post-treatment outcomes. The experiments, conducted in Jupyter on the Wisconsin breast cancer dataset, show that the hybrid model achieves an accuracy of 92.98% while significantly reducing false negatives. The study highlights the strength of logistic regression in providing interpretable results, which is crucial for clinical decision-making, especially when compared with more complex models such as Artificial Neural Networks (ANN). This research offers a reliable and accurate tool for early breast cancer detection and prognosis, and contributes to ongoing efforts to improve patient outcomes by integrating hybrid machine learning models into medical diagnostics.
Keywords: Breast Cancer, Wisconsin Breast Cancer Diagnosis (WBCD) Dataset, Machine Learning, Naïve Bayes Algorithm (NB), Support Vector Machine, Random Forests, Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Logistic Regression (LR).
Scope of the Article: Public Health
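The abstract names class balancing and false-negative reduction as the core of the approach but does not say which balancing technique is used. The sketch below illustrates one common option, inverse-frequency class weighting, together with the metrics that expose false negatives on the malignant class. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the class counts (357 benign, 212 malignant) assume the diagnostic version of the Wisconsin dataset, and the ant search optimization step is not covered.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count[c]).

    Rarer classes receive larger weights, so errors on them cost more.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_malignant, weights):
    """Class-weighted cross-entropy, normalized by the total weight.

    y_true uses 1 for malignant and 0 for benign; p_malignant is the
    predicted probability of the malignant class.
    """
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_malignant):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


def malignant_metrics(y_true, y_pred):
    """Accuracy, malignant recall, and false-negative count (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_negatives": fn,
    }


# Assumed class distribution: 357 benign (0) and 212 malignant (1).
labels = [0] * 357 + [1] * 212
weights = balanced_class_weights(labels)

# Toy predictions (made up for illustration only): one missed malignant case.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_report = malignant_metrics(toy_true, toy_pred)
```

With these assumed counts the malignant class gets a weight of roughly 1.34 and the benign class roughly 0.80, so a confidently missed malignant case contributes more to the weighted loss than an equally confident false alarm. Reporting recall and the false-negative count alongside accuracy matters here because accuracy alone can look high on an imbalanced dataset even when malignant cases are missed.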