A Breast Cancer Diagnosis Based on Missing Value Imputation and REP Tree Technique
Breast Cancer Diagnosis
Keywords:
Breast cancer, missing values, REP tree
Abstract
Missing values in a dataset can degrade the performance of a classifier and make it harder to extract useful information from the data. The main objective of this work is to handle missing values with imputation methods, to discuss their usability and relevance, and thereby to make datasets more suitable for data mining analysis and classification. The paper also describes the REP Tree data mining approach, which is applied to breast cancer diagnosis after a preprocessing step that handles missing values, with the aim of improving classification (diagnosis). Experimental results showed that all of the algorithms used are effective for handling missing values in the dataset and for diagnosis. We analyzed the Wisconsin breast cancer dataset from the UCI Machine Learning Repository with the aim of developing accurate breast cancer prediction models using data mining techniques. The results of the proposed system show that data distortion can be reduced while classification accuracy remains high. DI imputation was shown to be a superior strategy for imputing missing values, with an accuracy of 93%; the system thus meets its requirements.
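To make the preprocessing step concrete, the following is a minimal sketch of one simple imputation strategy, column-mean imputation, written in plain Python. It is an illustrative stand-in and not the DI method evaluated in this paper. The function name `impute_column_means`, the `"?"` missing-value marker, and the toy records are all assumptions introduced for the example; the values are not taken from the Wisconsin dataset.

```python
from statistics import mean


def impute_column_means(rows, missing="?"):
    """Replace each missing marker with the mean of the observed values in its column.

    Assumes every row has the same number of columns and every column has
    at least one observed value (otherwise statistics.mean raises an error).
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        # Collect only the observed (non-missing) values of column j.
        observed = [float(r[j]) for r in rows if r[j] != missing]
        col_means.append(mean(observed))
    # Build completed rows: keep observed values, fill gaps with the column mean.
    return [
        [col_means[j] if r[j] == missing else float(r[j]) for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical toy records with "?" marking a missing attribute value.
records = [
    ["5", "1", "1"],
    ["5", "4", "?"],
    ["3", "1", "2"],
    ["8", "?", "3"],
]

completed = impute_column_means(records)
for row in completed:
    print(row)
```

The completed records contain no missing markers and can be passed to any classifier, such as a pruned decision tree, which is the role REP Tree plays in this work.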