Research Area:  Machine Learning
Objective: Ovarian cancer (OC) is one of the most common types of cancer in women. Accurate prediction of benign ovarian tumors (BOT) and OC has important practical value. Methods: Our dataset consists of 349 Chinese patients with 49 variables, including demographics, routine blood tests, general chemistry, and tumor markers. The Minimum Redundancy – Maximum Relevance (MRMR) feature selection method was applied to data from 235 patients (89 BOT and 146 OC) to select the most relevant features, from which a simple decision tree model was constructed. The model was tested on the remaining 114 patients (89 BOT and 25 OC). The results were compared with predictions from the risk of ovarian malignancy algorithm (ROMA) and a logistic regression model. Results: Eight notable features were selected by MRMR, two of which were identified as the top features by the decision tree model: human epididymis protein 4 (HE4) and carcinoembryonic antigen (CEA). In particular, CEA is a valuable marker for OC prediction in patients with low HE4. The model also yielded better predictions than ROMA. Conclusion: Machine learning approaches were able to accurately classify BOT and OC. Our goal was to derive a simple predictive model that also performs well. Using our approach, we obtained a model that consists of just two biomarkers, HE4 and CEA. The model is simple to interpret and outperforms the existing OC prediction methods. It demonstrates that the machine learning approach has good potential in predictive modeling for complex diseases.
Keywords:  
Author(s) Name:  Mingyang Lu, Zhenjiang Fan, Bin Xu, Lujun Chen, Xiao Zheng, Jundong Li, Taieb Znati, Qi Mi, Jingting Jiang
Journal name:  International Journal of Medical Informatics
Conference name:  
Publisher name:  Elsevier
DOI:  10.1016/j.ijmedinf.2020.104195
Volume Information:  Volume 141, September 2020, 104195
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S1386505620302781
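
Illustrative sketch:  The abstract describes MRMR feature selection followed by a simple decision tree. The selection step can be sketched roughly as below. This is not the authors' code: the greedy "relevance minus mean redundancy" criterion is one common MRMR variant and is assumed here, since the abstract does not say which variant was used. The feature names (marker_a, marker_a_dup, marker_b) and all values are synthetic and hypothetical, and real laboratory values would need to be discretized before mutual information can be computed this way.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy MRMR (difference form, an assumed variant): at each step pick the
    feature with the highest relevance I(f; label) minus its mean redundancy
    I(f; s) over the already selected features s."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    candidates = sorted(features)  # sorted so ties break deterministically

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Synthetic toy data (hypothetical, NOT from the paper): 0 = BOT, 1 = OC.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "marker_a":     [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "marker_a_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy: fully redundant
    "marker_b":     [0, 1, 0, 0, 0, 1, 1, 1],  # weaker but less redundant
}

print(mrmr_select(features, labels, k=2))
```

On this toy data the duplicate marker is passed over in favor of the weaker but less redundant one, which is the behavior that distinguishes MRMR from ranking features by relevance alone. A shallow decision tree would then be fitted on the selected columns.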