Research Area:  Machine Learning
Background: The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.
Methods: In this study, we propose to generate a more accurate diagnosis model of COVID-19 based on patient symptoms and routine test results by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone.
Results: We discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model to achieve a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.
Conclusions: We demonstrated that computational methods trained on large clinical datasets could yield increasingly accurate COVID-19 diagnostic models to mitigate the impact of the lack of testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.
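The sensitivity and specificity quoted in the Results are ratios of confusion-matrix counts: sensitivity is the fraction of true COVID-19 cases the model flags, and specificity is the fraction of influenza cases it correctly rejects. The sketch below shows only that arithmetic. The function name `sensitivity_specificity` and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Hypothetical helper for illustration; `positive` marks the
    COVID-19 class and any other label is treated as influenza.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # COVID-19 case correctly flagged
            else:
                fn += 1  # COVID-19 case missed
        else:
            if pred == positive:
                fp += 1  # influenza case wrongly flagged as COVID-19
            else:
                tn += 1  # influenza case correctly rejected
    # Guard against an empty class so the ratio is never a division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels: 1 = COVID-19, 0 = influenza (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In practice these counts would come from the classifier's predictions on a held-out test set rather than from hand-written lists.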
Keywords:  
Author(s) Name:  Wei Tse Li, Jiayan Ma, Neil Shende, Grant Castaneda, Jaideep Chakladar, Joseph C Tsai, Lauren Apostol, Christine O Honda, Jingyue Xu, Lindsay M Wong, Tianyi Zhang, Abby Lee, Aditi Gnanasekar, Thomas K Honda, Selena Z Kuo, Michael Andrew Yu, Eric Y Chang, Mahadevan Raj Rajasekaran, Weg M Ongkeko
Journal name:  BMC Med Inform Decis Mak
Conference name:  
Publisher name:  PubMed
DOI:  10.1186/s12911-020-01266-z
Volume Information:  Volume 20, Issue 1
Paper Link:   https://pubmed.ncbi.nlm.nih.gov/32993652/