Mary Katherine Bradford
$217,865
UNIVERSITY OF SOUTH CAROLINA AT COLUMBIA
South Carolina
National Institute of Allergy and Infectious Diseases (NIAID)
Reports of persistent symptoms and multi-organ, multi-system manifestations (e.g., pulmonary, cardiovascular, renal, and neurological) are increasing among individuals who have recovered from the acute phase of COVID-19, a condition denoted Post-Acute Sequelae of SARS-CoV-2 infection (PASC). Given that 76.7 million people in the US were known to have been infected as of February 2022, millions may experience PASC. This projected disease burden will have a profound public health impact on patients' clinical outcomes and on US health systems providing post-COVID-19 care. Timely identification of individuals with PASC, both in existing COVID-19 cohorts and among newly identified COVID-19 patients, is urgently needed to support PASC clinics and longitudinal cohort studies of PASC. Building on biomedical informatics methodologies, we propose a high-throughput, semi-supervised deep phenotyping approach to identify individuals with PASC and characterize their phenotypes. Our approach is based on a graph representation model constructed from the South Carolina COVID-19 Cohort (S3C), funded by the National Institute of Allergy and Infectious Diseases (NIAID) (R01A127203-4S1). S3C (n ≈ 1,400,000 COVID-19 patients as of February 2022) is a multi-modal data repository consisting of electronic health record (EHR) data, health systems data, community-based health services data, and claims data, with the complete temporal trajectory of every datum at the individual level. Building on the graph model, we will detect phenotypes of candidate PASC patients using unsupervised clustering algorithms. We will then identify and validate clinically plausible PASC cases and their corresponding phenotypes by incorporating clinical evaluation and supervised algorithms. This study will produce a high-throughput algorithmic application for identifying and characterizing PASC cases in COVID-19 EHR cohorts. The resulting EHR and machine learning models will be interpretable and generalizable, and will form a foundation for testing and implementation in statewide and national post-COVID clinics and programs.