QI Duan
$343,314
Yale University
Connecticut
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Privacy and security of personal information have become one of the grand challenges of modern society, especially for healthcare studies. Re-identification risks and data breaches require new policies and regulations for data sharing across healthcare institutions and research laboratories. Because policy cannot solve the problem on its own, advanced technologies that work hand in hand with policy are needed to address privacy and security concerns. Predictive analytics can support quality improvement and clinical research, and can eventually impact patient health status. Extensive clinical variable information and voluminous data records from multiple institutions and laboratories are necessary to further improve the performance of modeling approaches and to identify medication-outcome associations for diseases. Nonetheless, the transfer of such sensitive data among institutions and laboratories can present serious privacy risks, which can jeopardize NIH’s mission. To mitigate the privacy problem while increasing predictive capability via cross-institutional modeling, prior studies proposed distributed methods that exchange only the predictive models, not patient data. However, these methods still leave several challenges for clinical cross-institutional learning: the need for more comprehensive clinical variables and more patient records to achieve better prediction discrimination and build more generalizable models; the need to discover and alleviate data manipulation to increase the trustworthiness of the collaboratively trained models; and the need for more validation to ensure usability.
In this proposal, we plan to develop SOCAL (Privacy-protecting Sharing Of Clinical data Across Laboratories), a distributed framework that addresses these challenges in three ways: integrating vertical and horizontal modeling methods to include both more complete variables and more records; discovering and alleviating data manipulation incidents using models recorded on blockchain; and conducting controlled experiments and designing and testing a web portal with physician-researchers to increase the usability of the system. SOCAL will be evaluated on a Coronavirus Disease 2019 (COVID-19) dataset from five University of California (UC) Health medical centers. We expect that the knowledge and capability of collaborative modeling will be improved, that the trustworthiness of the learning process will be enhanced, and that the framework will be ready for use. SOCAL is innovative because it will provide a new integration methodology for vertical and horizontal modeling, a novel method for resisting data manipulation, and a hardened prototype for a practical blockchain application. We anticipate that the SOCAL framework will substantially reduce the privacy concerns of predictive modeling tasks for various stakeholders, including healthcare providers, clinical researchers, and patients. Upon completion, SOCAL can accelerate the development of methods and technologies that increase the willingness of institutions to participate in such collaborations to improve the effectiveness of healthcare.