Wesley C Warren
Peter J Tonellato
Eduardo J. Simoes
University of Missouri-Columbia
Computer and Information Science and Engineering (CISE)
The COVID-19 pandemic has caused numerous deaths in the United States and around the world. Human genetic information may hold the answers for COVID-19 drug discovery. With the ability to sequence the human genome at low cost, this project aims to democratize genome sequence (GS) analysis on CloudLab (https://cloudlab.us/, an NSF-funded cloud computing research infrastructure) to accelerate the process of finding a cure for COVID-19. As a result, any researcher will be able to study differences in the genetic information of individuals affected by COVID-19 using CloudLab at no charge. This project will investigate efficient computing solutions to perform the data-intensive task of analyzing differences in individuals' genomes. It will also provide training opportunities for students.

The project will democratize genome sequence (GS) analysis using CloudLab. Researchers will be able to extract deep insights from the genomic information of individuals at scale. This work will lead to improved understanding of how commodity clusters, cloud infrastructure, and open-source software could be designed for large-scale GS storage, processing, and analysis. Specifically, it will result in new algorithms for exhaustive variant analysis (EVA), scheduling strategies for efficient execution of variant analysis tasks, and optimization techniques to speed up EVA and maximize resource utilization in a commodity cluster. It will provide low-level measurement data for network optimization when processing large-scale GS workloads.

By empowering researchers with publicly available software tools and computing infrastructure for variant analysis at scale, this project could advance the understanding of how individuals respond to COVID-19 infection by uncovering deep relationships in genomic variants of individuals. It can enable new drug discovery and treatment strategies for COVID-19. A prototype of the tool will be made publicly available for research and education.
The findings will be disseminated in the form of publications and software packages. New course modules will be developed; a workshop for high school students will be conducted.

The project website is at https://github.com/MU-Data-Science/EVA. This repository will be maintained for 5 years after the completion of the project.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.