NSF
Award Abstract #2441449

CRII: III: Pursuing Interpretability in Utilitarian Online Learning Models


Program Manager:

Raj Acharya

Active Dates:

Awarded Amount:

$175,000

Investigator(s):

Yi He

Awardee Organization:

College of William and Mary
Virginia

Directorate:

Computer and Information Science and Engineering (CISE)

Abstract:

In today's world, the real-time generation of enormous amounts of data has become commonplace, spanning domains such as e-commerce, social media, environmental science, urban disaster and pandemic monitoring, and many others. Such streaming data necessitate data mining (DM) models that can analyze them as they arrive, derive actionable insights, and adjust on the fly. For instance, predicting crowd movement during public events (such as concerts, games, parades, and protests) from data streaming in from social media and city sensors can help reduce traffic by steering travelers clear of overcrowded areas. However, as DM models become more prevalent in practice, interpretability has emerged as a vital issue: user comprehension of and trust in DM model outputs are critical for their acceptance in daily routines and workflows. Nonetheless, existing research on data streams has focused mainly on model accuracy, producing models that are too complex for humans to interpret. This gap between DM researchers and practitioners calls for new research that optimizes model accuracy and interpretability simultaneously. This project aims to bridge that gap by developing novel online algorithms that are transparent to human users and can fully explain the logic behind each prediction, earning the trust of human operators and increasing legal defensibility when the models support decision-making in critical domains such as healthcare, the economy, security, and social good.

The overarching goal of this project is to advance interpretability research on online DM models through three research objectives: (1) understanding the dynamism of varying feature spaces and its impact on model structure; (2) quantifying model prediction uncertainty in the absence of adequate supervision labels; and (3) indexing and elucidating model inference paths. To achieve these objectives, the project will pursue four research thrusts. The first thrust will develop novel algorithms that capture and model the variation patterns of feature spaces through an expository feature correlation graph, allowing graphs and predictive models to be learned jointly. The second thrust will develop unsupervised methods that quantify the uncertainty of model predictions and identify the geometric manifolds underlying data streams using memory-efficient structures. The third thrust will devise new systems to index, track, and illustrate the complete generation process of online predictions. The fourth thrust will establish evaluation metrics and protocols that standardize interpretability measurement in streaming settings. The project aims to contribute to interpretable data mining and machine learning research, helping to bridge the gap between data scientists and domain-specific forecasting experts. The educational component will involve mentoring and educating researchers interested in pursuing DM careers in academia or industry, with a particular focus on engaging underrepresented, financially disadvantaged, and disabled undergraduate students in computer science research. The project will also pioneer new courses at the forefront of data mining research and organize workshops at city libraries to engage the broader public.
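As a purely illustrative sketch of the first thrust's idea, the Python snippet below pairs an online linear predictor with a running feature correlation graph, using the graph to impute features that temporarily drop out of the stream before each prediction and update. Every name, the imputation rule, and the learning rule here are assumptions chosen for brevity, not the algorithms proposed in this award.

import numpy as np

class GraphAssistedOnlineLearner:
    """Toy online linear learner over a varying feature space (illustration only)."""

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features)                   # linear predictor weights
        self.mean = np.zeros(n_features)                # running feature means
        self.cov = np.zeros((n_features, n_features))   # running co-moments: the "correlation graph"
        self.count = 0
        self.lr = lr

    def _impute(self, x, observed):
        """Fill each missing feature from its most-correlated observed neighbor."""
        filled, var = x.copy(), np.diag(self.cov)
        obs_idx = np.where(observed)[0]
        for j in np.where(~observed)[0]:
            if obs_idx.size == 0 or self.count < 2:
                filled[j] = self.mean[j]                # no graph yet: fall back to the mean
                continue
            corr = self.cov[j, observed] / (np.sqrt(var[j] * var[observed]) + 1e-12)
            k = obs_idx[np.argmax(np.abs(corr))]        # follow the strongest graph edge
            slope = self.cov[j, k] / (self.cov[k, k] + 1e-12)
            filled[j] = self.mean[j] + slope * (x[k] - self.mean[k])
        return filled

    def partial_fit(self, x, observed, y):
        """One online round: impute via the graph, predict, update model and graph."""
        x_full = self._impute(x, observed)
        y_hat = float(self.w @ x_full)
        self.w -= self.lr * (y_hat - y) * x_full        # squared-loss gradient step
        self.count += 1                                  # Welford-style running update
        delta = x_full - self.mean
        self.mean += delta / self.count
        self.cov += np.outer(delta, x_full - self.mean)
        return y_hat

# Usage on a synthetic stream where each feature is missing 30% of the time:
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
model = GraphAssistedOnlineLearner(n_features=5)
for _ in range(1000):
    x = rng.normal(size=5)
    observed = rng.random(5) > 0.3
    model.partial_fit(np.where(observed, x, 0.0), observed, float(x @ true_w))

In this toy setup the running covariance doubles as an adjacency structure whose edge weights are estimated feature correlations, which is the kind of expository, human-readable byproduct the abstract attributes to the feature correlation graph.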

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
