Improving AI models’ ability to explain their predictions | MIT News

Improving AI models’ ability to explain their predictions | MIT News

Improving AI models’ ability to explain their predictions | MIT News

https://news.mit.edu/2026/improving-ai-models-ability-explain-predictions-0309

Publish Date: 2026-03-09 00:00:00

Source Domain: news.mit.edu

In high-stakes settings like medical diagnostics, users often want to know what led a computer vision model to make a certain prediction, so they can determine whether to trust its output.

Concept bottleneck modeling is one method that enables artificial intelligence systems to explain their decision-making process. These methods force a deep-learning model to use a set of concepts, which can be understood by humans, to make a prediction. In new research, MIT computer scientists developed a method that coaxes the model to achieve better accuracy and clearer, more concise explanations.

The concepts the model uses are usually defined in advance by human experts. For instance, a clinician could suggest the use of concepts like “clustered brown dots” and “variegated pigmentation” to predict that a medical image shows melanoma.

But previously defined concepts could be irrelevant or lack sufficient detail for a specific task, reducing the model’s accuracy. The new method extracts concepts the model has already learned while it was trained to perform that particular task, and forces the model to use those, producing better explanations than standard concept bottleneck models.

The approach utilizes a pair of specialized machine-learning models that automatically extract knowledge from a target model and translate it into plain-language concepts. In the end, their technique can convert any pretrained computer vision model into one that can use concepts to explain its reasoning.

“In a sense, we want to be able to read the minds of these computer vision models. A concept bottleneck model is one way for users to tell what the model is thinking and why it made a certain prediction. Because our method uses better concepts, it can lead to higher accuracy and ultimately improve the accountability of black-box AI models,” says lead author Antonio De Santis, a graduate student at Polytechnic University of Milan who completed this research while…

Source