Lily Weng, assistant professor at the Halıcıoğlu Data Science Institute (HDSI), part of the School of Computing, Information and Data Sciences (SCIDS) at UC San Diego, recently presented her team’s work at the NeurIPS 2024 conference in Vancouver. Her team developed a new framework that could dramatically improve both the transparency and the performance of artificial intelligence systems. The method, called the Vision-Language-Guided Concept Bottleneck Model (VLG-CBM), combines visual recognition with language understanding to help AI not only make better predictions, but also explain them in a way humans can understand. To conduct this work, the team used U.S. National Science Foundation (NSF) ACCESS allocations on the Expanse system at the SCIDS San Diego Supercomputer Center (SDSC) and on Delta at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.
“As deep neural networks become more common in real-world applications — from healthcare to autonomous vehicles — the need for interpretable AI is more urgent than ever,” Weng said. “Traditional models often operate as ‘black boxes’ — producing highly accurate results without offering any insight into how or why decisions were made.”
She said that this lack of transparency has raised ethical and practical concerns — especially when AI is used in sensitive or high-stakes settings. To address this issue, earlier approaches introduced Concept Bottleneck Models (CBMs), which force the AI to make decisions based on human-understandable concepts. But these models still face two key challenges: inaccurate concept explanations and unintended “information leakage,” where models may rely on hidden patterns unrelated to the intended concepts.
Weng worked with two UC San Diego graduate students in HDSI and the Computer Science and Engineering (CSE) department — Divyansh Srivastava and Ge Yan — to create the team’s new VLG-CBM model, which tackles both issues head-on.

“We used our NSF ACCESS allocations on NCSA’s Delta and SDSC’s Expanse to combine vision-language technology with grounded object detection to create much more accurate and faithful concept predictions,” Weng said. “Using a technique that maps language-based concepts to specific visual elements in images, our model ensures that its reasoning matches what a human would actually see.”
To do this, VLG-CBM first generates a list of candidate concepts using large language models, then confirms those concepts visually using an object detection model called Grounding-DINO. This approach keeps the model focused on what is actually present in the image rather than guessing from unrelated data.
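The two-stage idea described above can be sketched in a few lines. Everything here is illustrative: `llm_propose_concepts` and `detector_confidence` are hypothetical stand-ins for the large language model and the Grounding-DINO detector, with hard-coded toy values rather than real model calls.

```python
# Sketch of VLG-CBM's concept-filtering idea (illustrative only):
# an LLM proposes candidate concepts for a class, and an open-vocabulary
# detector keeps only the concepts it can visually ground in images.

def llm_propose_concepts(class_name):
    """Hypothetical stand-in for LLM-generated candidate concepts."""
    return {"dog": ["floppy ears", "wet nose", "wheels"]}.get(class_name, [])

def detector_confidence(image, concept):
    """Hypothetical stand-in for a grounded object detector
    (e.g. Grounding-DINO) scoring whether `concept` is visible."""
    fake_scores = {("dog.jpg", "floppy ears"): 0.92,
                   ("dog.jpg", "wet nose"): 0.81,
                   ("dog.jpg", "wheels"): 0.03}
    return fake_scores.get((image, concept), 0.0)

def grounded_concepts(class_name, images, threshold=0.5):
    """Keep only concepts the detector confirms in at least one image."""
    candidates = llm_propose_concepts(class_name)
    return [c for c in candidates
            if any(detector_confidence(img, c) >= threshold for img in images)]

print(grounded_concepts("dog", ["dog.jpg"]))  # "wheels" is filtered out
```

The point of the visual check is exactly the filtering step at the end: a spurious LLM suggestion like “wheels” never makes it into the model’s concept vocabulary because no image grounds it.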
To combat information leakage, Weng’s team first conducted a theoretical analysis to understand the root causes of concept leakage. Building on this insight, they introduced a new evaluation metric called Number of Effective Concepts (NEC) — the first metric to effectively and fairly compare different CBMs.
“NEC tracks how many concepts a model actually relies on to make its predictions,” Weng said. “By limiting this number, researchers can force the model to reason more like a human — using only relevant, understandable information — and not hidden data correlations.”
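One way to read the description above: treat the final concept-to-class layer of a CBM as a weight matrix and count, per class, how many concepts carry a nonzero weight. This NumPy sketch assumes that interpretation; the precise definition is in the team’s paper.

```python
import numpy as np

def nec(weights, tol=1e-8):
    """Average number of concepts with nonzero weight per class.
    `weights` is a (num_classes, num_concepts) matrix from the final
    linear layer of a concept bottleneck model."""
    return float(np.mean(np.sum(np.abs(weights) > tol, axis=1)))

def limit_nec(weights, k):
    """Force NEC <= k by keeping only each class's top-k weights
    (by magnitude) and zeroing the rest -- a simple pruning sketch."""
    pruned = np.zeros_like(weights)
    for i, row in enumerate(weights):
        top = np.argsort(np.abs(row))[-k:]
        pruned[i, top] = row[top]
    return pruned

W = np.array([[0.9, 0.0, 0.2, 0.5],
              [0.1, 0.8, 0.0, 0.3]])
print(nec(W))                 # 3.0 concepts per class on average
print(nec(limit_nec(W, 2)))   # 2.0 after pruning
```

Capping NEC in this way is what forces the model to lean on a small, human-auditable set of concepts instead of many weak, leaky correlations.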
The team also proposed a related metric, Accuracy at NEC (ANEC), which gauges the model’s performance when it is restricted to fewer concepts.
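Following that description, accuracy at a given NEC can be sketched by pruning the final layer down to the target number of concepts per class and then measuring classification accuracy. The top-k pruning rule and the tiny data below are illustrative assumptions, not the paper’s exact protocol.

```python
import numpy as np

def prune_to_k(weights, k):
    """Keep each class's top-k weights by magnitude (illustrative rule)."""
    pruned = np.zeros_like(weights)
    for i, row in enumerate(weights):
        top = np.argsort(np.abs(row))[-k:]
        pruned[i, top] = row[top]
    return pruned

def accuracy_at_nec(weights, concepts, labels, k):
    """Accuracy when predictions may use only k concepts per class."""
    logits = concepts @ prune_to_k(weights, k).T
    return float(np.mean(np.argmax(logits, axis=1) == labels))

# Toy example: 2 classes, 3 concepts, 4 samples.
W = np.array([[1.0, 0.1, -0.2],
              [0.0, 0.9, 0.8]])
X = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.2],
              [0.1, 0.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(accuracy_at_nec(W, X, y, k=1))  # 0.75: one sample is lost
print(accuracy_at_nec(W, X, y, k=3))  # 1.0 with all concepts available
```

Sweeping `k` this way shows the interpretability-accuracy trade-off the metric is designed to expose: a model whose accuracy holds up at small `k` is reasoning through a few meaningful concepts rather than many hidden ones.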
In extensive tests across five benchmark datasets, VLG-CBM outperformed existing models by wide margins — up to 51 percent better in some cases — while maintaining strong interpretability.
“This research presented at NeurIPS offers a big step forward in creating AI systems that are not only smart but also trustworthy,” said Rajesh Gupta, who is the interim dean for SCIDS, the HDSI founding director and a distinguished professor with the Computer Science and Engineering Department at UC San Diego Jacobs School of Engineering. “With tools like VLG-CBM, the future of AI could be as transparent as it is intelligent.”
Resource Provider Institution(s): National Center for Supercomputing Applications (NCSA), San Diego Supercomputer Center (SDSC)
Resources Used: Delta, Expanse
Affiliations: Halıcıoğlu Data Science Institute (HDSI), UC San Diego
Funding Agency: The research was supported by the NSF (grant nos. CCF-2107189, IIS-2313105 and IIS-2430539), the Hellman Fellowship, and the Intel Rising Star Faculty Award. Computational resources were provided by NSF ACCESS (allocation no. CIS230153).
Grant or Allocation Number(s): CCF-2107189, IIS-2313105, IIS-2430539, CIS230153
The science story featured here was enabled by the U.S. National Science Foundation’s ACCESS program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.