University of Chicago
Meshes of Midscale Models (M3) Initiative
Overview
Faculty at the Center for Translational Data Science (CTDS), the Department of Medicine, the Department of Computer Science, and the Data Science Institute at the University of Chicago have started an initiative called Meshes of Midscale Models (M3) to address some of the current challenges in applying AI to biology, medicine, and healthcare. The initiative has three central aims: 1) growing the number of organizations that can develop and operate data commons containing high-quality biomedical data; 2) developing AI algorithms, methods, and techniques for building machine learning models, large language models (LLMs), and large quantitative models (LQMs) over the data in these commons (“AI Commons”); and 3) developing technology to support distributed and federated AI over multiple AI Commons (“AI Meshes”). To support these aims, CTDS is leading a collaboration to develop open-source software for all three.
Challenges in Biomedical AI
The incorporation of AI in biomedical domains is hindered by several key obstacles:
Data Limitations: The scarcity and inaccessibility of high-quality labeled biomedical data impede the training of Large Language Models (LLMs), Generative AI (GenAI) models, and Large Quantitative Models (LQMs). Much of this data is secured behind organizational firewalls due to privacy and compliance requirements.
Limited Understanding of Embeddings for Biomedical Data: While embedding techniques have advanced for text and image data, translating these methods to complex biomedical data, such as data on genes, proteins, and receptors, remains an active research area that has not yet seen equivalent success.
Economics of Data: Developing and maintaining biomedical data systems is expensive and demands a deeper understanding of the economics related to data value creation and sustainable data marketplaces.
Resource Constraints: Recruiting skilled personnel in this domain is challenging, slowing progress despite the urgent need for advancement.
M3 Approach
The CTDS M3 Initiative is addressing these challenges through a multidisciplinary, team-science approach that integrates expertise in biology and medicine, computer science, ML/AI, and economics in the following areas:
Development of Biomedical Data Commons: Develop open-source software (the “Commons Services Operations Center”, or CSOC) that simplifies the process for organizations to set up, operate, and monitor one or more Gen3 data commons.
Specialized Embeddings and Architectures: Develop algorithms and systems that create advanced embeddings and neural network architectures suited to complex biomedical data.
Small and Midscale AI Models: Develop open-source software, algorithms, and techniques for building small and midscale machine learning models, LLMs, and LQMs over data in Gen3 data commons. These models offer significant advantages in cost-sensitive or low-resource environments compared to large proprietary models.
AI Commons and Meshes: Develop a new data platform, AI Commons, that facilitates the development of small and midscale models without extracting data from the commons. Develop AI Meshes that support distributed and federated AI, acting as a hub for specialized biomedical computations.
Data Economics: Explore the economics of data to create frameworks that maximize the value and impact of data investments, supporting marketplaces that generate revenue while maintaining privacy.
Foundational Research: Investigate theoretical foundations of AI as applied to biomedical data, focusing on scaling laws, integration with traditional biomedical discovery systems, and creating high-quality synthetic data.
Training Programs: Develop and operate a training program to equip researchers with the skills necessary for advancing AI systems in biomedicine.
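As a rough illustration of the federated idea behind AI Meshes (training across several commons without moving the underlying data), the sketch below shows size-weighted federated averaging on a toy one-dimensional linear model. It is not M3 or Gen3 software. Every name in it (local_update, federated_average, train) is hypothetical, the "sites" are plain in-memory lists standing in for separate data commons, and a real deployment would add secure communication, privacy protections, and far richer models.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]           # (slope w, intercept b) of y ≈ w*x + b
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(weights: Weights, data: Dataset, lr: float) -> Weights:
    """Run one gradient-descent step on a single site's private data.

    Only the updated weights leave the site; the raw (x, y) pairs do not.
    """
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine per-site weights, weighting each site by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(sites: List[Dataset], rounds: int = 1000, lr: float = 0.05) -> Weights:
    """Alternate local updates and averaging for a fixed number of rounds."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data, lr) for data in sites]
        weights = federated_average(updates, [len(data) for data in sites])
    return weights


if __name__ == "__main__":
    # Two hypothetical sites whose data follow y = 2x + 1.
    site_a = [(0.0, 1.0), (1.0, 3.0)]
    site_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
    print(train([site_a, site_b]))
```

In this toy setting each round only exchanges two numbers per site, which is the property that lets models be built over multiple commons while the records themselves stay in place.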
M3 News
AACR News Release: Pretrained Machine Learning Models May Help Accurately Diagnose Nonmelanoma Skin Cancer in Resource-limited Settings.
For more information
Please contact: m3info@lists.uchicago.edu