3 Sources
[1]
Google Updates MedGemma With Imaging Support, Launches MedASR for Clinical Dictation | AIM
Google has released MedGemma 1.5, an updated version of its open healthcare-focused AI model. The company said the model expands support for CT scans, MRI and histopathology, and also introduced MedASR, a medical speech-to-text system for clinical dictation and healthcare workflows. Both models are available via the AI community Hugging Face and Google Cloud's AI development platform Vertex AI. MedGemma 1.5 4B is part of Google's Health AI Developer Foundations programme and is available for research and commercial use.

"We are updating our open MedGemma model with improved medical imaging support," said Google in a blog post. "We also describe MedASR, our new open medical speech-to-text model."

According to Google, MedGemma 1.5 improved baseline accuracy by 3% on disease classification tasks using CT scans and by 14% on MRI-based classification compared with the previous version. The company also reported gains in anatomical localisation in chest X-rays and structured data extraction from laboratory reports. The 4B-parameter model is designed to be compute-efficient and capable of running offline, while a larger 27B-parameter version remains available for text-heavy medical applications. MedGemma models can be deployed on Google Cloud through Vertex AI, the company said.

Google also announced MedASR, an open automated speech recognition model fine-tuned for medical dictation. In internal tests, MedASR recorded a 5.2% word error rate on chest X-ray dictations, compared with 12.5% for a general-purpose speech recognition model. "While text is currently the primary interface for large language models, verbal communication remains crucial in many aspects of healthcare, including medical dictation and live conversations between patients and providers," the company said.

Google also announced the MedGemma Impact Challenge, a hackathon hosted on the data science and machine learning platform Kaggle, with $100,000 in prizes to encourage developers to build healthcare applications using the models. The company said MedGemma is intended as a developer foundation model and should be validated and adapted before use in clinical settings.

The launch comes amid intensifying competition in healthcare AI. OpenAI recently introduced ChatGPT Health, while Anthropic has rolled out Claude for Healthcare, pushing generative AI deeper into medical workflows.
[2]
Unlike OpenAI, Google's Healthcare Push Takes an Open-Source Approach
Google's open-source models are available for research and commercial use.

Google has introduced two new healthcare-focused artificial intelligence (AI) models, MedGemma 1.5 and MedASR, aimed at improving how medical images and clinical speech data are processed. The release of the open-source AI models marks the next step in the Mountain View-based tech giant's push in the healthcare space. Interestingly, unlike OpenAI, which is offering its ChatGPT for Healthcare as a commercial product for enterprise, the Gemini-maker has taken a community-focused approach by making MedGemma 1.5 and MedASR publicly available.

Google Releases MedGemma 1.5 and MedASR AI Models

In a blog post, Google Research detailed the new AI models and their capabilities. MedGemma 1.5 is the latest version of Google's open medical vision-language model. It is designed to analyse medical images alongside text, allowing it to interpret scans, answer questions about visual medical data, and support downstream research tasks. The updated version improves on earlier iterations by offering stronger multimodal reasoning, better handling of complex medical imagery, and increased flexibility for fine-tuning on specialised datasets.

Google said MedGemma 1.5 can work with different types of medical images, including radiology scans and other clinically relevant visuals. The model is intended to support research use cases such as image-based question answering, report generation, and structured data extraction. The company maintained that the model is not designed to provide diagnoses or treatment recommendations and should be used as a supporting tool in research and development environments.

Alongside MedGemma 1.5, Google also introduced MedASR, a medical automatic speech recognition model tailored for healthcare settings. MedASR is designed to convert spoken clinical conversations into text, with a focus on handling medical terminology, accents, and real-world clinical audio conditions. Google said the model aims to reduce errors commonly seen in general-purpose speech recognition systems when applied to healthcare use cases. The company noted that MedASR can be used for tasks such as transcribing doctor-patient interactions, clinical notes, and dictated reports. It is designed to be adaptable across different healthcare environments and can be fine-tuned for specific clinical workflows or documentation standards.

Google said that all variants of MedGemma and the MedASR model can be accessed via the company's Hugging Face listing or the Vertex AI platform. Additionally, the tech giant's MedGemma GitHub repository is also available for developers who want to check out the tutorials. Both models come with a permissive licence allowing both research and commercial use cases.
[3]
MedGemma 1.5 & MedASR explained: High-dimensional imaging and medical speech-to-text
Google Health AI releases open models for imaging and clinical dictation.

The adoption of artificial intelligence in healthcare is accelerating at twice the rate of the broader economy, driven by the need for tools that can handle the complexity of medical data. To support this transformation, Google Research has released major updates to its open medical models: MedGemma 1.5, a multimodal model with advanced imaging capabilities, and MedASR, a specialized speech-to-text model designed for the medical domain. Together, these models represent a shift from analyzing static, two-dimensional data to interpreting the high-dimensional, multimodal reality of clinical practice.

MedGemma 1.5: Seeing in 3D and Time

While the original MedGemma 1 could interpret standard 2D images like chest X-rays, medical diagnostics often rely on more complex data. MedGemma 1.5 (specifically the 4B parameter version) bridges this gap by introducing support for high-dimensional imaging. This update allows the model to interpret three-dimensional volume representations, such as Computed Tomography (CT) scans and Magnetic Resonance Imaging (MRI), as well as whole-slide histopathology imaging. Rather than looking at a single slice, developers can now build applications where the model analyzes multiple slices or patches to understand the full context of a scan. Internal benchmarks show significant gains, including a 14% improvement in classifying disease-related MRI findings compared to the previous version.

Beyond 3D imaging, MedGemma 1.5 improves longitudinal analysis. In clinical settings, a patient's trajectory is often more important than a single snapshot. The new model excels at reviewing chest X-ray time series, allowing it to track changes over time with greater accuracy. The model also boasts improved text-based reasoning, achieving a 90% score on the EHRQA benchmark (Electronic Health Record Question Answering), a 22% jump over MedGemma 1. This ensures that the model is as effective at parsing complex lab reports and medical records as it is at scanning images.

MedASR: The Listener

Medical documentation relies heavily on dictation, yet general-purpose speech models often stumble over complex medical terminology. MedASR addresses this by providing an open Automated Speech Recognition (ASR) model fine-tuned specifically for medical dictation. When compared to generalist models like Whisper (large-v3), MedASR demonstrated a drastic reduction in errors. It recorded 58% fewer errors on chest X-ray dictations and 82% fewer errors on diverse medical dictation benchmarks.

For developers, MedASR is designed to work in tandem with MedGemma. A clinician could potentially dictate a query (processed by MedASR) regarding a specific CT volume (analyzed by MedGemma 1.5), creating a seamless multimodal workflow that mirrors natural clinical interaction.

An open foundation for developers

Both models are part of Google's Health AI Developer Foundations (HAI-DEF). They are released as open weights, allowing researchers and developers to fine-tune and adapt them for specific use cases, from anatomical localization to structuring data from messy lab reports. By providing these compute-efficient models via platforms like Hugging Face and Vertex AI, Google aims to lower the barrier to entry for creating next-generation medical applications that can see, listen, and reason more like a clinician.
Google has launched MedGemma 1.5, an updated healthcare AI model with support for CT scans, MRI, and histopathology, alongside MedASR, a medical speech-to-text system. The open-source models are available via Hugging Face and Vertex AI, marking a community-focused approach to healthcare AI as competition with OpenAI and Anthropic intensifies.
Google has released MedGemma 1.5 and MedASR, two open-source AI models designed to transform how medical professionals process clinical data. The updated MedGemma 1.5 expands support for CT scans and MRI, while MedASR tackles clinical dictation with specialized medical speech-to-text capabilities. Both models are available through Hugging Face and Vertex AI, part of Google's Health AI Developer Foundations programme for research and commercial use [1].
The release marks a notable strategic difference in the healthcare AI landscape. While OpenAI offers ChatGPT Health as a commercial enterprise product and Anthropic has rolled out Claude for Healthcare, Google has taken an open-source approach. This community-focused strategy allows developers to fine-tune and adapt the models for specific clinical workflows, lowering barriers to entry for building next-generation medical applications [2].

MedGemma 1.5 represents a significant upgrade from its predecessor, introducing high-dimensional imaging capabilities that extend beyond standard 2D chest X-rays. The 4B-parameter vision-language model now interprets three-dimensional volume representations, including CT scans, MRI, and whole-slide histopathology imaging. This multimodal model can analyze multiple slices or patches to understand the full context of a scan, rather than examining isolated images [3].
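To make the multi-slice idea concrete, here is a minimal sketch of how a developer might pass several CT slices plus a question to a MedGemma checkpoint through Hugging Face transformers. It assumes MedGemma 1.5 exposes the same image-text-to-text chat interface as the earlier google/medgemma-4b-it release; the model ID, file names, and message schema are illustrative assumptions, not confirmed details of the new model.

```python
# Sketch: querying a MedGemma checkpoint with several adjacent CT slices.
# Assumption: the MedGemma 1.5 weights use the same "image-text-to-text"
# chat interface as earlier MedGemma releases; the model ID is illustrative.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # swap in the 1.5 checkpoint ID once published
)

# One chat turn carrying multiple image slices plus a text query.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "ct_slice_040.png"},
        {"type": "image", "url": "ct_slice_041.png"},
        {"type": "image", "url": "ct_slice_042.png"},
        {"type": "text", "text": "Across these adjacent CT slices, "
                                 "describe any findings and their location."},
    ],
}]

out = pipe(text=messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```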
The performance improvements are substantial. Google reported that MedGemma 1.5 achieved a 14% accuracy gain on MRI-based disease classification compared to the previous version, and a 3% improvement on CT scan classification tasks. The model also demonstrates enhanced capabilities in anatomical localization in chest X-rays and structured data extraction from laboratory reports [1].

Beyond static imaging, MedGemma 1.5 excels at longitudinal analysis, tracking changes in chest X-ray time series with greater accuracy. The model achieved a 90% score on the EHRQA benchmark for Electronic Health Record Question Answering, representing a 22% jump over the original MedGemma. This makes the model as effective at parsing complex lab reports and medical records as it is at analyzing medical images [3].

MedASR addresses a persistent challenge in healthcare documentation: accurately transcribing clinical conversations filled with complex medical terminology. This automated speech recognition model is fine-tuned specifically for medical dictation, designed to handle accents and real-world clinical audio conditions that often trip up general-purpose systems [2].
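As a rough illustration, a standard Hugging Face ASR pipeline would be the natural way to run such a model on a dictation file. The checkpoint name below is a placeholder, since the articles only say MedASR is listed on Hugging Face without giving its exact ID.

```python
# Sketch: transcribing a dictated clinical note with an ASR pipeline.
# "google/medasr" is a placeholder ID; check Google's Hugging Face
# listing for the actual checkpoint name.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="google/medasr")

result = asr("chest_xray_dictation.wav")  # path to a local audio file
print(result["text"])
```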
The performance difference is striking. In internal tests, MedASR recorded a 5.2% word error rate on chest X-ray dictations, compared to 12.5% for general-purpose speech recognition models. This translates to 58% fewer errors on chest X-ray dictations and 82% fewer errors on diverse medical dictation benchmarks when compared to models like Whisper (large-v3) [3].

Google emphasized that verbal communication remains crucial in healthcare, including medical dictation and live conversations between patients and providers. MedASR is designed to transcribe doctor-patient interactions, clinical notes, and dictated reports, with adaptability across different healthcare environments. The model can be fine-tuned for specific clinical workflows or documentation standards, making it practical for diverse medical settings [1].
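The two headline numbers are consistent: going from a 12.5% to a 5.2% word error rate is a relative reduction of roughly 58%, as a quick calculation confirms.

```python
# Quick check that the reported error rates imply the "58% fewer errors" claim.
general_wer = 0.125  # general-purpose model, chest X-ray dictations
medasr_wer = 0.052   # MedASR, same benchmark

reduction = (general_wer - medasr_wer) / general_wer
print(f"Relative error reduction: {reduction:.1%}")  # -> 58.4%
```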
Both models can be accessed via Google Cloud through Vertex AI, as well as through Hugging Face listings. The 4B-parameter version of MedGemma 1.5 is designed to be compute-efficient and capable of running offline, while a larger 27B-parameter version remains available for text-heavy medical applications. Google's MedGemma GitHub repository provides tutorials for developers looking to integrate these tools [2].

To accelerate adoption, Google announced the MedGemma Impact Challenge, a hackathon hosted on Kaggle with $100,000 in prizes. The competition encourages developers to build healthcare applications using the models, potentially creating seamless multimodal workflows where clinicians dictate queries processed by MedASR regarding specific CT volumes analyzed by MedGemma 1.5 [1].
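A hedged sketch of how that dictate-then-analyze loop might be wired together, chaining the two pipelines from the earlier examples; the model IDs remain assumptions rather than published checkpoint names.

```python
# Sketch: MedASR turns a spoken query into text, which is then posed to
# MedGemma alongside CT slices. Model IDs are placeholders, as above.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="google/medasr")
vlm = pipeline("image-text-to-text", model="google/medgemma-4b-it")

spoken_query = asr("clinician_question.wav")["text"]

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "ct_slice_040.png"},
        {"type": "image", "url": "ct_slice_041.png"},
        {"type": "text", "text": spoken_query},
    ],
}]

answer = vlm(text=messages, max_new_tokens=256)
print(answer[0]["generated_text"][-1]["content"])
```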
Google emphasized that MedGemma is intended as a developer foundation model and should be validated and adapted before use in clinical settings. The permissive license allows both research and commercial use cases, enabling healthcare organizations to customize the models for their specific needs. As healthcare AI adoption accelerates at twice the rate of the broader economy, these tools provide developers with the foundation to build applications that can see, listen, and reason more like clinicians [3].

Summarized by Navi