2 Sources
[1]
Disparate privacy risks from medical AI
Medical artificial intelligence (AI) models hold the promise to improve global access to high-quality diagnostics (ref. 1). However, the training data underlying these models often contain sensitive patient information that may be exposed through privacy attacks (refs. 2-7). Previous research has primarily quantified the success of these attacks in aggregate, across all records in a dataset. Thus, the privacy risk faced by individual patients, who often contribute multiple similar records to a training dataset, is poorly understood. Here we present one of the first patient-level privacy audits of AI models for medical diagnostic applications. We focus on membership inference attacks (MIAs; refs. 2-4), which seek to determine whether the data of a given individual were used to train a model. Across a diverse range of medical datasets, we show that MIAs can achieve near-perfect success rates for individual patients, even when the aggregate performance does not substantially deviate from random guessing. We further find that the number of patients with high attack success increases substantially with model capacity, and that underrepresented groups -- stratified by disease status, self-reported race, insurance, sex or imaging protocol -- face disproportionately high attack success. Together, our findings show that aggregate privacy metrics can severely underestimate individual privacy risk. Whether the disparate risk profiles we observe extend to attacks beyond MIAs remains an open question, motivating the further development of risk assessment and mitigation techniques that cater to all data-contributing patients.
Medical artificial intelligence (AI) has immense potential to improve health outcomes, particularly in regions in which specialized medical expertise is scarce. At the same time, AI also poses new challenges and risks, including security vulnerabilities that arise when models are deployed.
Untrusted users with access to an AI model may, by merely observing its predictions, steal its parameters or perform privacy attacks, which can extract sensitive details about the data used for model training. Privacy attacks against an AI model can enable detailed inferences about the individuals who contributed to its training data. For example, a membership inference attack (MIA) attempts to determine whether the data of a specific patient were included in the training dataset of a model. The extent to which this constitutes a privacy violation is nuanced and depends on factors such as the underlying training population and the deployment context of the model. Although inferring membership for a model trained on a general population may be benign, doing so for a model trained on a narrow, disease- or centre-specific cohort acts as a direct proxy for sensitive medical information. For example, a successful MIA against the model in ref. , which predicts anti-cancer immunotherapy efficacy from routine blood test data, reveals that an individual has cancer. The accelerating deployment of medical AI models trained on sensitive patient data calls for rigorous privacy risk assessments. However, previous studies primarily quantified the success rate of MIAs, in aggregate, across all records in a training dataset. This implicitly averages risk across records, thereby obscuring important information on record- and patient-level attack success. Consequently, the risk that an individual faces by contributing their personal data (often multiple records) to an AI training dataset is poorly understood. Given that medical data are a key target for cybercriminals, and pseudonymization alone is increasingly recognized as insufficient to prevent the re-identification of individuals in large, high-dimensional datasets, there is a need to improve our understanding of the threat that AI privacy attacks pose to individual patients. 
Here we show that deploying medical AI models without protective measures can pose substantial privacy risks to individual data-contributing patients. These risks are particularly acute when membership in a training population itself reveals sensitive medical information. Our privacy audit of AI models trained to perform standard diagnostic (supervised classification) tasks quantifies state-of-the-art MIA success at the resolution of individual data contributors. Using seven large datasets comprising real-world clinical data, including various types of medical images, electrocardiograms and electronic health records, we demonstrate that the success of a MIA is unequally distributed among data-contributing patients. We show that this disparity exists at two levels: (1) the individual patient level, at which some patients experience near-perfect attack success, whereas others remain essentially unaffected; and (2) the group level, at which patient groups underrepresented in a training dataset are often overrepresented among records most vulnerable to MIAs. Together, our results indicate that privacy attacks against AI models may be much more effective at compromising the privacy of individual data contributors than previously thought. This suggests that current AI privacy risk reporting practices may underestimate individual-level risk and thus motivates the integration of mathematically verifiable risk mitigation strategies such as differential privacy (DP) into medical AI model development workflows.
Attacking AI by simple hypothesis tests
A popular deployment strategy for AI models gives users access to a model through a prediction interface, which, for a given input (for example, the chest radiograph of a patient), returns a corresponding prediction (for example, a 78% chance of pneumonia).
This black-box access to a model can be exploited by an untrusted user to conduct a MIA that shows the membership status of a target record, that is, whether the target record was a member of the training dataset of a model or not (Fig. 1a). To infer membership status, MIAs typically make use of the fact that AI models are often slightly more confident about their predictions on training than on non-training data. Likelihood-ratio MIAs (LR-MIAs), the current state-of-the-art in MIAs, frame membership inference as a simple vs. simple hypothesis testing problem on the prediction confidence provided by the target model. In essence, LR-MIAs compare the likelihood of the predicted confidence of the target model for the target record under the null (the target record was not a member) and the alternative hypothesis (the target record was a member). Here, the parameters of the distributions under the two hypotheses are specified by parametric fitting of sample confidence values obtained from reference models. Reference models are models assumed to be trained by the attacker and are ideally, but not necessarily, of similar architecture as the target model and trained on data similar to the training dataset of the target model. Note that objectively larger threats are posed by privacy attacks with stronger assumptions on a potential attacker, such as access to model parameters, access to parameter updates during model training or, furthermore, the ability to modify the model architecture. However, we do not consider them in this study as their strong assumptions are not realistic for careful, practical deployment scenarios. By contrast, the type of attack we consider here requires querying the target model only once (to obtain a prediction for the target record) and may thus be executed by any attacker posing as a real user of an AI system. 
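The likelihood-ratio test described above can be illustrated with a minimal Python sketch. It assumes the attacker already holds confidence values for the target record from reference models trained with and without that record, and that these values have been transformed (for example, logit-scaled) so that a Gaussian fit is plausible. The function names and the decision threshold are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def gaussian_log_pdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def log_likelihood_ratio(target_conf, in_confs, out_confs, min_sigma=1e-6):
    """Log of p(conf | member) / p(conf | non-member) for one target record.

    in_confs:  reference-model confidences when the record was in training
    out_confs: reference-model confidences when the record was held out
    """
    mu_in, sd_in = mean(in_confs), max(stdev(in_confs), min_sigma)
    mu_out, sd_out = mean(out_confs), max(stdev(out_confs), min_sigma)
    return (gaussian_log_pdf(target_conf, mu_in, sd_in)
            - gaussian_log_pdf(target_conf, mu_out, sd_out))


def infer_membership(target_conf, in_confs, out_confs, threshold=0.0):
    """Guess 'member' when the evidence favours the alternative hypothesis."""
    return log_likelihood_ratio(target_conf, in_confs, out_confs) > threshold
```

Raising the (assumed) threshold trades sensitivity for fewer false positives, which is the trade-off an ROC analysis of the attack traces out.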
Notably, as the attacks we study are executed against fully trained models, data-governance-preserving techniques such as federated/swarm-learning provide no protection.
From aggregate to patient-level risk
MIA performance is evaluated through a receiver operating characteristic (ROC) analysis on numerous repetitions of the MIA game scenario, in which an untrusted user is challenged to guess the membership status of a given record (Fig. 1a). In practice, owing to the computational cost of training AI models, attack success is typically evaluated using a single target model. More specifically, a target model is trained on a random subset of the training dataset, and subsequently, an ROC analysis is performed on the aggregated membership predictions for all records in the dataset (Fig. 1b). Although practical, this approach has a key shortcoming: it provides no indication of the performance of the attack for individual records or patients. To address this issue, we propose a simple technique for estimating record-, and by extension, patient-level vulnerability to LR-MIAs (Fig. 1c). In brief, using a large set of target models (N = 200) trained on random patient subsets, we estimate, for each training record, sampling distributions of the confidence of the target model under the null and alternative hypotheses in LR-MIAs. In other words, we estimate empirical distributions of confidence values as provided by target models, partitioned into models trained and not trained on the target record. Because these distributions are assumed to take Gaussian form in LR-MIAs, record-level attack success, as measured by the area under the ROC curve (AUC), can be calculated in closed form (Methods). A high AUC score, close to the maximum value of 1.0, suggests high privacy risk: a MIA for this record could achieve high sensitivity with little to no false positives.
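As an illustration of the closed-form calculation mentioned above, the sketch below uses the standard bi-normal AUC formula, AUC = Φ((μ_in − μ_out) / sqrt(σ_in² + σ_out²)), where Φ is the standard normal cumulative distribution function. The Methods section is not reproduced here, so treating this as the exact estimator used in the paper is an assumption. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def record_level_auc(in_confs, out_confs):
    """Bi-normal AUC for one record from two samples of confidence values.

    in_confs:  confidences from target models trained on the record
    out_confs: confidences from target models not trained on the record
    """
    mu_in, mu_out = mean(in_confs), mean(out_confs)
    var_sum = variance(in_confs) + variance(out_confs)
    if var_sum == 0.0:
        # Degenerate case: both samples are constant.
        if mu_in == mu_out:
            return 0.5
        return 1.0 if mu_in > mu_out else 0.0
    return NormalDist().cdf((mu_in - mu_out) / sqrt(var_sum))
```

When the two samples have the same mean the function returns 0.5 (random guessing), and it approaches 1.0 as the member and non-member confidence distributions separate.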
Notably, the record-level MIA AUC also offers a probabilistic interpretation: the record-level MIA AUC is the probability that a confidence score from a target model trained on the target record is larger than a score from a target model not trained on the target record. Correctly determining the membership status for one of the records contributed by an individual patient reveals the membership status of the patient. Thus, we compute patient-level scores by taking the maximum across all record-level scores for a given patient. The raw record-level scores and the average patient-level scores can be found in Extended Data Fig. 1. Notably, our technique for measuring record-level attack success reduces to estimating the bi-normal AUC from sample statistics and thus has desirable statistical properties. Its standard error at the record level can be computed in closed form (Methods). As expected, using a total of N = 200 target models evenly split between null and alternative hypotheses for each record, the standard error of the record-level MIA AUC is small across all records in the investigated datasets (Extended Data Fig. 2).
Attacking open-source models
Recent advances in attack design have made LR-MIAs much more practical. To illustrate the practical feasibility of conducting MIAs, we demonstrate attacks against two chest radiograph models from the TorchXrayVision library. We used the Robust Membership Inference Attack (RMIA), an improved LR-MIA that requires only one or two reference models, compared with more than 100 for the Likelihood Ratio Attack (LiRA). RMIA achieves this efficiency gain by effectively using reference data (data similar to the target record) alongside the target record to query the target model. Crucially, the attack does not require knowledge of the membership status of the reference data.
We simulated a realistic attack setting in which an attacker lacks access to the training dataset of the target model to train reference models and is further constrained by computational resources. Specifically, we used only a single pre-trained PadChest model as a reference model to perform attacks against the CheXpert and MIMIC-CXR models of the library. In this setting, also known as an offline attack, an attacker incurs no computational cost in training the reference model. Instead, they simply need to obtain predictions from the reference model for both the target record and the reference data. This can be done efficiently on commodity hardware without a graphics processing unit (GPU). To conduct the attack, we queried the target model once to collect confidence values for all target records. Using this collection, we then computed RMIA test statistics for each target record by randomly selecting, independent of membership status, N = 500 confidence values from the other targets in this collection as reference data. This strategy would effectively conceal the additional reference data queries to the target model in a real attack. We evaluated attack success on a combined dataset of records from CheXpert and MIMIC-CXR (N = 25,000 each), which were, respectively, labelled as members and non-members for the CheXpert model (and vice versa for the MIMIC-CXR model). In this setting, RMIA achieved substantial aggregate success with respective AUC scores of 0.61 and 0.65 (Fig. 2a). Note that owing to the distribution shift between members and non-members, results from this evaluation setting are not directly comparable to the standard evaluation protocol in which members and non-members are sampled at random from the training dataset. Notably, however, such a distribution shift is expected in a real attack, and this setting is thus of high interest.
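The pairwise comparison behind the attack described above can be sketched as follows. This is a simplified illustration in the spirit of RMIA, not the published attack: it uses a single reference model, omits the offline correction term of the original method, and all names and the default threshold are hypothetical. Each record's confidence under the target model is divided by its confidence under the reference model, and the score is the fraction of reference records that the target record "dominates" by a factor of at least gamma.

```python
def rmia_style_score(target_conf, target_ref_conf,
                     pop_confs, pop_ref_confs, gamma=1.0, eps=1e-12):
    """Fraction of reference records z for which ratio(x) >= gamma * ratio(z).

    target_conf:     target-model confidence on the target record x
    target_ref_conf: reference-model confidence on x
    pop_confs:       target-model confidences on the reference records
    pop_ref_confs:   reference-model confidences on the same records
    """
    if len(pop_confs) != len(pop_ref_confs) or not pop_confs:
        raise ValueError("reference data must be non-empty and aligned")
    ratio_x = target_conf / max(target_ref_conf, eps)
    wins = 0
    for conf_z, ref_conf_z in zip(pop_confs, pop_ref_confs):
        ratio_z = conf_z / max(ref_conf_z, eps)
        if ratio_x >= gamma * ratio_z:
            wins += 1
    return wins / len(pop_confs)
```

A score near 1.0 means the target model is unusually confident on the target record relative to the reference model, which is the signal a membership guess would be thresholded on.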
Near-perfect success for some patients
After demonstrating realistic attacks against two open-source models, we next investigated how effectively MIAs can compromise the privacy of individual patients. To this end, we measured patient-level MIA success across a diverse range of medical datasets using, for each, a large set of target models. Notably, we used state-of-the-art model training techniques (for example, data augmentation, weight decay and learning rate schedules) and furthermore, took explicit countermeasures to prevent overfitting, which is known to exacerbate privacy risks. As a result, the investigated target models, despite being trained on roughly half of the available data each, provide high diagnostic performance within a few percentage points of published baselines (Methods). Across all investigated datasets and models, we identified a small subset of patients who are highly vulnerable to LR-MIAs. This is indicated by empirical survival functions (eSF) of patient-level MIA AUC scores, which, for a given score, show the proportion of patients with this score or higher (Fig. 2b). By contrast, ROC curves of aggregate attack success and their corresponding AUC scores do not deviate substantially from the random-guessing baseline, thus incorrectly indicating a low attack vulnerability (Fig. 2c and Extended Data Fig. 1c-e). This suggests that average-case metrics of attack success, as used in the standard evaluation protocol, are unsuitable measures of privacy risk. They do not accurately reflect that some records or patients may be highly vulnerable, whereas the vast majority are not. For the two non-imaging datasets, MIMIC-IV-ED (electronic health records) and PTB-XL (electrocardiograms), we simulated attack settings in which an attacker only has partial access to the target record (Extended Data Fig. 3).
Although MIA success generally decreases under partial data access, a subset of patients retain high AUC scores, even in settings in which the attacker has access to only basic clinical information -- such as a patient's age, sex, chief complaints and vital signs (MIMIC-IV-ED), or only the lead I signal from a 12-lead electrocardiogram (PTB-XL). We verified how resolvable the discovered vulnerabilities are by training models with different levels of record-level (ε, δ)-DP protection (Fig. 2d,e). As expected, we find that patient-level MIA risk decreases with stronger levels of privacy protection (smaller ε values). Moreover, in most scenarios, we observe no violation of the record-level DP guarantee (indicated by the square brackets in the panel legend), although many patients contributed multiple records. Violations are observed only for a subset of patients under strong privacy protection (ε = 1), in which some patients have MIA AUC scores exceeding the upper bound on the MIA AUC implied by the record-level DP guarantee. This behaviour is expected and could be mitigated by implementing patient-level DP accounting.
Larger models, greater risks
Many of the recent AI success stories have been driven not by methodological advances but by scaling up model and dataset sizes. In light of this scaling trend, we next investigated the impact of model capacity on MIA success. For Fitzpatrick 17k and CheXpert, we trained models with increasing capacity, including wide residual networks (WRN-28-2 and WRN-40-4) and vision transformers (ViT-B/16 and ViT-L/16). Where computationally feasible, vision transformers were trained on images of different sizes: 64 × 64 and 128 × 128 pixels; this is indicated by a trailing number behind the model name (for example, ViT-B/16-64 and ViT-B/16-128). We find that MIA success (both at the aggregate and patient levels) increases with model capacity.
We observe that the relative share of patients highly vulnerable to MIAs increases greatly for larger models, often by an order of magnitude (Fig. 2f,g). For the dermatology dataset (Fitzpatrick 17k), increasing model capacity yields large gains in diagnostic performance with a pronounced increase between WRN-40-4 and ViT-B/16-128, which was pre-trained on a large dataset of more than 14 million natural images. However, simultaneously, the number of patients with near-perfect attack success (AUC score of 0.95 or higher) increases substantially: 0 (WRN-28-2), 1 out of 10,000 (WRN-40-4), 1 out of 1,000 (ViT-B/16-64) and 1 out of 10 (ViT-B/16-128). We observe a similar trend in the much larger dataset CheXpert, although attack success is generally lower. Notably, for CheXpert, vision transformer models do not achieve diagnostic performance competitive with WRN-based models. This is probably because of the diminished utility of natural-image pre-training for medical greyscale images.
Attack success varies by subgroup
Motivated by recent findings, which revealed that the diagnostic performance of AI models can differ across patient subgroups, we investigated whether differences in privacy risk exist between subgroups. To this end, we focused our analysis on the most vulnerable records (99th MIA AUC percentile) and compared how frequently a subgroup appears in this extreme-risk tail compared with the overall dataset. We did not consider differences in aggregate attack success, as we previously identified this metric as an unsuitable measure of privacy risk. We find that extreme MIA risk is unequally distributed across patient subgroups when stratifying by disease status, self-reported race, sex, imaging protocol or health insurance. More precisely, for most comparisons, we observe significant differences in subgroup composition between the most vulnerable records and the overall dataset (Fig. 3 and Extended Data Fig. 4).
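The tail-versus-overall comparison described above can be sketched in Python as follows. The sketch assumes one risk score and one subgroup label per record, takes the top-scoring fraction of records as the extreme-risk tail, and reports for each subgroup the relative change in its share and the standard Pearson residual, (observed − expected) / sqrt(expected). The function name, the tie handling (plain sort order) and the output layout are hypothetical choices, not the paper's implementation.

```python
from collections import Counter
from math import sqrt


def tail_composition(scores, groups, tail_fraction=0.01):
    """Compare subgroup frequencies in the highest-risk tail with the dataset.

    scores: per-record risk scores (for example, record-level MIA AUC)
    groups: per-record subgroup labels, aligned with scores
    tail_fraction: share of records treated as the tail (0.01 = top 1%)
    """
    n = len(scores)
    k = max(1, round(n * tail_fraction))
    # Indices of the k records with the highest scores.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    overall = Counter(groups)
    tail = Counter(groups[i] for i in top)
    result = {}
    for group, count in overall.items():
        expected = k * count / n      # tail count if risk were evenly spread
        observed = tail.get(group, 0)
        result[group] = {
            "relative_change": observed / expected - 1.0,
            "pearson_residual": (observed - expected) / sqrt(expected),
        }
    return result
```

A relative change of 0.31 corresponds to a subgroup appearing 31% more often in the tail than in the dataset as a whole; large positive residuals flag the subgroups that contribute most to the deviation.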
For example, in MIMIC-IV-ED, records from Black patients, patients with Medicaid insurance or patients diagnosed with cancer were observed more frequently than expected among the most vulnerable records (+31%, +126%, and +18% relative change to the overall dataset, respectively). Raw data on the composition of the extreme MIA risk tails as well as the overall datasets are provided in Supplementary Tables 3-16. To find factors that could explain the observed differences, we performed a post hoc test analysis and computed Pearson residuals for all subgroup comparisons (Fig. 3 and Extended Data Fig. 4). We primarily observe large, positive Pearson residuals for underrepresented groups in the datasets, suggesting that relative group size influences MIA risk. Consider, for example, EMBED, a mammography dataset comprising mostly negative findings, that is, unremarkable mammograms of healthy breasts with no indication of a tumour. Models for this dataset are trained to predict breast density, and thus never have direct access to tumour findings. Despite this, benign tumour findings (BI-RADS-2) and tumour findings suspicious of malignancy (BI-RADS-4) account for a disproportionately large share of the most vulnerable records (+60% and +1,179% relative change to the overall dataset, respectively). Similarly, otherwise relatively uncommon images of almost entirely fatty (BI-RADS-A) or extremely dense (BI-RADS-D) breasts also occur disproportionately frequently (+90% and +755%, respectively). To further investigate the relationship between group size and MIA risk, we conducted a meta-analysis of all computed Pearson residuals (Extended Data Fig. 5). Confirming previous observations, we find that large positive Pearson residuals occur mostly for small groups (those that contribute less than 20% of the records of a dataset). Moreover, we observe a weak to moderate negative correlation between group size and Pearson residuals. 
This suggests that the observed differences in MIA risk may, at least in part, be driven by group-size differences in the training data.
Discussion
We present data from the first patient-level privacy audit of medical AI models. Our findings confirm early observations of MIA risk heterogeneity and, at the same time, substantially advance previous AI privacy auditing efforts along three key dimensions. First, our work marks a shift towards patient-level risk assessment, which is crucial for real-world clinical datasets, in which individuals often contribute multiple, similar records. Second, we demonstrate that aggregate success rates, as used in the standard evaluation protocol and previous subgroup analyses, underestimate true privacy risks. Third, we confirm that MIA vulnerabilities previously observed on low-dimensional benchmark datasets are present, and arguably more critical, in large representative clinical datasets. Below, we briefly discuss our findings and their implications. The fact that MIAs can achieve near-perfect success rates for individual patients is not adequately captured by the standard evaluation protocol, which measures attack success in aggregate across records. This remains true even when evaluating aggregate attack success at very low false-positive rates (for example, 10), which is the current standard practice. Thus, reporting standards for AI privacy audits need to change. Audits should report the success of privacy attacks at the level of individual data contributors or, if the necessary patient- or person-level identifiers are unavailable, at the record level. We observed that the number of patients highly vulnerable to MIAs increases drastically for larger models. Although the magnitude of this change in patient-level risk was previously unknown, other works have also reported greater attack success against larger, more performant models.
This observation that privacy risks grow with model size and predictive performance is explained by theoretical research, which postulates that, for long-tailed data distributions, fitting atypical records from the tail is necessary to achieve optimal performance on unseen data at test time. Our results provide further empirical support for this theory and, together, suggest that a trade-off between patient privacy and model performance is inevitable, particularly for rare diseases. Generally, as we found that the number of patients highly vulnerable to MIAs increases by orders of magnitude with larger models, we recommend carefully evaluating the need for the performance improvements they offer. We found substantial differences in the frequency with which patients from different subgroups experience extreme MIA risk. The fact that some of these groups (for example, self-reported race subgroups in chest radiographs) are not readily distinguishable by human experts raises concerns that MIA risk differences, which probably exist beyond the stratification variables we investigated, may pass unnoticed in practice. We found that the observed risk differences are driven, at least in part, by group-size differences in the training data. Groups of patients that are underrepresented in a model training dataset are often overrepresented among the records most susceptible to MIAs. By contrast, the opposite often holds for majority groups. This finding -- that a disproportionately large share of the AI privacy risk burden rests on underrepresented groups -- complements the existing literature on health inequalities, which has reported worse health outcomes and life expectancy for marginalized and minority groups. Our findings suggest that current trends in medical AI development and deployment could exacerbate these health inequalities. 
Previous research has shown that the diagnostic performance of AI models, which typically increases with the amount of suitable training data, can be significantly lower for underrepresented (minority) groups. Thus, there is a possibility of a vicious cycle in which minority groups place decreasing levels of trust in AI model performance and security, leading to a decreased willingness to contribute to model training datasets. MIAs facilitate data extraction attacks against generative AI models. Thus, our findings have potentially far-reaching implications for generative AI privacy risk assessments. Extraction attacks allow for high-fidelity reconstruction of full individual records from the training dataset of a model and have been demonstrated for large language models, diffusion-based image generation models and recently, aligned, production-level large language models. Although our study focused on discriminative (diagnostic) AI models, the type of attack we studied is generally applicable and can be used against generative models with little to no modification. We thus see the exploration of our proposed methodology for estimating record- and patient-level MIA success against generative models as an interesting direction for future research. Given the substantial computational resources this would require, exploring scalable approximation techniques is another valuable avenue to investigate. Unlocking the full potential of medical AI will require training models on vast medical datasets; this depends on gaining and upholding the trust of data-contributing patients. To this end, mathematically verifiable approaches to risk mitigation, such as DP, are emerging as the most promising solution. DP, by carefully perturbing parameter updates with white noise during model training or fine-tuning, limits the contribution of the data of any individual to the parameter update and, by extension, to the final model. 
This provably protects the privacy of any data-contributing patient, no matter how unique or atypical their data may be. Our experimental data confirmed that stronger levels of DP protection effectively reduce MIA success for all data-contributing patients. However, we also observed that mitigating MIAs requires stronger levels of DP protection than previously thought. Specifically, our results indicate that fully mitigating MIAs for all data-contributing patients requires implementing DP protection at the patient level rather than at the record level. Recent research has demonstrated that, in practice, AI models can be trained with strong privacy guarantees while incurring minimal degradation in predictive performance compared with a non-private model. We are thus optimistic that medical AI models protected by DP will have a significant positive impact on health outcomes globally without endangering the privacy of any data-contributing patient. In summary, we present evidence that MIAs can be highly effective at compromising the privacy of individual data-contributing patients. Given this vulnerability, medical AI models and their deployment contexts should be assessed for the sensitive information that attackers could obtain by successfully inferring training dataset membership. To prevent privacy harm, we recommend that vulnerable models be protected by verifiable risk mitigation strategies and/or strict access controls.
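The DP mechanism described in the discussion above (bounding each contribution to a parameter update and perturbing the update with noise) can be illustrated with a minimal pure-Python sketch of one DP-SGD-style aggregation step. It is not the training setup used in the paper: privacy accounting is omitted, and the function name and parameters are hypothetical. Each input gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged.

```python
import math
import random


def dp_mean_gradient(per_unit_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each unit's gradient, sum, add Gaussian noise, then average.

    per_unit_grads: list of gradient vectors (lists of floats), one per
        protected unit (a record, or a patient if that patient's record
        gradients were summed beforehand)
    clip_norm: maximum L2 norm allowed for any single unit's gradient
    noise_multiplier: noise standard deviation as a multiple of clip_norm
    """
    rng = rng or random.Random(0)
    dim = len(per_unit_grads[0])
    total = [0.0] * dim
    for grad in per_unit_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for j, g in enumerate(grad):
            total[j] += g * scale
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_unit_grads) for v in noisy]
```

Under the assumption that each input vector aggregates all records of one patient (and that sampling is also done per patient), the clipping bounds a whole patient's influence on the update, which is the idea behind moving from record-level to patient-level protection.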
[2]
Medical diagnosis AIs can be tricked into telling whose data trained them
AI models used to help diagnose medical conditions have a problem: They're ready and willing to identify patients whose data was used to train them. German researchers reported in a Nature paper published Wednesday that discriminative AI models - those used to classify data and make predictions about new inputs based on their training sets - are particularly susceptible to membership inference attacks (MIAs) that query the models in an attempt to figure out whether a particular datapoint is included in their training sets. What that means for medical AI models is that any patient whose data is used to educate the bot could be exposed, leading to details about their medical history and diagnoses being leaked. In an analysis of seven medical AI datasets consisting of images, ECG records, and general electronic health records, the team determined that individual patients targeted by such attacks can be identified with "near-perfect attack success," which they explain flies in the face of how such models are evaluated for safety. "The fact that MIAs can achieve near-perfect success rates for individual patients is not adequately captured by the standard evaluation protocol, which measures attack success in aggregate across records," the researchers said. Based on their findings, they conclude, reporting standards for AI privacy audits need to change. It gets worse, too: Patients in the dataset are generally easy to identify and, unsurprisingly, those underrepresented in medical AI training data are even easier to finger than those whose data doesn't stand out. Underrepresented groups can include those in a number of sensitive categories: Race, insurance status, sex, the protocol used to conduct medical imaging, and certain disease statuses can all function as outliers that make it easier to identify individuals.
"Generally speaking, privacy risks from MIAs become more severe as a model's training cohort becomes more specific," Technical University of Munich AI in Healthcare and Medicine chair and paper lead author Moritz Knolle told The Register in an email conversation. "You could imagine ... scenarios where membership in a training dataset reveals that someone has a dormant genetic condition such as Huntington's disease, depression, or attended a specific, specialised treatment clinic." In other words, exposing healthcare AI training data could be used to identify those with sensitive health conditions, spill secrets they may not want public, or otherwise fuel discrimination. To make matters worse, the larger the model, the easier it is to expose records, and "the magnitude of this change in patient-level risk was previously unknown," the paper notes.
The privacy devil in the data details
This is bad and all, but it's not necessarily the end of the world, as performing an MIA on a medical AI model presupposes the attacker already has a few things at their disposal, namely at least some medical data belonging to the people they want to identify. "To conduct a MIA an attacker needs access to a target data point," Knolle confirmed to us while also noting that their paper revealed access to a full patient data point isn't needed, in contrast to what was previously believed. "In our paper we show that an attacker with partial access can still successfully conduct MIAs." The MIA itself, as detailed in the paper, relies on medical AIs being more certain of their predictions if the input data is already part of their training set. A potential attacker, then, simply peppers an AI model with obtained patient data, checks the confidence level, and, if that confidence is unusually high, surmises that said patient is part of the training data. "An attacker conducting a MIA does not need to know who the data belongs to that they are trying to conduct the MIA with," Knolle explained.
"In fact, all the datasets we use in our study were anonymized." Anonymized in the datasets, but not the target data, that is. As explained in the paper, their MIAs were close to error-free for some individual patients, meaning confidence levels can be an accurate way to figure out if a particular patient's data is part of a training set. "The attacker would simply need access to someone's blood test results, or part of these results" in order to infer inclusion, Knolle said. Of course, they have to get that data first, but given how frequently healthcare data is exposed in breaches, it's not exactly hard to imagine a bad actor getting ahold of something they can use. "Given that medical data is not always securely stored it is not unthinkable that an attacker could get access, for example, by gaining unauthorized access to the database of your general practitioner after they performed a routine blood test," Knolle said.
How to protect patient data?
Asked what he hopes this research accomplishes, Knolle told us he just wants the medical world to understand that AI training data needs to be better secured. "I hope that the medical AI community will start to take privacy risks seriously and that risk mitigation techniques are used in situations where they are necessary," Knolle said. The researchers make several recommendations for how to do this, like through the use of differential privacy frameworks that are designed to mathematically guarantee training data remains anonymous - a key consideration if medical AI firms want patients to trust them with their data. As mentioned above, the team also wants to see privacy audit standards change to consider individual-level data, not just aggregate privacy risks. Alternatively, medical AI training data could just be compiled so that underrepresented groups are better represented, Knolle said. "There are many situations where a successful MIA represents a small or negligible privacy violation," Knolle noted.
"These are situations where AI models are trained on large, general populations in which both healthy and diseased individuals are represented in sufficient numbers." Representation, in other words, definitely matters when it comes to keeping patient data private. ®
German researchers published findings in Nature showing AI models used for medical diagnosis can be exploited to identify patients whose data trained them. Membership inference attacks achieved near-perfect success rates for individual patients, with underrepresented groups facing disproportionately high privacy risks. The study calls for urgent changes to privacy audit standards.

AI models in healthcare designed to improve diagnostic accuracy carry severe privacy risks that could expose sensitive patient information, according to research published in Nature [1]. Researchers at the Technical University of Munich in Germany conducted one of the first patient-level privacy audits of medical AI, revealing that discriminative AI models, which classify data and make predictions, are particularly vulnerable to membership inference attacks [2].

The study examined seven large datasets comprising medical images, electrocardiograms, and electronic health records to assess how easily attackers could determine whether specific patient data was used to train AI models [1]. The findings expose a troubling reality: while aggregate privacy metrics might suggest minimal risk, individual privacy risks can be very high for certain patients.

Membership inference attacks achieved near-perfect success rates for individual patients, even when aggregate performance across all records showed no substantial deviation from random guessing [1]. These attacks exploit a characteristic of such models: they show higher confidence in their predictions when presented with data that was already part of their training set. Attackers can simply query a model with obtained patient data, check the confidence level, and infer whether that patient contributed to the training dataset [2].

The implications extend beyond mere data exposure. When a model is trained on a disease-specific cohort, a successful membership inference attack directly reveals sensitive medical information. For instance, inferring that a patient's data was used to train a model that predicts cancer immunotherapy efficacy reveals that the individual has cancer [1]. Lead author Moritz Knolle explained that membership in certain training datasets could reveal a dormant genetic condition such as Huntington's disease, a depression diagnosis, or attendance at a specialized treatment clinic [2].

The research uncovered disparate privacy risks from medical AI that disproportionately affect certain patient populations. Groups that are underrepresented in the training data (stratified by disease status, self-reported race, insurance status, sex, or imaging protocol) face disproportionately high attack success rates [1]. Patients whose data stands out from the majority are easier to identify [2].

The number of patients experiencing high attack success increases substantially with model capacity, meaning larger, more sophisticated models amplify individual privacy risks [1]. This finding contradicts the assumption that larger datasets provide better privacy protection through anonymity, and the magnitude of this change in patient-level risk was previously unknown [2].

Conducting a membership inference attack requires the attacker to possess at least partial patient data, though not necessarily complete records. The research demonstrated that attackers with partial access can still successfully execute these attacks [2]. Given the frequency of healthcare data breaches, obtaining such information is feasible. Knolle noted that medical data is not always securely stored, and that an attacker could, for example, gain unauthorized access to a general practitioner's database after a routine blood test [2].

Crucially, attackers conducting membership inference attacks don't need to know whose data they hold. All datasets used in the study were anonymized, yet the attacks remained largely error-free at inferring whether a patient's data was in the training set [2]. This challenges the assumption that pseudonymization alone prevents re-identification in large, high-dimensional datasets [1].

The researchers conclude that aggregate privacy metrics can severely underestimate individual privacy risk and call for changes to privacy audit standards [1]. Standard evaluation protocols that measure attack success across all records fail to capture the near-perfect success rates achievable for individual patients [2].

Knolle emphasized that privacy risks from membership inference attacks become more severe as a model's training cohort becomes more specific, potentially fueling discrimination or exposing secrets patients wish to keep private [2]. The study motivates further development of risk assessment and mitigation techniques that protect all data-contributing patients, particularly as medical AI deployment accelerates [1]. Whether these disparate risk profiles extend to privacy attacks beyond membership inference remains an open question.

Summarized by Navi
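
The gap between aggregate and patient-level attack success that the summary describes can be made concrete with a small worked example. This is a hedged sketch on made-up numbers, not data or code from the study: the audit log, the patient identifiers, and the helpers aggregate_accuracy and per_patient_accuracy are all hypothetical, and each trial is assumed to record whether the patient's data was truly in the training set and what the attacker guessed.

```python
from collections import defaultdict

def aggregate_accuracy(trials):
    """Fraction of correct membership guesses over all trials pooled together."""
    return sum(guess == truth for _, truth, guess in trials) / len(trials)

def per_patient_accuracy(trials):
    """Fraction of correct membership guesses, computed separately per patient."""
    correct, total = defaultdict(int), defaultdict(int)
    for patient_id, truth, guess in trials:
        total[patient_id] += 1
        correct[patient_id] += int(guess == truth)
    return {pid: correct[pid] / total[pid] for pid in total}

# Hypothetical audit log: (patient_id, truly_in_training_set, attacker_guess).
trials = []
for n in range(9):
    for i in range(10):
        truth = i % 2 == 0  # data is in the training set on even-numbered trials
        guess = i % 4 < 2   # fixed pattern that matches truth in exactly 5 of 10 trials
        trials.append((f"typical-{n}", truth, guess))
for i in range(10):
    truth = i % 2 == 0
    trials.append(("outlier", truth, truth))  # the attacker is right every time

overall = aggregate_accuracy(trials)
by_patient = per_patient_accuracy(trials)
print(f"aggregate accuracy: {overall:.2f}")
print(f"highest patient-level accuracy: {max(by_patient.values()):.2f}")
```

In this toy log the pooled accuracy is 0.55, barely better than guessing, while the one outlier patient is classified correctly in every trial. An audit that reports only the pooled figure would miss that patient entirely.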