Curated by THEOUTPOST
On Fri, 11 Apr, 8:01 AM UTC
3 Sources
[1]
New method efficiently safeguards sensitive AI training data
Caption: MIT researchers enhanced a data privacy technique, making it more computationally efficient and improving the accuracy of the AI algorithms to which it is applied.
Data privacy comes with a cost. There are security techniques that protect sensitive user data, like customer addresses, from attackers who may attempt to extract them from AI models -- but they often make those models less accurate.
MIT researchers recently developed a framework, based on a new privacy metric called PAC Privacy, that could maintain the performance of an AI model while ensuring sensitive data, such as medical images or financial records, remain safe from attackers. Now, they've taken this work a step further by making their technique more computationally efficient, improving the tradeoff between accuracy and privacy, and creating a formal template that can be used to privatize virtually any algorithm without needing access to that algorithm's inner workings.
The team used their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks. They also demonstrated that more "stable" algorithms are easier to privatize with their method. A stable algorithm's predictions remain consistent even when its training data are slightly modified, and greater stability helps an algorithm make more accurate predictions on previously unseen data.
The researchers say the increased efficiency of the new PAC Privacy framework, and the four-step template one can follow to implement it, would make the technique easier to deploy in real-world situations.
"We tend to consider robustness and privacy as unrelated to, or perhaps even in conflict with, constructing a high-performance algorithm. First, we make a working algorithm, then we make it robust, and then private. We've shown that is not always the right framing. If you make your algorithm perform better in a variety of settings, you can essentially get privacy for free," says Mayuri Sridhar, an MIT graduate student and lead author of a paper on this privacy framework.
She is joined in the paper by Hanshen Xiao PhD '24, who will start as an assistant professor at Purdue University in the fall, and by senior author Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering at MIT. The research will be presented at the IEEE Symposium on Security and Privacy.
Estimating noise
To protect sensitive data that were used to train an AI model, engineers often add noise, or generic randomness, to the model so it becomes harder for an adversary to guess the original training data. This noise reduces a model's accuracy, so the less noise one can add, the better. PAC Privacy automatically estimates the smallest amount of noise one needs to add to an algorithm to achieve a desired level of privacy.
The original PAC Privacy algorithm runs a user's AI model many times on different samples of a dataset. It measures the variance as well as the correlations among these many outputs and uses this information to estimate how much noise needs to be added to protect the data. The new variant of PAC Privacy works the same way, but it does not need to represent the entire matrix of data correlations across the outputs; it only needs the output variances.
"Because the thing you are estimating is much, much smaller than the entire covariance matrix, you can do it much, much faster," Sridhar explains. This means one can scale up to much larger datasets.
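As a rough illustration of that recipe, the Python sketch below runs a toy algorithm (a column-wise mean) on many random subsamples of a dataset, estimates the per-coordinate spread of its outputs, and adds Gaussian noise scaled to that spread. It is a minimal sketch of the general idea under assumed parameters, not the authors' implementation; names such as estimate_noise_scale and the noise_multiplier knob are illustrative, and the actual PAC Privacy calibration is more involved.
```python
import numpy as np

def toy_algorithm(data: np.ndarray) -> np.ndarray:
    """Stand-in for the algorithm being privatized: here, a column-wise mean."""
    return data.mean(axis=0)

def estimate_noise_scale(data, algorithm, n_trials=200, subsample_frac=0.5, rng=None):
    """Run the algorithm on many random subsamples of the data and return the
    per-coordinate standard deviation of its outputs (a variance-only estimate,
    rather than the full output-covariance matrix)."""
    rng = np.random.default_rng(rng)
    n = len(data)
    outputs = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=int(subsample_frac * n), replace=False)
        outputs.append(algorithm(data[idx]))
    return np.stack(outputs).std(axis=0)

def privatized_release(data, algorithm, noise_multiplier=1.0, rng=None):
    """Release the algorithm's output with anisotropic Gaussian noise whose
    per-coordinate scale follows the estimated output variability."""
    rng = np.random.default_rng(rng)
    scale = estimate_noise_scale(data, algorithm, rng=rng)
    output = algorithm(data)
    return output + rng.normal(0.0, noise_multiplier * scale, size=output.shape)

# Example: privately release the column means of a small synthetic dataset.
data = np.random.default_rng(0).normal(size=(1_000, 5))
print(privatized_release(data, toy_algorithm))
```
In a real deployment the noise level would be tied to a target privacy guarantee; here noise_multiplier is simply a knob for the sketch.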
Adding noise can hurt the utility of the results, so it is important to minimize utility loss. Due to computational cost, the original PAC Privacy algorithm was limited to adding isotropic noise, which is applied uniformly in all directions. Because the new variant estimates anisotropic noise, which is tailored to specific characteristics of the training data, a user could add less overall noise to achieve the same level of privacy, boosting the accuracy of the privatized algorithm.
Privacy and stability
As she studied PAC Privacy, Sridhar hypothesized that more stable algorithms would be easier to privatize with this technique. She used the more efficient variant of PAC Privacy to test this theory on several classical algorithms. Algorithms that are more stable have less variance in their outputs when their training data change slightly.
PAC Privacy breaks a dataset into chunks, runs the algorithm on each chunk of data, and measures the variance among the outputs. The greater the variance, the more noise must be added to privatize the algorithm. Employing stability techniques to decrease the variance in an algorithm's outputs would therefore also reduce the amount of noise that needs to be added to privatize it, she explains. "In the best cases, we can get these win-win scenarios," she says.
The team showed that these privacy guarantees remained strong regardless of the algorithm they tested, and that the new variant of PAC Privacy required an order of magnitude fewer trials to estimate the noise. They also tested the method in attack simulations, demonstrating that its privacy guarantees could withstand state-of-the-art attacks.
"We want to explore how algorithms could be co-designed with PAC Privacy, so the algorithm is more stable, secure, and robust from the beginning," Devadas says. The researchers also want to test their method with more complex algorithms and further explore the privacy-utility tradeoff.
"The question now is: When do these win-win situations happen, and how can we make them happen more often?" Sridhar says.
"I think the key advantage PAC Privacy has in this setting over other privacy definitions is that it is a black box -- you don't need to manually analyze each individual query to privatize the results. It can be done completely automatically. We are actively building a PAC-enabled database by extending existing SQL engines to support practical, automated, and efficient private data analytics," says Xiangyao Yu, an assistant professor in the computer sciences department at the University of Wisconsin at Madison, who was not involved with this study.
This research is supported, in part, by Cisco Systems, Capital One, the U.S. Department of Defense, and a MathWorks Fellowship.
[2]
New method efficiently safeguards sensitive AI training data
[3]
New method efficiently safeguards sensitive AI training data
MIT researchers have developed an enhanced version of the PAC Privacy framework, improving the balance between AI model accuracy and data privacy protection. The new method is more computationally efficient and can be applied to various algorithms without accessing their inner workings.
Researchers at the Massachusetts Institute of Technology (MIT) have made significant strides in safeguarding sensitive data used in AI training while maintaining model performance. The team, led by graduate student Mayuri Sridhar, has enhanced a privacy metric called PAC Privacy, making it more computationally efficient and improving the trade-off between accuracy and privacy in AI models [1][2][3].
Data privacy has long been a concern in AI development, with existing security techniques often compromising model accuracy. The enhanced PAC Privacy framework addresses this issue by efficiently estimating the minimum amount of noise needed to protect sensitive data without significantly impacting model performance [1].
The new variant of PAC Privacy offers several advantages over its predecessor:
Increased computational efficiency: The method now focuses on output variances rather than entire data correlation matrices, allowing for faster processing and scalability to larger datasets [2] (see the sketch after this list).
Anisotropic noise estimation: Unlike the original version that added uniform noise, the new variant tailors noise to specific data characteristics, resulting in less overall noise and improved accuracy [1][2].
Broader applicability: The researchers have created a formal template that can privatize virtually any algorithm without needing access to its inner workings [3].
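The sketch below, referenced in the first item above, contrasts what must be estimated in each case: a length-d vector of output variances versus a d-by-d covariance matrix. The numbers are illustrative assumptions, not figures from the paper.
```python
import numpy as np

# Illustrative sizes only: an algorithm whose output has d coordinates,
# measured over n_trials repeated runs.
d, n_trials = 2_000, 300
outputs = np.random.default_rng(2).normal(size=(n_trials, d))

variances = outputs.var(axis=0)             # what the new variant estimates: d numbers
covariance = np.cov(outputs, rowvar=False)  # what the original needed: d * d numbers

print(variances.size, covariance.size)      # 2000 vs 4000000
```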
The research team discovered a correlation between algorithm stability and ease of privatization. More stable algorithms, whose predictions remain consistent under slight modifications to their training data, are easier to privatize using the PAC Privacy method [1][2][3].
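To illustrate this relationship, here is a small Python sketch of the chunk-and-measure procedure described in the source article: split the data into chunks, run the algorithm on each chunk, and look at how much the outputs vary. It is a toy example under assumed settings, not code from the paper; a stable statistic such as the mean varies little between chunks, while an unstable one such as the maximum varies a lot and would therefore need much more noise to privatize.
```python
import numpy as np

def chunked_output_std(data, algorithm, n_chunks=20, rng=None):
    """Split the dataset into disjoint chunks, run the algorithm on each chunk,
    and return the spread of the outputs -- the quantity that drives how much
    noise a PAC-Privacy-style calibration would add."""
    rng = np.random.default_rng(rng)
    chunks = np.array_split(rng.permutation(data), n_chunks)
    outputs = np.array([algorithm(chunk) for chunk in chunks])
    return outputs.std()

data = np.random.default_rng(1).normal(size=10_000)

# A stable statistic (the mean) barely moves between chunks...
print("mean:", chunked_output_std(data, np.mean))
# ...while an unstable one (the maximum) swings widely, so it would need
# much more noise to privatize.
print("max: ", chunked_output_std(data, np.max))
```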
The enhanced PAC Privacy framework has significant potential for real-world applications:
Protecting sensitive data: The method can safeguard various types of sensitive information, including medical images and financial records [1][2][3].
Improved privacy-utility trade-off: The new approach allows for a better balance between data protection and model accuracy [1][2].
Withstanding attacks: The team demonstrated that the privacy guarantees could withstand state-of-the-art attacks in simulations [2][3].
Future research will focus on co-designing algorithms with PAC Privacy to enhance stability, security, and robustness from the outset. The team also plans to test the method with more complex algorithms and further explore the privacy-utility trade-off [1][2][3].
As Sridhar notes, "The question now is: When do these win-win situations happen, and how can we make them happen more often?" [1][2][3]. This research opens up new possibilities for creating AI systems that are both highly accurate and respectful of data privacy, potentially making strong privacy protections easier to deploy in real-world AI development.
References
[1] New method efficiently safeguards sensitive AI training data
[2] New method efficiently safeguards sensitive AI training data
[3] New method efficiently safeguards sensitive AI training data