We often harp on the spectacular utility of AIs like OpenAI's ChatGPT, Perplexity, Google's Gemini and NotebookLM tools, among others. Marveling at the convenience of having a super-smart assistant just a browser tab away is easy and intoxicating. Most of these tools can clean up messy meeting transcripts, plan travel itineraries, and make sense of my product user manuals. But while we're enjoying productivity, the business models these AI giants use remain the elephant in the room. I doubt the subscription fee for basic services even covers operating costs at this stage, when businesses are focused on getting consumers hooked. For users on the free tier, these AI giants rely entirely on consuming supplied information to get smarter about our usage.
Every prompt, document upload, and innocuous question is potential fuel for the next update, helping companies profile your work and who you are. We've seen this business model with social media companies, and AI operates on the same, painfully simple principles: user data, trends, and behavioral patterns are leveraged for monetization or development. As such, it is essential that users stop treating LLMs as confidants and personal diaries -- at least the cloud-based LLMs, that is.
I wish NotebookLM was easier to integrate with other tools
If I had three wishes, one would be better NotebookLM integrations.
Posts 2
By Mahnoor Faisal
Lack of privacy and the total loss of control
Your data isn't yours online
The moment you hit Enter after a prompt, you lose custody of your data. It is whisked away to an external server farm, usually in a database you can't see, managed by administrators you don't know. This sounds like dystopian fear-mongering, but it greatly resembles cloud storage services that spawned the idea of self-hosting. Providers like OpenAI have clarified that, by default, they may review conversations and utilize your input to train their models. While they offer opt-outs, they are buried in settings menus that most users never touch. Brilliant business ideas and sensitive client emails you ask AI to rephrase also become part of the corpus.
Under regulations like GDPR, we are supposed to have the "right to be forgotten," with options like account and user data deletion. But good luck enforcing that with an LLM, because unlike a traditional database where cells are wiped or overwritten, AI models "consume" the way a brain would form neural pathways when learning a new skill. It is incredibly difficult, if not technically impossible, to excise specific data points from a trained model reliably, violating data minimization principles because, apparently, you may not delete addresses from a neural network.
Moreover, LLMs are inference machines that rapidly adapt to infer sensitive details, such as your political leanings, health status, or location, from seemingly harmless input. You might ask for a recipe, but your phrasing could inadvertently signal your demographic or cultural background to a system designed to categorize and predict.
Security vulnerabilities are non-zero
LLMs are also on the internet, after all
If the privacy arguments feel abstract, security risks make them prone to failure just as much as any other online service. For instance, in March 2023, OpenAI suffered a significant glitch where users could see the titles of other users' chat histories. A lot of this information was understandably private and confidential to the user. However, the lapse still serves as a stark reminder that your private session is one SQL error away from becoming a public statistic.
Beyond accidental bugs, the LLMs themselves are leaky sieves. Prompt injection attacks are on the rise, where malicious actors trick an LLM into ignoring its safety guardrails and spitting out its training data. If the developers' personal information or proprietary code could be regurgitated verbatim to a stranger, targeting individual user accounts isn't far-fetched. We are effectively storing PII (Personally Identifiable Information) in a glass vault.
Legal compliance is still the Wild West
Constantly playing catch-up
Lastly, the legal landscape surrounding AI is all out of breath, trying to catch up with the pace of developments in Silicon Valley. We are currently operating in an unregulated space, and feeding personal data into a cloud LLM is a legal minefield, often breaching strict frameworks like GDPR or CCPA. Italy's data protection authority temporarily banned ChatGPT in 2023 because of data processing concerns and a lack of legal basis for collecting user data.
The compliance headache gets worse with shadow AIs. Employees, desperate to work faster, are bypassing IT governance and pasting confidential business information into consumer-grade AI tools. This "shadow" usage mocks data protection agreements, and companies are clueless that their secrets are leaking through a chatbot window, leaving them vulnerable to regulatory fines and other legal ramifications from clients. By design, regulations demand consent, portability, and erasure, and cloud-based LLMs are struggling to offer all three together.
There are better ways
Look, I'm not suggesting AI users relinquish their tools and go back to Google searches and offline spreadsheets. AI is too useful to ignore, but we're currently the best guards for our own data privacy. Keeping anything remotely sensitive -- medical data, legal documents, or your deepest personal thoughts -- off the cloud is highly advisable.
For the privacy-conscious who still want the superpowers of AI, the answer is self-hosting. We are living in a golden age of open-source models like Llama 3 and Mistral. You can download these models and run them entirely offline on your own hardware, including modest desktop setups, thanks to quantization. There's a modest learning curve, but the peace of mind is worth it.
I started self-hosting LLMs and absolutely loved it
Who needs OpenAI when your home lab can do the thinking for you?
Posts 13
By Raghav Sethi