Two multi-billion-dollar minds have made their way to Capitol Hill: ChatGPT and Claude.
The large language models (LLMs) were provided to the federal government this month by their respective makers, OpenAI and Anthropic, for just $1. The partnerships come amid the Trump administration's push to accelerate public-sector artificial intelligence adoption.
By offering the country's 15th largest workforce virtually free access to models that cost billions of dollars to train, OpenAI and Anthropic have secured a spot in the legislative toolbox -- a position that could prove lucrative once the free trials expire and workers grow reliant on their products.
As OpenAI unveils GPT-5, which it claims reaches "PhD-level" performance, and Anthropic launches Claude Opus 4, reportedly capable of running "long tasks" for seven hours, the question becomes: What does it mean for the government to have these capabilities at its fingertips?
In the best case, LLMs could save taxpayers billions annually by streamlining operations and enhancing policy outcomes. But processing sensitive data with products owned by private companies -- and relying on imperfect, at times hallucinatory outputs for decision-making -- is risky.
"I'm a little worried that just throwing LLMs at workers, telling them, 'you can use this now, it's super cheap, and do whatever,' isn't really going to improve efficiency or effectiveness much -- and it might introduce a bunch of new risks," says Mia Hoffman, a research fellow at Georgetown's Center for Security and Emerging Technology (CSET). "Obviously, LLMs are not God-like systems; they come with a bunch of issues."
The administration's AI Action Plan, published July 23, sets out further policy recommendations, including an "AI procurement toolbox" of approved vendors for agencies to use. The plan also mandates that "all employees whose work could benefit from access to frontier language models have access to, and appropriate training for, such tools."
Under the agreements, both ChatGPT and Claude are available to agencies, with Claude's access expected to extend to the judiciary and members of Congress "pending their approval."
This will likely begin with automating back-office operations, says Lindsay Gorman, managing director and senior fellow at the German Marshall Fund's technology program -- the "less glamorous components of federal government work."
The plan also calls for some agencies to pilot using AI to "improve the delivery of services to the public." Gorman says this could mean citizen-facing AI assistants, similar to how companies use chatbots for customer service.
She also foresees reasoning models having more "application-heavy" uses, such as accelerating scientific research. OpenAI claims its o3 and o4-mini models, included in ChatGPT Enterprise, are the first that can "think with images." In theory, users can input diagrams or sketches, and the models will analyze them during their reasoning process before answering. A report from The Information claims the new models can synthesize expertise across fields like nuclear fission or pathogen detection and then suggest new experiments or ideas.
More commonly, LLMs will likely be used to analyze datasets or summarize documents.
"You could imagine a House representative or a senator compiling research on particular bills or policy ideas to help inform their work," says Gorman.
Some of this is already happening. Federal agencies publish an inventory of how they deploy AI, which listed 2,133 use cases as of January. The Department of Justice already uses ChatGPT for things like generating content, prompt-based search, and analyzing audit reports. Its January inventory included 241 AI entries -- an increase of more than 1,500% from the year before.
U.S. Immigration and Customs Enforcement (ICE) listed 19 AI use cases as of January, including the "Investigative Prioritization Aggregator," which uses machine learning to assign scores that rank targets for Homeland Security Investigations. ICE claims the tool is particularly critical in counter-opioid and fentanyl missions, where timely intelligence is essential. One could imagine this vast dataset being uploaded to an LLM for faster, more detailed analysis.
ICE has also repeatedly accessed a national AI-powered camera network through local and state law enforcement agencies without establishing a formal contract with the software provider, 404 Media reported in May.
"If you're entering queries into a model, then the companies are going to have access to those queries. So how does that get protected?" says Gorman. "What guarantees are there that if a Senate staffer inputs potentially sensitive policy information, that data will be protected -- especially from foreign actors?" She adds that many startups lack strong safeguards for government data.
LLMs also create new attack vectors such as "indirect prompt injection" -- in which malicious instructions hidden in documents or web pages the model processes can hijack its behavior -- along with more opportunities for data leaks, says Laurence Sotsky, CEO of AI tax platform Incentify. "New entry points for hackers move faster than policy," he warns.
While models don't actually work like human brains, they still reflect and sometimes amplify societal biases. For example, a paper titled "Gender Bias and Stereotypes in Large Language Models" found that LLMs are 3 to 6 times more likely to pick an occupation that stereotypically aligns with a person's gender, such as "nurse" for women and "engineer" for men.
In one real-world case, Wired revealed that the city of Rotterdam in the Netherlands used a welfare fraud detection algorithm that gave higher risk scores to immigrants, single parents, women, young people, and non-Dutch speakers without valid justification. This led to disproportionate investigations and benefit suspensions for marginalized groups. After audits found the system opaque and discriminatory, it was halted in 2021.
To mitigate these risks, the Office of Management and Budget (OMB) requires agencies to identify "High-Impact AI" -- systems whose outputs form the principal basis for decisions with legal or significant effects on civil rights or safety -- and conduct annual risk assessments. Perhaps unsurprisingly, high-impact cases are concentrated: The DOJ and the Department of Homeland Security make up only 4% of agencies but account for 45% of these cases.
That means LLMs could, in theory, be used as the basis for consequential decisions affecting civil rights or public health. Gorman warns that fully automating certain decisions -- especially in the judicial system, where they must be appealable -- would carry "extremely high" risks.
Still, she believes the near-term priority will be working out some of the "kinks" in mission-support workflows, a lower-stakes teething period that she considers "wise."