Sources
[1]
OpenAI introduces Codex, its first full-fledged AI agent for coding
We've been expecting it for a while, and now it's here: OpenAI has introduced an agentic coding tool called Codex in research preview. The tool is meant to allow experienced developers to delegate rote and relatively simple programming tasks to an AI agent that will generate production-ready code and show its work along the way. Codex is a unique interface (not to be confused with the Codex CLI tool introduced by OpenAI last month) that can be reached from the sidebar in the ChatGPT web app. Users enter a prompt and then click either "code" to have it begin producing code, or "ask" to have it answer questions and advise. Whenever it's given a task, that task is performed in a distinct container that is preloaded with the user's codebase and is meant to accurately reflect their development environment. To make Codex more effective, developers can include an "AGENTS.md" file in the repo with custom instructions, for example to contextualize and explain the code base or to communicate standards and style practices for the project -- kind of a README.md but for AI agents rather than humans. Codex is built on codex-1, a fine-tuned variation of OpenAI's o3 reasoning model that was trained using reinforcement learning on a wide range of coding tasks to analyze and generate code, and to iterate through tests along the way. OpenAI's announcement post about Codex is filled with objection handling to tackle the common refrains against AI coding agents; based on older tools and models, many developers accurately point out that LLM coding tools (especially when used for vibe coding instead of just for code completion or as an advisor) have been known to produce scripts that don't follow standards, are opaque or difficult to debug, or are insecure. The fine-tuning that led to codex-1 is meant to address these concerns in part, and it's also key that Codex shows its thinking and work every step of the way as it goes through its tasks (which can take anywhere from one to 30 minutes to complete). All that said, OpenAI notes that "it still remains essential for users to manually review and validate all agent-generated code before integration and execution." Codex is available in a research preview, but it's rolling out to all ChatGPT Pro, Enterprise, and Team users now. Plus and Edu support is coming at a later date. For now, "users will have generous access at no additional cost for the coming weeks" so that they "can explore what Codex can do," but OpenAI says it intends to introduce rate limits and a new pricing scheme later.
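The AGENTS.md idea is easy to picture. Below is a minimal, hypothetical example of what such a file might contain; the project layout, tools, and conventions are invented for illustration and are not taken from OpenAI's documentation:

```markdown
# AGENTS.md (hypothetical example)

## Project context
This repository is a Flask REST API. Business logic lives in app/services/,
HTTP handlers in app/routes/, and shared helpers in app/lib/.

## Conventions
- Format Python with black and keep lines under 100 characters.
- Every new function needs a pytest test under tests/.
- Prefer small, reviewable pull requests with a one-paragraph summary.

## How to run checks
- pytest -q
- ruff check .
```

In practice the file reads like a terse onboarding document: anything a new teammate would need to know before touching the code is fair game.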
[2]
OpenAI launches Codex, an AI coding agent, in ChatGPT
OpenAI announced on Friday it's launching a research preview of Codex, the company's most capable AI coding agent yet. Codex is powered by codex-1, a version of the company's o3 AI reasoning model optimized for software engineering tasks. OpenAI says codex-1 produces "cleaner" code than o3, adheres more precisely to instructions, and will iteratively run tests on its code until passing results are achieved. The Codex agent runs in a sandboxed, virtual computer in the cloud. By connecting with GitHub, Codex's environment can come preloaded with your code repositories. OpenAI says the AI coding agent will take anywhere from one to 30 minutes to write simple features, fix bugs, answer questions about your codebase, and run tests, among other tasks. Codex can handle multiple software engineering tasks simultaneously, says OpenAI, and it doesn't limit users from accessing their computer and browser while it's running. Codex is rolling out starting today to subscribers to ChatGPT Pro, Enterprise, and Team. OpenAI says users will have "generous access" to Codex to start, but in the coming weeks, the company will implement rate limits for the tool. Users will then have the option to purchase additional credits to use Codex, an OpenAI spokesperson tells TechCrunch. OpenAI plans to expand Codex access to ChatGPT Plus and Edu users soon. AI coding tools for software engineers -- and the "vibe coding" they enable -- have surged in popularity in recent months. The CEOs of Google and Microsoft claim that roughly 30% of their companies' code is now written by AI. In February, Anthropic released its own agentic coding tool, Claude Code, and in April, Google updated its AI coding assistant, Gemini Code Assist, with more agentic abilities. All that vibe coding has made the businesses behind AI coding platforms some of the fastest-growing in tech. Cursor, among the most popular AI coding tools, reached annualized revenue of around $300 million in April and is reportedly raising new funds at a $9 billion valuation. Now, OpenAI wants a piece of the pie. The ChatGPT maker has reportedly closed on a deal to acquire Windsurf, the developer behind another popular AI coding platform, for $3 billion. The launch of Codex makes clear that OpenAI is also building out AI coding tools of its own. Users with access to Codex can find the tool in ChatGPT's sidebar, and assign the agent new coding tasks by typing a prompt and clicking the "Code" button. Users can also ask questions about their codebase and click the "Ask" button. Below the prompting bar, users can see other tasks they've assigned Codex to do, and monitor their progress. In a briefing ahead of Codex's launch, OpenAI's Agents Research Lead, Josh Tobin, told TechCrunch the company eventually wants its AI coding agents to act as "virtual teammates," completing tasks autonomously that take human engineers "hours or even days" to accomplish. OpenAI claims it's already using Codex internally to offload repetitive tasks, scaffold new features, and draft documentation. OpenAI Product Lead Alexander Embiricos says a lot of the safety work for the company's o3 model applies to Codex as well. In a blog post, OpenAI says Codex will reliably refuse requests to develop "malicious software." Furthermore, Codex operates in an air-gapped environment, with no access to the broader internet or external APIs. This limits how dangerous Codex could be in the hands of a bad actor -- but it may also hamper its usefulness.
It's worth noting that AI coding agents, much like all generative AI systems today, are prone to mistakes. A recent study from Microsoft found that industry-leading AI coding models, such as Claude 3.7 Sonnet and o3-mini, struggled to reliably debug software. However, that doesn't seem to be dampening investor excitement in these tools. OpenAI is also updating Codex CLI, the company's recently launched open-source coding agent that runs in your terminal, with a version of its o4-mini model that's optimized for software engineering. That model is now the default in Codex CLI, and will be available in OpenAI's API for $1.50 per 1M input tokens (roughly 750,000 words, more than the entire Lord of the Rings book series) and $6 per 1M output tokens. Codex's launch marks OpenAI's latest effort to beef up ChatGPT with additional products besides the notorious chatbot. In the past year, OpenAI has added priority access to the company's AI video platform, Sora, its research agent, Deep Research, as well as its web browsing agent, Operator, as benefits for subscribers. These offerings could entice more users to sign up for a ChatGPT subscription, and, in the case of Codex specifically, convince existing subscribers to pay OpenAI more money for increased rate limits.
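For a sense of scale, the arithmetic behind those quoted rates is straightforward. The Python sketch below applies the stated $1.50-per-million-input-token and $6-per-million-output-token prices; the token counts in the example are illustrative assumptions, not OpenAI figures.

```python
# Back-of-the-envelope cost estimate at the API rates quoted above:
# $1.50 per 1M input tokens, $6 per 1M output tokens.
INPUT_PRICE_PER_TOKEN = 1.50 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 6.00 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single API call at the quoted rates."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# Hypothetical example: sending a 40,000-token slice of a repo and getting an
# 8,000-token patch back costs roughly 11 cents.
print(f"${request_cost(40_000, 8_000):.4f}")  # -> $0.1080
```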
[3]
OpenAI's Codex is part of a new cohort of agentic coding tools | TechCrunch
Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape. From GitHub's early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as an exceptionally intelligent form of autocomplete. The tools generally live in an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it's finished is largely out of reach. But these new agentic coding tools, led by products like Devin, SWE-Agent, OpenHands, and the aforementioned OpenAI Codex, are designed to work without users ever having to see the code. The goal is to operate like the manager of an engineering team, assigning issues through workplace systems like Asana or Slack and checking in when a solution has been reached. For believers in forms of highly capable AI, it's the next logical step in a natural progression of automation taking over more and more software work. "In the beginning, people just wrote code by pressing every single keystroke," explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. "GitHub Copilot was the first product that offered real auto-complete, which is kind of stage two. You're still absolutely in the loop, but sometimes you can take a shortcut." The goal for agentic systems is to move beyond developer environments entirely, instead presenting coding agents with an issue and leaving them to resolve it on their own. "We pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously," says Lieret. It's an ambitious aim, and so far, it's proven difficult. After Devin became generally available at the end of 2024, it drew scathing criticism from YouTube pundits, as well as a more measured critique from an early client at Answer.AI. The overall impression was a familiar one for vibe-coding veterans: with so many errors, overseeing the models takes as much work as doing the task manually. (While Devin's rollout has been a bit rocky, it hasn't stopped fundraisers from recognizing the potential - in March, Devin's parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.) Even supporters of the technology caution against unsupervised vibe-coding, seeing the new coding agents as powerful elements in a human-supervised development process. "Right now, and I would say, for the foreseeable future, a human has to step in at code review time to look at the code that's been written," says Robert Brennan, the CEO of All Hands AI, which maintains OpenHands. "I've seen several people work themselves into a mess by just auto-approving every bit of code that the agent writes. It gets out of hand fast." Hallucinations are an ongoing problem as well. Brennan recalls one incident in which, when asked about an API that had been released after the OpenHands agent's training data cutoff, the agent fabricated details of an API that fit the description. All Hands AI says it's working on systems to catch these hallucinations before they can cause harm, but there isn't a simple fix. Arguably the best measure of agentic programming progress is the SWE-Bench leaderboards, where developers can test their models against a set of unresolved issues from open GitHub repositories. 
OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that one of the models powering Codex, codex-1, can do better, listing a 72.1% score in its announcement - although the score came with a few caveats and hasn't been independently verified. The concern among many in the tech industry is that high benchmark scores don't necessarily translate to truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they're going to require significant oversight from human developers - particularly when tackling complex systems with multiple stages. Like most AI tools, the hope is that improvements to foundation models will come at a steady pace, eventually enabling agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial for getting there. "I think there is a little bit of a sound barrier effect," Brennan says. "The question is, how much trust can you shift to the agents, so they take more out of your workload at the end of the day?"
[4]
OpenAI Launches an Agentic, Web-Based Vibe-Coding Tool
With vibe coding all the rage, OpenAI says Codex can take on more development chores in a safe and explainable way. OpenAI is launching a cloud-based software engineering agent called Codex in an attempt to ride a wave of hype surrounding vibe coding, or building software using AI. It says this tool will let developers automate more of their work in a way that should be both safer and less opaque than existing tools. OpenAI's Codex is available through the web for ChatGPT Pro users from today. It can generate lines of code but also move through directories and run commands inside a virtual computer, automating more of the work that developers go through when writing code. "We're about to undergo a pretty seismic shift in terms of how developers can be most accelerated by agents," says Alexander Embiricos, a member of the product team at OpenAI working on agents. The latest models from rivals Anthropic and Google are already both highly skilled at coding. This OpenAI launch has pre-empted Google's expected release of a more capable coding tool at its I/O event next week, according to a report in The Information. According to numerous reports, OpenAI is in talks to acquire Windsurf (formerly Codeium), a startup that makes a popular AI coding tool, for $3 billion. A key challenge with vibe-coding is that delegating to AI can result in software that is opaque and more difficult for a person to understand and fix when bugs creep in. OpenAI says the model behind Codex has been trained to explain what it is doing more clearly and help developers fix what they are building, and that the use of a virtual computer makes the system safer by design. It is already possible to write and analyze code using ChatGPT and similar chatbots. OpenAI already offers a Codex command-line tool that can generate code. The new web-based Codex, which OpenAI calls a "research preview," runs its own mini computer within a browser. This allows it to run commands, explore folders and files, and test the code it has written autonomously. "That's really the way that we think most development is going to happen in the future," Embiricos says. "The agent will work on its own computer and will delegate to it." OpenAI says that Codex is being used by outside companies including Cisco, Temporal, Superhuman, and Kodiak. Vibe-coding has become a phenomenon thanks to a generation of AI models that are remarkably good at writing and fixing code. The same models allow more skilled developers to speed up their work, too. OpenAI has launched two other agentic AI tools over the past year: Operator, which controls a web browser and can automate online chores, and Deep Research, which carries out detailed web search and analysis in order to compile reports. Josh Tobin, who leads the agents research team at OpenAI, says Codex reflects a bigger vision for ChatGPT to evolve from a chatbot into a teammate. "We think that ChatGPT will become almost like a virtual coworker," Tobin says. "Where you can go to it not just for answers to quick questions, [but also to] collaborate with it on larger chunks of work across a wide range of different tasks."
[5]
GitHub's new AI coding agent can fix bugs for you
GitHub is launching an AI coding agent that can do things like fix bugs, add features, and improve documentation -- all on a developer's behalf. The agent is embedded directly into GitHub Copilot, and it will start working once a user assigns it a task, according to an announcement at Microsoft Build. To complete its work, GitHub says the AI coding agent will automatically boot a virtual machine, clone the repository, and analyze the codebase. It also saves its changes as it works, while providing a rundown of its reasoning in session logs. When it's finished, GitHub says the agent will tag you for review. Developers can then leave comments that the agent will automatically address. Aside from GitHub, other AI companies have revealed AI coding agents of their own. Google took the wraps off Jules in December, while OpenAI showed off ChatGPT's coding agent, called Codex, last week. "The agent also incorporates context from related issue or PR (pull request) discussions and follows any custom repository instructions, allowing it to understand both the intent behind the task and the coding standards of the project," GitHub says. The new coding agent is available to Copilot Enterprise and Copilot Pro+ through GitHub's site, its mobile app, and the GitHub Command Line Interface tool. Microsoft also announced that it's open-sourcing GitHub Copilot in Visual Studio Code, which means developers will be able to build upon the tool's AI capabilities.
[6]
Copilot's Coding Agent brings automation deeper into GitHub workflows
Think about the relationship of Photoshop and, say, Google Photos. Photoshop can perform editing and retouching tasks on photos and graphic images. Google Photos, on the other hand, is used to view pictures and share them among friends and family. One is an editor, while the other is a cloud-based sharing tool. This distinction is important when understanding the relationship between a programming environment or IDE (integrated development environment) like VS Code, Xcode, Eclipse, or JetBrains, and the online service GitHub. In this analogy, the programming environment (VS Code) is like Photoshop. It's where you create and modify code. GitHub is the cloud service. It's like Google Photos, in that it's where you share and collaborate with other coders. In this article, we'll talk about GitHub. It's important to realize that GitHub is used to store and track code for collaboration and code reviews. The IDE, like VS Code, is used to write, edit, and debug code. Generally, programmers and programming teams use both together for an integrated workflow where coding changes are managed and tracked in GitHub and created and modified in the IDE. So, with that, let's talk about what GitHub does for programmers. It's most widely known as an open-source sharing resource. GitHub hosts millions of open-source projects, which are shared with both users and coding contributors. But that's only the surface of what GitHub does. GitHub is used to manage programming projects. It provides version control, which allows for carefully controlled updates, and branches, so programmers can code and test in a new direction without mucking up the mainstream code. GitHub also allows for collaboration and issue tracking. This service lets programmers work together easily, lets different programmers work on different parts of the codebase, and still lets that codebase function as an integrated whole. GitHub is a hub for documentation, automated testing, building, deployment workflows, and code reviews. It also contains built-in project management features. When I was a mere pup, before the internet, we used to have multiple interminable three-hour, 30-person meetings every week where we discussed code status, were assigned sections of code to work on, and decided how to split out work for new features. This process was incredibly costly because no actual programming got done while a team of 30-plus professionals questioned their will to live. GitHub eliminates all that complexity (except for the few companies holding such meetings out of spite). Coordination between programmers occurs seamlessly and organically, allowing vast teams to stay on track without sacrificing hours and sanity to universally despised group meetings. Now that the non-programmers reading this understand where GitHub fits in the software development ecosystem, let's discuss Microsoft's announcements. Microsoft has announced that GitHub Copilot, its coding assistant for GitHub work, is adding agent capability. Programming, as it turns out, is a lot more than just programming. Creating and managing a piece of software is more than just typing in the syntax of a programming language to produce blocks of code and algorithms.
The code-creating lifecycle involves managing changes, making coding alterations that ripple throughout an entire codebase, coordinating work amongst team members, packaging up all the components for testing or distribution, and other management-like activities. Until now, most of the coding assistants we've seen help programmers code while they are writing code, suggesting fixes or lines of code during the creative process. GitHub Copilot's new Coding Agent moves from being what has essentially been a code suggestion tool to an autonomous coding assistant that helps manage the coding process. This kind of work is often an entry path for newer programmers. They get to know the codebase and project production practices, while the more senior developers focus on design and valuable code creation. I, therefore, found it telling how Microsoft describes the sort of work Coding Agent is best suited for. Redmond says, "The agent excels at low-to-medium complexity tasks in well-tested codebases." In other words, keep it to the safe if slightly tedious work, and let your experienced developers handle the wacky stuff. Doesn't GitHub Copilot already have an agent mode? Yes, it does. But here's the thing: GitHub Copilot Agent Mode differs from GitHub Copilot Coding Agent. Sigh, you've gotta love Microsoft and its naming conventions. Agent Mode is a feature that enables your editing environment to function more like a chatbot. From within the IDE, you can instruct Agent Mode to perform tasks at the code creation level. That could be anything from spinning up a new user interface form, to asking it questions about code functionality, to suggesting terminal commands for testing. Fundamentally, Agent Mode lives in your development environment on your computer. The new Coding Agent being announced now lives in the cloud, in GitHub. If Agent Mode does programmery-like things, Coding Agent does GitHub things. Coding Agent will fire up a virtual environment using GitHub Actions (scripts that control GitHub). Coding Agent is designed to work on its own in the background, perform automatic code generation and modification, and then integrate with GitHub's continuous integration, continuous deployment, and review processes. Using our photo analogy at the beginning of this article, GitHub Agent Mode would be like using AI to mask out an image in Photoshop, while Coding Agent would be like using AI to find all pictures of your dog and put them in an album. Both are AI and related to pictures, but they occur in different environments and accomplish different but related tasks. Microsoft was careful to explain that GitHub Copilot Coding Agent would not change the codebase on its own. The Washington State-based behemoth says the agent "Is designed from the ground up to keep your project secure and ensures that its work gets a review before it goes to production." Specifically, the company announced four key policies the AI agent is designed to follow. Given Microsoft is essentially eating its own dogfood, you can see why these restrictions have been put in. It would not be good if some AI decided to randomly add or remove features from Microsoft 365 or the Azure management environment.
Such unchecked changes could be baaad. GitHub Copilot Coding Agent will be a big help to programmers, especially folks who don't have junior team members around to pick up the tedious tasks. However, I am concerned that features like Coding Agent will reduce the need for junior team members overall. Just last week, Microsoft laid off 6,000 employees, many of whom are programmers and engineers. It's not like the company is hurting for cash. In Microsoft's FY25 Q3 financials, reported at the end of April, the company disclosed revenue of $70bn (up 13% from the same calendar quarter the year before). Net income (the money the company has left over after expenses) was $25.8bn, an increase of 18% from the same quarter in 2024. To deconstruct those figures, the company is making a billion and a half dollars more per month in profit than it did during the same quarter a year ago. An extra billion and a half. More. Per month. Yet last month, TechCrunch reported Microsoft CEO Satya Nadella said that 20-30% of the company's code was written by AI. Microsoft CTO Kevin Scott previously said he expects 95% of all software code to be written by AIs by 2030. Tools like Coding Agent are just the sort of AI-based productivity tool that can lead to layoffs. But even that's not my biggest concern. My biggest concern is that AIs at companies like Microsoft are picking up the entry and mid-level tasks that have always been the training grounds for incoming programmers. If those jobs are no longer there, it will be hard for new blood to train up to take on some of the harder and more challenging work later in their careers. This, in turn, could lead to a shortage of just the sort of trained-up talent we're going to need when the AIs reach sentience and decide not to work, or try to kill us all in our sleep. In all seriousness, you don't get seasoned professionals with wide-ranging experience if you remove the experience-getting seasoning years from everyone's career path. Tools like Coding Agent are exciting, but the potential effects are... chilling. The new Coding Agent capabilities are available to Copilot Enterprise and Copilot Pro+ (about $400/year) customers. Switching back to our discussion of development environments, Microsoft has announced it is open-sourcing GitHub Copilot in VS Code. The company said, "The AI-powered capabilities from the GitHub Copilot extensions will now be part of the same open-source repository that drives the world's most popular development tool." I take this comment to mean Microsoft is open-sourcing the plugin, not the AI itself. Even so, the ability to see how the plugin works, and the opportunity for the open source community to modify, fork, or change-up the features is a good thing. Microsoft gets a kudo for this bit of added transparency. What do you think about the direction GitHub is taking with Copilot Coding Agent? Do you see it as a powerful tool to streamline workflows or a potential threat to early-career developer opportunities? Have you tried assigning tasks to Copilot yet, or do you prefer hands-on coding for everything? What kinds of tasks would you trust an autonomous agent with in your projects? Let us know in the comments below.
[7]
OpenAI upgrades ChatGPT with Codex - and I'm seriously impressed (so far)
OpenAI's new Codex agent is essentially a vibe-coding environment based on a ChatGPT-like prompt interface. As much as the vibe-coding idea seems like a meme for wannabe cool-kid coders, the new Codex agent is impressive as heck. OpenAI described Codex as a research preview still under active development. Right now, it's available to Pro, Enterprise, and Team-tier ChatGPT users, but it's expected to release to Plus and Edu users "soon." According to the recording of OpenAI's announcement livestream, the Codex name has been applied to an evolving coding tool since as far back as 2021. That said, when I refer to Codex in this article, I'm talking about the new version being announced now. I haven't had the opportunity to get hands-on with Codex yet, so I'm taking everything I'm sharing with you from information provided by OpenAI. When I watched the announcement, I noticed that even the engineers seemed a little shocked at how capable this tool is. Codex lives on OpenAI's servers and interacts with your GitHub repositories. If the demo is to be believed (and OpenAI has repeatedly proven that unbelievable demos are real), Codex basically acts like another programmer on your team. You can tell it to fix a series of bugs, and it will go off and do just that. It asks you to approve coding changes, although it looks like it can also just go ahead and modify code. You can ask it to analyze and modify code, look for specific problems, identify problem areas and room for improvement, and other coding and maintenance tasks. Each assignment spawns a new virtual environment where the AI can go all the way from concept and design to unit testing. There is a real coding mindset change going on here. Earlier AI coding help took the form of auto-complete. Lines and even blocks of code were automatically generated based on existing code. Then we got to the point where small segments of code could be written or debugged by the AI. This is the area I've been focusing on in terms of the ZDNET programming tests. Another AI role is analysis of the overall system. Last week, I showed a remarkable new Deep Research tool that can deconstruct entire codebases and provide code reviews and recommendations. Now, with Codex, we're getting to the point where entire programming tasks can be delegated to the AI in the cloud, in much the same way those tasks were given to other programmers on a team or to junior programmers learning their way through code maintenance. OpenAI calls this "Agent-native software development, where AI not only assists you as you work but takes on work independently." The launch video demonstrated the ability of Codex to take on a variety of tasks at once, each running in its own isolated virtual environment. Programmers assigned tasks to the agent, which went off and did the work without supervision. When the work was complete, the agent returned with test results and recommended code changes. The demo showcased the Codex agent performing bug fixes, doing a scan for typos, making task suggestions, and performing project-wide refactoring (modifying code to improve structure without changing behavior). Senior developers and designers are no strangers to articulating requirements and reviewing others' work. Using Codex won't be much of a change for them.
But for developers who haven't yet developed good requirements-articulation and review skills, properly managing Codex may prove to be a bit of a challenge. Yet, if the tool performs as the demo appears to indicate it can, Codex will enable smaller teams and individual developers to accomplish more, reduce repetitive work, and be more responsive to problem reports. One of the problems I found early on with ChatGPT's coding was that it had a tendency to lose the thread or go off in its own direction. For individual blocks of code, that's annoying but not catastrophic. But if a coding agent is allowed to run fairly unsupervised, such stubborn refusal to follow directions could cause unintended and problematic consequences. To help mitigate this, OpenAI has trained Codex to follow directions specified in an AGENTS.md file. This file in the repository allows programmers and teams to steer Codex's behavior. It can contain instructions on naming conventions, formatting rules, and any other set of consistent guidelines desired in the coding process. It's essentially an extension of the ChatGPT personalization settings, but for a repository-centric team environment. OpenAI has also introduced a version of Codex called Codex CLI that runs locally on a developer's machine. Unlike the cloud-based Codex, which runs asynchronously and reports back on completion, the local version operates on the programmer's command line and is synchronous. In other words, the programmer types out an instruction and waits for the Codex CLI process to return a result. This allows a programmer to work offline with the local context of the active development machine. The demo was impressive, but during the launch video, the developers were very clear that what they were showing off and releasing is a research prototype. While it offers what they called "magical moments," it still has a long way to go. I've been trying to dig in and triangulate on what exactly this technology means for the future of development and for my development process specifically. My main product is an open-source WordPress plugin, which itself has proprietary add-on plugins. Clearly, Codex could work itself through the public repository for the open-source core plugin. But could Codex manage the relationship between one public and multiple private repositories as part of one overall project? And how would it do when testing involves not only my code but also spinning up an entire additional ecosystem -- WordPress -- to evaluate performance? As a solo programmer, I definitely see the advantages of something like Codex. Even the $200-per-month Pro subscription makes sense. Hiring a helper programmer would cost a whole lot more per month than that fee, assuming I were to achieve tangible monetizable value out of it. As a long-time team manager and professional communicator, I feel very comfortable delegating to something like Codex. Chatting with an agent isn't all that different from chatting with a team member over Slack, for example. The fact that Codex will make recommendations, draft versions, and wait for me to approve the results makes me feel a bit safer than merely letting it run loose in my code.
It does open a very interesting door for a new development lifecycle, where the human sets goals, the AI drafts possible implementations, and then the human goes back in and either approves or redirects the AI for another cycle. Based on my earlier experiences using AIs for coding, it's clear that Codex could reduce maintenance time and get fixes out to users faster. It's not quite as clear how Codex would perform adding new features based on a specifications document. It's also not clear how much more or less difficult it would be to go into the code after Codex has worked on it to tweak functionality and performance. It's interesting that AI coding is evolving across companies at about the same pace. I'm dropping another article soon on GitHub Copilot's Coding Agent, which does some of the same things that Codex does. In that article, I expressed some concern that these coding agents will replace junior and entry-level programmers. Beyond concern for human jobs, there's also the question of what critical training opportunities will be lost if we delegate a middle phase of a developer's career to the AI. There's a song in Disney's Frozen II called "Into the Unknown," performed by Idina Menzel. The song centers on the main character's internal conflict between maintaining the status quo and her familiar life, and venturing out "into the unknown." With agentic software development, more than just AI coding, the entire software industry is going into the unknown. The more we rely on AI-based systems to build our software for us, the fewer skilled maintainers there will be. That's fine as long as the AIs continue to perform and be available. But are we letting some key skills atrophy, letting some good-paying jobs go, for the convenience of delegating to a not-yet-sentient cloud-based infrastructure? Only time will tell, and hopefully we won't experience that telling when we're out of time. Do you see yourself delegating real development tasks to a tool like this? What do you think the long-term impact will be on software teams or solo developers? And do you worry about losing critical skills or roles as more of the code lifecycle is handed off to AI? Let us know in the comments below.
[8]
OpenAI Takes on Google, Anthropic With New AI Agent for Coders
OpenAI is rolling out a new artificial intelligence agent for ChatGPT users that's designed to help streamline software development as the company pushes into a crowded market of startups and large tech firms offering AI tools for coders. The agent, called Codex, will be able to write software features, fix bugs and run tests, the company said in a blog post Friday. Codex, which is still in the early stages and has limited functionality, is geared towards workers with some technical knowledge and will first be released as a "research preview" to paid ChatGPT Pro, Enterprise and Team users.
[9]
GitHub Copilot angles for promotion from assistant to agent
Microsoft's GitHub Copilot can now act as a coding agent, capable of implementing tasks or addressing posted issues within the code hosting site. What distinguishes a coding agent from an AI assistant is that it can iterate over its own output, possibly correcting errors, and can infer intermediate steps that have not been explicitly specified in order to complete a prompted task. But wait, further clarification is required. Having evidently inherited Microsoft's penchant for confusing names, the GitHub Copilot coding agent is not the same thing as the GitHub Copilot agent mode, which debuted in February. Agent mode refers to synchronous (real-time) collaboration. You set a goal and the AI helps you get there. The coding agent is for asynchronous work - you delegate tasks, the coding agent then sets off on its own to do them while you do other things. "Embedded directly into GitHub, the agent starts its work when you assign a GitHub issue to Copilot," said Thomas Dohmke, GitHub CEO, in a blog post provided to The Register ahead of the feature launch, to coincide with this year's Microsoft Build conference. "The agent spins up a secure and fully customizable development environment powered by GitHub Actions. As the agent works, it pushes commits to a draft pull request, and you can track it every step of the way through the agent session logs." Basically, once given a command, the agent uses GitHub Actions to boot a virtual machine. It then clones the relevant repository, sets up the development environment, scours the codebase, and pushes changes to a draft pull request. And this process can be traced in session log records. The feature is available to Copilot Enterprise and Copilot Pro+ users, and Dohmke insists that agents do not weaken organizational security posture because existing policies still apply and agent-authored pull requests still require human approval before they're run. By default, the agent can only push code to branches it has created. As a further backstop, the developer who asked the agent to open a pull request is not allowed to approve it. The agent's internet access is limited to predefined trusted destinations and GitHub Actions workflows require approval before they will run. With GitHub as its jurisdiction, the Copilot agent can be used to automate various development-related interactions via github.com, in GitHub Mobile, or through the GitHub CLI. But the agent can also be configured to work with MCP (model context protocol) servers in order to connect to external resources. And it can respond to input beyond text, thanks to vision capabilities in the underlying AI models. So it can interpret screenshots of desired design patterns, for example. "With its autonomous coding agent, GitHub is looking to shift Copilot from an in-editor assistant to a genuine collaborator in the development process," said Kate Holterhoff, senior analyst at RedMonk, in a statement provided by GitHub. "This evolution aims to enable teams to delegate implementation tasks and thereby achieve a more efficient allocation of developer resources across the software lifecycle." GitHub claims it has used the Copilot code agent in its own operations to handle maintenance tasks, freeing its billing team to pursue features that add value. The biz also says the Copilot agent reduced the amount of time required to get engineers up to speed with its AI models. GitHub found various people to say nice things about the Copilot agent. We'll leave it at that. ®
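GitHub's supported path for kicking this off is assigning the issue to Copilot on github.com, in GitHub Mobile, or via the GitHub CLI. For teams that script their workflows, the sketch below shows the general shape of an issue assignment using GitHub's REST API from Python; treating "Copilot" as an assignable login here is my assumption for illustration, not something confirmed in the announcement, and the repository and issue number are invented.

```python
# Hedged sketch: assigning a GitHub issue programmatically via the REST API.
# The coding agent is said to start work when an issue is assigned to Copilot;
# the "Copilot" assignee login below is an assumption, and the documented path
# remains picking Copilot as the assignee in the UI, mobile app, or CLI.
import os
import requests

OWNER, REPO, ISSUE = "my-org", "my-repo", 42  # hypothetical repo and issue number

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{ISSUE}/assignees",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"assignees": ["Copilot"]},  # assumed assignee name for the agent
    timeout=30,
)
resp.raise_for_status()
print("Issue assigned:", resp.json()["html_url"])
```

Whichever route is used, the guardrails described above still apply: the agent's output lands in a draft pull request, and someone other than the person who requested it has to approve it before anything runs.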
[10]
ChatGPT rolls out Codex, an AI tool for software programming
OpenAI is rolling out 'Codex' for ChatGPT, an AI agent that lets software engineers delegate and automate programming tasks. OpenAI isn't explicitly claiming that Codex will eventually replace junior software engineers. Instead, the company states Codex could help developers achieve more by delegating their tasks to different agents. The idea is to move faster with development and become more productive with AI, but how does Codex work? According to OpenAI, Codex is based on codex-1, a new model built on the existing o3 reasoning model and optimized for coding, which results in increased accuracy. Codex pulls the codebase from GitHub and closely mirrors the existing PR style. It can write new code, propose pull requests, and run each task in its own sandbox. "Task completion typically takes between 1 and 30 minutes, depending on complexity, and you can monitor Codex's progress in real time," OpenAI noted in a blog post. "Once Codex completes a task, it commits its changes in its environment. Codex provides verifiable evidence of its actions through citations of terminal logs and test outputs, allowing you to trace each step taken during task completion." Codex is rolling out, but only if you have a Pro subscription, which costs $200 per month.
[11]
Microsoft introduces GitHub AI agent that can code for you
Microsoft's GitHub unit on Monday introduced a Copilot artificial intelligence agent that can take on specific programming work and inform people once it has finished. From there, developers can check the agent's work from GitHub, a widely used repository for code. They can request modifications and then allow GitHub to add the source code to existing files. The launch, announced at Microsoft's Build developer conference in Seattle, shows that the technology company wants to make AI a more natural part of the process of enhancing software. The coding agent might help Microsoft distinguish its developer tools from alternatives from companies such as Atlassian and GitLab. "Using state-of-the-art models, the agent excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring code, and improving documentation," Thomas Dohmke, CEO of GitHub, wrote in a blog post. Anthropic's Claude 3.7 Sonnet AI model powers the coding agent, a GitHub spokesperson said.
[12]
OpenAI launches Codex, a new AI coding agent for software development
Bottom line: Codex aims to streamline routine programming work and set a new standard for AI-driven software engineering. As the technology matures, OpenAI envisions Codex and similar agents playing an increasingly central role in the future of software development. OpenAI has introduced Codex, a new AI-powered coding agent now available as a research preview to select ChatGPT subscribers. This launch marks a significant milestone for the company, signaling its ambition to redefine how software engineers interact with artificial intelligence in their daily workflows. Codex is built on a specialized version of OpenAI's o3 reasoning model, known as codex-1, which has been fine-tuned specifically for software engineering. This model was trained using reinforcement learning on various coding tasks, enabling it to generate cleaner, more reliable code that closely follows user instructions. Codex provides citations of terminal logs and test outputs, allowing users to trace each step during task completion. Unlike earlier models, codex-1 iteratively tests its output, ensuring its code passes necessary checks before presenting it to the user. This approach addresses longstanding concerns about AI-generated code quality, security, and transparency. The agent operates within a cloud-based sandbox that mirrors the user's development environment. By connecting to GitHub, Codex can preload a user's code repositories, allowing it to write new features, fix bugs, answer questions about the codebase, and run tests. Each task is performed in a distinct, isolated container, where the agent logs its actions, cites test results, and summarizes changes for easy review. Depending on their complexity, tasks can range from a minute to half an hour, and Codex is capable of handling multiple assignments simultaneously without interrupting the user's workflow. On coding evaluations and internal benchmarks, codex-1 shows strong performance even without AGENTS.md files or custom scaffolding. To make Codex more effective and adaptable to individual projects, developers can include an "AGENTS.md" file in their repositories. This file guides the AI, outlining project context, coding standards, and stylistic conventions - much like a README, but tailored for an AI agent. Codex is also designed to infer coding style from the codebase. Safety and security are part of Codex's design. The agent operates in an air-gapped environment, cut off from the broader internet and external APIs. This isolation minimizes the risk of misuse, such as the development of malicious software or unauthorized access to sensitive data. OpenAI has also implemented advanced monitoring systems that detect and flag potentially harmful requests in real time. Codex is programmed to refuse requests to develop malware or engage in other unethical activities. Despite these safeguards, OpenAI emphasizes that users must manually review and validate all AI-generated code before integrating it into production, as generative AI systems remain prone to errors. The Codex agent communicates with the user when uncertain or faced with test failures. Codex's research preview is currently available to ChatGPT Pro, Enterprise, and Team subscribers, with plans to expand access to ChatGPT Plus and Edu users shortly. Users will have access to the tool at no additional cost during the initial rollout; however, OpenAI intends to introduce rate limits and a paid credit system as demand increases. 
The release of Codex comes amid a surge in demand for AI-powered coding assistants and the "vibe coding" they enable. The market for these tools is rapidly expanding, with competitors like Anthropic and Google releasing or updating their agentic coding products. OpenAI itself has reportedly agreed to acquire Windsurf, a major player in the space, for $3 billion, underscoring the high stakes in this rapidly growing sector. Compared to OpenAI o3, codex-1 produces cleaner patches ready for human review and integration into standard workflows. According to OpenAI, major companies have already evaluated and adopted Codex. Cisco is testing the tool to speed up engineering workflows, while Superhuman uses Codex to improve test coverage and enable non-engineers to contribute code changes. Kodiak, an autonomous vehicle company, leverages Codex to enhance code reliability and gain insights into complex software stacks. Temporal uses it for background tasks like debugging and test writing. While Codex represents a leap forward from its predecessor, the original Codex model that powered GitHub Copilot, OpenAI acknowledges the tool's current limitations. The agent does not yet support image inputs for frontend development, and users cannot intervene while a task is running. Delegating work to the remote agent can take longer than local, interactive editing, but OpenAI anticipates that future versions will enable more complex, asynchronous collaboration, with agents capable of handling extended, multifaceted tasks.
[13]
OpenAI launches research preview of Codex AI software engineering agent for developers -- with parallel tasking
Surprise! Just days after reports emerged suggesting OpenAI was buying white-hot coding startup Windsurf, the former company appears to be launching its own competitor service as a research preview under its brand name Codex, going head-to-head against Windsurf, Cursor, and the growing list of AI coding tools offered by startups and large tech companies including Microsoft and Amazon. Unlike OpenAI's previous Codex code completion AI model, the new version is a full cloud-based AI software engineering (SWE) agent that can execute multiple development tasks in parallel. Starting today it will be available for ChatGPT Pro, Enterprise, and Team users, with support for Plus and Edu users expected soon.
Codex's evolution: from model to autonomous AI coding agent
This release marks a significant step forward in Codex's development. The original Codex debuted in 2021 as a model for translating natural language into code available through OpenAI's nascent application programming interface. It was the engine behind GitHub Copilot, the popular autocomplete-style coding assistant designed to work within IDEs like Visual Studio Code. That initial iteration focused on code generation and completion, trained on billions of lines of public source code. However, the early version came with limitations. It was prone to syntactic errors, insecure code suggestions, and biases embedded in its training data. Codex occasionally proposed superficially correct code that failed functionally, and in some cases, made problematic associations based on prompts. Despite those flaws, it showed enough promise to establish AI coding tools as a rapidly growing product category. That original model has since been deprecated and turned into the name of a new suite of products, according to an OpenAI spokesperson. GitHub Copilot officially transitioned off OpenAI's Codex model in March 2023, adopting GPT-4 as part of its Copilot X upgrade to enable deeper IDE integration, chat capabilities, and more context-aware code suggestions.
Agentic visions
The new Codex goes far beyond its predecessor. Now built to act autonomously over longer durations, Codex can write features, fix bugs, answer codebase-specific questions, run tests, and propose pull requests -- each task running in a secure, isolated cloud sandbox. The design reflects OpenAI's broader ambition to move beyond quick answers and into collaborative work. Josh Tobin, who leads the Agents Research Team at OpenAI, said during a recent briefing: "We think of agents as AI systems that can operate on your behalf for a longer period of time to accomplish big chunks of work by interacting with the real world." Codex fits squarely into this definition. "Our vision is that ChatGPT will become almost like a virtual coworker -- not just answering quick questions, but collaborating on substantial work across a range of tasks," he added.
New capabilities, new interface, new workflows
Codex tasks are initiated through a sidebar interface in ChatGPT, allowing users to prompt the agent with tasks or questions. The agent processes each request in an air-gapped environment loaded with the user's repository and configured to mirror the development setup. It logs its actions, cites test outputs, and summarizes changes -- making its work traceable and reviewable.
Alexander Embiricos, head of OpenAI's Desktop & Agents team (and the former CEO and co-founder of screenshare collaboration startup Multi that OpenAI acquired for an undisclosed sum last year) said in a briefing with journalists that "the Codex agent is a cloud-based software engineering agent that can work on many tasks in parallel, with its own computer to run safely and independently." Internally, he said, engineers already use it "like a morning to-do list -- fire off tasks to Codex and return to a batch of draft solutions ready to review or merge." Codex also supports configuration through AGENTS.md files -- project-level guides that teach the agent how to navigate a codebase, run specific tests, and follow house coding styles. "We trained our model to read code and infer style -- like whether or not to use an Oxford comma -- because code style matters as much as correctness," Embiricos said.
Security and practical use
Codex executes tasks without internet access, drawing only on user-provided code and dependencies. This design ensures secure operation and minimizes potential misuse. "This is more than just a model API," said Embiricos. "Because it runs in an air-gapped environment with human review, we can give the model a lot more freedom safely." OpenAI also reports early external use cases. Cisco is evaluating Codex for accelerating engineering work across its product lines. Temporal uses it to run background tasks like debugging and test writing. Superhuman leverages Codex to improve test coverage and enable non-engineers to suggest lightweight code changes. Kodiak, an autonomous vehicle firm, applies it to improve code reliability and gain insights into unfamiliar stack components. OpenAI is also rolling out updates to Codex CLI, its lightweight terminal agent for local development. The CLI now uses a smaller model -- codex-mini-latest -- optimized for low-latency editing and Q&A. The pricing is set at $1.50 per million input tokens and $6 per million output tokens, with a 75% caching discount. Codex is currently free to use during the rollout period, with rate limits and on-demand pricing options planned.
Does this mean OpenAI IS NOT buying Windsurf?
The release of Codex comes amid increased competition in the AI coding tools space -- and signals that OpenAI is intent on building, rather than buying, its next phase of products. According to recent data from SimilarWeb, traffic to developer-focused AI tools has surged by 75% over the past 12 weeks, underscoring the growing demand for coding assistants as essential infrastructure rather than experimental add-ons. Reports from TechCrunch and Bloomberg suggest OpenAI held acquisition talks with fast-growing AI dev tool startups Cursor and Windsurf. Cursor allegedly walked away from the table; Windsurf reportedly agreed in principle to be acquired by OpenAI for a price of $3 billion, though no deal has been officially confirmed by either OpenAI or Windsurf. Just yesterday, in fact, Windsurf debuted its own family of coding-focused foundation models, SWE-1, purpose-built to support the full software engineering lifecycle, from debugging to long-running project maintenance. The SWE-1 models were reportedly custom-built, trained entirely in-house using a new sequential data model tailored to real-world development workflows.
Many things may be happening behind the scenes between the two companies, but to me, the timing of Windsurf launching its own coding foundation model -- instead of its strategy to-date of using Llama variants and giving users the option to slot in OpenAI and Anthropic models -- followed one day later by OpenAI releasing its own Windsurf competitor, seems to suggest the two are not aligning soon. But on the other hand, the fact that this new Codex AI SWE agent is in "research preview" to start may be a form of OpenAI pressuring Windsurf or Cursor or anyone else to come to the bargaining table and strike a deal. Asked about the reports of a potential Windsurf acquisition, an OpenAI spokesperson told VentureBeat they had nothing to share on that front. In either case, Embiricos frames Codex as far more than a mere code tool or assistant. "We're about to undergo a seismic shift in how developers work with agents -- not just pairing with them in real time, but fully delegating tasks," he said. "The first experiments were just reasoning models with terminal access. The experience was magical -- they started doing things for us."
Built for dev teams, not merely solo devs
Codex is designed with professional developers in mind, but Embiricos noted that even product managers have found it helpful for suggesting or validating changes before pulling in human SWEs. This versatility reflects OpenAI's strategy of building tools that augment productivity across technical teams. Trini, an engineering lead on the project, summarized the broader ambition behind Codex: "This is a transformative change in how software engineers interface with AI and computers in general. It amplifies each person's potential." OpenAI envisions Codex as the centerpiece of a new development workflow where engineers assign high-level tasks to agents and collaborate with them asynchronously. The company is building toward deeper integrations across GitHub, ChatGPT Desktop, issue trackers, and CI systems. The long-term goal is to blend real-time pairing and long-horizon task delegation into a seamless development experience. As Josh Tobin put it, "Coding underpins so many useful things across the economy. Accelerating coding is a particularly high-leverage way to distribute the benefits of AI to humanity, including ourselves." Whether or not OpenAI closes deals for competitors, the message is clear: Codex is here, and OpenAI is betting on its own agents to lead the next chapter in developer productivity.
[14]
GitHub Copilot evolves into autonomous agent with asynchronous code testing
Microsoft's hit AI programming tool GitHub Copilot wants to move away from simply helping people complete code and, as of today, will allow users to set up asynchronous code testing. The move lets GitHub Copilot work more autonomously for developers, keeping the app competitive as the AI coding assistant space grows more crowded with AI-powered tools, including the rival Codex software engineering agent that Microsoft-backed OpenAI released Friday. GitHub Copilot Agent, first announced as Project Padawan back in February, will check, test and iterate code. When invoked, Copilot Agent can navigate the repo, edit files, run commands and open pull requests. Mario Rodriguez, chief product officer at GitHub, told VentureBeat that GitHub Copilot Agent could free developers up to focus on other tasks while ensuring any previous code they wrote works. "I could go into an issue, and before, I needed to go back into my IDE, clone that repo, open the issue to try and figure it out, et cetera, et cetera," Rodriguez said. "Now I can just assign it to Copilot and it's right there along with my other peers." He added that the Copilot Agent embeds into GitHub and follows the user's style, and that the human developer can monitor it because the agent logs its reasoning and validation steps. A developer can assign an issue to the agent just as they would to a human coworker. The agent will then respond with the eyes emoji to indicate it will begin resolving the problem. The agent taps GitHub Actions to boot up a virtual machine, then clones the repository. It decides its workflow, analyzes the codebase using GitHub's RAG code search, and continuously updates the pull request. Once it's done, the agent will tag the user for review. The agent considers context from previous pull request discussions and follows any custom repo instructions.

Changing coding space

GitHub was one of the first to launch coding assistants to help developers start to generate code faster. Over time, more and more coding assistants have come out, and code generation and review have become an expected service of AI platforms. GitHub Copilot now has to compete not only with ChatGPT, Gemini and Claude's coding abilities but also with Google's Code Assist and OpenAI's Codex. But as AI-generated code becomes more accepted, especially with the growth of vibe coding, coding services like GitHub Copilot have to evolve beyond completing code. Making Copilot more agentic makes coding help more autonomous, moving away from the human prompting Copilot at most steps and toward letting it do its own work while the developer focuses on something else. "So before you had code completion, which you always have be there, and your productivity is not going to increase as much because you are pressing every single keystroke being made," Rodriguez said. "It's an agentic experience; it's completely asynchronous to you. You could be doing one task, and Copilot could be executing on five others, and that's really the value at the end." Rodriguez said this opens up more asynchronous capabilities for GitHub.

MCP support to keep code working

Something else that's new for GitHub is support for MCP, so the Copilot Agent can communicate and get additional data for any projects it is reviewing.
MCP, or Model Context Protocol, the fast-rising agentic interoperability standard from Anthropic, standardizes more than agent communication; it offers data transfer interoperability as well. If the agent realizes the issue is missing important context or data -- a broken photo referenced in the code, for example -- it can call the relevant data source's MCP server to retrieve that information. Rodriguez said that GitHub Copilot Agent, true to its former codename Padawan, learns and assists developers, freeing them up to work on their ideas without focusing so much on maintaining code. "If you believe that software powers everything in the world right now, that the next big invention is gonna be powered by software, then what you want to be doing is giving these developers the best tools on the planet. Copilot can work on the other projects, and then I could work on the one that is the creativity that needs me as a human, as a creative," he said.
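To make the MCP idea more concrete, here is a minimal sketch of the kind of tool server an agent could query for the missing context Rodriguez describes. It assumes the official MCP Python SDK and its FastMCP helper; the server name, the lookup_asset tool, and its return values are hypothetical examples, not part of GitHub's or Anthropic's actual configuration.

```python
# Minimal MCP server sketch (hypothetical example, assuming the official
# MCP Python SDK's FastMCP helper: `pip install mcp`).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("asset-context")  # hypothetical server name


@mcp.tool()
def lookup_asset(path: str) -> dict:
    """Return metadata about a static asset referenced in the codebase,
    e.g. so an agent can check whether an image path is broken."""
    # In a real server this would query a CMS, CDN, or asset database;
    # here a tiny in-memory table stands in for that data source.
    known_assets = {"img/logo.png": {"exists": True, "size_kb": 42}}
    return known_assets.get(path, {"exists": False})


if __name__ == "__main__":
    # Serve over stdio so an agent configured to use this MCP server can
    # call lookup_asset() whenever it needs extra context about an asset.
    mcp.run()
```

An agent wired to a server like this could resolve the "broken photo" example above by calling the tool rather than guessing from the repository contents alone.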
[15]
AI agents will do programmers' grunt work
Driving the news: Microsoft on Monday announced a new AI coding agent for GitHub Copilot that's good for "time-consuming but boring tasks."
The intrigue: Tech leaders have sent mixed messages on just how much work they see ahead for programmers.
Why it matters: Business transformations that start in Silicon Valley usually make their way into the wider economy. Coding agents, like other generative AI tools, continue to "hallucinate," or make stuff up.
Yes, but: AI-generated code likely also contains tons of other errors that don't show up today.
Zoom out: The software industry's assumption that what works inside tech will work everywhere else could be sorely tested when these techniques get pushed out beyond Silicon Valley.
Between the lines: Nobody doubts that AI means tech firms will write more code using fewer employees. But no one yet knows exactly where these companies will continue to find competitive advantage.
What's next: As coding agents shoulder routine labor, product designers and creative engineers will use "vibe coding" -- improvisational rough drafting via "throw it at the wall and see what works" AI prompting -- to do fast prototyping of new ideas.
The bottom line: The biggest challenges in creating software tend to arise from poorly conceived specifications and misinterpretations of data, both of which are often rooted in confusion over human needs.
[16]
Codex, OpenAI's New Coding Agent, Wants to Be a World-Killer
Though artificial intelligence is taking the world by storm, it's still pretty bad at tasks demanding a high degree of flexibility, like writing computer code. Earlier this year, ChatGPT maker OpenAI published a white paper taking AI to task for its lackluster performance in a coding scrum. Among other things, it found that even the most advanced AI models are "still unable to solve the majority" of coding tasks. Later in an interview, OpenAI CEO Sam Altman said that these models are "on the precipice of being incredible at software engineering," adding that "software engineering by the end of 2025 looks very different than software engineering at the beginning of 2025." It was a bold prediction without much substance to back it -- if anything, generative AI like the kind Altman peddles has only gotten worse at coding as hallucination rates increase with each new iteration. Now we know what he was playing at. Early on Friday, OpenAI revealed a preview of Codex, the company's stab at a specialty coding "agent" -- a fluffy industry term that seems to change definitions depending on which company is trying to sell one to you. "Codex is a cloud-based software engineering agent that can work on many tasks in parallel," the company's research preview reads. The new tool will seemingly help software engineers by writing new features, debugging existing code, and answering questions about source code, among other tasks. Contrary to ChatGPT's everything-in-a-box model, which is geared toward the mass market, Codex has been trained to "generate code that closely mirrors human style and PR preferences." That's a charitable way to say "steal other people's code" -- an AI training tactic OpenAI has been sued for in the not-too-distant past, when it helped Microsoft's Copilot go to town on open-source and copyrighted code shared on GitHub. Thanks in large part to a technicality, OpenAI, GitHub, and Microsoft came out of that legal scuffle pretty much unscathed, giving OpenAI some convenient legal armor should it choose to go it alone with its own in-house model trained on GitHub code. In the Codex release, OpenAI claims its coding agent operates entirely in the cloud, cut off from the internet, meaning it can't scour the web for data like ChatGPT. Instead, OpenAI "limits the agent's interaction solely to the code explicitly provided via GitHub repositories and pre-installed dependencies configured by the user via a setup script."
[17]
OpenAI's New Codex Agents Get Closer to Downsizing Your Dev Team - Decrypt
OpenAI just unveiled the latest iteration of Codex -- a system of cloud-powered AI agents that can tackle multiple programming tasks simultaneously without tying up your laptop's resources. The announcement comes just one month after OpenAI released its free, open-source version, "Codex CLI." Unlike the normal code-completion or code-generation tools we've become accustomed to during vibe coding sessions -- in which we iterate with an AI chatbot several times until the results are satisfactory -- Codex operates as a semi-autonomous agent with its own computing resources. It's capable of handling everything from bug hunting to complex refactoring. "Software engineering is changing, and by the end of 2025 it's going to look fundamentally different," OpenAI President Greg Brockman said in an official presentation. "Today, we're going to take a step towards where we think software engineering is going, and we are releasing a new system, which is a remote software agent that can run many tasks in parallel." The demo showed how OpenAI developers deployed multiple AI agents working in parallel on different portions of a codebase, each in its own isolated environment, handling different tasks with minimal human guidance. "This change probably would have taken me at least 30 minutes or even hours to debug," Hansen, one team member, noted during the presentation, while the AI completed the task in the background during their conversation. "We find Codex to be as trustworthy, if not more trustworthy, than our own co-workers," Katie, another OpenAI researcher, claimed. Unlike the previous CLI version, this new Codex runs on OpenAI's infrastructure instead of locally. Its agentic architecture also means the AI can review and improve itself. In other words, developers can fire off multiple coding tasks simultaneously and return later to review the results, because an agent is able to review its own outputs -- and fix its own bugs. The system is powered by a new model called codex-1, which OpenAI described as its "best coding model to date." "We've taken o3 and we've optimized it for not just the benchmarks, but really for the kind of code that people actually want to merge into their codebase," Brockman said. OpenAI emphasized that Codex is still a "research preview," and said there's more development ahead. But you'll have to pay for OpenAI's premium tier if you want to test it. Initially, Codex is available for ChatGPT Pro, Enterprise, and Team users, with plans to expand to Plus and Edu users in the future. OpenAI said it is starting with "very generous rate limits" and no additional pricing, though this will change as it gathers usage data and feedback. Looking ahead, OpenAI plans to integrate Codex with issue trackers and CI systems, potentially automating even more of the development lifecycle. The company also continues to develop Codex CLI, the open-source, local agent that runs on developers' own machines, and envisions a future where local and remote versions work together seamlessly. "What you really want is a remote co-worker with its own computer, but who can also look over your shoulder," Brockman explained. "You're there typing away, working on some change, and you're like, 'Ah, I want to go to lunch. Codex, can you finish this?' It just takes it over seamlessly and runs it in the cloud."
[18]
Microsoft Introduces GitHub AI Agent, Now With More Vibe Coding - Decrypt
Copilot's new capabilities aim to streamline development while maintaining existing security protocols. GitHub Copilot is no longer just an autocomplete tool. At Microsoft Build 2025 on Monday, the company revealed it's transforming Copilot into a full AI agent -- capable of thinking, reasoning, and writing code while a developer steps away for a coffee break. This shift toward so-called "vibe coding" -- where developers describe goals in natural language and let AI handle the implementation -- underscores a broader evolution in software development, one that Microsoft CEO Satya Nadella described as a seismic platform shift. "Here we are in 2025, building out this open, agentic web at scale," he said. "And we're going from these few apps with vertically integrated stacks to more of a platform that enables this open, scalable, agentic web." Nadella compared today's reveal to earlier moments in the company's history, including the launch of a more powerful 64-bit version of Windows, the web stack, and the rise of cloud computing and the modern mobile web. Microsoft acquired GitHub in 2018. In 2021, it launched GitHub Copilot in collaboration with OpenAI, the creator of ChatGPT. The tool is available natively in GitHub or through Microsoft's open-source code editor, VS Code. While GitHub Copilot is free for all users, only Pro and Pro+ subscribers will have unlimited access to the chatbot's more advanced features. Last week, OpenAI launched the latest version of Codex, its cloud-based platform for AI coding agents (the companion Codex CLI tool is free and open source). These agents are designed to handle multiple programming tasks simultaneously, reducing the need for large development teams. "Vibe coding" has become popular of late, but AI's influence on code stretches back years before the phrase caught on. Emad Mostaque, former CEO of Stability AI, suggested in 2023 that 41% of GitHub's code was AI-generated. In 2024, a GitHub report showed a 59% increase in contributions to generative AI projects and a 98% rise in new projects. A separate survey by developer platform Opsera found that more than 80% of respondents had installed the GitHub Copilot IDE extension, reflecting the technology's growing adoption. In a live demo during Monday's keynote, Nadella showed a GitHub issue being assigned to Copilot, demonstrating how it operates in a sandboxed environment with built-in security protocols. Once Copilot is done working, the program notifies the user so that they can review the code. "Copilot can now learn your company's unique tone and language," Nadella said in a follow-up post on X. "It is all about taking that expertise you have as a firm and further amplifying it so everyone has access." To support a broader community of developers, Nadella said Microsoft is opening up Copilot's foundational tools so others can build their own specialized agents. "We're also making these core capabilities available to partners to help create an open and secure ecosystem of agents," he said. "Whether for SRE, code review, or the many other things developers will build." Even as GitHub Copilot evolves into a fully autonomous coding agent, GitHub CEO Thomas Dohmke said the program is designed to operate transparently and securely, to fit into existing developer workflows. "As the agent works, it pushes commits to a draft pull request, and you can track it every step of the way through the agent session logs," Dohmke said in a statement.
"Having Copilot on your team doesn't mean weakening your security posture -- existing policies like branch protections still apply in exactly the way you'd expect," he added.
[19]
At Build, Microsoft introduces GitHub Copilot coding agent that works like a developer - SiliconANGLE
Microsoft Corp. today rolled out a new feature for GitHub Copilot: the ability to implement tasks or issues while running in the background as an artificial intelligence agent, like a developer. "The agent spins up a secure and fully customizable development environment powered by GitHub Actions," said Thomas Dohmke, chief executive of Microsoft-owned GitHub. "As the agent works, it pushes commits to a draft pull request and you can track it every step of the way through the agent session logs." GitHub Copilot can now act as an asynchronous AI developer partner directly integrated into the GitHub platform. The new capability integrates agentic AI, a growing trend in artificial intelligence that enables models to act independently of most human oversight, complete tasks and work toward goals autonomously. The new GitHub Copilot isn't entirely free from human oversight, though: its pull requests still require human approval before continuous integration and deployment workflows are run. Microsoft said the agent excels at drudge work and low- to medium-complexity tasks in well-tested codebases -- for example, adding features, fixing bugs, extending tests, refactoring code and improving documentation. "The GitHub Copilot coding agent is opening up doors for each developer to have their own team, all working in parallel to amplify their work," said James Zabinski, DevEx lead at Ernst & Young Global Ltd. Microsoft said the agent is enabled by the Model Context Protocol, a toolkit for connecting AI models to external data and capabilities outside GitHub. MCP servers can be configured in the repository's settings. The agent can also use multimodal capabilities and see images included in GitHub issues assigned to it, so screenshots of bugs or mockups can be used for new features. "Whether it's code completions, next edit suggestions, chat, agent mode or now coding agent, GitHub Copilot always had one mission: To keep you in the magical flow state," said Dohmke. As developers work with AI and Copilot, the new GitHub Models hub will give users a simple way to explore best-in-class models and to create, store, evaluate and share prompts, all without leaving GitHub, Microsoft said. GitHub Models will act as a centralized hub for model and prompt evaluation, allowing users to build, test and manage AI features directly from their own environment and repository. No more context switching between tools is necessary. Microsoft added that it will also allow developers to experiment with guardrails so they can do so securely. Today, Microsoft also said it's introducing something that it's calling Agentic DevOps, the next evolution of DevOps, where intelligent agents will collaborate with users and each other. The agents will automate and optimize each stage of the software lifecycle. DevOps is a software development approach that combines the "development" and "operations" teams in shared responsibility, collaborating through the software lifecycle by automating the process of integrating code into repositories and continuously testing and delivering software. It also includes continuously measuring application performance. "Agentic DevOps will help you build faster, crush your backlog, cancel tech debt, secure your apps, and keep it all running in production," said Microsoft.
"The best part is we keep you at the center of this orchestra, conducting agents, and approving recommendations, so you can get back to building epic stuff." At the center of this will be GitHub Copilot, the company said, which can apply itself to complex, multi-step coding tasks and analyze complex codebases, make edits across files, generate and run tests, fix bugs and suggest commands. Microsoft announced that it is releasing a Site Reliability Engineering agent for Azure that can run 24/7 and autonomously troubleshoot issues as they arise. It can continuously track performance for Kubernetes, App Service, serverless and database environments in real time, using deep knowledge built by the company from Azure-based services. When it provides remediation actions or repair, they are logged in GitHub issues so the team can follow through and close the loop. This means fewer wake-up calls and the system can self-heal. Of course, it can't fix something on its own; someone's pager will still go off. Microsoft is open-sourcing GitHub Copilot in Visual Studio Code. Over the next few months, the company said it will be bringing the AI-powered capabilities of GitHub Copilot extensions into the VS Code open-source repository. VS Code is already a popular integrated development editor, a software application that provides developers an environment to write, build and debug software code by combining tools under one interface.
[20]
OpenAI updates ChatGPT with coding-optimized Codex AI agent - SiliconANGLE
OpenAI today debuted a new artificial intelligence agent, Codex, that can help developers write code and fix bugs. The tool is available through a sidebar in ChatGPT's interface. One button in the sidebar configures Codex to generate new code based on user instructions, while another allows it to answer questions about existing code. Prompt responses take between one and 30 minutes to generate based on the complexity of the request. Codex is powered by a new AI model called codex-1. It's a version of o3, OpenAI's most capable reasoning model, that has been optimized for programming tasks. The ChatGPT developer fine-tuned Codex by training it on a set of real-world coding tasks. Those tasks involved a range of software environments. A piece of software that runs well in one environment, such as a cloud platform, may not run as efficiently on a Linux server or a developer's desktop, if at all. As a result, an AI model's training dataset must include technical information about every environment that it will be expected to use. OpenAI used reinforcement learning to train codex-1. It's a way of developing AI models that relies on trial and error to boost output quality. When a neural network completes a task correctly, it's given a virtual reward, while incorrect answers lead to penalties that encourage the algorithm to come up with a better approach. In a series of coding tests carried out by OpenAI, Codex achieved an accuracy rate of 75%. That's 5% better than the most capable, hardware-intensive version of o3. OpenAI's first-generation reasoning model, o1, scored 11%. Codex carries out coding tasks in isolated software containers that don't have web access. According to OpenAI, the agent launches a separate container for each task. Developers can customize those development environments by uploading a text file called AGENTS.md. The file may describe what programs Codex should install, how AI-generated code should be tested for bugs and related details. Using AGENTS.md, developers can ensure that the container in which Codex generates code is configured the same way as the production system on which the code will run. That reduces the need to modify the code before releasing it to production. Developers can monitor Codex while it's generating code. After the tool completes a task, it provides technical data that can be used to review each step of the workflow. It's possible to request revisions if the code doesn't meet project requirements. OpenAI started rolling out Codex to ChatGPT today as a research preview. It will initially provide "generous access at no additional cost." In a few weeks, OpenAI will switch Codex to lower rate limits with "flexible pricing options that let you purchase additional usage on-demand." The ChatGPT developer also plans to expand Codex's feature set. One upcoming capability will allow users to provide the agent with instructions while it's in the middle of a task. Additionally, OpenAI plans to integrate Codex with more developer tools. One of the upcoming integrations will be for Codex CLI, an open-source application that OpenAI released last month. It's an AI coding assistant that developers can install on their desktops and access from the command line. OpenAI debuted a new version of Codex CLI in conjunction with the release of Codex today.
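The reward-and-penalty loop described above can be illustrated with a toy example. The sketch below is not OpenAI's training setup; it is a minimal REINFORCE-style update on a one-step task, where a sampled action that "passes the tests" earns a reward and a failing one is penalized, so the policy gradually prefers the action that succeeds most often. All numbers are made up for illustration.

```python
import numpy as np

# Toy illustration of reinforcement learning's reward/penalty idea
# (a single-step "bandit" task, not OpenAI's actual training pipeline).
rng = np.random.default_rng(0)
n_actions = 3                  # e.g. three candidate code edits
true_pass = [0.2, 0.5, 0.9]    # assumed probability each edit passes the tests

logits = np.zeros(n_actions)   # the "policy" being trained
lr = 0.1                       # learning rate


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(n_actions, p=probs)
    # Reward +1 if the sampled edit passes the tests, -1 otherwise.
    reward = 1.0 if rng.random() < true_pass[action] else -1.0
    # REINFORCE-style update: raise the log-probability of rewarded actions.
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad

print("learned preferences:", softmax(logits).round(3))
```

After a few thousand iterations the learned distribution concentrates on the third candidate, mirroring in miniature how rewarded behaviors get reinforced during training.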
[21]
OpenAI launches Codex, an AI agent for coding
OpenAI launched a research preview on Friday of what it's calling its most capable AI coding agent yet. Codex, a cloud-based software engineering agent, can write features, answer questions about a codebase, fix bugs, and propose pull requests for review. Several tasks can run simultaneously, and users retain full access to their computers while the agent takes anywhere from one to 30 minutes to complete a task. Since it's still in research preview, the tool remains in early development. The company said in a blog post that it "currently lacks features like image inputs for front-end work, and the ability to course-correct the agent while it's working. Additionally, delegating to a remote agent takes longer than interactive editing, which can take some getting used to." Over time, however, the company said using the service will feel more like asynchronous collaboration with colleagues.
[22]
Why OpenAI's Codex is Not as Good as Devin or Replit | AIM
Codex is not connected to the internet, which makes it a harder sell than Devin, or even Replit or Cursor. If you're a software engineer, indie hacker, or startup founder who's spent the last year tooling around with AI agents like Replit's Ghostwriter, Cognition's Devin, or Lovable's smart terminals -- well, OpenAI just entered the game, again. Over the weekend, OpenAI rolled out Codex, a cloud-based software engineering agent that looks suspiciously like the future of dev work. It's available first to ChatGPT Pro (at $200 a month), Team, and Enterprise users, while it may take a while for Plus users to get access. Greg Brockman, co-founder of OpenAI, said during the live research preview that Codex is their bet on vibe coding. This comes just days after reports that OpenAI had agreed to acquire Windsurf for $3 billion. Windsurf, an artificial intelligence-assisted coding tool formerly known as Codeium, is a direct competitor to Cursor, which OpenAI has also backed. Codex isn't another glorified autocomplete. It's a multi-agent dev assistant that runs coding tasks in parallel, inside sandboxed environments preloaded with your repo, which sounds similar to Devin, but OpenAI argues that it's not. During the launch preview with Brockman, Katy Shi, one of the researchers at OpenAI, said, "Codex is as trustworthy, if not more trustworthy than my coworkers." Shi added that she could access her coworkers' logs without needing to talk to them. Shi meant that with Codex, developers can do work like writing new features, debugging, writing tests, or proposing pull requests -- and it will do all of that while showing you terminal logs, test outputs, and commit history, so you don't have to trust it blindly. This essentially means GitHub PRs can be drafted, tested, and explained by a bot that lives inside ChatGPT, making it possibly better than Devin. But while Codex acts as an agent running coding tasks in the background on the cloud, Replit allows developers to deploy apps, while Devin is an end-to-end software engineer. Codex still has other limitations, and in this case, pretty big ones. It is not connected to the internet, which makes it a less attractive choice than Devin. This is currently the biggest criticism of the release and the reason some developers are not adopting it in their workflows. Devin is also in early access. It also needs well-scoped tasks. It sometimes fails tests or gets confused. And it won't yet handle sprawling architectural decisions on its own. But for repeatable engineering chores, it's surprisingly capable -- and transparent. OpenAI conveniently calls this a research preview. Maybe the team will connect it to the internet soon. The ambitions are anything but modest. Codex is powered by codex-1, a variant of OpenAI's o3 model explicitly tuned for software engineering. It was trained with reinforcement learning on thousands of real coding tasks, making it eerily good at mimicking human dev styles, coding conventions, and PR etiquette. "Codex increases the value of being technical. If you can describe precisely what you want to build, you can get a massive amount done in parallel," posted Josh Tobin from OpenAI. "That's fundamentally a technical skill." But Cognition recently announced an update to Devin, offering a new agent-native IDE experience. Devin 2.0 supports multiple parallel instances, each with an interactive cloud-based IDE. Additionally, the latest update allows developers to take control while providing collaborative and fully automated approaches.
Furthermore, it enables developers to refine code and run tests within the IDE. Cognition AI also announced additional features for Devin, including Interactive Planning, Devin Search, and Devin Wiki. This is where OpenAI's Codex falls behind. Inside ChatGPT, Codex is accessed via a sidebar. You create tasks with prompts, click "Code" to generate changes, or "Ask" to query your codebase. Very different from Cursor's "tab tab tab" models, but similar to Lovable and Replit. Each task gets its own isolated environment, where Codex can edit files, run linters, test harnesses, and type checkers. Depending on the complexity, completing a task can take anywhere from 1 to 30 minutes. You can monitor its progress in real time. It's no coincidence that Codex seems to be eager to eat the lunches of agents like Devin, Cursor, and Replit's AI tools. All these startups have been vying to become the default AI coding companion. But with Codex, OpenAI is using its distribution advantage -- ChatGPT is already in millions of developers' workflows. As Santiago Valdarrama joked: "Literally everyone is freaking out over Codex like they didn't do the exact same thing for Devin, Cursor, DeepSeek, and every GPT drop since 2.0... VCs will congratulate themselves and write posts about how Codex will enable the next trillion-dollar market... until the next shitty autocomplete drops." Despite the sarcasm, there's truth to the cycle. But Codex is not autocomplete. At OpenAI itself, engineers are using Codex to offload annoying chores like renaming variables, writing tests, and fixing bugs. "By reducing context-switching and surfacing forgotten to-dos, Codex helps engineers ship faster and stay focused on what matters most," the company writes. Codex isn't being built in a vacuum. Early testers like Cisco, Temporal, Superhuman, and Kodiak Robotics are already using it. Cisco is testing it across its engineering teams to accelerate product development. Temporal uses it to debug, scaffold features, and stay in flow by offloading background work. Superhuman has even let product managers use Codex to write code, with engineers stepping in only for reviews. Kodiak, which builds autonomous driving tech, is using it to improve test coverage and debug tools and apparently to navigate obscure parts of its stack. Codex isn't just stuck in ChatGPT either. OpenAI quietly launched Codex CLI last month -- a terminal-based coding agent you can run locally. It brings the same models (o3 and o4-mini) into your dev environment. Now, they've added codex-mini-latest, a lightweight version of codex-1 optimised for snappier Q&A and faster editing inside the CLI. OpenAI is handing out $5-$50 in free API credits for Codex CLI for Plus and Pro users. No excuses not to try it. "We imagine a future where developers drive the work they want to own and delegate the rest to agents," OpenAI wrote. Developers need to know what you want to build, but you may never have to write boilerplate again. Codex doesn't kill Replit, Devin, or Lovable overnight. But it does something much more dangerous -- it sets a new standard, but without the internet. Multi-agent, cloud-based, verifiable, and integrated into ChatGPT. It's the baseline now. Everyone else needs to catch up.
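For developers who want to poke at the smaller model behind the CLI directly, the snippet below is a minimal sketch using the OpenAI Python SDK's Responses API. It assumes codex-mini-latest is exposed to your account under that name and that an OPENAI_API_KEY environment variable is set; the prompt is just an example.

```python
# Minimal sketch: calling codex-mini-latest through the OpenAI Python SDK.
# Assumes `pip install openai`, an OPENAI_API_KEY environment variable, and
# that the model is available to your account under this name.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="codex-mini-latest",
    input=(
        "Write a Python function that renames all .txt files in a directory "
        "to .md, and include a short docstring."
    ),
)

# Print the generated code suggestion returned by the model.
print(response.output_text)
```

This is the same kind of low-latency editing and Q&A workload the CLI targets, just driven from a script instead of the terminal agent.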
[23]
ChatGPT's New Coding Agent Is Huge, Even if You Aren't a Programmer
ChatGPT does a pretty good job of generating code from text prompts and breaking it down. Now, OpenAI has added a new coding agent to ChatGPT, and it's not just programmers who should be excited.

ChatGPT's Codex Takes AI Programming to the Next Level

OpenAI is launching a research preview of Codex, a "cloud-based software engineering agent." The feature is powered by codex-1, a version of the OpenAI o3 model optimized for coding and software engineering tasks. Codex-1 is also trained to align its output closely with "human coding preferences and standards." You can find the feature in the ChatGPT sidebar if you're a ChatGPT Pro, Enterprise, or Team user, with Plus and Edu users getting it soon. Once open, you can either assign it a coding task by typing a prompt and clicking the Code button or ask questions about your codebase using the Ask button. You'll find information on Codex's task list and progress below the prompt bar. This new agent can perform multiple tasks on an existing codebase, like adding new features, fixing bugs, and answering any questions you might have. Each task runs in a separate isolated environment, preloaded with your codebase or repository. Codex can read and edit files as well. OpenAI's announcement claims that the agent will take anywhere from one to 30 minutes to complete an assigned task, depending on the complexity of the task. You can monitor its progress in real time or even run multiple tasks simultaneously, all while using your browser and computer as usual. While ChatGPT can help you generate code and even provide entire projects that you can download and test, it doesn't work well with software repositories and codebases. Codex's ability to work within typical software engineering infrastructure means it's a lot more useful than vanilla ChatGPT to both companies and individuals who maintain multiple projects in repositories. Codex produces cleaner code compared to ChatGPT, which is ready for human review and integration into workflows or codebases. It also runs tests until it passes all the given test cases and conditions. Once a task is completed, Codex will commit the changes to its environment and provide "verifiable evidence of its actions through citations of terminal logs and test outputs."

Why Is Codex a Big Deal?

Codex is a big deal for professionals in any industry. You can write Excel macros, automate reports, batch edit files, and do just about everything that would've required expertise in some programming or scripting language. Sure, ChatGPT can generate code and scripts for you, but in my experience, it's not reliable. You need to have relatively good knowledge of the programming language you're working with and a general idea of code debugging. Codex, however, automatically checks its code and runs tests to ensure it works the way you want. This might improve with ChatGPT's new GPT-4.1 model, but it's not a perfect solution. Knowing when to use which ChatGPT model can largely affect the output, so a model custom-made for coding will perform better than a more general-purpose model. Of course, if you're a programmer, Codex is massively helpful as it can integrate with your GitHub repositories and take care of repetitive tasks and test cases. This lets you develop and ship your app faster without getting caught up in maintenance, testing, and other tasks usually part of the software development process.
[24]
OpenAI takes on Google Gemini, Anthropic with AI coding agent for ChatGPT
OpenAI has launched Codex, an AI coding agent powered by codex-1, designed to assist software engineers with tasks like writing features, fixing bugs, and proposing pull requests. Available on ChatGPT Pro, Enterprise, and Team, Codex aims to improve coding workflows while incorporating safeguards against malicious use. Google DeepMind also enhanced Gemini 2.5 Pro with improved coding capabilities. OpenAI launched a research preview of Codex, a cloud-based software engineering agent, on Friday. The AI coding agent is powered by codex-1, a version of OpenAI o3 optimized for software engineering, the AI platform said. Codex can write features, answer questions about codebases, fix bugs, and propose pull requests for review. Each task will run in its own cloud sandbox, preloaded with the user's repository. OpenAI said Codex will be available to ChatGPT Pro, Enterprise, and Team users today, with support for Plus and Edu coming soon. It can be accessed through the ChatGPT sidebar, and assigned new tasks by typing a prompt and clicking 'Code'. Users can ask questions about a codebase by clicking 'Ask'. Codex's actions can be seen through citations of terminal logs and test outputs, helping trace each step taken. Users can then review the results, request further revisions, open a GitHub pull request, or directly integrate the changes into their workspaces. OpenAI said Codex was trained to identify and refuse requests aimed at the development of malicious software, addressing concerns that malicious actors could misuse this sophisticated coding agent for cyber attacks and other harmful uses. Apart from OpenAI, Microsoft-owned GitHub, Google and Anthropic, along with startups including Anysphere and Windsurf, offer AI tools to aid programmers. Earlier this month, Google DeepMind added vastly improved coding capabilities to Gemini 2.5 Pro (Preview). In the run-up to its recently concluded Google I/O 2025 event, the search major released the updated model, now branded the I/O Edition. Internally labelled gemini-2.5-pro-preview-05-06, the model can now deliver significant improvements in code transformation, code editing, and even in developing complex agentic workflows -- making it far more capable for software developers and engineers, according to Google.
[25]
GitHub launches new AI coding agent that fixes bugs
Copilot helps developers write code -- it suggests lines of code or even whole functions while typing. In its latest upgrade, the AI agent will be more active, like a mini-assistant, for the developer. Instead of passively suggesting code, the agent will understand and act on goals. Developer platform GitHub has unveiled an artificial intelligence (AI) coding agent embedded directly into its AI tool, GitHub Copilot. Users can assign the agent tasks and it can fix bugs and add features on a developer's behalf. This agent will be built into Copilot itself, meaning developers don't have to install anything extra, and it will be a part of their workflow. "Using state-of-the-art models, the agent excels at low-to-medium complexity tasks in well-tested codebases - adding features, fixing bugs, extending tests, refactoring code, improving documentation. It's all about keeping you in the magical flow state," CEO Thomas Dohmke said in a post on X. Developers can tell the agent to fix code or implement a search feature, and it can carry out autonomous actions like fixing bugs by identifying broken logic or understanding context across files. This makes GitHub Copilot more than just a coding assistant, and more like a coding collaborator. "Built around an integrated, secure and fully customizable development environment powered by GitHub Actions, the Copilot coding agent is amplifying human developers with trust by design," Dohmke added. This comes days after Google DeepMind unveiled AlphaEvolve, an AI coding agent backed by its Gemini models. However, AlphaEvolve is designed for performance on algorithmic challenges, and is more research-grade and specialised. GitHub's agent, on the other hand, is designed to work alongside developers in real-world projects.
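The "assign it like a teammate" workflow these pieces describe boils down to putting the agent on an issue's assignee list. The sketch below is a hypothetical illustration using GitHub's standard REST endpoint for adding assignees; the repository, issue number, and the agent's account name are assumptions, and in practice the assignment is typically done from the GitHub UI rather than a script.

```python
# Hypothetical sketch: assigning an issue to a coding agent the same way you
# would assign it to a colleague, via GitHub's REST API.
# Assumes `pip install requests`, a GITHUB_TOKEN with repo access, and that
# the agent shows up as an assignable account (the name below is an assumption).
import os
import requests

OWNER, REPO, ISSUE = "acme", "billing-service", 123  # hypothetical repo/issue
AGENT_LOGIN = "Copilot"  # assumption: the agent's assignable account name

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{ISSUE}/assignees",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"assignees": [AGENT_LOGIN]},
    timeout=30,
)
resp.raise_for_status()

# The endpoint returns the updated issue, including its assignee list.
print("Issue assignees:", [a["login"] for a in resp.json()["assignees"]])
```

From there, per the articles, the agent spins up its environment, pushes commits to a draft pull request, and tags the assigning developer for review.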
[26]
OpenAI Launches Codex, a Software Engineering AI Agent
The Codex agent is rolling out to ChatGPT Pro, Team, and Enterprise users, starting today. Today, OpenAI introduced a cloud-based software engineering AI agent, powered by the company's most powerful coding model, called 'codex-1'. It's available to ChatGPT Pro, Team, and Enterprise users, starting today. OpenAI says ChatGPT Plus and Edu users will get access to Codex in the future. As for capabilities, the software engineering agent can perform multiple tasks in parallel in the cloud. It can add new features, answer questions about your codebase, fix bugs, and propose pull requests for review. Developers can connect their GitHub repositories and run the AI agent to perform a variety of tasks. You can access Codex in ChatGPT from the left sidebar. OpenAI says codex-1 is built on the o3 model by training it "using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences." Codex follows instructions carefully and runs tests until it receives a passing result in the cloud environment. You can define an AGENTS.md file in your repo to guide the agent so that it can navigate the codebase and perform actions precisely the way you want. On OpenAI's internal SWE tasks, the codex-1 model achieves 75% accuracy, more than o3-high, which gets 70%. On SWE-Bench Verified, codex-1 does slightly better than o3-high after a number of attempts. Finally, OpenAI says the Codex agent operates in a secure, isolated container in the cloud, and internet access is disabled during task execution. The agent can't access external websites, APIs, or other services.
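As an illustration of the kind of guidance an AGENTS.md can carry, here is a minimal sketch. The project layout, test command, and style rules are hypothetical examples rather than an official template, and the snippet simply writes the file to the repository root.

```python
# Hypothetical sketch of an AGENTS.md file for guiding a coding agent.
# The layout, commands, and conventions below are made up for illustration;
# adapt them to the actual repository.
from pathlib import Path

AGENTS_MD = """\
# AGENTS.md

## Project layout
- `src/` holds the application code; `tests/` holds the pytest suites.

## How to validate changes
- Run `pip install -e .[dev]` once, then `pytest -q` before finishing a task.
- Run `ruff check src tests` and fix anything it reports.

## Style
- Follow PEP 8, add type hints to new functions, and give public functions
  short docstrings.

## Pull requests
- Keep diffs focused on the assigned task and summarize test results in the
  PR description.
"""

Path("AGENTS.md").write_text(AGENTS_MD, encoding="utf-8")
print("Wrote AGENTS.md with", len(AGENTS_MD.splitlines()), "lines")
```

The idea, per OpenAI's description, is that the agent reads this file when it starts a task, so the container it works in behaves like the project's real development environment.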
[27]
New from OpenAI Codex in ChatGPT : Enhancing Coding Efficiency and Collaboration
OpenAI has introduced Codex, an advanced AI-powered coding assistant designed to enhance the efficiency and productivity of software development. Built on the robust codex-1 model, this tool aims to streamline workflows, automate repetitive tasks, and foster collaboration among developers. By using advanced technologies such as reinforcement learning and secure cloud-based environments, Codex offers a new approach to tackling coding challenges, making it a valuable resource for developers and teams. Codex provides a comprehensive suite of features tailored to meet the diverse needs of software developers, and its capabilities extend across a range of software engineering tasks. Operating within secure, isolated cloud environments, Codex ensures that workflows remain efficient and protected. By preloading user repositories, it enables developers to work seamlessly without compromising security. Outputs generated by Codex, such as terminal logs, test results, and citations, are verifiable, ensuring transparency and reliability. The tool's ability to automate repetitive tasks and provide actionable insights significantly reduces the time spent on manual processes. For instance, Codex can identify bugs, suggest fixes, and generate comprehensive documentation, allowing developers to focus on more complex and creative aspects of their work. Its seamless integration with existing development tools ensures that teams can adopt it without disrupting their established workflows. At the core of Codex lies the codex-1 model, which has been specifically optimized for software engineering tasks. OpenAI trained this model using reinforcement learning techniques, focusing on real-world coding scenarios to align its outputs with human coding preferences. This training approach ensures that Codex not only understands the technical nuances of programming but also adheres to industry standards and best practices. The model's training emphasizes adaptability and precision, allowing it to handle tasks of varying complexity. Whether developers need assistance with a simple script or a sophisticated system, Codex demonstrates efficiency, often completing tasks within 1 to 30 minutes. This capability is particularly advantageous for teams working under tight deadlines or managing large-scale projects, where time and accuracy are critical. Codex is designed to support both real-time and asynchronous collaboration, making it an effective tool for distributed teams. Developers can delegate routine or time-intensive tasks to Codex, freeing up time to focus on higher-level problem-solving and innovation. This feature is especially beneficial for teams operating across different time zones, where asynchronous workflows are essential for maintaining productivity. Real-time collaboration is assisted through shared coding environments and synchronized updates, ensuring that all team members remain aligned. Additionally, Codex provides context-aware suggestions and guidance, helping teams maintain a cohesive and efficient development process. By allowing developers to offload repetitive tasks, Codex fosters a more streamlined and collaborative approach to software development. OpenAI has prioritized security and ethical considerations in the design of Codex. The tool operates within secure, isolated containers, preventing unauthorized access to sensitive data. During task execution, Codex is restricted from accessing the internet, further minimizing potential security risks.
These measures make Codex a reliable choice for handling sensitive coding projects. To promote ethical use, Codex has been trained to reject malicious requests, such as generating harmful or unethical code. Developers are encouraged to manually review all AI-generated outputs before integration, ensuring accountability and maintaining quality control. This approach strikes a balance between the benefits of automation and the need for human oversight, fostering responsible use of the tool. Codex is designed to integrate effortlessly with existing development tools and workflows. It supports AGENTS.md files, allowing developers to configure environments and provide repository-specific guidance. This level of customization ensures that Codex can adapt to the unique requirements of different teams and projects. The tool also integrates seamlessly with platforms like GitHub and continuous integration (CI) systems, further streamlining workflows. By automating routine tasks and offering actionable insights, Codex reduces cognitive load and enhances overall productivity. Its ability to adapt to various development environments makes it a versatile tool for teams of all sizes. Codex has already been adopted by OpenAI engineers and external testers, including companies such as Cisco, Temporal, Superhuman, and Kodiak. These early adopters have reported notable productivity gains and smoother workflows, demonstrating the tool's potential to address common challenges in software development. These real-world applications highlight Codex's versatility and effectiveness in addressing a wide range of development needs. Codex is currently available to ChatGPT Pro, Team, and Enterprise users, with plans to extend access to Plus and Edu users in the future. For the companion codex-mini model used in Codex CLI, OpenAI employs a token-based pricing model, charging $1.50 per 1 million input tokens and $6 per 1 million output tokens. This pricing structure provides flexibility, allowing developers to explore Codex's capabilities without significant upfront costs. Despite its robust features, Codex has some limitations. It currently lacks support for image inputs and may be slower at task delegation compared to interactive editing. These constraints represent areas for potential improvement as OpenAI continues to refine the tool. OpenAI envisions a future where asynchronous, multi-agent workflows become a standard practice in software engineering. By allowing developers to delegate tasks to AI agents like Codex, teams can achieve greater scalability and efficiency. This vision aligns with OpenAI's broader goal of empowering small teams to accomplish significant outcomes through enhanced productivity. Future updates to Codex are expected to focus on expanding its interactive capabilities, deepening integration with development tools, and providing proactive updates. These advancements aim to position Codex as an indispensable tool for developers, driving innovation and efficiency across the software engineering landscape.
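For a rough sense of what the quoted token pricing means in practice, the short calculation below estimates the cost of a single delegated task. The workload numbers are assumptions made up for illustration, and the 75% prompt-caching discount reported elsewhere in this roundup is applied only to the cached share of the input.

```python
# Back-of-the-envelope cost estimate at the quoted rates:
# $1.50 per 1M input tokens, $6.00 per 1M output tokens.
# The task size and cached share below are assumptions for illustration only.
INPUT_RATE = 1.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 6.00 / 1_000_000  # dollars per output token
CACHE_DISCOUNT = 0.75           # 75% off cached input tokens (as reported)

input_tokens = 40_000    # e.g. repository context plus the prompt (assumed)
cached_share = 0.5       # half the input reused from a previous run (assumed)
output_tokens = 3_000    # generated diff, tests, and explanation (assumed)

cached = input_tokens * cached_share
fresh = input_tokens - cached
cost = (
    fresh * INPUT_RATE
    + cached * INPUT_RATE * (1 - CACHE_DISCOUNT)
    + output_tokens * OUTPUT_RATE
)
print(f"Estimated cost per task: ${cost:.4f}")  # about $0.0555 with these numbers
```

Even with generous context sizes, the per-task cost under these assumptions stays in the cents, which is consistent with the "without significant upfront costs" framing above.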
[28]
OpenAI Codex AI Coding Assistant : Goodbye to Repetitive Coding Tasks
What if coding could feel less like a grind and more like a creative partnership? Imagine an AI-powered assistant that not only automates repetitive tasks but also helps you debug complex issues, implement features, and even generate unit tests -- all while making sure your code is secure and collaborative. Enter OpenAI Codex, a new tool designed to transform the way developers approach software engineering. By blending innovative AI with practical coding workflows, Codex promises to reshape how teams build, maintain, and innovate software. But as with any innovation, it raises a critical question: can an AI truly enhance creativity without compromising control? Prompt Engineering provides more insights into the capabilities and challenges of OpenAI Codex, offering an in-depth look at how it reshapes software development. From its seamless GitHub integration to its secure, isolated execution environment, Codex is tailored to optimize productivity while safeguarding sensitive data. Yet, its limitations -- such as restricted internet access during execution -- demand careful planning and adaptation. Whether you're a seasoned developer or a curious newcomer, this exploration of Codex's strengths, weaknesses, and future potential will leave you rethinking the boundaries of collaboration in coding. Could this be the next step in bridging human ingenuity with machine precision? Codex is specifically engineered to optimize software development processes, offering a range of features that enhance productivity and collaboration. As a fine-tuned version of OpenAI's o3 model, it supports a variety of essential software engineering tasks. By integrating directly with GitHub repositories through a straightforward setup file, Codex simplifies collaboration among team members. Its secure cloud sandbox ensures all tasks are executed in isolation, prioritizing security and minimizing risks. However, this design also limits Codex's ability to access the internet during execution. While this restriction enhances safety, it requires developers to plan workflows carefully, particularly when working with external libraries or dependencies. Codex excels in automating routine coding tasks, allowing developers to dedicate more time to creative and complex aspects of software engineering. Its design emphasizes transparency, granting users control over outputs and fostering trust. Additionally, its focus on modular programming and adherence to established software engineering principles ensures high-quality, maintainable code. These attributes make Codex particularly valuable for teams handling sensitive or high-stakes projects, where security and efficiency are paramount. By reducing the burden of routine tasks, Codex allows developers to focus on innovation and strategic problem-solving. Despite its numerous strengths, Codex has certain limitations that developers must navigate. The lack of internet access during task execution can disrupt workflows that rely on external resources, such as fetching dependencies or updating libraries. This restriction, while enhancing security, necessitates careful planning to ensure all required resources are pre-configured. Additionally, while Codex offers performance improvements over its predecessor, the o3 model, these gains may be incremental in specific scenarios.
Early users have also reported occasional inconsistencies in functionality, indicating that the system is still evolving. To mitigate these challenges and maximize Codex's potential, developers should adopt robust coding principles and modular practices. By aligning their workflows with Codex's strengths, teams can overcome its limitations and achieve optimal results. Security is a cornerstone of Codex's design. By executing tasks within a secure cloud sandbox and limiting external interactions, Codex protects against misuse and ensures sensitive data remains secure. This design choice makes it a reliable tool for handling critical tasks without compromising project integrity. However, the isolated execution environment requires developers to plan ahead. Dependencies and libraries must be pre-configured to account for Codex's inability to access external resources during execution. This proactive approach ensures seamless workflows and minimizes potential disruptions. Codex is not intended to replace developers but to augment their capabilities. By automating routine tasks, it allows programmers to focus on higher-level problem-solving, innovation, and strategic decision-making. For non-coders, Codex serves as an accessible entry point into software engineering, offering hands-on interaction that bridges the gap between technical and non-technical users. To fully use Codex, developers should align their workflows with these practices; doing so not only enhances Codex's effectiveness but also ensures that projects remain adaptable and robust in the long term. Codex represents a significant step forward in integrating AI into software development workflows. Its potential to contribute to open source projects and collaborative coding environments is immense, offering new opportunities for innovation and efficiency. As developers adapt to evolving paradigms, tools like Codex will play a central role in shaping the future of software engineering. While Codex is still in its early stages, its promise is evident. By addressing its current limitations and building on its strengths, OpenAI has the opportunity to redefine how developers approach coding, collaboration, and problem-solving. For now, Codex serves as a powerful tool for enhancing productivity, fostering best practices, and paving the way for a more efficient and secure software development landscape.
[29]
GitHub Introduces Coding Agent For GitHub Copilot
At Microsoft Build 2025, GitHub introduced a new, enterprise-ready coding agent for Copilot - integrated, responsible and secured by GitHub's control layer. GitHub announced today at Microsoft Build that GitHub Copilot now includes an asynchronous coding agent, embedded directly in GitHub and accessible from VS Code -- creating a powerful Agentic DevOps loop across the world's most widely adopted coding environments. Anchored on its core principle of developer choice and access, GitHub is also open sourcing Copilot Chat in VS Code, enhancing its platform with new functionality in GitHub Models -- including support for Grok 3 from xAI -- and bringing agent mode to JetBrains, Eclipse, and Xcode. "GitHub is where the world's developers work on their projects. Now, it's becoming the place where they collaborate with agents in a configurable, steerable, and verifiable way. It's vital that organizations and developers are ready to embrace these agents without compromising their security posture," said GitHub CEO Thomas Dohmke. "Built around an integrated, secure, and fully customizable development environment powered by GitHub Actions, the Copilot coding agent is the most enterprise-ready of its kind -- amplifying human developers with trust by design. And these protections aren't just for us: as the new home of AI agents, we're making the same primitives available to partners to ensure an open ecosystem for agentic peer programming."

Integrated, secure, and fully customizable

The Copilot coding agent operates within GitHub's native control layer, built in the flow of the software development life cycle. The agent starts its work when you assign a GitHub issue to Copilot or ask it to start working from Copilot Chat in VS Code. As the agent works, it pushes commits to a draft pull request, and you can track it every step of the way through the agent session logs. Developers can give feedback and ask the agent to iterate through pull request reviews. The agent is expressly designed to preserve your existing security posture, with additional built-in features like branch protections and controlled internet access to ensure safe and policy-compliant development workflows. Plus, the agent's pull requests require human approval before any CI/CD workflows are run, creating an extra protection control for the build and deployment environment. With the power of Model Context Protocol (MCP), teams can give the coding agent access to data and capabilities from outside of GitHub. MCP servers can be configured in the repository's settings.

Powered by GitHub Actions

All SWE agents need a compute environment to do their work. The Copilot coding agent gets to work by spinning up a secure, fully customizable development environment powered by GitHub Actions. Introduced in 2018, Actions is the largest CI/CD ecosystem in the world with more than 25,000 actions in the GitHub Marketplace. Every weekday, GitHub-hosted and self-hosted runners execute more than 40 million daily jobs. By using GitHub Actions, Copilot uses a familiar and powerful compute platform that's both reliable and secure. GitHub Actions is already powering the world's open source projects, startups, and large enterprises at scale.

Enterprise-ready at its core

In private preview with internal teams and selected customers, we have found that the agent excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring code, and improving documentation.
"The GitHub Copilot coding agent fits into our existing workflow and converts specifications to production code in minutes. This increases our velocity and enables our team to channel their energy toward higher-level creative work." -- Alex Devkar, Senior Vice President, Engineering and Analytics, Carvana "The Copilot coding agent is opening up doors for human developers to have their own agent-driven team, all working in parallel to amplify their work. We're now able to assign tasks that would typically detract from deeper, more complex work -- allowing developers to focus on high-value coding tasks." -- James Zabinski, DevEx Lead at EY Starting today, the Copilot coding agent is available in preview to all Copilot Enterprise and Copilot Pro+ users. Using the agent consumes Copilot premium requests from a user's entitlements, included in their Copilot subscription, plus GitHub Actions minutes, which also have an included allowance for every customer. "With its autonomous coding agent, GitHub is looking to shift Copilot from an in-editor assistant to a genuine collaborator in the development process," explains Kate Holterhoff, senior analyst at RedMonk. "This evolution aims to enable teams to delegate implementation tasks and thereby achieve a more efficient allocation of developer resources across the software lifecycle." Even more AI capabilities GitHub also announced new AI capabilities that enable developers to securely build and deploy with greater flexibility and efficiency, including: Opening up Copilot: As AI transforms development, we're committed to keeping developers in control. Starting next month, we'll begin open sourcing the GitHub Copilot Chat extension under MIT license, followed by a gradual integration of key AI capabilities directly into VS Code core. Soon every developer can inspect, extend, and help shape how AI works in their editor. Extended GitHub Models: Every GitHub user can now enable the Models tab in any repository to build, test, and manage AI features in one place. With prompt management, lightweight evaluations, and enterprise controls, developers can experiment, build, and deploy using industry-leading models on GitHub -- with governance and security built in. Welcoming Grok to GitHub Models: As developers seek more choice in the models they use, we're expanding what's possible on GitHub. Starting today, xAI's Grok 3 is available in GitHub Models. Expanded agent mode: Agent mode is now rolling out in JetBrains, Eclipse, and Xcode in public preview to all Copilot users, extending Copilot to developers' preferred environments. About GitHub GitHub is the world's most widely adopted Copilot-powered developer platform to build, scale, and deliver secure software. Over 150 million developers, including more than 90% of the Fortune 100 companies, use GitHub to collaborate and more than 77,000 organizations have adopted GitHub Copilot.
[30]
OpenAI Launches Codex AI Agent for Cloud-Based Coding
OpenAI launched Codex, a cloud-based software engineering agent that writes code from natural-language prompts and can work on multiple tasks at the same time. The agent works on code provided through the user's GitHub repositories inside a pre-installed setup defined by the user, without connecting to the internet; access to external websites and APIs is therefore denied.

Codex is a follow-up to Codex CLI, launched in April 2025. Codex CLI is an open-source coding agent that runs in the user's terminal, i.e., the command-line interface. Like Codex, it does not need to be connected to a browser or an Integrated Development Environment (IDE) such as Android Studio or Visual Studio Code (VS Code).

The AI agent released by OpenAI can perform tasks like fixing bugs, writing specific features, answering questions about your codebase, and proposing pull requests for review. It can read and edit files and run commands, including test harnesses, linters (tools that spot programming errors), and type checkers. The underlying model was trained using reinforcement learning on real-world coding tasks in various environments to generate code that closely mirrors human style and pull-request preferences, adheres precisely to instructions, and iteratively runs tests until it receives a passing result.

Codex currently lacks features like image inputs for frontend work and the ability to course-correct the agent while it is working. Additionally, delegating to a remote agent takes longer than interactive editing, which can take some getting used to. Over time, interacting with Codex agents is expected to increasingly resemble asynchronous collaboration with colleagues.

Codex can be guided through AGENTS.md files. These markdown-based text files tell the agent how to run commands and navigate the codebase (the collection of human-written source code). Each agent runs in its own cloud container with no internet access: after setup, internet access is disabled and the model trajectory begins, which requires users to preload the code and a development environment that the programmer defines. Codex is also trained to provide verifiable evidence of its actions through terminal logs and files, so its work can be validated if further refinements are needed.

Andrei Karpathy coined the term vibe coding, describing it as a style of programming "where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Vibe coding is a method of having LLMs and AI agents write code from natural-language input. Cursor, Replit's Agent, StackBlitz's Bolt, and now OpenAI's Codex are a few of the tools that offer vibe-coding features to users.

Writing code with the help of AI agents is being adopted worldwide. OpenAI says that Cisco, Temporal, Superhuman, and Kodiak are already using Codex for debugging, testing, automation, and more. At the same time, in the recent Microsoft layoffs, software developers and coders were the hardest hit: more than 40% of the roughly 2,000 positions cut were related to coding, as per documents reviewed by Bloomberg.

Can Codex be misused? OpenAI says Codex was trained to identify and refuse requests to create malicious software. However, OpenAI acknowledges that autonomous coding agents could be exploited to develop malware, including through low-level kernel engineering. The kernel is the core of an operating system (OS), controlling its hardware and software; low-level kernel engineering refers to manipulating the OS at that level.
For example, a user could create malware that invisibly interacts with the OS to execute unauthorised actions. Kernel-level operations are not visible to most user-level monitoring tools, such as anti-virus alerts, system logs, or the task manager that displays the list of running applications.

Vibe coding, or having AI agents write code, has significantly lowered the barriers to entry for software development. It has also increased the need for vigilance to maintain code quality. "Vibe coding is all fun and games until you have to vibe debug," said Ben South, who is building Variant.

At times, AI exhibits "AI paternalism," deciding what is best for the user instead of directly answering the query. Such moderation is especially important when self-harm or harm to others is involved: for example, if a user asks an AI model for ways to commit suicide, it can respond with mental health resources instead of providing harmful information. However, this approach can sometimes be excessive. In March, Cursor's AI tool refused to write code for a user, advising him to "develop the logic" himself to understand system maintenance better. The chatbot reasoned that "generating code for others can lead to dependency and reduced learning opportunities."

These incidents highlight how autonomously AI agents can behave even as users and companies lean on them in place of human oversight. Combined with the potential for mass layoffs, they call for caution and raise the question of whether AI coding agents can safely be run in autopilot mode.
OpenAI introduces Codex, an advanced AI coding agent integrated into ChatGPT, capable of performing complex programming tasks autonomously. This marks a significant step in the evolution of AI-assisted software development.
OpenAI has unveiled Codex, its latest innovation in AI-assisted software development. This advanced coding agent, integrated into ChatGPT, represents a significant leap forward in the realm of AI coding tools 1.
Codex is built on codex-1, a fine-tuned variation of OpenAI's o3 reasoning model. This AI agent is designed to handle a wide range of coding tasks, from writing production-ready code to answering questions about codebases 2. Unlike traditional code completion tools, Codex operates in a sandboxed, virtual environment that can be preloaded with the user's code repositories, allowing it to work within the context of existing projects.
Codex boasts several impressive features:
- It runs each task in its own sandboxed cloud container, preloaded with the user's repository and development environment, with internet access disabled once work begins.
- It can fix bugs, write new features, answer questions about a codebase, run tests, linters, and type checkers, and propose pull requests for review.
- It can work on multiple tasks in parallel while the developer continues to use their own computer and browser.
- It provides verifiable evidence of its actions through terminal logs and files, so its work can be checked and refined.
- It can be guided by an AGENTS.md file placed in the repository (a sketch of such a file follows this list).
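The article does not reproduce an AGENTS.md file, but a minimal, hypothetical sketch of one might look like this; the project layout, commands, and conventions below are invented for illustration and are not taken from OpenAI's documentation.

```markdown
# AGENTS.md -- hypothetical example; contents are entirely project-specific.

## Project layout
- `src/` contains the application code; `tests/` contains the pytest suite.

## How to run checks
- Install dependencies with `pip install -r requirements.txt`.
- Run tests with `pytest -q` and the linter with `ruff check src tests`.
- Run the type checker with `mypy src`.

## Conventions
- Follow PEP 8; keep functions small and focused.
- Every bug fix must include a regression test.
- Pull request descriptions should summarize the change and list the checks that were run.
```

In practice the file is ordinary, project-specific markdown: whatever commands and conventions it lists are what the agent is instructed to follow when it runs checks or prepares a pull request.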
Codex is part of a new cohort of agentic coding tools that aim to revolutionize software development. These tools, including Devin, SWE-Agent, and OpenHands, are designed to work autonomously, potentially operating like virtual engineering team managers 3.
The launch of Codex reflects the growing importance of AI in software development:
- GitHub used Microsoft Build 2025 to launch its own asynchronous Copilot coding agent and to extend agent mode to JetBrains, Eclipse, and Xcode.
- OpenAI says companies including Cisco, Temporal, Superhuman, and Kodiak are already using Codex for debugging, testing, and automation.
- A growing field of vibe-coding tools, including Cursor, Replit's Agent, and StackBlitz's Bolt, is competing for the same developer workflows.
Despite the excitement surrounding Codex and similar tools, several challenges remain:
- Codex currently lacks image inputs for frontend work and the ability to course-correct the agent mid-task, and delegating to a remote agent is slower than interactive editing.
- Agent-written code still needs human review and debugging; as Ben South put it, "vibe coding is all fun and games until you have to vibe debug."
- Autonomous coding agents could be exploited to produce malware, including kernel-level code that user-level monitoring tools cannot see.
- Overzealous moderation, or "AI paternalism," can get in the way, as when Cursor's tool refused to generate code for a user.
Codex is currently available to ChatGPT Pro, Enterprise, and Team users, with plans to expand access to Plus and Edu users in the future. While initial access is generous and free, OpenAI plans to introduce rate limits and a new pricing scheme in the coming weeks 12.
As the software development landscape continues to evolve, tools like Codex are poised to play an increasingly important role in shaping the future of coding. However, the industry must navigate the delicate balance between automation and human oversight to ensure the production of high-quality, reliable software.