Curated by THEOUTPOST
On Wed, 23 Apr, 12:01 AM UTC
2 Sources
[1]
Beyond Code Autocomplete
At first glance, the results are fascinating. Coding assistants are already changing the work of some programmers and transforming how coding is taught. However, this is the question we need to answer: Is this kind of generative AI just a glorified help tool, or can it actually bring substantial change to a developer's workflow? At Advanced Micro Devices (AMD), we design and develop CPUs, GPUs, and other computing chips. But a lot of what we do is developing software to create the low-level software that integrates operating systems and other customer software seamlessly with our own hardware. In fact, about half of AMD engineers are software engineers, which is not uncommon for a company like ours. Naturally, we have a keen interest in understanding the potential of AI for our software-development process. To understand where and how AI can be most helpful, we recently conducted several deep dives into how we develop software. What we found was surprising: The kinds of tasks coding assistants are good at -- namely, busting out lines of code -- are actually a very small part of the software engineer's job. Our developers spend the majority of their efforts on a range of tasks that include learning new tools and techniques, triaging problems, debugging those problems, and testing the software. We hope to go beyond individual assistants for each stage and chain them together into an autonomous software-development machine -- with a human in the loop, of course. Even for the coding copilots' bread-and-butter task of writing code, we found that the assistants offered diminishing returns: They were very helpful for junior developers working on basic tasks, but not that helpful for more senior developers who worked on specialized tasks. To use artificial intelligence in a truly transformative way, we concluded, we couldn't limit ourselves to just copilots. We needed to think more holistically about the whole software-development life cycle and adapt whatever tools are most helpful at each stage. Yes, we're working on fine-tuning the available coding copilots for our particular code base, so that even senior developers will find them more useful. But we're also adapting large language models to perform other parts of software development, like reviewing and optimizing code and generating bug reports. And we're broadening our scope beyond LLMs and generative AI. We've found that using discriminative AI -- AI that categorizes content instead of generating it -- can be a boon in testing, particularly in checking how well video games run on our software and hardware. In the short term, we aim to implement AI at each stage of the software-development life cycle. We expect this to give us a 25 percent productivity boost over the next few years. In the long term, we hope to go beyond individual assistants for each stage and chain them together into an autonomous software-development machine -- with a human in the loop, of course. Even as we go down this relentless path to implement AI, we realize that we need to carefully review the possible threats and risks that the use of AI may introduce. Equipped with these insights, we'll be able to use AI to its full potential. Here's what we've learned so far. GitHub research suggests that developers can double their productivity by using GitHub Copilot. Enticed by this promise, we made Copilot available to our developers at AMD in September 2023. After half a year, we surveyed those engineers to determine the assistant's effectiveness. We also monitored the engineers' use of GitHub Copilot and grouped users into one of two categories: active users (who used Copilot daily) and occasional users (who used Copilot a few times a week). We expected that most developers would be active users. However, we found that the number of active users was just under 50 percent. Our software review found that AI provided a measurable increase in productivity for junior developers performing simpler programming tasks. We observed much lower productivity increases with senior engineers working on complex code structures. This is in line with research by the management consulting firm McKinsey & Co. When we asked the engineers about the relatively low Copilot usage, 75 percent of them said they would use Copilot much more if the suggestions were more relevant to their coding needs. This doesn't necessarily contradict GitHub's findings: AMD software is quite specialized, and so it's understandable that applying a standard AI tool like Github Copilot, which is trained using publicly available data, wouldn't be that helpful. For example, AMD's graphics-software team develops low-level firmware to integrate our GPUs into computer systems, low-level software to integrate the GPUs into operating systems, and software to accelerate graphics and machine learning operations on the GPUs. All of this code provides the base for applications, such as games, video conferencing, and browsers, to use the GPUs. AMD's software is unique to our company and our products, and the standard copilots aren't optimized to work on our proprietary data. To overcome this issue, we will need to train tools using internal datasets and develop specialized tools focused on AMD use cases. We are now training a coding assistant in-house using AMD use cases and hope this will improve both adoption among developers and resulting productivity. But the survey results made us wonder: How much of a developer's job is writing new lines of code? To answer this question, we took a closer look at our software-development life cycle. AMD's software-development life cycle consists of five stages. We start with a definition of the requirements for the new product, or a new version of an existing product. Then, software architects design the modules, interfaces, and features to satisfy the defined requirements. Next, software engineers work on development, the implementation of the software code to fulfill product requirements according to the architectural design. This is the stage where developers write new lines of code, but that's not all they do: They may also refactor existing code, test what they've written, and subject it to code review. Next, the test phase begins in earnest. After writing code to perform a specific function, a developer writes a unit or module test -- a program to verify that the new code works as required. In large development teams, many modules are developed or modified in parallel. It's essential to confirm that any new code doesn't create a problem when integrated into the larger system. This is verified by an integration test, usually run nightly. Then, the complete system is run through a regression test to confirm that it works as well as it did before new functionality was included, a functional test to confirm old and new functionality, and a stress test to confirm the reliability and robustness of the whole system. Finally, after the successful completion of all testing, the product is released and enters the support phase. Even in the development and test phases, developing and testing new code collectively take up only about 40 percent of the developer's work. The standard release of a new AMD Adrenalin graphics-software package takes an average of six months, followed by a less-intensive support phase of another three to six months. We tracked one such release to determine how many engineers were involved in each stage. The development and test phases were by far the most resource intensive, with 60 engineers involved in each. Twenty engineers were involved in the support phase, 10 in design, and five in definition. Because development and testing required more hands than any of the other stages, we decided to survey our development and testing teams to understand what they spend time on from day to day. We found something surprising yet again: Even in the development and test phases, developing and testing new code collectively take up only about 40 percent of the developer's work. The other 60 percent of a software engineer's day is a mix of things: About 10 percent of the time is spent learning new technologies, 20 percent on triaging and debugging problems, almost 20 percent on reviewing and optimizing the code they've written, and about 10 percent on documenting code. Many of these tasks require knowledge of highly specialized hardware and operating systems, which off-the-shelf coding assistants just don't have. This review was yet another reminder that we'll need to broaden our scope beyond basic code autocomplete to significantly enhance the software-development life cycle with AI. Generative AI, such as large language models and image generators, are getting a lot of airtime these days. We have found, however, that an older style of AI, known as discriminative AI, can provide significant productivity gains. While generative AI aims to create new content, discriminative AI categorizes existing content, such as identifying whether an image is of a cat or a dog, or identifying a famous writer based on style. We use discriminative AI extensively in the testing stage, particularly in functionality testing, where the behavior of the software is tested under a range of practical conditions. At AMD, we test our graphics software across many products, operating systems, applications, and games. For example, we trained a set of deep convolutional neural networks (CNNs) on an AMD-collected dataset of over 20,000 "golden" images -- images that don't have defects and would pass the test -- and 2,000 distorted images. The CNNs learned to recognize visual artifacts in the images and to automatically submit bug reports to developers. We further boosted test productivity by combining discriminative AI and generative AI to play video games automatically. There are many elements to playing a game, including understanding and navigating screen menus, navigating the game world and moving the characters, and understanding game objectives and actions to advance in the game. While no game is the same, this is basically how it works for action-oriented games: A game usually starts with a text screen to choose options. We use generative AI large vision models to understand the text on the screen, navigate the menus to configure them, and start the game. Once a playable character enters the game, we use discriminative AI to recognize relevant objects on the screen, understand where the friendly or enemy nonplayable characters may be, and direct each character in the right direction or perform specific actions. To navigate the game, we use several techniques -- for example, generative AI to read and understand in-game objectives, and discriminative AI to determine mini-maps and terrain features. Generative AI can also be used to predict the best strategy based on all the collected information. Overall, using AI in the functional testing stage reduced manual test efforts by 15 percent and increased how many scenarios we can test by 20 percent. But we believe this is just the beginning. We're also developing AI tools to assist with code review and optimization, problem triage and debugging, and more aspects of code testing. Once we reach full adoption and the tools are working together and seamlessly integrated into the developer's environment, we expect overall team productivity to rise by more than 25 percent. For review and optimization, we're creating specialized tools for our software engineers by fine-tuning existing generative AI models with our own code base and documentation. We're starting to use these fine-tuned models to automatically review existing code for complexity, coding standards, and best practices, with the goal of providing humanlike code review and flagging areas of opportunity. Similarly, for triage and debugging, we analyzed what kinds of information developers require to understand and resolve issues. We then developed a new tool to aid in this step. We automated the retrieval and processing of triage and debug information. Feeding a series of prompts with relevant context into a large language model, we analyzed that information to suggest the next step in the workflow that will find the likely root cause of the problem. We also plan to use generative AI to create unit and module tests for a specific function in a way that's integrated into the developer's workflow. These tools are currently being developed and piloted in select teams. Once we reach full adoption and the tools are working together and seamlessly integrated into the developer's environment, we expect overall team productivity to rise by more than 25 percent. The promise of 25 percent savings does not come without risks. We're paying particular attention to several ethical and legal concerns around the use of AI. First, we're cautious about violating someone else's intellectual property by using AI suggestions. Any generative AI software-development tool is necessarily built on a collection of data, usually source code, and is generally open source. Any AI tool we employ must respect and correctly use any third-party intellectual property, and the tool must not output content that violates this intellectual property. Filters and protections are needed to ensure compliance with this risk. Second, we're concerned about the inadvertent disclosure of our own intellectual property when we use publicly available AI tools. For example, certain generative AI tools may take your source code input and incorporate it into its larger training dataset. If this is a publicly available tool, it could expose your proprietary source code or other intellectual property to others using the tool. Third, it's important to be aware that AI makes mistakes. In particular, LLMs are prone to hallucinations, or providing false information. Even as we off-load more tasks to AI agents, we'll need to keep a human in the loop for the foreseeable future. Lastly, we're concerned with possible biases that the AI may introduce. In software-development applications, we must ensure that the AI's suggestions don't create unfairness, that generated code is within the bounds of human ethical principles and doesn't discriminate in any way. This is another reason a human in the loop is imperative for responsible AI. Keeping all these concerns front of mind, we plan to continue developing AI capabilities throughout the software-development life cycle. Right now, we're building individual tools that can assist developers in the full range of their daily tasks -- learning, code generation, code review, test generation, triage, and debugging. We're starting with simple scenarios and slowly evolving these tools to be able to handle more-complex scenarios. Once these tools are mature, the next step will be to link the AI agents together in a complete workflow. The future we envision looks like this: When a new software requirement comes along, or a problem report is submitted, AI agents will automatically find the relevant information, understand the task at hand, generate relevant code, and test, review, and evaluate the code, cycling over these steps until the system finds a good solution, which is then proposed to a human developer. Even in this scenario, we will need software engineers to review and oversee the AI's work. But the role of the software developer will be transformed. Instead of programming the software code, we will be programming the agents and the interfaces among agents. And in the spirit of responsible AI, we -- the humans -- will provide the oversight.
[2]
How AI-driven development tools impact software observability - SiliconANGLE
How AI-driven development tools impact software observability "I made this whole program in 5 minutes with just a few (Insert GenAI tool) prompts. Any developer not using AI tools to replace developers will find themselves out of a job in two years" - random AI fanboy on X Let's face it, the next few years are going to be really tough for software-driven companies and software engineers. Even the most successful startups on their way up will be asked to deliver more software with fewer development resources. That means we can expect to see more artificial intelligence tooling being used in development, in an attempt either to enhance developer productivity or to replace some work hours with AI-driven automation and agents. Some stories about generative AI hallucinations are making the rounds, for instance when an Air Canada chatbot speciously offered a customer a refund, which resulted in a penalty when it tried to rescind the offer. Or Microsoft's experimental Tay chatbot, which became progressively more "racist" through dialogue with bias-trolling users. Haha, funny. We know large language model chatbots have insanely complex models that are largely opaque to conventional testing and observability tools. But enough said about the risks of putting AI-based applications in front of customers. Let's shift left, and explore how the use of AI development tools within development processes is affecting software observability and see if we can figure out why these problems are happening. As humans developing software, we never expected to be as fully engaged as we are now. Thanks to the evolution of automation and agile DevOps practices, per-developer productivity is at an all time high. So where else can we go from here with AI assistance? Let's look for better data than some fanboy on X saying he developed a whole app in five minutes. The recent 2024 DORA Report, with a massive survey audience underwritten by Google, does highlight significant improvements in documentation quality, code quality, and code review speed. Then, the report says: "However, despite AI's potential benefits, our research revealed a critical finding: AI adoption may negatively impact software delivery performance. As AI adoption increased [for each 25% increment], it was accompanied by an estimated decrease in delivery throughput by 1.5%, and an estimated reduction in delivery stability by 7.2%." As it turns out, AI-generated code within applications, when infused with complex probabilistic weighting and nondeterministic thinking, are less observable than conventional applications that contain rules-based logic. It's not just that AI coding and configuration assistants can make mistakes. The real problem with AI driven development is confidence. Since generative AI is designed to produce answers that are plausible and believable to the user, the AI will seem quite confident it is providing the right code and the right answers unless told to confidently investigate its own "thinking." We could go deep on so many aspects of AI's impact on observability that we'd still only be scratching the surface. So, to further complicate matters, I talked to several leading vendors involved in making observability and software quality solutions. When using AI for development, the problem of alignment becomes especially sticky, because the AI-driven tools used for code, configuration or operations are twice-removed from the intention of the end user or customer. In other words, the AI should align with the intentions of the developer, who in turn is aligning the AI-powered software with its intended business purpose. SmartBear was one of the first vendors to publish specific guidelines on how it would apply AI for development of its own software, before it started releasing AI-driven tools to software delivery teams. "You can still get trapped in viewing observability through the lens of error tracking to make sure there's no failures -- and that presupposes that every other part of what you're doing in the SDLC is adding more value to your customers when you definitely cannot hold that as constant," said Vineeta Puranik, chief technology officer at SmartBear. "How do I know that all the code we're writing, whether it's AI-generated or human generated, is actually achieving those goals and making customers feel like they are getting more value over time out of the service?" While AI routines have proven quite effective at taking real user monitoring traffic, generating a suite of possible tests and synthetic test data, and automating test runs on each pull request, any such system still requires humans who understand the intended business outcomes to use observability and regression testing tools to look for unintended consequences of change. "So the system just doesn't behave well," Puranik said. "So you fix it up with some prompt engineering. Or maybe you try a new model, to see if it improves things. But in the course of fixing that problem, you did not regress something that was already working. That's the very nature of working with these AI systems right now -- fixing one thing can often screw up something else where you didn't know to look for it." There's a new phenomenon everyone wants to try: vibecoding. Some software vendors act like vibecoding just isn't really happening in the field, while some low-code vendors are leveraging AI to help "citizen developers" build apps from exactly that perspective, so AI can operate within the guardrails of their toolkits. "Vibecoding is not just doing autocomplete on lines of code, it's developing entire new services and configuring infrastructure with just prompts," said Camden Swita, director and head of AI/ML at New Relic. "Since a vibecoder has no requirement to understand the stack, the person may not even understand the best practices of observability or instrumentation, or how to zero in on an issue later, like an SRE [site reliability engineer]. The need for good observability baked into the process is important." To address this, New Relic has added an elaborate stack tracing engine within their AI monitoring solution to help engineers understand how AI agents are interfacing with different architectural elements such as vector databases, retrieval-augmented generation and external service interfaces in production. Sure, vibecoding might replace some developers who are delivering less mission critical apps, but it seems like it might also create a new cottage industry cleaning up the mess. Here's a dev with a compelling offer making the rounds: "I can't wait to fix your vibe code for $200 an hour." We've been using AIOps-style routines productively for years to filter and tag telemetry data for better relevance in observability work. Agentic AI, meaning AI-based agents, promises to further offload some engineering work by autonomously handling multiple tasks in a workflow for an investigation, such as comparing codebases for change, documenting and escalating incidents with stack trace reports, and generating test cases. Here's my concern: It's like asking AI agents to monitor AI-filtered telemetry, for applications with code generated by AI, and tested with AI-generated tests -- sort of like a wyvern eating its own tail. A human still needs to be involved to keep the agent on course. "Let's say, 'agent, write some code that satisfies these tests, please,'" said Phillip Carter, principal product manager of open telemetry and AI at Honeycomb. "And it does. Except one problem. It looked at all of my test cases and it planted those as 'if-statements' inside of the function. Oh no, when I told it to satisfy the test case, it was very literal in interpreting what I was saying, and it just wrote the code that makes the test pass. I have basically created a tautological system that does perform per the spec. And that's simpler than talking about things like Kubernetes configuration changes." Carter added that there can be a legitimate acceleration of tasks, "but some people would argue the bottleneck has never been in the code generation side of it, as it shifts the bottleneck toward verification and understanding what should actually be happening. This highlights a use case where we'll never really get away from needing experienced people." Honeycomb's observability platform allows engineers to drop into code-level analysis from a heat map, and it recently added an AI-enhanced natural language query function, trained on gathering telemetry for tying specific development, SRE and ops use cases to service-level objectives. Well before the current AI hullabaloo, we were already seeing crossover between the space formerly known as test automation and observability. Real user monitoring and synthetic test data and generated test scenarios are getting "shifted left" for pre-production awareness, as well as "shifted right" to provide better observability and test feedback from production. Katalon just put out an extensive 2025 State of Software Quality 2025 report that clearly indicates that QA is a bright spot for AI development, with more than 75% of respondents reporting using some AI-driven testing tools. Respondents who used AI testing tools reported prioritizing test planning and design less (36%) than non-AI users (44%), indicating some reduction in manual effort through AI. These findings support their idea of the "hybrid tester" who will zip together several different AI models and agents with conventional test automation. The aim will be to enhance observability coverage, shorten test and delivery cycle times, and accelerate documentation and feedback loops, alongside conventional test automation and manual testing tasks. Katalon itself has taken on a composite AI approach. Its agentic AI acts as the key in a "zipper" that stitches together prompts and responses from many different AI-driven testing, monitoring and observability tools within the context of validating a business scenario or service-level objective. Software development, like any creative work, follows the golden triangle of software development. You can either have it fast, good or cheap -- or at most, two out of the three. What observability points out for AI-driven development is that you can definitely deliver software faster, and perhaps cheaper (the jury is still out on that in the long run), but better software may remain just out of reach in many cases without clarity on who owns business and service level objectives for these tools. "It's not that different than other areas of specialization coming into the software lifecycle, just like observability led to SREs trying to figure out what is going on within the stack," said Patrick Lin, senior vice president and general manager of observability at Splunk, a Cisco Systems company. "The idea of a full stack developer may expand to include AI skills as a prerequisite. At the same time, you will still have DBAs and network operations teams that are specialists." Even when developing with AI tools, added Hao Yang, head of AI at Splunk, "we've always relied on human gatekeepers to ensure performance. Now, with agentic AI, teams are finally automating some tasks, and taking the human out of the loop. But it's not like engineers don't care. They still need to monitor more, and know what an anomaly is, and the AI needs to give humans the ability to take back control. It will put security and observability back at the top of the list of critical features." In practice, the golden signals of software observability (latency, traffic, errors and saturation) are still the same, but Yang also highlights new ones for looking at AI responses: relevance, quality, hallucination and toxicity. Here's an interesting quandary: if I use a copilot in GitHub, or a tool such as Cursor, who should take responsibility if there are faults in the application, or the wrong infrastructure is implemented? Whom does an SRE call first? "We still have a lot of SREs and engineers who do not trust the computer with that kind of reasoning. You can still use LLMs to suggest approaches, but the more automated and complex your system becomes, the more you need humans in the loop," said Tom Wilkie, chief technology officer at Grafana Labs. "The LLM may have written some of the code, but if there's a bug in it, that's still my code and pull request." Still, I have to wonder, who actually owns the code, and the intellectual property it represents in a product, if the developer approves a lengthy terms-of-use attestation during signup? "As a management team, we decided to take a risk-tolerant approach to AI tools," said Wilkie. "Also, we are open source, and 90% of our code will be out there in public, so we have no concerns about engineers using these tools and leaking proprietary code to an LLM. We can attract engineers to us because we are open source." To whatever extent you can use open source tools with AI-assisted coding, it makes the value of contributions higher, since they will certainly be vetted and hardened by a community of thousands or millions of developers. Nobody wants to use open source tooling that real human contributors won't stand behind. No matter what, it's only going to get harder for developers. More competitive. Some companies will lay off developers because of AI, or the promise of it. It's really amazing how quickly almost any software company of significant size already has a "head of AI" leadership role present. It took about five years after DevOps or cloud appeared on the scene before we saw director-level appointments with those buzzwords in their titles. The illusory "five years of experience developing with AI" will become a seldom-achieved requirement on some developer job recs. Even AI development companies such as Anthropic have had to tell job applicants to not use AI when answering questions on their recruiting portal. So many billions of dollars have been invested in AI development tooling that it is unlikely that any of the purported beneficiaries of reduced workforces and timelines -- or the media and analysts that participated in hyping out-of-the-box AI application delivery -- are going to tell the market that end-customer codebases are becoming cursed with intractable problems. At least, not until we have more high-profile production failures caused by AI development tools without enough human oversight and governance. That's why AI-aware observability and shift-left production-style testing are more important than ever before in heading off functional errors and configuration drift before they get replicated everywhere.
Share
Share
Copy Link
An exploration of AI's impact on software development, from code generation to observability, highlighting both potential benefits and unexpected challenges.
The integration of AI into software development has sparked both excitement and skepticism in the tech industry. Companies like Advanced Micro Devices (AMD) are exploring AI's potential to revolutionize their software development processes 1. Initial results suggest that AI coding assistants can significantly boost productivity, with GitHub research indicating a potential doubling of developer output 1.
However, AMD's internal studies reveal a more nuanced picture. While junior developers working on basic tasks benefited greatly from AI assistants, senior developers tackling specialized projects saw minimal productivity gains 1. This discrepancy highlights the need for AI tools tailored to specific codebases and development environments.
AMD's approach extends beyond mere code generation. They're adapting large language models for code review, optimization, and bug report generation. Additionally, they're exploring discriminative AI for testing, particularly in assessing video game performance on their hardware 1.
As AI tools become more prevalent in development processes, new challenges emerge, particularly in software observability. AI-generated code, infused with complex probabilistic weighting and nondeterministic thinking, proves less observable than conventional rules-based logic 2.
Contrary to expectations, increased AI adoption in software development may lead to decreased delivery throughput and stability. The 2024 DORA Report indicates that for every 25% increase in AI adoption, there's an estimated 1% decrease in delivery throughput and a 7% reduction in delivery stability 2.
A significant challenge with AI-driven development is the issue of confidence. AI tools are designed to produce plausible and believable answers, which can lead to overconfidence in their output without proper verification 2.
SmartBear's CTO, Vineeta Puranik, emphasizes the importance of aligning AI-generated code with intended business outcomes. She warns against the trap of focusing solely on error tracking, stressing the need to ensure that AI-generated code truly adds value to customers 2.
A new trend called 'vibecoding' is emerging, where developers use AI to create entire services and configure infrastructure using prompts. This approach, while promising, raises concerns about maintaining code quality and understanding the underlying systems 2.
Despite challenges, companies like AMD remain optimistic about AI's potential in software development. They aim for a 25% productivity boost over the next few years by implementing AI throughout the software development lifecycle 1. The long-term vision involves creating an autonomous software-development machine, with human oversight to ensure quality and alignment with business goals.
Reference
[1]
Generative AI is revolutionizing software development, offering significant productivity gains but also raising concerns about code quality and security. The impact varies based on developer experience and organizational readiness.
3 Sources
3 Sources
AI is revolutionizing the programming landscape, offering both opportunities and challenges for entry-level coders. While it simplifies coding tasks, it also raises the bar for what constitutes an "entry-level" programmer.
2 Sources
2 Sources
A comprehensive look at the latest developments in AI, including OpenAI's internal struggles, regulatory efforts, new model releases, ethical concerns, and the technology's impact on Wall Street.
6 Sources
6 Sources
A comprehensive look at the current state of AI adoption in enterprises, covering early successes, ROI challenges, and the growing importance of edge computing in AI deployments.
4 Sources
4 Sources
DeepSeek's emergence disrupts the AI market, challenging industry giants and raising questions about AI's future development and societal impact.
3 Sources
3 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved