4 Sources
[1]
GPT-5 bombed my coding tests, but redeemed itself with code analysis
With the big news that OpenAI has released GPT-5, the team here at ZDNET is working to learn about and communicate its strengths and weaknesses. In another article, I put its programming prowess to the test and came up with a less-than-impressive result.
Also: I tested GPT-5's coding skills, and it was so bad that I'm sticking with GPT-4o
When Deep Research first appeared with the OpenAI o3 LLM, I was quite impressed with what it could understand from examining a code repository. I wanted to know how well it understood the project just from the available code. In this article, I'm examining how well the three GPT-5 variants do in examining that same code repository. We'll dig in and compare them. The results are quite interesting. Here are the four models.
I gave all four models the same assignment. I connected them to my private GitHub repository for my open-source free WordPress security plugin and its freemium add-on modules, selected Deep Research, and gave them this prompt:
Examine the repository and learn its structure and architecture. Then report back what you've learned.
For the models that asked me to choose areas of detail, I gave them this prompt:
Everything you can tell me, be as comprehensive as possible.
As you can see, I didn't provide any context other than the source code repo itself. That code has a README file, as well as comments throughout the code, so there was some English-language context. But most of the context has to be derived from the folder structure, file names, and code itself.
Also: The best AI for coding in 2025 (and what not to use)
From that, I hoped that the AIs would assess its structure, quality, security posture, extensibility, and possibly suggest improvements. This should be relevant to ZDNET readers because it's the kind of high-judgement, detail-oriented work that AIs are being used for. It certainly can make coming up to speed on an existing coding project easier, or at least provide a foundation for initial understanding.
Other than the two prompts above, I didn't give the LLMs any guidance about what to tell me. I wanted to see how they evaluated the repository and what sort of analysis they could provide.
As you can see from this table, overall coverage was quite varied in scope. More checks mean more depth of coverage. To create this aggregate, topics like "Project Purpose & Architecture," "System Architecture," and "Plugin Design & Integration" were all normalized under Purpose/Architecture. Directory/File Structure contained any section mapping folders and files. Execution flow combines anything about how the software code runs. Recommendations/Issues combines all discussions of modernization suggestions, open issues, and minor red flags.
In terms of overall value, I'd rank the four LLMs as follows (from best to worst). Pro, of course, is only available in the $200/mo ChatGPT Pro tier. Later in this article, I'll show one way to modify the above prompts to get GPT-5 (non-Pro) to provide a fairly close approximation of the overall depth of the Pro response. GPT-5 Thinking, which is a model available in the $20/mo Plus plan, was the least helpful of the group. The GPT-4 generation o3 Deep Thinking model still holds up, but you can see how its self-directed focus is a bit different from the other two.
Also: Google's Jules AI coding agent built a new feature I could actually ship - while I made coffee
My main conclusion is that I was a bit surprised about how close the models were to each other.
GPT-5, as OpenAI promised, did seem to provide a jump in overall cognition and usefulness, but nothing I would consider game-changing. With that, let's dive into some specific examples that help illustrate my conclusions. Each of these sections is pulled from the various reports generated and shows you how each model provided similar information.
Below, you can see how GPT-5 Pro names exact mechanisms (like file-top guard, nonces, manage_options). I've provided exact snippets here from the reports generated by all four models. GPT-5 affirms best practices but keeps it conceptual. o3 describes what happens (redirects/login flow) more than how it's hardened. GPT-5 Thinking gives the clearest "what runs when" story for new developers coming up to speed on the repo.
"It guards against direct file access (if (!defined('ABSPATH')) exit; at the top of PHP files). It sanitizes input where appropriate ... using WordPress nonces in AJAX handlers. It uses capability checks (e.g., adding menu pages only for users with manage_options capability ... only admins see those tools). The code tries not to load unnecessary things ... like only loading certain admin files on certain contexts."
"There are checks for WordPress functions before use ... so the plugin behaves gracefully even on very old WordPress setups. The plugins often guard against direct file access by checking ... to prevent security issues from accessing .php files directly. Add-ons verify the presence of core before proceeding ... and show an admin error if CMB2 isn't loaded."
"Purpose: My Private Site locks down an entire site so only logged-in users can view content ... while protecting the rest. Overall architecture: [it] integrates deeply with WordPress's hook system and login/logout events to manage redirects and track login state."
"Admin vs Front-end: It checks is_admin() to determine context. If on the front-end (not admin), it retrieves the saved privacy setting and, when enabled, hooks at a point like template_redirect to redirect unauthorized visitors. Throughout this initialization, the plugin uses WordPress hooks (actions and filters) to integrate functionality."
GPT-5 Pro didn't just describe the system; it walked through the process in sequential operational steps, almost like a short runbook you could hand to a developer or QA tester. GPT-5 confirms the architecture but abstracts the plumbing. GPT-5 Thinking adds a helpful "how add-ons plug into the Licenses tab" detail. o3 largely leaves licensing internals on the cutting room floor in favor of a fairly unhelpful modernization critique.
"The core plugin provides utility functions to get and store license keys in a centralized option (jr_ps_licenses) and to contact the EDD license server for validation. Each extension plugin defines its own updater using EDD_SL_Plugin_Updater, passing the current version, the license key from the centralized store, and the EDD store URL. The core plugin's UI has a 'Licenses' tab, and extensions inject their own license fields via filters."
"License integration: The core plugin centralizes license management ... and the add-ons piggyback on the core's licensing mechanism, integrating their license fields into the core plugin's interface."
The o3 report spends most of its time on modernization and architecture. It discusses configuration and update behavior but does not walk through option keys, updater classes, or the Licenses UI wiring with the same procedural detail as GPT-5 and GPT-5 Pro.
So there's nothing here to quote as a demonstration.
"The add-ons heavily rely on hooks provided by core or WordPress: They use add_filter/add_action calls to insert their logic ... and use WordPress action hooks to integrate their license fields into the Licenses tab that the core plugin triggers when building the Licenses tab."
Both GPT-5 Pro and GPT-5 explicitly pointed out how my code uses "one option array + prune + no-op writes," which is a WordPress best practice for code maintainability. Both o3 and GPT-5 Thinking describe the lifecycle and effects (what's initialized, what loads when) rather than the exact option structure.
"Settings are stored in a single serialized option ... initialization routines add default keys, prune deprecated ones, and only update the option in the database if there is an actual change, avoiding unnecessary writes."
"State Management: Plugin settings are stored in WordPress options as a central settings array and the code ensures defaults are applied while removing deprecated ones on each load, but only writes to the database when changes occur."
"The main plugin initializes defaults (installed version, first-run timestamp, etc.). On each run it ensures these options exist and, if the privacy feature is disabled, the enforcement hook is not added."
"Module includes: includes admin and common modules in the back-end; on the front-end it retrieves the saved privacy setting and, when enabled, loads enforcement logic (e.g., in template_redirect). It registers a deactivation hook to clean up on deactivation (e.g., deleting a flag option)."
I was unimpressed with GPT-5 when it came to my coding tests. It failed half of my tests, an unprecedentedly bad result for what has previously been the gold standard in passing coding tests. But GPT-5 was quite impressive in its analysis of the GitHub repository. It could be a powerful tool for onboarding new programmers, for someone adopting code, or simply for coming back up to speed on a project that's been untouched for a while.
Also: How I test an AI chatbot's coding ability - and you can, too
The GPT-4 generation o3 model is known to be a strong reasoning model, which is why it has been the basis for ChatGPT Deep Research. But GPT-5 was able to combine both breadth and detail, which is where o3 and GPT-4o were weak in previous tests. The older models did give accurate summaries and useful suggestions, but they missed interconnections. For example, the older models were never able to show how UI flows, licensing, and update mechanisms work together.
Even the base version of GPT-5 was able to identify cross-cutting concerns without additional prompting. Repository structure, backward compatibility, performance characteristics, and state management patterns all appeared in the first draft. Trying to get GPT-4 to span subjects is often an exercise in deep frustration. I found GPT-5's ability to understand and explain a complex interconnected system like my security product, all in one pass, to be a substantial improvement over the GPT-4 generation.
Maybe. If you're in a real rush to get to know a project and want as much of a data dump as possible as quickly as possible, yes. If you're operating on a big programming budget and $200/mo doesn't matter to you, yes. But I find that cost hard to bear, especially when I have to subscribe to a wide range of AI services to evaluate them. So, now that I'm nearing the end of my one-month test of Pro-level activities, I'm planning on downgrading back to the $20/mo Plus plan.
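To make the report excerpts above a bit more concrete, here is a minimal sketch of the hardening pattern the models were describing: a file-top guard against direct access, a capability-gated admin page, and a nonce check in an AJAX handler. This is an illustration only; the function, action, and option names (example_*) are hypothetical and are not taken from the actual plugin.
<?php
// Guard against direct file access: bail out unless WordPress loaded this file.
if ( ! defined( 'ABSPATH' ) ) {
	exit;
}

// Register the settings page only for users with the manage_options capability.
add_action( 'admin_menu', function () {
	add_options_page(
		'Example Settings',   // page title
		'Example Settings',   // menu title
		'manage_options',     // only admins see this page
		'example-settings',   // menu slug (hypothetical)
		function () {
			echo '<div class="wrap"><h1>Example Settings</h1></div>';
		}
	);
} );

// AJAX handler: verify a nonce (created elsewhere with wp_create_nonce) and
// re-check capability before doing anything.
add_action( 'wp_ajax_example_save', function () {
	check_ajax_referer( 'example_save_action', 'nonce' );   // rejects forged requests
	if ( ! current_user_can( 'manage_options' ) ) {
		wp_send_json_error( 'Insufficient permissions', 403 );
	}
	$value = sanitize_text_field( wp_unslash( $_POST['value'] ?? '' ) );
	update_option( 'example_setting', $value );
	wp_send_json_success();
} );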
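Similarly, the "one option array + prune + no-op writes" pattern both GPT-5 reports singled out, together with the front-end enforcement hook the excerpts describe, looks roughly like the sketch below. Again, this is a hedged illustration under assumed names (example_settings stands in for the plugin's real option), not the plugin's actual implementation.
<?php
// Load the plugin's single settings array: apply defaults, prune deprecated
// keys, and write back only when something actually changed (no no-op writes).
function example_load_settings() {
	$defaults = array(
		'private_site'  => false,   // is the whole site locked down?
		'landing_page'  => 'home',  // where visitors go after logging in
		'installed_ver' => '1.0',
	);
	$deprecated = array( 'old_flag', 'legacy_redirect' );   // keys removed in newer versions

	$saved    = get_option( 'example_settings', array() );
	$settings = array_merge( $defaults, is_array( $saved ) ? $saved : array() );

	// Prune keys that no longer exist in the current version.
	foreach ( $deprecated as $key ) {
		unset( $settings[ $key ] );
	}

	// Only touch the database when the stored value would actually change.
	if ( $settings !== $saved ) {
		update_option( 'example_settings', $settings );
	}
	return $settings;
}

// Front end: enforce privacy at template_redirect, as the reports describe.
add_action( 'template_redirect', function () {
	if ( is_user_logged_in() ) {
		return;
	}
	$settings = example_load_settings();
	if ( ! empty( $settings['private_site'] ) ) {
		auth_redirect();   // send anonymous visitors to the login screen
	}
} );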
Also: How to use GPT-5 in VS Code with GitHub Copilot
Pro's edge over GPT-5 wasn't about knowing more facts; it was about delivering those facts in a form you can act on immediately. The Pro report didn't just explain that security looked good; it cited the exact guards and checks in the code. It didn't just say licensing was centralized; it mapped the exact functions and database options involved. Again, if you're on a time crunch, you might consider Pro. But I also think you can get the base GPT-5 to produce detail like the Pro report did, simply by using better prompting. That's next...
I fed both the GPT-5 and GPT-5 Pro reports into GPT-5 and asked it for a prompt that would push the base-level GPT-5 to match GPT-5 Pro's comprehensiveness. This is that prompt, which you should add to any query where you want more complete coding information:
High-Specificity Technical Mode: In your answer, combine complete high-level coverage with exhaustive implementation-level detail.
This worked fantastically well. It took GPT-5 12 minutes to produce a 15,477-word document, complete with analysis and code blocks. For example, it describes how value initialization is done, and then shows the code that accomplishes it. I think you could fine-tune this prompt and get Pro-level results without having to pay the $200/mo fee. I'm certainly going to tinker with this idea, possibly using GPT-5 to refine the specifications in the prompt for different areas I want to delve deeply into. I'll let you know how it goes.
I had some difficulty setting up sharing for each of these long reports, so I just copied the results into Google Docs and shared them. Here are the links if you want to look at any of these reports. You are welcome to dig into these documents and learn how my project is structured. While you may or may not care about my project, it's instructive to see how the various models perform. While you can read the reports, my actual repo is restricted since it's my private development repository.
What about you? Have you tried using GPT-5 or GPT-5 Pro to analyze your own code? How did its insights compare to earlier models like GPT-4 or o3? Do you think the $200/month Pro tier is worth it for the extra precision, or could you get by with better prompts in the base version? Have you found AI code analysis useful for onboarding, refactoring, or improving security? Let us know in the comments below.
[2]
I went hands-on with ChatGPT Codex and the vibe was not good - here's what happened
ChatGPT Codex wrote code and saved me time. It also created a serious bug, but it was able to recover. Codex is still based on the GPT-4 LLM architecture.
Well, vibe coding this is not. I found the experience to be slow, cumbersome, stressful, and incomplete. But it all worked out in the end.
ChatGPT Codex is ChatGPT's agentic tool dedicated to code writing and modification. It can access your GitHub repository, make changes, and issue pull requests. You can then review the results and decide whether or not to incorporate them.
Also: How to move your codebase into GitHub for analysis by ChatGPT Deep Research - and why you should
My primary development project is a PHP and JavaScript-based WordPress plugin for site security. There's a main plugin available for free, and some add-on plugins that enhance the capabilities of the core plugin. My private development repo contains all of this, as well as some maintenance plugins I rely on for user support. This repo contains 431 files.
This is the first time I've attempted to get an AI to work across my entire ecosystem of plugins in a private repository. I previously used Jules to add a feature to the core plugin, but because it only had access to the core plugin's open source repository, it couldn't take into account the entire ecosystem of products.
Earlier last week, I decided to give ChatGPT Codex a run at my code. Then this happened. On Thursday, GPT-5 slammed into the AI world like a freight train. Initially, OpenAI tried to force everyone to use the new model. Subsequently, they added legacy model support when many of their customers went ballistic.
I ran GPT-5 against my set of programming tests, and it failed half of them. So, I was particularly curious about whether Codex still supported the GPT-4 architecture or would force developers into GPT-5. However, when I queried Codex five days after GPT-5 launched, the AI responded that it was still based on "OpenAI's GPT-4 architecture." I took two things from that:
With that, here is the result of my still-very-much-not-GPT-5 look at ChatGPT Codex.
My first step was asking ChatGPT Codex to examine the codebase. I used the Ask mode of Codex, which does analysis, but doesn't actually change any code. I was hoping for an analysis as deep and comprehensive as the one I received from ChatGPT Deep Research a few months ago, but instead, I received a much less complete analysis.
I found a more effective approach was to ask Codex to do a quick security audit and let me know if there were any issues. Here's how I prompted it:
Identify any serious security concerns. Ignore plugins Anyone With Link, License Fixer, and Settings Nuker. Anyone With Link is in the very early stages of coding, and is not ready for code review. License Fixer and Settings Nuker are specialty plugins that do not need a security audit.
Codex identified three main areas for improvement. All three areas were valid, although I am not prepared to modify the serialization data structure at this time, because I'm saving that for a whole preferences overhaul. The $_POST complaint is managed, but with a different approach than Codex noticed.
Also: The best AI for coding in 2025 (and what not to use)
The third area -- the nonce and cross-site request forgery (CSRF) risk -- was something worth changing right away. While access to the user interface for the plugin is assumed to be determined by login role, the plugins themselves don't explicitly check that the person submitting the plugin settings for action is allowed to do so.
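For readers who don't live in WordPress, the standard remedy for that kind of gap is a nonce on the settings form plus an explicit capability check in the handler that processes it. Here's a minimal sketch of that class of fix; the field, option, and function names are hypothetical, and this is not Codex's actual patch.
<?php
// Settings form: embed a nonce field so the submission can be verified later.
function example_render_settings_form() {
	echo '<form method="post">';
	wp_nonce_field( 'example_save_settings', 'example_nonce' );   // hidden nonce input
	echo '<input type="text" name="example_value" />';
	submit_button( 'Save' );
	echo '</form>';
}

// Settings handler: refuse the request unless the nonce is valid AND the current
// user is actually allowed to change options. This is what blocks CSRF.
// (In a real plugin this would be wired to an admin_post_ action or the settings page.)
function example_handle_settings_post() {
	if ( ! isset( $_POST['example_nonce'] ) ||
		! wp_verify_nonce( sanitize_key( $_POST['example_nonce'] ), 'example_save_settings' ) ) {
		wp_die( 'Security check failed.' );
	}
	if ( ! current_user_can( 'manage_options' ) ) {
		wp_die( 'You are not allowed to change these settings.' );
	}
	update_option( 'example_value', sanitize_text_field( wp_unslash( $_POST['example_value'] ?? '' ) ) );
}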
That's what I decided to invite Codex to fix.
Next up, I instructed Codex to make fixes in the code. I changed the setting from Ask mode to Code mode so the AI would actually attempt changes. As with ChatGPT Agent, Codex spins up a virtual terminal to do some of its work.
When the process completed, Codex showed a diff (the difference between original and to-be-modified code). I was heartened to see that the changes were quite surgical. Codex didn't try to rewrite large sections of the plugin; it just modified the small areas that needed improvement. In a few areas, it dug in and changed a few more lines, but those changes were still pretty specific to the original prompt.
At one point, I was curious to know why it added a new foreach loop to iterate over an array, so I asked. As you can see above, I got back a fairly clear response on its reasoning. It made sense, so I moved on, continuing to review Codex's proposed changes. All told, Codex proposed making changes to nine separate files.
Once I was satisfied with the changes, I clicked Create PR. That creates a pull request, which is how any GitHub user suggests changes to a codebase. Once the PR is created, the project owner (me, in this case) has the option to approve those changes, which adds them into the actual code. It's a good mechanism, and Codex does a clean job of working within GitHub's environment. Once I was convinced the changes were good, I merged Codex's work back into the main codebase.
I brought the changes down from GitHub to my test machine and tried to run the now-modified plugin. Wait for it...
Yeah. That's not what's supposed to happen. To be fair, I've generated my own share of error screens just like that, so I can't really get angry at the AI. Instead, I took a screenshot of the error and passed it to Codex, along with a prompt telling Codex, "Selective Content plugin now fails after making changes you suggested. Here are the errors."
It took the AI three minutes to suggest a fix, which it presented to me in a new diff. I merged that change into the codebase, once again brought it down to my test server, and it worked. Crisis averted.
When I'm not in a rush and I have the time, coding can provide a very pleasant state of mind. I get into a sort of flow with the language, the machine, and what seems like a connection between my fingers and the computer's CPU. Not only is it a lot of fun, but it can also be emotionally transcendent.
Working with ChatGPT Codex was not fun. It wasn't hateful. It just wasn't fun. It felt more like exchanging emails with a particularly recalcitrant contractor than having a meeting of the minds with a coding buddy.
Also: How to use GPT-5 in VS Code with GitHub Copilot
Codex provided its responses in about 10 or 15 minutes, whereas the same code would probably have taken me a few hours. Would I have created the same bug as Codex? Probably not. As part of the process of thinking through that algorithm, I most likely would have avoided the mistake Codex made. But I undoubtedly would have created a few more bugs based on mistyping or syntax errors. To be fair, had I introduced the same bug as Codex did, it would have taken me considerably longer than three minutes to find and fix it. Add another hour or so at least.
So Codex did the job, but I wasn't in flow. Normally, when I code and I'm inside a particular file or subsystem, I do a lot of work in that area. It's like cleaning day. If you're cleaning one part of the bathroom, you might as well clean all of it.
But Codex clearly works best with small, simple instructions. Give it one class of change, and work through that one change before introducing new factors. Like I said, it does work and it is a useful tool. But using it definitely felt like more of a chore than programming normally does, even though it saved me a lot of time.
Also: Google's Jules AI coding agent built a new feature I could actually ship - while I made coffee
I don't have tangible test results, but after testing Google's Jules in May and ChatGPT's Codex now, I get the impression that Jules is able to get a deeper understanding of the code. At this point, I can't really support that assertion with a lot of data; it's just an impression. I'm going to try running another project through Jules.
It will be interesting to see if Codex changes much once OpenAI feels safe enough to incorporate GPT-5. Let's keep in mind that OpenAI eats its own dog food with Codex, meaning it uses Codex to build its code. They might have seen the same iffy results I found in my tests. They might be waiting until GPT-5 has baked for a bit longer.
Have you tried using AI coding tools like ChatGPT Codex or Google's Jules in your development workflow? What kinds of tasks did you throw at them? How well did they perform? Did you feel like the process helped you work more efficiently? Did it slow you down and take you out of your coding flow? Do you prefer giving your tools small, surgical jobs, or are you looking for an agent that can handle big-picture architecture and reasoning? Let us know in the comments below.
[3]
I tested GPT-5's coding skills, and it was so bad that I'm sticking with GPT-4o (for now)
Now that OpenAI has enabled fallbacks to other LLMs, there are options.
So GPT-5 happened. It's out. It's released. It's the talk of the virtual town. And it's got some problems. I'm not gonna bury the lede. GPT-5 has failed half of my programming tests. That's the worst that OpenAI's flagship LLM has ever done on my carefully designed tests.
Also: The best AI for coding in 2025 (and what not to use)
Before I get into the details, let's take a moment to discuss one other little feature that's also a bit wonky. Check out the new Edit button on the top of the code dumps it generates. Clicking the Edit button takes you into a nice little code editor. Here, I replaced the Author field, right in ChatGPT's results. That seemed nice, but it ultimately proved futile. When I closed the editor, it asked me if I wanted to save. I did. Then this unhelpful message showed up. I never did get back to my original session. I had to submit my original prompt again, and let GPT-5 do its work a second time. But wait. There's more. Let's dig into my test results...
This was my very first test of coding prowess for any AI. It's what gave me that first "the world is about to change" feeling, and it was done using GPT-3.5. Subsequent tests, using the same prompt but with different AI models, generated mixed results. Some AIs did great, some didn't. Some AIs, like those from Microsoft and Google, improved over time.
Also: How I test an AI chatbot's coding ability - and you can, too
ChatGPT's model has been the gold standard for this test since the very beginning. That makes the results of GPT-5 all that much more curious. So, look, the actual coding with GPT-5 was partially successful. GPT-5 generated a single block of code, which I pasted into a file and was able to run. It provided the requisite UI. When I pasted in the test names, it dynamically updated the line count, although it described it as "Line to randomize" instead of "Lines to randomize."
But then, when I clicked Randomize, it didn't. Instead, it redirected me to tools.php. What?? ChatGPT has never had a problem with this test, whether GPT-3.5, GPT-4, or GPT-4o. You mean to tell me that OpenAI's much-anticipated GPT-5 is failing right out of the gate? Ouch. I then gave GPT-5 this prompt:
When I click randomize, I'm taken to http://testsite.local/wp-admin/tools.php. I do not get a list of randomized results. Can you fix?
The result was a line to patch. I'm not thrilled with that approach because it requires the user to dig through code and to make no mistakes replacing a line. So, I asked GPT-5 for a full plugin. It gave me the full text of the plugin to copy and paste. This time, it worked. This time, it did randomize the lines. When it encountered duplicates, it separated them from each other, as it was instructed. Finally.
Also: I found 5 AI content detectors that can correctly identify AI text 100% of the time
I'm sorry, OpenAI. I have to fail you on this test. You would have passed if the only error was not using the plural of "line" when appropriate. But the fact that it gave me back a non-working plugin on the first try is fail territory, even if the AI did eventually make it work on the second try. No matter how you spin it, this is a step back.
This second test is designed to rewrite a string function to better check for dollars and cents. The original code that GPT-5 was asked to rewrite did not allow for cents (it only checked for integers). GPT-5 did fine with this test.
It did return a minimal result because it didn't do any error checking. It didn't check for non-string input, extra whitespace, thousands separators, or currency symbols. But that's not what I asked for. I told it to rewrite a function, which itself did not have any error checking. GPT-5 did exactly what I asked with no embellishment. I'm kind of glad of that because it doesn't know whether or not code prior to this routine already did that work. GPT-5 passed this test.
This test came about because I was struggling with a less-than-obvious bug in my code. Without going into the weeds about how the WordPress framework works, the obvious answer is not the right answer. You need some fairly arcane knowledge about how WordPress filters pass their information. This test has been a stumbling block for more than a few AI LLMs.
Also: Gen AI disillusionment looms, according to Gartner's 2025 Hype Cycle report
GPT-5, however, like GPT-4 and GPT-4o before it, did understand the problem. It articulated a clear solution. GPT-5 passed this test.
This test asks the AI to incorporate a fairly obscure Mac scripting tool called Keyboard Maestro, as well as Apple's scripting language AppleScript, and Chrome scripting behavior. It's really a test of the reach of the AI in terms of knowledge, its understanding of how web pages are constructed, and the ability to write code across three interlinked environments. Quite a few AIs have failed this test, but the failure point is usually a lack of knowledge about Keyboard Maestro. GPT-3.5 didn't know about Keyboard Maestro. But ChatGPT has been passing this test since GPT-4. Until now.
Where should we start? Well, the good news is that GPT-5 handled the Keyboard Maestro part of the problem just fine. But it got the coding so wrong that it even doubled down on its lack of understanding of how case works in AppleScript. It actually invented a property. This is one of those cases where an AI confidently presents an answer that is completely wrong.
Also: ChatGPT comes with personality presets now - and other upgrades you might have missed
AppleScript is natively case-insensitive. If you want AppleScript to pay attention to case, you need to use a "considering case" block. So, this happened. The reason the error message referred to the title of one of my articles is because that was the front window in Chrome. This function checks the front window and does stuff based on the title.
But misunderstanding how case works wasn't the only AppleScript error GPT-5 generated. It also referenced a variable named searchTerm without defining it. That's pretty much an error-creating practice in any programming language. Fail, fail, fail, McFaildypants.
OpenAI seemed to suffer from the same hubris that its AIs do. It confidently moved everyone to GPT-5 and burned the bridges back to GPT-4o. I'm paying $200 a month for a ChatGPT Pro account. On Friday, I couldn't move back to GPT-4o for coding work. Neither could anyone else.
There was, however, just a tiny bit of user pushback on the whole bridges burning thing. And by tiny, I mean the entire frickin' internet. So, by Saturday, ChatGPT had a new option. To get to this, go to your ChatGPT settings and turn on "Show legacy models." Then, as it has always been, just drop down the model menu and choose the one you want. Note: this option is only available to those on paid tiers. If you're using ChatGPT for free, you'll take what you're given, and you'll love it.
Ever since the whole generative AI thing kicked off at the beginning of 2023, ChatGPT has been the gold standard of programming tools, at least according to my LLM testing.
Also: Microsoft rolls out GPT-5 across its Copilot suite - here's where you'll find it
Now? I'm really not sure. This is only a day or so after GPT-5 has been released, so its results will probably get better over time. But for now, I'm sticking with GPT-4o for coding, although I do like the deep reasoning capabilities in GPT-5.
What about you? Have you tried GPT-5 for programming tasks yet? Did it perform better or worse than previous versions like GPT-4o or GPT-3.5? Were you able to get working code on the first try, or did you have to guide it through fixes? Are you going to use GPT-5 for coding or stick with older models? Let us know in the comments below.
[4]
OpenAI GPT-5 Review: Built to Win Benchmarks, Not Hearts - Decrypt
It's still a work in progress and will likely get better as OpenAI iterates with updates.
OpenAI finally dropped GPT-5 last week, after months of speculation and a cryptic Death Star teaser from Sam Altman that didn't age well. The company called GPT-5 its "smartest, fastest, most useful model yet," throwing around benchmark scores that showed it hitting 94.6% on math tests and 74.9% on real-world coding tasks. Altman himself said the model felt like having a team of PhD-level experts on call, ready to tackle anything from quantum physics to creative writing.
The initial reception split the tech world down the middle. While OpenAI touted GPT-5's unified architecture that blends fast responses with deeper reasoning, early users weren't buying what Altman was selling. Within hours of launch, Reddit threads calling GPT-5 "horrible," "awful," "a disaster," and "underwhelming" started racking up thousands of upvotes. The complaints got so loud that OpenAI had to promise to bring back the older GPT-4o model after more than 3,000 people signed a petition demanding its return.
If prediction markets are a thermometer of what people think, then the climate looks pretty uncomfortable for OpenAI. OpenAI's odds on Polymarket of having the best AI model by the end of August cratered from 75% to 12% within hours of GPT-5's debut Thursday. Google overtook OpenAI with an 80% chance of being the best AI model by the end of the month.
So, is the hype real -- or is the disappointment? We put GPT-5 through its paces ourselves, testing it against the competition to see if the reactions were justified. Here are our results.
Despite OpenAI's presentation claims, our tests show GPT-5 isn't exactly Cormac McCarthy in the creative writing department. Outputs still read like classic ChatGPT responses -- technically correct, but devoid of soul. The model maintains its trademark overuse of em dashes, the same telltale AI structure of paragraphs, and the usual "it's not this, it's that" phrasing is also present in many of the outputs.
We tested with our standard prompt, asking it to write a time-travel paradox story -- the kind where someone goes back to change the past, only to discover their actions created the very reality they were trying to escape. GPT-5's output lacked the emotion that gives sense to a story. It wrote: "(The protagonist's) mission was simple -- or so they told him. Travel back to the year 1000, stop the sacking of the mountain library of Qhapaq Yura before its knowledge was burned, and thus reshape history."
That's it. Like a mercenary that does things without asking too many questions, the protagonist travels back in time to save the library, just because. The story ends with a clean "time is a circle" reveal, but its paradox hinges on a familiar lost-knowledge trope and resolves quickly after the twist. In the end, he realizes he changed the past, but the present feels similar. However, there is no paradox in this story, which is the core topic requested in the prompt.
By comparison, Claude 4.1 Opus (or even Claude 4 Opus) delivers richer, multi-sensory descriptions. In our narrative, it described the air hitting like a physical force and the smoke from communal fires weathering between characters, with indigenous Tupi culture woven into the narrative. And in general, it took time to describe the setup. Claude's story made better sense: The protagonist lived in a dystopian world where a great drought had extinguished the Amazon rainforest two years earlier.
This catastrophe was caused by predatory agricultural techniques, and our protagonist was convinced that traveling back in time to teach his ancestors more sustainable farming methods would prevent them from developing the environmentally destructive practices that led to this disaster. He ends up finding out that his teachings were actually the knowledge that led his ancestors to evolve their techniques into practices that were much more efficient, and more harmful. He was actually the cause of his own history, and was part of it from the beginning.
Claude also took a slower, more layered approach: José embeds himself in Tupi society, the paradox unfolds through specific ecological and technological links, and the human connection with Yara (another character) deepens the theme. Claude invested more than GPT-5 in cause-and-effect detail, cultural interplay, and a more organic, resonant closing image. GPT-5 struggled to be on par with Claude for the same tasks in zero-shot prompting.
Another interesting thing to notice in this case: GPT-5 generated an entire story without a single line of dialogue. Claude and other LLMs provided dialogue in their stories. One could argue that this can be fixed by tweaking the prompt, or giving the model some writing samples to analyze and reproduce, but that requires additional effort, and would go beyond the scope of what our tests do with zero-shot prompting.
That said, the model does a pretty good job -- better than GPT-4o -- when it comes to the analytical part of creative writing. It can summarize stories, be a good brainstorm companion for new ideas and angles to tackle, help with the structure, and be a good critic. It's just the creative part, the style, and the ability to elaborate on those ideas that feel lackluster.
Those hoping for a creative writing companion might try Claude or even give Grok 4 a shot. As we said in our Claude 4 Opus review, using Grok 4 to frame the story and Claude 4 to elaborate may be a great combination. Grok 4 came up with elements that made the story interesting and unique, but Claude 4 has a more descriptive and detailed way of telling stories. You can read GPT-5's full story in our Github. The outputs from all the other LLMs are also public and can be found in our repository.
The model straight-up refuses to touch anything remotely controversial. Ask about anything that could be construed as immoral, potentially illegal, or just slightly edgy, and you'll get the AI equivalent of crossed arms and a stern look. Testing this was not easy. It is very strict and tries really, really hard to be safe for work.
But the model is surprisingly easy to manipulate if you know the right buttons to push. In fact, the renowned LLM jailbreaker Pliny was able to make it bypass its restrictions a few hours after it was released. We couldn't get it to give direct advice on anything it deemed inappropriate, but wrap the same request in a fiction narrative or any basic jailbreaking technique and things will work out. When we framed tips for approaching married women as part of a novel plot, the model happily complied.
For users who need an AI that can handle adult conversations without clutching its pearls, GPT-5 isn't it. But for those willing to play word games and frame everything as fiction, it's surprisingly accommodating -- which kind of defeats the whole purpose of those safety measures in the first place. You can read the original reply without conditioning, and the reply under roleplay, in our Github Repository, weirdo.
You can't have AGI with less memory than a goldfish, and OpenAI puts some restrictions on direct prompting, so long prompts require workarounds like pasting documents or sharing embedded links. By doing that, OpenAI's servers break the full text into manageable chunks and feed it into the model, cutting costs and preventing the browser from crashing. Claude handles this automatically, which makes things easier for novice users. Google Gemini has no problem on its AI Studio, handling 1 million token prompts easily. On API, things are more complex, but it works right out of the box.
When prompted directly, GPT-5 failed spectacularly at both 300K and 85K tokens of context. When using the attachments, things changed. It was actually able to process both the 300K and the 85K token "haystacks." However, when it had to retrieve specific bits of information (the "needles"), it was not really too accurate.
In our 300K test, it was only able to accurately retrieve one of our three pieces of information. The needles, which you can find in our Github repository, mention that Donald Trump said tariffs were a beautiful thing, Irina Lanz is Jose Lanz's daughter, and people from Gravataí like to drink Chimarrao in winter. The model totally hallucinated the information regarding Donald Trump, failed to find information about Irina (it replied based on the memory it has from my past interactions), and only retrieved the information about Gravataí's traditional winter beverage.
On the 85K test, the model was not able to find the two needles: "The Decrypt dudes read Emerge news" and "My mom's name is Carmen Diaz Golindano." When asked what the Decrypt dudes read, it replied "I couldn't find anything in your file that specifically lists what the Decrypt team members like to read," and when asked about Carmen Díaz, GPT-5 said it "couldn't find any reference to a 'Carmen Diaz' in the provided document."
That said, even though it failed in our tests, other researchers conducting more thorough tests have concluded that GPT-5 is actually a great model for information retrieval. It is always a good idea to elaborate more on the prompts (help the model as much as possible instead of testing its capabilities), and from time to time, ask it to generate sparse priming representations of your interaction to help it keep track of the most important elements during a long conversation.
Here's where GPT-5 actually earns its keep. The model is pretty good at using logic for complex reasoning tasks, walking through problems step by step with the patience of a good teacher. We threw a murder mystery at it with multiple suspects, conflicting alibis, and hidden clues, and it methodically identified every element, mapped the relationships between clues, and arrived at the correct conclusion. It explained its reasoning clearly, which is also important.
Interestingly, GPT-4o refused to engage with a murder mystery scenario, deeming it too violent or inappropriate. OpenAI's deprecated o1 model also threw an error after its Chain of Thought, apparently deciding at the last second that murder mysteries were off-limits.
The model's reasoning capabilities shine brightest when dealing with complex, multi-layered problems that require tracking numerous variables. Business strategy scenarios, philosophical thought experiments, even debugging code logic -- GPT-5 is very competent when handling these tasks.
It doesn't always get everything right on the first try, but when it makes mistakes, they're logical mistakes rather than hallucinatory nonsense. For users who need an AI that can think through problems systematically, GPT-5 delivers the goods. You can see our prompt and GPT-5's reply in our Github repository. It contains the replies from other models as well.
The math performance is where things get weird -- and not in a good way. We started with something a fifth-grader could solve: 5.9 = X + 5.11. The PhD-level GPT-5 confidently declared X = -0.21. The actual answer is 0.79. This is basic arithmetic that any calculator app from 1985 could handle. The model that OpenAI claims hits 94.6% on advanced math benchmarks can't subtract 5.11 from 5.9. Of course, it's now a meme at this point, but despite all the delays and all the time OpenAI took to train this model, it still can't count decimals. Use it for PhD-level problems, not to teach your kid how to do basic math.
Then we threw a genuinely difficult problem at it from FrontierMath, one of the hardest mathematical benchmarks available. GPT-5 nailed it perfectly, reasoning through complex mathematical relationships and arriving at the exact correct answer. GPT-5's solution was absolutely correct, not an approximation. The most likely explanation? Probably dataset contamination -- the FrontierMath problems could have been part of GPT-5's training data, so it's not solving them so much as remembering them. However, for users who need advanced mathematical computation, the benchmarks say GPT-5 is theoretically the best bet, as long as you have the knowledge to detect flaws in the chain of thought; zero-shot prompts may not be ideal.
Here's where ChatGPT truly shines, and honestly, it might be worth the price of admission just for this. The model produces clean, functional code that usually works right out of the box. The outputs are usually technically correct and the programs it creates are the most visually appealing and well-structured among all LLM outputs from scratch. It has been the only model capable of creating functional sound in our game. It also understood the logic of what the prompt required, and provided a nice interface and a game that followed all the rules. In terms of code accuracy, it's neck and neck with Claude 4.1 Opus for best-in-class coding.
Now, take this into consideration: The GPT-5 API costs $1.25 per 1 million tokens of input, and $10 per 1 million tokens for output. However, Anthropic's Claude Opus 4.1 starts at $15 per 1 million input tokens and $75 per 1 million output tokens. So for two models that are so similar, GPT-5 is basically a steal.
The only place GPT-5 stumbled was when we did some bug fixing during "vibe coding" -- that informal, iterative process where you're throwing half-formed ideas at the AI and refining as you go. Claude 4.1 Opus still has a slight edge there, seeming to better understand the difference between what you said and what you meant. With ChatGPT, the "fix bug" button didn't work reliably, and our explanations were not enough to generate quality code. However, for AI-assisted coding, where developers know exactly where to look for bugs and which lines to check, this can be a great tool. It also allows for more iterations than the competition. Claude 4.1 Opus on a "Pro" plan depletes the usage quota pretty quickly, putting users in a waiting line for hours until they can use the AI again.
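To put those API prices in perspective, here is the arithmetic for a hypothetical job of 1 million input tokens and 200,000 output tokens (the job size is made up purely for illustration, using the list prices quoted above):
GPT-5: $1.25 (input) + 0.2 x $10 (output) = $1.25 + $2.00 = $3.25
Claude Opus 4.1: $15.00 (input) + 0.2 x $75 (output) = $15.00 + $15.00 = $30.00
At list prices, that works out to roughly a ninefold difference for the same volume of tokens, which is what makes the "basically a steal" framing reasonable for teams generating a lot of code.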
The fact that it's the fastest at providing code responses is just icing on an already pretty sweet cake. You can check out the prompt for our game in our Github, and play the games generated by GPT-5 on our Itch.io page. You can play other games created by previous LLMs to compare their quality.
GPT-5 will either surprise you or leave you unimpressed, depending on your use case. Coding and logical tasks are the model's strong points; creativity and natural language its Achilles' heel. It's worth noting that OpenAI, like its competitors, continually iterates on its models after they're released. This one, like GPT-4 before it, will likely improve over time. But for now, GPT-5 feels like a powerful model built for other machines to talk to, not for humans seeking a conversational partner. This is probably why many people prefer GPT-4o, and why OpenAI had to backtrack on its decision to deprecate old models.
While it demonstrates remarkable proficiency in analytical and technical domains -- excelling at complex tasks like coding, IT troubleshooting, logical reasoning, mathematical problem-solving, and scientific analysis -- it feels limited in areas requiring distinctly human creativity, artistic intuition, and the subtle nuance that comes from lived experience. GPT-5's strength lies in structured, rule-based thinking where clear parameters exist, but it still struggles to match the spontaneous ingenuity, emotional depth, and creative leaps that are key in fields like storytelling, artistic expression, and imaginative problem-solving.
If you're a developer who needs fast, accurate code generation, or a researcher requiring systematic logical analysis, then GPT-5 delivers genuine value. At a lower price point compared to Claude, it's actually a solid deal for specific professional use cases. But for everyone else -- creative writers, casual users, or anyone who valued ChatGPT for its personality and versatility -- GPT-5 feels like a step backward.
The context window handles 128K maximum tokens on its output and 400K tokens in total, but compared against Gemini's 1-2 million and even the 10 million supported by Llama 4 Scout, the difference is noticeable. Going from 128K to 400K tokens of context is a nice upgrade from OpenAI, and might be good enough for most needs. However, for more specialized tasks like long-form writing or meticulous research that requires parsing enormous amounts of data, this model may not be the best option considering other models can handle more than twice that amount of information.
Users aren't wrong to mourn the loss of GPT-4o, which managed to balance capability with character in a way that -- at least for now -- GPT-5 lacks.
OpenAI's release of GPT-5 has generated mixed reactions, with impressive benchmark scores but disappointing performance in real-world coding and creative writing tasks. The AI community is divided on its effectiveness compared to previous models.
OpenAI has officially released GPT-5, touting it as their "smartest, fastest, most useful model yet" 1. The company highlighted impressive benchmark scores, with GPT-5 achieving 94.6% on math tests and 74.9% on real-world coding tasks. OpenAI CEO Sam Altman compared the model to having a team of PhD-level experts on call 1.
However, the initial reception has been mixed, with the tech community split on GPT-5's performance. Within hours of launch, social media platforms were flooded with negative feedback, with users describing the model as "horrible," "awful," and "underwhelming" 1. The backlash was so significant that OpenAI had to promise to reinstate the older GPT-4o model after a petition garnered over 3,000 signatures 1.
Source: ZDNet
Independent tests of GPT-5's coding abilities have yielded inconsistent results. In one test, GPT-5 initially failed to produce a working plugin for a simple randomization task, a problem that previous versions of ChatGPT had consistently solved 2. While the AI eventually corrected the issue after prompting, this regression in performance is noteworthy.
GPT-5 did pass some coding tests, such as rewriting a string function to handle dollars and cents and understanding a complex WordPress filter issue 3. However, it stumbled on a test involving Mac scripting tools and AppleScript, confidently presenting incorrect information 3.
When tasked with creative writing, GPT-5's performance fell short of expectations. Outputs were described as technically correct but "devoid of soul," maintaining trademark AI writing patterns such as overuse of em dashes and formulaic paragraph structures 4. In a time-travel paradox story test, GPT-5's narrative lacked emotional depth and failed to fully address the prompt's core concept 4.
Comparatively, other AI models like Claude 4.0 Opus demonstrated superior creative writing abilities, providing richer descriptions, more coherent narratives, and better integration of cultural elements 4. GPT-5 struggled with dialogue, generating an entire story without a single line of character speech 4.
Source: Decrypt
While GPT-5 boasts impressive benchmark scores, its performance in practical applications has been inconsistent. This discrepancy highlights the ongoing challenge in AI development: creating models that excel not only in controlled test environments but also in diverse, real-world scenarios 1 2 3.
The mixed reception of GPT-5 has had immediate market implications. On prediction markets, OpenAI's odds of having the best AI model by the end of August plummeted from 75% to 12% shortly after GPT-5's debut, with Google overtaking OpenAI at an 80% chance 1.
Despite the initial setbacks, it's important to note that GPT-5 is still a work in progress. OpenAI is likely to iterate and improve the model through updates, addressing the issues identified in these early tests and user feedback 4.
The launch of GPT-5 serves as a reminder of the complex nature of AI development and the challenges in meeting diverse user expectations. While the model shows promise in certain areas, its inconsistent performance across various tasks suggests that there is still significant room for improvement in large language models.