I built an app with both Codex and Claude Code, and only one made me want to keep…


AI has changed a lot in a very short time, and coding has unfortunately felt every bit of it. Just a few years ago, people would advise you left and right to consider majoring in Computer Science as a guaranteed path to a six-figure career (or to just quit whatever you were doing and learn to code). Fast forward to today, and building something doesn’t really have that barrier to entry anymore. You don’t need to sit through an extremely long coding course or pay an expert developer to build your idea for you. You simply need to know what you want to build and how to describe it clearly enough for an AI to comprehend.

So, the question has quickly shifted from “can you code” to “what AI are you coding with.” I’ve explored pretty much everything out there, from simpler coding tools like Replit and Lovable, to open-source Claude Code alternatives like OpenCode, to what everyone’s talking about lately: Claude Code and OpenAI’s Codex. Everyone has an opinion on which one is better. I figured the only way to settle it, at least for myself, was to build the same app with both.

Something I’ve mentioned multiple times in my articles about AI-assisted coding is that you need to be building tools for yourself. Not tools you can profit off of, but simple tools that genuinely save you time or solve a real problem in your day. If the tool works well and does what it’s meant to do, you might be able to turn it commercial later, but that should always be the byproduct rather than the goal. So, I decided to put Claude Code and Codex to the test by building a tool I’ve wanted for a while instead of a generic to-do app.

It’s extremely specific to my workflow, which is exactly what makes it a good test. Now, I’m someone who gets distracted extremely easily. I’ve tested every screen time app there is, and resorted to crazy tactics like locking my phone in the cupboard. I recently discovered an open-source app called Foqos that works with an NFC tag to build your own version of the viral Brick device, and it’s been working incredibly well.

I don’t pick up my phone during focus sessions, and even when I do, there’s nothing on there to do. Just to unlock things, I’d have to get up, find the NFC tag, and scan it. That tiny bit of friction is genuinely all it takes. Unfortunately, none of that applies to my laptop. My phone might be locked down, but my laptop still has everything that distracts me mid-work.

Given that my focus sessions typically involve writing, the tool I wanted to build was simple: block all of it until I paste in proof that I’ve actually written something. The idea was straightforward. I’d set a word count goal and the tool would block all my usual distractions: WhatsApp Desktop, Instagram, X, LinkedIn.

Everything stays locked down until I paste in what I’ve written, and the tool verifies I’ve actually hit the goal. No timers, no honor system, none of the generic focus apps out there. Just write the words or stay blocked. As you can tell, it’s a tool that’s extremely specific to my needs. In my opinion, that’s exactly the kind of thing coding tools should be good at: turning a personal problem into a working solution.
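To make the unlock rule concrete, here is a minimal sketch of the word-goal check. It is not the code either tool generated; the function names and the "split on whitespace" definition of a word are my own assumptions for illustration.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a deliberately naive idea of a 'word')."""
    return len(text.split())


def goal_met(text: str, goal: int = 1000) -> bool:
    """Return True once the pasted text reaches the word goal."""
    return count_words(text) >= goal


draft = "word " * 1000
print(goal_met(draft))        # True
print(goal_met("too short"))  # False
```

A raw count like this is exactly what makes it easy to cheat, which is why the smarter proof checks come up later.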

When doing these types of experiments, where the goal is to evaluate the output each tool produces, I prefer to be intentionally a bit vague with the prompt. If I spell out each and every detail upfront, both tools will probably get to the same place. The interesting part is what happens when they have to figure things out on their own. So, I used a very to-the-point prompt that described what I wanted to build but didn’t mention the design palette, what the UI should look like, or any of the implementation details. I simply told both tools what I wanted.

hey! is it possible to make a tool where like….it blocks certain apps and websites (in my case, whatsapp desktop and websites like instagram and X and linkedin etc) that i know would distract me until i paste in proof that i’ve done my work and written like 1000 words (or whatever goal i set)

Here’s where the comparison begins! Claude Code told me that building such a tool was certainly possible, but it wanted to ask a few questions first. It asked me if there were any specific apps I wanted to block other than the ones mentioned, if I wanted to choose a word count goal, and if I wanted it to save everything I had written somewhere. Once I answered its questions, it began building. Codex, on the other hand, didn’t ask me a single question and went straight into building the tool. Compared to Claude Code, it did explain each step it was performing in more depth, but I do wish it had asked a few things first instead of making assumptions on its own. That’s a major reason why I usually take some time to be more detailed with my prompts, since a bit of back and forth initially saves you the headache of fixing things later.

Claude Code was done building the tool in under 10 minutes, and it set up a server so I could run it. Interestingly, Codex didn’t build a web-based tool right off the bat, and instead built a terminal-based tool that you had to run through commands. Once it was done building, it asked me if it should take it to the next step by adding a GUI, harder blocking rules, scheduled sessions, and smarter “proof” checks than just raw word count. I said yes to the GUI, and to the smarter proof checks too (and decided to prompt Claude Code to do the same to keep things fair). The next step was actually testing what each tool had come up with.

Let’s start with Claude Code. It built a web-based tool that runs on a local server, which you open in your browser. Design-wise, it was clean, minimalistic, and honestly pretty nice to look at. It didn’t have the gradient vibe-coded look, which I appreciate! On the first try, it did the basic job well. You hit the Start FocusLock button, it blocks your distractions, and you paste in your writing to unlock everything!

Codex took a slightly different approach. Its version, FocusGuard, runs as an actual desktop app using Python’s built-in tkinter library. You don’t need a browser to run it. I actually liked that about it since it felt more like a real tool living on my device! Similar to Claude Code’s version, the UI of Codex’s tool was minimalist and nice to look at. Under the hood, both tools work similarly. They edit your system’s hosts file to block websites and use process killing to shut down apps like WhatsApp whenever they try to run!
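For anyone curious what "edit the hosts file and kill processes" looks like in practice, here is a rough sketch of the general technique. It is not the code FocusLock or FocusGuard actually contains: the marker comments and function names are hypothetical, and the sketch only transforms text and builds commands rather than touching your system.

```python
import sys

# Hypothetical markers so a tool can find and remove only its own entries.
BEGIN = "# >>> focus-block begin"
END = "# <<< focus-block end"


def remove_block(hosts_text: str) -> str:
    """Return hosts-file text with everything between the markers stripped."""
    kept, inside = [], False
    for line in hosts_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "\n".join(kept) + "\n"


def add_block(hosts_text: str, domains: list[str]) -> str:
    """Return hosts-file text with each domain pointed at 0.0.0.0."""
    cleaned = remove_block(hosts_text).rstrip("\n")
    lines = [BEGIN]
    for domain in domains:
        lines.append(f"0.0.0.0 {domain}")
        lines.append(f"0.0.0.0 www.{domain}")
    lines.append(END)
    return cleaned + "\n" + "\n".join(lines) + "\n"


def kill_command(process_name: str, platform: str = sys.platform) -> list[str]:
    """Build (but do not run) a command that force-quits a process by name."""
    if platform.startswith("win"):
        return ["taskkill", "/F", "/IM", process_name]
    return ["pkill", "-x", process_name]


blocked = add_block("127.0.0.1 localhost\n", ["instagram.com", "x.com"])
print(blocked)
print(kill_command("WhatsApp", platform="darwin"))
```

A real tool would write the result back to the hosts file, which needs administrator rights, and would pass the command to something like subprocess.run on a loop for as long as the session lasts. The exact process name to kill varies by platform, so "WhatsApp" here is only a placeholder.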

However, the experience of using them was where things started to diverge. Codex’s version had a noticeable issue right out of the gate: the textbox where you’re supposed to paste your writing was completely broken. It just didn’t work. I had to ask Codex to fix it, and it didn’t get it right until the third attempt. For a tool whose entire purpose revolves around pasting text in, that’s not a great first impression.

Claude Code’s version, on the other hand, worked without any issues from the start — but it defaulted the word count goal to 1,000 with no option to change it. I had to go back and ask for that. Codex actually handled that better by letting you type in your own goal from the beginning, even though its textbox was a bit buggy there too!

Now, remember how I mentioned above that I instructed both tools to add smarter proof checks? Of course, that’s what I put to the test next. Both tools worked alright when I submitted an entire article I had written. I then decided to paste in complete gibberish. Claude Code’s version hit me with the following error: “No punctuation or sentence structure found. Write real sentences.” Funnily enough, Codex’s version accepted the gibberish and unlocked the app.
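I don’t know how Claude Code’s check is implemented, but a heuristic along these lines would produce that behavior. Treat it as a sketch under my own assumptions: the function names, the regular expression, and the thresholds (three sentences of at least three words each) are all made up for illustration. Only the error message comes from the tool.

```python
import re
from typing import Optional


def looks_like_sentences(text: str, min_sentences: int = 3) -> bool:
    """Heuristic: require several chunks that end in . ! or ? and have 3+ words."""
    chunks = re.findall(r"[^.!?]+[.!?]", text)
    real = [chunk for chunk in chunks if len(chunk.split()) >= 3]
    return len(real) >= min_sentences


def check_proof(text: str) -> Optional[str]:
    """Return an error message, or None if the text passes the heuristic."""
    if not looks_like_sentences(text):
        return "No punctuation or sentence structure found. Write real sentences."
    return None


print(check_proof("asdf qwer zxcv hjkl uiop asdf qwer"))
print(check_proof("I wrote this today. It has real words in it. Does it pass?"))
```

Notice that a check like this only looks at shape, not meaning. Any text with periods and enough words per sentence gets through, whether or not it says anything.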

I then pasted lorem ipsum text into both, and unfortunately, both of them accepted it as real writing. Something worth mentioning here though is that I accidentally pasted the same lorem ipsum text into the Claude Code tool once again, and it said: “You already submitted this exact text. Write something new.” This means Claude Code built in duplicate detection without me ever asking for it. It’s not perfect, but it’s clearly thinking a step ahead of Codex when it comes to preventing cheating.
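Duplicate detection like this is cheap to add. Here is one way it could work, assuming the tool remembers a hash of every submission. The class name and the normalization step (lowercasing and collapsing whitespace) are my own choices, not something I pulled from Claude Code’s output.

```python
import hashlib


class SubmissionLog:
    """Remembers fingerprints of past submissions to reject exact repeats."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def _fingerprint(text: str) -> str:
        # Normalize case and whitespace so trivial edits don't dodge the check.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def submit(self, text: str) -> bool:
        """Return True if the text is new, False if it was submitted before."""
        fingerprint = self._fingerprint(text)
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


log = SubmissionLog()
print(log.submit("Lorem ipsum dolor sit amet."))  # True
print(log.submit("Lorem ipsum dolor sit amet."))  # False
```

This only lives in memory, so a real tool would need to save the fingerprints to disk to catch a repeat across sessions. It also only catches exact repeats, so changing a single word would slip past it.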

Now, both Claude Code and Codex built an impressive tool. However, I’d say Claude Code clearly won. The whole point of this tool was to block apps, paste in my writing, and then prove I’d done the work. Claude Code did all of that without a single issue on the first run. Codex couldn’t even get the pasting part right and needed three rounds of fixes before I could even test it!

Claude Code also asked questions before building, which led to fewer problems down the line. Codex skipped that step, made its own assumptions, and still ended up with bugs. Codex does get credit for two things: it let you customize the word count from the start, and the desktop app approach is arguably nicer than running something in a browser tab. But those are preferences. A working tool beats a slightly-better-designed broken one every time.

