I’ve been playing with OpenAI’s Codex (online) and Google’s Jules, both asynchronous web-based coding agents. Both start the same way: connect to GitHub, choose a repository, and give them a task to work on.
Codex lets you set up an environment with env variables and custom installs (pip install -r requirements.txt, for instance), but Jules does not. Neither has internet access after the initial pull.
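For concreteness, a Codex environment setup amounts to something like the sketch below; the variable and package names are placeholders I've made up, not anything from either product's documentation, and Jules has no equivalent step.

```bash
# Hypothetical Codex setup script: runs once before the agent starts work,
# while the container still has network access.
export MY_API_KEY=dummy-value         # placeholder env variable set in the environment config
pip install -r requirements.txt       # custom installs happen here, before internet access is cut off
pip install pytest ruff               # tooling the agent can still run offline later
```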
Codex has access to a Docker environment and can run unit tests and linting; Jules cannot.
Both then work on the task, show you the diffs, and offer to push the changes to a new branch.
At the moment, Codex has some notable advantages over Jules in setup. But that's where the advantages end.
Before I go further: neither is producing out-of-the-box usable code for me. Both are clearly in research-preview mode; if you are hoping for a productivity lift, these are not for you. Also worth noting: I don't have deterministic evals any more than anyone else in the industry, so these are vibe-level impressions.
Jules seems to produce better code, at the expense of not being able to test it or fix linting errors, which is a significant disadvantage in a CI/CD pipeline. Codex can fix the unit tests and linting errors, at the cost of not submitting code that actually solves the original issue.
Which means neither is fulfilling its promise today. Sonnet 4 and Opus 4 are easily superior.
I’ll do some more testing in the next few days, especially with Jules, which seems to me the more promising of the two if I can figure out what tasks it can handle.