Jules & Codex: Initial Impressions


I’ve been playing with OpenAI’s Codex (online) and Google’s Jules, both asynchronous web-based coding agents. Both start the same way: you connect to GitHub, choose a repository, and give the agent a task to work on.

Codex lets you set up an environment with env variables and custom installs (pip install -r requirements.txt, for instance), but Jules does not. Neither has internet access after the initial pull.
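
For a concrete sense of what that setup step looks like, here is a minimal, entirely hypothetical sketch of the kind of script you can have Codex run before the environment goes offline; the variable and package names are illustrative, not taken from either product’s docs:

    export API_TOKEN="dummy-value"      # hypothetical env variable the repo expects
    pip install -r requirements.txt     # project dependencies, fetched while the network is still available
    pip install pytest ruff             # test runner and linter so the agent can check its own work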

Codex has access to a Docker environment and can run unit tests and linting; Jules does not.

Both work through the task, show you the diffs, and offer to push the changes to a new branch.

At the moment, Codex has some notable advantages over Jules in the setup. But that’s where the advantages stop.

Before I go further: neither is producing out-of-the-box usable code for me. Both are clearly in research-preview mode. If you are hoping for a productivity lift, these are not for you. Also worth noting: I don’t have deterministic evals any more than anybody else in the industry, so these are vibe-level impressions.

Jules seems to produce better code, but it can’t run tests or fix linting errors, which is a significant disadvantage in CI/CD pipelines. Codex can fix the unit tests and linting errors, but it tends to submit code that doesn’t solve the original issue.

Which means neither is fulfilling its promise today. Sonnet 4 and Opus 4 are easily superior.

I’ll do some more testing in the next few days, especially with Jules, which seems like the more promising of the two if I can figure out what tasks it can handle.
