Codex and Claude: a two-model workflow

Using two AI models on the same project sounds like it should create confusion. For me, it has done the opposite. Each model has a clear role and neither tries to do everything.

This is how I use Codex and Claude Code together across projects like Euclid and my fitness dashboard. The pattern took a few projects to find. It is now the default for anything non-trivial.

What each model does well

Codex is fast, cheap, and holds a large context window. Its strongest use is audit. It reads a full repo in one pass, maps the structure, and spots gaps between what the documentation says and what the code actually does. A data contract that drifted three sessions ago, a prompt file that references a field that no longer exists, a naming inconsistency between two layers of a pipeline: Codex finds these quickly.

This matters most when I lack the experience to judge for myself. When I am not sure whether a project is on track, or whether a decision made two weeks ago is still coherent with where things are now, Codex gives an independent read. It is not emotionally invested in what was built. It just describes what it sees.

Where Codex is weaker: written output. Summaries, narrative explanations, and project reports are noticeably flatter. It also has a tendency to propose solutions that are more complex than the problem requires. Given the choice between a direct fix and an architectural restructure, Codex will sometimes choose the restructure.

Claude Code is most effective when the vision is clear. Building this website, implementing a well-defined feature in the fitness dashboard, executing a precise task with a defined scope: Claude produces something direct, readable, and maintainable. Its written output is consistently cleaner, which matters when documentation or a CHANGELOG entry is part of the deliverable.

Where Claude is less useful: detecting its own drift. In a long project, it does not always notice when a decision made in session three is contradicted by a file modified in session twelve. That is not a criticism. It does not seem to be what the tool is optimised for.

The combination: Codex audits and identifies what is off. Claude implements the correction.

The separation between workspaces

The workflow depends on keeping the two models in separate directories.

~/claude/ is the historical source. It holds the project as Claude Code built it, session by session. Codex reads it but never writes to it.

~/codex/ is the active workspace. It is where Codex does its analysis, produces its plans, and generates prompts for Claude to execute.

This separation matters because it keeps the audit clean. Codex can read the full history of a project without the risk of accidentally modifying it. Claude executes against the real files. Nothing gets mixed.
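The rule itself is simple enough to express as a path check. The following is a minimal sketch, not part of either tool: the constant names and the codex_may_write helper are hypothetical, and it assumes the two workspaces live directly under the home directory as described above.

```python
from pathlib import Path

# Hypothetical roots mirroring the layout described above (an assumption).
CLAUDE_ROOT = Path.home() / "claude"  # historical source, read-only for Codex
CODEX_ROOT = Path.home() / "codex"    # Codex's own workspace


def is_inside(path: Path, root: Path) -> bool:
    """Return True if `path` resolves to `root` or to somewhere beneath it."""
    resolved = path.expanduser().resolve()
    resolved_root = root.expanduser().resolve()
    return resolved == resolved_root or resolved_root in resolved.parents


def codex_may_write(path: Path) -> bool:
    """Codex may write only inside its own workspace, never in ~/claude/."""
    return is_inside(path, CODEX_ROOT)


print(codex_may_write(CODEX_ROOT / "projects" / "euclid" / "audit.md"))
print(codex_may_write(CLAUDE_ROOT / "euclid" / "flow.md"))
```

Resolving the path before comparing means a target such as ~/codex/../claude/flow.md is rejected as well, because it is checked against where it actually points.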

How a session works in practice

A typical session on a complex project runs like this.

I open Codex and ask it to audit a specific area of the repo: a data contract, a flow, a prompt file that may have drifted from the current schema.

Once Codex returns its analysis, I ask it in the same conversation to format its recommendation as a structured task brief. The brief includes the context, the mission, the constraints, and the expected deliverables. That is what I will give to Claude Code. The example further down comes from exactly this step.
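The four parts of the brief map naturally onto a small data structure. The sketch below is an illustration only: the TaskBrief class and its field names are hypothetical, not something Codex or Claude Code provides, and the sample values are borrowed from the Euclid brief.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the four parts of a task brief."""

    context: list[str]
    mission: str
    constraints: list[str] = field(default_factory=list)
    deliverables: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the brief as plain text, one titled section per part."""
        parts = [
            "Context",
            *self.context,
            "",
            "Mission",
            self.mission,
            "",
            "Constraints",
            *(f"- {item}" for item in self.constraints),
            "",
            "Deliverables",
            *(f"{i}. {item}" for i, item in enumerate(self.deliverables, start=1)),
        ]
        return "\n".join(parts)


brief = TaskBrief(
    context=["You are working only in ~/claude/euclid/."],
    mission="Correct only the residual gaps. Do not rewrite the entire flow.",
    constraints=["Do not touch ~/codex/."],
    deliverables=["Corrected files in ~/claude/euclid/", "A brief final summary"],
)
print(brief.render())
```

Keeping the sections as separate fields makes it easy to narrow the scope by hand before rendering, for example by deleting a constraint or a deliverable.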

Before I use the brief, I read it. Codex sometimes proposes changes that are broader than the problem requires. If the scope feels too large, I edit the brief myself to narrow it down.

Then I open a terminal, navigate to the project folder, launch Claude Code with the claude command, and paste the brief as my first message. Claude reads the project files, executes the task, and commits the result.

The Codex-generated prompts tend to be verbose. Here is a representative example from the Euclid project, where Codex identified a drift between a batch normalisation prompt and the current data contract:

Context
You are working only in ~/claude/euclid/.
You can read ~/codex/projects/euclid/, but must never write to it.
You must take into account:
- ~/codex/projects/euclid/audit-convergence-flow.md
- the realignment already completed in flow.md, agents/data.md, agents/gtm.md,
  agents/outbound.md, brain.md, skillet/attio_import.md, skillet/n8n_attio.md
- the supplementary Codex audit on residual gaps

Mission
Correct only the residual gaps that are still blocking or structurally significant.
Do not rewrite the entire flow.

Priority corrections

1. Obsolete GTM prompt
The file prompts/gtm/scoring_v1.md is still aligned on the old taxonomy and
old output schema. It must be realigned on:
- v1 taxonomy: sepa, non_sepa, unknown
- primary key: source_id
- current output schema defined in agents/gtm.md
- current scoring logic defined in config/scoring_rules.md
- explicit shortlist rules after scoring
- no remaining fields or examples of type segment, fit_score,
  accessibility_score, confidence if these names are no longer the target contract
- no remaining references to baas_provider, retail_neobank,
  crossborder_emi, crypto_emi

2. Obsolete Claude batch snippet
The file skillet/claude_batch.md still contains a GTM system prompt with:
- score out of 100
- segments tier1_baas, tier2_crossborder, etc.
- misaligned JSON contract
Realign it on the current v1 contract, or explicitly reduce it to a historical
example no longer in use if you judge it should no longer be prescriptive.
It must no longer be able to reintroduce the old logic by mistake.

3. Handoff personas to clarify
The flow and agents must tell a single consistent story about the production
of gtm/personas_topN_YYYYMMDD.json. Decide clearly:
- either the GTM Agent produces the personas and the Outbound Agent consumes them
- or the Outbound Agent produces the personas, but then flow.md, agents/gtm.md
  and agents/outbound.md must be consistent
Recommendation: keep personas in the GTM Agent scope, and make the Outbound
Agent a consumer of the shortlist and personas.

4. Secondary documentary cleanup
If VISION.md is intended to be shown or used as a reference document, realign
the passages still manifestly contradicting the new taxonomy and new playbooks.
If you choose not to realign everything now, explicitly document what remains
historical.

Requirements
- Do not modify the already-validated P1 logic.
- Do not reintroduce old segments in prompts or snippets.
- Keep source_id as the unique canonical key.
- Do not touch ~/codex/.
- Favour targeted corrections over unnecessary broad rewrites.

Deliverables
1. Corrected files in ~/claude/euclid/
2. A brief final summary listing:
   - files modified
   - decisions made
   - any points deliberately left historical
3. Git versioning required:
   - stage modified files
   - create a single clear commit
   - recommended commit message:
     Align prompts and batch snippets with P1 flow contract

Note
Once these corrections are done, Codex will re-audit the changes.
The goal is to close residual gaps, not to reopen the global design.

The first time I saw a prompt this long, I was sceptical. It felt like too much context. In practice, the execution was precise. Claude made exactly the change described, nothing more. The verbosity turned out to be a feature, not noise. It left no room for Claude to interpret or guess.
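Some of the checks in that brief are mechanical enough to script, such as confirming that no old segment names remain in a file. This is a rough sketch under stated assumptions: the find_obsolete_terms function is hypothetical, the term list is copied from the brief, and the sample text is invented for illustration.

```python
import re

# Old segment names that the brief says must no longer appear.
OBSOLETE_TERMS = ("baas_provider", "retail_neobank", "crossborder_emi", "crypto_emi")


def find_obsolete_terms(text: str, terms=OBSOLETE_TERMS) -> list[tuple[int, str]]:
    """Return (line_number, term) pairs for every obsolete term found in `text`."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


# Invented sample, not taken from the real prompt file.
sample = "segment: sepa\nsegment: crypto_emi\nfallback: unknown"
print(find_obsolete_terms(sample))  # [(2, 'crypto_emi')]
```

A check like this does not replace the re-audit. It only catches literal leftovers, not a schema that is wrong in subtler ways.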

The three modes

Over several projects, I settled on three ways to use the combination.

Audit mode. Codex reads an existing Claude project and identifies structural problems: missing documentation, drifted contracts, redundant files. The output is a list of issues, not a set of changes. Claude Code is not involved yet.

Challenge mode. Codex reads a project and proposes an alternative approach to something specific. A different data model, a simpler flow, a cleaner naming convention. The proposal stays in ~/codex/ as a reference. Claude Code may or may not implement it, depending on whether the proposal holds up on reflection.

Execution mode. Codex produces a prompt. Claude Code implements it. This is the mode in the example above.

What this looks like on real projects

On Euclid, the data normalisation pipeline went through several iterations where the batch prompt and the live data contract drifted apart. Codex caught this twice, quickly, because it could read the full repo in one pass and compare files across directories. Fixing the drift took one Claude Code session each time.

On the fitness dashboard, Codex audited the data pipeline structure and flagged a naming inconsistency between how Strava data was labelled in the ingestion scripts and how it was referenced in the analysis layer. A ten-minute audit, a single prompt, one session to fix it.

The honest limits

Codex does not always get the solution right. Its proposals sometimes carry more complexity than the project needs, and a non-technical user may not immediately recognise this. The rule I follow: if the prompt Codex generates requires changes to more than three files, I reread it line by line before sending it to Claude. Scope tends to creep in that direction.
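The three-file rule can be approximated with a quick count of the file paths a brief mentions. This is a heuristic sketch only: the function names, the list of file extensions, and the idea of using mentioned files as a proxy for files to change are all assumptions, and a brief can mention files that are context rather than targets.

```python
import re

# Assumed set of extensions; adjust to whatever the project actually contains.
FILE_PATTERN = re.compile(r"[\w~./-]+\.(?:md|json|py|yaml|yml|toml)\b")
MAX_FILES = 3  # the threshold from the rule above


def referenced_files(brief: str) -> set[str]:
    """Return the distinct file paths mentioned in the brief text."""
    return set(FILE_PATTERN.findall(brief))


def needs_careful_review(brief: str, limit: int = MAX_FILES) -> bool:
    """True when the brief mentions more files than the limit allows."""
    return len(referenced_files(brief)) > limit


small = "Realign prompts/gtm/scoring_v1.md and skillet/claude_batch.md."
large = "Update flow.md, agents/data.md, agents/gtm.md, agents/outbound.md."
print(needs_careful_review(small), needs_careful_review(large))
```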

The other limit is cost management. Running two models on the same project adds up. I use Codex selectively, for audits and for situations where the context load is genuinely large. Day-to-day sessions run in Claude Code alone.

There is a less obvious benefit to the two-model setup that offsets some of that overhead: resilience. If one model has an outage, hits a rate limit, changes its pricing, or simply produces poor results on a specific task, the project does not stop. The other model can cover it. On an urgent fix, being able to switch without losing context or rebuilding a workflow from scratch has practical value. Not every project needs this kind of redundancy. But for anything running continuously or under time pressure, it is worth having.

More technical projects than mine may produce different results. The combination I describe is based on Ops and data work, not production software engineering. But the broader principle has held up: having more than one model look at the same project works like asking several reviewers to examine the same problem. Each one catches something the others miss. That alone has made the approach worth keeping.