Codex can feel impressive on a narrow task. Give it one file, one bug, one well-bounded change, and speed often looks like intelligence. Give it a medium or large codebase, a wider surface area, and a task that reaches across multiple files or decisions, and that same speed can start to hide a different problem: incomplete work delivered with too much confidence.
That was the pattern we kept seeing in real delivery work. The issue was not that the model could never write useful code. The issue was that the default operating bias often leaned toward speed, brevity, and reasonable-sounding assumptions at exactly the moments when a broader surface needed more reading, more verification, and more caution. On a larger repository, that changes the cost curve. The most expensive failure is not a dramatic crash. It is the quiet miss: the unreviewed edge, the skipped caller, the half-finished refactor, the claim of “done” before the broader system was actually checked.
That is why we published codex-strict-profile as a public experiment. It is a small standalone package that rewrites the selected CODEX_HOME so new Codex sessions use a stricter Via Logos profile by default. The goal is simple: push Codex toward more verification, more code reading, more explicit validation, and less assumption-heavy execution on larger or higher-risk codebases.
This is the same governed-delivery lens that shapes how we approach AI & Automation and the same reason we care so much about human-in-the-loop agentic workflows. Speed matters. But on real systems, speed without defensibility is usually just delayed rework.
The failure is not bad syntax
Codex rarely fails by writing code that will not parse; weak syntax is not the claim here.
The real problem is operational. On broader tasks, the default behavior of many coding assistants still rewards:
- moving ahead when ambiguity should stop the run
- reading narrowly when the task demands wider repository context
- treating delegation as a shortcut instead of a governed tool
- optimizing for fast response over verifiable completeness
- declaring success before the changed surface has been meaningfully checked
The codex-strict-profile README states the core reason for the package plainly: the stock Codex prompt stack is biased toward speed, brevity, and assumption-heavy execution in ways that can become counterproductive on larger codebases. That framing matters. It is not an indictment of all AI-assisted development. It is a recognition that operating defaults shape outcomes.
We also do not think bigger context windows solve this by themselves. That is one reason the broader market conversation keeps circling back to architecture, retrieval, and workflow discipline. Sourcegraph’s write-up on long context for code finds that longer context helps, but large repositories still need global retrieval and code intelligence. GitHub’s prompt engineering guidance for Copilot Chat carries the same recurring lesson: not “ask faster,” but “state the goal clearly, break complex tasks down, provide context, and reduce ambiguity.” The pattern is consistent: reliability comes from a better operating model, not from confidence alone.
What changes after install
The package does two concrete things.
First, simple file placement: the installer drops a strict asset bundle into whatever CODEX_HOME you point it at.
Second, it rewrites CODEX_HOME/config.toml so new Codex sessions use the vl_strict profile by default.
The current managed profile sets a deliberately cautious operating posture:
- `approval_policy = "never"`
- `sandbox_mode = "danger-full-access"`
- `multi_agent = true`
- `fast_mode = false`
- `verbosity = "high"`
- reasoning pinned high, including plan-mode reasoning
- the Linux `bwrap` sandbox disabled inside the strict profile
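Put together, the managed entry in `config.toml` looks roughly like the sketch below. `approval_policy` and `sandbox_mode` are standard Codex CLI config keys; the remaining key names simply mirror the settings listed above and may be spelled differently in the repo, so treat the package's managed file as the authoritative version.

```toml
# Illustrative sketch only — not the package's verbatim managed config.
profile = "vl_strict"

[profiles.vl_strict]
approval_policy = "never"
sandbox_mode = "danger-full-access"
multi_agent = true      # key names below mirror the prose above; check the repo
fast_mode = false
verbosity = "high"
model_reasoning_effort = "high"
```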
This package does not hide what it is doing. The README spells it out: once you install it, that CODEX_HOME stays on the stricter profile until you uninstall it.
The repo also makes the lifecycle plain:
- install and uninstall scripts exist for macOS, Linux, WSL, and PowerShell
- install creates a pre-change backup when appropriate
- uninstall restores that backup if it exists
- smoke tests are included so you can verify the install path cleanly
That matters for enterprise readers because rollout questions are rarely only about features. They are about reversibility, predictability, and the ability to inspect exactly what changed.
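The backup-and-restore lifecycle above can be sketched in a few lines of shell. This is an illustration of the pattern, not the package's actual scripts; a throwaway directory stands in for CODEX_HOME, and the `vl_strict` value comes from the profile described above.

```shell
# Illustrative sketch of the installer's backup/restore pattern — not the
# package's actual scripts. A temp dir stands in for CODEX_HOME.
CODEX_HOME="$(mktemp -d)"
echo 'profile = "default"' > "$CODEX_HOME/config.toml"

# install: snapshot the existing config, then rewrite it
cp "$CODEX_HOME/config.toml" "$CODEX_HOME/config.toml.bak"
echo 'profile = "vl_strict"' > "$CODEX_HOME/config.toml"

# uninstall: restore the pre-change backup if it exists
if [ -f "$CODEX_HOME/config.toml.bak" ]; then
  mv "$CODEX_HOME/config.toml.bak" "$CODEX_HOME/config.toml"
fi

cat "$CODEX_HOME/config.toml"
```

The point of the sketch is the reversibility guarantee: if the backup exists, uninstall returns the directory to its pre-install state.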
A quick comparison
For most teams, the useful comparison is not which key lands in config.toml. It is how the assistant behaves once the repo gets wide.
| Dimension | Typical speed-biased default | codex-strict-profile bias | Why it matters on larger codebases |
|---|---|---|---|
| Ambiguity handling | Assume and proceed | Verify or mark unresolved | Wide-surface work breaks when hidden assumptions pile up |
| Code reading depth | Read only what seems immediately relevant | Read more surrounding code before editing | Cross-file changes often fail outside the obvious file |
| Reuse discipline | Patch locally to move fast | Reuse existing abstractions where possible | Broad codebases punish duplicated logic and one-off fixes |
| Delegation posture | Smaller, faster helpers feel like the default | Multi-agent is available, but verification still matters | Delegation without checking can multiply bad assumptions |
| Validation | Completion can look done before checks are strong enough | Validate the changed surface before claiming done | The real cost is rework, rescue sessions, and missed downstream breakage |
| Permissions model | May vary by ambient config and approvals | Explicit, strong operating mode for the selected profile | Predictable behavior helps, but the risk must be accepted consciously |
If you are the person who owns rollout risk, the permissions row is where the conversation gets real.
If you only read the strictness part and miss the permission model, you are reading the package incorrectly. This profile is not “safer” in the sense of tighter containment. It is stricter in reasoning and workflow posture while simultaneously being more powerful and more dangerous at the machine boundary. That tradeoff must stay visible.
Why strict does not mean safe
This distinction is worth making explicitly because it is exactly where technical buyers can get trapped by vague language.
Stricter reasoning behavior can be good for larger codebases because it pushes the agent toward:
- better uncertainty handling
- broader code reading
- stronger validation expectations
- more defensible implementation choices
But the profile also pins Codex to an operating mode where it can act without approval prompts and with broad filesystem access inside the user boundary. That means the package improves one kind of discipline while relaxing another kind of friction.
In practice, the right way to think about the profile is this:
- it is more governed at the reasoning layer
- it is more forceful at the machine-execution layer
- it is only a good idea when those two facts fit your environment
That is why the README recommends using it only on repositories and machines you trust, preferring a dedicated CODEX_HOME when isolation matters, and keeping backups away from sensitive working contexts. Those are not legal disclaimers. They are the operating model.
Why this matters on larger surfaces
Small tasks forgive a lot. Larger systems do not.
On a medium or large codebase, one incomplete change rarely stays local. A renamed interface ripples outward. A shortcut around an existing abstraction creates divergence. A missing validation path forces a rescue session later. A confident summary that was not checked against the broader surface wastes reviewer time because now the reviewer has to reconstruct what the agent actually did not read.
This is the context where a stricter operating posture can be worth the friction.
Not because caution is emotionally satisfying. Because the shape of the work changes:
- the relevant code is farther away from the edited file
- the number of callers and side effects grows
- the cost of a half-right answer rises sharply
- the team often cares more about traceability than raw draft speed
- validation becomes part of the feature, not a cleanup step
This is also why the package may save tokens over the full life of a task even if it spends more tokens per session. The repo README is careful here, and we should be too: that is an inference, not a measured guarantee. If stricter behavior reduces retries, rescue sessions, and rework, the total cost of finishing a task can improve. But that depends on workflow, repo shape, and how the profile is used.
That point matters especially for enterprise teams. They do not optimize for cheap-looking sessions in isolation. They optimize for the total cost of reaching a trustworthy result: engineering time, review time, debugging time, release risk, and the number of times a task has to be recovered after the assistant sounded more complete than it actually was.
Use it when
The profile makes the most sense when the cost of a bad shortcut is higher than the cost of a slower first pass.
- You are working in a medium or large repository where cross-file effects are common.
- You need the agent to read more, verify more, and guess less.
- You care about defensible implementation choices, not just fast drafts.
- You are using Codex on a trusted machine and trusted repositories.
- You can isolate the behavior to a dedicated `CODEX_HOME` if needed.
- You want a public package your team can inspect, install, uninstall, and discuss openly.
Do not use it casually when
- You are working around sensitive files, unrelated working directories, or production credentials.
- You want a low-friction assistant for small throwaway tasks.
- You are not comfortable with `approval_policy = "never"` and `danger-full-access`.
- You do not want the selected `CODEX_HOME` to change globally until uninstall.
- You have not reviewed the install and uninstall behavior yet.
- You are expecting a magic fix for weak prompts, unclear requirements, or poor repo hygiene.
The profile can nudge Codex toward better habits, but it still cannot rescue a ticket that says “clean this up,” a codebase full of one-off abstractions, or a team that rubber-stamps the diff.
How to test it before a wider rollout
If your team is curious but cautious, do not start with a broad rollout. Start with a bounded review.
1. Isolate the environment
Use a dedicated CODEX_HOME if you want to separate strict behavior from your normal Codex setup. That is one of the README’s own recommended safety boundaries, and it is the right default for evaluation.
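Pointing Codex at a dedicated home is a one-line environment change. The snippet below uses a temp directory for the demo; in practice you would pick a stable path such as `~/.codex-strict-eval` (that name is illustrative, not anything the package prescribes).

```shell
# Evaluate the strict profile from a dedicated directory so your normal
# ~/.codex setup is untouched. A temp dir is used here for the demo.
EVAL_HOME="$(mktemp -d)/codex-strict-eval"
mkdir -p "$EVAL_HOME"
export CODEX_HOME="$EVAL_HOME"

# Run the package's installer against this CODEX_HOME (see its README),
# then launch codex from the same shell; it reads this directory.
echo "CODEX_HOME is now: $CODEX_HOME"
```

Because the variable is exported only in this shell, your other terminals keep using the default Codex configuration.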
2. Read the actual behavior changes
Before installing anything, review the public repo and confirm the global profile behavior matches what your team intends to allow:
- install and uninstall scripts exist for macOS, Linux, WSL, and PowerShell
- install creates a preinstall backup when appropriate
- uninstall restores from backup when present
- smoke tests are included
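Beyond the bundled smoke tests, a minimal sanity check of the config rewrite needs only standard tools. The snippet below fabricates a `config.toml` the way an install is described as leaving it, then reads back the active profile; when checking a real install, point `CODEX_HOME` at your evaluation directory instead. The `sed` pattern is a generic sketch, not the package's own test.

```shell
# Demo of a post-install sanity check. A temp dir stands in for CODEX_HOME;
# the config line mimics what the installer is described as writing.
CODEX_HOME="$(mktemp -d)"
printf 'profile = "vl_strict"\n' > "$CODEX_HOME/config.toml"

# Read back which profile new sessions will use.
active=$(sed -n 's/^profile = "\(.*\)"$/\1/p' "$CODEX_HOME/config.toml")
echo "active profile: $active"
```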
3. Run a real task, not a toy task
Do not judge the profile on a single greenfield prompt. Test it on the kind of work that actually hurts when the assistant cuts corners:
- a multi-file refactor
- a change that touches a shared interface
- a task where the acceptance criteria depend on broad repository awareness
This is where the release lines up with the strongest external writing in the space. The best large-codebase advice is consistently evaluation-led. Kilo’s architecture article is especially good on this point: test on real refactors, real callers, real failure modes. That is the right instinct here too.
4. Compare not just output, but review burden
After the trial, open the diff with the reviewer and ask where they still had to chase missing callers, skipped checks, or fuzzy claims.
Most teams will take a slower first pass if it means the reviewer is not spending the afternoon hunting for skipped callers, missing checks, and half-finished edges.
5. Keep the risk language honest
If your team decides the profile is useful, keep the operating boundaries attached to it. Do not flatten the message into “better Codex.” That would be misleading. It is better described as “a more cautious and more forceful operating profile that should be used deliberately.”
A first-week rollout slice
For teams that want a practical adoption path, the lowest-risk first week looks like this: before you run a single task, read the repo, walk through the install script, and make sure you know exactly how uninstall gets you back out.
That first week should feel almost dull. You are reading the repo, testing install and uninstall, and watching how the assistant behaves when the task is real. For this package, that kind of plain trial tells you more than any flashy benchmark ever will.
Why we released it now
We opened the repo because this kind of decision should be inspectable. Teams need to see the install scripts, the config changes, and the warning language for themselves.
Too much AI tooling marketing still tries to compress a serious workflow decision into a feeling: faster, smarter, more autonomous. But engineering leaders do not buy feelings. They buy risk-adjusted outcomes. They need to know:
- what changed
- what new risk comes with it
- what problem it is supposed to solve
- how to test whether it really helps
- how to back it out cleanly if it does not
A public repo lets a buyer, staff engineer, or platform owner inspect the tradeoff instead of relying on a polished demo.
It is also why we are keeping the positioning transparent. codex-strict-profile is experimental. It should be used with care. It changes the behavior of the selected CODEX_HOME globally until uninstall. And while it may reduce full-task rework over time, that is still a reasoned expectation, not a measured promise.
The real point
A team working inside a trusted dev environment may welcome this profile; a team juggling sensitive directories on a shared laptop should probably walk away.
Once a repository gets big enough that one missed caller can burn half a day, the assistant’s defaults stop feeling abstract.
If your team is trying to bring governed AI workflows into real delivery work, start at https://via-logos.com or email team@vialogos.org. We can pressure-test the operating model against your codebase, your controls, and the rollout risk.
If the fit is there, we write the delivery pipeline, draft the SoW, and send an official quote your team can take back to the room.