OpenAI Codex CLI: install it, give it a real codebase, watch it ship a PR
The name "Codex" used to mean the GPT-3 code completion model from 2021. In 2026, OpenAI's Codex is a different product: a CLI-based coding agent that reads your repo, plans changes, runs commands, and opens a PR. This guide covers the install, the first run inside a real codebase, and an honest list of what Codex is and isn't good at.
What you'll learn
- What Codex is in 2026 (agent, not autocomplete)
- Install + sign in (Plus, Pro, Business, Enterprise, Edu tiers)
- First run inside a real repository
- 3 task types Codex is good at (and 3 it isn't)
- The gotchas — quota, sandboxing, when to supervise
What Codex is in 2026
Codex is OpenAI's agentic coding CLI. It is shipped as a binary you install with npm or a curl script, and it runs in your terminal. The interface is roughly:
$ codex "Add a /healthz endpoint to this Express app that returns DB ping status"
Codex then:
- Reads your repo to understand the structure.
- Proposes a plan.
- Asks for approval.
- Edits files, runs tests, runs linters.
- Reports a diff and (if you have git set up) opens a PR.
It is not the old GPT-3 Codex autocomplete model, which was deprecated in 2023. The current Codex is built on codex-1 (a fine-tune of o3) and ships as the codex command on macOS, Linux, and Windows (via WSL).
Install + sign in
The install takes about 2 minutes.
macOS / Linux
npm install -g @openai/codex
codex --version # should print codex-cli 0.x.y
Windows
Use WSL (Windows Subsystem for Linux). Codex does not run natively on Windows PowerShell. In a WSL terminal:
npm install -g @openai/codex
Sign in
codex login
This opens a browser window to OpenAI's auth page. Sign in with an account on one of these tiers: Plus, Pro, Business, Enterprise, or Edu. Codex is not available on the free ChatGPT tier as of 2026.
After login, your credentials are stored in ~/.codex/auth.json. To check:
codex whoami
First run inside a real repo
The best way to evaluate Codex is to give it a real task in a real repo. Do not start with a toy "hello world": it is too easy to tell you anything about how Codex handles real code.
Step 1 — Pick a small, well-defined task
In your repo, find a task that is:
- Bounded — affects 1-3 files, has clear success criteria.
- Verifiable — has a test, or you can run the app and see the result.
- Safe — does not touch production data, secrets, or irreversible operations.
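To make the "Verifiable" criterion concrete, here is a minimal sketch of the kind of pass/fail check that gives both you and Codex a clear signal. The function parsePort and the helpers check and throws are hypothetical names invented for this illustration. A real project would use its own test runner rather than hand-rolled helpers.

```javascript
// Hypothetical function under test: parse a TCP port from a string.
function parsePort(text) {
  const port = Number(text);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError("invalid port: " + text);
  }
  return port;
}

// Minimal check helper so the sketch runs without a test framework.
function check(condition, message) {
  if (!condition) throw new Error("FAILED: " + message);
}

// Returns true if calling fn throws, false otherwise.
function throws(fn) {
  try {
    fn();
    return false;
  } catch (err) {
    return true;
  }
}

// Happy path plus the obvious edge cases.
check(parsePort("8080") === 8080, "happy path");
check(throws(() => parsePort("0")), "port below range");
check(throws(() => parsePort("65536")), "port above range");
check(throws(() => parsePort("http")), "non-numeric input");
console.log("all checks passed");
```

A task phrased as "make these checks pass" is bounded and verifiable at the same time, which is the property the list above asks for.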
Good first tasks:
- "Add a /healthz endpoint to the Express app that pings the database."
- "Refactor the user authentication middleware to use async/await instead of callbacks."
- "Add JSDoc comments to every exported function in src/utils/."
Bad first tasks:
- "Rewrite the authentication system" (too broad).
- "Fix the bug in checkout" (not specific enough — which bug?).
- "Make the app faster" (no success criteria).
Step 2 — Initialize Codex in the repo
From the repo root:
codex init
This scans the repo, identifies the language and framework, and writes a codex.md file with repo-specific instructions for Codex. You can edit codex.md to add your own conventions (e.g., "always use TypeScript strict mode," "never use any").
Step 3 — Run the task
codex "Add a /healthz endpoint to this Express app that returns DB ping status"
Codex reads the repo, proposes a plan in the terminal, and waits for your approval (y/n). On approval, it makes changes, runs your test suite, and reports.
For a small task, the whole run takes 1-5 minutes. For a complex task across 10+ files, it can take 15-30 minutes.
Step 4 — Review the diff
Codex outputs a unified diff. Review it carefully. If you are happy, commit and push. If you are not, run codex revert to undo all changes (Codex uses git for safety — it commits to a temporary branch, so revert is clean).
3 task types Codex is good at
- Boilerplate generation — adding standard endpoints, model definitions, configuration files. Tasks with a clear "shape" from common patterns.
- Test writing — given an existing function, write unit tests that cover the happy path and obvious edge cases. Codex is excellent at this because tests have a clear pass/fail criterion.
- Mechanical refactors — "rename this method across the codebase," "convert all callbacks to async/await in src/legacy/," "add type annotations to every function in this file." The mechanical nature means fewer places for Codex to be creative in the wrong way.
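As an illustration of the callback-to-async/await refactor mentioned above, here is a minimal before-and-after sketch. The names loadUserCb, loadUser, and greet are hypothetical and stand in for whatever legacy code a repo actually contains. This shows one common way to do the conversion, not necessarily the diff Codex would produce.

```javascript
// Before: callback style. `loadUserCb` is a hypothetical legacy function.
function loadUserCb(id, callback) {
  setTimeout(() => {
    if (id <= 0) return callback(new Error("invalid id"));
    callback(null, { id, name: "user-" + id });
  }, 0);
}

// Adapter: wrap the callback API in a Promise once.
function loadUser(id) {
  return new Promise((resolve, reject) => {
    loadUserCb(id, (err, user) => (err ? reject(err) : resolve(user)));
  });
}

// After: call sites read top to bottom, and errors surface as rejections.
async function greet(id) {
  const user = await loadUser(id);
  return "hello, " + user.name;
}

greet(7).then((message) => console.log(message));
```

The transformation is the same at every call site, which is why this kind of task is easy to review: each hunk of the diff should look like the one before it.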
3 task types Codex isn't good at
- Architecture decisions — "should we use a queue here or call the service directly?" Codex will give you an answer, but it will be a generic answer. Architecture decisions need human context.
- Cross-cutting changes that touch shared abstractions — adding a new field to a database schema, then propagating it through the API, the UI, and the tests. Codex can do it, but the diff is large and the error rate is higher. Review carefully.
- Performance optimization — Codex will happily refactor code in ways that look cleaner but are slower. Always benchmark after a Codex-suggested "performance improvement."
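The advice to benchmark can be followed with very little code. The sketch below assumes a runtime where performance.now() is available as a global (recent Node.js versions and browsers). The helper bench and the two sample implementations are hypothetical. A micro-benchmark like this is noisy, so treat it as a first sanity check and not a rigorous measurement.

```javascript
// Time `fn` over `iterations` calls and return average milliseconds per call.
function bench(fn, iterations = 10000) {
  fn(); // warm-up call so one-time setup cost is not measured
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Two hypothetical implementations of the same function.
const data = Array.from({ length: 1000 }, (_, i) => i);

function sumLoop() {
  let total = 0;
  for (const x of data) total += x;
  return total;
}

function sumReduce() {
  return data.reduce((total, x) => total + x, 0);
}

// Check equivalence first: a faster wrong answer is not an optimization.
if (sumLoop() !== sumReduce()) throw new Error("implementations disagree");

console.log("loop  :", bench(sumLoop).toFixed(5), "ms/call");
console.log("reduce:", bench(sumReduce).toFixed(5), "ms/call");
```

Run the comparison on the original and the refactored code with realistic input sizes before accepting a change described as a performance improvement.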
Gotchas
1. Quota
Codex uses tokens from your ChatGPT subscription. A 30-minute complex task can burn through 5-10% of your weekly quota. Heavy users should plan for the Pro or Business tier; Plus quota is too tight for daily use.
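The figures above translate into a rough weekly budget. The helper tasksPerWeek below is hypothetical and only restates the arithmetic: at 5-10% of the weekly quota per complex task, the quota covers roughly 10 to 20 such tasks.

```javascript
// Rough capacity estimate: how many tasks fit in one weekly quota
// if each task consumes `percentPerTask` percent of it.
function tasksPerWeek(percentPerTask) {
  return Math.floor(100 / percentPerTask);
}

console.log(tasksPerWeek(5));  // 20 tasks at 5% each
console.log(tasksPerWeek(10)); // 10 tasks at 10% each
```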
2. Sandboxing
Codex runs commands in a sandbox by default — it can read your files but cannot make network calls, cannot touch paths outside the repo, and cannot run anything that requires sudo. This is good for safety but means tasks that need to call an external API (e.g., "test the Stripe webhook locally") require --no-sandbox or explicit permission grants.
3. The "in circles" failure mode
If Codex gets stuck, usually because the task is ambiguous or the repo is too unfamiliar, it can loop: try something, fail, try again, fail differently, try again. The CLI shows an "I've been working for X minutes without progress, want me to continue?" prompt. If you see this, kill the run, refine the task description, and try again with more context.
4. When to supervise
For tasks that affect:
- Authentication, authorization, session handling
- Payment, billing, financial logic
- Database migrations
- Production configuration
...run Codex in plan-only mode (codex --plan, or approve only the plan and not the execution). Review the plan, then either approve it or refine it.
Use codex --plan to get a plan without execution. It is useful for "is this even the right approach?" questions before letting Codex make changes.
FAQ
How is this different from Cursor / Claude Code?
Cursor lives in an editor; Codex lives in a terminal. Cursor is better for interactive coding (you in the loop, code on screen); Codex is better for autonomous tasks (give it a job, walk away, review the diff). Claude Code is the most similar competitor — it is also a terminal-based agent. Codex is generally faster; Claude Code is generally better at long-horizon reasoning across many files.
Does Codex run code?
Yes. With sandboxing, it can run any command that does not require network or elevated privileges. With --no-sandbox, it can run anything your user account can run.
Is my repo uploaded to OpenAI?
File contents are sent to the OpenAI API for processing. The repo is not "uploaded" in the persistent sense — it is not stored on OpenAI's servers, and it is not used to train future models. Check OpenAI's current data usage policy for the most accurate answer.
Why did Codex go in circles on my task?
Usually the task is ambiguous, the repo is large and unfamiliar, or the success criterion is unclear. Refine the task ("write a /healthz endpoint that returns 200 OK with a JSON body { status: 'ok', db: 'ok' } after pinging the DB") and try again.
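For reference, the refined task above pins down the behavior tightly enough that the expected result can be sketched in a few lines. This is a minimal, framework-agnostic sketch, not what Codex would necessarily generate. The function healthz and the pingDb argument are hypothetical, and the 503 response on failure is an assumption, since the task text only specifies the success case.

```javascript
// Core of a /healthz handler, kept separate from the web framework.
// `pingDb` is a hypothetical async function supplied by the caller;
// it should resolve if the database answers and reject otherwise.
async function healthz(pingDb) {
  try {
    await pingDb();
    return { statusCode: 200, body: { status: "ok", db: "ok" } };
  } catch (err) {
    // Assumption: report 503 when the ping fails.
    return { statusCode: 503, body: { status: "error", db: "unreachable" } };
  }
}

// Wiring into Express would look roughly like this (not executed here;
// `app` and `db.ping` are assumed to exist in the real application):
// app.get("/healthz", async (req, res) => {
//   const { statusCode, body } = await healthz(() => db.ping());
//   res.status(statusCode).json(body);
// });
```

Keeping the logic in a plain function makes the success criterion testable without starting a server, which gives Codex a clear target to run against.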
Can I run Codex on private/closed-source code?
Yes. Use the Business or Enterprise tier; OpenAI's terms for those tiers explicitly do not retain your code or use it for training.
Does Codex work with monorepos?
Yes, but performance drops. For a 100+ package monorepo, point Codex at a specific subdirectory with codex "..." --root=packages/auth.
How is this different from GitHub Copilot's "agent mode"?
GitHub Copilot's agent mode (introduced 2024) is similar in spirit. Codex is OpenAI-native (uses codex-1/o3), runs in your terminal, and is the more autonomous of the two. Copilot's agent mode is more integrated with the GitHub PR review flow.
Will Codex replace junior engineers?
No. Codex is a force multiplier. The engineers who use it ship 2-3x faster. The engineers who don't fall behind. Codex does not shrink the number of engineers a product needs so much as shift the bottleneck from "writing code" to "deciding what code to write."