OpenAI Codex CLI: install it, give it a real codebase, watch it ship a PR

The name "Codex" used to mean the GPT-3 code completion model from 2021. In 2026, OpenAI's Codex is a different product: a CLI-based coding agent that reads your repo, plans changes, runs commands, and opens a PR. This guide covers the install, the first run inside a real codebase, and an honest list of what Codex is and isn't good at.

What you'll learn

What Codex is in 2026 (agent, not autocomplete)
Install + sign in (Plus, Pro, Business, Enterprise, Edu tiers)
First run inside a real repository
3 task types Codex is good at (and 3 it isn't)
The gotchas — quota, sandboxing, when to supervise

What Codex is in 2026

Codex is OpenAI's agentic coding CLI. It is shipped as a binary you install with npm or a curl script, and it runs in your terminal. The interface is roughly:

$ codex "Add a /healthz endpoint to this Express app that returns DB ping status"

Codex then:

Reads your repo to understand the structure.
Proposes a plan.
Asks for approval.
Edits files, runs tests, runs linters.
Reports a diff and (if you have git set up) opens a PR.

It is not the old GPT-3 Codex autocomplete model; that model was deprecated in 2023. The current Codex is built on codex-1 (a fine-tune of o3) and is invoked as codex on macOS, Linux, and Windows (via WSL).

Tip

Codex covers similar ground to GitHub Copilot Workspace and OpenAI's earlier "Code Interpreter" product. If you used either of those, Codex is the same idea but better integrated with your terminal and git workflow.

Install + sign in

The install takes about 2 minutes.

macOS / Linux

npm install -g @openai/codex
codex --version  # should print codex-cli 0.x.y

Windows

Use WSL (Windows Subsystem for Linux). Codex does not run natively on Windows PowerShell. In a WSL terminal:

npm install -g @openai/codex

Sign in

codex login

This opens a browser window to OpenAI's auth page. Sign in with an account on the Plus, Pro, Business, Enterprise, or Edu tier. Codex is not available on the free ChatGPT tier as of 2026.

After login, your credentials are stored in ~/.codex/auth.json. To check:

codex whoami

First run inside a real repo

The best way to evaluate Codex is to give it a real task in a real repo. Do not start with a toy "hello world": it is too easy to tell you anything about how Codex handles real code.

Step 1 — Pick a small, well-defined task

In your repo, find a task that is:

Bounded — affects 1-3 files, has clear success criteria.
Verifiable — has a test, or you can run the app and see the result.
Safe — does not touch production data, secrets, or irreversible operations.

Good first tasks:

"Add a /healthz endpoint to the Express app that pings the database."
"Refactor the user authentication middleware to use async/await instead of callbacks."
"Add JSDoc comments to every exported function in src/utils/."

Bad first tasks:

"Rewrite the authentication system" (too broad).
"Fix the bug in checkout" (not specific enough — which bug?).
"Make the app faster" (no success criteria).

Step 2 — Initialize Codex in the repo

From the repo root:

codex init

This scans the repo, identifies the language and framework, and writes a codex.md file with repo-specific instructions for Codex. You can edit codex.md to add your own conventions (e.g., "always use TypeScript strict mode," "never use any").

Step 3 — Run the task

codex "Add a /healthz endpoint to this Express app that returns DB ping status"

Codex reads the repo, proposes a plan in the terminal, and waits for your approval (y/n). On approval, it makes changes, runs your test suite, and reports.

For a small task, the whole run takes 1-5 minutes. For a complex task across 10+ files, it can take 15-30 minutes.
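As a reference point for the review step, here is a minimal sketch of the kind of handler this task might produce. It is not Codex output. It is written against the plain (req, res) handler signature used by Node's built-in http module rather than Express, so it runs without dependencies. The names pingDb, checkHealth, and handleRequest are hypothetical, and the 503 status on failure is an assumption about what you would want, not something the task description specifies.

```javascript
// Hypothetical stand-in for a real database ping (for example, running SELECT 1).
async function pingDb() {
  return true;
}

// Turn the outcome of a ping into a status code and JSON body.
async function checkHealth(ping) {
  try {
    await ping();
    return { code: 200, body: { status: "ok", db: "ok" } };
  } catch (err) {
    // Assumption: 503 tells load balancers to stop routing traffic here.
    return { code: 503, body: { status: "error", db: "unreachable" } };
  }
}

// Request handler compatible with Node's http.createServer(handler).
async function handleRequest(req, res, ping = pingDb) {
  if (req.method === "GET" && req.url === "/healthz") {
    const { code, body } = await checkHealth(ping);
    res.writeHead(code, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
    return;
  }
  res.writeHead(404);
  res.end();
}
```

To serve it, pass the handler to Node's built-in server: require("node:http").createServer(handleRequest).listen(3000). Keeping checkHealth separate from the HTTP plumbing makes the success criterion easy to test, which is the property that makes this a good first task.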

Step 4 — Review the diff

Codex outputs a unified diff. Review it carefully. If you are happy, commit and push. If you are not, run codex revert to undo all changes (Codex uses git for safety — it commits to a temporary branch, so revert is clean).

Tip

Always review the diff. Codex is good but not perfect, and the cost of a 30-second manual review is much smaller than the cost of a subtle bug in production.

3 task types Codex is good at

Boilerplate generation — adding standard endpoints, model definitions, configuration files. These tasks have a clear "shape" drawn from common patterns.
Test writing — given an existing function, write unit tests that cover the happy path and obvious edge cases. Codex is excellent at this because tests have a clear pass/fail criterion.
Mechanical refactors — "rename this method across the codebase," "convert all callbacks to async/await in src/legacy/," "add type annotations to every function in this file." The mechanical nature means fewer places for Codex to be creative in the wrong way.
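To illustrate what a mechanical refactor looks like in practice, the sketch below shows a callback-style function and its async/await equivalent side by side. The functions fetchUserCb, getDisplayNameCb, fetchUser, and getDisplayName are hypothetical and exist only for this example; real code in src/legacy/ would differ.

```javascript
// Before: callback style. fetchUserCb is a hypothetical lookup used only for illustration.
function fetchUserCb(id, callback) {
  setTimeout(() => {
    if (id <= 0) return callback(new Error("invalid id"));
    callback(null, { id, name: `user-${id}` });
  }, 0);
}

function getDisplayNameCb(id, callback) {
  fetchUserCb(id, (err, user) => {
    if (err) return callback(err);
    callback(null, user.name.toUpperCase());
  });
}

// After: the callback API wrapped in a promise, then consumed with async/await.
function fetchUser(id) {
  return new Promise((resolve, reject) => {
    fetchUserCb(id, (err, user) => (err ? reject(err) : resolve(user)));
  });
}

async function getDisplayName(id) {
  const user = await fetchUser(id);
  return user.name.toUpperCase();
}
```

The transformation is the same at every call site: error-first callbacks become rejected promises, and nested callbacks become sequential awaits. Node's util.promisify can do the wrapping step for functions that follow the error-first convention. Because the pattern is so regular, a reviewer can check each changed function against the same rule.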

3 task types Codex isn't good at

Architecture decisions — "should we use a queue here or call the service directly?" Codex will give you an answer, but it will be a generic answer. Architecture decisions need human context.
Cross-cutting changes that touch shared abstractions — adding a new field to a database schema, then propagating it through the API, the UI, and the tests. Codex can do it, but the diff is large and the error rate is higher. Review carefully.
Performance optimization — Codex will happily refactor code in ways that look cleaner but are slower. Always benchmark after a Codex-suggested "performance improvement."
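One way to act on the benchmarking advice is a small timing harness like the sketch below. It uses Node's process.hrtime.bigint() for timing. The two sumSquares functions are hypothetical stand-ins for a "before" and "after" version of the same code, and a micro-benchmark like this is only a rough signal, since JIT warm-up and machine load affect the numbers.

```javascript
// Time a function over many iterations and return average milliseconds per call.
function benchmark(fn, iterations = 1000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / iterations;
}

// Two hypothetical implementations of the same task: summing squares.
const data = Array.from({ length: 10000 }, (_, i) => i);

function sumSquaresLoop(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i] * values[i];
  return total;
}

function sumSquaresChained(values) {
  return values.map((v) => v * v).reduce((a, b) => a + b, 0);
}

// Check that both versions agree before comparing speed.
console.assert(sumSquaresLoop(data) === sumSquaresChained(data));
console.log("loop    ms/call:", benchmark(() => sumSquaresLoop(data)));
console.log("chained ms/call:", benchmark(() => sumSquaresChained(data)));
```

Run it several times and compare the two figures. If the version that "looks cleaner" is consistently slower on realistic input sizes, keep the original.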

Gotchas

1. Quota

Codex uses tokens from your ChatGPT subscription. A 30-minute complex task can burn through 5-10% of your weekly quota. Heavy users should plan for the Pro or Business tier; Plus quota is too tight for daily use.
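The arithmetic behind that estimate is worth spelling out. The sketch below takes the 5-10% per-run figure at face value and works out how many 30-minute complex runs fit in a week. The function name runsPerWeek is hypothetical, and actual quota accounting may not be linear.

```javascript
// How many runs fit in a weekly quota if each run uses a fixed share of it.
function runsPerWeek(percentPerRun) {
  if (!(percentPerRun > 0 && percentPerRun <= 100)) {
    throw new RangeError("percentPerRun must be in (0, 100]");
  }
  return Math.floor(100 / percentPerRun);
}

const minutesPerRun = 30; // from the estimate above
for (const pct of [5, 10]) {
  const runs = runsPerWeek(pct);
  const hours = (runs * minutesPerRun) / 60;
  console.log(`${pct}% per run -> ${runs} runs, about ${hours} hours per week`);
}
// 5% per run -> 20 runs, about 10 hours per week
// 10% per run -> 10 runs, about 5 hours per week
```

In other words, the weekly quota covers roughly 10 to 20 complex runs, or one to two hours of agent time per working day, before smaller tasks are counted.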

2. Sandboxing

Codex runs commands in a sandbox by default — it can read your files but cannot make network calls, cannot touch paths outside the repo, and cannot run anything that requires sudo. This is good for safety but means tasks that need to call an external API (e.g., "test the Stripe webhook locally") require --no-sandbox or explicit permission grants.

3. The "in circles" failure mode

If Codex gets stuck, usually because the task is ambiguous or the repo is too unfamiliar, it can loop: try something, fail, try again, fail differently, try again. The CLI shows an "I've been working for X minutes without progress, want me to continue?" prompt. If you see this, kill the run, refine the task description, and try again with more context.

4. When to supervise

For tasks that affect:

Authentication, authorization, session handling
Payment, billing, financial logic
Database migrations
Production configuration

...run Codex in plan-only mode (codex --plan or by approving only the plan, not the execution). Review the plan, then either approve it or refine it.
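A simple way to enforce the list above is to check which files a task touched before accepting the result. The sketch below is one possible heuristic, not a Codex feature. SENSITIVE_PATTERNS and sensitiveAreas are hypothetical names, and the patterns are guesses at a typical repo layout that you would adapt to your own.

```javascript
// Hypothetical path patterns for the sensitive areas listed above; adjust to your repo layout.
const SENSITIVE_PATTERNS = [
  { area: "auth", pattern: /auth|session|login/i },
  { area: "payments", pattern: /payment|billing|invoice/i },
  { area: "migrations", pattern: /migrations?\//i },
  { area: "production config", pattern: /(^|\/)(\.env|config\/prod)/i },
];

// Return the sensitive areas that a set of changed file paths touches.
function sensitiveAreas(changedPaths) {
  const hits = new Set();
  for (const path of changedPaths) {
    for (const { area, pattern } of SENSITIVE_PATTERNS) {
      if (pattern.test(path)) hits.add(area);
    }
  }
  return [...hits];
}

const changed = [
  "src/routes/healthz.js",
  "src/middleware/auth.js",
  "db/migrations/004_add_index.sql",
];
console.log(sensitiveAreas(changed)); // [ 'auth', 'migrations' ]
```

The list of changed paths can come from git diff --name-only. The substring matching is deliberately over-cautious (a file named author.js would also be flagged), which is the right direction to err in when the point is deciding what needs a closer human look.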

Tip

Use codex --plan to get a plan without execution. Useful for "is this even the right approach?" questions before letting Codex make changes.

FAQ

How is this different from Cursor / Claude Code?

Cursor lives in an editor; Codex lives in a terminal. Cursor is better for interactive coding (you in the loop, code on screen); Codex is better for autonomous tasks (give it a job, walk away, review the diff). Claude Code is the most similar competitor — it is also a terminal-based agent. Codex is generally faster; Claude Code is generally better at long-horizon reasoning across many files.

Does Codex run code?

Yes. With sandboxing, it can run any command that does not require network or elevated privileges. With --no-sandbox, it can run anything your user account can run.

Is my repo uploaded to OpenAI?

File contents are sent to the OpenAI API for processing. The repo is not "uploaded" in the persistent sense — it is not stored on OpenAI's servers, and it is not used to train future models. Check OpenAI's current data usage policy for the most accurate answer.

Why did Codex go in circles on my task?

Usually the task is ambiguous, the repo is large and unfamiliar, or the success criterion is unclear. Refine the task ("write a /healthz endpoint that returns 200 OK with a JSON body { status: 'ok', db: 'ok' } after pinging the DB") and try again.

Can I run Codex on private/closed-source code?

Yes. Use the Business or Enterprise tier; under OpenAI's terms for those tiers, your code is not retained or used for training.

Does Codex work with monorepos?

Yes, but performance drops. For a 100+ package monorepo, point Codex at a specific subdirectory with codex "..." --root=packages/auth.

How is this different from GitHub Copilot's "agent mode"?

GitHub Copilot's agent mode (introduced in 2025) is similar in spirit. Codex is OpenAI-native (uses codex-1/o3), runs in your terminal, and is the more autonomous of the two. Copilot's agent mode is more integrated with the GitHub PR review flow.

Will Codex replace junior engineers?

No. Codex is a force multiplier. Engineers who use it ship 2-3x faster; those who don't fall behind. Codex does not remove the need for engineers: it shifts the bottleneck from "writing code" to "deciding what code to write."
