Until now, I mostly worked with Codex through the Web version. The workflow was clear: write a spec, break it into small tasks, define success criteria, send a task to Codex, get a branch in GitHub, review it, merge it, and then pull the code into the local environment.
That worked well, but it had one obvious limitation: Codex worked in one environment, while local testing happened in another.
With Codex CLI, the goal is not to replace the workflow. The goal is to move it into a more controlled local setup. I still want small tasks, separate branches, success criteria, and tests. The difference is that Codex now works inside my local development environment instead of only through a remote branch.
The move is roughly from this:
Codex Web -> remote branch -> merge -> local pull -> manual testing
To this:
Codex CLI -> local branch -> Dev tests -> push -> QA tests -> merge to main
Because of that, Codex should not work in the same directory where I do manual testing. I want two local working areas:
Dev -> Codex CLI work, code changes, and quick tests
QA -> manual testing before merging to main
Production is not part of this flow. It runs on a separate server, so it is not relevant to the local development cycle.
It is possible to run Codex CLI in the same directory where manual testing is done, but that creates an unhealthy mix of responsibilities.
While working, Codex may leave the directory in an intermediate state.
If manual testing happens in that same directory, it becomes harder to know what is actually being tested: a clean version of the branch, or an intermediate result from the development process.
Separating Dev from QA solves that. Codex gets its own workspace, and manual testing gets a separate one.
One way to solve this is to use two separate clones:
~/src/dev/my_project
~/src/qa/my_project
This is easy to understand, but less convenient to maintain. It duplicates the repository, requires managing remotes in more than one place, uses more disk space, and increases the chance of mixing up directories.
A cleaner option is git worktree.
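git worktree allows multiple working directories to share the same repository. Each directory is checked out to a different branch or commit, while all of them use the same Git object database. One rule matters for the rest of this workflow: Git does not allow the same branch to be checked out in two worktrees at the same time.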
For example:
~/src/dev/my_project -> Dev for Codex CLI
~/src/qa/my_project -> QA and manual testing
This gives the same practical separation as an additional clone, without duplicating the whole repository.
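To see the idea in isolation, here is a minimal sketch that builds a throwaway repository under a temporary directory and attaches a second worktree to it. The directory names mirror the ones used in this post, but everything lives under mktemp, so nothing under ~/src is touched. The --detach flag is my choice for the demo, because main is already checked out in the first directory.

```shell
#!/usr/bin/env bash
# Throwaway demo: one repository, two working directories.
# All paths live under a temp directory; nothing touches ~/src.
set -eu

demo_root="$(mktemp -d)"
mkdir -p "$demo_root/qa" "$demo_root/dev"

# Create a small repository that plays the role of the QA checkout.
git init -q "$demo_root/qa/my_project"
cd "$demo_root/qa/my_project"
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Add a second working directory at the commit of main.
# --detach avoids checking out "main" twice, which Git refuses to do.
git worktree add --detach ../../dev/my_project main

# Both directories are listed and point at the same commit.
git worktree list
worktree_count="$(git worktree list | wc -l | tr -d ' ')"
qa_head="$(git rev-parse HEAD)"
dev_head="$(git -C "$demo_root/dev/my_project" rev-parse HEAD)"
echo "worktrees: $worktree_count, same commit: $([ "$qa_head" = "$dev_head" ] && echo yes || echo no)"
```

In the Dev directory, .git is a small file that points back to the main repository rather than a full copy of it, which is why the second directory costs very little disk space.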
Assume the existing repository is here:
~/src/qa/my_project
Keep it as the QA environment, and create another worktree for Codex:
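cd ~/src/qa/my_project && git worktree add --detach ../../dev/my_project main
This command first moves into the existing repository, then creates a second working directory at the current commit of main. The --detach flag is needed because main is normally already checked out in the QA directory, and Git refuses to check out the same branch in two worktrees. The Dev worktree therefore starts on a detached HEAD, and each task branch is created from there.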
Because the command runs from:
~/src/qa/my_project
The relative path:
../../dev/my_project
Points to:
~/src/dev/my_project
The result:
~/src/qa/my_project -> QA, manual testing
~/src/dev/my_project -> Dev, Codex CLI work
From this point on, Codex works only under:
~/src/dev/my_project
Manual testing happens under:
~/src/qa/my_project
In this project, I do not use a local virtualenv. Running the app and running the tests both happen through Docker Compose, so the Dev and QA separation should also exist at the Compose level.
The simple approach is to use the same compose file with a different project name.
In Dev:
docker compose -p my_project_dev up -d --build
In QA:
docker compose -p my_project_qa up -d --build
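The -p flag tells Docker Compose to create containers, networks, and volumes under a different project name, so the two environments do not collide on resource names. It does not change published host ports: if the compose file maps a fixed host port, only one environment can bind that port at a time, unless an override file assigns different ports.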
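Typing the project name by hand is an easy place to make a mistake. As a small sketch, the project name can be derived from the directory the command runs in. The function name compose_project_for_dir is hypothetical, and the path patterns assume the ~/src/dev and ~/src/qa layout described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a working directory to a Compose project name.
# Assumes the ~/src/dev/... and ~/src/qa/... layout used in this post.
compose_project_for_dir() {
    local dir="${1:-$PWD}"
    case "$dir" in
        */src/dev/*) echo "my_project_dev" ;;
        */src/qa/*)  echo "my_project_qa" ;;
        *)
            echo "unknown workspace: $dir" >&2
            return 1
            ;;
    esac
}

# Usage, from inside either working directory (requires Docker Compose):
#   docker compose -p "$(compose_project_for_dir)" up -d --build
```

The helper fails loudly for any other directory, so a command started from the wrong place does not silently create a third Compose project.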
If there are real differences between Dev and QA later, override files can be added:
docker-compose.yml
docker-compose.dev.yml
docker-compose.qa.yml
docker-compose.yml is the shared base file. It contains the common service definitions, build settings, networks, volumes, and default configuration used by both environments.
The override files are only needed when Dev and QA should differ:
docker-compose.dev.yml -> Dev-specific changes
docker-compose.qa.yml -> QA-specific changes
For example, Dev can be started like this:
docker compose -p my_project_dev -f docker-compose.yml -f docker-compose.dev.yml up -d --build
And QA like this:
docker compose -p my_project_qa -f docker-compose.yml -f docker-compose.qa.yml up -d --build
This is useful when the environments need different ports, environment variables, databases, volumes, mock services, or logging settings.
At the beginning, if there is no real difference between the environments, using the same compose file with a different project name is enough.
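One way to move from the single-file setup to override files without changing the commands later is to build the file list dynamically. The sketch below relies on the COMPOSE_FILE environment variable, which Docker Compose reads as a list of files separated by ":" on Linux and macOS. The function name compose_file_list is hypothetical, and the file names are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a COMPOSE_FILE value for one environment.
# The override file is included only if it actually exists.
# Usage: compose_file_list <dev|qa> [project-dir]
compose_file_list() {
    local env_name="$1"
    local dir="${2:-.}"
    local files="docker-compose.yml"

    if [ -f "$dir/docker-compose.$env_name.yml" ]; then
        files="$files:docker-compose.$env_name.yml"
    fi
    echo "$files"
}

# Usage, from the project root (requires Docker Compose):
#   COMPOSE_FILE="$(compose_file_list dev)" docker compose -p my_project_dev up -d --build
```

As long as no override file exists, this behaves exactly like the plain command with only the project name. Once docker-compose.dev.yml or docker-compose.qa.yml is added, it is picked up without touching the command.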
Start in the Dev environment:
cd ~/src/dev/my_project
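Fetch the latest main. Dev does not check out main itself, because main stays checked out in the QA worktree and Git allows a branch in only one worktree at a time:
git fetch origin
Create a branch for the task from the updated main:
git checkout --no-track -b codex-cli/task-name origin/main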
I prefer using a fixed prefix for branches created for Codex CLI work:
codex-cli/...
That makes it easy to identify where the branch came from and what it is for.
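If the prefix should be applied consistently, a tiny helper can build the branch name from a task title. This is only a sketch: the function name codex_branch_name and the slug rules (lowercase, with non-alphanumeric runs turned into single dashes) are my assumptions, not a Git or Codex convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a task title into a codex-cli/... branch name.
codex_branch_name() {
    local slug
    # Lowercase, collapse anything that is not a-z or 0-9 into "-",
    # then trim dashes from both ends.
    slug="$(printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')"

    if [ -z "$slug" ]; then
        echo "task title must contain letters or digits" >&2
        return 1
    fi
    echo "codex-cli/$slug"
}

codex_branch_name "Add Login Form!"
```

The result can be passed straight to git checkout -b, which keeps every Codex CLI branch under the same prefix.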
Then start the Dev environment:
docker compose -p my_project_dev up -d --build
And run Codex CLI:
codex --sandbox workspace-write --ask-for-approval on-request
While learning the workflow, I prefer not to give Codex too much freedom. It should be able to edit project files, but broader actions should still require approval.
Codex works better when the task is small and well-defined. For example:
Implement the following task in a minimal and focused way.
Task:
...
Success criteria:
1. ...
2. ...
3. ...
Testing:
Use Docker Compose.
For a quick check, run the short test command.
If it fails, provide the full test command for debugging.
Constraints:
- Use Python unittest, not pytest.
- Do not change unrelated files.
- Do not reorganize code unless explicitly required.
- Keep the change small and reviewable.
This structure matters more than the tool itself. Codex behaves better when the task is bounded, the success criteria are clear, and the constraints on the change are explicit.
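Since the structure is the important part, it can help to generate it instead of retyping it. The following is a minimal sketch of a prompt builder. The function name make_codex_prompt is hypothetical, and the fixed Testing and Constraints text is copied from the template above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a task prompt in the structure shown above.
# Usage: make_codex_prompt "<task>" "<criterion 1>" ["<criterion 2>" ...]
make_codex_prompt() {
    local task="$1"
    shift
    local n=1
    local criterion

    printf '%s\n' \
        "Implement the following task in a minimal and focused way." \
        "Task:" \
        "$task" \
        "Success criteria:"

    # Number the remaining arguments as success criteria.
    for criterion in "$@"; do
        printf '%d. %s\n' "$n" "$criterion"
        n=$((n + 1))
    done

    printf '%s\n' \
        "Testing:" \
        "Use Docker Compose." \
        "For a quick check, run the short test command." \
        "If it fails, provide the full test command for debugging." \
        "Constraints:" \
        "- Use Python unittest, not pytest." \
        "- Do not change unrelated files." \
        "- Do not reorganize code unless explicitly required." \
        "- Keep the change small and reviewable."
}

make_codex_prompt "Add a health endpoint" \
    "GET /health returns 200" \
    "Existing tests still pass"
```

The output can be pasted into a Codex session. I assume it can also be passed as the initial prompt argument on the codex command line, but I have not made that part of the sketch.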
During normal development, I do not always need to see the full unittest output. When everything passes, a short answer is enough.
Long test output creates noise. It makes it harder to see whether the test passed, harder to find the important line, and easier for Codex to flood the screen with information that is not useful.
So it is useful to keep two test modes:
quick -> short result for normal development
full -> full output for debugging
Codex should usually run:
scripts/test.sh quick
When debugging, it can run:
scripts/test.sh full
The same script can run against Dev or QA by changing COMPOSE_PROJECT_NAME.
Create one script:
scripts/test.sh
With this content:
#!/usr/bin/env bash
# Run the unittest suite inside the Compose app container.
# Usage: scripts/test.sh [quick|full]
set -o pipefail

MODE="${1:-quick}"
PROJECT_NAME="${COMPOSE_PROJECT_NAME:-my_project_dev}"
SERVICE_NAME="${APP_SERVICE_NAME:-app}"
# One log per Compose project, so Dev and QA runs do not overwrite each other.
OUTPUT_FILE="/tmp/${PROJECT_NAME}_unittest_output.log"

# Run unittest discovery in the already running container.
# -T disables TTY allocation, so the script also works without a terminal.
run_tests() {
    docker compose -p "$PROJECT_NAME" exec -T "$SERVICE_NAME" \
        python -m unittest discover -s code/_Tests -p "test_*.py" "$@"
}

case "$MODE" in
    quick)
        # Capture all output and print only a one-line verdict.
        if run_tests > "$OUTPUT_FILE" 2>&1; then
            echo "TESTS PASSED"
        else
            echo "TESTS FAILED"
            echo "For full output, run:"
            echo "COMPOSE_PROJECT_NAME=$PROJECT_NAME APP_SERVICE_NAME=$SERVICE_NAME scripts/test.sh full"
            echo "Or inspect the captured output with:"
            echo "cat $OUTPUT_FILE"
            exit 1
        fi
        ;;
    full)
        # Verbose run, streamed directly to the terminal.
        run_tests -v
        ;;
    *)
        echo "Usage: $0 [quick|full]" >&2
        exit 1
        ;;
esac
Make it executable:
chmod +x scripts/test.sh
Run a quick test in Dev:
scripts/test.sh quick
Run the full test in Dev:
scripts/test.sh full
Run a quick test in QA:
COMPOSE_PROJECT_NAME=my_project_qa scripts/test.sh quick
Run the full test in QA:
COMPOSE_PROJECT_NAME=my_project_qa scripts/test.sh full
This gives Codex one stable interface for tests, while still keeping Dev and QA separated.
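The core of the quick mode is a small shell pattern: run a command, send all of its output to a file, and decide what to print based only on the exit status. Here is that pattern in isolation, as a sketch that works with any command and does not need Docker. The function name run_quiet is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command quietly and report only the verdict.
# Usage: run_quiet <log-file> <command> [args...]
run_quiet() {
    local log_file="$1"
    shift

    # stdout and stderr both go to the log; the "if" sees only the exit status.
    if "$@" > "$log_file" 2>&1; then
        echo "PASSED"
    else
        local status=$?   # exit status of the command that just failed
        echo "FAILED (exit $status), output saved to $log_file"
        return "$status"
    fi
}

log_file="$(mktemp)"
run_quiet "$log_file" true
run_quiet "$log_file" sh -c 'echo "something broke"; exit 3' || true
```

Returning the original exit status matters for Codex as much as for CI: the caller can tell a failure from a pass without parsing any text.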
After Codex finishes the work in Dev, and the quick tests pass, push the branch:
git push -u origin codex-cli/task-name
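Move to the QA environment and check out the pushed commit. The branch itself is still checked out in the Dev worktree, so QA uses a detached checkout of the remote-tracking branch:
cd ~/src/qa/my_project && git fetch origin && git checkout --detach origin/codex-cli/task-name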
Start QA:
docker compose -p my_project_qa up -d --build
Run the quick test:
COMPOSE_PROJECT_NAME=my_project_qa scripts/test.sh quick
If needed, run the full test:
COMPOSE_PROJECT_NAME=my_project_qa scripts/test.sh full
After that, do the manual testing.
Continue to the merge into main only if the manual tests pass.
After the merge, update the QA environment back from main:
cd ~/src/qa/my_project && git checkout main && git pull && docker compose -p my_project_qa up -d --build
It is also possible to run the quick test again:
COMPOSE_PROJECT_NAME=my_project_qa scripts/test.sh quick
This keeps the QA environment aligned with main.
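To confirm that a directory really is aligned with main, the checked-out commit can be compared with the remote-tracking branch. This is a minimal sketch: the function name is_aligned_with is hypothetical, and it assumes git fetch has already been run so that origin/main is up to date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <dir> is checked out at the same commit as <ref>.
# Usage: is_aligned_with <dir> [ref]     (ref defaults to origin/main)
# Returns 0 if aligned, 1 if not, 2 if either commit cannot be resolved.
is_aligned_with() {
    local dir="$1"
    local ref="${2:-origin/main}"
    local head target

    head="$(git -C "$dir" rev-parse --verify -q HEAD)" || return 2
    target="$(git -C "$dir" rev-parse --verify -q "${ref}^{commit}")" || return 2
    [ "$head" = "$target" ]
}

# Usage, after "git fetch origin":
#   is_aligned_with ~/src/qa/my_project && echo "QA matches origin/main"
```

Comparing commit hashes avoids depending on which branch name is checked out, so the check also works for a detached HEAD.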
Start with a very small task, not a real feature.
For example:
Add AGENTS.md to the project.
Add scripts/test.sh.
Only after the workflow itself feels stable should Codex CLI get real development tasks.