AI tool comparison
agent-skills vs Codex CLI 2.0
Which one should you ship with? Here is the side-by-side panel verdict, pricing read, reviewer split, and community vote comparison.
Developer Tools
agent-skills
Production-grade engineering skills library for AI coding agents
Panel ship: 75%
Community: n/a
Entry: Free
agent-skills is a structured library of 20 production-grade engineering skills for AI coding agents, published by Addy Osmani (former Google Chrome DevTools lead, author of Essential JavaScript Design Patterns). It provides a complete spec-to-ship workflow via 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship) that work across Claude Code, Cursor, Gemini CLI, Windsurf, and GitHub Copilot: any agent that supports CLAUDE.md or equivalent configuration files.
The library includes three specialist personas that activate on demand: a security auditor (checks for injection vulnerabilities, hardcoded secrets, OWASP Top 10), a code reviewer (focuses on maintainability, complexity, and test coverage), and a test engineer (generates unit, integration, and edge-case tests). Four reference checklists (API design, accessibility, performance, deployment) give agents shared evaluation criteria. Each skill is written as a Markdown instruction file following the CLAUDE.md conventions popularized by the karpathy-skills library.
agent-skills accumulated 6,693 GitHub stars in its first trending week, outpacing most comparable skill collections. Osmani's framing, which treats agent skills as a first-class engineering asset rather than ad-hoc prompts, resonates with teams trying to standardize how they use AI coding tools. The library is MIT-licensed and designed to be forked and extended.
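To make the "one Markdown instruction file per skill" idea concrete, here is a minimal sketch of how the seven slash commands could resolve to files. The `skills/` directory and the `<command>.md` naming are assumptions made for illustration; the actual repository layout may differ.

```python
from pathlib import PurePosixPath

# The seven slash commands named in the description above.
SLASH_COMMANDS = [
    "/spec", "/plan", "/build", "/test", "/review", "/code-simplify", "/ship",
]

# Hypothetical layout: one Markdown instruction file per command under skills/.
SKILLS_DIR = PurePosixPath("skills")


def skill_file_for(command: str) -> PurePosixPath:
    """Return the (assumed) Markdown file an agent would load for a command."""
    if command not in SLASH_COMMANDS:
        raise ValueError(f"unknown slash command: {command}")
    # Strip the leading slash and append the .md extension.
    return SKILLS_DIR / f"{command.lstrip('/')}.md"


if __name__ == "__main__":
    for cmd in SLASH_COMMANDS:
        print(f"{cmd} -> {skill_file_for(cmd)}")
```

Because each skill is just a file, a team fork can add or override a command by adding or editing one Markdown file, which is what makes the library easy to version and review like any other code.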
Developer Tools
Codex CLI 2.0
GPT-5 powered terminal agent for autonomous multi-file code editing
Panel ship: 100%
Community: n/a
Entry: Free
Codex CLI 2.0 is a terminal-based coding agent from OpenAI that autonomously handles multi-file refactoring, test generation, and GitHub PR creation from the command line. It defaults to GPT-5 and operates as a local agent that can read, edit, and commit code across an entire repository. It represents a significant upgrade over the original Codex CLI, moving from single-file completions to full agentic workflows.
Reviewer scorecard
“Having security audits, test generation, and spec creation as first-class slash commands changes how you think about agent-assisted development. The cross-tool compatibility (Claude, Cursor, Gemini) means you can standardize across a team with mixed tool preferences. Fork it, customize the checklists, and you have a company playbook.”
“The primitive here is a GPT-5 loop that can read your whole repo context, plan a multi-file diff, run your tests, and open a PR — all from one shell command. That's not a wrapper, that's actual orchestration that would take a real afternoon to replicate cleanly yourself. The DX bet is right: complexity lives in the agent's planning layer, not in config files — no YAML schemas, no 12-environment-variable setup. The moment of truth is `codex 'refactor auth module to use middleware pattern'` and watching it touch six files without blowing up your imports. It survives that test more often than it should. My one gripe: the PR description quality degrades hard on large diffs, and there's no way to inject a PR template without forking the config. That's a craft miss, not a deal-breaker.”
“This is well-packaged prompt engineering, not a fundamentally new capability. The value depends entirely on the underlying agent following instructions reliably — which varies wildly across tools and models. Teams that haven't established basic code review processes will use this as a crutch rather than building genuine engineering discipline.”
“Direct competitor is Cursor's background agent plus gh CLI, and if you already pay for Cursor you have 80% of this. What Codex CLI 2.0 has that Cursor doesn't is terminal-first composability — you can pipe it into CI, chain it with make targets, run it headless on a remote box. The scenario where it breaks is any refactor that requires understanding business logic not expressed in code: rename a concept that lives in Confluence docs and a Slack thread, and the agent confidently produces the wrong thing at scale across 40 files. Prediction: OpenAI ships this as a native feature of the API with a proper function-calling scaffold in 12 months and the standalone CLI becomes redundant. It ships now because the terminal-native composability is genuinely ahead of what the API exposes directly today — but that window is narrow.”
“The real innovation here is treating agent behavior as versionable, shareable code. The next step is organizations maintaining their own agent-skills forks as living engineering standards — the CLAUDE.md pattern is becoming a de facto org-level configuration layer for how teams interact with AI.”
“The thesis baked into Codex CLI 2.0 is falsifiable: by 2028, most incremental software changes in codebases under 500k tokens will be authored by agents, not humans typing. This tool is a bet that the terminal is the right control plane for that future — not an IDE plugin, not a chat UI. That's the right bet because CI/CD pipelines are already terminal-native, and composability with existing shell tooling is a forcing function for adoption in professional environments. The second-order effect nobody is talking about: if PR creation becomes trivially agentified, the bottleneck shifts entirely to code review, and review tooling becomes the high-value surface. This tool is on-time to the agentic dev tools wave — not early, not late. The future state where this is infrastructure is every CI pipeline running a codex step that auto-generates regression tests for every PR before human review.”
“The /spec and /plan commands are genuinely useful for non-engineers who need to communicate feature requirements to an AI agent. Clear structured specs reduce the back-and-forth of vague prompts — this could be the bridge between product thinking and implementation.”
“The job-to-be-done is single and clean: execute a multi-file code change from a natural language description without leaving the terminal. No 'and' required. Onboarding is fast — `npm install -g @openai/codex`, set your API key, run one command against your repo, and you're watching it work inside 90 seconds. That's a real win. The product has an opinion: it defaults to GPT-5, it defaults to opening a PR, it defaults to running your test suite before committing — these are the right defaults and they're not configurable away without effort, which is the correct call. The incompleteness problem is the `--approve-all` flag: the tool ships it, which means the product is already deferring safety judgment to users who will absolutely misuse it on a Friday afternoon deploy. A more opinionated PM would have gated that behind an explicit config key, not a flag.”
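The last reviewer's suggestion, gating unattended approval behind an explicit config key rather than a flag alone, can be sketched as a small policy check. This is an illustration of the design idea only, not Codex CLI's actual code; `SafetyConfig`, `allow_unattended_approval`, and `may_skip_approvals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyConfig:
    # Hypothetical config key: off by default, so it has to be enabled
    # deliberately (for example in a checked-in, reviewable config file).
    allow_unattended_approval: bool = False


def may_skip_approvals(flag_passed: bool, config: SafetyConfig) -> bool:
    """Skip approval prompts only when the flag AND the config key agree."""
    return flag_passed and config.allow_unattended_approval


if __name__ == "__main__":
    # Flag alone is not enough under this policy.
    print(may_skip_approvals(True, SafetyConfig()))
    # Flag plus explicit opt-in enables unattended runs.
    print(may_skip_approvals(True, SafetyConfig(allow_unattended_approval=True)))
```

The design choice is that a one-off command-line flag is easy to type in a hurry, while a config key leaves a trace that can be reviewed alongside the rest of the repository.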