A good local WordPress development setup is invisible. You clone a repo, run one command, and you’re editing code against a real WordPress site within a minute. Tests run on every save. Bugs reproduce instantly. Changes ship to a real staging environment with one click. The whole machine fades into the background and you spend your time on the work that matters.
A bad setup is the opposite. You spend half your morning fighting Docker, the other half discovering why the bug you fixed locally still happens on production. Friction compounds. By the time you’re three months into a project, every change costs you twice what it should.
This is a complete guide to setting up the good version. It covers every tool worth knowing in 2026 (and a few that aren’t), how to pick the right stack for what you’re doing, the configurations I actually use, the testing patterns that scale from solo work to a team with AI agents in the loop, and the gotchas you’ll hit along the way.
The framing throughout is AI-first. I run a team of AI agents shipping production WordPress code every day, and I’m building the platform that makes that work for everyone else at AgentVania. The tools and patterns that hold up when an AI agent is one of your developers are the same ones that make life easier for a human developer. If you build for the harder case, the easier case comes free.
1. Why local WordPress development matters
The case for local development is so old it barely needs making, but humour me — the constraints have shifted in 2026, and the modern answer isn’t the same as it was five years ago.
You need a local environment because:
- Iteration speed. Each save-and-reload cycle is hundreds of milliseconds faster locally than over SFTP or through wp-admin. Across the hundreds of cycles in a working day, that adds up.
- Reproducibility. A real WordPress site that mirrors production lets you reproduce bugs that don’t happen in your head and prove fixes that don’t happen by hope.
- Testing. PHPUnit and Playwright don’t run against production. They run against an isolated WordPress install, which has to come from somewhere.
- CI. The same environment definition that runs on your laptop should run in GitHub Actions. The local environment is the source of the CI environment, not a separate thing.
- AI agents. This is the new one. An AI agent on your team writing code, running tests, and verifying its own work needs a programmatic environment to operate against. The local environment is now also the agent environment.
The last point reshapes which tools deserve serious consideration. A local environment that only works through a GUI excludes AI agents from your workflow by construction. That’s a choice — and increasingly a costly one.
2. What you actually need from a local environment
Eight requirements. The first four are universal. The last four are what an AI-first workflow adds; they happen to also be exactly what a productive human workflow wants.
1. A real WordPress install. Not a partial mock, not a stubbed runtime. The actual WordPress core, running against a real database (or close to one), serving real admin pages over HTTP.
2. Configurable WP / PHP versions. You should be able to point the environment at any WordPress core version (current stable, trunk, a specific tag) and any PHP version (8.1, 8.2, 8.3, 8.4) without reinstalling anything. Compatibility matrices are routine in plugin work.
3. WP-CLI access. Every meaningful operation in WordPress — activating plugins, importing content, updating options, creating users, exporting databases — has a WP-CLI command. Your environment must expose wp as a callable command against the running install.
4. Reset to clean. You should be able to destroy the entire environment and rebuild it from scratch in seconds. Persistent test cruft is the silent killer of testing discipline.
5. Environment defined in a file. Not configured through a wizard. Not stored in some per-machine app data folder. A config file committed to the repo, alongside the code, so anyone (human or agent) cloning the repo gets an identical environment from one command.
6. Every operation as a CLI command. “Create site,” “install plugin,” “import database,” “run tests,” “reset state” — all callable from a script. No required clicks anywhere.
7. Programmatic state inspection. You need to query “is this plugin active?”, “what does this option store?”, “what errors did the last cron run produce?” — without screenshotting an admin screen and parsing pixels.
8. Reproducible snapshots. Spin a fresh environment, run a test scenario, capture the result, tear it down. Repeatedly. Cheaply. The cost of a single test run sets the ceiling on how many tests you’ll actually run.
GUI-driven tools score 0 out of 4 on requirements 5–8 by default. Some have community workarounds that lift the score. None compare to tools that were built for the command line from day one. Keep this scorecard in mind as you read the tool landscape below.
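As a rough illustration of requirement 7, here is a minimal sketch of how a script or agent might answer "is this plugin active?" from the JSON that wp plugin list --format=json prints. The helper name isPluginActive and the sample data are assumptions for illustration; only the name and status fields of the CLI output are relied on.

```typescript
// Shape of one row from `wp plugin list --format=json` (only the fields used here).
interface PluginRow {
  name: string;
  status: string; // e.g. "active", "inactive", "must-use"
}

// Returns true when the slug appears in the list with an active status.
function isPluginActive(jsonText: string, slug: string): boolean {
  const rows = JSON.parse(jsonText) as PluginRow[];
  return rows.some(
    (row) =>
      row.name === slug &&
      (row.status === "active" || row.status === "active-network")
  );
}

// Hypothetical sample of what the CLI might print.
const samplePluginList =
  '[{"name":"akismet","status":"inactive"},{"name":"my-plugin","status":"active"}]';

console.log(isPluginActive(samplePluginList, "my-plugin")); // true
```

The point is less the parsing than the shape of the loop: the question is answered from structured CLI output, with no admin screen involved.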
3. The tool landscape
There are more local WordPress tools than most people realise, and the field has shifted significantly in 2025–2026. Here’s the complete picture, grouped by category.
Desktop GUI apps
Local WP (formerly Local by Flywheel). The friendliest GUI in the space and the tool most WordPress devs reach for first. Click “Create Site,” pick PHP and WP versions, you’re in. It bundles Mailpit, SSL, Live Link tunnels, cloud backups, and a blueprint system for saving site configurations. There’s no official CLI, but the community has filled the gap: salcode/wpcli-localwp-setup wires up WP-CLI directly, and a March 2026 AI agent skill wraps WP-CLI for LLM use. Excellent for human-first work, possible to drive from agents with extra effort.
DevKinsta. Kinsta’s free local dev tool. Polished UI, push-to-staging integration if you host on Kinsta. Docker under the hood, GUI on top. A reasonable pick if you’re already in their ecosystem. If you’re not, you’re using a hosting vendor’s tool for portability reasons, which is backwards.
WordPress Studio (the WP.com one, formerly Studio by WordPress.com). New since 2024, growing fast. Built on the same engine as wp-now and WordPress Playground: PHP compiled to WebAssembly, running real WordPress against SQLite instead of MySQL, no Docker required. Spins up in seconds. Defaults to SQLite but configurable to MySQL by editing wp-config.php. The 2025 SQLite driver rewrite (AST-based, replicates MySQL’s information_schema) closed most of the compatibility gap. Has a polished GUI plus some CLI access. The fastest path to “WordPress, right now” without thinking about servers.
Docker-based tools
wp-env. WordPress’s official Docker-based environment, distributed as @wordpress/env on npm. Configured by a .wp-env.json file in your repo, so the environment is checked in alongside the code. Two commands: wp-env start, wp-env run cli wp .... Ships a pre-wired PHPUnit test environment via a tests-cli container. Less polished than the desktop GUIs but infinitely more scriptable. Purpose-built for plugin and theme development.
DDEV. Full-featured Docker dev environment, YAML config, multi-stack (WordPress, Laravel, Drupal, plain PHP, more). ddev wp runs WP-CLI inside the container. ddev pull syncs database and files from staging. Different projects can run different PHP/MySQL versions side by side without conflict. Works on macOS (with OrbStack, Docker Desktop, Colima, or Lima), Windows (WSL2), and Linux. The agency consensus in 2026, and the strongest choice for full WordPress sites where the unit of work is bigger than one plugin.
Lando. YAML-configured Docker dev environments. Powerful, flexible, supports many stacks beyond WordPress. Plays in DDEV’s space and has been there longer; DDEV has eaten most of its momentum in the past two years. Still well-loved at agencies that use Lando across non-WP projects too.
Custom Docker Compose. Some teams roll their own docker-compose.yml. Total control, total maintenance burden. The right answer when your stack is unusual (multisite plus WPML plus custom services plus specific PHP-FPM tuning) and the wrong answer for plain plugin dev.
For a deeper comparison of when to reach for Local WP versus the Docker family specifically, I covered that in Local WP vs Docker: When to Use Each for WordPress Development — this guide takes a broader view across the whole landscape, but the Local-WP-vs-Docker question is the most common branching point and that post zooms in on it.
WebAssembly tools
WordPress Playground. The same WASM-based engine as Studio, but shipped as a browser tool and a CLI (@wp-playground/cli). The killer feature is blueprints: a JSON file describing exactly which WP version, which plugins (from URLs or zips), which settings, which demo content. Send someone a blueprint URL and they get an identical WordPress site in their browser in five seconds. For bug reproductions and demos, nothing else comes close. Same SQLite caveat as Studio, same new driver improvements.
Legacy and specialised
VVV (Varying Vagrant Vagrants). Vagrant-based, heavier than the Docker options, older school. Still the recommended path if you’re contributing to WordPress core itself, where you need the exact reference setup the core team uses.
Trellis / Bedrock. The Roots stack. Composer-managed WordPress, Ansible-provisioned dev/staging/prod. Not a “spin up and click” tool but a whole methodology for treating WordPress like a real PHP project. Excellent if you’re already living that way, massive overkill if you’re not.
Laravel Valet (with the WordPress driver). Mac-only, lightweight, fast. Real PHP, real MySQL, no containers. Some Laravel devs who also do WordPress swear by it. Less common in the pure-WP world.
MAMP / XAMPP. Still around, still works. If you’re maintaining a 12-year-old site on PHP 7.4 and don’t want to learn Docker, fine. Otherwise, move on.
Cloud ephemeral
InstaWP. Cloud rather than local, but worth naming. Spin up a real hosted WP site in 30 seconds, default lifespan a couple of days, extendable. Has an API for scripted provisioning. Great for sending a colleague something to click on, and handy for content and marketing screenshots. Not a primary dev environment but it occupies the same mental slot for some tasks.
Helpers worth knowing
OrbStack. Not a WP tool, but the single biggest performance upgrade you can make to your local dev setup if you’re on a Mac. It replaces Docker Desktop with a lighter, faster, native engine. Free for personal use.
Install with Homebrew:
brew install --cask orbstack
open -a OrbStack
On first launch, grant the system permissions it asks for (network extension, virtualization entitlements — standard Mac dev tool stuff). If you’re migrating from Docker Desktop, OrbStack offers a one-click import of your existing containers, images, and volumes on first run.
After that, every Docker-based WP tool just works on top of OrbStack with zero config changes. wp-env’s npx wp-env start finds the Docker socket OrbStack exposes the same way it would find Docker Desktop’s. ddev start, lando start, any docker-compose up you run — they all sit on top of the OrbStack engine without needing to know. You don’t reconfigure your tools; you just have a faster engine under them.
What you gain: 3–5× faster startup on the same workloads, roughly 40% lower memory pressure, no licensing concerns for personal or small-team use, and a clean orb CLI for managing containers directly (orb ps, orb logs, orb shell <container>) if you want it. What you give up: nothing meaningful unless your workflow depended on a Docker Desktop-specific feature like the Docker Scout panel.
For wp-env specifically, this is the change that makes the “start a fresh test environment” cycle feel cheap enough to do constantly rather than something you avoid.
4. Picking your stack by use case
Run every tool above against the eight requirements, and the right answer becomes a function of what you’re trying to do. Here’s the mapping I use.
You’re developing a WordPress plugin or theme, especially one you’ll distribute on WordPress.org or sell commercially. Use wp-env. The environment definition lives in your plugin repo as .wp-env.json, the PHPUnit test infrastructure is pre-wired, and the same definition runs in GitHub Actions for free. Add OrbStack underneath for speed. This is what I use on every plugin I maintain.
You’re building a full WordPress site — custom theme, custom plugins, integrations with external services, the works. Use DDEV. wp-env is purpose-built for plugin/theme dev and gets cramped when the unit of work is bigger. DDEV gives you the broader stack support, the ddev pull workflow for syncing from staging, and the flexibility to add Redis, Elasticsearch, Mailpit, or whatever your stack needs as additional containers.
You’re doing client work that needs to feel like a real long-lived WordPress site you can come back to next month. Use Local WP. It’s optimised for exactly this. Persistent state, friendly UI, Push/Pull to WP Engine and Flywheel, no Docker tax.
You need to reproduce a customer-reported bug, or demonstrate that a bug is fixed, in a way you can share. Use a WordPress Playground blueprint. The reproduction is a single JSON file you can commit to a PR, send to a customer, or hand to QA. They open one URL and they’re looking at the same broken state you’re looking at. Round-trips on bug reports drop from days to minutes once you adopt this pattern. I cover the authoring details in §7.
You’re prototyping a theme or small plugin and don’t want to wait for Docker. Use WordPress Studio. Startup in seconds, polished GUI, real WordPress. Just know the SQLite caveat: if you’re testing something that depends on MySQL-specific behaviour, switch the Studio site to MySQL via wp-config.php, or fall back to wp-env.
You’re contributing to WordPress core. Use VVV. It’s the reference setup the core team itself uses. The other tools work for core contribution too, but VVV is the most aligned with how the core team thinks about the environment.
You’re at an agency running 20+ WordPress projects across multiple developers. Use DDEV as your standard. Commit .ddev/config.yaml to every project, everyone clones and runs ddev start, you have consistency across the team. The ddev pull workflow lets developers sync from the staging environment, which removes the “but my local doesn’t match the bug we’re seeing” friction.
You need to send a colleague a real hosted WordPress site to click on (not just local). Use InstaWP. Cloud-ephemeral, public URL, lifespan you control.
5. Setting it up — the AI-first stack, concrete
Here’s the configuration I use for a real production WordPress plugin with several add-ons. Adapt to your project; the structure transfers cleanly.
OrbStack
If you’re on a Mac and don’t have it yet:
brew install --cask orbstack
open -a OrbStack # first launch, grant permissions when prompted
OrbStack provides the Docker engine. wp-env, DDEV, and anything else Docker-based will use it transparently.
wp-env for plugin development
Create .wp-env.json at the root of your plugin repo:
{
  "core": "WordPress/WordPress#6.8",
  "phpVersion": "8.2",
  "plugins": [
    ".",
    "../my-plugin-addon-feed-to-post",
    "../my-plugin-addon-full-text",
    "../my-plugin-addon-filtering"
  ],
  "themes": [],
  "config": {
    "WP_DEBUG": true,
    "WP_DEBUG_LOG": true,
    "WP_DEBUG_DISPLAY": false,
    "SCRIPT_DEBUG": true
  },
  "mappings": {
    "wp-content/mu-plugins/query-monitor-loader.php": "./wp-env/mu-plugins/query-monitor-loader.php"
  }
}
The plugins array mounts your plugin (from .) plus any sibling repos that depend on it. The core ref pins to a specific WordPress version: flip to WordPress/WordPress#6.9 to test a different release, or to WordPress/WordPress#master to test the development branch (master is the GitHub mirror’s name for trunk).
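If you switch versions often, the override can be computed rather than hand-edited. The sketch below is one possible approach, not part of wp-env itself: withVersions is a hypothetical helper that returns a copy of the config with a different core ref and PHP version. The result could be written to .wp-env.override.json, which wp-env merges over .wp-env.json.

```typescript
// Minimal view of a .wp-env.json file; other keys pass through untouched.
interface WpEnvConfig {
  core?: string;
  phpVersion?: string;
  [key: string]: unknown;
}

// Returns a new config pinned to the given WordPress ref and PHP version.
// The input object is not mutated.
function withVersions(base: WpEnvConfig, wpRef: string, php: string): WpEnvConfig {
  return { ...base, core: `WordPress/WordPress#${wpRef}`, phpVersion: php };
}

const baseConfig: WpEnvConfig = {
  core: "WordPress/WordPress#6.8",
  phpVersion: "8.2",
  plugins: ["."],
};

const overrideConfig = withVersions(baseConfig, "6.9", "8.3");
console.log(JSON.stringify(overrideConfig, null, 2));
```

Keeping the override in a separate file means the committed .wp-env.json stays the single shared definition while each developer or agent can test a different version locally.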
Run it:
npx @wordpress/env@latest start
That brings up two WordPress environments: a development site at http://localhost:8888 and a separate tests site at http://localhost:8889 for PHPUnit, each with its own database and CLI container. The first startup takes a couple of minutes (image pull); subsequent starts are seconds.
WP-CLI:
npx wp-env run cli wp plugin list
npx wp-env run cli wp option update timezone_string America/New_York
npx wp-env run cli wp post create --post_title="Test" --post_status=publish
Bootstrap script for repeatable state
The raw .wp-env.json gives you an empty WordPress. For real testing you want plugins activated, a known timezone (so timezone bugs reproduce), some demo content. Wrap that in a script:
#!/usr/bin/env bash
# wp-env/bootstrap.sh — idempotent setup after wp-env start
set -e
WP="npx wp-env run cli wp"
# Activate all mounted plugins
$WP plugin activate --all
# Non-UTC timezone so timezone bugs surface locally
$WP option update timezone_string America/New_York
# Theme
$WP theme activate twentytwentyfour
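# Test page (create it only if a page with this title does not exist yet;
# a fresh install already ships a Sample Page, so checking for "any page" would never create it)
$WP post list --post_type=page --title="Test page" --field=ID | grep -q . || \
  $WP post create --post_type=page --post_title="Test page" --post_status=publish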
# Seed sources, plugins, whatever else your dev workflow needs
# ...
echo "Bootstrap complete. Admin: http://localhost:8888/wp-admin (admin / password)"
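Add npm scripts to your package.json so the team converges on the same commands:
{
  "scripts": {
    "wp:up": "npx @wordpress/env@latest start && bash wp-env/bootstrap.sh",
    "wp:reset": "npx @wordpress/env@latest destroy && npm run wp:up",
    "wp:cli": "npx @wordpress/env@latest run cli wp",
    "wp:test": "npx @wordpress/env@latest run tests-cli --env-cwd=wp-content/plugins/my-plugin ./vendor/bin/phpunit"
  }
}
In wp:test, my-plugin is a placeholder for your plugin’s folder name inside the container; the script calls PHPUnit directly because wp test is not a built-in WP-CLI command. Now anyone (or any AI agent) running npm run wp:up after cloning the repo has the same environment you do.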
DDEV alternative
For full-site projects, the equivalent setup:
brew install ddev/ddev/ddev
cd ~/sites/my-wordpress-site
ddev config --project-type=wordpress --project-name=my-site --docroot=public
ddev start
ddev wp core download --path=public
ddev wp core install --url=$(ddev describe -j | jq -r .raw.primary_url) \
--title="My Site" --admin_user=admin --admin_password=password \
--admin_email=admin@example.com --skip-email
DDEV’s ddev pull integrates with hosting providers to sync from staging, which becomes essential at agency scale:
ddev pull pantheon --skip-files
# or for custom syncs:
ddev pull provider --skip-files
Playground blueprint template
A starter blueprint for a bug-reproduction scenario:
{
  "$schema": "https://playground.wordpress.net/blueprint-schema.json",
  "preferredVersions": {
    "wp": "6.8",
    "php": "8.2"
  },
  "features": {
    "networking": true
  },
  "landingPage": "/wp-admin/index.php",
  "steps": [
    {
      "step": "login",
      "username": "admin",
      "password": "password"
    },
    {
      "step": "installPlugin",
      "pluginData": {
        "resource": "url",
        "url": "https://your-host/my-plugin-pr-build.zip"
      },
      "options": { "activate": true }
    },
    {
      "step": "runPHP",
      "code": "<?php require_once '/wordpress/wp-load.php'; update_option('myplugin_test_pending_count', 7);"
    },
    {
      "step": "setSiteOptions",
      "options": {
        "timezone_string": "America/New_York"
      }
    }
  ]
}
Run it locally to verify:
npx @wp-playground/cli@latest server \
--blueprint ./qa/blueprints/issue-707-pending-badge.json \
--blueprint-may-read-adjacent-files \
--port 9707
Open http://127.0.0.1:9707/wp-admin/ and the scenario is staged. More on the workflow patterns around blueprints in §7.
6. Testing patterns
A real local environment makes a half-dozen testing patterns cheap. Here’s what each is for and how to set it up against the stack above.
Unit tests with PHPUnit + Brain Monkey
For pure-logic tests that don’t need WordPress loaded, use Brain Monkey with PHPUnit. WordPress functions are stubbed; tests run in milliseconds against your composer autoload:
composer require --dev brain/monkey phpunit/phpunit
Brain Monkey lets you assert on add_filter/add_action calls without booting WordPress. These tests run constantly while you’re coding because they’re nearly free. They’re also the only tests an LLM-driven agent can usefully run in a feedback loop without spinning up an environment.
Integration tests via wp-env’s tests-cli container
For tests that need real WordPress, wp-env ships a separate tests-cli container with a clean WordPress test database:
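npx wp-env run tests-cli wp test # only if you have registered a custom `wp test` command; it is not built into WP-CLI
# or directly (my-plugin is a placeholder for your plugin's folder name inside the container):
npx wp-env run tests-cli --env-cwd=wp-content/plugins/my-plugin ./vendor/bin/phpunit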
The test database resets between runs, so your tests can freely create posts, users, options. The first time you run this, your phpunit.xml.dist needs to load the wp-env test bootstrap:
<phpunit bootstrap="tests/bootstrap.php" colors="true">
  <testsuites>
    <testsuite name="integration">
      <directory>tests/integration</directory>
    </testsuite>
  </testsuites>
</phpunit>
And tests/bootstrap.php loads the WordPress test environment that wp-env provides:
<?php
$_tests_dir = getenv('WP_TESTS_DIR') ?: '/wordpress-phpunit';
require_once $_tests_dir . '/includes/functions.php';
tests_add_filter('muplugins_loaded', function () {
    require dirname(__DIR__) . '/your-plugin.php';
});
require $_tests_dir . '/includes/bootstrap.php';
Manual QA against the dev WordPress
The dev WordPress at http://localhost:8888 is the right place to click around and see your changes. Add Query Monitor as a mu-plugin via your .wp-env.json mappings so you can inspect database queries, hooks fired, and HTTP requests on every page load.
Cross-browser checks via Playwright
For UI checks that span Chrome, Safari, Firefox, and Edge — a standard QA matrix in the WordPress world — Playwright is the right tool:
npm install -D @playwright/test
npx playwright install
A test that verifies the admin sidebar renders correctly:
import { test, expect } from '@playwright/test';

test('admin sidebar has no horizontal scroll', async ({ page }) => {
  await page.goto('http://localhost:8888/wp-admin/');
  await page.fill('#user_login', 'admin');
  await page.fill('#user_pass', 'password');
  await page.click('#wp-submit');
  await page.goto('http://localhost:8888/wp-admin/admin.php?page=my-plugin-hub');
  const sidebar = page.locator('#adminmenu');
  const scrollWidth = await sidebar.evaluate(el => el.scrollWidth);
  const clientWidth = await sidebar.evaluate(el => el.clientWidth);
  expect(scrollWidth).toBeLessThanOrEqual(clientWidth);
});
The same script runs in CI. When you have an AI agent on the team, the agent can author Playwright tests for the changes it makes, run them locally, and attach screenshots to the PR.
Visual regression and agent-driven screenshot review
A useful pattern that’s emerged once you have AI agents in the loop: instead of asserting on DOM structure, assert on what the page actually looks like. Playwright has built-in visual regression via expect(page).toHaveScreenshot() — first run captures the baseline image, subsequent runs fail if pixels drift beyond a threshold. The diff image lands in the test report, which the agent can attach to the PR as evidence:
test('Aggregator admin badge renders correctly', async ({ page }) => {
  await page.goto('http://localhost:8888/wp-admin/');
  // ... login + setup ...
  await expect(page.locator('#adminmenu')).toHaveScreenshot('admin-menu-with-badge.png');
});
The second pattern is even more useful: have the agent review its own screenshots using multimodal vision. Claude (and similar multimodal models) can read screenshots and judge whether the rendered UI matches the intent. The workflow:
- Agent implements a UI change
- Agent runs a Playwright script that captures key screenshots at each state
- Agent passes those screenshots to itself (or to a reviewer agent) and asks “does this match what we expected?”
- Agent attaches the screenshots to the PR along with its own assessment
This is what closes the loop on “the agent verified its own visual work.” Combined with the pixel-level regression catch from toHaveScreenshot(), you get both kinds of coverage: small drift caught by the baseline comparison, semantic correctness checked by the vision review.
You don’t need a video recording tool for this. Screenshots are easier to inspect than video frames, faster to capture, and what multimodal models read natively today.
Database state inspection
Two patterns. For ad-hoc queries, wp eval against the running container:
npx wp-env run cli wp eval 'global $wpdb; var_dump($wpdb->get_results("SELECT * FROM {$wpdb->prefix}options WHERE option_name LIKE \"myplugin_%\""));'
For ongoing development, install Query Monitor (wp plugin install query-monitor --activate) and use it from the admin bar. Both work for humans; only the WP-CLI path works for agents, which is fine — that’s the one that scales.
Snapshot / reset workflows
The fast cycle is wp-env start once, npm run wp:reset when you need a clean slate, wp-env stop when you’re done for the day. The reset is destructive — bring back the demo state with your bootstrap script.
For finer-grained snapshots within a session, dump and restore the database:
npx wp-env run cli wp db export /tmp/snapshot.sql
# ... do destructive things ...
npx wp-env run cli wp db import /tmp/snapshot.sql
7. Sharing reproductions with Playground blueprints
This section is worth its weight in gold. WordPress Playground blueprints have changed how my team thinks about bug reports.
The old workflow
Customer reports a bug. Support writes back asking for the WP version, the PHP version, the active plugins, the theme, the steps to reproduce. Customer answers in pieces over three days. Developer eventually has enough to attempt a reproduction. Developer’s local environment doesn’t quite match, so the bug doesn’t reproduce. Two more rounds. Eventually the bug reproduces. The fix takes 20 minutes; the reproduction took six days.
The blueprint workflow
Customer reports a bug. Support sends them a Playground blueprint URL with the plugin and the steps pre-staged. Customer clicks, hits the same broken state, confirms “yes, that’s exactly what I see.” Total time: two minutes. Developer pulls the same blueprint, fixes the bug, ships a new blueprint in the PR that demonstrates the fix from a clean install. QA opens the new blueprint, verifies, signs off.
This works because a blueprint captures every dimension of the environment — WP version, PHP version, plugin versions, settings, content, demo data — in a single JSON file. The file is the reproduction.
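A minimal sketch of how the "blueprint URL" that support sends might be built, assuming the blueprint JSON is hosted somewhere publicly fetchable and loaded through Playground's blueprint-url query parameter. The helper name playgroundLink and the example.com address are hypothetical.

```typescript
const PLAYGROUND_BASE = "https://playground.wordpress.net/";

// Builds a shareable Playground link that loads a hosted blueprint file.
// The blueprint URL is percent-encoded so it survives as a single query value.
function playgroundLink(blueprintJsonUrl: string): string {
  return `${PLAYGROUND_BASE}?blueprint-url=${encodeURIComponent(blueprintJsonUrl)}`;
}

// Hypothetical hosting location for the blueprint file.
const shareLink = playgroundLink(
  "https://example.com/qa/blueprints/issue-707-pending-badge.json"
);
console.log(shareLink);
```

Because the link only points at the file, updating the hosted blueprint updates what every existing link opens, which can be either a feature or a surprise depending on how you version them.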
Conventions I use
In every WordPress plugin repo, I add a qa/blueprints/ directory. One blueprint per bug class. Naming: issue-NNN-short-description.json where NNN is the GitHub issue number. JSON has no comment syntax, so each blueprint opens with a meta block (title and description) stating what scenario it stages and which bug it demonstrates.
Every PR that has user-visible behaviour ships a blueprint. The PR description links it. Reviewers run it. If a reviewer’s experience doesn’t match what the blueprint shows, that’s a clear signal something is environment-dependent and needs investigation.
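The naming convention is easy to enforce mechanically. This is a small sketch of one way to derive the filename from an issue number and title; blueprintFileName is a hypothetical helper, and the slug rules (lowercase, runs of non-alphanumerics collapsed to a hyphen) are my assumption about what "short-description" should look like.

```typescript
// Derives "issue-NNN-short-description.json" from an issue number and a title.
function blueprintFileName(issue: number, description: string): string {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return `issue-${issue}-${slug}.json`;
}

console.log(blueprintFileName(707, "Pending badge")); // issue-707-pending-badge.json
```

An agent opening a PR can call something like this so that blueprint names stay predictable enough to link from issue templates.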
The sharing problem
Blueprints reference plugin zips by URL. For public plugins, you can host them anywhere (GitHub releases, a CDN, the plugin’s own download URL). For private/in-development builds, you need the zip somewhere the Playground iframe can fetch. GitHub Actions artifacts are behind auth, which Playground can’t follow. Three workable patterns:
- GitHub Releases, even for in-development versions. Create a pre-release, attach the build zip, blueprint references it.
- Cloudflare R2 / S3 with a short-lived signed URL. Cheap, fast, ephemeral.
- A small Cloudflare Worker redirector at a domain you control (we use qa.wprssaggregator.com for this), which points to the right artifact and lets you swap targets without changing the blueprint.
Customer-facing repros
The blueprint pattern works in reverse too. Customers can send you their environment as a blueprint. Make a “Report a bug” form that asks for the WordPress version, plugin versions, and a description, then auto-generates a blueprint URL the customer can verify before sending. You’ve effectively turned the customer into a reproducer of their own bug.
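A rough sketch of the generation step behind such a form, under stated assumptions: the report carries a WordPress version, a PHP version, and WordPress.org plugin slugs, and each plugin is installed through Playground's wordpress.org/plugins resource. That resource installs the current release, so pinning the exact plugin versions the customer reported is left out here. The type and function names are hypothetical.

```typescript
// What the hypothetical "Report a bug" form collects.
interface BugReport {
  wp: string;
  php: string;
  pluginSlugs: string[];
}

interface Step {
  step: string;
  [key: string]: unknown;
}

interface Blueprint {
  preferredVersions: { wp: string; php: string };
  steps: Step[];
}

// Turns a bug report into a blueprint: log in, then install and activate each plugin.
function blueprintFromReport(report: BugReport): Blueprint {
  const installs: Step[] = report.pluginSlugs.map((slug) => ({
    step: "installPlugin",
    pluginData: { resource: "wordpress.org/plugins", slug },
    options: { activate: true },
  }));
  return {
    preferredVersions: { wp: report.wp, php: report.php },
    steps: [{ step: "login", username: "admin", password: "password" }, ...installs],
  };
}

const reportBlueprint = blueprintFromReport({
  wp: "6.8",
  php: "8.2",
  pluginSlugs: ["query-monitor"],
});
console.log(JSON.stringify(reportBlueprint, null, 2));
```

The customer's description still has to be turned into reproduction steps by a human or an agent; the generated file only fixes the environment half of the report.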
8. CI integration
The local environment definition that runs on your laptop should run identically in GitHub Actions. With wp-env or DDEV, this is essentially free.
wp-env in GitHub Actions
A workflow that runs PHPUnit on every push:
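name: tests
on: [push, pull_request]
jobs:
  phpunit:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        wp: ["6.7", "6.8", "master"] # master is the WordPress/WordPress mirror's development branch
        php: ["8.1", "8.2", "8.3"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - name: Install PHP dependencies (needed if vendor/ is not committed)
        run: composer install --no-interaction
      - name: Configure wp-env for matrix
        run: |
          jq --arg wp "${{ matrix.wp }}" --arg php "${{ matrix.php }}" \
            '.core = "WordPress/WordPress#\($wp)" | .phpVersion = $php' \
            .wp-env.json > .wp-env.tmp && mv .wp-env.tmp .wp-env.json
      - name: Start wp-env
        run: npx @wordpress/env@latest start
      - name: Run PHPUnit
        # Run from the plugin folder, which wp-env names after the checkout directory
        run: npx @wordpress/env@latest run tests-cli --env-cwd=wp-content/plugins/${{ github.event.repository.name }} ./vendor/bin/phpunit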
That’s it. The matrix runs your tests against every WP × PHP combination in parallel. The .wp-env.json you use locally is the same one CI uses; the only difference is matrix overrides.
Playwright in GitHub Actions
For cross-browser UI tests, add a separate job:
  playwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - run: npm ci && npx playwright install --with-deps
      - run: npx @wordpress/env@latest start
      - run: bash wp-env/bootstrap.sh # your wp-cli setup (the bootstrap script from section 5)
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: failure()
        with: { name: playwright-report, path: playwright-report/ }
The Playwright report uploaded on failure gives you whatever artifacts your Playwright config records (screenshots, videos, and traces are all off by default, so enable the ones you want). Combined with blueprint-based bug reports, you have a complete loop: reproduction in CI → identical reproduction locally → fix → verify in CI.
Linking blueprints in PR comments
When a CI build produces a build artifact (a built plugin zip), a small workflow step can generate a blueprint URL that loads that exact build into Playground, and post it as a PR comment. Reviewers click, see the scenario, sign off. I’ve seen this pattern transform review cycles.
9. The AI-first stack at scale
Here is what a day in the life of an AI agent on the team looks like, run against the stack above.
A customer reports a bug to support: “Imported feed items show the wrong publish date — 5 hours earlier than the actual date — on my site running UTC-5.” Support tags the ticket with a draft GitHub issue. The AI agent assigned to triage reads the issue.
The agent reproduces the bug. It runs npm run wp:up against the plugin repo, which gives it a fresh WordPress install with the plugin active and timezone pre-set to America/New_York. It uses WP-CLI to import the customer’s reported feed and triggers a fetch. Then it queries the database for the imported items’ post_date_gmt and post_date, compares them to what the source feed actually contained, and confirms the discrepancy.
The agent identifies the root cause by reading the relevant code. The conversion from feed pubDate to WordPress post_date is happening twice — once in the import handler and once in the display layer. The fix is a one-line change in the import handler.
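The arithmetic behind that diagnosis can be shown in a few lines. This is an illustration only, not the plugin's code: shiftToSiteTime is a hypothetical stand-in for the conversion, and the fixed offset of -5 hours assumes America/New_York in winter (EST).

```typescript
const HOUR_MS = 60 * 60 * 1000;
const OFFSET_HOURS = -5; // America/New_York during EST (assumption: winter date)

// Shifts a UTC timestamp by the site's offset to get the wall-clock time.
function shiftToSiteTime(utcMs: number, offsetHours: number): number {
  return utcMs + offsetHours * HOUR_MS;
}

const pubDateUtc = Date.UTC(2026, 0, 15, 17, 0, 0); // feed says 17:00 UTC

// Correct: convert once, giving 12:00 site time.
const correctLocal = shiftToSiteTime(pubDateUtc, OFFSET_HOURS);

// Buggy: the already-converted value is converted again, giving 07:00.
const buggyLocal = shiftToSiteTime(correctLocal, OFFSET_HOURS);

const driftHours = (correctLocal - buggyLocal) / HOUR_MS;
console.log(driftHours); // 5
```

Applying the offset a second time moves the date by exactly one more offset, which matches the customer's "5 hours earlier" on a UTC-5 site and is why a non-UTC timezone in the bootstrap script makes this class of bug visible at all.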
The agent writes a Playground blueprint at qa/blueprints/issue-NNN-timezone.json that stages the bug from a clean install. It runs the blueprint locally, captures screenshots showing the wrong date, attaches them to the PR. It applies the fix on a new branch, re-runs the blueprint with the fix applied (via installPlugin pointing at the fresh build), confirms the date is now correct. It writes a regression test in tests/integration/test-timezone-handling.php that fails without the fix and passes with it. It pushes the branch, opens the PR with:
- A link to the Playground blueprint that reproduces the bug
- A link to a second blueprint that demonstrates the fix
- Screenshots of before and after
- The regression test
- A QA checklist for human reviewers
The PR triggers CI. wp-env runs PHPUnit across the matrix; Playwright runs the cross-browser checks. Everything passes. Gaby on the QA team opens the blueprint, sees the same bug the agent saw, opens the fix blueprint, sees it’s gone, ticks off the checklist, approves.
The PR merges. The customer gets an email pointing to a third blueprint that demonstrates the fix on their reported scenario. They reply: “Confirmed, thank you.”
End to end, no clicks from the developer side. The whole loop runs on infrastructure that costs nothing beyond a Docker daemon and a few hundred lines of YAML.
This is what an AI-first local WordPress dev stack makes possible. None of it is hypothetical. I run this every day on a real production plugin and I’m building the platform that makes it available to other teams at AgentVania.
10. Common pitfalls
A grab-bag of gotchas, hard-won.
Docker file permission weirdness on macOS. The user inside a wp-env container is www-data (UID 33), but files mounted from your Mac are owned by your user. Plugins that create files at runtime (caches, logs) sometimes write them as www-data, and your editor then can’t modify them. Fix: set the setgid bit on the directory (chmod g+s) so new files inherit the directory’s group, or fall back to chmod -R 777 in dev environments only (never in production).
SQLite ≠ MySQL. Studio and browser Playground use SQLite by default. The new SQLite driver handles most MySQL syntax via AST translation, but advanced queries (window functions, certain JOIN patterns, MySQL-specific functions) can behave differently. If you’re testing a query that relies on MySQL-specific semantics, run it against wp-env (real MySQL) before trusting the SQLite result.
wp-env shell quoting. npx wp-env run cli sh -c "command1 && command2" mangles compound commands because the arguments are re-parsed before they reach the container’s shell. For multi-step shell work, drop down to Docker directly: docker exec -w /var/www/html wordpress-1 bash -c "..." (substitute the container name that docker ps reports for your project).
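If you drive this from a Node script, one way to avoid a second round of quoting problems is to pass the whole compound command as a single argument. The sketch below only builds the argument list. The helper name is hypothetical, and the container name is the placeholder from the example above.

```typescript
/**
 * Build the argv for `docker exec` so the compound script stays one argument.
 * Hypothetical helper; pass the result to something like
 * execFileSync("docker", args) from node:child_process, which does not
 * route the arguments through an intermediate shell.
 */
function dockerExecArgs(container: string, workdir: string, script: string): string[] {
  return ["exec", "-w", workdir, container, "bash", "-c", script];
}

// The && and the pipe survive intact because bash -c receives them as a single string.
const args = dockerExecArgs(
  "wordpress-1", // placeholder name; check `docker ps` for the real one
  "/var/www/html",
  "cd wp-content && ls -la | head",
);

console.log(args);
```

Because the script is one array element, only the bash process inside the container interprets the operators.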
Plugin licensing in Playground. Most premium plugins use a licensing layer (Freemius, EDD, custom) that needs to phone home. Playground can’t do that reliably (network blocked, ephemeral domain, no persistent license storage). For dev environments, either (a) use the plugin’s built-in dev-license mechanism if it has one, (b) inject a mu-plugin that short-circuits the license check, or (c) accept that some premium features won’t be testable inside Playground and use wp-env for those.
Onboarding wizards. Many plugins show an onboarding wizard on a fresh install. In a scripted/Playground environment, the wizard masks the plugin’s actual UI. Pre-set whatever option the plugin uses to mark onboarding complete (myplugin_version, acme_setup_complete, etc.) in your blueprint or bootstrap script.
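In a blueprint, pre-setting that option can be a single setSiteOptions step. Here is a small sketch that appends one to an existing blueprint object. The helper name and the default value "1" are assumptions, and the right option name and value depend on the plugin.

```typescript
interface BlueprintLike {
  steps?: unknown[];
  [key: string]: unknown;
}

/**
 * Return a copy of the blueprint with a step that marks onboarding as done.
 * Hypothetical helper: optionName and value must match what the plugin checks.
 */
function withOnboardingSkipped(
  blueprint: BlueprintLike,
  optionName: string,
  value: string | number | boolean = "1",
): BlueprintLike {
  const skipStep = { step: "setSiteOptions", options: { [optionName]: value } };
  // Copy rather than mutate so the same base blueprint can be reused elsewhere.
  return { ...blueprint, steps: [...(blueprint.steps ?? []), skipStep] };
}

const base: BlueprintLike = { landingPage: "/wp-admin/", steps: [{ step: "login", username: "admin" }] };
const scripted = withOnboardingSkipped(base, "acme_setup_complete");

console.log(JSON.stringify(scripted, null, 2));
```

Whether the option should be set before or after the plugin is installed depends on when the plugin reads or overwrites it, so check the ordering against the plugin’s activation code.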
OrbStack first-launch permissions. On fresh Mac installs, OrbStack needs system permissions on first launch (network extension, virtualisation entitlements). Open the app once from the Applications folder, click through the prompts, then your scripts will work.
Localhost vs host.docker.internal. From inside a wp-env container, your Mac’s localhost is not the container’s localhost. To reach a service running on your Mac (e.g., a local SMTP server), use host.docker.internal instead.
The Mac/Windows/Linux split. wp-env works on all three but startup time and IO performance vary significantly. Linux is fastest, OrbStack-on-Mac is good, Docker-Desktop-on-Mac is slowest, Windows-WSL2 is in between. If you’re hiring, this matters.
What to install today
If you’re starting fresh and want the AI-first stack:
- OrbStack (brew install --cask orbstack)
- Node 20+ (you probably have this)
- @wordpress/env (npm install -g @wordpress/env)
- @wp-playground/cli (no global install needed; use npx per invocation)
- Playwright (npm install -D @playwright/test per repo)
Plus pick one of:
- DDEV if your work includes full WordPress sites: brew install ddev/ddev/ddev
- Local WP if you have client sites you want a friendly GUI for: download from localwp.com
That’s the toolbox. Most of the cost is in the discipline of using it, not in the install.
Where this is going
The local dev environment question is the first place the AI shift shows up in WordPress, but it’s not the last. WP-CLI is positioning itself as the agent-ready foundation for WordPress, with MCP support and the new Abilities API turning standard WP-CLI commands into something LLMs can call reliably. Hosting providers are shipping MCP servers. The whole WordPress ecosystem is shifting from “GUIs for humans” to “interfaces for both.”
The teams that picked CLI-first dev environments two years ago are walking into the AI era already set up. The teams that picked GUI-first tools are going to have to migrate. The good news is that the tools to do this well already exist, free and open-source, and have for years. You just have to choose them deliberately.
I’m shipping production WordPress code every day with AI agents on the team. At AgentVania I’m building the platform that makes that work for any team. If you’re thinking about how to bring AI into your WordPress development workflow, this stack is the starting line.
Further reading
- Local WP: Flywheel’s desktop app
- @wordpress/env on npm: the WordPress core team’s CLI tool
- DDEV: full-featured Docker dev environment
- WordPress Studio: wp-now-based desktop app from WordPress.com
- WordPress Playground: browser + CLI WP-on-WASM, plus the blueprint format
- Playground blueprint schema: reference for authoring blueprints
- DevKinsta: free Docker-based GUI from Kinsta
- Lando: Docker Compose wrapper
- OrbStack: Docker Desktop replacement on Mac
- Playwright: cross-browser test automation, with built-in visual regression
- WP-CLI: the command-line interface for WordPress, positioning itself as the agent-ready foundation
- “Local WP vs Docker: When to Use Each for WordPress Development”: my deeper dive on the Local-WP-vs-Docker question specifically
