Dylan McGuire
June 19, 2026 · 6 min read

$ npm run e2e
Running 329 tests across 31 files...
✓ 315 passed
✗ 14 failed (selector mismatch)
42m 18s.

For most of my career, end-to-end UI tests came with a heavy tax.
The tax took multiple forms. Every new test had to be authored: locating elements, wiring selectors, modeling state, handling async. Every UI overhaul required corresponding changes across the test suite, and the more thorough the suite, the more changes piled up. Every flake required triage time, and a real suite generated enough flakes that teams built dedicated dashboards and triage rules just to sort signal from noise. Each of these was tractable in isolation. The cumulative drag made senior engineers across the industry hold an opinion about how much E2E coverage was "too much." The answer was always smaller than you'd want if the tests were free.
Over a decade ago I started at Pearson VUE, which operates the infrastructure behind a lot of high-stakes certification exams: the NCLEX, the GMAT, the kinds of exams where someone's career hinges on the result. The quality bar was high, and our E2E suite reflected it. Keeping that suite functional was real work. Engineers wrote and maintained fixtures the QA team could use to author tests in semi-natural language, first on top of a product called Twist and later Gauge. I built the nightly runner that consumed on-prem compute for hours, and the dashboards that sliced results into "real product issue" vs "flake" so the suite stayed useful. I also wrote an Atom plugin that gave syntax highlighting and autocomplete for our Gauge fixture library to make the authoring experience less painful. The team's actual job was building the product, but a meaningful chunk of our time went to keeping our E2E suite from rotting.
When I moved to Amazon a few years later, the calculus was different. The products I worked on were primarily internal Amazon tools. The tax was acknowledged, and the org chose not to pay it. E2E UI tests, if present at all, were generally scoped down to smoke tests. On my team we kept exactly two: sanity checks that the main pages rendered without producing errors, and an auth flow. Everything else leaned on heavy unit coverage, API integration tests, code review, and broad production monitoring. When something tripped, you'd scan logs, check for recent changes, consult the runbook, and decide what to do; some teams had auto-rollback wired in, others triggered rollback manually. Either way, you usually reverted in minutes.
The easy thing to miss about that model is that it's a function of what kind of software Amazon ships. When most of your logic lives on servers you own and the browser is a thin client over them, you get to instrument the runtime, watch real traffic, and roll back in minutes when something goes sideways. That option doesn't exist for shipped software: desktop apps, mobile binaries, CLIs, self-hosted infrastructure that runs inside someone else's environment. The runtime is the user's machine. You can't observe most of what goes wrong, and by the time a bug report surfaces, the broken version is already in the wild.
So the industry settled into one of two failure modes. Either you paid the tax (engineering time, a triage pipeline, an ongoing maintenance burden), or you skipped E2E and pushed the cost downstream: into bugs that shipped, into manual regression days before each release, into release cadence slowing down because nobody trusts a green build that doesn't exercise the product.
Then code throughput went up 10x.
Testing didn't accelerate with it. E2E sits squarely in the gap between how fast code gets written and how fast it gets verified: the part of testing that's still too expensive to scale up alongside the new throughput, but too important to skip. Either you slow down to test, or you ship features you haven't fully tested. Neither is sustainable.
The 10x lift can apply to automated tests too. Writing an E2E test used to mean: locate the elements, wire the selectors, model the state, handle the async, debug the flake. Most of that is now something a competent coding agent can do for you, given the right framework and the right context about your app. The tax isn't just lifted. For greenfield work and well-instrumented apps, automated E2E coverage is now cheaper upfront than the equivalent manual test plan.
Take one example from Spruce, our local-first product delivery tool. It has a project-setup wizard with two top-level flows (create a new project, import an existing one), and in both, each linked code repo can be created from scratch, linked to an existing folder on disk, or cloned from a remote. The interesting test cases are the combinations: a project where one repo is created fresh, another is linked to an existing local checkout, and a third is cloned. Testing that well means setting up filesystem state before each run (some folders pre-created with git init, others left empty, others entirely absent), driving the wizard through each variation, and tearing it all down so the next case starts clean. The cost lives in the cross-product. Each variation is straightforward on its own, but the combinations multiply, and the manual coverage has to keep up with that. With an agent that can read the wizard's source, enumerate the matrix, and write the setup and teardown helpers alongside each test, that cost collapses.
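To make the cross-product concrete, here is a minimal sketch of enumerating that matrix. It is not Spruce's actual test code: the type names, the `enumerateCases` helper, and the case labels are all hypothetical, and it assumes a fixed number of linked repos per project.

```typescript
// Hypothetical model of the wizard described above; none of these names come from Spruce.
type Flow = "create-project" | "import-project";
type RepoMode = "scratch" | "link-existing" | "clone-remote";

const FLOWS: Flow[] = ["create-project", "import-project"];
const REPO_MODES: RepoMode[] = ["scratch", "link-existing", "clone-remote"];

interface WizardCase {
  flow: Flow;
  repos: RepoMode[]; // one mode per linked repo, in order
}

// Every way to assign one of the three modes to each of `repoCount` repos.
function modeCombinations(repoCount: number): RepoMode[][] {
  let combos: RepoMode[][] = [[]];
  for (let i = 0; i < repoCount; i++) {
    const next: RepoMode[][] = [];
    for (const prefix of combos) {
      for (const mode of REPO_MODES) {
        next.push([...prefix, mode]);
      }
    }
    combos = next;
  }
  return combos;
}

// Cross both top-level flows with every per-repo combination.
function enumerateCases(repoCount: number): WizardCase[] {
  const result: WizardCase[] = [];
  for (const flow of FLOWS) {
    for (const repos of modeCombinations(repoCount)) {
      result.push({ flow, repos });
    }
  }
  return result;
}

// A readable label that could double as a test title.
function caseLabel(c: WizardCase): string {
  return `${c.flow}: ${c.repos.join(" + ")}`;
}

const threeRepoCases = enumerateCases(3);
console.log(threeRepoCases.length); // 54 (2 flows x 3^3 mode assignments)
console.log(caseLabel(threeRepoCases[5])); // create-project: scratch + link-existing + clone-remote
```

With three repos that is already 54 cases, and each added repo triples the count. The per-case filesystem setup and teardown would hang off each `WizardCase`; that part is left out here because it depends on how the app under test lays out projects on disk.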
The question is the same as it ever was: does the value justify the cost of writing and maintaining the test? Both costs dropped enough that the answer is now almost always yes. A lot of teams haven't internalized that yet.
At Coniferous, we've been building the tools to do this effectively, specifically on the Tauri side, where the existing E2E story is thin. We'll have more to share soon.
If your team backed off E2E coverage because the math didn't work, the math has changed. Worth revisiting.
Work with us
We work fast at Coniferous, in part by noticing where the old methods no longer fit. If your team is figuring out what to keep, what to drop, and where the new math actually leads, we'd love to hear about it.