OpenAI's Symphony project shipped with a bold claim: give your favorite coding agent a specification document and it will build a working agent orchestrator in any language. The README invited exactly that experiment. Gabriella Gonzalez, a Haskell engineer and author of the popular Haskell for all blog, took them up on it. She pointed Claude Code at the SPEC.md file, asked it to build Symphony in Haskell, and documented every result.
The implementation did not work. Multiple bugs required manual prompting to fix. Even after those fixes got the code compiling cleanly, the orchestrator spun silently on a trivial test ticket — "create a blank git repository" — without making progress.
That outcome alone would be a useful data point. But Gonzalez went further and examined why it failed, and her findings turn the entire "specs replace code" argument inside out.
SPEC.md reads like the code it was supposed to replace
Gonzalez methodically catalogues what Symphony's specification actually contains. It includes prose dumps of database schemas (field names, types, nullability). It includes backoff formulas written in pseudocode (delay = min(10000 * 2^(attempt - 1), agent.max_retry_backoff_ms)). It includes a section explicitly labeled "Cheat Sheet" that the document admits is "intentionally redundant so a coding agent can implement the config layer quickly." And it includes a full reference implementation section with function signatures, state dictionaries, and control flow written in a language-agnostic pseudocode that is, functionally, code.
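That backoff pseudocode transcribes almost verbatim into real code, which is the point: the "spec" is already an implementation. A minimal Python sketch (the 60-second default cap is an illustrative assumption; in Symphony the cap comes from per-agent configuration):

```python
def retry_delay_ms(attempt: int, max_retry_backoff_ms: int = 60_000) -> int:
    """The spec's formula, verbatim:
    delay = min(10000 * 2^(attempt - 1), agent.max_retry_backoff_ms)."""
    return min(10_000 * 2 ** (attempt - 1), max_retry_backoff_ms)

# First attempt waits 10 s, doubling until the per-agent cap kicks in.
print([retry_delay_ms(a) for a in range(1, 5)])  # [10000, 20000, 40000, 60000]
```

The translation required no design decisions at all — every constant, operator, and field name was already fixed by the "specification."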
The SPEC.md file clocks in at roughly one-sixth the length of Symphony's included Elixir implementation. That ratio matters. The promise of spec-driven development is that specifications are simpler than code — that writing a spec is a cheaper, higher-leverage activity than writing the implementation. A spec that is already one-sixth the size of the codebase and still growing toward it has not escaped the complexity; it has redistributed it into a format with less tooling, no compiler, and no test suite.
Gonzalez invokes Dijkstra's 1979 observation on the fantasy of communicating with machines in natural language: "We have to challenge the assumption that this would simplify man's life." Greek mathematics stalled because it stayed verbal. Modern mathematics only accelerated when it embraced formal symbolism. The parallel is direct — you cannot make a specification precise enough to generate reliable code without the specification converging on code.
The YAML test: even mature specs produce non-conforming implementations
One of the strongest pieces of evidence in the post has nothing to do with Symphony. Gonzalez points to the YAML specification — one of the most detailed, widely-used, community-tested specs in software — and notes that the vast majority of YAML implementations still do not fully conform to it. YAML has a formal conformance test suite. It has decades of iteration. Implementations still diverge.
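A concrete instance of that divergence is the well-known "Norway problem": YAML 1.1 resolves bare scalars such as no and off to booleans, while the YAML 1.2 core schema resolves only true and false, so two reasonable parsers disagree about the same document. A simplified sketch of the two resolution rules (the real tag-resolution logic handles case variants and many more types):

```python
# YAML 1.1's plain-scalar booleans (lowercased subset of the full list)
YAML_11_BOOLS = {"y", "yes", "n", "no", "true", "false", "on", "off"}
# The YAML 1.2 core schema recognizes only these
YAML_12_BOOLS = {"true", "false"}

def resolve(scalar: str, bools: set) -> object:
    """Resolve a bare scalar to a bool, or leave it as a string."""
    if scalar.lower() in bools:
        return scalar.lower() in {"y", "yes", "true", "on"}
    return scalar

# The same document, two answers: the value in `country: no`
print(resolve("no", YAML_11_BOOLS))  # False (YAML 1.1 reads a boolean)
print(resolve("no", YAML_12_BOOLS))  # 'no'  (YAML 1.2 reads a string)
```

Both behaviors can be defended from a spec; neither parser is obviously "wrong." That is what incomplete conformance looks like in practice.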
If a specification that mature and that heavily scrutinized cannot reliably produce conforming implementations, expecting a markdown document in a GitHub repo to do so is unreasonable on its face. The Symphony SPEC.md has no conformance test suite, no formal grammar, and sections that Gonzalez describes as reading like "an agent's work product: lacking coherence, purpose, or understanding of the bigger picture."
Where this breaks for teams buying agentic coding tools
A solo developer or an agency owner evaluating agentic coding tools in 2026 is hearing two pitches simultaneously. Pitch one: agents will write your code, and you manage them by writing specs. Pitch two: agentic coding makes developers more productive by handling implementation details while humans focus on architecture.
Gonzalez's analysis collapses pitch one. A specification document detailed enough to produce working code is roughly as expensive to write as the code, requires the same engineering judgment to get right, and offers no compilation, no type checking, and no automated testing to catch mistakes along the way. The Symphony experiment demonstrates a spec that was detailed and intentional — and still could not generate a working result.
Pitch two survives, but with a critical constraint that most tool marketing glosses over: the human still needs to read and evaluate the code. Dex Horthy, an engineer who builds developer tools, summarized the tension on X in response to Gonzalez's post: "A spec that is sufficiently detailed to generate code with a reliable degree of quality is roughly the same length and detail as the code itself. So don't review those things, just review the code at that point." His conclusion — find a way to steer the model before it produces thousands of lines, not after — captures the gap in current agentic tooling that no spec format has solved.
The procurement question nobody is asking
If your team is evaluating an agentic coding platform that markets spec-to-code as a core workflow, ask for the conformance data. What percentage of generated implementations pass the project's own test suite on the first run? What is the average number of human correction cycles before the output is production-ready? How does spec length scale relative to codebase size as project complexity increases?
Those questions will almost certainly not have answers yet, because the industry has not standardized how to measure spec-driven generation quality. That absence is itself informative. The agentic coding space is selling a workflow — specs in, working software out — without the instrumentation to verify whether the workflow actually works at the scale customers care about.
Gonzalez's closing line applies beyond Symphony: "Specifications were never meant to be time-saving devices." For anyone budgeting engineering hours around the assumption that specs will shrink them, that sentence is worth taping to the monitor.
