Artifact-First Development

“It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the ‘progress as usual’ way, but specifically this last December. Coding agents basically didn’t work before December and basically work since.” — Andrej Karpathy, Feb 25, 2026

“You need to use AI,” your boss says. Intent on developing a new feature with AI, you try some common web tools and download a few coding agents, but they all struggle with brownfield software development: they can’t handle large codebases and mature repositories. Clearly these tools have a long way to go.

Or do they?

Introducing Artifact-First Development

Roon’s tweet about the release of GPT-4o-mini is what first got me thinking about this paradigm:

people get mad at any model release that’s not immediately agi or a frontier capabilities improvement. think for a second why was this made? how did this research artifact come to be? what is it on the path to?
— roon (@tszzl) July 19, 2024

In hindsight, small and cheap models like GPT-4o-mini were on the path to the reasoning models that followed: first OpenAI’s o1 and o1-mini a few months later, and then DeepSeek’s DeepSeek-R1.

Dex Horthy’s research at HumanLayer shows that without deliberate context management, agents blow most of their reasoning budget on understanding existing code rather than writing new code. Artifact-first development is his “frequent intentional compaction” principle applied at the project level: You give the agent a clean, small, fully comprehensible scope, extract the learnings, and start fresh.
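To make the compaction idea concrete, here is a minimal sketch of a context that compacts itself once it crosses a budget threshold. It is an illustration only, not HumanLayer's implementation: the `AgentContext` class, the 60% threshold, the four-characters-per-token estimate, and the "keep the first line of each note" summary are all assumptions standing in for a real agent writing a real handoff summary.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class AgentContext:
    budget: int                  # hypothetical token budget for one session
    compact_at: float = 0.6      # assumed threshold: compact at 60% of budget
    notes: list[str] = field(default_factory=list)
    compactions: int = 0

    def used(self) -> int:
        return sum(estimate_tokens(n) for n in self.notes)

    def add(self, note: str) -> None:
        self.notes.append(note)
        if self.used() > self.budget * self.compact_at:
            self.compact()

    def compact(self) -> None:
        # Stand-in for asking the agent to write a handoff summary:
        # keep at most 80 characters of the first line of each note.
        learnings = [(n.splitlines() or [""])[0][:80] for n in self.notes]
        self.notes = ["HANDOFF: " + " | ".join(learnings)]
        self.compactions += 1


ctx = AgentContext(budget=100)
for i in range(3):
    ctx.add(f"Read module {i}: " + "detail " * 20)
```

In a small artifact the notes stay short and compaction is rare; in a large legacy codebase the same loop spends most of its time summarizing what it just read.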

This is not “develop a prototype and throw it over the wall” work, which is what common rapid application development frameworks tend to promote. The idea is to let coding agents explore the space of a feature in a minimal artifact harness that is later merged into the complete project via AI. Before you could use Claude Code or Codex to merge artifacts together, this type of development was either impossible or extremely annoying, and as such it tends not to come to mind as a solution to software development woes.

Blocking the Macro and Micro

“The future belongs to those who can maintain a coherent mental model of the macro while agents handle tactical drudgery of the micro.” — Addy Osmani, Google, “The 80% Problem in Agentic Coding,” Jan 2026

Prematurely building features into a large codebase makes macro and micro context management collide, both in the coding agent’s literal context window and in the more important one in your brain. Without the freedom to explore the latent space of the idea, in your own mind or in the reasoning agent, it becomes extremely difficult to develop the features you will eventually need. You clash the micro and the macro too early.

The usual pattern goes like this:
A developer wants to add a new feature or AI capability to an existing tool.
They open the codebase and immediately drown in thousands of lines of legacy code.
They spend too much effort on compatibility.
This is the default instinct for every technical person: “I need to improve X, so I start with X.”

Begin with the artifact.

An Example of Development

“Every plugin, every skill, every line in your CLAUDE.md occupies space in the context window.” — Ryan Spletzer, “Shedding Dead Context,” Mar 2026

Driveline Baseball frequently integrates existing technologies into its own platforms and tools, prioritizing autonomous operation, capture, and management whenever possible. It is a major anti-pattern to employ interns or full-time employees to operate machinery or programs when they could be doing significantly more valuable work. The problem, as you probably suspect, is that most baseball technology is not developed with interoperability in mind, nor have its developers or executives read Jeff Bezos’s famous API mandate.

Fortunately, coding agents have now made it possible to brute-force interoperability via reverse engineering. This requires taste and a first-principles understanding of engineering from the agent operator (the human), because primitives never go out of style and they help guide the agent to the ultimate destination. Knowing the basics of digital logic, linear algebra, and data systems and architecture is vital at this stage, when the goal is to build the artifact as lean as possible for re-integration into your production systems.

Recently, there was a piece of technology that was (and still is) a brilliant hardware solution to an extremely difficult problem in baseball, but it had a substandard control mechanism for enterprise use. The controller was great for consumer-grade use, but it broke down at scale for any number of reasons, frustrating baseball coaches and analysts inside our company. I took it upon myself to merge this controller into our systems via reverse engineering. I did not start by figuring out how it would work inside our existing tools. I started by building a standalone, barely operating artifact with a terrible user interface that existed only to scaffold up the API as it was discovered, reverse engineered, and explored by the coding agents.

The work was guided by primitives and by weeks of meticulous note taking: building context handoff documents (thousands written on this alone, half of them by human hand), stenciling flowcharts (all made by human hand; captured by the agent using its VLM), and of course the frequent loop of prompting/compacting/refreshing and cross-checking with other agents (a neo-Mixture of Agents style approach). Eventually that got us to a working, beautiful artifact that operated in alpha status until it was ready to merge into our existing tool.
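For readers who have not written a context handoff document before, the sketch below shows one possible shape for it. This is not the format used at Driveline Baseball. The `Discovery` record, the `render_handoff` function, the section names, and the example commands are all hypothetical. The point is only that verified findings, unverified guesses, and open questions are kept apart so a fresh agent session knows what it can trust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discovery:
    command: str     # hypothetical name for a reverse-engineered call
    observed: str    # what was seen while exploring the device
    verified: bool   # confirmed by a second agent or a human test


def render_handoff(title: str,
                   discoveries: list[Discovery],
                   open_questions: list[str]) -> str:
    # Build a short plain-text document a fresh agent session can start from.
    lines = [f"# Handoff: {title}", "", "## Verified"]
    lines += [f"- {d.command}: {d.observed}" for d in discoveries if d.verified]
    lines += ["", "## Unverified (cross-check before relying on these)"]
    lines += [f"- {d.command}: {d.observed}" for d in discoveries if not d.verified]
    lines += ["", "## Open questions"]
    lines += [f"- {q}" for q in open_questions]
    return "\n".join(lines)


# Made-up example entries, not taken from any real device.
found = [
    Discovery("start_session", "accepts a device id and returns a token", True),
    Discovery("set_speed", "two-byte payload, units unknown", False),
]
doc = render_handoff("controller artifact", found, ["What resets the device?"])
print(doc)
```

Each prompt/compact/refresh cycle can end by regenerating this document, and the next session begins by reading it instead of the full history.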

This artifact is destined for death: the repository is meant to be archived as a pure library to reference back to as we integrate its ideas into other tools at Driveline Baseball. The user interface, the data structures, the build files? All irrelevant in the end. But by freeing the artifact and your own mind from the context of the existing tools, you can much more efficiently and effectively develop the artifact that you will use to augment your existing stack.

Skills: The Artifact’s Artifact

In my recent artifact-led development path, I realized that the skills thrown off by the coding agents are their own form of meta-artifact, one that survives the initial development and can be folded into your agents.md or claude.md files for later use.

Have your coding agents make skills, or use a meta-skill-maker like claudeception, and collect them for later use in a context-light manner.
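One way to keep a skill collection context-light is to load only a one-line index and let the agent open a full skill file when it looks relevant. The sketch below assumes a layout of one folder per skill, each holding a `SKILL.md` with a `description:` line near the top. That layout, along with the `read_description` and `build_skill_index` names, is an assumption for illustration, so adjust it to however your agents actually store skills.

```python
from pathlib import Path


def read_description(skill_file: Path) -> str:
    # Assumes a "description: ..." line somewhere in the skill file.
    for line in skill_file.read_text(encoding="utf-8").splitlines():
        if line.lower().startswith("description:"):
            return line.split(":", 1)[1].strip()
    return "(no description)"


def build_skill_index(skills_dir: Path) -> str:
    # One line per skill keeps the always-loaded context small; the full
    # file is only read when a skill looks relevant to the task.
    entries = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        name = skill_file.parent.name
        description = read_description(skill_file)
        entries.append(f"- {name}: {description} ({skill_file.as_posix()})")
    return "\n".join(entries)
```

The returned text can be pasted into your agents.md or claude.md file, so the cost in the context window is a few lines per skill rather than the full skill body.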


kyle boddy