Posts Tagged "software-eng-auto" on Alex Leighton's Blog

Filesystems as Personal Memory

2026-03-09T13:00:00Z

Maybe plain files and git are all you need.

Published on 2026-03-09
Tags: commentary, git, llms, software-eng-auto

Daniel Phiri on 2026-02-23:
Here's my actual take on all of this, the thing I think people are dancing around but not saying directly.

Filesystems can redefine what personal computing means in the age of AI.

Not in the "everything runs locally" sense (but maybe?). In the sense that your data, your context, your preferences, your skills, your memory — lives in a format you own, that any agent can read, that isn't locked inside a specific application.

I like this vision of the future: personal data in whatever form is easiest or most convenient, stored as the person chooses, with arbitrary computation enabled by the natural language interface of LLMs. As I read this superb summary of the current state of coding agents intersecting with the filesystem, I had a couple of thoughts.

...
Read the full post →

KBS: Fast Forward

2026-03-01T03:45:00Z

Code generation on cruise control.

Published on 2026-03-01
Tags: article, llm, ocaml, python, software-eng, software-eng-auto

On today's episode: a lot of code. The previous work prepared the project codebase to guide agents in generating good quality code, and this time we put that preparation to use.

Update, Resolve, Archive Commands

After the Kb_service module was broken apart, the coding agent easily generated full verticals for update, resolve, and archive commands. The process I'm following:

Work with the agent to peel off a chunk of functionality from docs/product-requirements.md.
Then write a prompts/activities/implementation-plan.md for building the functionality.
After the code is generated, both the agent (prompts/activities/code-review.md) and I review it, and changes are applied for every issue found.

Related commit: 0eb5584 — feat: add update, resolve, and archive commands

...
Read the full post →

KBS: List and Show

2026-02-26T05:00:00Z

Code generation picks up speed.

Published on 2026-02-26
Tags: article, llm, ocaml, software-eng, software-eng-auto

Today's log consists mostly of code generation. We've laid enough foundation that new functionality is easily produced.

List Functionality

The functionality for bs list is well defined after the work for the previous dev log. I think the experiment of directing the agents from the requirements document rather than my own intuition of what to tackle was successful, though perhaps only at the scale of a project like Knowledge Bases. The coding agent was able to select a reasonable chunk of functionality to peel off and turn into an implementation plan, with a different agent generating the implementation.

"Flaky" Tests

...
Read the full post →

Antirez's Z80 Experiment

2026-02-25T16:30:00Z

More research on automatic software development.

Published on 2026-02-25
Tags: commentary, llm, software-eng, software-eng-auto

More software engineering research has dropped, this time from Salvatore Sanfilippo of Redis fame. In the vein of the experiments from StrongDM and OpenAI (and my own incomplete experiment), he had a coding agent build a Z80 emulator.

Salvatore Sanfilippo on 2026-02-24:
I wrote a markdown file with the specification of what I wanted to do. Just English, high level ideas about the scope of the Z80 emulator to implement.

...

This file also included the rules that the agent needed to follow, like:

Accessing the internet is prohibited, but you can use the specification and test vectors files I added inside ./z80-specs.

Code should be simple and clean, never over-complicate things.

Each solid progress should be committed in the git repository.

Before committing, you should test that what you produced is high quality and that it works.

Write a detailed test suite as you add more features. The test must be re-executed at every major change.

Code should be very well commented: things must be explained in terms that even people not well versed with certain Z80 or Spectrum internals details should understand.

Never stop for prompting, the user is away from the keyboard.

At the end of this file, create a work in progress log, where you note what you already did, what is missing. Always update this log.

Read this file again after each context compaction.

...
Read the full post →

KBS: Nesting and Ideation

2026-02-24T14:30:00Z

More scaffolding before future work.

Published on 2026-02-24
Tags: article, llm, ocaml, software-eng, software-eng-auto

Today's changes follow on from the previous post's "next steps": nesting and ideation.

Nesting

I worked with the agent to find code with deep nesting, decide how to flatten it, and generalize the approach into suggestions for a nesting principle. We then applied that principle across the codebase, and I think we were successful as the diff in the refactor commit is noticeably flatter.

4. Nesting Depth

Keep expressions nested at most two or three levels deep. Deeply nested code is hard to follow because the reader must hold every enclosing context in their head at once. When nesting grows, treat it as a signal that the code should be restructured.

Approaches for reducing nesting:

Use monadic result operators (let*, let+). ...

Extract resource-management wrappers. ...

Consolidate error mapping into named functions. ...

Factor repeated control-flow shapes into helpers. ...
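
As a minimal sketch of the first approach, here is the same lookup written twice: once with nested matches and once with binding operators. The todo record, parse_id, and find_todo are hypothetical names invented for this illustration and are not taken from the Knowledge Bases codebase.

```ocaml
(* Hypothetical types and helpers, made up for this example. *)
type todo = { id : int; title : string }

let parse_id (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n when n > 0 -> Ok n
  | _ -> Error ("invalid id: " ^ s)

let find_todo (todos : todo list) (id : int) : (todo, string) result =
  match List.find_opt (fun t -> t.id = id) todos with
  | Some t -> Ok t
  | None -> Error ("no todo with id " ^ string_of_int id)

(* Nested version: every fallible step adds another level of match. *)
let title_nested todos raw =
  match parse_id raw with
  | Error e -> Error e
  | Ok id -> (
      match find_todo todos id with
      | Error e -> Error e
      | Ok todo -> Ok (String.uppercase_ascii todo.title))

(* Binding operators for result (OCaml 4.08 or later):
   let* sequences fallible steps, let+ maps a pure function over the result. *)
let ( let* ) = Result.bind
let ( let+ ) r f = Result.map f r

(* Flat version: same behavior, errors short-circuit, one level of nesting. *)
let title_flat todos raw =
  let* id = parse_id raw in
  let+ todo = find_todo todos id in
  String.uppercase_ascii todo.title
```

Both functions return the same value for every input; the flat version simply lets the first Error propagate without spelling out each branch.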

...
Read the full post →

KBS: Adding Todos

2026-02-20T03:45:00Z

More guidance, and thoughts on automatic review.

Published on 2026-02-20
Tags: article, code-review, llm, ocaml, software-eng, software-eng-auto

Today's changes build on the previous commit's unpacking of Todo, making a repository for Todos, and hooking it up to the command line to provide bs add todo ....

More Guidance

The implementation plan forgot to add a task for integration tests. That resulted in a new bin principle: changes to bin, where we keep only command-line parsing and top-level orchestration, should always be accompanied by an integration test. I also had the agent put together an architecture document describing the structure and purpose of integration tests.

In the hopes of keeping file sizes down, I added a line length principle to bin and lib. I'm not sure about these principles, because I think this can be automated as a lint. I will keep these in mind and improve the linting situation when I get another structural/syntactic principle that can be automated.
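
To give a sense of how small such a lint could be, here is a rough sketch of a line length check. The function name long_lines and the 100-column default are assumptions made for illustration; the post does not say what limit the principle uses or how a real lint would be wired in.

```ocaml
(* Hypothetical lint sketch: returns (line number, length) for each line
   longer than max_len. The default of 100 columns is an assumed value. *)
let long_lines ?(max_len = 100) (source : string) : (int * int) list =
  String.split_on_char '\n' source
  |> List.mapi (fun i line -> (i + 1, String.length line))
  |> List.filter (fun (_, len) -> len > max_len)
```

A build step could run this over each file in bin and lib and fail when the returned list is non-empty.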

...
Read the full post →

KBS: Unpacking Todo

2026-02-19T04:15:00Z

Guidance, prompts, and TDD.

Published on 2026-02-19
Tags: article, code-review, llm, ocaml, software-eng, software-eng-auto

Today's change to Knowledge Bases is addressing my unhappiness at the beginning of the month: "Data modeling". I've returned to the work of unpacking Note into Todo, to make it easier to control the TypeID prefix, and prepare for making a Todo repository.

Revisiting Guidance Docs

A handy thing about coding agents: they can make such a wide-ranging change with minimal effort (see the number of changed files in the commit). Additionally, the coding principles guidance doc paid for itself: agent code review noted that the validation logic for title and content was now duplicated in both Note and Todo, and recommended deduplication. I redirected it to encapsulate the validation into new types, a favorite software engineering technique of mine (see "Aside: Correct Construction"), and adjusted the guidance to match.
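
A minimal sketch of what encapsulating validation in a type can look like, assuming a Title module with an abstract type and a validating constructor. The module name, the length limit, and the error cases are all hypothetical; the project's actual types and rules are not shown in this excerpt.

```ocaml
(* Hypothetical sketch: Title.t is abstract, so the only way to obtain one
   is through Title.make, which runs the validation exactly once. *)
module Title : sig
  type t
  type error = Empty | Too_long of int

  val max_length : int
  val make : string -> (t, error) result
  val to_string : t -> string
end = struct
  type t = string
  type error = Empty | Too_long of int

  (* Assumed limit, chosen only for the example. *)
  let max_length = 80

  let make raw =
    let trimmed = String.trim raw in
    let len = String.length trimmed in
    if len = 0 then Error Empty
    else if len > max_length then Error (Too_long len)
    else Ok trimmed

  let to_string t = t
end

(* Note and Todo both hold a Title.t, so neither repeats the validation. *)
type note = { note_title : Title.t }
type todo = { todo_title : Title.t }

let make_note raw = Result.map (fun t -> { note_title = t }) (Title.make raw)
let make_todo raw = Result.map (fun t -> { todo_title = t }) (Title.make raw)
```

Because the signature hides the representation, code outside the module cannot build an unvalidated title, which is what removes the need for duplicated checks.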

...
Read the full post →

KBS: Going Automatic

2026-02-18T05:30:00Z

Exploring a hands-off agent-driven workflow.

Published on 2026-02-18
Tags: article, git, llm, ocaml, software-eng, software-eng-auto

I've decided to use this project as a testbed for automatic development, to explore the kinds of techniques described in recent results [1] [2], and to speed up development. I'll be making the vast majority of changes to the codebase via coding agents, exploring what controls are needed to stay hands-off yet maintain code quality.

Short AGENTS.md

I've played around with longer AGENTS.md files and not seen much value. Agents would ignore pieces, and after a while it felt like simply a waste of context space. I'm keeping AGENTS.md short this time around, "linking" to where more information can be retrieved, in the hope that brevity keeps every remaining line important.

...
Read the full post →

More Software Engineering Research

2026-02-15T05:00:00Z

Strategies for guiding coding agents.

Published on 2026-02-15
Tags: code-review, commentary, llm, software-eng, software-eng-auto

OpenAI has put out yet another software engineering report on coding agent use, closer to StrongDM's Software Factory than to Gas Town.

Ryan Lopopolo for OpenAI on 2026-02-11:
Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code.

The product has internal daily users and external alpha testers. It ships, deploys, breaks, and gets fixed. What’s different is that every line of code—application logic, tests, CI configuration, documentation, observability, and internal tooling—has been written by Codex.

Humans steer. Agents execute.

OpenAI provides a number of interesting details here that, I think, complement the practices described by StrongDM. They started by reviewing Codex's commits, and that review load shrank drastically over time as every correction was worked into a set of guiding documents that agents would automatically pick up. It sounds like they did human-to-human reviews of feature and guidance documents. The picture they paint makes a lot of sense — encoding practical software engineering standards into tooling and guidelines documents, and then routinely "garbage collecting" using agents prompted specifically to review and clean up. An interesting thing to me is that they found "rolling forward" via speedy coding agent code generation to be faster and less disruptive to the process than rolling back bugs.

Read the post →

New Software Engineering Modes

2026-02-12T16:30:00Z

Adapting to code abundance.

Published on 2026-02-12
Tags: code-review, commentary, llm, software-eng, software-eng-auto

Complementing Gas Town, a team of engineers at StrongDM has coined the term "Software Factories" for a different kind of coding agent software engineering methodology. They get straight to the heart of things:

Code must not be written by humans

Code must not be reviewed by humans

Coding agents write code faster than a human can read and understand it. This has the potential to be very valuable: quickly producing working programs is useful on its own, and so is increasing the amount of work a single engineer can ship. To sustain that speed, the code cannot be reviewed. I think most people who've worked with coding agents have seen that they can tap into the speed, but then end up forced to slow down and stretch their code review skills. What would need to change to make full use of the speed?

...
Read the full post →

Orchestrating Coding Agents in 2026

2026-01-28T06:00:00Z

Experimental results in the agent orchestration design space.

Published on 2026-01-28
Tags: commentary, erlang, git, llm, software-eng, software-eng-auto

I am gratified to see some of my musings on the direction of coding agents are proving accurate. In September of last year I speculated that, given the non-deterministic and faulty nature of LLMs, folks might be served by adopting fault-tolerant architectures to orchestrate coding agents:

From everything I've seen, we're not yet in a situation where it's either practical or economical to execute multiple coding agents in parallel or orchestrated. However I think in a couple years the technology will be cheap enough that we'll start needing to think about how to orchestrate groups of agents, and the idea of leaning on Erlang's learnings intrigues me. Constructing the agents and their execution frameworks as individual actors for concurrent execution, while arraying some as supervisors and others as workers, seems rich for investigation.
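
To make the supervisor and worker idea concrete, here is a deliberately reduced sketch of a restart policy. It is sequential rather than concurrent, the "agent" is just a function that may raise, and the names supervise, outcome, and max_restarts are hypothetical; it only illustrates the shape of restarting a faulty worker, not Erlang's supervision trees.

```ocaml
(* Hypothetical sketch: a worker is any function that may raise;
   the supervisor restarts it until it succeeds or the budget runs out. *)
type 'a outcome =
  | Completed of 'a
  | Gave_up of { restarts : int; last_error : string }

let supervise ~max_restarts (worker : unit -> 'a) : 'a outcome =
  let rec loop restarts =
    match worker () with
    | result -> Completed result
    | exception e ->
        if restarts >= max_restarts then
          Gave_up { restarts; last_error = Printexc.to_string e }
        else loop (restarts + 1)
  in
  loop 0
```

A worker that fails intermittently is simply run again, while one that keeps failing is reported upward after max_restarts restarts, leaving the decision about what to do next to whoever called the supervisor.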

...
Read the full post →