What I Actually Want From AI in Software Delivery: Read the Logs, Write the Fix, Let Me Approve

Written by Mel Collins | June 8, 2026

Over the last week, I’ve been building a Flutter/Android indoor cycling app with Bluetooth heart rate monitor and trainer support.

From March 8 to March 16, 2026, this repo went from first commit to a phone-deployed app in 28 commits, with 103 files changed and roughly 13,865 lines added. The current codebase is already over 10,000 lines across app and test code.

And here’s the important part:

All of the code was written by AI.

That sounds like the future.

But the real lesson from this week is not “AI writes code.” It’s this:

AI should be in the loop after the code is written too.

Because code generation is only half the story. The other half is what happens when reality hits: logs, crashes, flaky device behavior, broken tests, missing context, and the human developer stuck stitching it all together.

That’s exactly what happened in this project.

What the repo history shows

The commit history tells a familiar story of real software work:

initial app setup
device selection
workout logic
Bluetooth improvements
dropped connection fixes
diagnostics and logging
Firebase integration
scanning refactors
repeated rounds of “working now”, “better connectivity”, and “fixed bug”

That is not failure. That is software development.

But it also shows why code generation alone is not enough.

If AI can generate the app, it should also be able to:

read the exceptions
correlate them to likely root causes
inspect the repo history
identify likely regressions
write failing tests
prepare stories
open pull requests
hand the result to a human for approval

That is the workflow I want.

Two practical examples from this project

1. Firebase Analytics was crashing on boolean parameters

We hit a runtime assertion from Firebase Analytics because event parameters included booleans like bluetooth_ready: true.

Firebase Analytics accepts strings and numbers, not raw booleans.

This was not a business-logic failure. The BLE logic was fine. The analytics wrapper was the problem.

An AI system watching logs could have:

grouped repeated exceptions
traced all analytics event calls
found the shared telemetry wrapper
converted bools to 0/1
added tests
opened a PR automatically

Instead, I read the logs, explained the issue, located the wrapper, and drove the fix manually.
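To make the shape of that fix concrete, here is a minimal sketch of the kind of sanitizing step a shared telemetry wrapper could apply before handing parameters to the analytics SDK. It is written in Python purely for illustration; the real wrapper in this project is Dart, and the function name `sanitize_params`, the choice to drop unset values, and the string fallback are all assumptions, not the code that shipped.

```python
from typing import Any, Dict, Union

ParamValue = Union[str, int, float]


def sanitize_params(params: Dict[str, Any]) -> Dict[str, ParamValue]:
    """Coerce event parameters to strings or numbers before logging (hypothetical helper)."""
    clean: Dict[str, ParamValue] = {}
    for key, value in params.items():
        if value is None:
            # Assumption: unset values are dropped rather than sent.
            continue
        if isinstance(value, bool):
            # Check bool before int: in Python, bool is a subclass of int.
            clean[key] = 1 if value else 0
        elif isinstance(value, (str, int, float)):
            clean[key] = value
        else:
            # Assumption: anything else is logged in its string form.
            clean[key] = str(value)
    return clean
```

With this in place, a call such as `sanitize_params({"bluetooth_ready": True})` yields `{"bluetooth_ready": 1}`, so the boolean never reaches the SDK.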

2. BLE reconnects were entering a dangerous loop

The more serious issue was a reconnect loop after disconnects.

The app would bounce through states like:

connecting -> disconnected -> reconnecting -> connecting -> disconnected

That is exactly the kind of thing Android BLE stacks do not tolerate well.

The fix was not “retry harder”. It was to impose a proper state machine:

only one connect/reconnect in flight
serialize connection work globally
reconnect heart rate before trainer
avoid reconnecting on stale data/contact loss
add bounded backoff
suppress duplicate pending reconnects
require disconnect-before-switch when changing devices
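A few of those rules can be sketched as a small gate object. This is a simplified illustration in Python, not the app's Dart code: the class name `ReconnectGate`, its methods, and the default delays are hypothetical, and it covers only the single-attempt-in-flight, priority-ordering, duplicate-suppression, and bounded-backoff rules. Stale-data handling and disconnect-before-switch are left out.

```python
from typing import Dict, List, Optional, Set


class ReconnectGate:
    """Allows one connection attempt at a time and bounds retry delays (hypothetical)."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 30.0,
                 max_attempts: int = 5) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.max_attempts = max_attempts
        self.in_flight: Optional[str] = None  # device currently connecting, if any
        self.pending: Set[str] = set()        # devices waiting for a reconnect
        self.attempts: Dict[str, int] = {}    # consecutive failures per device

    def request(self, device_id: str) -> bool:
        """Queue a reconnect; returns False for duplicates or exhausted devices."""
        if device_id == self.in_flight or device_id in self.pending:
            return False
        if self.attempts.get(device_id, 0) >= self.max_attempts:
            return False
        self.pending.add(device_id)
        return True

    def next_device(self, priority: List[str]) -> Optional[str]:
        """Start the next attempt in priority order, e.g. heart rate before trainer."""
        if self.in_flight is not None:
            return None  # serialize: never start a second attempt
        for device_id in priority:
            if device_id in self.pending:
                self.pending.remove(device_id)
                self.in_flight = device_id
                return device_id
        return None

    def finish(self, device_id: str, success: bool) -> float:
        """Release the gate; returns the delay before a retry (0.0 on success)."""
        if self.in_flight == device_id:
            self.in_flight = None
        if success:
            self.attempts.pop(device_id, None)
            return 0.0
        failures = self.attempts.get(device_id, 0) + 1
        self.attempts[device_id] = failures
        # Exponential backoff, capped at max_delay.
        return min(self.max_delay, self.base_delay * 2 ** (failures - 1))
```

The caller is expected to wait for the returned delay before calling `request` again. Keeping the gate free of timers and I/O makes the rules easy to unit test without a device.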

This is a perfect example of where AI can do more than autocomplete code.

If AI had access to the logs from the start, it could have recognized the reconnect pattern, mapped it to known BLE failure modes, inspected the controller and repository layers, and prepared the remediation as a reviewable PR.
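Recognizing that pattern does not need anything exotic. As a rough sketch, assuming the connection states have already been extracted from the logs as an ordered list of strings, a detector could simply count how often the disconnect cycle repeats. The function names and the threshold of three cycles are illustrative assumptions, and the example is in Python rather than the app's Dart.

```python
from typing import Iterable, List

# One full trip around the loop described above.
CYCLE: List[str] = ["disconnected", "reconnecting", "connecting", "disconnected"]


def count_reconnect_cycles(states: Iterable[str]) -> int:
    """Count back-to-back occurrences of the reconnect cycle in a state sequence."""
    seq = list(states)
    count = 0
    i = 0
    while i + len(CYCLE) <= len(seq):
        if seq[i:i + len(CYCLE)] == CYCLE:
            count += 1
            # The closing "disconnected" may also open the next cycle.
            i += len(CYCLE) - 1
        else:
            i += 1
    return count


def is_reconnect_loop(states: Iterable[str], threshold: int = 3) -> bool:
    """Flag a loop once the cycle repeats at least `threshold` times (assumed cutoff)."""
    return count_reconnect_cycles(states) >= threshold
```

A real pipeline would also want timestamps, so that three cycles over an hour are treated differently from three cycles in ten seconds.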

The real bottleneck is not writing code

The biggest time sink was not implementation.

It was waiting around and gathering context from different places:

Crashlytics / runtime logs
repo history
current code shape
failing tests
device state
build and deployment steps

The human becomes the integration layer.

That is exactly the part I want AI to remove.

Not the engineering judgment. Not the approval step. Not the accountability.

Just the manual glue work.

What “AI in the loop” should mean

For me, this is the right model:

AI monitors exceptions, logs, and failing tests continuously.
It clusters recurring issues and links them to likely files and commits.
It drafts a bug story with impact, repro, and probable root cause.
It writes or updates tests first where possible.
It prepares a pull request with the proposed fix.
A human reviews and approves.

That is the sweet spot.

Not “YOLO autonomous production edits”. Not “chatbot writes snippets when asked”.

Continuous AI triage plus human approval.
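The clustering step is the easiest part to picture. One common approach, sketched here in Python under the assumption that exception messages arrive as plain strings, is to normalize away the parts that vary between occurrences (device addresses, numbers) and count what remains. The function names and the sample message formats in the usage note are made up for illustration; they are not taken from this project's logs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Bluetooth-style addresses such as AA:BB:CC:DD:EE:FF.
_MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")
_NUM = re.compile(r"\d+")


def fingerprint(message: str) -> str:
    """Reduce a message to a stable signature by masking variable parts."""
    masked = _MAC.sub("<mac>", message)  # mask addresses before bare numbers
    masked = _NUM.sub("<n>", masked)     # note: this also merges differing numeric codes
    return " ".join(masked.split()).lower()


def cluster(messages: Iterable[str]) -> List[Tuple[str, int]]:
    """Group messages by fingerprint, most frequent first."""
    return Counter(fingerprint(m) for m in messages).most_common()
```

Given two hypothetical timeout messages that differ only in duration and device address, `cluster` returns a single signature with a count of 2, which is the unit a bug story or PR would then be attached to.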

Why this matters even more in AI-written codebases

When AI writes a lot of code quickly, you get leverage. You also get more surface area, faster.

That means observability and feedback loops matter even more.

If a project is largely AI-generated, then every exception, assertion, and failing test should immediately feed another AI loop that tries to contain it.

That is how you support a zero-bug policy: not by pretending bugs won’t happen, but by treating every bug as an input to an automated repair pipeline.

My estimate of the value

Based on this project, I’d estimate this kind of workflow would save at least 60-90 minutes per non-trivial issue, sometimes more.

Not because the fix itself takes that long. Because the real cost is switching context, finding the right logs, locating the code path, checking history, writing tests, running the suite, and deploying again.

Multiply that across every exception and regression, and the savings become enormous.

Final thought

The future is not just AI writing more code.

The future is AI staying on the job after the code is written.

Reading the logs. Watching the tests. Preparing the fix. Opening the PR. Letting a human approve.

That is the workflow I want.
