Insights | 8 West Consulting

What I Actually Want From AI in Software Delivery: Read the Logs, Write the Fix, Let Me Approve

Written by Mel Collins | 2026 - June 8

Over the last week, I’ve been building a Flutter/Android indoor cycling app with Bluetooth heart rate monitor and trainer support.

 

From March 8 to March 16, 2026, this repo went from first commit to a phone-deployed app in 28 commits, with 103 files changed and 13,865 lines added. The current codebase is already over 10,000 lines across app and test code.

 

And here’s the important part:

 

All of the code was written by AI.

 

That sounds like the future.

But the real lesson from this week is not “AI writes code.” It’s this:

 

AI should be in the loop after the code is written too.

 

Because code generation is only half the story. The other half is what happens when reality hits: logs, crashes, flaky device behavior, broken tests, missing context, and the human developer stuck stitching it all together.

That’s exactly what happened in this project.

 

What the repo history shows

 

The commit history tells a familiar story of real software work:

 

  • initial app setup
  • device selection
  • workout logic
  • Bluetooth improvements
  • dropped connection fixes
  • diagnostics and logging
  • Firebase integration
  • scanning refactors
  • repeated rounds of “working now”, “better connectivity”, and “fixed bug”

 

That is not failure. That is software development.

But it also shows why code generation alone is not enough.

If AI can generate the app, it should also be able to:

 

  • read the exceptions
  • correlate them to likely root causes
  • inspect the repo history
  • identify likely regressions
  • write failing tests
  • prepare stories
  • open pull requests
  • hand the result to a human for approval

 

That is the workflow I want.

 

Two practical examples from this project

 

1. Firebase Analytics was crashing on boolean parameters

 

We hit a runtime assertion from Firebase Analytics because event parameters included booleans like bluetooth_ready: true.

Firebase Analytics accepts strings and numbers, not raw booleans.

This was not a business-logic failure. The BLE logic was fine. The analytics wrapper was the problem.

An AI system watching logs could have:

 

  • grouped repeated exceptions
  • traced all analytics event calls
  • found the shared telemetry wrapper
  • converted bools to 0/1
  • added tests
  • opened a PR automatically

 

Instead, I read the logs, explained the issue, located the wrapper, and drove the fix manually.
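To make the bool-to-0/1 fix concrete, here is a minimal sketch of the kind of sanitizing step a shared telemetry wrapper could apply before handing parameters to the analytics SDK. It is written in Python for brevity rather than in the app's own language, and the function name sanitize_params is hypothetical. Dropping None values and stringifying other types are assumptions of this sketch, not something the project is known to do.

```python
from typing import Dict, Mapping, Union

ParamValue = Union[str, int, float]


def sanitize_params(params: Mapping[str, object]) -> Dict[str, ParamValue]:
    """Coerce event parameters so that only strings and numbers remain."""
    clean: Dict[str, ParamValue] = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Check bool before int: in Python, bool is a subclass of int.
            clean[key] = 1 if value else 0
        elif isinstance(value, (str, int, float)):
            clean[key] = value
        elif value is None:
            # Assumption: skip missing values instead of sending a placeholder.
            continue
        else:
            # Assumption: fall back to the string form for anything else.
            clean[key] = str(value)
    return clean
```

With this in place, an event carrying bluetooth_ready: true would be logged with bluetooth_ready: 1, and a test on the wrapper alone is enough to guard against the assertion coming back.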

 

2. BLE reconnects were entering a dangerous loop

 

The more serious issue was a reconnect loop after disconnects.

The app would bounce through states like:

 

connecting -> disconnected -> reconnecting -> connecting -> disconnected

 

That is exactly the kind of thing Android BLE stacks do not tolerate well.

The fix was not “retry harder”. It was to impose a proper state machine:

 

  • only one connect/reconnect in flight
  • serialize connection work globally
  • reconnect heart rate before trainer
  • avoid reconnecting on stale data/contact loss
  • add bounded backoff
  • suppress duplicate pending reconnects
  • require disconnect-before-switch when changing devices
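Several of the rules above (one connection attempt in flight, globally serialized work, bounded backoff, duplicate suppression) can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the app's actual controller: the ReconnectGate class, its parameters, and the returned status strings are all hypothetical, and the connect callable stands in for whatever the real BLE layer exposes.

```python
import asyncio
from typing import Awaitable, Callable


class ReconnectGate:
    """Serializes reconnects globally and drops duplicate requests per device."""

    def __init__(
        self,
        connect: Callable[[str], Awaitable[bool]],
        max_attempts: int = 4,
        base_delay: float = 1.0,
        max_delay: float = 8.0,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._sleep = sleep
        self._lock = asyncio.Lock()  # only one connect attempt runs at a time
        self._pending = set()        # device ids with a reconnect already queued

    def backoff_delay(self, attempt: int) -> float:
        # Exponential backoff, capped so the wait stays bounded.
        return min(self._base_delay * (2 ** attempt), self._max_delay)

    async def request_reconnect(self, device_id: str) -> str:
        if device_id in self._pending:
            return "suppressed"  # a reconnect for this device is already pending
        self._pending.add(device_id)
        try:
            async with self._lock:
                for attempt in range(self._max_attempts):
                    if await self._connect(device_id):
                        return "connected"
                    if attempt < self._max_attempts - 1:
                        await self._sleep(self.backoff_delay(attempt))
                return "gave_up"  # bounded: stop instead of retrying forever
        finally:
            self._pending.discard(device_id)
```

Because the lock is global, requesting the heart rate monitor before the trainer means the heart rate monitor reconnects first. The sketch does not cover the stale-data rule or disconnect-before-switch, which would sit in the code that decides whether to call request_reconnect at all.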

 

This is a perfect example of where AI can do more than autocomplete code.

If AI had access to the logs from the start, it could have recognized the reconnect pattern, mapped it to known BLE failure modes, inspected the controller and repository layers, and prepared the remediation as a reviewable PR.
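Recognizing the reconnect pattern does not need anything exotic. As a rough sketch, assuming the connection states have already been extracted from the logs as an ordered list of strings, a detector can count how many connection attempts in a row ended in a disconnect without ever reaching a connected state. The function names and the "connected" state label are assumptions of this sketch; the other state names come from the sequence shown earlier.

```python
from typing import Sequence


def max_failed_attempt_streak(states: Sequence[str]) -> int:
    """Longest run of 'connecting' attempts that ended in 'disconnected'."""
    longest = 0
    streak = 0
    attempting = False
    for state in states:
        if state == "connecting":
            attempting = True
        elif state == "connected":
            # A successful connection ends the attempt and resets the streak.
            attempting = False
            streak = 0
        elif state == "disconnected" and attempting:
            # The attempt failed before the link was established.
            attempting = False
            streak += 1
            longest = max(longest, streak)
    return longest


def looks_like_reconnect_loop(states: Sequence[str], threshold: int = 2) -> bool:
    # Assumption: two failed attempts in a row is enough to flag for review.
    return max_failed_attempt_streak(states) >= threshold
```

A flag from a check like this is only a trigger for triage. Deciding whether the cause is the controller, the repository layer, or the device itself still needs the code and history inspection described above.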

 

The real bottleneck is not writing code

 

The biggest time sink was not implementation.

It was waiting around and gathering context from different places:

 

  • Crashlytics / runtime logs
  • repo history
  • current code shape
  • failing tests
  • device state
  • build and deployment steps

 

The human becomes the integration layer.

That is exactly the part I want AI to remove.

Not the engineering judgment. Not the approval step. Not the accountability.

Just the manual glue work.

 

What “AI in the loop” should mean

 

For me, this is the right model:

 

  1. AI monitors exceptions, logs, and failing tests continuously.
  2. It clusters recurring issues and links them to likely files and commits.
  3. It drafts a bug story with impact, repro, and probable root cause.
  4. It writes or updates tests first where possible.
  5. It prepares a pull request with the proposed fix.
  6. A human reviews and approves.

 

That is the sweet spot.

Not “YOLO autonomous production edits”. Not “chatbot writes snippets when asked”.

Continuous AI triage plus human approval.
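Step 2 of the model above, clustering recurring issues, can start very simply. The sketch below assumes exception messages are available as plain strings and groups them by a normalized signature in which device addresses and numbers are replaced with placeholders. The function names and the normalization rules are hypothetical. A real pipeline would more likely key on stack frames, and linking clusters to files and commits is left out entirely.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Variable parts of a message that should not split one issue into many.
_MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")
_NUMBER = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def signature(message: str) -> str:
    """Reduce a log message to a stable key for grouping."""
    message = _MAC.sub("<mac>", message)   # device addresses first
    message = _NUMBER.sub("<n>", message)  # then remaining numbers
    return " ".join(message.split())       # collapse whitespace


def cluster(messages: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (signature, count) pairs, most frequent first."""
    return Counter(signature(m) for m in messages).most_common()
```

The most frequent signatures are then the natural candidates for step 3, a drafted bug story, with the human still deciding what gets merged.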

 

Why this matters even more in AI-written codebases

 

When AI writes a lot of code quickly, you get leverage. You also get more surface area, faster.

 

That means observability and feedback loops matter even more.

If a project is largely AI-generated, then every exception, assertion, and failing test should immediately feed another AI loop that tries to contain it.

 

That is how you support a zero-bug policy: not by pretending bugs won’t happen, but by treating every bug as an input to an automated repair pipeline.

 

My estimate of the value

 

Based on this project, I’d estimate this kind of workflow would save 60 to 90 minutes per non-trivial issue, sometimes more.

Not because the fix itself takes that long. Because the real cost is switching context, finding the right logs, locating the code path, checking history, writing tests, running the suite, and deploying again.

 

Multiply that across every exception and regression, and the savings become enormous.

 

Final thought

 

The future is not just AI writing more code.

The future is AI staying on the job after the code is written.

Reading the logs. Watching the tests. Preparing the fix. Opening the PR. Letting a human approve.

 

That is the workflow I want.