A keynote on building with intention

Build
with Intention

You can ship anything now. The hard part is shipping the right thing.

By Furquan Ahmad · Product Designer at Scale

A special time to build

It has never been this easy to build.

Replit · Claude · ChatGPT · Cursor · v0

The failure

“Trash. Slop.”

It’s never been easier to build and ship. Replit, Claude and Cursor hand you the keys.
So I shipped. I put AutoScout on Reddit, and car people were blunt: eight features, none past 80% reliable.
I had five or six features on the go at once. It felt like progress. It was the illusion of productivity.
Shipping was right. Moving to the next feature before I understood the problem was not.

The most valuable work is often the work nobody sees.

autoscout.fyi/cars/audi-rs3

The feedback was brutal, and mostly fair.

AutoScout · r/CarTalkUK

TorqueTinkerer_91 · 1mo ago

’16 Peugeot 208 GT Line

The MOT doesn’t test for this. Half the cars you list don’t have the engines you claim. Stop lying to promote your bullshit AI app.


ChamferedHead · 1mo ago

MOT pass rate is an incredibly poor indicator of reliability. It says more about the people who own the cars than the cars themselves.


BoltGremlin_42 · 18d ago

Claude’s done a good job on this one. (It was not a compliment.)


ProperBrim-08 · 1mo ago

Seat Ibiza FR

ChatGPT analysed 33 million MOT tests you mean.


Posted to r/CarTalkUK the week we shipped. Comments unedited.

The scale of the problem

It isn’t just me · the whole industry

80% of features are rarely or never used. My eight half-built features weren’t the exception. They were the rule. Source: Pendo, 2019 Feature Adoption Report. Usage measured across 615 products.

The turn

When generation becomes cheap, craft and judgement become the only real moat.

So what’s the point of building fast if it solves a problem for no one?

The slot machine in your IDE

Dopamine peaks in anticipation,
not consumption.

Vibe-coding feels productive, but the rush comes from the next prompt, not the shipped product. Here’s why the loop is so hard to leave.

01 · The pull

Every prompt is a lever pull

The hit is in walking to the freezer, not the ice cream. It is in the next prompt, not the shipped product. Each prompt is a slot lever, and intermittent good output keeps you pulling.

02 · The illusion

Productive, but in the right direction?

Every prompt fires back something that looks like work, so you feel productive. But are you going the right way at all, building something people actually want?

03 · The debt

Who maintains all this?

Each new feature is more long-term tech debt than you realise, and someone has to keep it alive. Are you even watching feature usage and adoption, or just shipping more?

Three biases keep you in the casino

Why you can’t zoom out

01 · Anchoring

The reference point

Your first idea becomes the bar. You ask ‘better than mine?’ and stop asking ‘is mine even right?’

02 · Confirmation

Weighted evidence

The signal that supports your idea is remembered. The one that contradicts it gets explained away.

03 · Sunk cost

Harder to leave

Every prompt and prototype you commit makes the idea harder to abandon.

Cal Newport · Slow Productivity

Traditional productivity is output over time.

AI inflates the output, not the judgement. A manager can’t tell judgement applied from judgement skipped. Both produce the same-shaped artifact, so pseudo-productivity wins even harder.

Newport’s answer: do fewer things · work at a natural pace · obsess over quality.

Lines of code / week ↑

Human judgement / week

Faster every second. The judgement is the part that never speeds up.

Jenny Wen, Anthropic · Julie Zhuo

The design process is dead.

We lead with prototypes, not mocks and docs. Hold many ideas loosely, stay anchored on the problem, not your first solution, and prune toward the one that works.

The old way

Idea → Mocks → PRD → Code → Launch → Surprise

The new way · a loop

Prototype → Test → Learn → Prune ↻ repeat

Prune · 01

Capability

Can today’s tech even do this?

Prune · 02

Accuracy

Will it consistently meet expectations?

Prune · 03

Speed

Fast enough for real use?

Kill the idea if any answer is no.

Karri Saarinen · Co-founder, Linear

Craft isn’t a choice. It’s about being intentional about it.

Like the sushi chef refining one knife for a decade, or the maker who knows wood. The skill is not producing. It is judging, reinterpreting, challenging what the tool gives back.

Intuition is compressed experience. AI simulates output. It cannot simulate the years that tell you this output is wrong even when it looks right.

Every ring · a year of judgement


The pre-flight checklist

Intention starts with finding the right direction.

We don’t anchor on the first idea and just make subtle changes from there. Find the right direction first, then build toward it.

Pilots run the pre-flight checklist. Not to fly slowly. To reach the destination they intended, in a plane that works.

Answer five questions on paper first.

The Intention Brief · 15 minutes

01 · Problem
What am I solving? One sentence a stranger could understand.
e.g. Show the next trains and the right platform for the stations I actually use.

02 · Person
Who exactly is it for? Name five real people, not adjectives.
e.g. My mum in Essex, two weekend commuters, a colleague who travels for work.

03 · Scope
What does success look like at the smallest scope?
e.g. One station, the next three departures, the correct platform, refreshed live.

04 · Quality
What does ‘good’ look like for this?
e.g. Loads in under a second and never shows the wrong platform.

05 · Human
What part can only a human decide?
e.g. Which stations matter, and what ‘reliable enough’ feels like to an anxious commuter.

Dogfooding · traintimesuk

Live the problem, every single day.

I took my lessons from AutoScout and the early versions of traintimesuk and started being intentional about what I built: fewer features, with far more focus.

I built traintimesuk for my own commute, and hit every wrong platform and failed load myself, day after day, until I knew it in my bones.

traintimesuk.co.uk

The product · live

This is traintimesuk.

The board I check every morning. Live departures, platforms and disruptions for any UK station, built to be glanced at in seconds.

traintimesuk.co.uk

One person. A small, sharp stack.

Under the hood

Frontend

React + Vite

TypeScript, Tailwind and shadcn/ui. TanStack Query for live data, React Router for the 2,600 pages.

Backend

Supabase

Postgres, edge functions and pg-cron. Caches the boards, runs the warmers, tracks API quota.

Hosting

Vercel

Static build plus one serverless proxy that fronts the edge function and keeps keys server-side.

Insight

PostHog + Playwright

Every fetch logs latency, cache tier and errors. Playwright guards the regressions.
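A minimal sketch of what per-fetch logging like this could look like. The names `withMetrics`, `FetchMetric`, `CacheTier` and `sink` are hypothetical and not taken from the traintimesuk code. The `sink` callback stands in for whatever forwards the record to analytics, and no PostHog API is called here.

```typescript
// Hypothetical metric shape: one record per departure-board fetch.
type CacheTier = "cache" | "darwin" | "rtt";

interface FetchMetric {
  station: string;
  tier: CacheTier;
  latencyMs: number;
  ok: boolean;
  error?: string;
}

// Runs `work`, then reports latency, tier and any error to `sink`.
// `now` is injectable so the timing can be checked deterministically.
async function withMetrics<T>(
  station: string,
  tier: CacheTier,
  work: () => Promise<T>,
  sink: (m: FetchMetric) => void,
  now: () => number = Date.now,
): Promise<T> {
  const started = now();
  try {
    const result = await work();
    sink({ station, tier, latencyMs: now() - started, ok: true });
    return result;
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    sink({ station, tier, latencyMs: now() - started, ok: false, error });
    throw err; // the caller still sees the failure
  }
}
```

Wrapping every tier the same way is what lets slow paths and failures be compared in one place later.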

No team, no microservices. A Vite app on Vercel, a Supabase backend, and the whole product hanging on two train-data APIs.

The tool stack was deliberately small.

Tool stack · what did the work

React + Vite

Interface · routing · static build

Supabase

Postgres · edge functions · cron

Vercel

Hosting · proxy · production deploys

PostHog

Analytics · session replay · errors

Playwright

Regression checks · browser proof

National Rail Darwin

Always-on timetable backbone

RealTimeTrains

Platform enrichment · quota guarded

Codex

Implementation agent · review partner

GitHub

PRs · history · rollback trail

Figma

Interface judgement · visual checks

I kept shipping features.

Heads-down on the roadmap

Station health

Status per station

Live disruptions and how each station was running, at a glance.

Live location

The train on the map

Real-time position of the train along its route, updating as it moved.

Date & time

Plan ahead

Future departures with date and time pickers, not just “now”.

All useful. None of it mattered while the board itself wasn’t reliable.

The whole product hangs on two APIs.

The API story

The free one · always on

Darwin (National Rail)

Times, status, destinations and most platforms. Sanctioned, effectively unmetered, ~1.7s. This is the board.

The rich one · rationed

RealTimeTrains

Exact platforms, calling points, live movement. But rate-limited to 9,000 calls a day, and slow: one call per train, 6–19s for a busy board.

Cache · 0.6s → Darwin · 1.7s → RTT enrich · 6–19s, behind a quota circuit-breaker

At first I relied on RealTimeTrains alone. It rate-limited me, the agent quietly hammered the quota, and the board went down. The model never warned me. I found out the hard way. Now Darwin is the always-on backbone, and RTT only enriches behind a circuit-breaker.
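The tiering described above can be sketched roughly as follows. This is an illustration under assumptions, not the production code: `BoardSources`, `loadBoard`, `fetchDarwin`, `fetchRttPlatform` and `enrichAllowed` are hypothetical names, and the real Darwin and RealTimeTrains clients are replaced by injected functions.

```typescript
interface Departure {
  id: string;
  time: string;
  destination: string;
  platform?: string;
}

// Hypothetical seams for the three tiers plus the breaker's verdict.
interface BoardSources {
  readCache(station: string): Departure[] | undefined;
  fetchDarwin(station: string): Promise<Departure[]>;
  fetchRttPlatform(trainId: string): Promise<string | undefined>;
  enrichAllowed(): boolean;
}

async function loadBoard(station: string, src: BoardSources): Promise<Departure[]> {
  // Tier 1: a warm cache answers immediately.
  const cached = src.readCache(station);
  if (cached !== undefined) return cached;

  // Tier 2: Darwin is the always-on backbone and defines the board.
  const board = await src.fetchDarwin(station);

  // Tier 3: RTT only fills missing platforms, one call per train,
  // and only while the quota breaker allows it.
  const enriched: Departure[] = [];
  for (const dep of board) {
    if (dep.platform !== undefined || !src.enrichAllowed()) {
      enriched.push(dep);
      continue;
    }
    try {
      const platform = await src.fetchRttPlatform(dep.id);
      enriched.push(platform === undefined ? dep : { ...dep, platform });
    } catch (e) {
      enriched.push(dep); // enrichment is optional: never fail the board
    }
  }
  return enriched;
}
```

The design choice worth noting is that an RTT failure or an exhausted quota degrades the board to Darwin data and never takes it down.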

The model narrates. The data asserts.

Anchored prompts · grounded output

I stopped asking the model to infer product truth from memory. Before it suggested a fix, I gave it the evidence: API limits, latency traces, cache paths, error rates, station edge cases, session replays and failing tests.

API quotas · Latency traces · Fetch errors · Cache tier · Station edge cases · Session replays · Playwright failures · PR diffs

The craft isn’t the prompt. It’s the evidence you refuse to let it guess.

I kept changing the prompt until it stopped guessing.

Prompt iteration · from vibe to spec

01 · Vague ask
“Fix the platform issue.” The model gave me plausible code and a confident story, but it did not know the shape of the failure.

02 · Constrained task
“Use these files, this failing path, this API limit, and this observed behaviour. Do not touch unrelated code.” The output got smaller and safer.

03 · Evidence-backed spec
“Here are the PostHog traces, Playwright failure, cache tier, station examples and expected user outcome. Propose the smallest patch, then explain how to verify it.”
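One way to make the third stage a habit is to assemble the prompt from a typed evidence bundle and refuse to build it when anything is missing. The sketch below assumes a hypothetical `Evidence` shape and `buildSpecPrompt` helper. The field names are illustrative and the closing instructions simply echo the wording on this slide.

```typescript
// Hypothetical evidence bundle: everything the model must not guess.
interface Evidence {
  files: string[];
  failingPath: string;
  apiLimit: string;
  traces: string[];
  expectedOutcome: string;
}

function buildSpecPrompt(task: string, e: Evidence): string {
  const fields: Array<[string, string | string[]]> = [
    ["files", e.files],
    ["failingPath", e.failingPath],
    ["apiLimit", e.apiLimit],
    ["traces", e.traces],
    ["expectedOutcome", e.expectedOutcome],
  ];
  // An empty string or empty list counts as missing evidence.
  const missing = fields.filter(([, v]) => v.length === 0).map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`Refusing to prompt without evidence: ${missing.join(", ")}`);
  }
  return [
    `Task: ${task}`,
    `Files you may touch: ${e.files.join(", ")}`,
    `Failing path: ${e.failingPath}`,
    `API limit: ${e.apiLimit}`,
    "Observed traces:",
    ...e.traces.map((t) => `- ${t}`),
    `Expected user outcome: ${e.expectedOutcome}`,
    "Propose the smallest patch, then explain how to verify it.",
    "Do not touch unrelated code.",
  ].join("\n");
}
```

Throwing on a missing field is deliberate: it turns "I forgot to paste the trace" into an error before the model gets a chance to fill the gap with a confident story.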

The prompt became the interface between my judgement and the machine.

Same prompt. Three frontier models.

Choosing for quality

For the hard fixes, I run the same anchored prompt across frontier models, measuring whether they respect the data, spot edge cases, and produce a patch I can actually verify.

Deep reasoning

Best diagnosis

Strongest at connecting RTT quota burn, cache paths, timeout spikes and the circuit-breaker design.

Codex-style agent

Best patch

Cleanest surgical diffs when the prompt includes PostHog traces, failing tests and exact files to touch.

Fast model

Best sweep

Useful for summarising logs, drafting test cases and checking copy, weaker on architectural tradeoffs.

Anchored prompts matter more than model choice. Fast-and-grounded beats clever-and-loose.

Analytics · PostHog

PostHog changed how I saw the product.

For the first time I could watch how people actually used the site: where they tapped, where they gave up, which stations they searched again and again.

And it showed me what was breaking: the failed fetches, the slow live path, the rage clicks. I stopped guessing and started fixing what the data pointed at.

eu.posthog.com/project/traintimesuk

The bugs were real. I had the data.

traintimesuk · PostHog

1.6%

of live departure fetches errored over 30 days, mostly timeouts on the slow live path.

Peak error week · 4.1%

Worst single fetch · 92s

PostHog, departures_fetch_metrics, last 30 days.
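For readers who want to reproduce this kind of number from raw fetch records, here is a rough sketch of a weekly error-rate calculation. It assumes a hypothetical `FetchRecord` shape and weeks that start on Sunday in UTC. It is not the PostHog query behind the figures above.

```typescript
// Hypothetical record: one row per live departure fetch.
interface FetchRecord {
  timestamp: string; // ISO 8601, e.g. "2024-01-08T07:45:00Z"
  ok: boolean;
}

// Returns the UTC date (YYYY-MM-DD) of the Sunday that starts the week.
function weekStart(iso: string): string {
  const d = new Date(iso);
  d.setUTCHours(0, 0, 0, 0);
  d.setUTCDate(d.getUTCDate() - d.getUTCDay());
  return d.toISOString().slice(0, 10);
}

// Error rate per week as a fraction: failed fetches / all fetches.
function weeklyErrorRate(records: FetchRecord[]): Record<string, number> {
  const buckets: Record<string, { total: number; errors: number }> = {};
  for (const r of records) {
    const key = weekStart(r.timestamp);
    const b = buckets[key] ?? (buckets[key] = { total: 0, errors: 0 });
    b.total += 1;
    if (!r.ok) b.errors += 1;
  }
  const rates: Record<string, number> = {};
  for (const key of Object.keys(buckets)) {
    const b = buckets[key];
    if (b) rates[key] = b.errors / b.total;
  }
  return rates;
}
```

Multiplying a week's value by 100 gives a percentage comparable to the ones quoted on this slide.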

Then I shipped fixes, one PR at a time.

11 pull requests

#1 · Stop the burn
Stopped cron jobs spending the RealTimeTrains API quota; reserved it for live enrichment.

#10·11 · Platforms
Backfilled missing ‘TBA’ platforms after load, refresh and ‘Show more’.

#2 · Show more
‘Show more’ now loads later trains via the RTT fallback at busy stations.

#3–8 · Indexable
SEO: pre-rendered 2,599 station pages, upgraded schema, added pillar + FAQ pages.

#9 · Health + mobile
Rebuilt the station-health charts and fixed the mobile search header.

The errors fell.

Weekly fetch-error rate

Peak: 4.1% in the week of May 10. Two weeks after the platform and warming fixes merged, it hit 0.6%. The craft is ongoing: the rate now hovers near 2%, and the next target is the latency tail. Source: PostHog.

traintimesuk · reliability

The guardrails that held the line.

Two changes did most of the work: a circuit-breaker on the rationed RealTimeTrains quota, and pre-warmed caches so the board loads before anyone asks.

Errors 4.1% → 0.6%. Uptime climbed and held.

guardrails.ts
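The contents of guardrails.ts are not shown here. As a rough illustration of the first guardrail only, this is a minimal sketch of a daily quota breaker. It is budget-based, so it opens when the day's allowance is spent, not after repeated failures as a classic circuit breaker would. `QuotaBreaker`, `tryAcquire` and `remaining` are hypothetical names, and resetting at UTC midnight is an assumption, not documented RealTimeTrains behaviour.

```typescript
class QuotaBreaker {
  private used = 0;
  private day = "";
  private readonly dailyLimit: number;
  private readonly now: () => Date;

  // `now` is injectable so the daily reset can be tested.
  constructor(dailyLimit: number, now: () => Date = () => new Date()) {
    this.dailyLimit = dailyLimit;
    this.now = now;
  }

  // Resets the counter when the UTC date changes (assumed reset time).
  private rollover(): void {
    const today = this.now().toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
  }

  // Spends one call if budget remains; false means the breaker is open.
  tryAcquire(): boolean {
    this.rollover();
    if (this.used >= this.dailyLimit) return false;
    this.used += 1;
    return true;
  }

  remaining(): number {
    this.rollover();
    return this.dailyLimit - this.used;
  }
}
```

With the limit quoted earlier, usage would look like `const rtt = new QuotaBreaker(9000)`, with each enrichment call guarded by `if (rtt.tryAcquire()) { ... }` so that an exhausted quota skips enrichment and leaves the Darwin board intact.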

Vibe coding got me to 90%. The last 10% is the job.

The limits

Complex bugs

The model runs out of road

Load-more breaks. Platforms don’t display. On the genuinely hard problems, the LLM stalls.

“Solved”

Solved isn’t solved

In plan mode it tells me the fix has landed. The next morning, the bug is still there.

Reliability

Reliability is the product

For a train app, reliability is the whole point. Vibe coding quietly trades it away. You can’t rely on the model alone.

So you dogfood it yourself. The last 10%, the reliability, is the hardest to close, and where I spend the most time. It’s also what makes the app good.

Build intuition, then build with intention.

The four moves AI can’t make for you

01

Dogfood

Use your own product daily and live every flaw. Watch PostHog session replays. The reps compound into instinct.

02

Research

Scrape Reddit, X and LinkedIn for real complaints. Go deeper with NotebookLM. Talk to real customers, not just friends.

03

Craft

Use the psychology: Hick’s Law, social proof. Julie Zhuo: the eye over the hand. Saarinen: craft is intentional.

04

Preview

Ship in preview like Anthropic. Set expectations and learn in public.

AI cannot decide what to build. That is your job.

Thank you

Build with intention.
Find me after.

Furquan Ahmad · Product Designer at Scale

The deck lives online

Scan to keep the slides.

traintimesuk.co.uk/designing-with-intention-build/deck
