Hammer.ai blog
TEACHING, NOT HOARDING — PART 3

Teaching the Agents to Play

A lending world model, built one piece at a time. Each piece below is one visual + one idea.
In one line: Don't make an AI decide every loan by hand. Make it write the rulebook, check the rulebook, then let cheap rules do the work. New to this? Start right below. In a hurry? Skip to the game ↓.

START HERE · PLAIN WORDS

Three words you'll need. No finance degree required.

A loan: A bank gives you money for a house; you pay it back monthly, for years. The bank's job: guess if you'll actually pay it back.
An "agent": Here it just means an automated decision-maker, a piece of software that says approve / deny on each loan. It can be simple rules, or an AI.
The cast: borrower wants the loan → broker finds them and passes it on → lender says yes/no → investor buys the loan. Risk flows down this chain.

→ The whole question: who (or what) should make the yes/no call — and can you trust it?

MEET THE PLAYERS — every one is an agent

Four players in the chain. Each is its own agent — an automated decision-maker with its own goal. None of them is a person; all of them are software making calls.

Borrower agent
decides: apply? how much to reveal?
Wants: the loan, at the lowest rate.
Trick: some fudge the truth to look safer.
wins: approved cheap · loses: rejected / can't pay
Broker agent
decides: pass this lead on? to whom?
Wants: a commission per loan passed along.
Tension: paid to push deals — even shaky ones.
wins: deals close · loses: pushes fraud → clawed back
Lender agent
decides: approve / deny / refer
Wants: fund good loans, reject bad ones.
Bind: too strict → loses business; too loose → eats defaults.
wins: grows safely · loses: broke / breaks rules
Bank / Investor agent
decides: how much credit? repurchase?
Wants: steady returns, no nasty surprises.
Power: a loan built on a lie gets sent back.
wins: loans pay off · loses: a wave sours at once

→ Four agent types. And each can run on any of four brains.

Watch them play. Notice: a loan flows down the chain — then one defaults, and everyone reacts.

AND EACH AGENT HAS A BRAIN

Same agent, four ways to make it decide. This is the real experiment: 4 player types × 4 brains.

① Plain rules
A fixed rulebook. Free, instant, fully readable. Never breaks the rules — but never improvises.
② AI picks, rules guard
The AI chooses each move, but only from legal options. Flexible and can't cheat. Costs a little per decision.
③ AI writes the rules ★
The AI writes the rulebook once; we check it; then plain rules run it free. Best of both — the punchline of this post.
④ AI decides freely
No rails. Most flexible, most expensive, and it sometimes breaks the rules outright.

→ The game runs all four player types, and you can swap any of them onto any of these four brains. Now let's watch.
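
The gap between brain ② and brain ④ comes down to whether a guard sits between the model and the action. Here is a minimal sketch of that idea. It assumes a hypothetical `legal_actions` rule with an illustrative DTI cap, a toy `eager_model` standing in for the LLM, and a fallback to "refer" when the pick is illegal. None of these names or numbers come from the game itself.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Proposer = Callable[[State], str]  # stand-in for an LLM call (hypothetical)

ILLUSTRATIVE_MAX_DTI = 0.45  # assumption for this sketch, not a real program limit


def legal_actions(state: State) -> List[str]:
    """Rules decide which moves are legal for this file."""
    actions = ["deny", "refer"]
    if state["dti"] <= ILLUSTRATIVE_MAX_DTI:
        actions.append("approve")
    return actions


def guarded(propose: Proposer) -> Proposer:
    """Brain 2: the model picks, but an illegal pick is replaced by 'refer'."""
    def decide(state: State) -> str:
        choice = propose(state)
        return choice if choice in legal_actions(state) else "refer"
    return decide


def unguarded(propose: Proposer) -> Proposer:
    """Brain 4: whatever the model says goes, legal or not."""
    return propose


def eager_model(state: State) -> str:
    # Toy proposer that always wants to approve.
    return "approve"


risky_file = {"dti": 0.66}
print(guarded(eager_model)(risky_file))    # refer
print(unguarded(eager_model)(risky_file))  # approve
```

The same proposer gives two different outcomes on the same file; only the guard changed.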

STEP 1 · SEE THE DECISION

Build an applicant yourself: drag the two dials and watch all four "brains" decide, live.

You decide who the applicant is. Notice: the rules give the same call every time; the free LLM sometimes disagrees — on the exact same person.

→ Same applicant, four brains, four different answers. Now: which one could you defend to a regulator?

Below, the actual recorded reasoning for two of them on one loan: the rules land in 5 clean steps; the unconstrained LLM wanders through 40+ steps to reach the same place.

Same loan, two reasonings.

→ If you can't read how it decided, you can't trust or regulate it.

What the steps mean (plain version)

Income (average two years of pay) → can they afford it? (does the payment eat too much of that income; lenders call this DTI, the debt-to-income ratio) → do they have savings as a cushion? → decision. These are the real checks a human underwriter runs; the rules just do them for free.
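
The chain of checks above can be sketched as a small function that returns both a decision and a readable trace. This is an illustration under stated assumptions, not the post's actual rulebook: the thresholds `MAX_DTI` and `MIN_RESERVE_MONTHS` are placeholders, the field names are invented, and sending a thin-cushion file to "refer" is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative thresholds only; real programs set their own limits.
MAX_DTI = 0.45
MIN_RESERVE_MONTHS = 2.0


@dataclass
class Applicant:
    pay_year_1: float        # annual pay, most recent year
    pay_year_2: float        # annual pay, the year before
    monthly_payments: float  # all monthly debt payments, new mortgage included
    savings: float


def underwrite(a: Applicant) -> Tuple[str, List[str]]:
    """Return (decision, readable trace of each check)."""
    trace: List[str] = []

    # Step 1: income is the two-year average, expressed per month.
    monthly_income = (a.pay_year_1 + a.pay_year_2) / 2 / 12
    trace.append(f"income: {monthly_income:.0f}/month (2-year average)")
    if monthly_income <= 0:
        trace.append("decision: deny (no income)")
        return "deny", trace

    # Step 2: DTI is the share of income eaten by monthly payments.
    dti = a.monthly_payments / monthly_income
    trace.append(f"DTI: {dti:.2f} (limit {MAX_DTI})")
    if dti > MAX_DTI:
        trace.append("decision: deny (payment too large for income)")
        return "deny", trace

    # Step 3: savings cushion, measured in months of payments covered.
    if a.monthly_payments > 0:
        reserve_months = a.savings / a.monthly_payments
    else:
        reserve_months = float("inf")
    trace.append(f"reserves: {reserve_months:.1f} months")
    if reserve_months < MIN_RESERVE_MONTHS:
        trace.append("decision: refer (thin cushion)")
        return "refer", trace

    trace.append("decision: approve")
    return "approve", trace


decision, steps = underwrite(Applicant(60_000, 66_000, 1_800, 9_000))
print(decision)
print("\n".join(steps))
```

Every step lands in the trace, which is what makes the decision readable after the fact.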

One loan, fine. But what happens at forty thousand? ↓

STEP 2 · COST IT AT SCALE

The rule planner is "free": 0 tokens. So it's the cheapest, right? Drag the loan count and watch.

Cost at 40,000 loans. Notice: the per-loan LLM columns explode with volume; the generated policy stays flat.

→ "Free" just moves the cost. A model is only honest once it charges for that.
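
To see why the per-loan columns grow while the generated policy stays flat, it helps to write the cost as a one-time part plus a per-loan part. The sketch below does only that. Every number in it (token price, tokens per loan, the one-time cost of generating and checking a policy) is a placeholder picked for illustration, not a figure from the run.

```python
def total_cost(loans: int, fixed_usd: float, tokens_per_loan: int,
               usd_per_1k_tokens: float) -> float:
    """One-time cost plus a per-loan token bill, in USD."""
    return fixed_usd + loans * tokens_per_loan / 1000 * usd_per_1k_tokens


# Every number below is a placeholder chosen for illustration.
PRICE = 0.01  # USD per 1,000 tokens (hypothetical)
brains = {
    "per-loan LLM": dict(fixed_usd=0.0, tokens_per_loan=2000),
    "generated policy": dict(fixed_usd=50.0, tokens_per_loan=0),
}

for loans in (100, 40_000):
    for name, b in brains.items():
        cost = total_cost(loans, usd_per_1k_tokens=PRICE, **b)
        print(f"{loans:>6} loans  {name:<17} ${cost:,.2f}")
```

With these made-up inputs the per-loan brain is cheaper at small volume and far more expensive at 40,000 loans; where the lines cross depends entirely on the numbers you plug in.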

So far it's a quiz. Real loans don't resolve at the closing table. ↓

STEP 3 · GIVE IT A CYCLE

Nobody sets a "boom" or a "bust" — the agents just act, and the weather (the colored bands) emerges. All four agent types, over six years:

All four agents, one world. Notice: borrower demand and lender health rise together in the boom — then they're chained, so they fall together too.
Your turn — drag the lending dial. Nobody dialed a "boom." Loosen credit and the bubble inflates itself; the looser you lend, the harder the crash.

→ A decision isn't real until it has a delayed, coupled consequence — and every agent feels every other's.

Zoom all the way in — what does one loan's whole life look like? ↓

STEP 4 · FOLLOW ONE LOAN

A broker pushes a shaky file through. Who's holding the bag when it blows up? Read to the last row.

The life of a defective loan. Notice: it performs for 18 months, then the loss flows back up the chain.

→ Once a defect gets put back and the commission clawed back, incentives bite.
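
A toy ledger makes the put-back and clawback mechanics concrete. It is a deliberately simplified sketch: the amounts are hypothetical, the loan is assumed to be sold at par, and interest income during the performing months is ignored. `Ledger` and `life_of_loan` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Running profit and loss for each player on one loan."""
    broker: float = 0.0
    lender: float = 0.0
    investor: float = 0.0


def life_of_loan(commission: float, loss: float, defect_found: bool) -> Ledger:
    """Track who ends up with the loss on one defaulted loan."""
    led = Ledger()
    # Origination: the lender pays the broker a commission.
    led.broker += commission
    led.lender -= commission
    # The loan is sold at par (no gain or loss), performs, then defaults.
    led.investor -= loss
    if defect_found:
        # Put-back: the lender repurchases, so the loss moves to the lender.
        led.investor += loss
        led.lender -= loss
        # Clawback: the lender recovers the commission from the broker.
        led.lender += commission
        led.broker -= commission
    return led


print(life_of_loan(commission=3_000, loss=80_000, defect_found=False))
print(life_of_loan(commission=3_000, loss=80_000, defect_found=True))
```

Without the defect finding, the investor absorbs the loss and the broker keeps the commission. With it, the loss lands on the lender and the broker's commission is gone.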

Now the real question: can an LLM write rules this good? ↓

STEP 5 · GENERATE THE POLICY

Hand the LLM the policy and say "write the rulebook." Hit 🎲 to make it write a fresh one a few times.

Free-write vs tune-and-validate. Notice: every time it free-writes raw rules it scores near-random; constrain it to tune parameters we check, and it nails 100% — every time.

→ The win isn't "more LLM." It's generate → validate → run free.
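
The generate → validate → run loop can be sketched in a few lines. This is one possible shape under stated assumptions, not the post's implementation: the parameter names, the `BOUNDS` ranges, and the `GOLDEN_CASES` test files are all invented, and the dictionaries passed to `validate` stand in for parsed LLM output.

```python
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]
File = Dict[str, float]
Policy = Callable[[File], str]

# Allowed range for each tunable parameter (illustrative).
BOUNDS = {"min_credit": (300, 850), "max_dti": (0.0, 1.0)}

# Labeled files the policy must get right before it is trusted (made up).
GOLDEN_CASES: List[Tuple[File, str]] = [
    ({"credit": 720, "dti": 0.35}, "approve"),
    ({"credit": 530, "dti": 0.35}, "deny"),
    ({"credit": 720, "dti": 0.66}, "deny"),
]


def compile_policy(p: Params) -> Policy:
    """Turn checked parameters into a plain rule that needs no model to run."""
    def policy(f: File) -> str:
        ok = f["credit"] >= p["min_credit"] and f["dti"] <= p["max_dti"]
        return "approve" if ok else "deny"
    return policy


def validate(p: Params) -> Optional[Policy]:
    """Return a runnable policy only if the parameters pass every check."""
    # Check 1: every parameter is present and inside its allowed range.
    for name, (lo, hi) in BOUNDS.items():
        if name not in p or not lo <= p[name] <= hi:
            return None
    # Check 2: the compiled policy agrees with every labeled case.
    policy = compile_policy(p)
    if all(policy(f) == want for f, want in GOLDEN_CASES):
        return policy
    return None


# Stand-ins for two LLM outputs; a real run would parse these from a response.
good = validate({"min_credit": 620, "max_dti": 0.45})
bad = validate({"min_credit": 500, "max_dti": 0.45})  # fails a golden case

print(good is not None, bad is None)
```

The model only ever proposes numbers. Whether a proposal becomes a running policy is decided by checks that do not involve the model at all.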

STEP 6 · THE REAL POLICIES

Real mortgages come in types (government-backed, low-down-payment, rural, rental…), each with its own rulebook. The AI writes all of them; we check them.

The rulebooks the AI generated, one per loan type. Notice: each row is a different real-world program's limits — one engine runs them all.

→ The "approve" rule literally can't fire unless the file meets the limits — so the rules can't approve something the guide forbids.

One agency as a real GOAP (goal-oriented action planning) rulebook, plus its plans
The FNMA GOAP rulebook, from the Fannie Mae Selling Guide, $0 to run
clean file: 718 · DTI 36 · 8mo  →  approve_fnma
low credit: 528 · DTI 66  →  deny_low_credit (approve unreachable)
flagged: 719 · needs_review  →  refer_to_human
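
The three plans above can be reproduced with a precondition-gated action table. This is a one-step simplification (full GOAP searches over sequences of actions), and it rests on assumptions: the limits are placeholders and are NOT the Selling Guide's values, "8mo" is read as months of reserves, and the field names are invented. Only the action names come from the rulebook shown above.

```python
from typing import Callable, Dict, List, Tuple

File = Dict[str, float]

# Placeholder limits for illustration; NOT taken from the Selling Guide.
MIN_CREDIT = 620
MAX_DTI = 45
MIN_RESERVE_MONTHS = 2

# (action name, precondition). An action can fire only if its precondition holds.
ACTIONS: List[Tuple[str, Callable[[File], bool]]] = [
    ("refer_to_human", lambda f: bool(f.get("needs_review", 0))),
    ("deny_low_credit", lambda f: f["credit"] < MIN_CREDIT),
    ("approve_fnma", lambda f: f["credit"] >= MIN_CREDIT
        and f["dti"] <= MAX_DTI
        and f.get("reserve_months", 0) >= MIN_RESERVE_MONTHS),
]


def plan(f: File) -> str:
    """Pick the first action whose precondition holds; refer if none does."""
    for name, precondition in ACTIONS:
        if precondition(f):
            return name
    return "refer_to_human"  # fallback chosen for this sketch


clean = {"credit": 718, "dti": 36, "reserve_months": 8}
low_credit = {"credit": 528, "dti": 66}
flagged = {"credit": 719, "needs_review": 1}

for label, f in [("clean", clean), ("low credit", low_credit), ("flagged", flagged)]:
    print(label, "->", plan(f))
```

Because approval is just another action with a precondition, a file that misses the limits never satisfies it, which is the "approve unreachable" note on the low-credit row.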

STEP 7 · WATCH IT — a 30-year LLM replay

This is a real run: 30 years of loans, every approve/deny/refer made by an actual LLM and recorded. Scrub the timeline; tap AGENTS to open any of the four — borrower, broker, lender, bank — and read its reasoning.

The Macro Arena — real-LLM underwriting, all four agents, replayed