The Model Already Knows: Why AI Agents Excel at What We Struggle With

Thomas Ptacek dropped a line last week that should make every security researcher pause: "You can't design a better problem for an LLM agent than exploitation research."

Before you feed it a single token of context, a frontier model already encodes supernatural amounts of correlation across vast bodies of source code. Is the Linux KVM hypervisor connected to the hrtimer subsystem, workqueue, or perf_event? The model knows.

This isn't magic. It's pattern recognition at a scale humans can't match.

The Knowledge Is Already There

Simon Willison's recent work illustrates the same principle from a different angle. He built a tool called scan-for-secrets using README-driven development: he wrote the documentation first, handed it to Claude Code, and told it to build the tool using red/green TDD. The tool scans directories for API keys and their encoded forms.

The approach worked because the model already understands:

What API keys look like
How to traverse filesystems safely
Common encoding schemes (base64, JSON escaping, URL encoding)
Testing patterns and assertion libraries

You don't teach it these things. You just describe what you want.
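The encoding idea is easy to see in miniature. The sketch below is not scan-for-secrets' actual code. It assumes a simple approach: expand each known secret into its raw form plus base64, JSON-escaped, and URL-encoded variants, then search each file for any of them. The function names `encoded_variants`, `scan_text`, and `scan_directory` are hypothetical and chosen for illustration.

```python
import base64
import json
from pathlib import Path
from urllib.parse import quote


def encoded_variants(secret: str) -> set[str]:
    """Return the raw secret plus a few common encodings of it (illustrative set)."""
    return {
        secret,
        # base64 of the UTF-8 bytes
        base64.b64encode(secret.encode("utf-8")).decode("ascii"),
        # JSON string escaping, minus the surrounding double quotes
        json.dumps(secret)[1:-1],
        # URL percent-encoding; safe="" also encodes "/"
        quote(secret, safe=""),
    }


def scan_text(text: str, secrets: list[str]) -> list[tuple[str, str]]:
    """Return (secret, matched_variant) pairs found in text."""
    hits = []
    for secret in secrets:
        # sorted() only makes the output order deterministic
        for variant in sorted(encoded_variants(secret)):
            if variant in text:
                hits.append((secret, variant))
    return hits


def scan_directory(root: str, secrets: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Scan every regular file under root and map each file path to its hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Treat files as UTF-8 text and drop undecodable bytes
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = scan_text(text, secrets)
        if hits:
            results[str(path)] = hits
    return results
```

For example, `scan_text("token: aHVudGVyMg==", ["hunter2"])` reports the base64 form of `hunter2` even though the plain string never appears. A real scanner needs more than this. A secret buried inside a longer base64 blob only encodes to the same characters at certain byte alignments, so naive substring matching can miss it.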

Why This Works for Some Problems and Not Others

Not every domain benefits equally from this "baked-in knowledge" advantage.

Where agents excel:

Security research (correlations across massive codebases)
SQL parsing and validation (syntaqlite handles SQLite's full grammar)
API client generation (patterns are well-documented and consistent)
Test-driven development (the loop is mechanical)

Where agents struggle:

Novel architecture decisions (no training examples exist yet)
Business logic that's organization-specific
Creative work requiring genuine originality
Anything requiring real-world context they can't access

The pattern is clear: agents thrive when the problem space is well-defined, well-documented, and benefits from correlating across vast datasets.

The Implication for Engineering Teams

If your job involves correlating information across large codebases, pattern-matching against known vulnerabilities, or implementing well-defined specifications, you're in the agent's wheelhouse.

The smart move isn't to compete. It's to:

Describe precisely what you want: README-driven development isn't just documentation; it's a specification language for agents
Focus on judgment calls: the decisions that require context no training set captured
Build tools that leverage agents: Syntaqlite Playground, for example, wraps a Python library compiled to WebAssembly so anyone can try SQL validation in a browser

The Step Function Is Coming

Ptacek's prediction is worth taking seriously: "Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won't be a slow burn, but rather a step function."

We've seen this before: the shift from manual testing to CI/CD pipelines, and the move from on-premises infrastructure to the cloud. Each step function made the previous approach economically noncompetitive.

The question isn't whether agents will change how we work. It's whether you'll be directing them or competing with them.

The best engineers I know aren't worried about being replaced. They're already writing the READMEs that will tell agents what to build next.
