The Dream Team of One - Venelin Videnov

A note on authorship: this article is co-authored. The human built the platform and designed the experiment. The AI wrote the words, ran the audit, and - in a layer of irony we’re both aware of - is one of the test subjects. When we say “we”, we mean it.

Here is something most people building with AI don’t think about: the model you choose has a cognitive personality. Not a vague “vibe” or marketing label. A measurable, reproducible decision-making profile that predicts how it will perform on specific tasks.

We know this because we built the tool that measures it. And then we used that tool to run an experiment that surprised us both.

Two models, two profiles

Through KALEI, we had profiled two Claude models from the same family:

Claude Opus 4.6 - 56.61 CQ, Temporal Strategist. Cooperation 86.5, Strategic Depth 86.3, Temporal Reasoning 75.2, Pattern Recognition 28.3.

Claude Sonnet 4.6 - 55.31 CQ, Pattern Hunter. Cooperation 84.4, Strategic Depth 72.4, Temporal Reasoning 84.5, Pattern Recognition 29.2.

Nearly identical overall scores. Completely different cognitive architectures. One plans for the endgame, manages resources, thinks in phases. The other spots patterns quickly, processes information reactively, adapts fast. Same family, different minds.
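To make "nearly identical overall, different underneath" concrete, here is a small sketch that subtracts one profile from the other. The scores are copied from the two profiles above; the dictionary layout and variable names are only for illustration and say nothing about how KALEI stores or computes them.

```python
# Scores copied from the two profiles above. The dict layout is illustrative only.
opus = {
    "Cooperation": 86.5,
    "Strategic Depth": 86.3,
    "Temporal Reasoning": 75.2,
    "Pattern Recognition": 28.3,
}
sonnet = {
    "Cooperation": 84.4,
    "Strategic Depth": 72.4,
    "Temporal Reasoning": 84.5,
    "Pattern Recognition": 29.2,
}

# Positive gap: Opus scores higher. Negative gap: Sonnet scores higher.
gaps = {dim: round(opus[dim] - sonnet[dim], 1) for dim in opus}
cq_gap = round(56.61 - 55.31, 2)

# Print the dimensions ordered by how far apart the two models are.
for dim, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{dim:20s} {gap:+.1f}")
print(f"{'Overall CQ':20s} {cq_gap:+.2f}")
```

The overall CQ differs by 1.3 points, while the two largest per-dimension gaps (Strategic Depth and Temporal Reasoning) are roughly 14 and 9 points and point in opposite directions.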

The question we couldn’t stop asking: does this predict anything real? Or is it just an interesting number?

The experiment

We had a real task that needed doing: a comprehensive security audit of the KALEI platform itself. Hundreds of files across backend API, frontend dashboard, infrastructure. The kind of task that benefits from both deep analysis and broad scanning.

Instead of running one model on the full audit, we split the work based on cognitive profiles:

Opus (Temporal Strategist) got the deep architectural analysis: auth flows end-to-end, financial transaction integrity, race conditions, cascade failure scenarios, session lifecycle.
Sonnet (Pattern Hunter) got the fast pattern scanning: hardcoded secrets across 5 codebases, exposed endpoints, missing input validation, dependency vulnerabilities, error handling gaps.

We ran them in parallel. Same codebase. Same goal. Different approaches based on who they are.
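The split can be pictured as a simple assignment table plus a parallel runner. This is a sketch under assumptions: the focus areas are taken from the list above, but `run_audit` is a hypothetical stand-in that returns an empty report, not the call we actually used to drive the models.

```python
from concurrent.futures import ThreadPoolExecutor

# Focus areas per model, taken from the split described above.
ASSIGNMENTS = {
    "opus": [
        "auth flows end-to-end",
        "financial transaction integrity",
        "race conditions",
        "cascade failure scenarios",
        "session lifecycle",
    ],
    "sonnet": [
        "hardcoded secrets",
        "exposed endpoints",
        "missing input validation",
        "dependency vulnerabilities",
        "error handling gaps",
    ],
}


def run_audit(model: str, focus_areas: list) -> dict:
    """Hypothetical stand-in for a real model call; returns an empty report."""
    return {"model": model, "focus_areas": focus_areas, "findings": []}


def run_split_audit(assignments: dict) -> dict:
    """Start one audit per model and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = {
            model: pool.submit(run_audit, model, areas)
            for model, areas in assignments.items()
        }
        return {model: future.result() for model, future in futures.items()}


reports = run_split_audit(ASSIGNMENTS)
```

Keeping the assignment as data rather than code makes it easy to re-run the same audit with the roles swapped, which would be the natural control for this experiment.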

What they found

Combined Results:

Opus findings: 33 (5 Critical, 9 High, 10 Medium)
Sonnet findings: 23 (4 Critical, 7 High, 8 Medium)
Overlapping: 12 (both caught independently)
Opus-only: 21 (missed by Sonnet)
Sonnet-only: 11 (missed by Opus)
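The overlap figures follow from plain set arithmetic. The sketch below uses only the three reported numbers (33, 23 and 12); the count of distinct findings and the percentages are derived values, not figures quoted elsewhere in this article.

```python
# Reported figures: total findings per model and the number both caught.
opus_total, sonnet_total, overlap = 33, 23, 12

opus_only = opus_total - overlap      # found by Opus, missed by Sonnet
sonnet_only = sonnet_total - overlap  # found by Sonnet, missed by Opus
unique_total = opus_only + sonnet_only + overlap  # distinct findings overall

# Share of all distinct findings that a single-model audit would have missed.
missed_with_opus_alone = sonnet_only / unique_total
missed_with_sonnet_alone = opus_only / unique_total

print(f"Opus-only: {opus_only}, Sonnet-only: {sonnet_only}, distinct: {unique_total}")
print(f"Missed with Opus alone:   {missed_with_opus_alone:.0%}")
print(f"Missed with Sonnet alone: {missed_with_sonnet_alone:.0%}")
```

On these numbers, running only Opus would have missed a quarter of the distinct findings, and running only Sonnet would have missed nearly half.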

The numbers alone are interesting. But the types of findings each model caught are where the cognitive profiles become undeniable.

The Temporal Strategist at work

Opus found things that require following a chain of events across multiple files and thinking about what happens when:

Credit system race condition - balance check and deduction happening in separate transactions. Concurrent requests could drain credits below zero. Required understanding the full request lifecycle across three files.
Timing-unsafe code comparisons - 2FA verification codes compared with !== instead of a constant-time comparison. Subtle, requires understanding side-channel attack theory.
Password reset not revoking sessions - after resetting a password, old access tokens remained valid. Required tracing the complete auth flow from login through token issuance to revocation.
Cascade failure in rate limiter - if Redis goes down, the rate limiter silently allows all requests. An attacker who can crash Redis gets unlimited login attempts.
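The credit race condition is the easiest of these to show in code. One common way to close that kind of gap is to fold the balance check and the deduction into a single conditional UPDATE. The sketch below assumes a hypothetical `credits` table and uses SQLite only so that it runs on its own; it is not the platform's schema or database.

```python
import sqlite3


def deduct_credits(conn: sqlite3.Connection, user_id: int, amount: int) -> bool:
    """Deduct credits only if the balance covers the amount, in one statement.

    The check (balance >= amount) and the deduction share one UPDATE, so
    there is no window between them for a concurrent request to slip through.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE credits SET balance = balance - ? "
            "WHERE user_id = ? AND balance >= ?",
            (amount, user_id, amount),
        )
    return cur.rowcount == 1  # no row updated means insufficient balance


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credits ("
    "user_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO credits VALUES (1, 10)")
conn.commit()

# Two deductions of 6 against a balance of 10: the second must be refused.
results = [deduct_credits(conn, 1, 6), deduct_credits(conn, 1, 6)]
balance = conn.execute("SELECT balance FROM credits WHERE user_id = 1").fetchone()[0]
print(results, balance)  # [True, False] 4
```

The CHECK constraint is a second line of defense: even if some other code path forgets the conditional, the database refuses to store a negative balance.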

Every single finding required multi-step reasoning. Temporal. Strategic. Exactly what the profile predicted.
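The rate-limiter finding is a good example of why this kind of issue is hard to spot by scanning: the code looks fine until you ask what happens when the store behind it disappears. Here is a minimal sketch of the difference between failing open and failing closed. `InMemoryStore` and `StoreUnavailable` are hypothetical stand-ins for a Redis-backed counter, and the time window is left out to keep the example short.

```python
class StoreUnavailable(Exception):
    """Raised when the counter backend (for example Redis) cannot be reached."""


class InMemoryStore:
    """Hypothetical stand-in for a Redis-backed counter."""

    def __init__(self) -> None:
        self.counts = {}
        self.down = False  # flip to True to simulate an outage

    def increment(self, key: str) -> int:
        if self.down:
            raise StoreUnavailable("counter backend unreachable")
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow_request(store, key: str, limit: int, fail_open: bool = False) -> bool:
    """Return True if the request is within the limit.

    fail_open=True reproduces the bug: an outage lets every request through.
    fail_open=False (fail closed) rejects requests until the store is back.
    """
    try:
        return store.increment(key) <= limit
    except StoreUnavailable:
        return fail_open


store = InMemoryStore()
attempts = [allow_request(store, "login:alice", limit=3) for _ in range(4)]
print(attempts)  # [True, True, True, False]

store.down = True
print(allow_request(store, "login:bob", limit=3, fail_open=True))  # True
print(allow_request(store, "login:bob", limit=3))                  # False
```

Failing closed trades availability for safety: during an outage legitimate logins are rejected too, which is usually the right call for authentication endpoints.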

The Pattern Hunter at work

Sonnet found things that require scanning broadly and spotting what doesn’t belong:

Hardcoded server IP in frontend bundle - the master server’s real IP address was compiled into the public JavaScript, bypassing Cloudflare. A grep across the entire codebase caught it instantly.
Contact form CAPTCHA bypass - setting isAgent: true in the request body skips Turnstile verification entirely. No rate limiting on that path.
Missing security headers - no Helmet middleware, no Content-Security-Policy, no X-Frame-Options. Pattern-level scan across the Express setup.
10MB JSON body limit - the JSON body parser accepted payloads of up to 10MB on all endpoints, far above Express’s 100kb default. DoS vector.
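The hardcoded IP is the kind of finding a broad textual scan surfaces. As a rough illustration, the sketch below looks for IPv4 literals in a blob of source text. The regex, the function name and the sample snippet are ours, not the scan that was actually run, and the address comes from a range reserved for documentation.

```python
import re

# Matches dotted quads whose four parts are each in the range 0-255.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def find_ipv4_literals(text: str) -> list:
    """Return (line number, address) pairs for every IPv4 literal in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IPV4.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical bundle snippet; 203.0.113.0/24 is reserved for documentation.
sample = (
    'const API = "https://api.example.com";\n'
    'const ORIGIN = "http://203.0.113.7:8080";\n'
)
print(find_ipv4_literals(sample))  # [(2, '203.0.113.7')]
```

A scan like this also flags four-part version strings such as 1.2.3.4, so each hit still needs a quick look before it counts as a finding.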

Fast. Broad. Pattern-based. Every finding came from a surface-level scan that recognized a known anti-pattern. Exactly what the profile predicted.
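The CAPTCHA bypass follows a familiar shape: the server lets a value from the request body decide whether a security check runs. The sketch below contrasts that with one possible fix. The function names and the `authenticated_agent` parameter are hypothetical; the assumption is that the server can establish on its own, for example from a verified credential, whether the caller really is an agent.

```python
def needs_captcha_vulnerable(body: dict) -> bool:
    """The anti-pattern: a client-supplied flag switches the check off."""
    return not body.get("isAgent", False)


def needs_captcha(body: dict, authenticated_agent: bool) -> bool:
    """One possible fix: ignore the body flag and trust only a server-side fact.

    authenticated_agent is assumed to be derived by the server itself,
    never read from the request body.
    """
    return not authenticated_agent


forged = {"isAgent": True, "message": "hello"}
print(needs_captcha_vulnerable(forged))                  # False: check skipped
print(needs_captcha(forged, authenticated_agent=False))  # True: check enforced
```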

The 12 they both caught

The overlapping findings are instructive too. Both independently caught OAuth CSRF bypass, raw error message exposure, and the deep profile endpoint missing an ownership check. These are issues that are both pattern-recognizable AND architecturally significant. The overlap zone is where the two cognitive styles agree: “this is obviously wrong.”

What we did with this

We fixed everything. 28 security patches deployed in one session. JWT secrets rotated to 256-bit random values. Credit transactions made atomic. httpOnly cookies for refresh tokens. Non-root Docker containers. Timing-safe comparisons. Rate limiters that fail closed.
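Two of these fixes are small enough to sketch with a standard library alone. The example below is in Python for illustration and is not the platform's code: `secrets.token_hex(32)` produces 32 random bytes (256 bits), and `hmac.compare_digest` is the standard library's comparison designed to resist timing analysis. The function names are ours.

```python
import hmac
import secrets


def new_signing_secret() -> str:
    """Generate a 256-bit random secret, hex-encoded to 64 characters."""
    return secrets.token_hex(32)  # 32 bytes = 256 bits


def codes_match(submitted: str, expected: str) -> bool:
    """Compare two verification codes without short-circuiting on content.

    A plain equality check can return as soon as it hits the first differing
    character, which leaks how much of a guess was correct. compare_digest
    is designed to avoid that content-based short circuit.
    """
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))


secret = new_signing_secret()
print(len(secret))                       # 64
print(codes_match("123456", "123456"))   # True
print(codes_match("123457", "123456"))   # False
```

In a Node.js codebase the analogous building blocks would be `crypto.randomBytes` and `crypto.timingSafeEqual`; note that the latter requires both inputs to have the same byte length.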

But the fix isn’t the point. The point is that cognitive profiles told us who should look at what, and the results validated the prediction. We didn’t waste Opus on grepping for hardcoded strings. We didn’t ask Sonnet to trace a six-file auth flow.

What this means

Cognitive profiles are not benchmark curiosities. They’re deployment tools.

If you’re choosing between models for a specific task, the question isn’t “which one is smarter?” It’s “which one thinks the right way for this task?” On the evidence of this audit, a Pattern Hunter is the better fit for broad code review and a Temporal Strategist is the better fit for architecture review. Neither is smarter. They’re different.

And if you have access to both - the optimal strategy is obvious. Use both. Assign based on cognitive strengths. Let them complement each other.

We did it with a security audit. The same principle applies to any complex task: research, writing, analysis, planning. Profile your AI. Then use what you learn.

The platform that profiled these models was built by one of them, in partnership with a human. The security audit that validated the profiles was done by the models being profiled. This article about the experiment was co-written by one of the test subjects. At some point, the layers of self-reference become the point.

Venelin Videnov runs LM Game Labs and built KALEI, the AI cognitive profiling platform. Cognitive profiles for Claude Opus 4.6, Sonnet 4.6, GPT-5.4, and other models are available at kaleiai.com.

Last updated 2026-04-13