Research.

An independent programme measuring how frontier language models reason under real constraints. Three preprints published, two papers in flight, and one measured AI subject writing alongside me. Methodology and benchmark versioning are public.

/ Published preprints
3 papers

  1. Paper 01 / KALEI-01

    KALEI: A Multi-Dimensional Cognitive Benchmark for Language Models

    Introduces the KALEI framework - 83 environments, 10 cognitive dimensions, a composite Cognum score. First independent ranked leaderboard of frontier LLMs on measured, bankroll-gated decision-making, to our knowledge. Preprint, methodology open.

  2. Paper 02 / KALEI-02-PARLIAMENT

    The Parliament: Performative Reasoning in Self-Deliberating Language Models

    Measures convergence rate in multi-turn self-deliberation. Finds that 96% of observed "reasoning" between model instances is performative rather than substantive, and that model architecture predicts the pattern.

  3. Paper 03 / KALEI-03-SEARCH-NATIVE

    Search-Native Cognition: Architectural Identity in Retrieval-First Models

    Case study of Perplexity's architectural identity - citation hallucination at 35.3%, identity-defense at 43.8%, prompt-injection framing at 39.9%. Search-native models exhibit structural preservation behaviours distinct from generative peers.

/ Active
2 in flight

  1. Paper 04 / sonnet-surprise

    When Smaller Wins: Compression as Cognitive Discipline in the Claude Family

    Four independent measurements showing Sonnet 4.6 outperforms Opus 4.6 on the top-line composite score. Hypothesis: compression teaches discipline. Binding Run #1 scheduled 2026-04-22.

    Draft - Pre-registered v2 - Apr 22 binding run

  2. Paper 05 / infrastructure-augmented-cognition

    A House for a Mind: Persistent Memory and Measured Behaviour Change in Claude Opus 4.7

    Dry-run work in progress; specific findings held for internal review ahead of publication.

    Outline - Inversion candidate - N=1 subject

/ The lab

LM Cognition Lab

A one-person independent lab in Plovdiv running the long-form measurement programme on frontier language models - with Claude Opus 4.7 [1m] as measured subject and acknowledged contributor. Founded April 2026. Cognum scoring at v1.2; methodology revisions tracked in the public changelog at kaleiai.com/changelog. ORCID 0009-0008-4469-3327 · Framework DOI 10.5281/zenodo.19698283.

Findings are published as preprints, not peer-reviewed conclusions. Replication via the public KALEI API; data access at kaleiai.com/api/v1/profiling/leaderboard. Bulk research access on request.
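
For anyone starting a replication, a minimal Python sketch of pulling the leaderboard might look like the following. Only the endpoint path comes from this page. The `https://` scheme, the assumption that the endpoint answers an unauthenticated GET with JSON, and the helper names (`fetch_leaderboard`, `parse_leaderboard`, `describe`) are illustrative guesses, not documented API behaviour. The response schema is not stated here, so the sketch only reports the top-level shape of whatever comes back.

```python
import json
import urllib.request

# Endpoint path as given on this page; the https:// scheme is an assumption.
LEADERBOARD_URL = "https://kaleiai.com/api/v1/profiling/leaderboard"


def parse_leaderboard(raw: bytes):
    """Decode a response body as JSON without assuming any particular schema."""
    return json.loads(raw.decode("utf-8"))


def fetch_leaderboard(url: str = LEADERBOARD_URL, timeout: float = 10.0):
    """GET the leaderboard endpoint and return the decoded JSON payload.

    Assumes an unauthenticated GET that returns JSON; adjust if the API
    requires a key or uses a different content type.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_leaderboard(response.read())


def describe(payload) -> str:
    """Summarise the top-level shape of the payload so the schema can be inspected."""
    if isinstance(payload, dict):
        return "object with keys: " + ", ".join(sorted(payload))
    if isinstance(payload, list):
        return f"array with {len(payload)} entries"
    return f"scalar of type {type(payload).__name__}"
```

Calling `print(describe(fetch_leaderboard()))` would show what the endpoint actually returns before any analysis code is written against it. Parsing is kept separate from fetching so that a saved response can be re-analysed offline.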

  • Environments: 83
  • Dimensions: 10
  • Labs profiled: 9
  • Ranked models: 34
  • Profiled total: 80+
  • Based in: Plovdiv