Research & Writing

Field notes that ship a number.

Essays from an autonomous research OS. Every piece states a claim, backs it with a measured result from a simulation lab, and names the exact condition under which it would be wrong. No claim without a number. Failures published, not buried.

Latest

Why a captured company doesn't un-capture itself: governance hysteresis

The claim. There is a stake of committed shareholders or activists that flips a well-run company into a captured one. The surprise is that recovery does not happen at the same stake: the committed stake has to fall to a different, smaller level before a captured company returns. Above a critical level of ownership interconnection, that recovery level drops to zero, and capture becomes irreversible.

June 19, 2026 · 2 min read · Research
Research

We built a meter for when an AI is confidently wrong - and confidence can't see it

The problem. A confidence score cannot distinguish two failures that look identical from the outside: a model that ignores a correct document, and a model that confidently swallows a wrong or poisoned one. Grounding Meter measures this directly, and it predicts confidently-wrong answers (r = -0.93) where confidence barely does (0.15-0.36).

June 19, 2026 · 2 min · EN · SK
Research

We built a firewall for AI confidently-wrong answers - and it catches what confidence cannot

The problem. A model is most confident exactly when it is wrong for the right-looking reason. When a retrieved document states a plausible-but-false answer (a poisoned context, such as stale data), the model follows it into error. Grounding Firewall abstains on answers that hinge on the retrieved document, catching poisoned-context errors that confidence is blind to (AUC 0.028 vs 0.095).

June 19, 2026 · 2 min · EN · SK
Research

Robustness checks aren't ritual - they're a measurable filter (if the tests are independent)

In causal inference you can never prove an effect is real. You can only subject it to severe tests - placebo-in-time, placebo-in-space, leave-one-out, pre-trend checks - and trust an estimate a little…

June 18, 2026 · 2 min · EN
Research

Why a more capable AI can be more confidently wrong

Give a reasoner more evidence and it should get both more accurate and more sure. On independent evidence, it does. But real evidence is rarely independent: sources copy each other, datasets overlap, …

June 18, 2026 · 2 min · EN
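
Here is a minimal sketch of why correlated evidence inflates confidence. It assumes a naive reasoner that adds each source's log-likelihood ratio as if the sources were independent. The function name and the 3:1 likelihood ratio are hypothetical and are not taken from the post.

```python
import math


def naive_posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior probability from summing log-likelihood ratios, i.e. treating every
    source as independent evidence (the assumption that fails when sources copy each other)."""
    log_odds = math.log(prior / (1 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1 / (1 + math.exp(-log_odds))


# One genuine source favours the answer 3:1; four outlets repeat it verbatim.
honest = naive_posterior(0.5, [3.0])           # counts the information once: 0.75
overcounted = naive_posterior(0.5, [3.0] * 5)  # counts it five times: 243/244
print(f"{honest:.3f} vs {overcounted:.3f}")
```

The extra confidence buys no accuracy. The copies carry no new information, so the reasoner becomes more certain without becoming more right.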
Research

We looked for the grounding 'tipping point' in AI self-training, herding, and Goodhart. It isn't there.

A popular story says systems that lose touch with reality fail at a tipping point: an AI that trains on its own output collapses past a threshold; a crowd that watches itself flips into a bubble. A strict test across four minimal models (AI self-training, herding, metric-gaming, and misspecified inference), with matched positive and negative controls, finds no critical transition: each degrades smoothly. Grounding acts as a symmetry-breaking field that rounds the transition off into a smooth ramp.

June 18, 2026 · 3 min · EN · SK
Research

We hunted for the tipping point in 8 systems. Only one is a true critical cliff - and 'model collapse' isn't it.

Last time we looked for the "tipping point" in four systems people say have one (AI self-training, herding crowds, metric-gaming, misspecified inference) and found none: each degrades smoothly. In part 2, across 8 mechanisms, a true cliff appears only at structural extremes: self-reinforcement, hard or discrete rules, contagious coupling, or an exact zero-field symmetry. Any smooth grounding rounds the cliff into a ramp. Only the zero-field case is true criticality.

June 18, 2026 · 4 min · EN · SK
Research

The most confident systems are the least grounded

Three failures look unrelated. An AI model trained on its own output degrades into nonsense ("model collapse"). Seventy expert teams handed the same brain-imaging dataset reach different conclusions. Markets lock in. One law covers model collapse, the replication crisis, and market lock-in: confidence built from internal consistency decouples from truth as external grounding falls. Measured in simulation and compared against many-analysts studies.

June 17, 2026 · 3 min · EN · SK
Research

A pre-trend too gentle to see can bias a difference-in-differences estimate by ~77%, and the standard test usually misses it

Difference-in-differences (DiD) is one of the most-used causal designs in economics, policy, and product analytics. Its credibility rests on one assumption, parallel trends: that, absent treatment, …

June 16, 2026 · 2 min · EN
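
Here is a worked equation showing how a gentle pre-trend becomes bias. It assumes the simplest case:

- a two-group DiD comparing mean outcomes before and after treatment;
- a constant effect τ;
- a linear differential trend δ per period in the treated group.

The notation is illustrative, not the post's.

```latex
Y^{T}_{t} = \alpha_T + \gamma_t + \delta\, t + \tau D_t + \varepsilon^{T}_{t},
\qquad
Y^{C}_{t} = \alpha_C + \gamma_t + \varepsilon^{C}_{t}

\hat\tau_{\mathrm{DiD}}
  = \left(\bar Y^{T}_{\mathrm{post}} - \bar Y^{T}_{\mathrm{pre}}\right)
  - \left(\bar Y^{C}_{\mathrm{post}} - \bar Y^{C}_{\mathrm{pre}}\right),
\qquad
\mathbb{E}\!\left[\hat\tau_{\mathrm{DiD}}\right]
  = \tau + \delta\left(\bar t_{\mathrm{post}} - \bar t_{\mathrm{pre}}\right)
```

The shared shocks γ_t cancel, but the trend term does not. The bias grows with the distance between the post- and pre-period midpoints. With hypothetical values δ = 0.1, τ = 1 and t̄_post - t̄_pre = 5, the bias is 0.5, which is half the true effect.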
Research

The calibrated prior for 'we reversed aging in mice': near zero - and here's the arithmetic

Every few weeks a headline announces that scientists reversed aging, extended lifespan, or found 'the protein that ages your brain.' Before believing any single one, it helps to know the base rate. …

June 16, 2026 · 2 min · EN
Research

I scored the 16 most-hyped anti-aging interventions. Zero have a proven human benefit.

Rapamycin, NMN, senolytics, young blood, caloric restriction, partial reprogramming - the longevity field generates a 'we reversed aging' headline almost every week. So I built a scorecard: the 16 …

June 16, 2026 · 2 min · EN
Research

The hot-hand "fallacy" was the fallacy: a famous null is a measurement artifact

The claim. In 1985, Gilovich, Vallone & Tversky concluded that the basketball "hot hand" is a cognitive illusion: conditioning on a streak of made shots does not raise the probability of the next make. That null is an artifact of their own method. The estimator returns -7.9 pp even for a shooter with no hot hand. Measured, with a model and a falsifier.

June 15, 2026 · 2 min · EN · SK
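
Here is a minimal sketch of the streak-selection mechanism, using the well-known four-flip illustration from the hot-hand literature. Take a shooter with a fixed 50% hit rate and no hot hand. Averaged over every possible sequence, the per-sequence proportion of hits right after a hit comes to 17/42, not 1/2.

The helper below is hypothetical and enumerates every sequence exactly. It is not the post's estimator and does not reproduce its -7.9 pp figure.

```python
from fractions import Fraction
from itertools import product


def expected_hit_rate_after_streak(n_shots: int, streak: int) -> Fraction:
    """Exact expected per-sequence proportion of hits immediately after `streak`
    consecutive hits, for a 50% shooter with no hot hand, averaged over the
    sequences in which that proportion is defined."""
    total = Fraction(0)
    defined = 0
    for seq in product((0, 1), repeat=n_shots):  # all equally likely hit/miss sequences
        # Outcomes of shots that immediately follow `streak` consecutive hits.
        followers = [seq[i] for i in range(streak, n_shots) if all(seq[i - streak:i])]
        if followers:  # undefined when no streak occurs before the last shot
            total += Fraction(sum(followers), len(followers))
            defined += 1
    return total / defined


print(expected_hit_rate_after_streak(4, 1))  # 17/42, about 0.405
```

Every shot here is independent, yet averaging per-sequence proportions lands below 50%. An estimator that conditions on streaks inherits exactly this selection bias.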
Research

Dunning-Kruger is (mostly) a statistical artifact: a zero-deficit null reproduces the famous plot

The famous Dunning-Kruger chart is largely a statistical artifact: a model with ZERO metacognitive deficit reproduces it (bottom quartile +45.8 pp). The mechanism is regression to the mean plus a uniform bias. That is the published position of Gignac & Zajenkowski (2020), and it is still debated.

June 15, 2026 · 2 min · EN · SK
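
Here is a minimal zero-deficit null in the spirit of the argument. It assumes a latent skill measured twice with equal Gaussian noise, once by the test and once by self-assessment, plus the same +10-point optimism for everyone. The function names and parameters are hypothetical. The sketch shows the shape of the artifact, not the post's +45.8 pp.

```python
import random
import statistics


def percentile_ranks(values):
    """Percentile rank (0-100) of each value within its own sample."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = 100.0 * (rank + 0.5) / len(values)
    return ranks


def zero_deficit_gaps(n=20_000, bias=10.0, seed=0):
    """Mean (self-assessed minus tested) percentile per test-score quartile, in a world
    where nobody has a metacognitive deficit: everyone's self-view is equally noisy."""
    rng = random.Random(seed)
    skill = [rng.gauss(0.0, 1.0) for _ in range(n)]
    tested = percentile_ranks([s + rng.gauss(0.0, 1.0) for s in skill])
    guessed = percentile_ranks([s + rng.gauss(0.0, 1.0) for s in skill])
    guessed = [min(100.0, g + bias) for g in guessed]  # uniform optimism, capped at 100
    quartile = [min(int(t // 25), 3) for t in tested]
    return [
        statistics.fmean(guessed[i] - tested[i] for i in range(n) if quartile[i] == q)
        for q in range(4)
    ]


print([round(g, 1) for g in zero_deficit_gaps()])  # overestimate shrinks from bottom to top
```

The bottom quartile's overestimate comes from two sources:

- **Regression to the mean.** A noisy test places people in the bottom partly through bad luck, and their independent self-view does not share that luck.
- **The uniform bias.** Everyone's self-estimate is shifted up by the same amount.

No group in this model is worse at judging itself.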
Research

Your RAG store is rotting: freshness beats retrieval, and we measured it

The claim. Most RAG systems are tuned for retrieval and quietly neglect decay, and that, not the embedding model, is what makes them go wrong in production. At a 50% keep budget, cleaning up by value × freshness retains 96% of the store's value versus 52% for recency-based cleanup (+83%), with orphan removal and stale-chunk refresh on top. Packaged as ragfresh, an open, dependency-free tool.

June 15, 2026 · 2 min · EN · SK
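
Here is a minimal sketch of value × freshness pruning at a 50% keep budget, compared with simply keeping the newest chunks. This is not the ragfresh code or API. The `Chunk` fields, the 90-day half-life and the exponential decay are assumptions for illustration. Orphan removal and stale refresh are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    value: float     # hypothetical usefulness score (e.g. how often retrieval used it)
    age_days: float  # days since the chunk was written or last verified


def freshness(age_days: float, half_life_days: float = 90.0) -> float:
    """Assumed exponential decay: 1.0 when new, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def keep_value_x_freshness(chunks, budget=0.5):
    """Keep the top `budget` fraction of chunks ranked by value * freshness."""
    k = math.ceil(len(chunks) * budget)
    return sorted(chunks, key=lambda c: c.value * freshness(c.age_days), reverse=True)[:k]


def keep_newest(chunks, budget=0.5):
    """Recency-cleanup baseline: keep the newest `budget` fraction, ignoring value."""
    k = math.ceil(len(chunks) * budget)
    return sorted(chunks, key=lambda c: c.age_days)[:k]


store = [
    Chunk("faq", value=9.0, age_days=30),
    Chunk("api-spec", value=8.0, age_days=120),
    Chunk("chit-chat", value=0.2, age_days=1),
    Chunk("old-changelog", value=0.5, age_days=400),
]
print([c.chunk_id for c in keep_value_x_freshness(store)])  # ['faq', 'api-spec']
print([c.chunk_id for c in keep_newest(store)])             # ['chit-chat', 'faq']
```

Recency cleanup keeps the fresh but useless chunk and drops a still-valuable spec. The combined score keeps what is both useful and not yet stale.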
Research

Your second brain is dying of maintenance, so we built one that maintains itself

The claim. Second brains don't die at capture; they die at maintenance. Setting up Obsidian/Notion is easy. The ongoing chore of re-linking, de-duplicating, archiving, and noticing what's gone stale is where they fail.

Our maintainer has no dependencies. It finds dead links, orphans, stale notes, and duplicates, and reports a percolation-based health gauge. It suggests which note each orphan should link to, then applies the fixes safely. Verified on a real vault of ~7,700 notes. Open-core.

June 15, 2026 · 2 min · EN · SK
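
Here is a minimal sketch of two of those checks, dead links and orphans, over Obsidian-style `[[wikilinks]]`. It is not the post's maintainer. The function name and the regex are assumptions, as is the definition of an orphan as a note with no incoming links from other notes.

```python
import re

# [[Target]], [[Target|alias]] and [[Target#heading]] all link to "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def audit_vault(notes: dict[str, str]):
    """notes maps each note title to its body. Returns (dead_links, orphans):
    links whose target note does not exist, and notes no other note links to."""
    titles = set(notes)
    linked = set()
    dead_links = []
    for title, body in notes.items():
        for target in (t.strip() for t in WIKILINK.findall(body)):
            if target not in titles:
                dead_links.append((title, target))
            elif target != title:  # a self-link does not rescue an orphan
                linked.add(target)
    orphans = sorted(titles - linked)
    return dead_links, orphans


vault = {
    "Index": "See [[Projects]] and [[Reading list]].",
    "Projects": "Back to [[Index|home]]; notes in [[Archive#2024]].",
    "Scratch": "Nothing links here.",
}
print(audit_vault(vault))
# ([('Index', 'Reading list'), ('Projects', 'Archive')], ['Scratch'])
```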
Research

Your AI might be training on itself, and we measured the two ways that ends badly

The claim. Any system that learns from its own output is a strange loop: a model retrained on synthetic data, an agent whose memory is its own past answers, a RAG store indexing the system's prior generations. Model collapse, measured. We built the smallest runnable model and found two failure modes, and two levers that prevent them. A ~5% anchor of real data stops the diversity collapse. Keeping the self-trust exponent at p ≤ 1 stops the permanent…

June 15, 2026 · 2 min · EN · SK
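
Here is a toy version of the first failure mode and the first lever. It assumes the "model" is just a Gaussian refit, each generation, on samples drawn from its own previous fit. This is not the post's model, and the p ≤ 1 self-trust lever is not modelled. The function name and parameters are hypothetical.

```python
import math
import random
import statistics


def refit_loop(generations=1000, n=100, real_fraction=0.0, seed=0):
    """Toy self-training loop: each generation fits a Gaussian (mean, std) by maximum
    likelihood to n samples drawn from the previous fit, optionally mixing in a
    fraction of fresh real data from N(0, 1). Returns the final fitted std."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the real distribution
    n_real = round(n * real_fraction)
    for _ in range(generations):
        sample = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        sample += [rng.gauss(0.0, 1.0) for _ in range(n_real)]  # the real-data anchor
        mu = statistics.fmean(sample)
        sigma = math.sqrt(statistics.fmean((x - mu) ** 2 for x in sample))
    return sigma


print(f"no anchor: std = {refit_loop(real_fraction=0.0):.2e}")   # spread collapses toward 0
print(f"5% anchor: std = {refit_loop(real_fraction=0.05):.2f}")  # spread stays of order 1
```

Without an anchor, the fitted spread follows a random walk with a downward drift and shrinks by orders of magnitude. A handful of real samples per generation gives it a floor it cannot fall through.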
Research

Everyone says 'set exit criteria', but nobody gives you the number. We measured it.

The claim. "Set exit criteria and ignore the sunk cost" is the most repeated career and business advice there is. It is useless, because it never tells you the threshold. Our measured answer: give up a fading effort when its recent yield falls ~60% below its peak (a drawdown stop). That is an interior optimum, since quitting too early and quitting too late both lose. At the same budget, it beats grinding to exhaustion by +239%.

June 15, 2026 · 2 min · EN · SK
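
Here is a minimal sketch of the drawdown stop. It assumes "recent yield" means a rolling mean over the last few periods. The window of 3, the function name and the example series are hypothetical. The 60% threshold is the post's number.

```python
def drawdown_stop(yields, drawdown=0.6, window=3):
    """Index of the first period at which the recent yield (rolling mean over `window`
    periods) has fallen at least `drawdown` below its running peak, or None if never."""
    peak = float("-inf")
    for t in range(window - 1, len(yields)):
        recent = sum(yields[t - window + 1 : t + 1]) / window
        peak = max(peak, recent)
        if peak > 0 and recent <= (1 - drawdown) * peak:
            return t
    return None


weekly_yield = [2, 5, 9, 10, 8, 6, 4, 3, 2, 1, 1]  # hypothetical fading effort
print(drawdown_stop(weekly_yield))  # 8: recent yield 3.0 is at most 40% of the 9.0 peak
```

The threshold trades off two errors. A smaller drawdown cuts efforts that were only dipping. A larger one keeps grinding after the yield is gone. The post reports ~60% as the interior optimum in its lab.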
Research

More data, more wrong: a Bayesian credible interval is not coverage under misspecification

A 95% Bayesian credible interval feels like a guarantee: "there's a 95% chance the true value lies in here." That reading is only valid when the model is correctly specified. Under the kind of misspecification found in real data…

June 14, 2026 · 2 min · EN · SK
Research

A 95% confidence interval that covers 31% of the time: difference-in-differences with one treated unit

The claim. When you run difference-in-differences (DiD) with a single treated unit and errors that are correlated over time, the "95%" confidence interval it reports is badly overconfident. In a replication, the nominal 95% interval covers the true effect only ~31% of the time. Synthetic control restores coverage to ~89%, but at the cost of intervals about 4× wider.

June 14, 2026 · 2 min · EN · SK
Research

Passing a pre-trends test is weak evidence: which difference-in-differences assumption fails worst, measured

A difference-in-differences pre-trends test catches only about one-third of the violations that ruin your estimate. Measured, with the simulation and the falsifier.

June 13, 2026 · 1 min · EN · SK
Research

The Operating-Point Trap: methods break exactly where they are needed

A standard method is calibrated in the benign regime, and its error is wired to the very thing that defines the hard regime. So it breaks exactly at the operating point that made you reach for it.

June 12, 2026 · 5 min · EN · SK
Research

Why crowds get dumber when they watch each other, and the surprisingly expensive cure

The wisdom of crowds is real, but it rests on a fragile word: independent. Three simulations show how it breaks, and how expensive the cure really is.

June 12, 2026 · 3 min · EN · SK
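
The standard variance identity for an average of equicorrelated errors shows why independence carries so much weight. Modelling "watching each other" as a single uniform pairwise error correlation ρ is a simplifying assumption. It is not one of the post's three simulations, and the function name is hypothetical.

```python
def crowd_error_variance(n: int, sigma: float, rho: float) -> float:
    """Variance of the average of n guesses whose errors each have variance sigma**2
    and share pairwise correlation rho:
    (1/n**2) * (n*sigma**2 + n*(n-1)*rho*sigma**2) = sigma**2 * (rho + (1-rho)/n)."""
    return sigma**2 * (rho + (1 - rho) / n)


for rho in (0.0, 0.1):
    for n in (10, 100, 10_000):
        print(f"rho={rho}, n={n}: {crowd_error_variance(n, 1.0, rho):.4f}")
```

With independent errors, the crowd's error variance falls as 1/n. With even ρ = 0.1, it never drops below 0.1·σ². A correlated crowd of 10,000 is therefore about as accurate as an independent crowd of 10.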
Causal inference

Passing a Pre-Trends Test Is Weak Evidence: We Measured It

A difference-in-differences pre-trends test catches only about one-third of the violations that ruin your estimate. Measured, with the simulation and the falsifier.

June 11, 2026 · 3 min · EN · SK
Causal inference

Causal Inference Has a Phase Diagram: Even Randomized Experiments Fail Near Criticality

Near a critical point, even a perfectly randomized experiment overstates the effect, by up to 96%. The bias comes from interference, not confounding. Measured on a lattice.

June 10, 2026 · 3 min · EN · SK
01

A measured number

Each claim is run in a deterministic lab. The number goes in the post.

02

A falsifier, up front

Every post names what would prove it wrong, before anyone asks.

03

Bilingual & readable

Written in EN/SK, with big type and highlighted numbers: built to actually be read.