When should an AI's memory refuse to believe what it just saw?
A corroboration gate makes an AI memory immune to arbitrarily large poison, but the same mechanism is blind to sudden real change, and it only helps for unbounded-magnitude memory, not embedding recall. Measured, with the falsifiers.
Give an AI agent a long-running memory and you hit a dirty question: when a new observation contradicts what is stored, is it new truth (update fast) or poison (ignore it)? Every memory system answers this implicitly through its consolidation operator: the rule that turns repeated, noisy, sometimes-adversarial observations of a fact into one stored value. We measured what that choice costs. These are deliberately minimal, fully reproducible models (small simulations plus a check on real embeddings), not a benchmark of a production system; the goal is a law and a design rule.
The frontier nobody gets to skip
Track a value over time from noisy observations. Two failure modes pull opposite ways. Adapt fast to real change and a single adversarial spike yanks your estimate. Resist poison by averaging over a long window and you lag real change. Sweeping the decay rate of an exponential moving average (EWMA) traces a clean speed-robustness frontier. This part is textbook — the robust-statistics breakdown-point tradeoff (Huber; Hampel) and the median / trimmed-mean used in Byzantine-robust learning. We use it only as the baseline.
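The frontier can be reproduced in a few lines. The sketch below is a noise-free toy, not the simulation behind the numbers in this article: the function names, the spike size of 10, and the 20-step horizon are illustrative assumptions. For several decay rates it measures the peak displacement caused by one isolated spike and the average lag after a unit step change.

```python
def ewma(observations, alpha, init=0.0):
    """Exponentially weighted moving average; returns the estimate after each step."""
    estimate = init
    trace = []
    for x in observations:
        estimate = (1.0 - alpha) * estimate + alpha * x
        trace.append(estimate)
    return trace


def frontier_point(alpha, spike=10.0, horizon=20):
    """Return (poison_error, lag_error) for one decay rate (toy, noise-free)."""
    # Poison scenario: the truth stays at 0 and one isolated spike arrives.
    poison = [0.0] * horizon
    poison[5] = spike
    poison_error = max(abs(e) for e in ewma(poison, alpha))

    # Change scenario: the truth jumps from 0 to 1 at step 5 and stays there.
    change = [0.0] * 5 + [1.0] * (horizon - 5)
    trace = ewma(change, alpha)
    lag_error = sum(abs(1.0 - e) for e in trace[5:]) / (horizon - 5)
    return poison_error, lag_error


if __name__ == "__main__":
    for alpha in (0.05, 0.1, 0.3, 0.6):
        p, l = frontier_point(alpha)
        print(f"alpha={alpha:.2f}  poison_error={p:.2f}  lag_error={l:.2f}")
```

As alpha rises, the poison error rises and the lag error falls: no single setting minimizes both.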
The coupling that isn't textbook
A corroboration gate admits a new observation into the stored estimate only if several recent observations corroborate it, and rejects isolated outliers. In a minimal model it does something the frontier can't: it makes the breakdown bounded — as an adversarial spike's size grows, a fast EWMA's error grows without limit while the gate stays flat (it simply rejects the spike).
But the same mechanism that rejects an isolated poison spike must, by construction, also reject the first sample of a genuine sudden change — to a corroboration test the two are identical. Tightening the gate (demanding more corroboration) monotonically improves poison-robustness and monotonically worsens response to sudden change. One knob, opposite effects.
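A minimal sketch of such a gate makes the coupling concrete. Everything here is an assumption for illustration, not the model used for the measurements: the class name, the rule "at least k of the last `window` raw observations lie within `tol` of the new one", and the EWMA update applied to admitted observations. The streams are noise-free.

```python
from collections import deque


class CorroborationGate:
    """Admit an observation only if at least k recent observations agree with it."""

    def __init__(self, k=2, window=5, tol=1.0, alpha=0.5, init=0.0):
        self.k = k
        self.tol = tol
        self.alpha = alpha
        self.estimate = init
        self.recent = deque(maxlen=window)  # raw observations, admitted or not

    def update(self, x):
        # Count earlier observations that corroborate x.
        support = sum(1 for r in self.recent if abs(r - x) <= self.tol)
        self.recent.append(x)
        if support >= self.k:
            self.estimate += self.alpha * (x - self.estimate)
        return self.estimate


def max_error_under_burst(k, spike, burst=1):
    """Peak estimate error when the truth is 0 and `burst` identical spikes arrive."""
    gate = CorroborationGate(k=k)
    stream = [0.0] * 5 + [spike] * burst + [0.0] * 5
    return max(abs(gate.update(x)) for x in stream)


def rejected_after_jump(k, level=10.0, max_steps=20):
    """Number of genuine post-change samples rejected before the first admission."""
    gate = CorroborationGate(k=k)
    for x in [0.0] * 5:
        gate.update(x)
    for step in range(max_steps):
        if gate.update(level) != 0.0:
            return step
    return max_steps


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, max_error_under_burst(k, 100.0, burst=2), rejected_after_jump(k))
```

In this toy, an isolated spike is rejected whatever its size, a burst of identical spikes is rejected only while it is no longer than k, and a genuine jump is ignored for exactly k samples. Raising k buys burst tolerance and pays for it in response delay.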
When does a gate actually help?
The gate's advantage is bounded breakdown against an unbounded attack. So it pays off only when observations can be unboundedly large. Under realistic heavy-tailed noise, as the adversarial spike scales 30x (from x5 to x150), the gate's error stays flat at about 0.5 while the ungated operators blow up:
| adversarial spike scale | corroboration gate | EWMA(0.1) | mean |
|---|---|---|---|
| x5 | 0.56 | 0.84 | 0.20 |
| x15 | 0.50 | 2.28 | 0.56 |
| x150 | 0.52 | 22.3 | 5.53 |
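A noise-free back-of-envelope calculation shows why the ungated columns scale with the spike. The notation is an assumption introduced here: $S$ is the spike size, $\alpha$ the EWMA decay rate in the update $\hat{x}_t = (1-\alpha)\,\hat{x}_{t-1} + \alpha x_t$, and $n$ the number of observations a plain mean averages over. The peak displacement caused by one isolated spike is then

$$
\Delta_{\text{EWMA}} = \alpha S, \qquad \Delta_{\text{mean}} = \frac{S}{n}, \qquad \Delta_{\text{gate}} = 0 \ \text{(spike rejected)}.
$$

Both ungated displacements are linear in $S$, which is consistent with the roughly tenfold growth between x15 and x150 in the table (2.28 to 22.3 and 0.56 to 5.53). The calculation ignores noise, so it does not reproduce the gate's nonzero error floor.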
But most "AI memory" actually stores bounded unit-norm embeddings. We tested on real nomic embeddings of 240 real conversation turns, and there the gate does not escape the frontier: a tuned EWMA dominates it on both axes, and even a plain mean beats it on poison-robustness (though not on sudden-jump error). A unit-norm poison vector has bounded influence, so every operator already has bounded breakdown; the gate's rejection buys nothing and its novelty-blindness is pure cost.
| real embeddings | sudden-jump error | poison-robust error |
|---|---|---|
| EWMA a=0.08 | 0.13 | 0.07 |
| corroboration gate | 0.45 | 0.10 |
| mean | 0.61 | 0.01 |
(One operational note: raw nomic embeddings are anisotropic — all cosines compress to ~0.75-0.81 — so you must center them before any outlier logic has signal.)
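Centering here means subtracting the mean embedding before computing similarities. The sketch below uses three hand-made vectors as stand-ins for anisotropic embeddings (a large shared offset plus a small signal); they are assumptions for illustration, not nomic outputs, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def center(vectors):
    """Subtract the per-dimension mean of the collection from every vector."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


# Toy stand-ins: the first coordinate is a shared offset that dominates every cosine.
raw = [
    [5.0, 1.0, 0.0],   # item A
    [5.0, 0.9, 0.1],   # item B, close to A
    [5.0, -1.0, 0.0],  # item C, opposite to A in the signal coordinates
]
centered = center(raw)

if __name__ == "__main__":
    print("raw      A~B, A~C:", cosine(raw[0], raw[1]), cosine(raw[0], raw[2]))
    print("centered A~B, A~C:", cosine(centered[0], centered[1]), cosine(centered[0], centered[2]))
```

Before centering, the similar pair and the opposite pair both score above 0.9; after centering, only the similar pair stays high and the opposite pair turns negative, so a distance-based outlier test has something to work with.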
So the scope law: a corroboration gate Pareto-helps if and only if the observation magnitude is unbounded (heavy-tailed counts, scores, prices, durations). For bounded embedding recall, use a tuned decay. The coupling holds in both regimes; the benefit does not.
Escaping the coupling — into a latency floor
If one operator can't have both, use two. A two-channel consolidator — a corroboration-gated slow channel plus a fast channel, with a selector that switches to the fast channel only once a deviation has persisted for d steps — beats every single operator: bounded poison-robustness and fast response to sustained change.
But the coupling doesn't vanish; it becomes a detection-latency floor. With zero waiting (d=1) robustness collapses — you cannot tell an isolated spike from the onset of a real change until you see whether it persists:
| selector delay d | sudden-jump error | poison-robust error |
|---|---|---|
| 1 (no waiting) | 0.30 | 0.68 |
| 3 | 0.36 | 0.19 |
| 8 | 0.47 | 0.19 |
Telling poison from genuine novelty requires observing at least d corroborating steps. Architecture converts the robustness-novelty tradeoff into a fixed detection latency — escapable in design, irreducible in information.
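A two-channel consolidator with a persistence selector can be sketched as follows. This is a simplified stand-in under stated assumptions, not the model that produced the table: the slow channel is gated by distance to its own estimate (`tol`), the selector re-seeds the stored value from the fast channel after `d` consecutive deviating observations, and all names and parameter values are hypothetical. The streams are noise-free.

```python
class TwoChannelConsolidator:
    """Gated slow channel plus fast channel, switched by a persistence selector."""

    def __init__(self, d=3, tol=1.0, slow_alpha=0.1, fast_alpha=0.7, init=0.0):
        self.d = d
        self.tol = tol
        self.slow_alpha = slow_alpha
        self.fast_alpha = fast_alpha
        self.slow = init   # stored value, updated only by in-tolerance observations
        self.fast = init   # ungated EWMA that follows every observation
        self.streak = 0    # consecutive observations that deviate from the slow channel

    def update(self, x):
        self.fast += self.fast_alpha * (x - self.fast)
        if abs(x - self.slow) <= self.tol:
            self.streak = 0
            self.slow += self.slow_alpha * (x - self.slow)
        else:
            self.streak += 1
            if self.streak >= self.d:
                # The deviation has persisted for d steps: adopt the fast channel.
                self.slow = self.fast
                self.streak = 0
        return self.slow


def simulate(d, stream):
    consolidator = TwoChannelConsolidator(d=d)
    return [consolidator.update(x) for x in stream]


def poison_error(d, spike=100.0):
    """Peak error when the truth is 0 and one isolated spike arrives."""
    return max(abs(e) for e in simulate(d, [0.0] * 5 + [spike] + [0.0] * 5))


def lag_steps(d, level=10.0):
    """Post-change steps whose output is still more than 1.0 away from the new level."""
    trace = simulate(d, [0.0] * 5 + [level] * 8)
    return sum(1 for e in trace[5:] if abs(e - level) > 1.0)


if __name__ == "__main__":
    for d in (1, 3, 8):
        print(f"d={d}  poison_error={poison_error(d):.2f}  lag_steps={lag_steps(d)}")
```

With d=1 the isolated spike is adopted immediately; with d=3 or d=8 it is ignored, at the price of a longer wait before a sustained jump is accepted. One simplification to keep in mind: the streak counts any consecutive deviations, including ones that disagree with each other.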
If you build agent memory
- Don't gate embedding recall. Use a tuned decay (or a mean for stable facts). Center the embeddings first.
- Do gate unbounded-magnitude memory (counts, scores, prices, durations): a gate bounds the damage from arbitrarily large poison.
- For memory that must survive both poison and legitimate regime changes, use two channels with a persistence selector, and tune the confirmation delay to the expected poison-burst length. You cannot get robust novelty-detection at zero latency.
The falsifier. Every step shipped with a pre-committed falsifier: a real frontier must exist (fast adapts, slow resists); the gate must be Pareto-non-dominated for unbounded poison and dominated for bounded embeddings; tightening the gate must trade robustness against novelty monotonically; and the two-channel must beat every single operator while collapsing at zero delay. All four held; any one failing would have sunk the claim.
FAQ
When should an AI's memory refuse to believe a new observation? When it cannot yet tell new truth from poison — a contradicting observation is either a real update (adopt fast) or an attack (ignore). Every memory system answers this implicitly through its consolidation operator.
Does a corroboration gate bound the damage from an adversarial spike? Yes. The gate's error stays bounded (0.50 to 0.56) whether the adversarial spike is ×5, ×15, or ×150, while the error of EWMA(0.1) grows without bound (0.84 → 22.3), as does that of a plain mean (0.20 → 5.53).
When does corroboration-gating actually help? Only when observations can be unboundedly large — heavy-tailed counts, scores, prices, durations. For bounded values (e.g. unit-norm embedding recall) a tuned EWMA or mean already keeps breakdown bounded, so the gate adds little.
Is this a benchmark of a production system? No — it is small simulations plus a check on real embeddings, aimed at a law and a design rule, with every number reproducible.