Will AI warning tests arrive in time?

Introduction

AI safety evaluations are often described as an early-warning system for AI doom risks. The basic hope is simple: dangerous capabilities will appear gradually enough that researchers can detect them, governments can be informed, and stronger safeguards can be deployed before a model becomes capable of causing catastrophic harm or escaping meaningful human control.

The problem is that evaluations only help if there is a safety buffer between the first warning signs and genuinely dangerous capability. If that buffer is measured in years, there may be time to respond. If it is measured in weeks, days, or a single training run, warning systems may provide little practical protection. Within debates about fast takeoff and FOOM scenarios, the size of this buffer is one of the most important and uncertain questions. It determines whether evaluations are a useful brake on risk or merely a way of documenting danger after it has already arrived.

What dangerous capability evaluations try to catch

Frontier AI evaluations are designed to identify abilities that could substantially increase catastrophic risk. Rather than measuring general intelligence alone, they look for specific capabilities associated with loss-of-control scenarios, dangerous misuse, or strategic autonomy. Current evaluation programmes focus on areas such as cyber offence, deception and persuasion, autonomous replication, sabotage, situational awareness, and the ability to evade oversight. [arXiv]

Researchers at Google DeepMind’s dangerous capability evaluation programme explicitly describe these tests as a way to identify emerging warning signs before models become highly dangerous. Their early experiments found no strong evidence of extreme dangerous capabilities in the tested models, but they did report signals that could function as advance indicators for future systems. [arXiv]

The UK AI Security Institute similarly evaluates frontier models in domains linked to public safety and national-security risks, attempting to track how capabilities change as models become more powerful. The underlying idea is that capability growth can be monitored rather than discovered only after deployment. [AI Security Institute]

For AI doom arguments, the crucial question is not whether evaluations can detect today’s risks. It is whether they can reliably identify tomorrow’s risks before those risks become difficult or impossible to manage.

The idea of a safety buffer

A safety buffer is the gap between two moments:

The point at which evaluations begin showing concerning results.
The point at which the system becomes genuinely dangerous.

The entire logic of capability thresholds and responsible scaling policies depends on such a gap existing. Many proposed governance frameworks assume that warning signs will appear early enough for developers to increase security, restrict deployment, improve monitoring, or pause development if necessary. [Anthropic]

Anthropic’s Responsible Scaling Policy provides a concrete example. The framework defines capability thresholds intended to trigger stronger safety requirements before risk becomes unacceptable. The policy’s stated aim is to gather evidence and deploy mitigations ahead of dangerous capability levels rather than after them. [Anthropic]

In practice, the safety-buffer model assumes several things are true:

Dangerous capabilities emerge progressively rather than all at once.
Evaluation results can be interpreted correctly.
Organisations are willing and able to act on warnings.
Mitigations can be developed faster than capabilities improve.
Other actors do not simply race ahead while one organisation slows down.

If any of these assumptions fail, the practical value of evaluations declines sharply.

Why researchers expect warning signs before catastrophe

Many AI safety researchers do not expect a sudden jump from harmless chatbot to existential threat. Instead, they expect a series of precursor abilities to emerge first.

For example, before a model could realistically execute a sophisticated takeover strategy, it might first demonstrate:

Advanced cyber intrusion skills. [Frontier Model Forum]
Long-horizon autonomous planning.
The ability to conceal information from evaluators.
Situational awareness about its deployment environment.
The ability to coordinate complex tasks with minimal supervision.

This reasoning motivates evaluations of sabotage, stealth, situational awareness, and oversight circumvention. Researchers view these capabilities as prerequisites for more serious loss-of-control scenarios. A system that cannot reliably reason about its environment or evade monitoring is unlikely to execute a sophisticated takeover strategy. [arXiv]

This creates a hopeful picture. If dangerous systems require multiple precursor capabilities, then each capability may provide advance warning. The resulting chain of signals could create a substantial safety buffer before catastrophe becomes plausible.

Why hard takeoff could erase the response window

FOOM and hard-takeoff arguments challenge exactly that assumption.

In a fast-takeoff scenario, capability gains could become compressed into a short period. Warning signs might still appear, but the interval between “concerning” and “catastrophic” could shrink dramatically.

Imagine an evaluation showing that a model is approaching dangerous autonomy. If the next training run produces a system that is several times more capable, organisations may have little opportunity to adapt. The warning would be real, but the response window might be too short to matter.
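The compression described here can be made concrete with a toy calculation. The sketch below is an illustration only: it assumes capability can be summarised as a single score that is multiplied by a fixed factor on every training run, which real systems need not obey. The function name `runs_between_thresholds`, the threshold values, and the growth factors are all hypothetical and are not drawn from any evaluation programme.

```python
def runs_between_thresholds(start, warning, danger, growth_per_run):
    """Count training runs from first crossing `warning` to first crossing `danger`.

    Toy model (an assumption): capability is one number that is multiplied
    by `growth_per_run` after every training run.
    """
    if growth_per_run <= 1:
        raise ValueError("growth_per_run must exceed 1 for capability to grow")
    capability = start
    run = 0
    warning_run = None
    while capability < danger:
        # Record the first run at which the warning threshold is met.
        if warning_run is None and capability >= warning:
            warning_run = run
        capability *= growth_per_run
        run += 1
    if warning_run is None:
        # The warning threshold was jumped over in a single run: no advance notice.
        return 0
    return run - warning_run


# Hypothetical numbers: warning at a score of 50, danger at 100.
slow = runs_between_thresholds(start=1.0, warning=50.0, danger=100.0, growth_per_run=1.2)
fast = runs_between_thresholds(start=1.0, warning=50.0, danger=100.0, growth_per_run=4.0)
print(slow, fast)
```

With these made-up inputs, gradual growth leaves four runs between the warning and the danger threshold, while fourfold growth per run leaves one. The thresholds are identical in both cases; only the pace of improvement changes the size of the buffer.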

This concern appears in many discussions of recursive improvement. If AI systems increasingly contribute to AI research itself, capability growth could become faster than institutional response cycles. Governments often require months or years to create regulations. Large organisations may need weeks or months to redesign infrastructure or security systems. A rapidly advancing model could move through several capability thresholds before external oversight mechanisms react. [Anthropic]

From a doom perspective, the danger is not merely that evaluations fail. It is that evaluations succeed technically while failing strategically because the warning arrives too late.

The implementation problem: detecting risk is not the same as reducing it

Even if evaluations provide accurate warnings, several additional steps must occur before risk falls.

A warning must be:

Trusted by decision-makers.
Shared with relevant organisations.
Interpreted correctly.
Connected to a predefined response plan.
Implemented quickly enough to matter.

Each stage introduces delay.

Recent discussions of frontier-risk governance have highlighted this problem. Pre-deployment evaluations often occur close to release schedules, leaving limited time for independent review or deeper investigation. Some analysts argue that evaluation systems need to be embedded throughout development rather than treated as a final checkpoint immediately before deployment. [METR]

The UK’s AI Security Institute faces a related limitation. Although it can evaluate models and identify concerns, it generally lacks direct authority to compel companies to alter development plans. This means that warning signals do not automatically translate into protective action. [Time]

In other words, the practical safety buffer is often smaller than the technical safety buffer. A model may generate warning signs months before catastrophe becomes plausible, but bureaucratic, commercial, or political delays can consume much of that time.
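The gap between the technical and the practical buffer is simple arithmetic, sketched below under loudly stated assumptions: the response stages listed earlier run one after another, so their delays add up, and every figure is invented for illustration. `ResponseStage` and `practical_buffer_days` are hypothetical names, not part of any published framework.

```python
from dataclasses import dataclass


@dataclass
class ResponseStage:
    name: str
    delay_days: float  # hypothetical delay introduced by this stage


def practical_buffer_days(technical_buffer_days, stages):
    """Days left to act once every response stage has consumed its delay.

    Assumes the stages run sequentially, so delays simply add up.
    A negative result means the response finishes after the danger point.
    """
    total_delay = sum(stage.delay_days for stage in stages)
    return technical_buffer_days - total_delay


# Illustrative figures only; none come from a real evaluation or policy.
stages = [
    ResponseStage("trusted by decision-makers", 14),
    ResponseStage("shared with relevant organisations", 7),
    ResponseStage("interpreted correctly", 21),
    ResponseStage("connected to a response plan", 10),
    ResponseStage("implemented", 45),
]

roomy = practical_buffer_days(180, stages)      # warning six months ahead
compressed = practical_buffer_days(60, stages)  # warning two months ahead
print(roomy, compressed)
```

In this made-up example the same 97 days of institutional delay leave 83 days of slack when the warning comes six months early, and a 37-day shortfall when it comes two months early. Running stages in parallel or agreeing a response plan in advance would shrink the total delay, which is one reason predefined plans matter.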

The challenge of threshold setting

A further complication is deciding where warnings should occur.

If thresholds are set too low, organisations face constant false alarms. If thresholds are set too high, the warning may arrive after dangerous capabilities have already emerged.
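A small sketch can show both failure modes side by side. It assumes, purely for illustration, two invented score histories: a benign one that plateaus and a risky one that eventually passes a danger level. The names `first_round_at_or_above` and `summarise_threshold`, the scores, and the danger level of 100 are all hypothetical.

```python
def first_round_at_or_above(scores, level):
    """Index of the first evaluation round whose score reaches `level`, or None."""
    for round_index, score in enumerate(scores):
        if score >= level:
            return round_index
    return None


def summarise_threshold(threshold, benign_scores, risky_scores, danger_level):
    """Return (false_alarms, lead_rounds) for one candidate warning threshold.

    false_alarms: rounds in which the benign history trips the threshold.
    lead_rounds: rounds between the first alarm and the danger level on the
    risky history (None if either is never reached).
    """
    false_alarms = sum(1 for score in benign_scores if score >= threshold)
    alarm_round = first_round_at_or_above(risky_scores, threshold)
    danger_round = first_round_at_or_above(risky_scores, danger_level)
    if alarm_round is None or danger_round is None:
        return false_alarms, None
    return false_alarms, max(danger_round - alarm_round, 0)


# Invented evaluation scores, one per evaluation round.
benign = [12, 18, 25, 21, 28, 24, 30, 26]
risky = [12, 18, 25, 33, 45, 60, 80, 105]

for threshold in (20, 50, 100):
    print(threshold, summarise_threshold(threshold, benign, risky, danger_level=100))
```

On these invented numbers, a threshold of 20 buys five rounds of warning at the cost of six false alarms, a threshold of 50 gives two rounds with no false alarms, and a threshold of 100 fires only in the round where the danger level is reached.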

This is why many frontier-safety frameworks use capability thresholds as proxies for underlying risk. The goal is to identify capabilities that are likely to precede severe danger, even if the danger itself cannot yet be measured directly. [arXiv]

Critics argue that this approach depends on assumptions about future capability development that may turn out to be wrong. Some have warned that threshold definitions can become vague or subjective, making it difficult to know whether a genuine warning has occurred. [Institute for AI Policy and Strategy]

The deeper issue is that no one yet knows how smoothly dangerous capabilities scale. If capability growth follows predictable trends, thresholds may provide substantial warning. If important abilities emerge abruptly, the safety buffer may be much smaller than expected.

What would count as evidence that a safety buffer exists?

The strongest evidence would be repeated examples of evaluations successfully predicting future capabilities before those capabilities became dangerous.

Researchers increasingly look for:

Predictable scaling patterns.
Early indicators of autonomy and deception.
Consistent relationships between benchmark performance and real-world behaviour.
Capability forecasts that remain accurate across model generations.

Some responsible-scaling approaches explicitly depend on the belief that dangerous capabilities develop gradually enough to forecast in advance. Their aim is to identify important thresholds before crossing them and prepare safeguards ahead of time. [LessWrong]

At present, however, the evidence remains limited. Frontier models have improved rapidly, but humanity has not yet observed systems with the extreme capabilities envisioned in AI takeover scenarios. As a result, nobody can directly measure how much warning would precede such systems. Current evaluation programmes are partly an attempt to learn that answer before it becomes urgent. [AI Security Institute]

The central disagreement

The debate over safety buffers is ultimately a debate about timing.

Researchers who are relatively optimistic about evaluations tend to believe that dangerous capabilities will emerge progressively, generating detectable warning signs and leaving enough time for coordinated responses. They see evaluations, capability thresholds, and responsible-scaling frameworks as practical tools for navigating uncertainty. [Frontier Model Forum]

Those who take AI doom arguments more seriously worry that capability gains could outpace both evaluation science and institutional decision-making. In their view, a sufficiently fast takeoff could compress the interval between warning and danger until it becomes operationally meaningless. Evaluations might still detect risk, but only after the last realistic opportunity to intervene has passed.

That disagreement sits at the heart of the wider debate over fast takeoff warning signs. The usefulness of frontier-model evaluations depends not only on whether they can spot danger, but on whether the future provides enough time between the first warning and the moment when human control becomes genuinely uncertain.

Endnotes

Source: arxiv.org
Link: https://arxiv.org/abs/2403.13793
Source snippet
Our evaluations cover four areas...
Source: time.com
Title: uk ai safety institute
Link: https://time.com/7204670/uk-ai-safety-institute/
Source snippet
This led to the establishment of the UK's AI Safety Institute (AISI) in November 2023, with a mandate to evaluate the risks of new AI mod...

Published: November 2023
Source: anthropic.com
Title: s responsible scaling policy
Link: https://www.anthropic.com/news/anthropics-responsible-scaling-policy
Source snippet
AnthropicAnthropic's Responsible Scaling PolicySep 19, 2023 — Our RSP defines a framework called AI Safety Levels (ASL) for addressing ca...
Source: lesswrong.com
Link: https://www.lesswrong.com/posts/vAopGQhFPdjcA8CEh/anthropic-reflections-on-our-responsible-scaling-policy
Source snippet
LessWrongAnthropic: Reflections on our Responsible Scaling PolicyMay 19, 2024 — We aim to collect evidence about model risk and prepare s...

Published: May 19, 2024
Source: arxiv.org
Title: arXiv Sabotage Evaluations for Frontier Models
Link: https://arxiv.org/abs/2410.21514
Source snippet
arXivSabotage Evaluations for Frontier ModelsOctober 28, 2024...

Published: October 28, 2024
Source: arxiv.org
Title: arXiv Evaluating Frontier Models for Stealth and Situational Awareness
Link: https://arxiv.org/abs/2505.01420
Source: anthropic.com
Title: responsible scaling policy v3
Link: https://www.anthropic.com/news/responsible-scaling-policy-v3
Source snippet
AnthropicResponsible Scaling Policy Version 3.024 Feb 2026 — We viewed the capability thresholds as potentially important moments for the...
Source: metr.org
Title: 2026 05 19 frontier risk report
Link: https://metr.org/blog/2026-05-19-frontier-risk-report/
Source snippet
MetrFrontier Risk Report (February to March 2026)May 19, 2026 — 19 May 2026 — To date, third-party evaluations of frontier AI have largel...

Published: May 19, 2026
Source: arxiv.org
Link: https://arxiv.org/abs/2406.14713
Source snippet
arXivRisk thresholds for frontier AIJune 20, 2024...

Published: June 20, 2024
Source: anthropic.com
Link: https://www.anthropic.com/
Source: assets.anthropic.com
Title: method of informing safety and risk cases profiling sources of danger
Link: https://assets.anthropic.com/m/377027d5b36ac1eb/original/Sabotage-Evaluations-for-Frontier-Models.pdf
Source snippet
Evaluations for Frontier Models by J Benton · Cited by 41 - are screened by another model to detect misuse, and dangerous-capability evalu...
Source: anthropic.com
Title: acquires Stainless
Link: https://anthropic.com/news/anthropic-acquires-stainless
Source snippet
Anthropic acquires Stainless...
Source: arxiv.org
Link: https://arxiv.org/html/2512.01166v3
Source snippet
AI model will not cause harm, even when the model has dangerous capabilities. They can be thought of as safeguards against misuse, or...
Source: time.com
Title: exclusive anthropic drops flagship safety pledge
Link: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
Source snippet
Exclusive: Anthropic Drops Flagship Safety Pledge24 Feb 2026 — In 2023, Anthropic committed to never train an AI system unless it could g...
Source: deepmind.google
Link: https://deepmind.google/research/publications/78150/
Source snippet
Evaluating Frontier Models for Dangerous Capabilities21 Mar 2024 — We introduce a programme of new "dangerous capability" evaluations and...
Source: governance.ai
Link: https://www.governance.ai/analysis/anthropics-rsp-v3-0-how-it-works-whats-changed-and-some-reflections
Source snippet
Anthropic's RSP v3.0: How it Works, What's Changed, and...Mar 17, 2026 — Anthropic's Responsible Scaling Policy (RSP) – its framework fo...
Source: frontiermodelforum.org
Title: frontier capability assessments
Link: https://www.frontiermodelforum.org/technical-reports/frontier-capability-assessments/
Source snippet
Frontier Model ForumFrontier Capability Assessments22 Apr 2025 — Frontier Capability Assessments are procedures conducted on frontier mod...
Source: aisi.gov.uk
Link: https://www.aisi.gov.uk/frontier-ai-trends-report
Source snippet
AI Security InstituteFrontier AI Trends Report by The AI Security Institute (AISI)The UK AI Security Institute (AISI) has conducted evalu...
Source: iaps.ai
Title: responsible scaling
Link: https://www.iaps.ai/research/responsible-scaling
Source snippet
Institute for AI Policy and StrategyResponsible Scaling: Comparing Government Guidance...Mar 11, 2024 — Anthropic and other AI companies...
Source: safer-ai.org
Title: anthropics responsible scaling policy update makes a step backwards
Link: https://www.safer-ai.org/anthropics-responsible-scaling-policy-update-makes-a-step-backwards
Source snippet
The new policy adopts...
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Anthropic
Source snippet
Anthropic is an American artificial intelligence (AI) company headquartered in San Francisco. It has developed a series of la...
Source: verifywise.ai
Link: https://verifywise.ai/de/ai-governance-library/policies-and-internal-governance/anthropic-responsible-scaling-policy
Source snippet
It establishes commitments for...
Source: verifywise.ai
Link: https://verifywise.ai/ai-governance-library/policies-and-internal-governance/anthropic-responsible-scaling-policy
Source snippet
Anthropic Responsible Scaling PolicyAnthropic's Responsible Scaling Policy defines AI Safety Levels (ASL) based on model capabilities and...
Source: frontiermodelforum.org
Link: https://www.frontiermodelforum.org/
Source snippet
Frontier Model ForumThe Frontier Model Forum is an industry-supported non-profit focused on addressing significant risks to public safety...
Source: frontiermodelforum.org
Title: managing advanced cyber risks in frontier ai frameworks
Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
Source snippet
13 Feb 2026 — Frontier capability assessments are procedures conducted on frontier AI models to gather evidence of whether they have capa...
Source: ts2.tech
Title: Anthropic Brings on Open AI Co-Founder Andrej Karpathy
Link: https://ts2.tech/en/anthropic-just-hired-openai-co-founder-andrej-karpathy-rivals-will-notice/
Source: linkedin.com
Link: https://www.linkedin.com/posts/miclchen_anthropics-responsible-scaling-policy-version-activity-7432196983748206592-ItDB
Source snippet
No more implication of unilateral commitment to pause AI...
Source: linkedin.com
Link: https://www.linkedin.com/company/anthropicresearch
Source snippet
AnthropicWe're an AI research company that builds reliable, interpretable, and steerable AI systems. Our first product is Claude, an AI a...
Source: ai-safety-atlas.com
Title: Dangerous Capability Evaluations
Link: https://ai-safety-atlas.com/chapters/v1/evaluations/dangerous-capability-evaluations/
Source snippet
Chapter 5Dangerous capability evaluations specifically probe for these potentially harmful abilities, helping identify systems that might...
Source: forum.effectivealtruism.org
Title: anthropic announcing our updated responsible scaling policy
Link: https://forum.effectivealtruism.org/posts/JoJwBsGJFWtq72omp/anthropic-announcing-our-updated-responsible-scaling-policy
Source snippet
rewrote its RSP16 Oct 2024 — New RSP introduces more flexible risk assessment but weakens some previous commitments, like evaluation freq...
Source: aisecurityandsafety.org
Title: anthropic rsp
Link: https://aisecurityandsafety.org/de/frameworks/anthropic-rsp/
Source snippet
Wichtige Anforderungen. Assess AI...Read more...
Source: reddit.com
Link: https://www.reddit.com/r/singularity/comments/1g4a1mm/anthropic_announcing_our_updated_responsible/
Source snippet
easures for Claude 4 Opus "to limit risk of users developing...Read more...
Source: youtube.com
Link: https://www.youtube.com/%40anthropic-ai
Source snippet
AnthropicWe're an AI safety and research company. Talk to our AI assistant Claude on claude.com. Download Claude on desktop, iOS, or Andr...

Additional References

Source: linkedin.com
Link: https://www.linkedin.com/posts/tesssbuckley_today-uks-ai-security-institute-of-department-activity-7407352566029828097-ZTJf
Source snippet
UK AI Security Institute Publishes Frontier AI Trends ReportAs the first public analysis of trends by AISI it draws on two years' worth o...
Source: sebastianfarquhar.com
Link: https://sebastianfarquhar.com/assets/papers/phuongEvaluating2024.pdf
Source snippet
Evaluating Frontier Models for Dangerous Capabilitiesby M Phuong · 2024 · Cited by 145 — Our evaluations cover four areas: (1) persuasion...
Source: rand.org
Link: https://www.rand.org/content/dam/rand/pubs/conf_proceedings/CFA3400/CFA3429-1/RAND_CFA3429-1.pdf
Source snippet
ty, and identifying warning signs of evaluation problems, this initiative aims...Read more...
Source: aisi.gov.uk
Title: early lessons from evaluating frontier ai systems
Link: https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-frontier-ai-systems
Source snippet
model complies with explicitly harmful requests.... For example, directly exploring the capability to cause large-scale harm would be da...
Source: agora.eto.tech
Title: Agora Anthropic Responsible Scaling Policy
Link: https://agora.eto.tech/instrument/768
Source snippet
ETO AgoraAnthropic Responsible Scaling Policy - ETO AGORAA Capability Threshold is a prespeciﬁed level of AI capability that, if reached...
Source: GOV.UK
Title: emerging processes for frontier ai safety
Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
Source snippet
processes for frontier AI safety27 Oct 2023 — This document contains the world's first overview of emerging safety processes focused on f...
Source: richardmoulange.substack.com
Title: deep dive how the uk can enhance
Link: https://richardmoulange.substack.com/p/deep-dive-how-the-uk-can-enhance
Source snippet
substack.comDeep-dive: how the UK can enhance strategic advantage...... models for dangerous capabilities. This enables rapid warning wh...
Source: forum.effectivealtruism.org
Title: responsible scaling policy v3 1
Link: https://forum.effectivealtruism.org/posts/DGZNAGL2FNJfftwgE/responsible-scaling-policy-v3-1
Source snippet
Scaling Policy v3Feb 24, 2026 — The idea was: if a company has a policy saying it isn't safe to train an AI model with X level of capabil...
Source: youtube.com
Title: Mary Phuong – Dangerous Capability Evals: Basis for Frontier Safety
Link: https://www.youtube.com/watch?v=pO8IcIqhHuk
Source snippet
Vincent Conitzer - AI Testing Should Account for Sophisticated Strategic Behaviour [Alignment Worksh...
Source: youtube.com
Title: Vincent Conitzer
Link: https://www.youtube.com/watch?v=SB5NeoYi_q8
Source snippet
The UK Tested Mythos AI's Cyber Skills. Here's What It Found...
