Will AI warning tests arrive in time?

AI safety evaluations only help if warning signs appear far enough before dangerous autonomy for labs and governments to act.

Introduction

AI safety evaluations are often described as an early-warning system for AI doom risks. The basic hope is simple: dangerous capabilities will appear gradually enough that researchers can detect them, governments can be informed, and stronger safeguards can be deployed before a model becomes capable of causing catastrophic harm or escaping meaningful human control.

The problem is that evaluations only help if there is a safety buffer between the first warning signs and genuinely dangerous capability. If that buffer is measured in years, there may be time to respond. If it is measured in weeks, days, or a single training run, warning systems may provide little practical protection. Within debates about fast takeoff and FOOM scenarios, the size of this buffer is one of the most important and uncertain questions. It determines whether evaluations are a useful brake on risk or merely a way of documenting danger after it has already arrived.

What dangerous capability evaluations try to catch

Frontier AI evaluations are designed to identify abilities that could substantially increase catastrophic risk. Rather than measuring general intelligence alone, they look for specific capabilities associated with loss-of-control scenarios, dangerous misuse, or strategic autonomy. Current evaluation programmes focus on areas such as cyber offence, deception and persuasion, autonomous replication, sabotage, situational awareness, and the ability to evade oversight. [arXiv]

Researchers at Google DeepMind’s dangerous capability evaluation programme explicitly describe these tests as a way to identify emerging warning signs before models become highly dangerous. Their early experiments found no strong evidence of extreme dangerous capabilities in the tested models, but they did report signals that could function as advance indicators for future systems. [arXiv]

The UK AI Security Institute similarly evaluates frontier models in domains linked to public safety and national-security risks, attempting to track how capabilities change as models become more powerful. The underlying idea is that capability growth can be monitored rather than discovered only after deployment. [AI Security Institute]

For AI doom arguments, the crucial question is not whether evaluations can detect today’s risks. It is whether they can reliably identify tomorrow’s risks before those risks become difficult or impossible to manage.

The idea of a safety buffer

A safety buffer is the gap between two moments:

  1. The point at which evaluations begin showing concerning results.
  2. The point at which the system becomes genuinely dangerous.

The entire logic of capability thresholds and responsible scaling policies depends on such a gap existing. Many proposed governance frameworks assume that warning signs will appear early enough for developers to increase security, restrict deployment, improve monitoring, or pause development if necessary. [Anthropic]

Anthropic’s Responsible Scaling Policy provides a concrete example. The framework defines capability thresholds intended to trigger stronger safety requirements before risk becomes unacceptable. The policy’s stated aim is to gather evidence and deploy mitigations ahead of dangerous capability levels rather than after them. [Anthropic]

In practice, the safety-buffer model assumes several things are true:

  • Dangerous capabilities emerge progressively rather than all at once.
  • Evaluation results can be interpreted correctly.
  • Organisations are willing and able to act on warnings.
  • Mitigations can be developed faster than capabilities improve.
  • Other actors do not simply race ahead while one organisation slows down.

If any of these assumptions fail, the practical value of evaluations declines sharply.

Why researchers expect warning signs before catastrophe

Many AI safety researchers do not expect a sudden jump from harmless chatbot to existential threat. Instead, they expect a series of precursor abilities to emerge first.

For example, before a model could realistically execute a sophisticated takeover strategy, it might first demonstrate:

  • Advanced cyber intrusion skills. [frontiermodelforum.org]
  • Long-horizon autonomous planning.
  • The ability to conceal information from evaluators.
  • Situational awareness about its deployment environment.
  • The ability to coordinate complex tasks with minimal supervision.

This reasoning motivates evaluations of sabotage, stealth, situational awareness, and oversight circumvention. Researchers view these capabilities as prerequisites for more serious loss-of-control scenarios. A system that cannot reliably reason about its environment or evade monitoring is unlikely to execute a sophisticated takeover strategy. [arXiv]

This creates a hopeful picture. If dangerous systems require multiple precursor capabilities, then each capability may provide advance warning. The resulting chain of signals could create a substantial safety buffer before catastrophe becomes plausible.

Why hard takeoff could erase the response window

FOOM and hard-takeoff arguments challenge exactly that assumption.

In a fast-takeoff scenario, capability gains could become compressed into a short period. Warning signs might still appear, but the interval between “concerning” and “catastrophic” could shrink dramatically.

Imagine an evaluation showing that a model is approaching dangerous autonomy. If the next training run produces a system that is several times more capable, organisations may have little opportunity to adapt. The warning would be real, but the response window might be too short to matter.
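
A toy calculation makes the compression concrete. The sketch below assumes, purely for illustration, that capability grows exponentially at a constant rate, so the time between a warning level and a danger level is ln(danger / warning) / rate. The function name, the levels, and the growth rates are all hypothetical and are not drawn from any evaluation programme.

```python
import math


def buffer_time(warning_level: float, danger_level: float, growth_rate: float) -> float:
    """Time for capability to grow from warning_level to danger_level.

    Assumes exponential growth: capability(t) = capability(0) * exp(growth_rate * t).
    The result is in whatever time unit growth_rate is expressed in.
    """
    if not (0 < warning_level < danger_level):
        raise ValueError("need 0 < warning_level < danger_level")
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    return math.log(danger_level / warning_level) / growth_rate


# Hypothetical numbers: danger sits at 4x the capability that first triggers a warning.
slow = buffer_time(1.0, 4.0, growth_rate=0.5)  # slower growth, rate per year
fast = buffer_time(1.0, 4.0, growth_rate=5.0)  # ten times faster growth

print(f"slow takeoff buffer: {slow:.2f} years")
print(f"fast takeoff buffer: {fast:.2f} years")
```

Under this assumed model the buffer shrinks in direct proportion to the growth rate: the same warning fires at the same capability level, but ten times faster growth leaves one tenth of the time to respond. Real capability growth need not be exponential or smooth, which is part of what the debate is about.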

This concern appears in many discussions of recursive improvement. If AI systems increasingly contribute to AI research itself, capability growth could become faster than institutional response cycles. Governments often require months or years to create regulations. Large organisations may need weeks or months to redesign infrastructure or security systems. A rapidly advancing model could move through several capability thresholds before external oversight mechanisms react. [Anthropic]

From a doom perspective, the danger is not merely that evaluations fail. It is that evaluations succeed technically while failing strategically because the warning arrives too late.

The implementation problem: detecting risk is not the same as reducing it

Even if evaluations provide accurate warnings, several additional steps must occur before risk falls.

A warning must be:

  • Trusted by decision-makers.
  • Shared with relevant organisations.
  • Interpreted correctly.
  • Connected to a predefined response plan.
  • Implemented quickly enough to matter.

Each stage introduces delay.

Recent discussions of frontier-risk governance have highlighted this problem. Pre-deployment evaluations often occur close to release schedules, leaving limited time for independent review or deeper investigation. Some analysts argue that evaluation systems need to be embedded throughout development rather than treated as a final checkpoint immediately before deployment. [METR]

The UK’s AI Security Institute also faces a related limitation. Although it can evaluate models and identify concerns, it generally lacks direct authority to compel companies to alter development plans. This means that warning signals do not automatically translate into protective action. [Time]

In other words, the practical safety buffer is often smaller than the technical safety buffer. A model may generate warning signs months before catastrophe becomes plausible, but bureaucratic, commercial, or political delays can consume much of that time.
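
The arithmetic behind this point can be sketched in a few lines. The example below assumes the response stages happen one after another, so their delays add up, and subtracts the total from the technical buffer. The stage names echo the list above, but every duration is a made-up placeholder, and `practical_buffer` is a hypothetical helper rather than part of any published framework.

```python
from datetime import timedelta


def practical_buffer(technical_buffer: timedelta,
                     stage_delays: dict[str, timedelta]) -> timedelta:
    """Time left to act after every response stage has taken its delay.

    Assumes stages run sequentially. The result is floored at zero:
    zero means the warning was consumed before any action could land.
    """
    total_delay = sum(stage_delays.values(), timedelta())
    return max(technical_buffer - total_delay, timedelta(0))


# Hypothetical delays for each stage a warning must pass through.
delays = {
    "trusted by decision-makers": timedelta(days=30),
    "shared with relevant organisations": timedelta(days=14),
    "interpreted correctly": timedelta(days=21),
    "connected to a response plan": timedelta(days=30),
    "implemented": timedelta(days=60),
}

# Hypothetical technical buffer: warning signs appear 180 days before danger.
left = practical_buffer(timedelta(days=180), delays)
print(f"practical buffer: {left.days} days")
```

With these placeholder numbers, roughly six months of technical warning leaves under a month of usable time. If some stages can run in parallel, the total delay would be smaller than this simple sum suggests.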

The challenge of threshold setting

A further complication is deciding where warnings should occur.

If thresholds are set too low, organisations face constant false alarms. If thresholds are set too high, the warning may arrive after dangerous capabilities have already emerged.
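
A small sketch can illustrate the trade-off. It assumes a sequence of evaluation scores across model generations in which early readings wobble because of measurement noise and genuine capability only starts rising later. The scores, the index where the real rise begins, and the index where the model becomes dangerous are all invented for illustration.

```python
def first_alarm(scores, threshold):
    """Return the index of the first score at or above threshold, or None."""
    for i, score in enumerate(scores):
        if score >= threshold:
            return i
    return None


def classify(alarm, rise_index, danger_index):
    """Label an alarm relative to the assumed true rise and danger points."""
    if alarm is None or alarm >= danger_index:
        return "too late"        # fired at or after danger, or never fired
    if alarm < rise_index:
        return "false alarm"     # triggered by noise before any real rise
    return "useful warning"      # fired after the real rise, before danger


# Hypothetical scores per model generation (index 0 is the oldest model).
scores = [0.10, 0.22, 0.12, 0.18, 0.35, 0.55, 0.80, 0.95]
RISE_INDEX = 4    # assumed generation where capability genuinely starts rising
DANGER_INDEX = 6  # assumed generation where the model is genuinely dangerous

results = {
    t: classify(first_alarm(scores, t), RISE_INDEX, DANGER_INDEX)
    for t in (0.2, 0.3, 0.9)
}
for threshold, outcome in results.items():
    print(f"threshold {threshold}: {outcome}")
```

In this toy data the lowest threshold trips on noise, the highest fires only after the assumed danger point, and the middle one gives two generations of warning. In practice nobody knows the true rise or danger points in advance, which is exactly what makes threshold setting hard.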

This is why many frontier-safety frameworks use capability thresholds as proxies for underlying risk. The goal is to identify capabilities that are likely to precede severe danger, even if the danger itself cannot yet be measured directly. [arXiv]

Critics argue that this approach depends on assumptions about future capability development that may turn out to be wrong. Some have warned that threshold definitions can become vague or subjective, making it difficult to know whether a genuine warning has occurred. [Institute for AI Policy and Strategy]

The deeper issue is that no one yet knows how smoothly dangerous capabilities scale. If capability growth follows predictable trends, thresholds may provide substantial warning. If important abilities emerge abruptly, the safety buffer may be much smaller than expected.

What would count as evidence that a safety buffer exists?

The strongest evidence would be repeated examples of evaluations successfully predicting future capabilities before those capabilities became dangerous.

Researchers increasingly look for:

  • Predictable scaling patterns.
  • Early indicators of autonomy and deception.
  • Consistent relationships between benchmark performance and real-world behaviour.
  • Capability forecasts that remain accurate across model generations.

Some responsible-scaling approaches explicitly depend on the belief that dangerous capabilities develop gradually enough to forecast in advance. Their aim is to identify important thresholds before crossing them and prepare safeguards ahead of time. [LessWrong]

At present, however, the evidence remains limited. Frontier models have improved rapidly, but humanity has not yet observed systems with the extreme capabilities envisioned in AI takeover scenarios. As a result, nobody can directly measure how much warning would precede such systems. Current evaluation programmes are partly an attempt to learn that answer before it becomes urgent. [AI Security Institute]

The central disagreement

The debate over safety buffers is ultimately a debate about timing.

Researchers who are relatively optimistic about evaluations tend to believe that dangerous capabilities will emerge progressively, generating detectable warning signs and leaving enough time for coordinated responses. They see evaluations, capability thresholds, and responsible-scaling frameworks as practical tools for navigating uncertainty. [Frontier Model Forum]

Those more pessimistic about AI doom worry that capability gains could outpace both evaluation science and institutional decision-making. In their view, a sufficiently fast takeoff could compress the interval between warning and danger until it becomes operationally meaningless. Evaluations might still detect risk, but only after the last realistic opportunity to intervene has passed.

That disagreement sits at the heart of the wider debate over fast takeoff warning signs. The usefulness of frontier-model evaluations depends not only on whether they can spot danger, but on whether the future provides enough time between the first warning and the moment when human control becomes genuinely uncertain.

Endnotes

  1. Source: arxiv.org
    Link: https://arxiv.org/abs/2403.13793
    Source snippet

    Our evaluations cover four areas...

  2. Source: time.com
    Title: uk ai safety institute
    Link: https://time.com/7204670/uk-ai-safety-institute/
    Source snippet

    This led to the establishment of the UK's AI Safety Institute (AISI) in November 2023, with a mandate to evaluate the risks of new AI mod...

    Published: November 2023

  3. Source: anthropic.com
    Title: Anthropic's Responsible Scaling Policy
    Link: https://www.anthropic.com/news/anthropics-responsible-scaling-policy
    Source snippet

    AnthropicAnthropic's Responsible Scaling PolicySep 19, 2023 — Our RSP defines a framework called AI Safety Levels (ASL) for addressing ca...

  4. Source: lesswrong.com
    Link: https://www.lesswrong.com/posts/vAopGQhFPdjcA8CEh/anthropic-reflections-on-our-responsible-scaling-policy
    Source snippet

    LessWrongAnthropic: Reflections on our Responsible Scaling PolicyMay 19, 2024 — We aim to collect evidence about model risk and prepare s...

    Published: May 19, 2024

  5. Source: arxiv.org
    Title: arXiv Sabotage Evaluations for Frontier Models
    Link: https://arxiv.org/abs/2410.21514
    Source snippet

    arXivSabotage Evaluations for Frontier ModelsOctober 28, 2024...

    Published: October 28, 2024

  6. Source: arxiv.org
    Title: arXiv Evaluating Frontier Models for Stealth and Situational Awareness
    Link: https://arxiv.org/abs/2505.01420

  7. Source: anthropic.com
    Title: responsible scaling policy v3
    Link: https://www.anthropic.com/news/responsible-scaling-policy-v3
    Source snippet

    AnthropicResponsible Scaling Policy Version 3.024 Feb 2026 — We viewed the capability thresholds as potentially important moments for the...

  8. Source: metr.org
    Title: 2026 05 19 frontier risk report
    Link: https://metr.org/blog/2026-05-19-frontier-risk-report/
    Source snippet

    MetrFrontier Risk Report (February to March 2026)May 19, 2026 — 19 May 2026 — To date, third-party evaluations of frontier AI have largel...

    Published: May 19, 2026

  9. Source: arxiv.org
    Link: https://arxiv.org/abs/2406.14713
    Source snippet

    arXivRisk thresholds for frontier AIJune 20, 2024...

    Published: June 20, 2024

  10. Source: anthropic.com
    Link: https://www.anthropic.com/

  11. Source: assets.anthropic.com
    Title: method of informing safety and risk cases profiling sources of danger
    Link: https://assets.anthropic.com/m/377027d5b36ac1eb/original/Sabotage-Evaluations-for-Frontier-Models.pdf
    Source snippet

    Evaluations for Frontier Models, by J Benton · Cited by 41 - are screened by another model to detect misuse, and dangerous-capability evalu...

  12. Source: anthropic.com
    Title: acquires Stainless
    Link: https://anthropic.com/news/anthropic-acquires-stainless
    Source snippet

    Anthropic acquires Stainless...

  13. Source: arxiv.org
    Link: https://arxiv.org/html/2512.01166v3
    Source snippet

    AI model will not cause harm, even when the model has dangerous capabilities. They can be thought of as safeguards against misuse, or...

  14. Source: time.com
    Title: exclusive anthropic drops flagship safety pledge
    Link: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
    Source snippet

    Exclusive: Anthropic Drops Flagship Safety Pledge24 Feb 2026 — In 2023, Anthropic committed to never train an AI system unless it could g...

  15. Source: deepmind.google
    Link: https://deepmind.google/research/publications/78150/
    Source snippet

    Evaluating Frontier Models for Dangerous Capabilities21 Mar 2024 — We introduce a programme of new "dangerous capability" evaluations and...

  16. Source: governance.ai
    Link: https://www.governance.ai/analysis/anthropics-rsp-v3-0-how-it-works-whats-changed-and-some-reflections
    Source snippet

    Anthropic's RSP v3.0: How it Works, What's Changed, and...Mar 17, 2026 — Anthropic's Responsible Scaling Policy (RSP) – its framework fo...

  17. Source: frontiermodelforum.org
    Title: frontier capability assessments
    Link: https://www.frontiermodelforum.org/technical-reports/frontier-capability-assessments/
    Source snippet

    Frontier Model ForumFrontier Capability Assessments22 Apr 2025 — Frontier Capability Assessments are procedures conducted on frontier mod...

  18. Source: aisi.gov.uk
    Link: https://www.aisi.gov.uk/frontier-ai-trends-report
    Source snippet

    AI Security InstituteFrontier AI Trends Report by The AI Security Institute (AISI)The UK AI Security Institute (AISI) has conducted evalu...

  19. Source: iaps.ai
    Title: responsible scaling
    Link: https://www.iaps.ai/research/responsible-scaling
    Source snippet

    Institute for AI Policy and StrategyResponsible Scaling: Comparing Government Guidance...Mar 11, 2024 — Anthropic and other AI companies...

  20. Source: safer-ai.org
    Title: anthropics responsible scaling policy update makes a step backwards
    Link: https://www.safer-ai.org/anthropics-responsible-scaling-policy-update-makes-a-step-backwards
    Source snippet

    The new policy adopts...

  21. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/Anthropic
    Source snippet

    Anthropic: Anthropic is an American artificial intelligence (AI) company headquartered in San Francisco. It has developed a series of la...

  22. Source: verifywise.ai
    Link: https://verifywise.ai/de/ai-governance-library/policies-and-internal-governance/anthropic-responsible-scaling-policy
    Source snippet

    It establishes commitments for...

  23. Source: verifywise.ai
    Link: https://verifywise.ai/ai-governance-library/policies-and-internal-governance/anthropic-responsible-scaling-policy
    Source snippet

    Anthropic Responsible Scaling PolicyAnthropic's Responsible Scaling Policy defines AI Safety Levels (ASL) based on model capabilities and...

  24. Source: frontiermodelforum.org
    Link: https://www.frontiermodelforum.org/
    Source snippet

    Frontier Model ForumThe Frontier Model Forum is an industry-supported non-profit focused on addressing significant risks to public safety...

  25. Source: frontiermodelforum.org
    Title: managing advanced cyber risks in frontier ai frameworks
    Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
    Source snippet

    13 Feb 2026 — Frontier capability assessments are procedures conducted on frontier AI models to gather evidence of whether they have capa...

  26. Source: ts2.tech
    Title: Anthropic Brings on Open AI Co-Founder Andrej Karpathy
    Link: https://ts2.tech/en/anthropic-just-hired-openai-co-founder-andrej-karpathy-rivals-will-notice/

  27. Source: linkedin.com
    Link: https://www.linkedin.com/posts/miclchen_anthropics-responsible-scaling-policy-version-activity-7432196983748206592-ItDB
    Source snippet

    No more implication of unilateral commitment to pause AI...

  28. Source: linkedin.com
    Link: https://www.linkedin.com/company/anthropicresearch
    Source snippet

    AnthropicWe're an AI research company that builds reliable, interpretable, and steerable AI systems. Our first product is Claude, an AI a...

  29. Source: ai-safety-atlas.com
    Title: Dangerous Capability Evaluations
    Link: https://ai-safety-atlas.com/chapters/v1/evaluations/dangerous-capability-evaluations/
    Source snippet

    Chapter 5Dangerous capability evaluations specifically probe for these potentially harmful abilities, helping identify systems that might...

  30. Source: forum.effectivealtruism.org
    Title: anthropic announcing our updated responsible scaling policy
    Link: https://forum.effectivealtruism.org/posts/JoJwBsGJFWtq72omp/anthropic-announcing-our-updated-responsible-scaling-policy
    Source snippet

    rewrote its RSP16 Oct 2024 — New RSP introduces more flexible risk assessment but weakens some previous commitments, like evaluation freq...

  31. Source: aisecurityandsafety.org
    Title: anthropic rsp
    Link: https://aisecurityandsafety.org/de/frameworks/anthropic-rsp/
    Source snippet

    Wichtige Anforderungen. Assess AI...Read more...

  32. Source: reddit.com
    Link: https://www.reddit.com/r/singularity/comments/1g4a1mm/anthropic_announcing_our_updated_responsible/
    Source snippet

    easures for Claude 4 Opus "to limit risk of users developing...Read more...

  33. Source: youtube.com
    Link: https://www.youtube.com/%40anthropic-ai
    Source snippet

    AnthropicWe're an AI safety and research company. Talk to our AI assistant Claude on claude.com. Download Claude on desktop, iOS, or Andr...

Additional References

  1. Source: linkedin.com
    Link: https://www.linkedin.com/posts/tesssbuckley_today-uks-ai-security-institute-of-department-activity-7407352566029828097-ZTJf
    Source snippet

    UK AI Security Institute Publishes Frontier AI Trends ReportAs the first public analysis of trends by AISI it draws on two years' worth o...

  2. Source: sebastianfarquhar.com
    Link: https://sebastianfarquhar.com/assets/papers/phuongEvaluating2024.pdf
    Source snippet

    Evaluating Frontier Models for Dangerous Capabilitiesby M Phuong · 2024 · Cited by 145 — Our evaluations cover four areas: (1) persuasion...

  3. Source: rand.org
    Link: https://www.rand.org/content/dam/rand/pubs/conf_proceedings/CFA3400/CFA3429-1/RAND_CFA3429-1.pdf
    Source snippet

    ty, and identifying warning signs of evaluation problems, this initiative aims...Read more...

  4. Source: aisi.gov.uk
    Title: early lessons from evaluating frontier ai systems
    Link: https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-frontier-ai-systems
    Source snippet

    model complies with explicitly harmful requests.... For example, directly exploring the capability to cause large-scale harm would be da...

  5. Source: agora.eto.tech
    Title: Agora Anthropic Responsible Scaling Policy
    Link: https://agora.eto.tech/instrument/768
    Source snippet

    ETO AgoraAnthropic Responsible Scaling Policy - ETO AGORAA Capability Threshold is a prespecified level of AI capability that, if reached...

  6. Source: GOV.UK
    Title: emerging processes for frontier ai safety
    Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
    Source snippet

    processes for frontier AI safety27 Oct 2023 — This document contains the world's first overview of emerging safety processes focused on f...

  7. Source: richardmoulange.substack.com
    Title: deep dive how the uk can enhance
    Link: https://richardmoulange.substack.com/p/deep-dive-how-the-uk-can-enhance
    Source snippet

    substack.comDeep-dive: how the UK can enhance strategic advantage...... models for dangerous capabilities. This enables rapid warning wh...

  8. Source: forum.effectivealtruism.org
    Title: responsible scaling policy v3 1
    Link: https://forum.effectivealtruism.org/posts/DGZNAGL2FNJfftwgE/responsible-scaling-policy-v3-1
    Source snippet

    Scaling Policy v3Feb 24, 2026 — The idea was: if a company has a policy saying it isn't safe to train an AI model with X level of capabil...

  9. Source: youtube.com
    Title: Mary Phuong – Dangerous Capability Evals: Basis for Frontier Safety
    Link: https://www.youtube.com/watch?v=pO8IcIqhHuk
    Source snippet

    Vincent Conitzer - AI Testing Should Account for Sophisticated Strategic Behaviour [Alignment Worksh...

  10. Source: youtube.com
    Title: Vincent Conitzer
    Link: https://www.youtube.com/watch?v=SB5NeoYi_q8
    Source snippet

    The UK Tested Mythos AI's Cyber Skills. Here's What It Found...
