When should AI labs be forced to pause?

Capability thresholds aim to stop labs treating dangerous model abilities as private judgement calls during a competitive race.

On this page

  • What counts as a dangerous capability threshold
  • How common thresholds reduce racing incentives
  • Why compute, capability, and risk triggers remain disputed

Introduction

In debates about AI doom and existential risk, a persistent governance challenge is deciding when AI systems become dangerous enough that developers should be required to change course, whether by stopping development, imposing stricter safeguards, or limiting deployment. A core response in recent policy and safety frameworks is the capability threshold: a defined level of model competence that triggers stronger safety controls and, potentially, a forced pause in development or deployment. Capability thresholds aim to shift decisions from ad hoc internal judgement calls to shared, measurable triggers that make cautious behaviour a common standard, reducing the incentive for any one lab to rush ahead without safeguards. [METR]

What Counts as a Dangerous Capability Threshold

At its simplest, a capability threshold is a pre-specified point in a model's abilities (such as performance on certain tasks, degree of autonomy, or ability to meaningfully assist harmful actors) that activates new governance obligations. These obligations can include deeper safety evaluations, enhanced security controls, restrictions on certain kinds of deployment, and, in some frameworks, deliberate pauses in training until mitigations are in place. [Juncture Policy]

Across the frontier AI safety frameworks published by major developers, the notion of "dangerous" varies but generally covers abilities that could enable large-scale harm unless substantially mitigated: assisting in biological misuse, automating sophisticated cyberattacks, or acting autonomously to execute harmful strategies. These thresholds are not based solely on broad resource proxies such as compute; they are tied to specific, identifiable capabilities that correlate with societal risk vectors. [Frontier Model Forum]

A typical structure seen in these frameworks involves two linked concepts:

  • Enabling capability thresholds signal that a model has reached skills that could make certain harmful outcomes plausible if not mitigated.
  • Deployment or residual risk thresholds then assess whether a model that has crossed an enabling threshold can be safely deployed after specified safeguards are in place. [Frontier Model Forum]
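
The two linked concepts above can be read as a two-stage check. The following is a minimal sketch under stated assumptions, not a rule taken from any published framework: the score scales, the threshold values, and the names `Evaluation`, `ENABLING_THRESHOLD`, `RESIDUAL_RISK_LIMIT`, and `deployment_decision` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    capability: str        # hypothetical label, e.g. "bio_uplift"
    score: float           # capability evaluation score on an assumed 0-1 scale
    residual_risk: float   # assumed 0-1 risk estimate after safeguards are applied


# Hypothetical values chosen for illustration only.
ENABLING_THRESHOLD = 0.6
RESIDUAL_RISK_LIMIT = 0.1


def deployment_decision(ev: Evaluation) -> str:
    """Stage 1: has the enabling threshold been crossed?
    Stage 2: if so, is the residual risk after safeguards acceptable?"""
    if ev.score < ENABLING_THRESHOLD:
        return "deploy_standard"
    if ev.residual_risk <= RESIDUAL_RISK_LIMIT:
        return "deploy_with_safeguards"
    return "hold_deployment"
```

The point of the design is that the second question is only asked once the first has been answered "yes": a model below the enabling threshold never reaches the residual risk check.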

Because true risk, in the sense of the probability of societal harm, is very hard to estimate for novel technologies, many frameworks currently use capability thresholds as proxies for risk thresholds, balancing measurability with risk relevance. [GovAI]

How Capability Thresholds Can Reduce Competitive Race Incentives

A central driver of premature deployment in the AI race is the fear of losing advantage if one lab slows while competitors keep developing or deploying. Shared capability thresholds help realign incentives by making safety obligations predictable and common rather than matters of private policy. If all actors agree that, for example, the ability to generate detailed, actionable biological synthesis instructions or to autonomously coordinate harmful digital operations triggers compulsory safeguards or deployment restrictions, then no single actor can treat those capabilities as a private risk judgement. [METR]

This shared triggering reduces the strategic advantage of secrecy around risk signalling. Because exceeding an agreed threshold raises safety requirements for every developer alike, there is less to gain from hiding or downplaying risky features to reach the market faster. It also helps external auditors, regulators, and governments understand when and why higher safety controls should apply, providing a basis for consistent oversight. [Frontier Model Forum]

Capability thresholds are sometimes embedded in tiered safety regimes, much like biosafety levels in laboratories. Lower thresholds might require additional internal testing and oversight, while higher ones temporarily halt open deployment until concrete mitigations, such as containment protocols, external audits, or behavioural constraints, are proven effective. [METR]
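
A tiered regime of this kind amounts to a lookup from an evaluation score to the obligations of the tier it falls in. The sketch below assumes a single 0-1 score and invents the tier boundaries in `TIER_BOUNDS`; the obligations in `TIER_ACTIONS` paraphrase the examples in the text, and the middle tier is an illustrative guess rather than something a specific framework prescribes.

```python
from bisect import bisect_right

# Hypothetical tier boundaries on an assumed 0-1 evaluation scale.
TIER_BOUNDS = [0.3, 0.6, 0.85]

# One obligation per tier; there is always one more tier than boundaries.
TIER_ACTIONS = [
    "baseline practices",
    "additional internal testing and oversight",
    "external audit before wider deployment",
    "halt open deployment until mitigations are proven effective",
]


def required_action(score: float) -> str:
    """Return the obligation for the tier containing `score`.

    bisect_right counts how many boundaries are <= score, so a score
    exactly on a boundary is treated as having crossed it.
    """
    return TIER_ACTIONS[bisect_right(TIER_BOUNDS, score)]
```

Treating a boundary score as a crossing is a deliberately cautious choice in this sketch; a real regime would have to state its own convention.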

Why Compute, Capability and Risk Triggers Remain Disputed

Although capability thresholds are gaining traction, they are not without controversy or limits.

  • Measurement challenges: Evaluating whether a model truly possesses a dangerous capability is difficult. Benchmark scores or task success rates are imperfect proxies for real-world harm potential, and capabilities may emerge unpredictably outside defined tests. [METR]
  • Compute versus capability proxies: Some propose simpler triggers based on training compute (e.g., FLOPs thresholds) because they are easy to measure. But relying on resource use alone can miss small models with harmful behaviours or overflag benign systems, making compute thresholds a rougher tool than capability-based ones. [Frontier Model Forum]
  • Risk thresholds versus capability thresholds: True risk thresholds (explicit limits on acceptable harm probability or impact) are arguably more principled but currently too hard to compute with any confidence for unforeseen AI risks. Capability thresholds therefore serve as a stand-in, though experts caution against treating them as direct measures of ultimate danger. [GovAI]
  • Governance implementation: Even when thresholds are defined, enforcing them across labs globally is difficult. Without binding regulation or mutual verification, voluntary frameworks risk divergence and strategic non-compliance, especially under competitive pressure. A recent example is the shift in some companies' safety policies away from explicit pause commitments, which raises concerns about the reliability of self-imposed thresholds in practice. [PC Gamer]
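
The compute-versus-capability point in the list above can be made concrete by applying both kinds of trigger to the same models and listing where they disagree. This is a sketch only: the trigger values, the 0-1 evaluation scale, and the three example models are invented for illustration and describe no real system.

```python
from typing import Dict, List, Tuple

# Illustrative trigger values, not drawn from the frameworks cited here.
COMPUTE_TRIGGER_FLOP = 1e26
CAPABILITY_TRIGGER = 0.6


def compute_flag(training_flop: float) -> bool:
    """Trigger on resource use alone."""
    return training_flop >= COMPUTE_TRIGGER_FLOP


def capability_flag(eval_score: float) -> bool:
    """Trigger on a measured dangerous-capability score."""
    return eval_score >= CAPABILITY_TRIGGER


def disagreements(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """List models where the compute trigger and capability trigger differ."""
    out = []
    for name, (flop, score) in models.items():
        by_compute, by_capability = compute_flag(flop), capability_flag(score)
        if by_compute and not by_capability:
            out.append(f"{name}: overflagged by compute")
        elif by_capability and not by_compute:
            out.append(f"{name}: missed by compute")
    return out


# Hypothetical models: (training FLOP, capability evaluation score).
EXAMPLE_MODELS = {
    "small_but_capable": (5e24, 0.8),
    "large_but_benign": (3e26, 0.2),
    "large_and_capable": (3e26, 0.9),
}
```

With these made-up inputs, `disagreements(EXAMPLE_MODELS)` reports the first model as missed by the compute trigger and the second as overflagged, which are the two failure modes the bullet describes.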

Implementation and Practical Limits

Capability thresholds are becoming a staple of frontier AI safety frameworks, with many of the largest developers embedding them into staged plans where crossing a threshold escalates requirements for evaluation, mitigation, security, and in some cases, deployment constraints. These frameworks typically include:

  • A catalogue of hazardous capabilities identified through threat modelling;
  • Evaluation protocols that test models against those capabilities;
  • Decision processes that link test outcomes to governance steps;
  • Escalation rules that require stronger safeguards or pauses if thresholds are crossed. [AI Security & Safety Directory]
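
The four components listed above fit together as a small pipeline: a catalogue of capabilities with thresholds, an evaluator per capability, a decision step that compares scores to thresholds, and an escalation rule. The sketch below is one hypothetical arrangement; the function names, the 0-1 scores, and the outcome strings are assumptions, and real evaluation protocols are far richer than a single number.

```python
from typing import Callable, Dict, List

# An evaluator takes a model identifier and returns a score on an assumed 0-1 scale.
Evaluator = Callable[[str], float]


def crossed_thresholds(
    model_id: str,
    catalogue: Dict[str, float],          # hazardous capability -> threshold
    evaluators: Dict[str, Evaluator],     # hazardous capability -> evaluation protocol
) -> List[str]:
    """Run every catalogued evaluation and return the capabilities whose
    score meets or exceeds the pre-specified threshold."""
    crossed = []
    for capability, threshold in catalogue.items():
        score = evaluators[capability](model_id)
        if score >= threshold:
            crossed.append(capability)
    return crossed


def escalate(crossed: List[str]) -> str:
    """Escalation rule: any crossed threshold requires a pause or stronger safeguards."""
    return "pause_or_strengthen_safeguards" if crossed else "proceed"
```

Because the catalogue and thresholds are passed in as data, they can be fixed and published before any evaluation is run, which is the property the next paragraph emphasises.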

Setting thresholds in advance, rather than deciding on the fly, helps create external accountability and transparency. It also allows regulators, civil society, and other actors in the ecosystem to understand and critique the basis for safety escalations. But because frontier capabilities evolve rapidly, thresholds must be updated iteratively and their validity re-tested against real-world outcomes and new risk evidence. [GOV.UK]

Looking Ahead: Thresholds in Governance and Public Policy

Capability thresholds have emerged as one of the most tractable governance tools for aligning incentives toward safer deployment choices in an accelerating race. By providing shared signals about what counts as potentially dangerous, they can make safety expectations more predictable and less dependent on unilateral lab judgements. They are not a panacea, however: their effectiveness depends on measurement quality, international cooperation, and integration with broader regulatory frameworks that can enforce consequences when thresholds are crossed. As AI capabilities continue to advance, refining these thresholds and their implementation will remain central to efforts to manage existential risks from advanced systems. [Frontier Model Forum]

Endnotes

  1. Source: metr.org
    Title: Common Elements of Frontier AI Safety Policies
    Link: https://metr.org/common-elements
    Published: December 16, 2025

  2. Source: governance.ai
    Title: Risk Thresholds for Frontier AI
    Link: https://www.governance.ai/research-paper/risk-thresholds-for-frontier-ai
    Published: June 20, 2024

  3. Source: GOV.UK
    Title: Emerging processes for frontier AI safety
    Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
    Source snippet: 27, 2023...

  4. Source: governance.ai
    Title: Coordinated Pausing: An Evaluation-Based Coordination Scheme for Frontier AI Developers
    Link: https://www.governance.ai/research-paper/coordinated-pausing-evaluation-based-scheme
    Published: September 30, 2023

  5. Source: frontiermodelforum.org
    Title: Risk Taxonomy and Thresholds for Frontier AI Frameworks
    Link: https://www.frontiermodelforum.org/technical-reports/risk-taxonomy-and-thresholds/

  6. Source: juncturepolicy.org
    Title: Capability Threshold
    Link: https://juncturepolicy.org/glossary/terms-c/capability-threshold/

  7. Source: frontiermodelforum.org
    Title: Frontier AI Biosafety Thresholds
    Link: https://www.frontiermodelforum.org/issue-briefs/frontier-ai-biosafety-thresholds/
    Published: May 12, 2025

  8. Source: frontiermodelforum.org
    Title: Issue Brief: Thresholds for Frontier AI Safety Frameworks
    Link: https://www.frontiermodelforum.org/updates/issue-brief-thresholds-for-frontier-ai-safety-frameworks/
    Published: February 7, 2025

  9. Source: pcgamer.com
    Link: [https://www.pcgamer.com/software/ai/anthropic
    Source snippet: Previously, under its Responsible Scaling Policy (RSP), Anthropic pledged to halt AI development should new systems reach dangerous capab...

  10. Source: aisecurityandsafety.org
    Title: Frontier AI Safety Framework - AI Governance Definition & Guide
    Link: https://aisecurityandsafety.org/en/glossary/frontier-ai-safety-framework/
    Published: March 27, 2026

  11. Source: aiwiki.ai
    Title: Responsible Scaling Policy | AI Wiki
    Link: https://aiwiki.ai/wiki/responsible_scaling_policy
    Published: May 7, 2026

  12. Source: comparativeai.org
    Title: Safety Framework
    Link: https://comparativeai.org/en/companies/openai/safety-framework/
    Source snippet: Snapshot: based on the Preparedness Framework v2.0 (15 April 2025), 2025–2026 blog upda...
    Published: April 25, 2026

  13. Source: frontiermodelforum.org
    Title: Managing Advanced Cyber Risks in Frontier AI Frameworks
    Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
    Source snippet: 1.3 Current consensus on cyber thresholds. Frontier AI frameworks use thresholds to help determine...
    Published: February 13, 2026
