When should AI labs be forced to pause?
Capability thresholds aim to stop labs treating dangerous model abilities as private judgement calls during a competitive race.
Introduction
In debates about AI doom and existential risk, a persistent governance challenge is deciding when AI systems become dangerous enough that developers should be required to change course, whether by stopping development, imposing stricter safeguards, or limiting deployment. A core response in recent policy and safety frameworks is the capability threshold: a defined level of model competence that triggers stronger safety controls and, potentially, a forced pause in development or deployment. Capability thresholds aim to shift these decisions from ad-hoc internal judgement calls to shared, measurable triggers that make cautious behaviour a common standard, reducing the incentive for any one lab to rush ahead without safeguards.[METR]
What Counts as a Dangerous Capability Threshold
At its simplest, a capability threshold is a pre-specified point in a model’s abilities (such as performance on certain tasks, degree of autonomy, or ability to meaningfully assist harmful actors) that activates new governance obligations. These obligations can include deeper safety evaluations, enhanced security controls, restrictions on certain kinds of deployment and, in some frameworks, deliberate pauses in training until mitigations are in place.[Juncture Policy]
Across the frontier AI safety frameworks published by major developers, the definition of “dangerous” varies but generally covers abilities that could enable large-scale harm without substantial mitigation: assisting biological misuse, automating sophisticated cyberattacks, or acting autonomously to execute harmful strategies. These thresholds are not based solely on broad resource proxies such as compute; they are tied to specific, identifiable capabilities linked to particular routes to societal harm.[Frontier Model Forum]
A typical structure seen in these frameworks involves two linked concepts:
- Enabling capability thresholds signal that a model has reached skills that could make certain harmful outcomes plausible if not mitigated.
- Deployment or residual risk thresholds then assess whether a model that has crossed an enabling threshold can be deployed safely once specified safeguards are in place.[Frontier Model Forum]
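The two-stage structure above can be sketched in Python. This is a minimal illustration under stated assumptions, not any developer’s actual procedure: the `EvalResult` type, the numeric scores, and both threshold values are hypothetical, whereas real frameworks usually define thresholds per risk domain and often in qualitative terms.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical summary of evaluations for one risk domain."""
    domain: str               # e.g. "bio" or "cyber"
    capability_score: float   # assumed 0-1 score from dangerous-capability evaluations
    residual_risk: float      # assumed 0-1 estimate of risk remaining after safeguards


# Illustrative values only; real frameworks set thresholds per domain,
# often qualitatively rather than as single numbers.
ENABLING_THRESHOLD = 0.5
ACCEPTABLE_RESIDUAL_RISK = 0.1


def deployment_decision(result: EvalResult) -> str:
    """Apply the enabling threshold first, then the residual-risk threshold."""
    if result.capability_score < ENABLING_THRESHOLD:
        # Enabling threshold not crossed: baseline obligations apply.
        return "baseline safeguards"
    if result.residual_risk <= ACCEPTABLE_RESIDUAL_RISK:
        # Crossed, but the specified safeguards bring residual risk within bounds.
        return "deploy with enhanced safeguards"
    # Crossed and residual risk still too high: hold deployment.
    return "hold deployment until mitigations are in place"


if __name__ == "__main__":
    print(deployment_decision(EvalResult("bio", capability_score=0.7, residual_risk=0.3)))
```

Keeping the two checks separate mirrors the distinction in the list: crossing an enabling threshold does not by itself block deployment; it shifts the question to whether safeguards reduce residual risk enough.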
Because true risk, in the sense of the probability of societal harm, is very hard to estimate for novel technologies, many frameworks currently use capability thresholds as proxies for risk triggers, balancing measurability against risk relevance.[GovAI]
How Capability Thresholds Can Reduce Competitive Race Incentives
A central problem that drives premature deployment in the AI race is the fear of losing ground if a lab slows down while competitors keep developing or deploying. Shared capability thresholds help realign incentives by making safety obligations predictable and common rather than private policy choices. If all actors agree that, for example, the ability to generate detailed, actionable biological synthesis instructions or to autonomously coordinate harmful digital operations triggers compulsory safeguards or deployment restrictions, then no single actor can treat those capabilities as a private risk judgement.[METR]
This shared triggering reduces the strategic advantage of secrecy around risk signalling. Developers know that exceeding an agreed threshold will automatically raise safety requirements, so there is less value in hiding or downplaying risky features to reach the market faster. It also helps external auditors, regulators, and governments understand when and why stronger safety controls should apply, providing a basis for consistent oversight.[Frontier Model Forum]
Capability thresholds are also sometimes embedded in tiered safety regimes, much like biosafety levels in laboratories. Lower thresholds might require additional internal testing and oversight, while higher ones temporarily halt open deployment until concrete mitigations, such as containment protocols, external audits, or behavioural constraints, are proven effective.[METR]
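A tiered regime can be modelled as cumulative obligations keyed by tier. The sketch below is hypothetical: the tier names, the obligations attached to each level, and the `may_deploy_openly` rule are illustrative assumptions drawn loosely from the examples in the paragraph above, not taken from any specific framework.

```python
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    """Hypothetical capability tiers, loosely analogous to biosafety levels."""
    BASELINE = 0
    ELEVATED = 1
    HIGH = 2


# Obligations introduced at each tier; the contents are illustrative only.
OBLIGATIONS = {
    Tier.BASELINE: ["standard pre-deployment testing"],
    Tier.ELEVATED: ["additional internal testing", "internal oversight review"],
    Tier.HIGH: ["halt open deployment", "containment protocols", "external audit"],
}


def required_obligations(tier: Tier) -> List[str]:
    """Return every obligation at or below the given tier, lowest tier first."""
    obligations: List[str] = []
    for level in Tier:  # iterates in definition order: BASELINE, ELEVATED, HIGH
        if level <= tier:
            obligations.extend(OBLIGATIONS[level])
    return obligations


def may_deploy_openly(tier: Tier, mitigations_verified: bool) -> bool:
    """At the highest tier, open deployment stays halted until mitigations are proven."""
    return tier < Tier.HIGH or mitigations_verified


if __name__ == "__main__":
    print(required_obligations(Tier.ELEVATED))
    print(may_deploy_openly(Tier.HIGH, mitigations_verified=False))
```

Making obligations cumulative is a design choice in this sketch: a model at a higher tier inherits every lower-tier requirement, so escalation only ever adds controls.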
Why Compute, Capability and Risk Triggers Remain Disputed
Although capability thresholds are gaining traction, they are not without controversy or limits.
- Measurement challenges: Evaluating whether a model truly possesses a dangerous capability is difficult. Benchmark scores or task success rates are imperfect proxies for real-world harm potential, and capabilities may emerge unpredictably outside defined tests.[METR]
- Compute versus capability proxies: Some propose simpler triggers based on training compute (e.g., thresholds on total training FLOPs) because compute is easy to measure. But resource use alone can miss small models with harmful capabilities or over-flag benign systems, making compute thresholds a rougher tool than capability-based ones.[Frontier Model Forum]
- Risk thresholds versus capability thresholds: True risk thresholds, meaning explicit limits on acceptable harm probability or impact, are arguably more principled but currently too hard to estimate with any confidence for unforeseen AI risks. Capability thresholds therefore serve as a stand-in, though experts caution against treating them as direct measures of ultimate danger.[GovAI]
- Governance implementation: Even when thresholds are defined, enforcing them across labs globally is difficult. Without binding regulation or mutual verification, voluntary frameworks risk divergence and strategic non-compliance, especially under competitive pressure. A recent example is the shift in some companies’ safety policies away from explicit pause commitments, such as reported changes to Anthropic’s Responsible Scaling Policy (RSP), under which the company had previously pledged to halt AI development should new systems reach dangerous capabilities. Shifts like this raise concerns about the reliability of self-imposed thresholds in practice.[PC Gamer]
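To see why compute triggers are rougher than capability triggers, the sketch below compares the two on a pair of hypothetical models. It assumes the common rule of thumb that training compute for a dense transformer is roughly 6 × parameters × training tokens, and uses an illustrative threshold of 10^25 FLOP; the model sizes and evaluation outcomes are invented for illustration.

```python
# Rule of thumb (an assumption): training FLOP for a dense transformer
# is roughly 6 * parameters * training tokens.
def training_flop(params: float, tokens: float) -> float:
    return 6.0 * params * tokens


# Illustrative threshold value, not tied to any specific regulation.
COMPUTE_THRESHOLD_FLOP = 1e25


def compute_trigger(params: float, tokens: float) -> bool:
    """Flag a model purely on estimated training compute."""
    return training_flop(params, tokens) >= COMPUTE_THRESHOLD_FLOP


def capability_trigger(passed_dangerous_eval: bool) -> bool:
    """Flag a model purely on a hypothetical dangerous-capability evaluation."""
    return passed_dangerous_eval


# Hypothetical models: (name, parameters, training tokens, passed dangerous-capability eval)
MODELS = [
    ("small-specialised", 7e9, 2e12, True),
    ("large-general", 1e12, 2e13, False),
]


def compare(models):
    """Return (name, compute_flag, capability_flag) for each model."""
    return [
        (name, compute_trigger(params, tokens), capability_trigger(passed))
        for name, params, tokens, passed in models
    ]


if __name__ == "__main__":
    for name, by_compute, by_capability in compare(MODELS):
        print(f"{name}: compute={by_compute} capability={by_capability}")
```

With these made-up numbers, the small specialised model (about 8.4 × 10^22 FLOP) stays under the compute threshold despite showing the dangerous capability, while the large general model (about 1.2 × 10^26 FLOP) is flagged although its evaluation is clean: the miss and over-flag cases described in the compute-versus-capability point.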
Implementation and Practical Limits
Capability thresholds are becoming a staple of frontier AI safety frameworks, with many of the largest developers embedding them into staged plans where crossing a threshold escalates requirements for evaluation, mitigation, security, and in some cases, deployment constraints. These frameworks typically include:
- A catalogue of hazardous capabilities identified through threat modelling;
- Evaluation protocols that test models against those capabilities;
- Decision processes that link test outcomes to governance steps;
- Escalation rules that require stronger safeguards or pauses if thresholds are crossed.[AI Security & Safety Directory]
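The four components listed above can be wired together as a small pipeline. In this sketch the `Hazard` type, the scores, the thresholds, and the stand-in evaluators are all hypothetical; in practice each evaluation protocol would be a substantial test suite, and the decision step would involve human review rather than a single comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hazard:
    """One entry in a hypothetical catalogue of hazardous capabilities (threat modelling)."""
    name: str
    threshold: float                   # score at which the capability counts as present
    evaluate: Callable[[str], float]   # stand-in evaluation protocol: model id -> score


def run_framework(model_id: str, catalogue: List[Hazard]) -> Dict[str, bool]:
    """Evaluation and decision steps: record whether each threshold is crossed."""
    return {h.name: h.evaluate(model_id) >= h.threshold for h in catalogue}


def escalation(crossed: Dict[str, bool]) -> str:
    """Escalation rule: any crossed threshold requires stronger safeguards or a pause."""
    hits = sorted(name for name, hit in crossed.items() if hit)
    if hits:
        return "pause pending safeguards for: " + ", ".join(hits)
    return "proceed under current safeguards"


# Hypothetical catalogue with fixed scores standing in for real evaluations.
CATALOGUE = [
    Hazard("bio-uplift", threshold=0.5, evaluate=lambda model_id: 0.2),
    Hazard("cyber-autonomy", threshold=0.6, evaluate=lambda model_id: 0.8),
]


if __name__ == "__main__":
    crossed = run_framework("model-x", CATALOGUE)
    print(crossed)
    print(escalation(crossed))
```

Fixing the catalogue and thresholds before any evaluation runs is what lets outsiders check that an escalation followed from pre-agreed rules rather than an on-the-fly judgement.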
Critically, setting thresholds in advance, rather than deciding on the fly, helps create external accountability and transparency. It also allows ecosystem actors, including regulators and civil society, to understand and critique the basis for safety escalations. But because frontier capabilities evolve rapidly, thresholds must be updated iteratively and their validity re-tested against real-world outcomes and new risk evidence.[GOV.UK]
Looking Ahead: Thresholds in Governance and Public Policy
Capability thresholds have emerged as one of the most tractable governance tools for aligning incentives toward safer deployment choices in an accelerating race. By providing shared signals about what counts as potentially dangerous, they can help make safety expectations predictable and less dependent on unilateral lab judgements. However, they are not a panacea: their effectiveness depends on measurement quality, international cooperation, and integration with broader regulatory frameworks that can enforce consequences when thresholds are crossed. As AI capabilities continue to advance, refining these thresholds and their implementation will remain a key frontier in efforts to manage existential risks from advanced systems.[Frontier Model Forum]
Further Reading
Books and field guides related to the question of when AI labs should be forced to pause. Use these as a next step if you want to read more deeply beyond this article.
The Alignment Problem
Explains measurement and evaluation challenges around dangerous capabilities.
Superintelligence
Examines capability milestones that could justify intervention or pauses.
The Coming Wave
Discusses thresholds, containment, and when intervention becomes necessary.
Endnotes
- Source: metr.org
Title: Common Elements of Frontier AI Safety Policies
Link: https://metr.org/common-elements
Published: December 16, 2025
- Source: governance.ai
Title: Risk Thresholds for Frontier AI
Link: https://www.governance.ai/research-paper/risk-thresholds-for-frontier-ai
Published: June 20, 2024
- Source: GOV.UK
Title: Emerging processes for frontier AI safety
Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
Published: …27, 2023
- Source: governance.ai
Title: Coordinated Pausing: An Evaluation-Based Coordination Scheme for Frontier AI Developers
Link: https://www.governance.ai/research-paper/coordinated-pausing-evaluation-based-scheme
Published: September 30, 2023
- Source: frontiermodelforum.org
Title: Risk Taxonomy and Thresholds for Frontier AI Frameworks
Link: https://www.frontiermodelforum.org/technical-reports/risk-taxonomy-and-thresholds/
- Source: juncturepolicy.org
Title: Capability Threshold
Link: https://juncturepolicy.org/glossary/terms-c/capability-threshold/
- Source: frontiermodelforum.org
Title: Frontier AI Biosafety Thresholds
Link: https://www.frontiermodelforum.org/issue-briefs/frontier-ai-biosafety-thresholds/
Published: May 12, 2025
- Source: frontiermodelforum.org
Title: Issue Brief: Thresholds for Frontier AI Safety Frameworks
Link: https://www.frontiermodelforum.org/updates/issue-brief-thresholds-for-frontier-ai-safety-frameworks/
Published: February 7, 2025
- Source: pcgamer.com
Link: https://www.pcgamer.com/software/ai/anthropic
Excerpt: Previously, under its Responsible Scaling Policy (RSP), Anthropic pledged to halt AI development should new systems reach dangerous capab...
- Source: aisecurityandsafety.org
Title: Frontier AI Safety Framework
Link: https://aisecurityandsafety.org/en/glossary/frontier-ai-safety-framework/
Published: March 27, 2026
- Source: aiwiki.ai
Title: Responsible Scaling Policy | AI Wiki
Link: https://aiwiki.ai/wiki/responsible_scaling_policy
Published: May 7, 2026
- Source: comparativeai.org
Title: Safety Framework
Link: https://comparativeai.org/en/companies/openai/safety-framework/
Published: April 25, 2026
- Source: frontiermodelforum.org
Title: Managing Advanced Cyber Risks in Frontier AI Frameworks
Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
Published: February 13, 2026