When Should AI Training Runs Trigger Oversight?
Proposals to report large training runs rely on measurable compute thresholds, but experts disagree on where those thresholds should sit.
Introduction
One of the most widely discussed ideas in compute governance is the use of training-compute thresholds as a trigger for oversight. Instead of waiting until an AI system demonstrates dangerous capabilities, regulators would require organisations to report, document, or review a training run once the total computation it uses exceeds a specified amount. The underlying logic is simple: if the most capable frontier systems require unusually large amounts of compute to train, then compute can serve as an early warning signal. [Institute for Law & AI]
Within debates about AI doom and existential risk, these thresholds matter because they are intended to identify projects that might eventually create systems capable of large-scale misuse, dangerous autonomy, or loss-of-control scenarios. Yet there is no consensus on where the thresholds should be set, how often they should be updated, or whether sophisticated developers could eventually bypass them. The debate is not about whether compute can be measured; it is about whether a measurable quantity is a sufficiently reliable proxy for future danger. [arXiv]
How Compute Thresholds Are Defined
Most proposals define thresholds using the total number of computational operations used during training, typically measured in floating-point operations (FLOPs). Training runs above a specified FLOP level would automatically trigger additional obligations such as government notification, independent review, safety evaluations, or auditing. [Institute for Law & AI]
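Training compute is rarely metered directly; it is usually estimated. The sketch below shows two common back-of-envelope approaches, under stated assumptions: the widely used 6 × parameters × tokens approximation for dense transformer models, and a hardware-based estimate of chips × peak throughput × utilisation × training time. The function names and every input value are hypothetical, and the two example runs are independent illustrations rather than two views of the same model.

```python
# Two back-of-envelope estimates of total training compute, in FLOP.
# Both are rough approximations; all inputs below are hypothetical.

SECONDS_PER_DAY = 24 * 3600


def flop_from_params_and_tokens(n_params: float, n_tokens: float) -> float:
    """Estimate dense-transformer training compute as 6 * N * D.

    The factor of 6 is a common heuristic: roughly 2 FLOP per parameter per
    token for the forward pass and roughly 4 for the backward pass.
    """
    return 6.0 * n_params * n_tokens


def flop_from_hardware(n_chips: int, peak_flop_per_s: float,
                       utilization: float, days: float) -> float:
    """Estimate compute as chips x peak FLOP/s x achieved utilisation x time."""
    return n_chips * peak_flop_per_s * utilization * days * SECONDS_PER_DAY


# Hypothetical model: 1e12 parameters trained on 2e13 tokens.
arch_estimate = flop_from_params_and_tokens(1e12, 2e13)

# Hypothetical cluster: 20,000 chips at 1e15 FLOP/s peak, 40% utilisation, 90 days.
hw_estimate = flop_from_hardware(20_000, 1e15, 0.40, 90)

print(f"parameter/token estimate: {arch_estimate:.2e} FLOP")  # 1.20e+26
print(f"hardware estimate:        {hw_estimate:.2e} FLOP")    # 6.22e+25
```

Under these hypothetical inputs the first run would sit above the 10^26 benchmark used in several proposals and the second below it, which shows why the assumptions behind an estimate (utilisation, token count, architecture) matter when a legal obligation depends on the result.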
The most influential threshold to date emerged from the United States government’s 2023 AI Executive Order (Executive Order 14110). It established reporting requirements for models trained using more than 10^26 integer or floating-point operations. A lower threshold of 10^23 operations applied to models trained primarily on biological sequence data, reflecting concerns about biotechnology-related risks. [Federal Register] [Morrison Foerster]
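As a minimal sketch, the two Executive Order thresholds described above can be expressed as a single comparison whose cutoff depends on the training data. The constant and function names are illustrative, and reducing the order's criteria to a boolean flag for "primarily biological sequence data" is a simplifying assumption, not the order's legal test.

```python
# Simplified encoding of the two reporting thresholds in the 2023 AI
# Executive Order, as summarised in the text. Names are illustrative.

GENERAL_THRESHOLD_OPS = 1e26       # general models
BIO_SEQUENCE_THRESHOLD_OPS = 1e23  # models trained primarily on biological sequence data


def exceeds_eo_reporting_threshold(training_ops: float,
                                   primarily_bio_sequence_data: bool) -> bool:
    """Return True if a run's operation count is above the applicable cutoff."""
    threshold = (BIO_SEQUENCE_THRESHOLD_OPS if primarily_bio_sequence_data
                 else GENERAL_THRESHOLD_OPS)
    # The thresholds are stated as "more than" a quantity, so equality does not count.
    return training_ops > threshold


print(exceeds_eo_reporting_threshold(5e23, primarily_bio_sequence_data=True))   # True
print(exceeds_eo_reporting_threshold(5e23, primarily_bio_sequence_data=False))  # False
print(exceeds_eo_reporting_threshold(1e26, primarily_bio_sequence_data=False))  # False
```

The same run of 5 × 10^23 operations is reportable or not depending only on its training data, which is the point of the lower biological-sequence threshold.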
The same 10^26 FLOP benchmark later appeared in several frontier-model policy proposals, including California’s SB 1047, which was ultimately vetoed. In these frameworks, crossing the threshold did not automatically imply that a model was dangerous. Instead, it created a presumption that the model was advanced enough to justify closer scrutiny. [Morgan Lewis] [Orrick]
Supporters of compute thresholds often point to several practical advantages: [Epoch AI]
- Compute is measurable and can be estimated relatively consistently.
- Training compute can be assessed before deployment, allowing earlier intervention.
- Large training runs generally require specialised hardware, making them harder to hide than software development alone.
- Training compute has historically correlated with frontier-level capability growth. [arXiv] [Institute for Law & AI]
This does not mean compute perfectly predicts capability. Rather, advocates view it as a screening tool that identifies projects deserving additional attention. [arXiv]
Why AI Doom Advocates Care About Thresholds
For people concerned about existential risk, the appeal of compute thresholds is that they operate upstream of dangerous outcomes.
A central concern in many AI doom scenarios is that by the time a system clearly demonstrates dangerous capabilities, it may already be deeply integrated into critical infrastructure, military systems, scientific research, or economic decision-making. Oversight triggered during training could create opportunities for evaluations and safety testing before deployment. [Institute for Law & AI]
The argument is partly strategic. Governments may struggle to define “dangerous AI” in advance because capabilities can emerge unexpectedly. Compute, by contrast, is a concrete quantity that can be monitored. As a result, some governance researchers argue that thresholds provide a practical mechanism for identifying frontier projects even when policymakers cannot accurately predict which specific capabilities will emerge. [Institute for Law & AI]
Critics respond that the relationship between training scale and existential risk is much less certain than proponents sometimes imply. A model trained with somewhat less compute could still prove dangerous, while a model trained with enormous compute might not create the feared risks. The connection between compute and catastrophe remains inferential rather than directly demonstrated. [arXiv]
Reporting Requirements and Audits
Most compute-threshold proposals do not advocate automatic bans on large training runs. Instead, they use thresholds to activate additional oversight requirements.
Common proposals include:
- Mandatory notification to a government regulator or AI safety authority before or after a qualifying training run.
- Safety evaluations designed to test for dangerous capabilities.
- Risk assessments documenting foreseeable catastrophic-use concerns.
- Incident reporting if significant safety failures are discovered.
- Independent auditing of compute records and training procedures. [Oxford Martin AIGI] [arXiv]
This distinction is important. The strongest advocates of compute governance often present thresholds as a trigger rather than a final decision rule. Crossing the threshold does not automatically prove a model is dangerous; it merely initiates a review process. The subsequent evaluations are intended to determine whether additional safeguards are warranted. [arXiv]
This approach attempts to solve a practical governance problem. Regulators may lack the capacity to examine every AI project. Thresholds provide a filtering mechanism that concentrates attention on a relatively small number of unusually large training efforts. [Institute for Law & AI]
Where Should the Threshold Be Set?
The hardest policy question is not whether thresholds are useful, but where they should sit.
If thresholds are set too low, regulators may become overwhelmed with notifications and audits, and oversight resources could be spread across many systems that pose little existential risk. If thresholds are set too high, potentially important frontier projects may escape scrutiny altogether. [Governor of California]
The 10^26 FLOP benchmark was partly chosen because it was above the training compute used by many leading models when the rule was designed. Policymakers hoped it would focus attention on systems pushing beyond the existing frontier. [Institute for Law & AI]
However, technological progress quickly creates pressure on any fixed threshold. Models that seem exceptional today may become routine within a few years. Forecasts published by Epoch AI project that the number of notable models exceeding 10^26 FLOPs could rise sharply over the second half of the decade, potentially turning what was once a rare threshold into a relatively common one. [Epoch AI]
This creates a moving-target problem. A threshold that is effective in one year may become obsolete in the next. Many analysts therefore argue that thresholds should be regularly updated rather than permanently fixed in law. [Governor of California]
Arguments Over Evasion and Effectiveness
The most significant criticism of compute thresholds is that they may become easier to evade over time.
AI researchers continuously discover techniques that improve capability without proportionally increasing training compute. Better algorithms, model reuse, fine-tuning methods, synthetic data generation, and inference-time techniques can all produce stronger systems while reducing the amount of compute needed during training. [arXiv]
This raises a challenge for threshold-based regulation. If policymakers assume that high capability always requires high training compute, developers may eventually find ways to remain below regulatory thresholds while still producing highly capable systems. Researchers have explicitly identified fine-tuning, model expansion, and reuse of existing frontier models as potential loopholes in threshold-based frameworks. [arXiv]
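A minimal sketch of why the accounting rule matters for the fine-tuning and reuse loopholes: if a rule counts only the compute spent in the current run, a developer building on an existing model can stay below the line even when the total compute embodied in the final model exceeds it. Both rule functions and all figures are hypothetical illustrations, not provisions of any actual framework.

```python
# Hypothetical comparison of two ways to count compute for a model that is
# built by fine-tuning an existing base model.

THRESHOLD_OPS = 1e26  # the 10^26 benchmark discussed in the text


def covered_counting_run_only(run_ops: float) -> bool:
    """Hypothetical rule: only the compute of the new run counts."""
    return run_ops > THRESHOLD_OPS


def covered_counting_cumulative(base_ops: float, run_ops: float) -> bool:
    """Hypothetical rule: base-model compute plus new-run compute counts."""
    return base_ops + run_ops > THRESHOLD_OPS


base_model_ops = 9e25  # hypothetical base model, just under the threshold
fine_tune_ops = 2e25   # hypothetical fine-tuning run on top of it

print(covered_counting_run_only(base_model_ops))                   # False
print(covered_counting_run_only(fine_tune_ops))                    # False
print(covered_counting_cumulative(base_model_ops, fine_tune_ops))  # True
```

Under the run-only rule neither stage is covered, although 1.1 × 10^26 operations went into the final model. Cumulative counting closes that particular gap, but it would require developers and auditors to track the lineage of reused models.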
Another concern is that thresholds create artificial boundaries. Risk generally changes gradually, but regulation often creates a sharp distinction between systems just below and just above a numerical cutoff. A model trained at 9.9 × 10^25 FLOPs may not be meaningfully different from one trained at 1.01 × 10^26 FLOPs, yet one might trigger extensive requirements while the other does not. [Governor of California]
Supporters respond that every regulatory system requires practical thresholds somewhere. The relevant question is not whether thresholds are perfect, but whether they are more workable than alternatives. Compared with vague capability-based definitions, compute remains relatively objective, measurable, and auditable. [arXiv]
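The arithmetic behind this example is short: the two runs differ by about 2 percent in training compute, yet a strict "more than 10^26" test sorts them into different regulatory categories. The run labels below are hypothetical.

```python
# Two hypothetical runs on either side of a 10^26 cutoff.
THRESHOLD_OPS = 1e26
runs = {"run_a": 9.9e25, "run_b": 1.01e26}

relative_gap = (runs["run_b"] - runs["run_a"]) / runs["run_a"]
print(f"relative difference: {relative_gap:.1%}")  # relative difference: 2.0%

for name, ops in runs.items():
    # A binary test ignores how close each run is to the cutoff.
    print(name, "covered" if ops > THRESHOLD_OPS else "not covered")
# run_a not covered
# run_b covered
```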
What Thresholds Can and Cannot Do
The strongest case for compute reporting thresholds is not that they solve AI doom risk on their own. Rather, they provide an administrative mechanism for identifying frontier projects before deployment and directing limited oversight resources toward the systems most likely to deserve attention. [arXiv]
The strongest criticism is that compute is only a proxy. Future breakthroughs could weaken the link between training compute and capability, while legal thresholds may struggle to keep pace with changing technology. A threshold can identify some frontier systems, but it cannot reliably determine whether a particular model is aligned, controllable, deceptive, or existentially dangerous. [arXiv]
As a result, many governance researchers now regard compute thresholds as most useful as safety triggers rather than as complete safety standards. Their role is to determine when reporting, auditing, evaluation, and scrutiny should begin. The more difficult question, whether a given system actually poses catastrophic or existential risk, still requires direct assessment of capabilities and behaviour rather than reliance on compute alone. [arXiv]
Further Reading
Books and field guides related to when AI training runs should trigger oversight. Use these as a next step for deeper reading beyond this article.
The Alignment Problem
Covers measurement, evaluation, and governance challenges surrounding advanced machine learning systems.
Human Compatible
Explores why advanced AI systems may require governance, monitoring, and safety constraints.
Superintelligence
Provides foundational arguments motivating oversight of increasingly capable AI development.
The Coming Wave
Directly discusses oversight mechanisms for powerful AI systems and governance responses to frontier model development.
Endnotes
-
Source: law-ai.org
Title: The Role of Compute Thresholds for AI Governance (Institute for Law & AI)
Link: https://law-ai.org/the-role-of-compute-thresholds-for-ai-governance/
Snippet: This article discusses the role of training compute thresholds, which use training compute to determine which potenti...
Published: February 20, 2025
-
Source: arxiv.org
Title: Training Compute Thresholds: Features and Functions in AI Regulation (arXiv)
Link: https://arxiv.org/abs/2405.10799
-
Source: arxiv.org
Title: On the Limitations of Compute Thresholds as a Governance Strategy (arXiv)
Link: https://arxiv.org/abs/2407.05694
-
Source: orrick.com
Title: California Looks to Regulate Cutting-Edge Frontier AI Models: 5 Things to Know About SB1047 (Orrick)
Link: https://www.orrick.com/en/Insights/2024/07/California-Looks-to-Regulate-Cutting-Edge-Frontier-AI-Models-5-Things-to-Know-About-SB1047
Snippet: ...models under SB-1047 sets a high threshold for regulation.
-
Source: cdn.governance.ai
Title: Computing Power and the Governance of AI
Link: https://cdn.governance.ai/Computing_Power_and_the_Governance_of_AI.pdf
Snippet: Computing power, or "compute," is crucial for the development and deplo...
Published: February 14, 2024
-
Source: law-ai.org
Title: Legal Considerations for Defining “Frontier Model” (Institute for Law & AI)
Link: https://law-ai.org/frontier-model-definitions/
Snippet: This was, at least in part, the reason for the inclusion of training compute thresholds of 10^26 FLOP i...
Published: September 30, 2024
-
Source: epoch.ai
Title: How many AI models will exceed compute thresholds? (Epoch AI)
Link: https://epoch.ai/publications/model-counts-compute-thresholds
Snippet: We project how many notable AI models will exceed training compute thresh...
Published: May 30, 2025
-
Source: arxiv.org
Title: Defending Compute Thresholds Against Legal Loopholes (arXiv)
Link: https://arxiv.org/abs/2502.00003
Published: January 3, 2025
-
Source: arxiv.org
Title: Training Compute Thresholds: Features and Functions in AI Regulation (arXiv, HTML v2)
Link: https://arxiv.org/html/2405.10799v2
Snippet: We argue that training compute currently is the most suitable metr...
Published: August 6, 2024
-
Source: arxiv.org
Title: Defending Compute Thresholds Against Legal Loopholes (arXiv, PDF)
Link: https://arxiv.org/pdf/2502.00003
Snippet: Under the vetoed California Senate Bill 1047, the definition of 'covered model' would have included AI models...
-
Source: federalregister.gov
Title: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Federal Register)
Link: https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence
Snippet: Such reports shall include, at a...
Published: November 1, 2023
-
Source: mofo.com
Title: The AI Executive Order: Presidential Authority for... (Morrison Foerster)
Link: https://www.mofo.com/resources/insights/231107-the-ai-executive-order-presidential-authority
Snippet: Any AI model that was trained: using a quantity of co...
Published: November 7, 2023
-
Source: morganlewis.com
Title: California's SB 1047 Would Impose New Safety Requirements for Developers of Large-Scale AI Models (Morgan Lewis)
Link: https://www.morganlewis.com/pubs/2024/08/californias-sb-1047-would-impose-new-safety-requirements-for-developers-of-large-scale-ai-models
Snippet: ...computing power of three times 10^25 integer or FLOP costing over $10 million.[1] This is the same computing threshold as set in the Bide...
-
Source: aigi.ox.ac.uk
Title: Survey on Thresholds for Advanced AI Systems (Oxford Martin AIGI), by J. Schuett
Link: https://aigi.ox.ac.uk/wp-content/uploads/2025/08/Survey_on_thresholds_for_advanced_AI_systems_1.pdf
Snippet: “If training compute t...
Published: August 29, 2025
-
Source: gov.ca.gov
Title: The California Report on Frontier AI Policy (Governor of California)
Link: https://www.gov.ca.gov/wp-content/uploads/2025/06/June-17-2025-%E2%80%93-The-California-Report-on-Frontier-AI-Policy.pdf
Snippet: Noteworthy examples of compute thresholds...
Published: June 17, 2025
-
Source: aws.amazon.com
Title: Enterprise Cloud Computing Explained (AWS)
Link: https://aws.amazon.com/what-is/compute/
Snippet: It is a generic term used to reference processing power, memory, networking, storage, and other resou...
-
Source: hpe.com
Title: What is Compute? | Glossary (HPE)
Link: https://www.hpe.com/uk/en/what-is/compute.html
Snippet: Compute refers to the ability of a computer system to process and execute tasks, calculations, a...
Published: October 31, 2025
Additional References
-
Source: merriam-webster.com
Title: Compute Definition & Meaning (Merriam-Webster)
Link: https://www.merriam-webster.com/dictionary/compute
Snippet: 1. to make calculation: reckon They compute by weight in selling grain. 2. to use a computer 3. informal: t...
-
Source: reddit.com
Title: eli5: What exactly is compute? (r/explainlikeimfive)
Link: https://www.reddit.com/r/explainlikeimfive/comments/hiqdpx/eli5_what_exactly_is_compute/
Snippet: I am curious to understand more what compute means in reference to AWS services. What...
-
Source: mayerbrown.com
Title: US Department of Commerce Issues Proposal to Require... (Mayer Brown)
Link: https://www.mayerbrown.com/en/insights/publications/2024/09/us-department-of-commerce-issues-proposal-to-require-reporting-development-of-advanced-ai-models-and-computer-clusters
Snippet: “Conducting any AI model training run using more than 10^26 comput...
Published: September 17, 2024
-
Source: GOV.UK
Title: Introducing the AI Safety Institute (GOV.UK)
Link: https://www.gov.uk/government/publications/ai-safety-institute-overview/introducing-the-ai-safety-institute
Snippet: The government is committed to supporting a thriving compute environment that maintains the UK's position as a lea...
-
Source: reddit.com
Link: https://www.reddit.com/r/LocalLLaMA/comments/17k7obo/biden_executive_order_regulates_very_large_models/
-
Source: paulweiss.com
Title: Commerce Proposes Rule to Collect Frontier AI and... (Paul, Weiss)
Link: https://www.paulweiss.com/insights/client-memos/commerce-proposes-rule-to-collect-frontier-ai-and-computing-cluster-data-for-national-security-purposes
Snippet: [7] Models trained on primarily biological sequence data, but at the...
Published: September 13, 2024
-
Source: lw.com
Title: President Biden's Executive Order on Artificial Intelligence: Initial Analysis of Private Sector Implications (Latham & Watkins)
Link: https://www.lw.com/admin/upload/SiteAttachments/President-Bidens-Executive-Order-on-AI-Initial-Analysis-of-Private-Sector-Implications.pdf
Snippet: Until then, a model shall be considered to have potential for s...
Published: October 30, 2023
-
Source: santafe.edu
Title: What does it mean to compute? New paper by SFI researchers points to an answer (Santa Fe Institute)
Link: https://www.santafe.edu/news-center/news/what-does-it-mean-to-compute-new-paper-by-sfi-researchers-points-to-an-answer
Snippet: It also provides a way to define computation specifically. “We can say that some system can co...
Published: February 25, 2026
-
Source: engineadvocacyfoundation.medium.com
Title: AI Essentials: What is compute and how is it measured? (Medium)
Link: https://engineadvocacyfoundation.medium.com/ai-essentials-what-is-compute-and-how-is-it-measured-36951f78485a
Snippet: Compute refers to the hardware resources that make AI models work, allowing them to tr...
-
Source: fenwick.com
Title: Interesting Developments for Regulatory Thresholds of AI Compute (Fenwick)
Link: https://www.fenwick.com/insights/publications/interesting-developments-for-regulatory-thresholds-of-ai-compute
Snippet: This comports with California's proposed SB 1047, which asserts th...
Published: June 20, 2024