Why Did Policymakers Pick the 10^26 FLOP Line?

The best-known compute threshold became a policy benchmark, but its rationale and limits remain contested.

Introduction

The 10^26 FLOP threshold became one of the most influential numbers in AI governance almost overnight. It appeared in the 2023 United States AI Executive Order as the point at which developers of certain advanced AI models would face reporting requirements, and it later shaped several frontier-AI policy proposals, including California’s SB 1047. The number was never intended to be a claim that systems become dangerous exactly at 10^26 floating-point operations (FLOPs). Instead, it was chosen as a practical policy benchmark: high enough to capture only the most advanced training runs, but low enough to provide oversight before potentially dangerous capabilities emerged. [Morrison Foerster] [Stanford HAI]

Within debates about AI doom and existential risk, the threshold matters because it represents an attempt to create an early-warning system. Rather than waiting until a model demonstrates dangerous autonomy, deceptive behaviour, or other capabilities associated with loss-of-control scenarios, regulators use training compute as a proxy for identifying projects that deserve closer scrutiny. Whether 10^26 FLOPs is the right proxy remains heavily contested. [Institute for Law & AI]

Origins of the Benchmark

The most important historical fact about the 10^26 figure is that it was not derived from a precise scientific boundary between safe and unsafe AI. Policymakers were trying to solve a more practical problem: how to identify a small set of frontier training runs without regulating the entire AI industry. The Biden administration’s Executive Order defined certain “dual-use foundation models” partly by whether they were trained using more than 10^26 integer or floating-point operations. Developers crossing that threshold became subject to reporting obligations concerning safety testing, cybersecurity measures, and related information. [Morrison Foerster] [Federal Register]

The choice reflected the state of the industry in 2023. At the time, only a handful of organisations appeared capable of training models near that scale. Policymakers wanted a threshold that would capture frontier systems such as the successors to GPT-4 while excluding the vast majority of academic and commercial AI development. [Institute for Law & AI]

This was a deliberate design choice. A reporting threshold that covered thousands of ordinary models would be hard to administer and hard to justify politically. A threshold that covered only the largest and most expensive training runs was easier to explain as a targeted intervention aimed at potentially transformative systems. [Institute for Law & AI]

What the Threshold Was Meant to Catch

The key idea behind the 10^26 benchmark was not that compute itself is dangerous. Rather, policymakers and many AI safety researchers believed that training compute was one of the best available predictors of frontier capabilities.

Historically, larger training runs have often produced more capable models. The relationship is imperfect, but advances in language modelling during the 2018–2024 period were strongly associated with increasing amounts of compute. Because compute can be measured before deployment, it offers an earlier signal than waiting for dangerous capabilities to appear in public. [Institute for Law & AI]
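
To make the "measurable before deployment" point concrete, here is a minimal sketch of how a planned training run might be checked against the threshold in advance. It assumes the widely used rule of thumb that training a dense transformer costs roughly 6 × parameters × training tokens in FLOPs. That approximation, the function names, and the example model size are illustrative assumptions and are not taken from the Executive Order or from the sources cited here.

```python
# Rough pre-training estimate of compute, assuming the common
# "6 * N * D" rule of thumb for dense transformers. This is an
# approximation, not a legal definition of how operations are counted.

US_EO_THRESHOLD_FLOP = 1e26  # the threshold discussed in this article


def estimate_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as 6 * parameters * tokens."""
    return 6.0 * n_parameters * n_tokens


def exceeds_threshold(flop: float, threshold: float = US_EO_THRESHOLD_FLOP) -> bool:
    """True if the estimate is strictly above the threshold ("more than")."""
    return flop > threshold


# Hypothetical planned run: 70 billion parameters, 15 trillion tokens.
planned = estimate_training_flop(70e9, 15e12)
print(f"{planned:.2e} FLOP, above threshold: {exceeds_threshold(planned)}")
```

Under these assumed numbers the estimate is about 6.3 × 10^24 FLOPs, well below the line. The point of the sketch is only that the check can be run from a training plan, before any model exists to evaluate.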

For people worried about AI doom, this matters because many existential-risk arguments focus on systems becoming dangerous before society fully understands them. If future systems acquire advanced autonomy, strategic planning abilities, or the capacity to deceive human operators, regulators may want oversight mechanisms to activate during training rather than after deployment. Compute thresholds are one attempt to create that upstream trigger. [Institute for Law & AI]

Importantly, crossing 10^26 FLOPs was never supposed to mean a model posed an existential threat. The threshold was intended to identify projects worthy of additional attention, evaluation, and reporting. In policy terms, it functioned as a screening mechanism rather than a declaration of danger. [Institute for Law & AI]

Why Not 10^25 or 10^27?

One reason the 10^26 figure attracted so much attention is that alternative thresholds were entirely plausible.

The European Union eventually adopted a lower benchmark of 10^25 FLOPs for certain general-purpose AI models deemed capable of posing systemic risks. This immediately highlighted the fact that no universally accepted scientific threshold existed. Different jurisdictions looked at broadly similar evidence and chose different trigger points. [Fenwick] [Responsible AI Platform]

A lower threshold has obvious advantages. It captures more models, provides earlier oversight, and reduces the risk that a dangerous capability appears just below the reporting line. Critics of the US approach argued that waiting until 10^26 FLOPs could mean oversight begins too late. [Fenwick] [Import AI]

On the other hand, a much lower threshold would have increased the number of covered models dramatically. Policymakers seeking a narrowly targeted reporting regime often preferred a higher number because it concentrated obligations on a small group of frontier developers. [Institute for Law & AI]

The choice therefore reflected a policy trade-off rather than a scientific discovery. Lower thresholds increase precaution. Higher thresholds reduce regulatory burden and focus attention on the most expensive and ambitious training runs.
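
The practical effect of the two trigger points can be shown with a short sketch. It compares a training run against the 10^25 and 10^26 figures named in this article. The dictionary labels, the function name, and the three example run sizes are hypothetical and chosen only to show the three possible outcomes.

```python
# Thresholds named in this article, in total training operations.
THRESHOLDS_FLOP = {
    "EU AI Act (10^25)": 1e25,
    "US Executive Order (10^26)": 1e26,
}


def regimes_triggered(training_flop: float) -> list[str]:
    """Return the labels of every threshold that the run exceeds."""
    return [
        name
        for name, limit in THRESHOLDS_FLOP.items()
        if training_flop > limit
    ]


# Hypothetical run sizes: below both lines, between them, above both.
for flop in (5e24, 5e25, 5e26):
    triggered = regimes_triggered(flop)
    print(f"{flop:.0e} FLOP -> {triggered if triggered else 'no threshold crossed'}")
```

A run in the middle band is covered under the lower line but not the higher one, which is the gap critics of the higher threshold point to.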

Arguments That the Line Is Too High

Many researchers and AI-risk advocates have argued that 10^26 FLOPs may be too permissive.

One concern is that capabilities do not always scale smoothly. A threshold based on training compute assumes that the most dangerous systems will also be among the most computationally expensive. Yet AI history contains examples where algorithmic improvements delivered substantial capability gains without proportional increases in compute. If efficiency improvements continue, future systems could achieve frontier-level performance using less compute than policymakers expected. [arXiv]
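
The efficiency concern can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the compute needed to reach a fixed capability level halves at a constant rate. The halving time, the starting figure, and the function names are hypothetical and do not come from the cited sources.

```python
import math

US_THRESHOLD_FLOP = 1e26


def compute_needed(initial_flop: float, months_elapsed: float, halving_months: float) -> float:
    """Compute needed for a fixed capability after efficiency gains.

    Assumes the requirement halves every `halving_months` months,
    which is a modelling assumption, not a measured rate.
    """
    return initial_flop / 2 ** (months_elapsed / halving_months)


def months_until_under_threshold(
    initial_flop: float, threshold_flop: float, halving_months: float
) -> float:
    """Months until the capability no longer needs more than the threshold."""
    if initial_flop <= threshold_flop:
        return 0.0
    return halving_months * math.log2(initial_flop / threshold_flop)


# Hypothetical capability that needs four times the threshold today,
# with an assumed efficiency halving time of 12 months.
months = months_until_under_threshold(4e26, US_THRESHOLD_FLOP, 12.0)
print(f"Reachable without crossing the line after about {months:.0f} months")
```

Whatever the true rate, the shape of the result is the same: a fixed line covers a given capability only for a limited time if efficiency keeps improving.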

Another criticism is that dangerous capabilities might emerge below the threshold. A model trained with less than 10^26 FLOPs could still assist in cyber operations, biological research, persuasion campaigns, or autonomous decision-making. If the goal is preventing catastrophic outcomes, critics argue that a single numerical threshold risks creating blind spots. [arXiv]

Forecasting work has also suggested that the number of models exceeding fixed compute thresholds may grow rapidly over time. A benchmark that once captured only a handful of systems may eventually capture dozens or hundreds, potentially requiring frequent revision. [arXiv]

Arguments That the Line Is Too Low

Others have made the opposite argument.

Some industry critics viewed 10^26 FLOPs as an arbitrary line that risked imposing obligations on systems that had not demonstrated genuinely dangerous capabilities. From this perspective, training compute is only a rough proxy, and regulation should focus more directly on what models can actually do. [Cohere]

There is also concern that compute-based rules can become outdated. A threshold chosen when frontier training runs were rare may eventually apply to a much larger segment of the industry as hardware improves and costs fall. Developers who see little connection between compute and existential risk often argue that capability evaluations should matter more than raw training expenditure. [Cohere]

On this view, a fixed numerical threshold may end up regulating model size rather than genuine danger.

What the Debate Reveals About AI Doom

The controversy surrounding 10^26 FLOPs illustrates a broader tension within AI doom debates.

Many existential-risk arguments depend on acting before dangerous capabilities become obvious. That creates demand for measurable indicators that can trigger oversight early. Training compute is attractive because it is quantifiable, difficult to fake at large scales, and historically correlated with frontier performance. [Institute for Law & AI]

Yet the very need for a proxy reveals the uncertainty at the heart of the discussion. No one knows exactly which future systems, if any, might create the kinds of loss-of-control scenarios reflected in high p(doom) estimates. The 10^26 threshold was therefore not a prediction about where existential risk begins. It was an attempt to identify where precaution should start. [Institute for Law & AI]

That distinction explains both the benchmark’s influence and its critics. Supporters see it as a practical early-warning mechanism for potentially transformative AI. Critics see it as an inevitably imperfect line drawn through a rapidly changing technological landscape. Both sides largely agree on one point: if compute thresholds remain part of AI governance, they will probably need periodic revision as models, hardware, and training techniques evolve. [arXiv] [Institute for Law & AI]

Endnotes

  1. Source: hai.stanford.edu
    Title: Decoding the White House AI Executive Order's Achievements
    Link: https://hai.stanford.edu/news/decoding-white-house-ai-executive-orders-achievements
    Source snippet

    Stanford HAIDecoding the White House AI Executive Order's Achievements2 Nov 2023 — Concretely, the default thresholds for compliance are...

  2. Source: law-ai.org
    Link: https://law-ai.org/the-role-of-compute-thresholds-for-ai-governance/
    Source snippet

    Institute for Law & AIThe Role of Compute Thresholds for AI GovernanceThis article discusses the role of training compute thresholds, whi...

  3. Source: arxiv.org
    Title: arXiv Training Compute Thresholds: Features and Functions in AI Regulation
    Link: https://arxiv.org/abs/2405.10799

  4. Source: arxiv.org
    Title: arXiv Defending Compute Thresholds Against Legal Loopholes
    Link: https://arxiv.org/abs/2502.00003

  5. Source: fenwick.com
    Title: interesting developments for regulatory thresholds of ai compute
    Link: https://www.fenwick.com/insights/publications/interesting-developments-for-regulatory-thresholds-of-ai-compute
    Source snippet

    FenwickTechnological Challenges for Regulatory Thresholds of AI…20 Jun 2024 — Article 51 of the EU AI Act specifies 10^25 floating point...

  6. Source: jack-clark.net
    Title: What does 10^25 versus 10^26 mean?
    Link: https://jack-clark.net/2024/03/28/what-does-1025-versus-1026-mean/
    Source snippet

    What does 10^25 versus 10^26 mean?28 Mar 2024 — In Europe, the recent EU AI Act says that general-purpose systems trained with 10^25 FLOP...

  7. Source: cohere.com
    Link: https://cohere.com/research/papers/The-Limits-of-Thresholds.pdf
    Source snippet

    CohereThe Limits of ThresholdsProminent AI governance frameworks around the world have specified thresholds based on the amount of comput...

  8. Source: arxiv.org
    Link: https://arxiv.org/abs/2504.16138
    Source snippet

    arXivTrends in Frontier AI Model Count: A Forecast to 2028April 21, 2025...

    Published: April 21, 2025

  9. Source: arxiv.org
    Link: https://arxiv.org/pdf/2502.00003
    Source snippet

    Defending Compute Thresholds Against Legal Loopholesby M Pistillo · 2025 · Cited by 1 — At the time of writing, less than 10 developers a...

  10. Source: mofo.com
    Link: https://www.mofo.com/resources/insights/231107-the-ai-executive-order-presidential-authority
    Source snippet

    Morrison FoersterThe AI Executive Order: Presidential Authority for...November 7, 2023 — 7 Nov 2023 — Any AI model that was trained: usi...

    Published: November 7, 2023

  11. Source: federalregister.gov
    Link: https://www.federalregister.gov/documents/2024/09/11/2024-20529/establishment-of-reporting-requirements-for-the-development-of-advanced-artificial
    Source snippet

    Establishment of Reporting Requirements for the...11 Sept 2024 — A dual-use foundation model training run triggers reporting requirement...

  12. Source: aiactblog.nl
    Link: https://www.aiactblog.nl/en/glossary/flop-threshold
    Source snippet

    Responsible AI PlatformFLOP Threshold (10^25): Definition & Explanation | EU AI ActFLOP (Floating Point Operations) measures the computin...

  13. Source: law-ai.org
    Link: https://law-ai.org/wp-content/uploads/2024/11/The-Role-of-Compute-Thresholds-for-AI-Governance.pdf
    Source snippet

    1e25 FLOP compute threshold “should be adjusted over time to reflect technological and industrial.Read more...

  14. Source: dlapiper.com
    Title: californias sb 1047
    Link: https://www.dlapiper.com/insights/publications/2024/02/californias-sb-1047
    Source snippet

    California's SB-1047: Understanding the Safe and Secure...Feb 20, 2024 — We describe the current legal landscape related to AI and how S...
