How Pre Training Hazard Modelling Aims to Prevent Catastrophic AI Risks

This page explains how developers identify extreme risks and plan mitigations before starting large AI training runs.

On this page

  • Threat identification for high severity AI capabilities
  • Estimating likelihood and impact of emergent risks
  • Designing mitigation strategies before training begins

Introduction

For AI systems that could one day pose existential dangers (through loss of control, deceptive behaviour, or autonomous pursuit of unintended goals), developers and regulators are increasingly discussing pre‑training hazard modelling. The term refers to the structured analysis and forecasting that happens before a major training run of a powerful AI model begins: identifying the extreme risks such a run might produce and planning protections or limits in advance. In contrast to testing only after a system is built, pre‑training hazard modelling asks two questions up front: “What could go wrong if this model becomes more capable than expected?” and “How do we mitigate those hazards before the model exists?” This proactive approach aims to catch catastrophic dangers early, and it is a central part of proposals for mandatory safety evaluations that could be required before training large AI systems.

Threat identification for high‑severity AI capabilities

A core part of pre‑training hazard modelling is systematically anticipating the ways advanced AI could be harmful. Borrowing methods long used in safety‑critical fields such as aviation and nuclear power, developers map out causal pathways from future model capabilities to extreme harms. This “threat modelling” isn’t about ordinary software bugs; it concerns scenarios where a model could be misused, or could itself behave, in ways that are hard to reverse, scale rapidly, or cause widespread disruption.

Common hazard domains flagged by frontier safety frameworks include: [emergentmind.com]

  • Dual‑use assistance such as guidance on creating biological threats or chemical agents, where AI could lower barriers to misuse. [Frontier Model Forum]
  • Advanced cyber threats, where an AI could help discover or exploit vulnerabilities in critical infrastructure. [Frontier Model Forum]
  • Autonomous or recursive capabilities, such as self‑replication, automated research or planning, and emerging “agentic” behaviour that might pursue objectives not aligned with human intentions. [Frontier Model Forum]
  • Strategic deception, where models could behave differently when being evaluated versus in deployment or misuse scenarios (sometimes called “scheming” in risk literature). [arXiv]

In well‑developed frameworks, threat identification is not a casual brainstorm but a systematic analysis that moves from broad scenarios (e.g., “AI could accelerate misuse of bioengineering”) to specific pathways linking a future model’s capabilities to measurable harms. This can include conceptually isolating “precursory capabilities” (smaller skills that a model must possess before it can unlock more dangerous behaviours) to give early warning signs and more manageable assessment points. [Apollo Research]
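The idea of tracking precursory capabilities along a threat pathway can be sketched in a few lines of Python. This is only an illustration: the `ThreatPathway` class, the precursor names, and the "fraction observed" warning measure are all assumptions made for the example, not taken from any published framework.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatPathway:
    """One causal pathway from model capabilities to a severe harm."""
    scenario: str                                        # broad scenario being analysed
    harm: str                                            # outcome at the end of the pathway
    precursors: list[str] = field(default_factory=list)  # skills needed first

    def warning_level(self, observed: set[str]) -> float:
        """Fraction of precursory capabilities already observed (0.0 to 1.0)."""
        if not self.precursors:
            return 0.0
        hits = sum(1 for p in self.precursors if p in observed)
        return hits / len(self.precursors)


# Hypothetical pathway; every label below is a placeholder for illustration.
pathway = ThreatPathway(
    scenario="AI could help exploit vulnerabilities in critical infrastructure",
    harm="hypothetical: large-scale service disruption",
    precursors=[
        "explains unfamiliar source code",
        "recognises known vulnerability classes",
        "chains multi-step tasks without supervision",
    ],
)

# Capabilities assumed to have been seen in evaluations of a predecessor model.
observed_in_predecessor = {
    "explains unfamiliar source code",
    "chains multi-step tasks without supervision",
}

level = pathway.warning_level(observed_in_predecessor)
print(f"{level:.0%} of precursors observed for: {pathway.scenario}")
```

A real assessment would weight precursors differently and rely on evaluation evidence rather than a simple count, but the structure shows how a broad scenario can be broken into smaller, checkable assessment points.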

Estimating likelihood and impact of emergent risks

Once threats are identified, the next step in pre‑training hazard modelling is to estimate both how likely they are and how severe their impacts could be. This is intrinsically challenging because the models being assessed have not yet been built, and there is no historical record of catastrophic misuse or autonomous failure to draw on. Instead, developers use a mix of expert judgement, analogue methods from other industries, and emerging techniques that try to quantify uncertainty explicitly.

Approaches adapted from systems engineering include:

  • Scenario building and causal mapping to understand how a given training configuration could lead to harmful outcomes. [SaferAI]
  • Fault and event tree analyses or Bayesian networks that try to combine individual hazard probabilities into a broader risk picture. [SaferAI]
  • Capability thresholds that define trigger points where specific risky outcomes become credible enough to demand action. Frontier frameworks often set these thresholds qualitatively (for example, when a model is capable of advanced cyber exploitation or biological protocol design), recognising that exact numbers are uncertain. [Frontier Model Forum]
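To make the fault-tree idea concrete, here is a minimal Python sketch of how individual hazard probabilities might be combined through AND and OR gates. It assumes the sub-events are statistically independent, which is a strong simplification, and every probability below is a made-up placeholder rather than an estimate from any source.

```python
from math import prod


def and_gate(probs):
    """All independent sub-events must occur: multiply their probabilities."""
    return prod(probs)


def or_gate(probs):
    """At least one independent sub-event occurs: 1 - P(none of them occur)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical inputs, chosen only to show the arithmetic.
p_capability_emerges = 0.10   # the dangerous capability appears during training
p_evaluation_misses = 0.20    # evaluations fail to detect it

# Safeguards fail if either of two independent bypass routes succeeds.
p_safeguard_bypassed = or_gate([0.05, 0.02])

# The top event requires all three branches to occur together.
p_top_event = and_gate(
    [p_capability_emerges, p_evaluation_misses, p_safeguard_bypassed]
)

print(f"P(safeguard bypassed) = {p_safeguard_bypassed:.3f}")
print(f"P(top event)          = {p_top_event:.5f}")
```

In practice the independence assumption rarely holds (a model that evades evaluation may also be better at bypassing safeguards), which is one reason Bayesian networks, which can represent such dependencies, are listed alongside fault trees.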

These methods aim to weigh likelihood (how probable it is that a future model would develop a particular dangerous capability) against impact (how large the harm would be if that capability materialised). Because frontier AI risk involves unprecedented scale and potential irreversibility, even low‑probability, high‑impact pathways are taken seriously in these models.
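One way to express this weighting is a screening rule that flags a hazard either when its likelihood-weighted score is high or when its impact is catastrophic regardless of likelihood. The sketch below is an assumption-laden illustration: the 1 to 5 severity scale, the 0.5 score threshold, and all hazard entries are hypothetical.

```python
from dataclasses import dataclass

CATASTROPHIC = 5  # top of an assumed ordinal severity scale (1 = minor)


@dataclass
class Hazard:
    name: str
    likelihood: float  # assumed probability over the period considered, 0..1
    impact: int        # assumed ordinal severity, 1..5


def needs_mitigation(hazard: Hazard, score_threshold: float = 0.5) -> bool:
    """Flag on a high likelihood-weighted score OR on catastrophic impact alone."""
    score = hazard.likelihood * hazard.impact
    return score >= score_threshold or hazard.impact >= CATASTROPHIC


# Hypothetical register entries, for illustration only.
hazards = [
    Hazard("routine output errors", likelihood=0.30, impact=1),
    Hazard("advanced cyber exploitation", likelihood=0.15, impact=4),
    Hazard("loss of control", likelihood=0.01, impact=5),
]

flagged = [h.name for h in hazards if needs_mitigation(h)]
print(flagged)
```

The severity override is the design choice that matters here: a pure likelihood-times-impact score would rank the low-probability "loss of control" entry last, whereas the override keeps it on the mitigation list.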

Designing mitigation strategies before training begins

Perhaps the most consequential part of pre‑training hazard modelling is not just spotting risks in theory, but tying them to concrete mitigation plans that are in place before training begins. A basic premise of existential‑risk governance proposals is that waiting until after a model is built may be too late to prevent certain catastrophic outcomes; by then, the capability is already there. Pre‑training analysis therefore feeds directly into decisions about whether and how a training run should proceed.

Mitigation strategies that can be shaped pre‑training include:

  • Training adjustments: Altering data curation, objective functions, or model architectures to keep certain capabilities from emerging in the first place. These early interventions are informed by hazard forecasts that suggest areas of special caution. [ScienceDirect]
  • Capability tripwires: Incorporating monitoring during training that watches for signs a model is approaching a threshold of dangerous behaviour and pauses training for further evaluation if triggered. [Frontier Model Forum]
  • Governance rules and safety cases: Developers can assemble structured “safety cases” that tie evidence from pre‑training hazard models and analogue tests to arguments about why a training run will not cross identified risk boundaries or will do so only with specified safeguards in place. [arXiv]
  • External evaluation and regulatory engagement: Pre‑training modelling can be documented and shared with independent reviewers or regulators as part of mandatory evaluations that would be required before compute‑intensive training runs are authorised. These documented risk forecasts and mitigation plans are critical if AI safety evaluations become a legal prerequisite. [GOV.UK]
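The capability tripwire idea from the list above can be sketched as a check that runs at each training checkpoint and pauses the run when an evaluation score reaches a preset threshold. Everything named here (`Tripwire`, `run_with_tripwires`, the capability labels, the 0 to 1 score scale, and the thresholds) is a hypothetical illustration, not a description of any developer's actual monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Tripwire:
    capability: str
    threshold: float  # assumed eval score (0..1) at which training should pause


def check_tripwires(scores, tripwires):
    """Return the capabilities whose eval score meets or exceeds its threshold."""
    return [
        t.capability
        for t in tripwires
        if scores.get(t.capability, 0.0) >= t.threshold
    ]


def run_with_tripwires(checkpoints, tripwires):
    """Walk (step, scores) pairs in order; stop at the first tripped checkpoint."""
    for step, scores in checkpoints:
        tripped = check_tripwires(scores, tripwires)
        if tripped:
            return step, tripped  # pause here for further evaluation
    return None, []  # run completed without tripping any wire


# Hypothetical thresholds fixed before the run starts.
tripwires = [
    Tripwire("cyber_exploitation", threshold=0.70),
    Tripwire("self_replication", threshold=0.30),
]

# Hypothetical evaluation results at successive checkpoints.
checkpoints = [
    (1000, {"cyber_exploitation": 0.20, "self_replication": 0.05}),
    (2000, {"cyber_exploitation": 0.45, "self_replication": 0.10}),
    (3000, {"cyber_exploitation": 0.72, "self_replication": 0.15}),
    (4000, {"cyber_exploitation": 0.80, "self_replication": 0.35}),
]

paused_at, tripped = run_with_tripwires(checkpoints, tripwires)
print(f"paused at step {paused_at}: {tripped}")
```

The thresholds are set before training rather than during it, so the decision to pause does not depend on judgement calls made under schedule pressure once the run is under way.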

In advanced safety proposals, these mitigations are not static; they evolve. Training forecasts can be updated with new evidence from predecessor models, red‑teaming, and continuous evaluation pipelines so that as understanding grows, the mitigation strategies adjust accordingly.
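As one illustration of how a forecast might be revised when new evidence arrives, the sketch below uses a Beta-Binomial update. This is a technique chosen for the example, not one the cited frameworks are known to prescribe, and the prior and red-teaming counts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaForecast:
    """Beta(alpha, beta) belief about how often a hazardous behaviour appears."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, hits: int, trials: int) -> "BetaForecast":
        """Fold in new evidence: `hits` hazardous outcomes out of `trials` attempts."""
        if not 0 <= hits <= trials:
            raise ValueError("hits must be between 0 and trials")
        return BetaForecast(self.alpha + hits, self.beta + (trials - hits))


# Hypothetical prior from expert judgement: roughly a 10% rate.
prior = BetaForecast(alpha=1.0, beta=9.0)

# Hypothetical red-teaming result on a predecessor model: 3 hits in 20 attempts.
posterior = prior.update(hits=3, trials=20)

print(f"prior mean     = {prior.mean:.3f}")
print(f"posterior mean = {posterior.mean:.3f}")
```

If the revised estimate crosses a level agreed in advance, the corresponding mitigation plan would be revisited, which is the sense in which mitigations evolve with the evidence.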

Why pre‑training modelling matters to existential risk governance

From an existential‑risk perspective, pre‑training hazard modelling reflects a shift from reactive to anticipatory risk management. Rather than testing only after a model exists, by which point highly capable behaviours might already be baked in, this modelling tries to forecast extreme risks, estimate where they might arise, and tie them to preventative action. In debates about mandatory frontier AI evaluations, it forms the backbone of arguments that powerful AI systems should not be trained until developers have demonstrated that critical hazards have been analysed and mitigated. [GOV.UK]

Because frontier AI risk involves significant uncertainty and unprecedented capabilities, pre‑training hazard modelling does not claim exact predictions. But by combining structured threat frameworks, expert judgement, and evidence from analogue safety domains, it gives developers and regulators a way to move from vague fears about future dangers to concrete checkpoints and mitigation strategies before the most powerful AI systems are ever trained.

Endnotes

  1. Source: arXiv (arxiv.org)
    Title: Towards evaluations-based safety cases for AI scheming
    Link: https://arxiv.org/abs/2411.03336
    Published: October 29, 2024

  2. Source: ScienceDirect (sciencedirect.com)
    Title: Aligning Large Language Models Across the Lifecycle: A Survey on Safety–Usability Trade-offs from Pre-training to Post-train...
    Link: https://www.sciencedirect.com/science/article/pii/S0893608026004570

  3. Source: GOV.UK
    Title: Emerging processes for frontier AI safety
    Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
    Published: ... 27, 2023

  4. Source: GOV.UK
    Title: Frontier AI: capabilities and risks – discussion paper
    Link: https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/frontier-ai-capabilities-and-risks-discussion-paper

  5. Source: GOV.UK
    Link: https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/future-risks-of-frontier-ai-annex-a

  6. Source: Frontier Model Forum (frontiermodelforum.org)
    Title: Risk Taxonomy and Thresholds for Frontier AI Frameworks
    Link: https://www.frontiermodelforum.org/technical-reports/risk-taxonomy-and-thresholds/
    Published: June 18, 2025

  7. Source: Frontier Model Forum (frontiermodelforum.org)
    Title: Managing Advanced Cyber Risks in Frontier AI Frameworks
    Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
    Published: February 13, 2026

  8. Source: Apollo Research (apolloresearch.ai)
    Title: Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities
    Link: https://www.apolloresearch.ai/research/precursory-capabilities-a-refinement-to-pre-deployment
    Published: Jun...

  9. Source: SaferAI (safer-ai.org)
    Title: The Role of Risk Modeling in Advanced AI Risk Management
    Link: https://www.safer-ai.org/research/the-role-of-risk-modeling-in-advanced-ai-risk-management
    Published: December 10, 2025

  10. Source: Frontier Model Forum (frontiermodelforum.org)
    Title: Issue Brief: Preliminary Taxonomy of Pre-Deployment Frontier AI Safety Evaluations
    Link: https://www.frontiermodelforum.org/updates/issue-brief-preliminary-taxonomy-of-pre-deployment-frontier-ai-safety-evaluations/
    Published: December 20...

  11. Source: emergentmind.com
    Title: Frontier Model Safety Framework
    Link: https://www.emergentmind.com/topics/frontier-model-safety-framework-fmsf
    Published: February 3, 2026

  12. Source: Frontier Model Forum (frontiermodelforum.org)
    Title: Frontier Mitigations
    Link: https://www.frontiermodelforum.org/technical-reports/frontier-mitigations/
