How Pre‑Training Hazard Modelling Aims to Prevent Catastrophic AI Risks

Introduction

In the context of AI systems that could one day pose existential dangers (through loss of control, deceptive behaviour, or autonomous pursuit of unintended goals), developers and regulators are increasingly discussing pre‑training hazard modelling. The term refers to the structured analysis and forecasting that happens before a major training run of a powerful AI model begins: identifying the extreme risks that such a run might produce and planning protections or limits in advance. Rather than testing only after a system is built, pre‑training hazard modelling asks two questions up front: “What could go wrong if this model becomes more capable than expected?” and “How do we mitigate those hazards before the model exists?” This proactive approach aims to catch catastrophic dangers early, and it is a central part of proposals for mandatory safety evaluations that could be required before training large AI systems.

Threat identification for high‑severity AI capabilities

A core part of pre‑training hazard modelling is systematically anticipating the ways advanced AI could be harmful. Borrowing methods long used in safety‑critical fields like aviation or nuclear power, developers map out causal pathways from future model capabilities to extreme harms. This “threat modelling” isn’t about ordinary software bugs; it concerns scenarios where a model could be misused in ways that are hard to reverse, scale rapidly, or cause widespread disruption.

Common hazard domains flagged by frontier safety frameworks include: [emergentmind.com]

Dual‑use assistance, such as guidance on creating biological threats or chemical agents, where AI could lower barriers to misuse. [Frontier Model Forum]
Advanced cyber threats, where an AI could help discover or exploit vulnerabilities in critical infrastructure. [Frontier Model Forum]
Autonomous or recursive capabilities, such as self‑replication, automated research or planning, and emerging “agentic” behaviour that might pursue objectives not aligned with human intentions. [Frontier Model Forum]
Strategic deception, where models could behave differently when being evaluated versus in deployment or misuse scenarios (sometimes called “scheming” in risk literature). [arXiv]

In well‑developed frameworks, threat identification is not a casual brainstorm but a systematic analysis that moves from broad scenarios (e.g., “AI could accelerate misuse of bioengineering”) to specific pathways linking a future model’s capabilities to measurable harms. This can include conceptually isolating “precursory capabilities” (smaller skills that a model must possess before it can unlock more dangerous behaviours) to give early warning signs and more manageable assessment points. [Apollo Research]
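One way to picture precursory capabilities is as a simple mapping from each dangerous capability to the smaller skills it depends on. The Python sketch below is only an illustration under stated assumptions: the capability names, the precursor sets, and the `warning_level` and `missing_precursors` helpers are all hypothetical and are not taken from the Apollo Research proposal or any other cited framework.

```python
# Hypothetical mapping from a dangerous capability to the precursor skills
# a model would need first. Names are placeholders for illustration only.
PRECURSORS: dict[str, frozenset[str]] = {
    "autonomous_replication": frozenset(
        {"acquire_compute", "copy_own_weights", "evade_shutdown"}
    ),
    "strategic_deception": frozenset(
        {"detect_evaluation", "model_overseer_beliefs"}
    ),
}


def warning_level(capability: str, observed: set[str]) -> float:
    """Fraction of a capability's precursors already observed (0.0 to 1.0)."""
    required = PRECURSORS[capability]
    return len(required & observed) / len(required)


def missing_precursors(capability: str, observed: set[str]) -> list[str]:
    """Precursors not yet observed, sorted for stable reporting."""
    return sorted(PRECURSORS[capability] - observed)


if __name__ == "__main__":
    seen = {"acquire_compute", "detect_evaluation"}
    for name in PRECURSORS:
        level = warning_level(name, seen)
        print(f"{name}: {level:.2f} of precursors observed, "
              f"missing {missing_precursors(name, seen)}")
```

Tracking the fraction of precursors observed gives an assessment point well before the dangerous capability itself would appear, which is the early-warning role the text describes.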

Estimating likelihood and impact of emergent risks

Once threats are identified, the next step in pre‑training hazard modelling is to estimate both how likely they are and how severe their impacts could be. This is intrinsically challenging because models at the relevant capability level do not yet exist, and there is no historical record of catastrophic misuse or autonomous breakdowns to draw on. Instead, developers use a mix of expert judgement, analogue methods from other industries, and emerging techniques that try to quantify uncertainty explicitly.

Approaches adapted from systems engineering include:

Scenario building and causal mapping to understand how a given training configuration could lead to harmful outcomes. [SaferAI]
Fault and event tree analyses or Bayesian networks that try to combine individual hazard probabilities into a broader risk picture. [SaferAI]
Capability thresholds that define trigger points where specific risky outcomes become credible enough to demand action. Frontier frameworks often set these thresholds qualitatively (for example, when a model is capable of advanced cyber exploitation or biological protocol design), recognising that exact numbers are uncertain. [Frontier Model Forum]

These methods aim to balance likelihood (how probable it is that a future model would develop a particular dangerous capability) with impact (how large the harm would be if that capability materialised). Because frontier AI risk is about unprecedented scale and potential irreversibility, even low‑probability, high‑impact pathways are taken seriously in these models.
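As a rough illustration of how event‑tree and fault‑tree reasoning combines likelihood with impact, the Python sketch below multiplies conditional step probabilities along each pathway, combines pathways with an OR gate, and ranks pathways by probability times severity. Every pathway name, probability, and severity score here is a hypothetical placeholder rather than an estimate from any cited framework, and treating pathways as independent is a simplifying assumption.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Pathway:
    """One causal chain from a model capability to a harm (hypothetical)."""
    name: str
    step_probabilities: tuple[float, ...]  # conditional probability of each step
    severity: float                        # impact on an arbitrary 0-100 scale

    def probability(self) -> float:
        # Event-tree chain: every step must occur, so multiply the conditionals.
        return prod(self.step_probabilities)

    def expected_harm(self) -> float:
        return self.probability() * self.severity


def any_pathway_probability(pathways: list[Pathway]) -> float:
    # Fault-tree OR gate, assuming the pathways are independent:
    # P(at least one occurs) = 1 - product of P(each does not occur).
    return 1.0 - prod(1.0 - p.probability() for p in pathways)


# Placeholder numbers for illustration only.
PATHWAYS = [
    Pathway("bio_uplift", (0.10, 0.20, 0.05), severity=100.0),
    Pathway("cyber_exploit", (0.30, 0.50, 0.20), severity=40.0),
]

if __name__ == "__main__":
    for p in sorted(PATHWAYS, key=Pathway.expected_harm, reverse=True):
        print(f"{p.name}: P={p.probability():.4f}, "
              f"expected harm={p.expected_harm():.3f}")
    print(f"P(any pathway) = {any_pathway_probability(PATHWAYS):.4f}")
```

With these placeholder inputs the rarer, more severe pathway ranks lower on expected harm alone. A ranking like this can understate rare but irreversible outcomes, which is one reason low‑probability, high‑impact pathways are still examined individually rather than averaged away.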

Designing mitigation strategies before training begins

Perhaps the most consequential part of pre‑training hazard modelling is not just spotting risks in theory, but tying them to concrete mitigation plans that can be deployed before training begins. A basic premise of AI doom governance proposals is that waiting until after a model is built may be too late to prevent certain catastrophic outcomes; by then, the capability is already there. Pre‑training analysis feeds directly into decisions about whether and how a training run should proceed.

Mitigation strategies that can be shaped pre‑training include:

Training adjustments: Altering data curation, objective functions, or model architectures to constrain certain capabilities from emerging in the first place. These early interventions are informed by hazard forecasts that suggest areas of special caution. [ScienceDirect]
Capability tripwires: Incorporating monitoring during training that watches for signs a model is approaching a threshold of dangerous behaviour and pauses training for further evaluation if triggered. [Frontier Model Forum]
Governance rules and safety cases: Developers can assemble structured “safety cases” that tie evidence from pre‑training models and analogue tests to arguments about why a training run will not cross identified risk boundaries or will do so only with specified safeguards in place. [arXiv]
External evaluation and regulatory engagement: Pre‑training modelling can be documented and shared with independent reviewers or regulators as part of mandatory evaluations required before compute‑intensive training runs are authorised. These documented risk forecasts and mitigation plans become critical if AI safety evaluations are made a legal prerequisite. [GOV.UK]

In advanced safety proposals, these mitigations are not static; they evolve. Training forecasts can be updated with new evidence from predecessor models, red‑teaming, and continuous evaluation pipelines so that as understanding grows, the mitigation strategies adjust accordingly.
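The capability tripwire idea listed above can be sketched as a check that runs at each training checkpoint. This is a minimal illustration, not any developer's actual pipeline: the evaluation names, thresholds, and the `check_tripwires` and `run_training` helpers are hypothetical, and a real system would run full evaluation suites rather than read precomputed scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tripwire:
    """A hypothetical threshold on one capability evaluation."""
    eval_name: str
    threshold: float  # pause if the score reaches or exceeds this value


def check_tripwires(scores: dict[str, float],
                    tripwires: list[Tripwire]) -> list[str]:
    """Return the evaluations whose score crossed its threshold.

    A missing score is treated as a trip (fail closed), so a skipped
    evaluation cannot silently let training continue.
    """
    return [
        t.eval_name
        for t in tripwires
        if scores.get(t.eval_name, float("inf")) >= t.threshold
    ]


def run_training(checkpoint_scores: list[dict[str, float]],
                 tripwires: list[Tripwire]) -> tuple[int, list[str]]:
    """Walk through checkpoints and stop at the first one that trips a wire.

    Returns (index of the last checkpoint examined, triggered evaluations).
    An empty list means every checkpoint passed.
    """
    last_index = -1
    for last_index, scores in enumerate(checkpoint_scores):
        triggered = check_tripwires(scores, tripwires)
        if triggered:
            # This is where a real run would pause for further evaluation.
            return last_index, triggered
    return last_index, []


# Placeholder thresholds and scores for illustration only.
TRIPWIRES = [
    Tripwire("autonomous_replication", 0.5),
    Tripwire("cyber_exploitation", 0.7),
]
CHECKPOINTS = [
    {"autonomous_replication": 0.1, "cyber_exploitation": 0.2},
    {"autonomous_replication": 0.3, "cyber_exploitation": 0.75},
    {"autonomous_replication": 0.6, "cyber_exploitation": 0.9},
]

if __name__ == "__main__":
    step, triggered = run_training(CHECKPOINTS, TRIPWIRES)
    print(f"stopped at checkpoint {step}, triggered: {triggered}")
```

Failing closed on missing scores is a design choice made for this sketch; it reflects the premise that a training run should not proceed past a checkpoint whose hazards have not been assessed.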

Why pre‑training modelling matters to existential risk governance

From an AI doom perspective, the very idea of pre‑training hazard modelling reflects a shift from reactive to anticipatory risk management. Rather than testing only after a model exists, by which point highly capable behaviours might already be baked in, this modelling tries to forecast extreme risks, estimate where they might arise, and tie them to preventative action. In debates about mandatory frontier AI evaluations, this anticipatory modelling forms the backbone of arguments that powerful AI systems should not be trained without first demonstrating that critical hazards have been analysed and mitigated. [GOV.UK]

Because frontier AI risk involves significant uncertainty and unprecedented capabilities, pre‑training hazard modelling does not claim exact predictions. But by combining structured threat frameworks, expert judgement, and evidence from analogue safety domains, it gives developers and regulators a way to move from vague fears about future dangers to concrete checkpoints and mitigation strategies before the most powerful AI systems are ever trained.

Endnotes

Source: arXiv (arxiv.org)
Title: Towards evaluations-based safety cases for AI scheming
Link: https://arxiv.org/abs/2411.03336
Published: October 29, 2024

Source: ScienceDirect (sciencedirect.com)
Title: Aligning Large Language Models Across the Lifecycle: A Survey on Safety–Usability Trade-offs from Pre-training to Post-train…
Link: https://www.sciencedirect.com/science/article/pii/S0893608026004570

Source: GOV.UK
Title: Emerging processes for frontier AI safety
Link: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
Published: … 27, 2023

Source: GOV.UK
Title: Frontier AI: capabilities and risks – discussion paper
Link: https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/frontier-ai-capabilities-and-risks-discussion-paper
Source snippet: Introduction 2. What is the current state of frontier AI capabilities? 3. How might frontier AI capabilitie…

Source: GOV.UK
Link: https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/future-risks-of-frontier-ai-annex-a
Source snippet: Executive summary 2. Context 3. Current Frontier AI capabilities 4. Future Frontier AI capabilities 5. Other critical uncert…

Source: Frontier Model Forum (frontiermodelforum.org)
Title: Risk Taxonomy and Thresholds for Frontier AI Frameworks
Link: https://www.frontiermodelforum.org/technical-reports/risk-taxonomy-and-thresholds/
Published: June 18, 2025

Source: Frontier Model Forum (frontiermodelforum.org)
Title: Managing Advanced Cyber Risks in Frontier AI Frameworks
Link: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
Published: February 13, 2026

Source: Apollo Research (apolloresearch.ai)
Title: Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities
Link: https://www.apolloresearch.ai/research/precursory-capabilities-a-refinement-to-pre-deployment
Published: Jun…

Source: SaferAI (safer-ai.org)
Title: The Role of Risk Modeling in Advanced AI Risk Management
Link: https://www.safer-ai.org/research/the-role-of-risk-modeling-in-advanced-ai-risk-management
Published: December 10, 2025

Source: Frontier Model Forum (frontiermodelforum.org)
Title: Issue Brief: Preliminary Taxonomy of Pre-Deployment Frontier AI Safety Evaluations
Link: https://www.frontiermodelforum.org/updates/issue-brief-preliminary-taxonomy-of-pre-deployment-frontier-ai-safety-evaluations/
Published: December 20…

Source: emergentmind.com
Title: Frontier Model Safety Framework
Link: https://www.emergentmind.com/topics/frontier-model-safety-framework-fmsf
Published: February 3, 2026

Source: Frontier Model Forum (frontiermodelforum.org)
Title: Frontier Mitigations
Link: https://www.frontiermodelforum.org/technical-reports/frontier-mitigations/
Source snippet: OVERVIEW OF FRONTIER MITIGATIONS 1.1 PURPOSE AND SCOPE Frontier mitigations are protective measures implemented on frontier models, with…
