Could coding agents replace AI researchers?

Coding progress matters because frontier AI development still depends heavily on software engineering, debugging, and experiment plumbing.

Introduction

A key question in debates about AI doom and recursive self-improvement is whether coding agents could remove one of the biggest bottlenecks in frontier AI development: software engineering. Modern AI labs do not advance purely through scientific breakthroughs. They depend on vast amounts of coding, debugging, experiment management, evaluation infrastructure, data pipelines, monitoring systems, and deployment work. If AI systems become able to perform much of that labour, the pace of AI development could accelerate significantly. If they cannot, then many fast-takeoff and intelligence-explosion scenarios become harder to realise.

The short answer is that coding agents are already removing some engineering bottlenecks, but there is little evidence that they can yet replace the full range of work performed by experienced AI researchers and research engineers. The question matters for existential-risk discussions because software engineering may be easier to automate than scientific discovery. If machines can take over enough of the engineering burden, even without becoming brilliant scientists, they may still speed up the creation of more capable successor systems. [1] [2]

Why AI labs depend on software engineering

Outside observers sometimes imagine frontier AI progress as being driven mainly by a handful of scientists inventing new algorithms. In practice, a large fraction of work inside leading labs consists of engineering.

Training a frontier model requires maintaining enormous codebases, building data-processing pipelines, managing compute clusters, running thousands of experiments, tracking failures, evaluating model behaviour, analysing results, and integrating new techniques into production systems. Many proposed improvements never become useful because implementation, testing, and debugging take too long.

This matters because engineering work is often more structured and measurable than open-ended scientific research. A coding agent does not necessarily need deep scientific insight to generate substantial productivity gains. If it can reliably fix bugs, write infrastructure, create tests, automate evaluations, and run experiments, it may remove delays that currently slow research teams. Several AI-risk forecasting efforts therefore treat AI automation of AI research and engineering as an especially important milestone because it could create a feedback loop in which AI systems help build better AI systems. [4] [3]

One reason this possibility receives attention in x-risk discussions is that frontier AI development is increasingly constrained by skilled labour. Compute and funding matter, but experienced research engineers remain scarce. If AI systems can effectively multiply the productivity of those engineers, the practical research capacity of a lab could expand far faster than headcount alone would suggest. [7]

What current coding agents can and cannot automate

The strongest evidence for coding-agent progress comes from software-engineering benchmarks and growing deployment inside technology companies.

On benchmarks such as SWE-bench, which asks models to resolve real software issues from GitHub repositories, performance has improved dramatically over a short period. Systems that struggled with most tasks in 2023 now solve large fractions of benchmark problems. Researchers and companies increasingly use coding agents that can navigate repositories, edit multiple files, execute tests, inspect outputs, and iteratively revise solutions. [23] [24]

Modern coding agents can often:

  • Write substantial amounts of working code.
  • Search and understand large codebases.
  • Generate tests and documentation.
  • Debug routine software failures.
  • Execute repetitive engineering tasks. [4]
  • Run bounded development workflows with limited supervision. [1] [9]

These capabilities are directly relevant to AI labs because much AI development consists of exactly these activities.

However, the limitations are equally important.

Many evaluations still show a significant gap between benchmark performance and real-world engineering. Some benchmark scores may be inflated by contamination, infrastructure quirks, or evaluation design. Researchers have repeatedly found that benchmark success does not automatically translate into reliable performance on messy production systems. [13] [10] [11]
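Separate from contamination and infrastructure effects, plain sampling error limits how much a single benchmark score can say. The sketch below is a rough illustration, not any benchmark's official methodology. It assumes 500 independent tasks and a 74% pass rate (figures that appear in this article's source snippets for SWE-bench Verified), and it uses a simple normal approximation. The function name `pass_rate_interval` is hypothetical.

```python
import math


def pass_rate_interval(pass_rate: float, n_tasks: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval for a benchmark pass rate.

    Uses the normal approximation to the binomial distribution and
    assumes tasks are independent, which real benchmarks may violate.
    """
    standard_error = math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)
    return pass_rate - z * standard_error, pass_rate + z * standard_error


# Illustrative inputs: 74% pass rate on a 500-task benchmark.
low, high = pass_rate_interval(0.74, 500)
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 70% to 78%, so differences of a few percentage points between systems can fall within sampling noise before any contamination or infrastructure issue is considered.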

Current coding agents also struggle with: [16]

  • Long-horizon projects requiring weeks or months of coordinated effort.
  • Ambiguous goals and changing requirements.
  • Novel research problems without clear success criteria.
  • Strategic prioritisation between competing experiments.
  • Understanding organisational context and tacit knowledge.
  • Coordinating many interacting teams and systems. [5] [6]

Even highly capable coding agents often succeed on well-defined technical tasks while failing on broader project management and research-direction questions. That distinction is central to the debate.

The strongest doom-relevant argument

The most important argument from AI-doom advocates does not require coding agents to replace every researcher.

Instead, it relies on a narrower claim: software engineering may be the largest remaining bottleneck in AI progress, and that bottleneck may be easier to automate than frontier scientific creativity.

Imagine a lab where researchers already know dozens of promising ideas they would like to test but lack the engineering capacity to implement and evaluate them all. If coding agents multiply engineering throughput several-fold, more experiments can be run, more model variants can be tested, and more improvements can be incorporated into future systems.

In that scenario, AI systems accelerate progress without independently inventing entirely new paradigms. They function as force multipliers for human researchers. If the resulting models are themselves better coding agents, the process could repeat. This is one of the mechanisms by which recursive capability gains might emerge gradually rather than through a sudden scientific breakthrough. [4] [METR]

Some recent forecasting work explicitly examines AI R&D acceleration through automation of software engineering and machine-learning tasks. The concern is not that an AI wakes up and redesigns itself overnight, but that increasingly capable agents steadily compress research cycles by reducing the amount of human labour required per generation of models. [14] [15]

Why sceptics think the bottleneck may persist

Critics of recursive-improvement scenarios argue that software engineering is only one constraint among many.

Even if coding productivity increases substantially, frontier AI development still depends on factors that are harder to automate:

  • Access to specialised hardware.
  • Massive compute budgets.
  • Experimental validation.
  • Safety evaluation.
  • Organisational decision-making.
  • Scientific insight into new architectures and training methods.
  • Coordination among large teams.

From this perspective, coding agents resemble previous productivity tools. They make researchers more effective but do not fundamentally change the pace of progress.

There is also evidence that faster coding does not automatically produce faster innovation. Organisations frequently discover that downstream bottlenecks emerge in testing, verification, deployment, security review, or infrastructure management. Some studies and industry reports suggest that AI-assisted coding can increase software output while simultaneously creating new quality-control burdens. [17]
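The downstream-bottleneck argument can be made concrete with Amdahl's law: if only part of a research cycle speeds up, the rest caps the overall gain. The sketch below is a minimal illustration under stated assumptions. The coding shares (30%, 50%, 80%) are hypothetical, and the 1.5x and 13x coding speedups simply borrow the range quoted in one of the METR source snippets; neither is a measurement of any real lab.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: speedup of a whole cycle when only one part gets faster.

    coding_fraction: share of the cycle spent on coding (hypothetical input).
    coding_speedup:  factor by which that share is accelerated.
    """
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)


# Hypothetical coding shares crossed with two illustrative coding speedups.
for fraction in (0.3, 0.5, 0.8):
    for speedup in (1.5, 13.0):
        result = overall_speedup(fraction, speedup)
        print(f"coding share {fraction:.0%}, coding {speedup}x faster -> overall {result:.2f}x")
```

The formula implies a hard ceiling of 1 / (1 - coding_fraction). If coding is half of the cycle, even an unlimited coding speedup cannot make the whole cycle more than twice as fast, which is the sceptics' point about testing, review, and compute remaining on the critical path.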

Another sceptical argument is that frontier AI research contains many activities that look less like programming and more like scientific judgement. Deciding which hypotheses deserve attention, interpreting surprising results, identifying hidden failure modes, and choosing strategic research directions may remain difficult to automate even if coding itself becomes largely automated.

When faster engineering becomes a doom-relevant feedback loop

For existential-risk discussions, the key question is not whether coding agents save time. They often do, although the size of the effect varies across studies. The question is whether the savings become large enough to change the dynamics of AI development.

A feedback loop becomes interesting when three conditions hold:

  1. AI systems substantially accelerate AI-development work.
  2. The resulting systems are better at accelerating AI-development work.
  3. The cycle repeats faster than human institutions can adapt.

The first condition increasingly appears plausible. The second remains uncertain but is actively studied by organisations evaluating AI R&D automation. The third is the most speculative and depends on economics, hardware constraints, safety measures, regulation, and coordination among leading labs. [18] [19]
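The three conditions can be expressed as a toy model. The sketch below assumes, purely for illustration, that each model generation shortens the next development cycle by a fixed factor and that institutions need a fixed number of months to adapt. Every number and function name here is hypothetical; real dynamics would also depend on compute, funding, and regulation.

```python
from typing import Optional


def generation_times(first_cycle_months: float, gain_per_generation: float, generations: int) -> list[float]:
    """Cycle length of each generation when every generation divides
    the next cycle's length by gain_per_generation (conditions 1 and 2)."""
    times = []
    cycle = first_cycle_months
    for _ in range(generations):
        times.append(cycle)
        cycle /= gain_per_generation
    return times


def first_generation_faster_than(adaptation_months: float, times: list[float]) -> Optional[int]:
    """Index of the first generation whose cycle is shorter than the time
    institutions need to adapt (condition 3), or None if that never happens."""
    for index, cycle in enumerate(times):
        if cycle < adaptation_months:
            return index
    return None


# Hypothetical inputs: 12-month first cycle, 6-month institutional adaptation time.
for gain in (1.0, 1.2, 2.0):
    times = generation_times(12.0, gain, 10)
    print(f"gain {gain}x per generation -> crossover at generation index "
          f"{first_generation_faster_than(6.0, times)}")
```

With a gain of 1.0 the cycle never shortens and no crossover occurs, which corresponds to the sceptical view that coding agents behave like earlier productivity tools. Any sustained gain above 1.0 eventually crosses the adaptation threshold in this toy model, which is why the size and persistence of the gain is the contested quantity.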

This is why coding agents occupy a prominent place in AI-doom discussions. They offer a concrete mechanism through which AI systems might speed up the creation of successor systems. Unlike hypothetical superhuman scientific genius, automated coding is already visible and measurable.

At the same time, current evidence does not show that coding agents can independently run frontier AI laboratories. They appear much closer to highly productive research assistants than autonomous research directors. The central uncertainty is whether future improvements merely continue this pattern of assistance or eventually remove enough engineering bottlenecks to create a self-reinforcing acceleration cycle. That uncertainty sits near the heart of modern debates about p(doom), recursive improvement, and the possibility of losing control of increasingly capable AI systems. [4] [METR]

Endnotes

  1. Source: github.com
    Link: https://github.com/anthropics
    Source snippet

    GitHub, Anthropic: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by...

  2. Source: metr.org
    Link: https://metr.org/research/
    Source snippet

    METR, Research: Forecasting the Impacts of AI R&D Acceleration: Results of a Pilot Study. 20 August 2025. AI agents are improving rapidly at...

    Published: August 2025

  3. Source: metr.org
    Link: https://metr.org/
    Source snippet

    METR: Our AI evaluations research focuses on assessing broad autonomous capabilities and the ability of AI systems to accelerate AI R&D...

  4. Source: arxiv.org
    Title: Measuring AI R&D Automation
    Link: https://arxiv.org/pdf/2603.03992
    Source snippet

    arXiv, March 4, 2026, by A Chan. For tasks more directly relevant to frontier research, METR's RE-Bench...

    Published: March 4, 2026

  5. Source: metr.org
    Title: Forecasting the Impacts of AI R&D Acceleration
    Link: https://metr.org/blog/2025-08-20-forecasting-impacts-of-ai-acceleration/
    Source snippet

    20 Aug 2025. AI agents are improving rapidly at autonomous software development and machin...

  6. Source: metr.org
    Title: A simpler AI timelines model predicts 99% AI R&D...
    Link: https://metr.org/notes/2026-02-10-simpler-ai-timelines-model/
    Source snippet

    10 Feb 2026. In this post, I describe a simple model for forecasting when AI will au...

  7. Source: hyperdimensional.co
    Title: On Recursive Self-Improvement (Part I)
    Link: https://www.hyperdimensional.co/p/on-recursive-self-improvement-part
    Source snippet

    Ball, February 5, 2026. America's major frontier AI labs have begun automating large fractions of their research and engineer...

    Published: February 5, 2026

  8. Source: github.com
    Link: https://github.com/swe-bench/SWE-bench
    Source snippet

    GitHub, SWE-bench: Can Language Models Resolve Real-... SWE-bench is a benchmark for evaluating large language models on real world softwar...

  9. Source: checkmarx.com
    Title: Top 12 AI Developer Tools in 2026 for Security, Coding, and Quality
    Link: https://checkmarx.com/learn/ai-security/top-12-ai-developer-tools-in-2026-for-security-coding-and-quality/
    Source snippet

    Mar 11, 2026. AI developer tools use large language models, embeddings, and au...

  10. Source: anthropic.com
    Title: Quantifying infrastructure noise in agentic coding evals
    Link: https://www.anthropic.com/engineering/infrastructure-noise
    Source snippet

    Anthropic, Feb 5, 2026. Agentic coding benchmarks like SWE-bench and Terminal-Benc...

  11. Source: rdi.berkeley.edu
    Link: https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
    Source snippet

    Berkeley RDI: How We Broke Top AI Agent Benchmarks. We built an automated scanning agent that systematically audited eight among the most pro...

  12. Source: arxiv.org
    Link: https://arxiv.org/abs/2505.20411

  13. Source: tianpan.co
    Link: https://tianpan.co/blog/2026-04-09-agentic-coding-production-swebench-gap
    Source snippet

    TianPan: Agentic Coding in Production: What SWE-bench Scores... Apr 9, 2026. SWE-bench Verified became the de facto standard for evaluatin...

  14. Source: metr.org
    Link: https://metr.org/time-horizons/
    Source snippet

    METR: Task-Completion Time Horizons of Frontier AI Models. It varies by model, task, and the exact agent setup, but AI agents are typically s...

  15. Source: metr.org
    Link: https://metr.org/notes/2026-02-17-exploratory-transcript-analysis-for-estimating-time-savings-from-coding-agents/
    Source snippet

    METR: Analyzing coding agent transcripts to upper bound... 17 Feb 2026. This method estimates a time savings factor of ~1.5x to ~13x on Cl...

  16. Source: arxiv.org
    Link: https://arxiv.org/html/2604.25067v2
    Source snippet

    Frontier Coding Agents Can Now Implement an AlphaZero... 29 Apr 2026. This paper is particularly concerned with recursive self-improveme...

  17. Source: techradar.com
    Title: AI has slashed coding time in 2026, but it's sacrificed software stability
    Link: https://www.techradar.com/pro/ai-has-slashed-coding-time-in-2026-but-its-sacrificed-software-stability
    Source snippet

    Teams using AI tools frequently are releasing code faster—with 45% deploying daily—compared to just 15% of occasional users. However, thi...

  18. Source: metr.org
    Title: Measuring the Impact of Early-2025 AI on Experienced...
    Link: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
    Source snippet

    10 Jul 2025. We conduct a randomized controlled trial (RCT) to understand how ea...

  19. Source: metr.org
    Title: AI usage survey (2026-05-11)
    Link: https://metr.org/blog/2026-05-11-ai-usage-survey/
    Source snippet

    We propose measuring AI performance in terms of the length of tasks AI agents can complete. We...

  20. Source: metr.org
    Title: We are Changing our Developer Productivity Experiment...
    Link: https://metr.org/blog/2026-02-24-uplift-update/
    Source snippet

    24 Feb 2026. To understand how AI is impacting developer productivity over time...

  21. Source: anthropic.com
    Link: https://www.anthropic.com/

  22. Source: github.com
    Title: VoltAgent/awesome-ai-agent-papers
    Link: https://github.com/VoltAgent/awesome-ai-agent-papers
    Source snippet

    A curated collection of AI agent research papers released in 2026, covering agent engineering, memory, e...

  23. Source: swebench.com
    Link: https://www.swebench.com/
    Source snippet

    SWE-bench: SWE-bench Leaderboards. Official Leaderboards. mini-SWE-agent scores up to 74% on SWE-bench Verified in 100 lines of Python code. R...

  24. Source: importai.substack.com
    Title: Import AI 455: AI systems are about to start building themselves
    Link: https://importai.substack.com/p/import-ai-455-automating-ai-research
    Source snippet

    Solving real-world software engineering problems: SWE-Bench is a widely used coding test which evaluates how well AI systems...

  25. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/Anthropic
    Source snippet

    Anthropic: Anthropic is an American artificial intelligence (AI) company headquartered in San Francisco. It has developed a series of la...

  26. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/METR
    Source snippet

    METR: Model Evaluation and Threat Research (METR) (MEE-tər), is a nonprofit research institute, based in Berkeley, California, that eval...

  27. Source: codeant.ai
    Title: SWE-bench Leaderboard 2026: All Model Scores...
    Link: https://www.codeant.ai/blogs/swe-bench-scores
    Source snippet

    13 Apr 2026. SWE-bench is a benchmark that gives an AI model a real GitHub issue and a c...

  28. Source: epoch.ai
    Link: https://epoch.ai/benchmarks/swe-bench-verified
    Source snippet

    SWE-bench Verified: SWE-bench Verified is a human-validated subset of the original SWE-bench dataset, consisting of 500 samples that evalua...

  29. Source: labs.scale.com
    Title: SWE-Bench Pro (Public Dataset) - Scale Labs
    Link: https://labs.scale.com/leaderboard/swe_bench_pro_public
    Source snippet

    SWE-Bench Pro is a benchmark designed to provide a rigorous and realistic evaluation...
