TL;DR
Recent developments indicate AI systems are nearing full automation of core engineering tasks in AI research. Research itself remains partially manual, but evidence suggests this gap may close faster than expected. The implications could reshape AI development workflows within the next two to three years.
Recent empirical evidence shows that AI systems can now automate the majority of engineering tasks involved in AI research, with some experts suggesting that research itself might become automated at scale within the next 32 months.
Thorsten Meyer’s analysis of recent benchmark progress indicates that AI systems have reached near-saturation levels in core engineering skills relevant to AI R&D. The CORE-Bench, which measures research reproduction capabilities, improved from 21.5% in September 2024 to 95.5% in December 2025, with the benchmark’s author declaring it ‘solved.’ Similarly, the MLE-Bench, assessing performance in Kaggle competitions, increased from 16.9% to 64.4% over the same period, now rivaling mid-tier human performance. These patterns suggest that the bottleneck in AI research is shifting from engineering to the research process itself, which remains less automated.
Clark’s framework emphasizes the distinction between engineering tasks—such as reproducing experiments and optimizing code—and the creative or conceptual aspects of research. The evidence indicates that engineering can be largely automated, but research—particularly the generation of novel hypotheses, experimental design, and interpretation—may be more resistant, though this gap could close rapidly as AI capabilities evolve.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
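The optimistic "shots on goal" reading can be made concrete with a small sketch. It assumes, purely for illustration, that each exploration attempt is independent and has the same small success probability; neither the independence assumption nor the example probability comes from Clark or from the benchmarks, and the function name is hypothetical.

```python
def p_at_least_one(p_per_attempt: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries.

    Assumes a constant per-attempt success probability (an illustrative
    simplification, not a measured quantity).
    """
    if not 0.0 <= p_per_attempt <= 1.0:
        raise ValueError("p_per_attempt must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_per_attempt) ** attempts


# Hypothetical per-attempt yield of 0.1%; scale the number of attempts.
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} attempts -> {p_at_least_one(0.001, n):.3f}")
```

Under this model a low yield per attempt still produces discoveries once volume is high enough, which is the optimistic reading. The pessimistic reading corresponds to the per-attempt probability being effectively zero for some class of insight, in which case more attempts change nothing.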

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response that the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
If research is a permanent moat: productivity multiplier years.
If research is engineering at scale: recursive loop operational.

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
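The asymmetric cost-of-being-wrong argument can be sketched as a two-by-two decision table. The cost units below are hypothetical placeholders chosen only to show the shape of the argument (building costs something in both states, waiting is cheap if the moat holds and expensive if the gap closes); they are not estimates from the article or from Clark.

```python
# Illustrative cost units only; none of these numbers come from the article.
COSTS = {
    "build_now": {"moat_holds": 1.0, "gap_closes": 1.0},
    "wait": {"moat_holds": 0.0, "gap_closes": 10.0},
}


def worst_case(action: str) -> float:
    """Largest cost the action can incur across both states."""
    return max(COSTS[action].values())


def expected_cost(action: str, p_gap_closes: float) -> float:
    """Expected cost given a probability that the research gap closes."""
    c = COSTS[action]
    return (1.0 - p_gap_closes) * c["moat_holds"] + p_gap_closes * c["gap_closes"]


def break_even_probability() -> float:
    """Probability of the gap closing at which both actions cost the same."""
    b, w = COSTS["build_now"], COSTS["wait"]
    saved_if_holds = b["moat_holds"] - w["moat_holds"]    # what waiting saves
    lost_if_closes = w["gap_closes"] - b["gap_closes"]    # what waiting risks
    return saved_if_holds / (saved_if_holds + lost_if_closes)


print("minimax choice:", min(COSTS, key=worst_case))
print("break-even p(gap closes):", break_even_probability())
```

With these placeholder costs, building now has the smaller worst case, and waiting is only cheaper in expectation when the gap is judged quite unlikely to close. Different cost assumptions move the break-even point, so the sketch shows the structure of the argument rather than a verdict.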
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Research Workflow
The rapid automation of engineering tasks in AI research could significantly reduce costs, accelerate innovation cycles, and shift the strategic focus toward research and theory development. If research becomes automatable at scale, traditional roles of human researchers may diminish, prompting a reassessment of AI R&D structures and institutional investments. This transition could lead to a new phase where AI systems not only build and test models but also generate novel research directions autonomously.
Recent Progress in AI R&D Skill Benchmarks
Between September 2024 and December 2025, multiple independent benchmarks, including CORE-Bench and MLE-Bench, showed consistent improvement in AI’s ability to perform core engineering tasks. CORE-Bench, which measures research reproduction, rose from 21.5% to 95.5%, a 4.4× improvement, reaching a level its author considers ‘solved.’ MLE-Bench, which assesses Kaggle competition performance, rose from 16.9% to 64.4%, just under two-thirds and comparable to mid-tier human performance. Concurrently, advances in kernel design, such as automated GPU kernel generation and optimization, further illustrate AI’s transition from experimental to production-grade engineering capability. Taken together, these developments suggest that AI is approaching full automation of the engineering side of research.
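The improvement factors can be checked from the score pairs reported earlier in the article. The following is a minimal sketch; the dictionary layout and function names are illustrative choices, and only the four percentages are taken from the text.

```python
# Start and end scores (percent) as quoted in the article.
SCORES = {
    "CORE-Bench": (21.5, 95.5),
    "MLE-Bench": (16.9, 64.4),
}


def improvement_ratio(start: float, end: float) -> float:
    """Multiplicative improvement: how many times larger the end score is."""
    return end / start


def summarize(scores: dict) -> dict:
    """Map each benchmark to (ratio, percentage-point gain), rounded to 1 dp."""
    return {
        name: (round(improvement_ratio(s, e), 1), round(e - s, 1))
        for name, (s, e) in scores.items()
    }


for name, (ratio, gain) in summarize(SCORES).items():
    print(f"{name}: {ratio}x, +{gain} points")
```

The ratio and the point gain tell slightly different stories: a benchmark that starts low can show a large multiple while still sitting well short of saturation, which is why MLE-Bench at 64.4% reads differently from CORE-Bench at 95.5%.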
“The pattern across multiple benchmarks indicates that engineering tasks in AI R&D are nearing full automation, while research remains the residual challenge.”
— Thorsten Meyer
Uncertainties About Research Automation Timeline
While engineering tasks are approaching full automation, it remains unclear how quickly research—particularly the creative and hypothesis-generating aspects—will become fully automated. The structural question Clark leaves open is whether research itself is a form of large-scale engineering, which could accelerate automation, or if some inherently human elements will persist longer. The pace of progress in AI’s creative capabilities is still uncertain, and the timeline for near-complete automation of research remains speculative.
Next Steps in Monitoring AI R&D Capabilities
In the coming months, researchers and industry observers will closely track the progress of benchmarks and real-world applications. Improvements in AI’s research generation, hypothesis testing, and experimental design are expected to be the focus. Institutional responses may include increased investment in AI automation tools and reevaluation of research workflows. The key milestone will be whether AI can autonomously produce novel, publishable research at scale, which could redefine the landscape of AI development within the next 32 months.
Key Questions
What specific tasks in AI research are now automatable?
Tasks such as reproducing published experiments, rerunning the computational results of research papers, optimizing GPU kernels, and competing in Kaggle-style machine learning competitions are now largely automatable. On the benchmarks cited in this article, AI systems range from mid-tier human performance (MLE-Bench) to near-saturation (CORE-Bench).
Does this mean human researchers are no longer needed?
While automation is advancing rapidly in engineering tasks, the creative, conceptual, and hypothesis-driven aspects of research remain less automated. Human researchers are still essential for guiding research directions and interpreting results, though this may change as AI capabilities evolve.
How reliable are these benchmark results as indicators of future capabilities?
The benchmarks provide concrete, measurable progress in specific skills relevant to AI R&D. However, translating these capabilities into fully autonomous research processes involves additional challenges, and the timeline for such a transition remains uncertain.
What are the potential risks or downsides of automation in AI research?
Increased automation could lead to reduced human oversight, potential biases in AI-generated hypotheses, and ethical concerns about autonomous research. It also raises questions about the transparency and reproducibility of AI-driven discoveries.
Source: ThorstenMeyerAI.com