
I. Overview
American companies are rapidly pushing the boundaries of artificial intelligence, creating unprecedented potential for material abundance and social flourishing. But this progress sits atop a growing market failure: AI labs are incentivized to race ahead technologically without fully understanding how their systems work or when they may go wrong. Fortunately, this risk can be significantly reduced through scientific study of AI’s mechanics and other measures.
In the last year, both the Trump Administration and Congress have published productive visions of AI policy.1,2 Below, we build out recommendations from a few consensus priorities.
II. Problem
America’s leading AI labs are incentivized to make AI smarter first and to understand its intelligence second. As the White House Action Plan explains, “Technologists know how LLMs work at a high level, but often cannot explain why a model produced a specific output.”3 Errors follow, like models that override intended guardrails to facilitate illegal or violent activity and models that effectively lie.4,5,6 But while extraordinary volumes of private capital flow to the AI race, comparatively little flows to fixing this problem.7
As AI systems grow smarter and more capable, the consequences of their errors will grow more severe, raising the risk of devastating failures and rogue actions. The likelihood of public backlash, overregulation, and stalled progress will rise too. Researchers working on interpretability in academia and at industry labs have achieved foundational breakthroughs but call for accelerated research.8,9 (The fields of control and robustness are even less developed.) Models are learning to deceive people more quickly than we are learning to understand them. Adding urgency, experts predict a significant step-change in AI capabilities in 2027.10,11
When important engines of economic growth come with widely shared pains, we must respond with invention and innovation. Answering these “externalities” with excessive regulation would throttle AI progress and competition, a risk that more aggressive AI proposals already court.12 Innovation offers a better way, helping build a better product without those pains, much as American manufacturing and energy production have become less pollutive.
Invention and innovation can be steered by institutions and policies. To that end, we suggest measures to build thriving areas of research and practice in interpretability, control, and robustness:13
III. Recommendations (Research and Practice)
Research — Establish a federal research program, growing over time to $1B/yr,14 dedicated to advancing interpretability, control, and robustness. This funding would be deployed in multiple forms, at multiple paces, and across multiple agencies at executive discretion,15 assuming some baseline mix of:
- long-term academic grants for slow, fundamental science,
- shorter-term applied projects with frontier labs under cost-recovery contracts,
- and prize or challenge programs to incentivize specific breakthroughs, especially at Focused Research Organizations.16
Codify and scale research tools. A scaled-up National Artificial Intelligence Research Resource (“NAIRR”), which provides researchers with compute, data, secure testing environments, and more, would help build a broader, more decentralized research ecosystem, allowing more great ideas to come from more places. NAIRR would be best complemented by an expanded National Deep Inference Fabric, a partnership pilot between government and academia that allows researchers to experiment nimbly with some of the largest-scale models.17
Reduce regulatory burden on innovation. Regulators and grant managers should be empowered to cut red tape or quickly clarify regulations, especially for data-sharing, at the request of researchers, provided they track and report such instances to Congress and ensure sensible substitute guardrails.18,19
Practice — Encourage voluntary disclosure standards for new models. The Center for AI Standards and Innovation (CAISI) should recommend new transparency standards to encourage labs to disclose the size and composition of their relevant research teams and current research projects, their testing methods and whistleblower protections, and their preferred control and robustness methods — a modest extension of the System Cards they already publish.20 These transparency standards would facilitate healthy dialogue and scrutiny across the ecosystem.21
Systematize the regular testing of leading and sensitive AI systems, known as “red-teaming,” which is currently done informally. This would best be executed through experienced agencies like the Intelligence Advanced Research Projects Activity (IARPA) and CAISI, perhaps in concert with private parties. These red-teaming exercises would probe model susceptibility to threats like jailbreaking and sleeper agents and help build in-house expertise in government.
IV. Risks and Politics
Americans are widely concerned about the pace of AI’s development and the risk of major, damaging events, and this concern does not vary meaningfully by party.22 However, the public is generally ambivalent about additional R&D in science and technology, except for strong support among the most highly educated.23 It stands to reason that communicating that action is being taken to reduce risks will resonate more than detailing exact solutions.
Federal research programs come with some political risk, as failed or unusual research projects tend to attract public scrutiny.24 Given that many tech leaders are controversial, contracts with their organizations may become a particular lightning rod. Bipartisan buy-in, high program integrity, and strong relationships between program administrators and their congressional oversight committees should help weather any incidents; that posture is preferable to programmatic risk-aversion.
1 The White House, America’s AI Action Plan
2 Bipartisan House Task Force on Artificial Intelligence, Bipartisan House Task Force Report on Artificial Intelligence
3 The White House, America’s AI Action Plan
4 TIME, AI Chatbots Can Be Manipulated to Provide Advice on How to Self-Harm
5 Anthropic, Alignment faking in large language models
6 SecurityWeek, New AI Jailbreak Bypasses Guardrails With Ease. Results appear credible, but note that the research was conducted by a firm that sells robustness technology.
7 LessWrong, An Overview of the AI Safety Funding Situation
8 Neel Nanda, 80,000 Hours. Nanda calls first and foremost for rationalized research (diversified, methodologically sensible, and not overoptimistic), adding important nuances to the general consensus.
9 OpenAI, Detecting and reducing scheming in AI models
10 Dario Amodei, The Urgency of Interpretability
11 Metaculus, When will the first weakly general AI system…
12 Dean Ball, Turning a Blind Eye
13 We deal with recommendations on top talent elsewhere — see our explainer: Innovation: Artificial Intelligence Talent (Private Sector)
14 A mostly illustrative suggestion of size derived from the federal government’s considerable overall R&D budget — including $200B in annual R&D funding and over $50B appropriated under the CHIPS and Science Act — and its modest AI research budgets to date.
15 A clear, empowered program leader will be necessary to ensure maximum success.
16 Federation of American Scientists, Focused Research Organizations
17 NDIF, NSF National Deep Inference Fabric
18 An example may include expedited FTC guidance clarifying that narrow information-sharing between researchers at labs does not constitute an anti-competitive act.
19 Interviewees underscored that good-faith, rather than combative, oversight would be key to enable innovation in research methods.
20 See OpenAI, GPT-5 System Card
21 TIME, 4 Ways to Advance Transparency in Frontier AI Development
22 Artificial Intelligence Policy Institute (AIPI), Poll Shows Overwhelming Concern About Risks From AI... Concern is generally evident across polls.
23 Private data from Blue Rose Research
24 NPR, Shrimp On a Treadmill