US Government to Pre-Release Test AI Models from Major Tech Companies: What You Need to Know

In a landmark move for AI governance, several leading artificial intelligence companies—including Google, Microsoft, xAI, OpenAI, and Anthropic—have agreed to allow the US government to test their AI models before public release. This initiative aligns with the AI Action Plan under the Trump administration and builds on existing evaluation partnerships. Below we explore key questions about this unprecedented collaboration.

1. What exactly is the agreement between AI companies and the US government?

The agreement commits major AI developers to voluntarily submit their most advanced models for pre-release safety testing by a US government evaluation center. This is not a legal requirement but a cooperative measure to identify risks—such as bias, misuse, or security vulnerabilities—before models are deployed widely. In exchange, companies receive early feedback and can adjust their systems, while the government gains insight into emerging AI capabilities. The center, which began working with OpenAI and Anthropic in 2024, now includes Google, Microsoft, and xAI under new terms that reflect priorities in the AI Action Plan.

US Government to Pre-Release Test AI Models from Major Tech Companies: What You Need to Know — Source: www.tomshardware.com

2. Which companies are participating and why?

Five major players are on board: Google, Microsoft, xAI (Elon Musk’s venture), OpenAI, and Anthropic. Each has its own motivation: Google and Microsoft want to demonstrate responsible leadership; xAI seeks to build trust amid regulatory scrutiny; OpenAI and Anthropic had existing partnerships with the evaluation center dating to 2024 and renegotiated to align with the new administration’s goals. By participating, these companies help shape the testing framework, gain a competitive edge in safety, and avoid potential mandatory rules. The collective agreement signals a shift toward proactive AI governance that balances innovation with public safety.

3. How does this align with Trump's AI Action Plan?

The AI Action Plan, issued under the Trump administration, emphasizes American leadership in AI while ensuring safety and national security. It calls for voluntary industry commitments to prevent harmful outcomes without stifling progress. The pre-release testing deal directly supports these goals by creating a public-private partnership for risk assessment. OpenAI and Anthropic renegotiated their previous evaluation agreements to align with the plan’s priorities, such as focusing on dual-use risks and export controls. The plan also encourages transparency—companies share test results with the government but not necessarily the public, protecting proprietary information while enabling oversight.

4. What role does the existing evaluation center play?

The US Evaluation Center, likely part of the AI Safety Institute (AISI) or a similar federal body, acts as the technical hub. It designs benchmarks, runs simulations, and assesses models for dangerous capabilities—like cyberattack automation or persuasive deception. Since 2024, the center tested OpenAI’s GPT-4o and Anthropic’s Claude 3, building methodologies now extended to other companies. Its role has expanded: it now coordinates with national labs, the Department of Defense, and intelligence agencies to ensure testing covers catastrophic risks. By hosting pre-release evaluations, the center provides a neutral, expert-driven filter before models reach millions of users.

5. How will pre-release testing work in practice?

Companies will share their latest models with the evaluation center under non-disclosure agreements. The center runs a battery of tests: red-teaming for security flaws, bias audits, and stress tests for unexpected outputs. Results are reported back to the company, which can patch issues before public launch. The process is iterative—a model might go through multiple test-fix cycles. Testing does not grant government approval; it’s advisory. However, if severe risks are flagged, the company faces pressure to delay release. The exact timeline varies, but typical evaluations take weeks to months, depending on model complexity.

6. What are the implications for AI safety and innovation?

This agreement could raise the safety bar for the entire industry. By catching flaws early, it reduces the chance of harmful deployments—whether from biased algorithms or systems that enable misinformation. At the same time, companies worry that pre-release testing might slow innovation or leak competitive secrets. The voluntary nature and confidentiality clauses aim to mitigate these concerns. For the government, it builds institutional knowledge to craft future regulations. For the public, it offers greater reassurance that advanced AI won’t be unleashed without scrutiny. The balance between safety and speed will be closely watched as more companies join.