The AI models at the frontier of performance are also the least transparent about how they are built and tested, according to Stanford HAI's 2026 AI Index, released Monday. The report found that companies are sharing progressively less about training data and benchmark performance even as their models become more powerful and more widely deployed.
Summary
SiliconAngle reported that the 2026 index documents a world where AI adoption is accelerating at historic speed while "public trust in AI oversight and transparency hits new lows." The two trends are directly related: as AI tools reach more than half the global population and generate $172 billion in annual consumer value in the US alone, the lack of visibility into how the most powerful models are built and evaluated creates a governance gap that neither regulators nor the public can close, because neither has the data to work from.
The benchmark problem is not abstract. If a model scores well because the benchmark's test items leaked into its training data, a failure mode known as data contamination, that score provides no meaningful signal about how the model will perform on novel tasks in deployment. For complex use cases like AI agents and robots, the report notes, benchmarks barely exist yet, meaning the most consequential AI applications are being deployed with almost no standardized external validation.
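To make the contamination point concrete, here is a minimal sketch, not drawn from the Stanford report, of the kind of n-gram overlap check researchers use to flag test items that appear verbatim in a training corpus; the function names and toy data are illustrative.

```python
# Illustrative sketch: flag benchmark test items whose text overlaps a
# training corpus via shared word-level 8-grams, a common proxy for
# data contamination. The corpus and test items below are stand-ins.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a string."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated_items(train_docs: list[str], test_items: list[str],
                       n: int = 8) -> list[int]:
    """Indices of test items sharing at least one n-gram with training text."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, item in enumerate(test_items)
            if ngrams(item, n) & train_grams]

# Toy example: the second test item was copied into the training corpus,
# so its benchmark score would reflect memorization, not generalization.
train = ["the quick brown fox jumps over the lazy dog near the river bank"]
test = ["an unrelated question about tax law in three jurisdictions today",
        "the quick brown fox jumps over the lazy dog near the river bank"]
print(contaminated_items(train, test))  # -> [1]
```

Real contamination audits run at corpus scale with hashed n-grams, but the principle is the same: overlap between training text and test items invalidates the score as a measure of generalization.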
The opacity operates at multiple levels. At the training level, companies have reduced disclosure about the datasets, filtering methods, and human feedback processes used to build their models. At the evaluation level, they are choosing which benchmarks to publish results on, a selection that naturally favors the tests on which their models perform well. At the deployment level, independent researchers testing the same models sometimes find results that contradict what companies have publicly stated. The Stanford report does not name specific companies but documents the pattern as industry-wide.
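The evaluation-level selection effect is easy to quantify. The toy simulation below, using synthetic numbers not taken from the report, shows how publishing only the best few of many benchmark runs inflates the apparent average score.

```python
# Illustrative sketch: if a lab runs many benchmarks but publishes only
# its best results, the published average overstates true capability.
# All numbers here are synthetic, not figures from the Stanford report.
import random

random.seed(0)
true_scores = [random.gauss(0.70, 0.08) for _ in range(20)]  # all 20 runs
published = sorted(true_scores, reverse=True)[:5]            # top 5 only

print(f"mean over all benchmarks: {sum(true_scores) / len(true_scores):.3f}")
print(f"mean over published ones: {sum(published) / len(published):.3f}")
```

With these synthetic draws, the published top-five mean lands several points above the all-benchmark mean, which is exactly the distortion that self-selected reporting produces.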
Two years ago, frontier AI models were primarily tools for developers and researchers. Today they are integrated into customer service systems, hiring workflows, medical information delivery, financial advice, and legal research. The gap between benchmark performance and real-world performance is no longer an academic concern; it determines whether the systems that millions of people interact with daily are actually doing what their developers claim. And the report finds that responsible-AI benchmarks, the category on which companies most often decline to publish results, are precisely the category that matters most for those real-world applications.
As crypto.news has reported, the AI infrastructure buildout is advancing faster than the governance structures designed to evaluate it, a tension visible in both investment markets and public policy debates. The outlet has also noted that competitive pressure among frontier AI labs to ship capable models quickly creates structural incentives against transparency, because publishing benchmark weaknesses or training methodology details hands competitors material to exploit. Stanford's report frames that dynamic as the central accountability problem of the current AI era: 47 countries have now introduced AI-specific legislation, but only 23 have enacted laws with active enforcement mechanisms.