The Foundation Problem

Every business in Britain is being sold an AI strategy. Almost none of them have the data to make one work. This is the cost of building on the wrong foundations.

There is a finding in the academic literature that ought to be at the top of every executive's reading list this quarter, and almost certainly is not.

In a 2024 review synthesising 127 peer-reviewed studies and industry reports on AI implementation, the authors concluded that 68% of AI implementation failures can be traced back to data quality issues, with 43% of deployed systems exhibiting significant algorithmic bias rooted in the same problem (Mahmood & Ali, 2024). It is one of the most broadly sourced figures available in the field, and it has been almost completely absent from the boardroom conversation about why AI initiatives are stalling.

Practitioner research from the same period agrees. Gartner predicts that, through 2026, organisations will abandon 60% of AI projects that are not supported by AI-ready data. A Gartner survey cited in the same release found that 63% of organisations either do not have, or are not sure they have, the right data management practices in place to support AI at all (Gartner, 2025). The Massachusetts Institute of Technology, through its NANDA initiative, published The GenAI Divide: State of AI in Business 2025 in July 2025, concluding that 95% of organisations have seen zero measurable return from generative AI despite an estimated $30 to $40 billion in enterprise spending (MIT NANDA, 2025).

The peer-reviewed academic literature, a leading industry analyst, and a widely reported practitioner study all converge on the same conclusion. The technology is not the problem. The data underneath it is.

What the academic community is saying

The most thorough academic treatment of the topic in 2025 was a review published in the journal Data, titled Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles (Crespo Marquez et al., 2025). The authors synthesised scientific and technical literature from 1996 to 2025, complemented by the international standards ISO/IEC 25012 and ISO 8000, drawing from PubMed, Scopus, Web of Science, and grey literature. The methodology is unusually rigorous for this subject area, which has tended to be dominated by vendor commentary rather than peer-reviewed work.

The review identifies five universal data quality dimensions that consistently emerged across every sector studied: accuracy, completeness, consistency, timeliness, and accessibility. These are not new. They are codified in ISO 8000, the international standard for data quality, and have been understood as the foundations of trustworthy data for decades. What the authors emphasise is that the consequences of failing on these dimensions have changed.

The review documents in detail the case of Unity Technologies' Audience Pinpointer tool. Poor input data fed into its machine learning models produced faulty algorithms that ultimately resulted in $110 million in losses and a 37% decline in stock value. The lesson the authors draw is unambiguous: AI systems are only as reliable as their training data, and governance must begin at the data ingestion stage, not at the model deployment stage.

A separate body of academic work on the FAIR data principles (Findable, Accessible, Interoperable, Reusable) has been developed by an international consortium of universities including the Massachusetts Institute of Technology, the University of Manchester, the Technical University of Munich, the University of California Berkeley, and several others, working with national laboratories at Argonne, Berkeley, and Brookhaven (Ravi et al., 2022). The FAIR framework was originally developed for scientific data management, but its principles have become the basis for enterprise data governance in regulated industries throughout 2024 and 2025. The framework's central claim is that data which cannot be found, accessed, combined with other data, or reused cannot reliably underpin AI systems, regardless of the sophistication of the model running on top.

The picture from the academic literature is consistent. The foundations matter, the foundations are not in place in most organisations, and the consequence of building AI on weak foundations is precisely what we are now observing across the enterprise market.

What is actually broken

The conventional explanation for AI underperformance is that the technology is immature, the models are hallucinating, or the implementation team has not yet found the right use case. The evidence does not support any of those readings. The 5% of enterprises that have succeeded with generative AI, in MIT NANDA's analysis, are using the same models, the same vendors and frequently the same use cases as the 95% that have not. The difference, consistent across the academic and practitioner research, is the data underneath.

The shape of the problem is familiar to anyone who has worked in enterprise data. Customer records are duplicated across systems with no master record. Addresses are out of date. Phone numbers and email addresses are invalid. Consent and preference data is incomplete. Product hierarchies disagree between the ERP and the e-commerce platform. Finance reports a different revenue figure to operations because they are aggregating from different sources at different times. Data lineage is undocumented, so when a model produces an unexpected output, no one can trace which dataset is responsible.
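
The lineage point is the easiest of these to make concrete. What follows is a minimal sketch, assuming the organisation has recorded which assets each dataset is derived from. The asset names and the LINEAGE map are hypothetical, invented for illustration. With even this much documented, tracing an unexpected model output back to its source systems becomes a simple graph walk.

```python
from collections import deque

# Hypothetical lineage map: each asset lists the assets it is derived from.
LINEAGE = {
    "churn_model_scores": ["customer_features"],
    "customer_features": ["crm_customers", "billing_invoices"],
    "crm_customers": [],
    "billing_invoices": ["erp_orders"],
    "erp_orders": [],
}


def upstream_sources(asset, lineage):
    """Return every asset that feeds `asset`, directly or indirectly, sorted by name."""
    seen = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return sorted(seen)


def root_sources(asset, lineage):
    """Return the upstream assets with no recorded parents (the raw source systems)."""
    return [a for a in upstream_sources(asset, lineage) if not lineage.get(a)]
```

Calling root_sources("churn_model_scores", LINEAGE) narrows the investigation to the CRM and ERP extracts rather than every table in the warehouse.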

This is not a new problem. It has been the subject of academic research and consulting engagements for thirty years. What has changed is the consequence.

Until recently, dirty data manifested as inconvenient dashboards, the occasional botched mail-merge, and quarterly arguments about whose number was right. The data sat in warehouses, looked at intermittently, mostly tolerated. With AI in production, the same data now actively generates decisions. It writes customer communications. It approves credit. It triages support tickets. It informs investment recommendations. It feeds agentic systems that take autonomous action on the organisation's behalf.

What was once a reporting problem is now an operational one. The academic literature has been ringing this bell, increasingly loudly, for the past five years (Hagendorff, 2021). The boardroom is now hearing it.

What good actually looks like

The interventions that close the foundation gap are well-understood, available to every organisation, and unglamorous to a fault. They are also, helpfully, supported by international standards rather than by competing vendor methodologies.

ISO/IEC 25012 defines data quality across fifteen characteristics organised into inherent and system-dependent categories. ISO 8000 specifies the requirements for data quality management. The FAIR principles set out how data should be structured to be reliably usable by both humans and machines. Together these standards provide a more robust, internationally agreed framework than anything any single vendor or consultancy is selling.

What that framework requires of an organisation is straightforward in description and demanding in execution.

Inventory the data assets that the AI initiatives actually depend on. Not all of the data, just the data that matters for the use cases being prioritised. Most organisations do not have this list, which is itself revealing.
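
A scoped inventory of this kind can start as a few lines of code rather than a catalogue procurement. The sketch below assumes each AI use case is recorded with a priority and the datasets it reads. The use case names, dataset names, and priority scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    priority: int    # 1 = highest priority (an assumed convention)
    datasets: tuple  # names of the datasets the use case reads


# Hypothetical use cases and dataset names, for illustration only.
USE_CASES = [
    UseCase("support_ticket_triage", 1, ("support_tickets", "customer_master")),
    UseCase("credit_decisioning", 1, ("customer_master", "payment_history")),
    UseCase("marketing_copy_drafts", 3, ("product_catalogue",)),
]


def build_inventory(use_cases, max_priority=1):
    """Map each dataset to the prioritised use cases that depend on it."""
    inventory = {}
    for uc in use_cases:
        if uc.priority > max_priority:
            continue  # out of scope: not one of the prioritised use cases
        for dataset in uc.datasets:
            inventory.setdefault(dataset, []).append(uc.name)
    return inventory
```

In this example the product catalogue drops out of scope, and the customer master record surfaces as the asset with the most prioritised use cases depending on it.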

Measure the quality of those assets against the dimensions defined in ISO 8000 and ISO/IEC 25012. Duplication, completeness, validity, freshness, lineage. The metrics have been the same for thirty years; what has changed is that the consequences of measuring badly are now operational rather than analytical.
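
Four of these metrics reduce to simple ratios. The following is a rough sketch, not a reference implementation of either standard. The records, the field names, the one-year freshness window, and the deliberately loose email pattern are all assumptions made for illustration.

```python
import re
from datetime import date

# Deliberately simple pattern: an assumption for this sketch, not a full address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical customer records.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "updated": date(2025, 5, 1)},
    {"id": 2, "email": "ANA@example.com ", "updated": date(2023, 1, 15)},
    {"id": 3, "email": "not-an-email", "updated": date(2025, 4, 20)},
    {"id": 4, "email": None, "updated": date(2022, 9, 9)},
]


def completeness(records, field):
    """Share of records in which the field is populated."""
    return sum(1 for r in records if r.get(field)) / len(records)


def validity(records, field, pattern):
    """Share of populated values that match the pattern."""
    values = [r[field].strip() for r in records if r.get(field)]
    return sum(1 for v in values if pattern.match(v)) / len(values) if values else 0.0


def duplication(records, field):
    """Share of populated values that repeat another value once case and whitespace are normalised."""
    values = [r[field].strip().lower() for r in records if r.get(field)]
    return (len(values) - len(set(values))) / len(values) if values else 0.0


def freshness(records, field, as_of, max_age_days):
    """Share of records updated within max_age_days of the as_of date."""
    return sum(1 for r in records if (as_of - r[field]).days <= max_age_days) / len(records)
```

Passing as_of explicitly, rather than reading the system clock, keeps the measurement reproducible when the same snapshot is re-scored later.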

Establish named, accountable ownership for each critical dataset. The person responsible for the customer master record. The person responsible for the product hierarchy. The person responsible for the contract data. Without ownership, governance does not happen; with it, almost everything else falls into place.

Apply the FAIR principles to AI-relevant data, particularly Findability and Interoperability, which determine whether the data can be combined, queried, and used reliably by AI systems at scale.
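
One rough way to begin is to check catalogue entries for the metadata each principle depends on. The mapping of principles to fields below is an assumption made for this sketch, not part of the FAIR specification, and the catalogue entry is hypothetical.

```python
# Hypothetical mapping of FAIR principles to catalogue fields.
# This is an illustrative assumption, not part of the FAIR specification.
FAIR_FIELDS = {
    "findable": ("identifier", "description"),
    "accessible": ("access_endpoint",),
    "interoperable": ("format", "schema"),
    "reusable": ("licence", "provenance"),
}


def fair_gaps(entry):
    """Return, per principle, the catalogue fields that are missing or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not entry.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical catalogue entry with an empty schema and no provenance recorded.
ENTRY = {
    "identifier": "dataset:customer_master",
    "description": "Golden customer record",
    "access_endpoint": "warehouse.crm.customer_master",
    "format": "parquet",
    "schema": "",
    "licence": "internal-use",
}
```

A check like this only shows that the metadata exists, not that it is accurate, so it is best treated as a first filter ahead of human review.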

Build continuous quality monitoring rather than annual audits. AI models in production need data quality signals measured in hours, not quarters. The tooling to do this exists at every price point.
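
At its simplest, continuous monitoring is a comparison of the latest measurements against agreed minimums, run on a schedule. The sketch below assumes quality scores between 0 and 1 are already being produced for each dataset. The metric names and threshold values are hypothetical, and scheduling and alert delivery are left to whatever tooling is already in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    minimum: float  # lowest acceptable score, between 0.0 and 1.0


# Hypothetical thresholds; real values would be agreed with each dataset's owner.
THRESHOLDS = [
    Threshold("completeness", 0.95),
    Threshold("validity", 0.98),
    Threshold("freshness", 0.90),
]


def evaluate(dataset, measured, thresholds):
    """Return one alert message per metric that is missing or below its minimum."""
    alerts = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            alerts.append(f"{dataset}: {t.metric} was not measured")
        elif value < t.minimum:
            alerts.append(f"{dataset}: {t.metric} {value:.2f} is below {t.minimum:.2f}")
    return alerts
```

Treating an unmeasured metric as an alert in its own right is a deliberate choice here: a monitoring job that silently stops reporting is itself a data quality failure.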

Then, and only then, the AI implementation has a foundation to stand on. The 5% of enterprises capturing value from generative AI, in MIT NANDA's research, have done this work. The 95% that have not are being told, in increasingly direct language by their own boards, that they need to start.

Where Neurotic comes in

For most organisations, the data foundation problem cannot be solved by buying another platform or hiring another analyst. It is solved by an independent technical team doing the unglamorous work that vendors do not sell and auditors do not catch: inventorying the assets, measuring their quality against international standards, naming the owners, building the monitoring, and documenting what good looks like for the business going forward.

Neurotic's data governance and data intelligence platform services are built precisely for this work. We are independent of any AI platform or data vendor, which means our recommendations are driven by what is actually in your business rather than by what someone is trying to sell you. We work to internationally recognised standards (ISO/IEC 25012, ISO 8000, FAIR), not to vendor methodologies. If your AI initiatives have stalled at the proof-of-concept stage, or if your board is asking uncomfortable questions about why the investment is not showing results, the foundation is almost certainly where the answers are.

Talk to us → neurotic.co

References

Crespo Marquez, A., Sola Rosique, A., de la Fuente Carmona, A., Lopez Campos, M., Crespo del Castillo, A. & Crespo del Castillo, J. (2025) 'Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles', Data, 10(12), p. 201. Available at: https://www.mdpi.com/2306-5729/10/12/201 [Accessed 10 June 2026].

Gartner (2025) Lack of AI-Ready Data Puts AI Projects at Risk, press release, 26 February 2025. Available at: https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk [Accessed 10 June 2026].

Hagendorff, T. (2021) 'Linking Human and Machine Behavior: A New Approach to Evaluate Training Data Quality for Beneficial Machine Learning', Minds and Machines, 31, pp. 563–593. Available at: https://doi.org/10.1007/s11023-021-09573-8 [Accessed 10 June 2026].

Mahmood, A. & Ali, S. (2024) 'AI Data Quality and Bias: Challenges, Implications, and Solutions in Modern Machine Learning', peer-reviewed review synthesising 127 studies. Available at: https://www.researchgate.net/publication/386083845_AI_Data_Quality_and_Bias_Challenges_Implications_and_Solutions_in_Modern_Machine_Learning [Accessed 10 June 2026].

MIT NANDA (2025) The GenAI Divide: State of AI in Business 2025, Massachusetts Institute of Technology Project NANDA, July 2025. Reporting available via Fortune (2025). Available at: https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/ [Accessed 10 June 2026].

Ravi, N., Chaturvedi, P., Huerta, E.A. et al. (2022) 'FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy', Scientific Data, 9, p. 657. Available at: https://doi.org/10.1038/s41597-022-01712-9 [Accessed 10 June 2026].