Thinking of investing in AI? Data governance is the new due diligence

For investors looking to understand where AI-driven value will be created in financial services, the answer increasingly starts beneath the surface, in the data infrastructure that underpins every decision. For decades, the industry produced regulatory reports, satisfied the minimum compliance requirements, and kept regulators content, all while knowing that, underneath it all, its data infrastructure was, at best, on shaky ground.

This amounted to a persistent blind spot as capital was allocated based on outputs that were compliant, but not necessarily reliable or reproducible. What is now becoming clear is that this was not just a technical issue, but a systemic underinvestment in a critical layer of enterprise infrastructure. As AI adoption accelerates, that layer is being rebuilt in real time, and in doing so, is emerging as a distinct and rapidly expanding category in its own right.

In practical terms, every dollar committed to AI is beginning to pull through additional spend on data governance and lineage. Only now, as the finance sector looks to lead the way on AI adoption, are CDOs having to rethink their entire approach to data infrastructure, and investors starting to differentiate between firms that are AI-ready and those that are not.

Shifting priorities and old bargains

Give a generative model unreliable data, and it will confidently deliver unreliable outputs at scale. Point it at fragmented sources with no clear lineage, and you’ve built an audit nightmare that no regulator will tolerate. For markets, that kind of failure does not just trigger scrutiny, it destroys trust. The transparency that compliance teams have been advocating for years has become a technical prerequisite for deploying AI safely, and that has changed everything.

When compliance and regulatory reporting first moved beyond manual processes, most organisations took a pragmatic view: build enough infrastructure to meet the requirements and avoid asking uncomfortable questions about what’s happening upstream. Data lineage was often documented after the fact, if at all. Governance frameworks existed on paper but rarely reflected operational reality.

This approach had a certain logic to it. Fixing the underlying data estate was expensive, difficult to explain to budget holders, and offered no immediate return. Besides, regulators want evidence of control, not architectural perfection, so organisations optimised for the path of least resistance, delivering what was required rather than what was optimal. For a long time, it worked reasonably well, but AI has now rewritten the equation entirely.

What AI actually requires

Generative AI models are extraordinarily good at finding patterns in data, which also means they excel at learning the wrong patterns if your data is messy, fragmented, biased, or poorly understood. When an AI system is making lending decisions, pricing derivatives, or assessing credit risk, governance needs to be watertight.

AI initiatives demand clean, traceable, and well-governed data for three interconnected reasons. First, model training requires knowing exactly where your data comes from, how it’s been transformed, and whether it’s fit for purpose. You can’t train a reliable model on data you don’t understand. Second, AI outputs need to be explainable, especially in regulated environments. If a model declines a loan application or flags a transaction as suspicious, you need to trace that decision back through every transformation and source. Third, AI systems need to be monitored and updated as data changes over time. Without clear lineage, you’re flying blind.
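As a toy illustration, the “trace that decision back through every transformation and source” requirement amounts to carrying an ordered lineage trail alongside each record as it moves through a pipeline. The sketch below is a deliberately minimal stand-in, not any particular vendor’s tooling, and every name in it (the steps, the fields, the scoring function) is invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A data record plus the ordered trail of steps that produced it."""
    value: dict
    lineage: list = field(default_factory=list)

def transform(record: Record, step_name: str, fn) -> Record:
    """Apply a transformation and append its name to the lineage trail."""
    return Record(value=fn(record.value), lineage=record.lineage + [step_name])

# Hypothetical loan-scoring pipeline: each step is recorded as it runs.
raw = Record(value={"income": 42_000}, lineage=["ingest:core_banking"])
cleaned = transform(raw, "normalise_currency", lambda v: {**v, "income": round(v["income"])})
scored = transform(cleaned, "score_model_v2", lambda v: {**v, "score": 0.73})

# Auditing a declined application is now a lookup, not an investigation:
print(scored.lineage)
# ['ingest:core_banking', 'normalise_currency', 'score_model_v2']
```

Production lineage systems capture far more (schemas, timestamps, upstream system identifiers), but the principle is the same: provenance is recorded as a side effect of processing, so explainability does not depend on reconstructing it after the fact.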

This goes beyond compliance into engineering necessity. Perhaps the most cited stat of last year was MIT Technology Review’s finding that 95% of AI projects fail, and poor data quality is consistently a primary cause. For investors, that translates into misallocated capital, delayed returns, and inflated expectations around AI-driven growth. Each failed initiative is not just a technical miss, but a signal of unmet demand for infrastructure that can make AI reliable at scale.

Organisations are discovering that their biggest barrier to AI adoption isn’t compute power or talent; it’s the messy data infrastructure they’ve been ignoring for years that has become the bottleneck.

This is the gap between AI experimentation and AI deployment, or put more bluntly, between AI promise and production reality. In Silicon Valley terms, data governance is becoming the new technical due diligence for AI. The widening gap between AI ambition and data readiness is creating a distinct market opportunity for platforms that can bridge it quickly and at scale.

Regulators are paying attention

If the technical requirements weren’t enough of a forcing function, regulatory pressure is mounting fast. As AI becomes embedded in financial decision-making, oversight bodies are shifting from principles-based guidance to concrete expectations about how these systems should operate.

The Basel Committee’s BCBS 239 principles have long required banks to demonstrate robust data aggregation and risk reporting capabilities, including clear data lineage. As AI integrates into risk functions, those principles apply with renewed force. The EU’s AI Act also introduces explicit requirements for high-risk AI systems, including traceability, transparency, and human oversight, and the UK’s Financial Conduct Authority has signalled it expects firms using AI to demonstrate they understand how models reach conclusions and can evidence their decision-making processes.

The direction of travel is unmistakable: regulators will not tolerate black-box AI in financial services. They want to see end-to-end lineage, robust governance, and proof that organisations can explain and audit AI-driven outcomes. Organisations that can’t demonstrate this will find their AI ambitions curtailed by regulators long before they reach production.

Shifting the mindset from reporting to competitive advantage

When clean data lineage directly enables faster AI deployment, governance becomes competitive advantage. When poor metadata management blocks innovation, fixing it becomes strategically urgent. Organisations with mature data governance can move quickly, experiment confidently, and scale AI initiatives without hitting compliance roadblocks. 

I recently spoke with a major financial exchange that undertook a comprehensive data transformation programme. They faced the familiar challenge of a sprawling data estate across legacy systems and business units, limited visibility into data flows, and mounting regulatory pressure under frameworks like DORA and BCBS 239. The immediate driver was compliance, but the real prize turned out to be commercial. By establishing clear, granular data lineage, they reduced key risk indicators and improved operational resilience through real-time impact analysis when data feeds or systems changed. More significantly, they unlocked new revenue opportunities by exposing trusted metadata to customers as a differentiated service, transforming what began as a compliance exercise into a driver for growth.

In effect, they turned governed data from an internal control mechanism into an external asset. This is the kind of transformation investors should pay attention to, infrastructure spend that converts directly into new revenue lines and differentiated products. Examples like this are becoming more common, as firms realise that once lineage is established, it can be reused across multiple high-value use cases, compounding returns on the initial investment.

The commercial momentum behind this shift is already visible in the market. The data lineage tracking market has expanded rapidly in recent years and is projected to grow from $1.53 billion in 2025 to $4.14 billion by 2030. This growth is being driven by stricter regulatory compliance demands, the rise of big data analytics, increasing focus on data quality and accuracy, and a growing need for comprehensive data visibility across enterprises.
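For context, those projection figures imply roughly a 22% compound annual growth rate. A quick back-of-envelope check, using only the numbers cited above:

```python
# Implied CAGR from the cited projection:
# $1.53bn in 2025 growing to $4.14bn by 2030, i.e. over five years.
start, end, years = 1.53, 4.14, 5
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # ~22% compound annual growth
```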

Now add in the prospect of easier, more efficient AI deployment, with less risk and more commercial opportunity, and compliance teams, once seen as gatekeepers, start to look like enablers of strategic initiatives.

The choice ahead

Organisations now face a straightforward decision, even if executing on it is anything but simple. They can invest proactively in data transparency, lineage, and governance, positioning themselves to deploy AI confidently while satisfying regulators. Or they can wait until their AI projects stall, their models fail audits, or regulators impose requirements that force expensive remediation under time pressure.

The proactive path requires confronting complexity that’s been deferred for years, breaking down silos, and investing in capabilities that don’t deliver immediate revenue. But this approach turns compliance into strategic advantage and unlocks AI’s genuine potential.

AI may turn out to be the forcing function the industry needed simply because it makes the cost of ambiguity too high to sustain. Organisations will adopt transparent, governed data practices, the only variable is whether they’ll do it deliberately or wait until they have no choice. The window for making that decision strategically, rather than reactively, is closing fast.

