There is a lot of optimism around AI in Financial Crime these days. Most of us use large language models such as ChatGPT in our everyday lives, so we have enough hands-on exposure to understand what they are good at, where they struggle, and how quickly they can become useful when they are pointed at the right problem.
If you are running a Financial Crime function, expectations around AI adoption have risen sharply. That pressure is coming from boards as well as from within your teams.
What has not moved at the same pace is the realisation of value in live production environments.
In most banks, there is plenty of AI activity, but far less that you could confidently say is properly live, governed, and changing how work gets done day to day. There are sandboxes, pilots and proofs of concept, and there are demos that look impressive in isolation. But when the conversation turns to ownership, control, and whether any of this is actually taking pressure out of BAU, things become much less clear.
That gap between expectations and outcomes is where many Financial Crime AI efforts are getting stuck.
From what we are seeing, the hardest part of scaling AI in Financial Crime is not the technology. It is finding the right balance between exploration and control.
Too much experimentation, left unchecked, leads to ‘pilot purgatory’. Promising ideas are played with and endlessly refined in sandboxes or Copilot tools, but never quite make the jump into BAU.
At the other end of the spectrum, heavy governance and production expectations are imposed before teams have properly understood the problem, and innovation and momentum are strangled at birth.
In practice, both extremes lead to the same outcome: limited value realised in live operations.
Bringing complex and emerging technology capabilities into a live environment requires progression through distinct stages, with increasing levels of delivery discipline.
This staged approach has been used for decades, but it has become more important as AI tools make experimentation far more accessible to non-technical teams. It changes who can generate ideas and how quickly they can do it, increasing the need for clear alignment between compliance, technology and risk functions.
What seems to work best is being explicit about the path from idea to production, and recognising that different stages need different rules. That journey naturally falls into three phases, each with a distinct purpose and set of expectations:
Stage 1: Open experimentation
Stage 1 is where experimentation is exactly what you want. Investigators, analysts and SMEs need space to explore ideas, test use cases and build confidence with the tools. This stage is deliberately open and lightweight. People are asking “what if we could?” and getting a feel for where AI might help and where it clearly does not.
The key difference today is that we now have tools that make it easier for non-technical colleagues to experiment and test their ideas.
Data use at this point is intentionally constrained, often limited to public or zero-PII datasets, and governance is minimal beyond basic data privacy and security expectations. The aim is discovery and fluency, not delivery. Success looks like insight and engagement rather than measurable operational impact.
The risk here is not experimentation itself, but allowing this stage to drift on indefinitely.
Stage 2: Controlled proof of concept
Stage 2 is the transition point for ideas that show promise. For these, a proof of concept can be defined with target benefits, a plan and an investment case.
This is where many banks struggle: the point at which a promising idea needs to move into a more controlled environment. Success at this stage depends on close collaboration between engineers and non-engineers (e.g. Front Office, Compliance) to design, test and deliver these solutions.
The next stage, production delivery, will be led by engineers, so they must be part of the solution early on. That early engagement gives them the right context, builds shared understanding, and creates a sense of ownership that carries through into live delivery.
Scope tightens and expectations change, with clear target benefits being defined to secure the required investment. Data becomes more realistic, typically moving to high-fidelity synthetic data or masked historical data so that models are tested against something closer to real-world complexity. SMEs move from being idea owners to validators, pressure-testing whether outputs actually match operational reality.
Governance also steps up at this stage. Model risk considerations, internal audit interest and early compliance engagement start to matter. This phase is uncomfortable because it forces choices about investment, ownership and delivery. It exposes trade-offs and surfaces weaknesses that were easy to ignore in a sandbox.
But this is also the phase that prevents AI from either being pushed too quickly into production or dying in pilot purgatory.
Stage 3: Production delivery
Stage 3 is production delivery, and in Financial Crime that means more than simply getting something live. Production implies operational reliability, regulatory defensibility, and the ability to sustainably deliver and improve outcomes over time.
By this stage, ownership is clear and the use case is tightly defined. Live, integrated production data feeds replace test datasets. People interact with the capability as end users, and they need to trust that it supports their controlled activities rather than creating new risk.
Governance increases again. Standard delivery processes apply, including full audit trails, change control and ongoing monitoring. Model behaviour needs to be observable, bias needs to be monitored, and decisions need to be explainable. Success is measured in terms of stability, accuracy over time and regulatory credibility.
Progress will come from being explicit about these stages and disciplined about the transitions between them.
Open experimentation builds fluency and surfaces ideas. Controlled proofs of concept determine which ideas are worth investing in. Production delivery turns a small number of validated use cases into something that genuinely changes how the function operates.
High-volume, judgement-heavy activities with obvious pain points are usually the most productive places to apply this approach, particularly where governance and evidencing can be designed in from the outset. Alert triage, investigative support and certain elements of KYC often fall into this category.
The organisations we believe are most likely to move ahead will be those that give teams space to explore, while being clear about when exploration ends and engineering begins, and that bring together the right teams at the right time in a way that supports delivery rather than slowing it down.
Getting that balance right will help avoid pilot purgatory and give AI the best chance of delivering tangible value in live operations.
