by Nigel Cannings
Financial institutions that fell foul of compliance regulations were fined $2.7 billion globally in 2021, according to Forbes. This included violations of Anti-Money Laundering (AML) rules, Know Your Customer (KYC) checks and operating guidelines. As financial regulation increases to ensure the safety and security of the financial system and to protect consumers, financial institutions must consider how best to protect themselves from compliance-related fines.
The latest technologies and automation, including voice recording, have the potential to be a crucial component of the Regulatory Technology (RegTech) ecosystem and to help the finance industry monitor compliance. But how can voice recording be used, and why is it so effective?
Why record at all?
As part of an overall electronic communications capture requirement, voice recording technology is mandated in a number of different areas of financial services, and even where it is not strictly required, can be used not only to help manage risk but also to boost performance. This stretches from the trading floor on the investment banking side, to marketing new products to retail and insurance customers.
The pandemic has seen an explosion in the need to plug gaps in companies’ recording strategies: in January 2021, the UK’s Financial Conduct Authority released Market Watch 66, highlighting that home-working employees were more at risk of falling outside the regulations than those working in a traditional office environment, and reminding companies that the recording obligations still applied. The infrastructure changes to support this were not trivial, with massive extra capacity laid on by MVNOs such as the now-beleaguered Truphone to support mobile recording, and a rush to install “Compliance” recording for Teams, a technology that was very much in its infancy in March 2020. And the pressure to capture online meetings is still growing.
But this massive investment is just the beginning of the story.
How is voice recording already being used in compliance?
The recording of voice calls is nothing new for financial institutions. Nor is its monitoring. But while automated means of scanning emails and other text documents have existed for decades, monitoring audio files has traditionally been a manual process. It is not feasible for every single call recording to be listened to by hand: even a sample size of 5% would require a large team in most financial organisations. Companies have therefore agreed a “dip sampling” approach with regulators, whereby they listen to an agreed amount of audio every week.
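To make the scale of that problem concrete, here is a minimal sketch of what a dip-sampling selection might look like. The call IDs, the 5% rate and the helper name are illustrative only, not any particular firm’s or vendor’s tooling.

```python
import random

def dip_sample(call_ids, rate=0.05, seed=None):
    """Select a random subset of calls for manual review.

    A 5% rate mirrors the kind of sample size discussed above;
    the agreed figure varies by firm and regulator.
    """
    rng = random.Random(seed)
    k = max(1, int(len(call_ids) * rate))
    return rng.sample(call_ids, k)

# e.g. pick this week's review set from 20,000 recorded calls
weekly_calls = [f"call-{i:05d}" for i in range(20_000)]
review_set = dip_sample(weekly_calls, rate=0.05, seed=42)
print(len(review_set))  # 1000 calls -- already a large manual workload
```

Even at this modest rate, a thousand calls a week is far more than a small compliance team can realistically listen to, which is precisely the gap automation aims to fill.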
Some companies adopted phonetic search engines to power an automated process of picking out keywords and phrases, but this was often plagued by over-retrieval and inaccuracy. All of this leaves huge potential for compliance failings.
The role of machine learning and speech recognition in compliance monitoring
Since the days of phonetic search, there have been huge strides in speech recognition technology. The application of Convolutional Neural Networks and other Deep Learning approaches has transformed the ability of machines to understand humans, and across a much wider range of domains and accents than was previously possible. This is particularly important for telephony-quality speech, which has traditionally been very hard for computers to turn into something meaningful. There is still a long way to go, however. On the trading floor in particular, the quality of audio and its subsequent storage can be shocking, often to the point that a human cannot understand it. A lot of research is going into ways to restore this audio so that it can be better understood by machines and humans alike.
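As a flavour of what such clean-up can involve, here is a sketch using spectral gating noise reduction via the open-source noisereduce library. This is one generic denoising technique, assuming a mono WAV file with a hypothetical filename, and not the specific restoration research described above.

```python
import soundfile as sf    # pip install soundfile
import noisereduce as nr  # pip install noisereduce

# Load a (hypothetical) low-quality trading-floor recording; mono assumed
audio, sample_rate = sf.read("turret_call.wav")

# Spectral gating: estimate a noise profile from the signal itself and
# suppress time-frequency bins that fall below it.
cleaned = nr.reduce_noise(y=audio, sr=sample_rate)

sf.write("turret_call_cleaned.wav", cleaned, sample_rate)
```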
All of that said, speech recognition technology is very much at the point where it can serve a useful function in an overall compliance monitoring strategy, and at a time when those strategies are evolving because of the rise of Artificial Intelligence. The historical gold standard of automated compliance monitoring was to take a series of keywords and phrases felt to be “risky” and to search the text for them. Modern speech recognition systems are now very capable of fitting into this strategy, especially when a feedback loop is used to identify common mistranscriptions and help the system “learn” where it has gone wrong. Every company has its own language, so it is important to help the system understand what that is.
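A minimal sketch of that approach might look like the following. The risk phrases and the mistranscription map are invented for illustration; a production system would work over full ASR output, typically with confidence scores attached.

```python
# Risk phrases a compliance team might maintain (illustrative only)
RISK_PHRASES = {"guaranteed returns", "off the record", "keep this between us"}

# Feedback loop: ASR errors observed in past manual reviews, mapped back
# to the phrase the speaker actually said.
MISTRANSCRIPTIONS = {
    "guaranteed return s": "guaranteed returns",
    "of the record": "off the record",
}

def normalise(transcript: str) -> str:
    """Lowercase, collapse whitespace and repair known mistranscriptions."""
    text = " ".join(transcript.lower().split())
    for wrong, right in MISTRANSCRIPTIONS.items():
        text = text.replace(wrong, right)
    return text

def flag_risk(transcript: str) -> set[str]:
    """Return the risk phrases found in a call transcript."""
    text = normalise(transcript)
    return {phrase for phrase in RISK_PHRASES if phrase in text}

print(flag_risk("I can promise you guaranteed returns, of the record."))
# {'guaranteed returns', 'off the record'}  (set order may vary)
```

Each false hit or miss that reviewers correct can be fed back into the mistranscription map, which is the “learning” loop described above in its simplest form.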
But compliance monitoring systems are now moving beyond this crude approach and looking not only at the content of a communication but also its context, which means communications need to be accurately mapped between parties so that a series of communications can be analysed as one. This points to a second major advance in speech technology driven by neural networks: biometric identification.
The idea of “my voice is my password” is not new: features are extracted from the human voice to build a kind of vocal “fingerprint”. But the process has historically generated a lot of false positives and misidentifications, especially where the quality of the data was poor. Modern neural networks are significantly better at filtering out the “noise”, both literal and figurative, in an audio signal, and at isolating what is truly distinctive about a speaker.
This matters because the metadata associated with phone calls is often inaccurate or missing, so biometric analysis can be used to work out who was, or was not, on a call or video meeting, and to stitch communications together more reliably.
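In practice this usually means comparing fixed-length speaker embeddings produced by a neural speaker-recognition model. The sketch below assumes such embeddings already exist; the names, the vector size and the 0.7 threshold are all illustrative and would be tuned on real data.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compare two speaker embeddings; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(segment_embedding, enrolled, threshold=0.7):
    """Return the best-matching enrolled speaker, or None.

    `enrolled` maps a name to a reference embedding captured earlier.
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(segment_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy usage with random vectors standing in for real embeddings
rng = np.random.default_rng(0)
enrolled = {"trader_a": rng.normal(size=192), "trader_b": rng.normal(size=192)}
probe = enrolled["trader_a"] + 0.1 * rng.normal(size=192)  # noisy re-recording
print(identify_speaker(probe, enrolled))  # trader_a
```

Matching each segment of audio against enrolled voices in this way is what allows calls with missing or wrong metadata to be attributed to the right parties and analysed as one conversation.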
Why is voice data so useful?
Voice data can be used in many areas beyond compliance, such as fraud prevention, training, customer profiling and customer service, and that usefulness comes down to what is unique about it. Conversation provides its own narrative: how we express ourselves, what we say or don’t say, how quickly we say it, and how we react to questions. All of these give insight into the state of mind of the speaker in a way that is lost in text communication such as email.
Add to that the way in which we say things: our pitch and tone, and the timbre of our voices. One study from UC Berkeley found that the voice can convey over two dozen distinct categories of emotion. Tone of voice and speech patterns can also be monitored through voice data, and this can be particularly important when seeking to determine why something was said. Is a service operative exchanging pleasantries and making a light-hearted joke, or being inappropriate? Did the customer really understand the terms they were agreeing to, or should the agent have spent more time explaining them? Are your agents providing value-added service, or are they mis-selling?
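As a simple illustration of the raw material such analysis works from, the sketch below extracts basic pitch and energy statistics with the open-source librosa library. The filename is hypothetical, and real emotion models use far richer features than these three numbers.

```python
import librosa  # pip install librosa
import numpy as np

# Load a (hypothetical) mono service-call recording at 16 kHz
y, sr = librosa.load("service_call.wav", sr=16000)

# Fundamental frequency (pitch) track via the pYIN algorithm;
# unvoiced frames come back as NaN
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Root-mean-square energy per frame as a loudness proxy
rms = librosa.feature.rms(y=y)[0]

# Crude summary of *how* something was said: pitch level,
# pitch variability and loudness
print("median pitch (Hz):", np.nanmedian(f0))
print("pitch variability:", np.nanstd(f0))
print("mean energy:", rms.mean())
```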
How voice recording and voice AI are crucial components of the Regulatory Technology ecosystem for the finance industry
Advances in conversational AI and machine learning mean that the data gathered over voice and video channels is easily accessible, in volume and in real time. When it comes to compliance, financial institutions can no longer take a chance on selective sampling. Compliance officers are bound to monitor and verify customer interactions. Where voice recordings are combined with AI, businesses can deploy biometrics, behavioural analytics, metadata and search, giving their teams the ability to rapidly detect information that is similar or identical. This allows vulnerable customers to be identified in real time and given a tailored service, further protecting businesses against the risk of non-compliance.
One further area to touch upon is the field of explainable AI. One of the key features of the General Data Protection Regulation (GDPR) as enacted in the EU and the UK is that decisions made by a machine must be capable of being challenged by a human, and the reasoning for the decision has to be given. In a rules-based system, this is reasonably straightforward. If I apply for a Platinum Amex Card and I have significant undischarged debts, it is most likely that I will fail an affordability threshold based on my income and debts. While that may be automated, it is explainable.
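A toy version of such a rules-based decision shows why it is straightforward to explain: the rule, the inputs and the threshold can all be stated in plain language. The threshold and figures here are invented for illustration, not real lending criteria.

```python
def affordability_check(income: float, undischarged_debt: float,
                        max_debt_to_income: float = 0.4):
    """Toy rules-based credit decision with a human-readable reason.

    The 0.4 debt-to-income threshold is illustrative only.
    """
    ratio = undischarged_debt / income
    if ratio > max_debt_to_income:
        return False, (f"Declined: debt-to-income ratio {ratio:.0%} "
                       f"exceeds the {max_debt_to_income:.0%} threshold.")
    return True, f"Approved: debt-to-income ratio {ratio:.0%} is within limits."

approved, reason = affordability_check(income=60_000, undischarged_debt=35_000)
print(approved, "-", reason)
# False - Declined: debt-to-income ratio 58% exceeds the 40% threshold.
```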
But if a system monitoring my communications says I am becoming angry with customers, and because of that I am disciplined or lose my job, I want to be able to understand what made the machine reach that decision. Unfortunately, the modern neural network is a classic “black box”, capable only of taking an input and rendering an output. Anyone looking to implement decision-making systems that rely on AI must ensure that they build explainability in at the core.
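One model-agnostic way to approach that requirement is permutation importance: shuffle each input feature in turn and measure how much the model’s accuracy drops, which indicates which signals drove its decisions. The sketch below uses scikit-learn with synthetic data and hypothetical per-call features; it illustrates the idea rather than a production explainability system.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical per-call features an "anger" classifier might consume
feature_names = ["median_pitch", "pitch_variability", "speech_rate", "energy"]

# Synthetic data: labels depend mostly on pitch variability and energy
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 1] + 0.5 * X[:, 3] > 0.5).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one feature at a time and record the accuracy drop
# (a real deployment would measure this on held-out data)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```

An attribution like this does not open the black box itself, but it does give a reviewer something concrete to challenge, which is the spirit of the GDPR requirement described above.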
The insurance and financial industries are under significant scrutiny, and regulatory compliance and customer protection have become key priorities. RegTech, underpinned by voice data and AI, empowers compliance: it helps businesses mitigate risk, enhance processes, improve data collection and storage, and remain compliant, without putting undue pressure on employees or compromising the quality of customer interactions.
The author, Nigel Cannings, is the CTO at Intelligent Voice.