Working In Uncertainty

Time to put numbers on internal controls: Supporting risk judgements with quantitative risk analysis

by Matthew Leitch, first appeared in July 2005.


There's a difference in culture and approach between two of the major groups involved in ‘risk management’ in its widest sense.

On one side are the insurance specialists, busy putting numbers on risk. Their high priesthood, the actuaries, like mathematics and thirst for empirical data.

On the other side are the accountants and auditors, busy putting words on risk. Their high priesthood, the audit partners in big external audit firms, rely on judgement and seek ‘comfort’.

Why this difference when, in effect, external audit is a form of insurance too? The differences are that external audit firms very rarely receive ‘claims’ (i.e. get taken to court), and when they do the claim is decided by a court after extensive analysis of the auditor's actions. An insurance company receives many claims and simply pays out if the terms of the policy are met.

Consequently, the auditor's strategy is based around leaving no evidence that a hostile lawyer could exploit. Although the big firms have occasionally flirted with quantitative methods, for example for setting audit sample sizes, they have shied away from them, preferring to obscure their decisions under the general heading of ‘professional judgement.’

Risks are described as ‘high’, assurance is often described as ‘reasonable’ (or ‘high’ if necessary), and control weaknesses may be deemed ‘significant’, ‘serious’, or even ‘material.’ None of these key terms has any quantitative basis.

The problem with ‘professional judgement’

As far as I know ‘professional judgement’ as exercised by auditors and controls specialists from that background (including me) has never been systematically tested. If it were, what might be found?

Judgements of probability have been studied in a few other professions. By far the best data comes from studies of weather forecasters. These have shown that people who routinely make probability judgements and get feedback on their accuracy can become what is called ‘well calibrated.’ This means that, for example, if you take all the instances when the forecaster has said the probability of rain is 50% then in fact there was rain on about 50% of those instances.
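Calibration can be checked mechanically. A minimal sketch in Python, using invented forecast records (the probabilities and outcomes here are purely illustrative):

```python
# Hypothetical records: each pairs a stated probability of rain with what
# actually happened. All figures are made up for illustration.
forecasts = [
    (0.5, True), (0.5, False), (0.5, True), (0.5, False),
    (0.8, True), (0.8, True), (0.8, True), (0.8, False), (0.8, True),
    (0.2, False), (0.2, False), (0.2, True), (0.2, False), (0.2, False),
]

def calibration_table(records):
    """Group forecasts by stated probability and compare with observed frequency."""
    buckets = {}
    for p, rained in records:
        buckets.setdefault(p, []).append(rained)
    return {p: sum(outcomes) / len(outcomes)
            for p, outcomes in sorted(buckets.items())}

for stated, observed in calibration_table(forecasts).items():
    print(f"said {stated:.0%} -> rained {observed:.0%} of the time")
```

A well calibrated forecaster's stated and observed frequencies line up, as this contrived data set does; a controls specialist's judgements could be audited the same way, given records of what was predicted and what happened.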

(You might think that makes their probabilities good ones. Not necessarily. There is rain over Cambridge on about 50% of days. If a forecaster says the chance of rain is 50% every day he/she will be perfectly calibrated but how useful is that?)

The key point is that this good calibration is rare and the result of long practice and quantitative feedback. Controls specialists get a lot of practice but next to no quantitative feedback, so it seems unlikely that they will be well calibrated.

Indeed research generally into human judgement paints a bleak picture. We are biased in favour of confirming what we already believe. We cannot weigh more than two or three factors simultaneously without being inconsistent.

The need for empirical support

For ‘professional judgement’ the precedents are discouraging and we should consider the controls specialists guilty until proven innocent.

For all anybody knows organisations across the world could be spending millions on audit work and controls they do not need, while doing nothing on controls that matter more than anyone realises.

If I were a lawyer building a case against an audit firm I would argue that the firm had been negligent in not collecting and making use of empirical data to support its judgements. The fact that other auditors have also failed to do this is no defence because other risk management professionals, such as those dealing with insurance, health, safety, and project management, have all made efforts to gather and use empirical data – as any educated person would expect.

A lot can be done

The data we need is all around us, collected already. Virtually all large organisations collect data on processing errors, fraud, safety incidents, and so on. Usability testing generates thousands of statistics a year about error rates from different kinds of computer interface in different situations and tasks. Manufacturers test their computer systems and peripherals providing extensive information about their reliability. Telecom companies continually monitor their network availability. Internal and external auditors perform hundreds of thousands of audits a year, gathering information about millions of internal controls, and investigating hundreds of thousands of errors and a smaller number of frauds.

There's no shortage of data.

The trick is simply to pull some of it together into a usable form. I have two suggestions.

Suggestion 1: Controls designers' risk tables

Imagine you are tasked with designing controls over some process or system and estimate that one of the controls you want would cost 100,000 to implement but would save money if your assumptions about the number of errors it will have to handle are correct. You are challenged on the need for spending 100,000.

Wouldn't it be helpful to be able to refer to tables of error rates for different types of work and system derived from research? It may be that the tables do not cover the exact situation you are looking at, but even having something slightly similar would give you a starting point. Let's imagine the tables say 1 in 50 invoices will be wrong without the control you are thinking of, but in your case the risk factors are slightly worse than those in the table. At least you have a starting point and that's a lot better than nothing, which is what we have today.
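The cost-benefit arithmetic behind that challenge is simple. A sketch using the 100,000 cost and the 1-in-50 table rate from above, with the cost per error and the invoice volume as made-up assumptions:

```python
# The 100,000 control cost and the 1-in-50 table rate come from the scenario
# above; the cost per error and invoice volume are hypothetical assumptions.
control_cost = 100_000
cost_per_error = 400        # assumed average cost of putting one error right
invoices_per_year = 20_000  # assumed volume

# Break-even error rate: the control pays for itself above this rate.
break_even_rate = control_cost / (cost_per_error * invoices_per_year)
print(f"break-even error rate: 1 in {1 / break_even_rate:.0f} invoices")

# The research table says 1 in 50 invoices would be wrong without the control.
table_rate = 1 / 50
expected_saving = table_rate * invoices_per_year * cost_per_error
print(f"expected saving: {expected_saving:,.0f} vs cost {control_cost:,}")
```

Under these assumptions the table rate (1 in 50) is well above the break-even rate (1 in 80), so the spending looks justified even before adjusting for the slightly worse risk factors in your own situation.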

Suggestion 2: Research risk factors in your organisation

Consider the benefits of understanding and quantifying how risk works in your organisation. For example, you may have the view that staff turnover leads to more mistakes and bigger backlogs of work: it drives both risk and productivity. But by how much? Does it matter who changes their job? Is there an interaction with the complexity of the work? When weighing decisions about staff, how important are these effects?
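Even a first rough quantification can anchor these questions. A minimal sketch, fitting a least-squares line to invented monthly figures:

```python
# Hypothetical monthly figures: staff turnover (%) and processing errors.
turnover = [2, 4, 5, 7, 9, 12]
errors = [11, 14, 17, 20, 24, 31]

# Least-squares slope: extra errors per percentage point of turnover.
n = len(turnover)
mx = sum(turnover) / n
my = sum(errors) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(turnover, errors))
         / sum((x - mx) ** 2 for x in turnover))
print(f"about {slope:.1f} extra errors per point of turnover")
```

A real analysis would bring in more variables (who left, how complex the work is) using the multivariate techniques discussed later, but even a crude slope like this gives a number to challenge and refine.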

Another example concerns the effectiveness of attempts to improve controls. Did past attempts actually improve control, or did they just encourage people to change their priorities and let quality problems show somewhere else? For example, after being pushed for greater accuracy people often slow down, and this increases the incidence of lateness. Taking all things into consideration – including changes in workload, customer and supplier originated errors, new developments, staffing changes, and so on – did our attempts to improve controls actually work?

Or consider workload. If workload is related to mistakes and backlogs, then can the impact on process performance of taking on new business be quantified? Is it possible to say how much effort might be needed on controls development to meet new requirements?

One final example concerns money lost through undetected billing errors. In telecom this is called ‘revenue leakage’ and it has been searched for extensively over the last several years. A big part of solving revenue leakage problems is deciding where to search for them. Common sense says that some risk factors make leakage more likely, but the problem is that nobody knows for sure which factors are most important, and how much weight to put on each factor. Research is needed.

Already, leading banks seeking to measure Operational Risk accurately are gathering and analysing data in an attempt to quantify the sort of problems that internal control has traditionally tackled. Although the objectives of this work are narrow and concerned with compliance, perhaps we will see some exciting progress nonetheless.

Useful techniques

To get the most out of our data (for either of my suggestions) we need to use multivariate statistics to tease out the impact of different factors on error and other risks. Fortunately, there are several great software packages that offer a range of techniques for visualising and quantifying these multiple relationships.

Although the mathematics and algorithms underlying these packages are often complex and hard to understand, we don't have to understand everything to use them effectively. (Just as you don't have to understand how a car works to drive safely from A to B.)

The techniques available go far beyond fitting straight lines to scatterplots as most of us did at school. The model doesn't have to be a straight line. It can rely on a potentially large number of variables, some of which are numbers while others are categories. This is an exciting area of development with new ideas being tried all the time.

Neural networks remain an important area. Decision tree algorithms (e.g. C4.5, CART) use measures such as information gain to identify the most important variables. Statistical learning can also be done with kernel machines (not actually physical machines), such as Support Vector Machines.
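The variable-ranking idea behind decision trees can be shown in a few lines. A sketch computing information gain over invented audit records (the risk factors and outcomes are assumptions for illustration):

```python
import math

# Hypothetical audit records: (high_staff_turnover, complex_process, error_found)
records = [
    (True, True, True), (True, False, True), (True, True, True),
    (True, False, True), (False, True, True), (False, True, False),
    (False, False, False), (False, False, False),
]

def entropy(labels):
    """Shannon entropy of a list of True/False outcomes."""
    n = len(labels)
    result = 0.0
    for value in (True, False):
        p = sum(1 for label in labels if label == value) / n
        if p > 0:
            result -= p * math.log2(p)
    return result

def information_gain(data, index):
    """Reduction in outcome entropy from splitting on variable `index`."""
    gain = entropy([r[-1] for r in data])
    for value in (True, False):
        subset = [r[-1] for r in data if r[index] == value]
        gain -= len(subset) / len(data) * entropy(subset)
    return gain

print("turnover gain:  ", round(information_gain(records, 0), 3))
print("complexity gain:", round(information_gain(records, 1), 3))
```

In this toy data set splitting on turnover reduces uncertainty far more than splitting on complexity, which is exactly the judgement a tree-building algorithm makes at each step.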

Many of these tools work best when given a large amount of data to learn from (e.g. hundreds or thousands of examples). However, there are good reasons to consider using mathematical methods even where the amount of data available is low.

One problem with human judgement is that we struggle to weigh more than two or three variables at once. We try to eliminate variables from consideration by finding one or two that are decisive on their own, or by pairing up pros and cons in the hope that we can ignore those that seem to balance. But, often, none of these strategies is applicable or safe.

Using even very crude mathematical formulae can be advantageous. The formula does not make mistakes, succumb to confirmation bias, or give way to special pleading. If we want to make objective decisions this is very useful.
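To make the point concrete, here is a deliberately crude scoring formula: a weighted sum of risk factors. The factors and weights are invented, but a formula this simple is already applied consistently to every case, which is more than unaided judgement can promise:

```python
# A deliberately crude risk-scoring formula: a weighted sum of risk factors.
# The factors and weights below are hypothetical; the point is consistency.
weights = {
    "staff_turnover": 3,
    "process_complexity": 2,
    "recent_changes": 2,
    "manual_rekeying": 1,
}

def risk_score(unit):
    """Score a business unit by summing the weights of the factors it exhibits."""
    return sum(w for factor, w in weights.items() if unit.get(factor))

units = {
    "payroll": {"staff_turnover": True, "manual_rekeying": True},
    "billing": {"process_complexity": True, "recent_changes": True,
                "manual_rekeying": True},
}
for name, unit in units.items():
    print(name, risk_score(unit))
```

However rough the weights, every unit is judged on the same factors in the same way, with no confirmation bias and no special pleading.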

Some traditional statistical methods do not give sensible results on the basis of small amounts of empirical data, or simply refuse to provide any estimates at all. In contrast, Bayesian methods involve starting with an initial view and modifying it as data are received. Consequently there is always some view to use.
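A standard illustration of this is Beta-Binomial updating of an error rate (the prior and sample figures below are invented):

```python
# Bayesian updating of an error-rate estimate with a Beta prior.
# All figures are hypothetical. Even before any data arrive, the prior
# gives a usable estimate - there is always some view to act on.
prior_alpha, prior_beta = 1, 49  # prior view: roughly 1 error in 50

def update(alpha, beta, errors, clean):
    """Beta-Binomial update: add observed errors and clean items to the prior."""
    return alpha + errors, beta + clean

def mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution - the current best estimate."""
    return alpha / (alpha + beta)

print(f"prior estimate:   {mean(prior_alpha, prior_beta):.3f}")

# A small sample: 200 invoices checked, 8 errors found.
alpha, beta = update(prior_alpha, prior_beta, errors=8, clean=192)
print(f"updated estimate: {mean(alpha, beta):.3f}")
```

With no data the estimate is simply the prior (0.020 here); after even a small sample it shifts towards the observed rate (0.036), rather than refusing to answer or swinging wildly.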

Another fascinating approach is the use of ‘fast and frugal’ algorithms that analyse the database of past experience only when a prediction is required. One interesting example is PROBEX (‘PROBabilities from EXemplars’), which relies on a similarity function and then computes probabilities as a similarity-weighted frequency. Although it does not discriminate between important and irrelevant variables it still performs well, giving results close to human judgement but without the mistakes and biases.
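The similarity-weighted frequency idea can be sketched briefly. This is an illustration of the general idea, not the published PROBEX algorithm; the similarity function and data are assumptions:

```python
# Rough sketch of a similarity-weighted frequency estimate, in the spirit of
# PROBEX. Each past case is a tuple of binary risk-factor features plus
# whether a control failure occurred. All data are hypothetical.
past_cases = [
    ((1, 1, 0), True), ((1, 0, 0), True), ((1, 1, 1), False),
    ((0, 0, 1), False), ((0, 1, 0), False), ((0, 0, 0), False),
]

def similarity(a, b):
    """Fraction of features on which two cases agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def probability(query, cases):
    """Similarity-weighted frequency of failure among past cases."""
    weighted = [(similarity(query, features), failed)
                for features, failed in cases]
    total = sum(w for w, _ in weighted)
    return sum(w for w, failed in weighted if failed) / total

print(round(probability((1, 1, 0), past_cases), 2))
```

Past cases most like the one in hand dominate the estimate, so nothing needs to be fitted in advance; the database of experience is consulted only when a prediction is required.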


For too long internal control and audit specialists have relied on ‘professional judgement’ and failed to seek and use empirical evidence relevant to their judgements about risk and control.

The data to remedy this are all around us and today's powerful, yet easy to use, statistical tools give us the best chance ever to make sense of this data.

Further reading:

‘Fast and Frugal Use of Cue Direction in States of Limited Knowledge’ by Magnus Persson and Peter Juslin.


I would like to acknowledge the influence of Colin Tuerena at British Telecommunications plc for highlighting the value of empirically testing beliefs about risk, and of Michael Mainelli of Z/Yen Limited, who introduced me to Support Vector Machines.



Words © 2005 Matthew Leitch.