Matthew Leitch, educator, consultant, researcher
## Working In Uncertainty

## Probability axioms in Bayesian language

The gradual swing back from Frequentist probability thinking to Bayesian probability thinking has been quite slow, for a number of reasons. One of these is that the language of the most famous, most copied introduction to basic probability theory is Frequentist. The aim of this short paper is to present the usual axiomatic basis of probability theory in simple language consistent with a Bayesian approach. A typical Frequentist presentation is shown side by side with a Bayesian version so that you can see the differences in language. Frequentist language is highlighted in red, while the Bayesian alternative language I am suggesting is in green. (In case you don't know the Greek alphabet very well, note that σ is sigma and Ω is capital Omega.)
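For readers who want the axioms themselves in front of them while reading, the standard (Kolmogorov) axioms being translated can be stated as follows. The wording in the text annotations is mine, using the Bayesian terms suggested above (questions and answers rather than experiments and outcomes); the mathematics is the same under either reading.

```latex
% Let \Omega be the set of all possible answers to a question,
% and P a function assigning a probability to each event A \subseteq \Omega.
\begin{align}
  & P(A) \ge 0 && \text{for every set of answers } A \subseteq \Omega, \\
  & P(\Omega) = 1 && \text{(some answer in } \Omega \text{ is correct)}, \\
  & P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
    && \text{for mutually exclusive } A_1, A_2, \ldots
\end{align}
```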
Most books covering this basic theory illustrate the ideas with some examples. Here are some in Frequentist and Bayesian language. Again, the changes are easy to make.
You probably noticed that the second example feels much more naturally ‘Bayesian’ than the first, which is a typical Frequentist example. One final element of some introductions to probability theory is an attempt to explain what probabilities mean. This, of course, is where the Bayesian approach is fundamentally different from the Frequentist approach, so I haven't added colour. Here are two alternative explanations.
Having got this basic introduction to the axioms out of the way, there is still much about the modern Bayesian approach that needs to be explained. In particular, the approach usually focuses on some kind of system or process that can be observed, generating data from those observations, and which the analyst wants to represent with a mathematical model. What is the question, and what is the set of answers that would be used in the probability space? The question is a compound question (really two questions in one): ‘Which model is best, and what data could the process produce?’ The answers are every possible combination of a model paired with a set of data that might be observed. The analyst will, in effect, use information about the probability of each combination of model and observed data to deduce how likely it is that each model is the best model, given the data actually observed. These probabilities will usually be represented by one distribution that says how likely it is that each model is the best of the set, and another, conditional, distribution that says how likely each set of data is given that each model is true.
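The reasoning above can be sketched with a tiny worked example. The models, numbers, and names below are my own illustration, not from the paper: two candidate models of a coin, a prior distribution over the models, a conditional distribution giving each model's probability for the data, and a posterior obtained by combining them and normalising.

```python
# A minimal sketch of the compound question 'which model is best, and what
# data could the process produce?' for a coin-tossing process.
# (Illustrative assumption: two candidate models, 'fair' and 'biased'.)

priors = {"fair": 0.5, "biased": 0.5}    # P(model): how likely each model is best
p_heads = {"fair": 0.5, "biased": 0.8}   # each model's probability of heads

def likelihood(model, heads, tails):
    """P(data | model): probability of the observed tosses if this model is true."""
    p = p_heads[model]
    return p ** heads * (1 - p) ** tails

def posterior(heads, tails):
    """P(model | data): prior times likelihood for each model, normalised."""
    joint = {m: priors[m] * likelihood(m, heads, tails) for m in priors}
    total = sum(joint.values())          # P(data), summed over all models
    return {m: joint[m] / total for m in joint}

# After observing 8 heads and 2 tails, the 'biased' model becomes more credible.
post = posterior(heads=8, tails=2)
print(post)
```

Each `priors[m] * likelihood(m, ...)` term is the probability of one combination of model and observed data; dividing by their sum converts those joint probabilities into how likely each model is to be the best one, given the data actually observed.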

Words © 2014 Matthew Leitch