I've been thinking. Occam's Razor isn't just something we use because we prefer simpler explanations; it works mathematically.
Suppose there are two models that accurately describe a set of observations. Both work but are unproven. Model 1 explains the observations in terms of variables X and Y. Model 2 explains the observations in terms of variables A, B and C.
Since Model 2 uses three variables compared with two in Model 1, Model 2 is clearly more complicated.
The variables A, B, C, X and Y are unproven, so the probability that each of them is true is less than 1. Say, just for the sake of argument, that the probability of each of them being true is 0.9.
In order for a model to be true, the variables in terms of which it is described must all be true. If we also assume the variables are independent of one another, it follows that the probability of a model being true is the product of the probabilities of its variables being true.
P(Model 1 is true) = P(X is true, Y is true)
=0.9 × 0.9 = 0.81
P(Model 2 is true) = P(A is true, B is true, C is true)
=0.9 × 0.9 × 0.9 = 0.729
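The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `model_probability` is made up for illustration, and it assumes the variables are independent, so that their probabilities can simply be multiplied.

```python
from math import prod


def model_probability(variable_probs):
    """Probability that every variable is true, assuming independence."""
    return prod(variable_probs)


# Each variable is given the same illustrative probability of 0.9.
p_model_1 = model_probability([0.9, 0.9])        # X, Y
p_model_2 = model_probability([0.9, 0.9, 0.9])   # A, B, C

print(f"P(Model 1 is true) = {p_model_1:.3f}")
print(f"P(Model 2 is true) = {p_model_2:.3f}")
```

Changing the list entries lets you try unequal probabilities; the comparison then depends on the actual values, not just on how many variables there are.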
Model 2 is the more complicated one and it has a smaller probability of being true than the simpler Model 1.
It is intuitive from basic arithmetic that, since probabilities cannot be greater than 1 by definition, the more variables involved, and so the more probabilities in the product, the smaller in general the probability of the model being true. Therefore simpler models, i.e. ones with fewer variables, have higher probabilities of being true and are the best ones to use in the absence of any other deciding factors, just as Occam's Razor specifies.
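To see the general trend, here is a rough sketch that keeps every variable at the same illustrative probability of 0.9 and only changes how many variables a model needs. The helper `joint_probability` is a hypothetical name, and independence between variables is again assumed.

```python
def joint_probability(p, n):
    """Probability that n independent variables, each true with probability p, are all true."""
    return p ** n


# Models needing 1 to 6 variables, each variable with probability 0.9.
probs = [joint_probability(0.9, n) for n in range(1, 7)]

for n, value in enumerate(probs, start=1):
    print(f"{n} variables: {value:.4f}")
```

Each extra variable multiplies the result by another factor below 1, so the printed values shrink at every step.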
Note that this doesn't prove that Model 1 is true and Model 2 is false. It simply says that, based on the available evidence, under which both models explain the observations equally well, Model 1 is more likely to be true than Model 2.