I pretended the whole year that I understood Baye's theorem, but in reality I have no idea what the hell it is.

I think people overcomplicate this.

It's a way to reverse which thing we take as given and which thing we are finding the probability of.

It does require that we know several other probabilities before we can use it, which tends to be its downfall.

Suppose that we know that 30% of men are tall (in Bayesian terms, this is the likelihood), and we want to know the probability that someone tall is a man (the posterior probability). Those are different things, which sometimes trips people up. (For example, the probability that someone who tested positive is sick is different from the probability that someone who is sick tests positive, even though those sound pretty similar.)

To do that, we need to know how common it is to be tall and how common it is to be a man. Suppose that 20% of our population is tall, and 50% are men. Then we make a fraction of how common our assumption is (men, the prior probability) over how common our observation is (tall): .5 / .2 = 2.5. Multiplying that by the likelihood .3 gives .75. I.e., if someone is tall, then 3/4 of the time they are a man. (Note that the numbers have to be consistent for this to work out. If you kept the same "30% of men are tall" but thought that 10% of the population was tall and 50% were men, you would get .5 / .1 = 5, times .3 is 1.5, a probability greater than one, so you would know that your values were mutually inconsistent and impossible.)
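
A minimal Python sketch of the calculation above, using the same made-up population figures. The function name `posterior` and the choice to reject results above 1 are illustrative, not part of the original answer.

```python
def posterior(p_tall_given_man: float, p_man: float, p_tall: float) -> float:
    """P(man | tall) = P(tall | man) * P(man) / P(tall)  (Bayes' theorem)."""
    result = p_tall_given_man * p_man / p_tall
    if result > 1:
        # No real population fits these inputs: a probability can't exceed 1.
        raise ValueError(f"inputs are inconsistent: got {result:.2f}")
    return result


# 30% of men are tall, 50% of people are men, 20% of people are tall.
print(f"P(man | tall) = {posterior(0.3, 0.5, 0.2):.2f}")

# Same 30%, but only 10% of people tall: .3 * .5 / .1 = 1.5, which is impossible.
try:
    posterior(0.3, 0.5, 0.1)
except ValueError as err:
    print(err)
```

The first call reproduces the 3/4 from the text; the second shows the consistency check catching the impossible set of numbers.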

In terms of intuition vs. plugging and chugging: here we see that being a man gave a higher probability of being tall (30%) than the population as a whole has (20%), so the events "tall" and "man" are not independent. There must be proportionally more tall men than tall non-men (comparing groups of the same size) for the math to work out. The calculation tells us how one-sided that split is.
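
A quick check of that intuition in Python, assuming everyone is either a man or a non-man, so the tall group splits into exactly those two parts. It recovers the share of non-men who are tall from the same three numbers.

```python
p_man, p_tall, p_tall_given_man = 0.5, 0.2, 0.3

# Tall men as a share of the whole population: P(tall and man) = P(tall | man) * P(man)
p_tall_and_man = p_tall_given_man * p_man                # about 0.15
# The rest of the tall group must be non-men (assumes a two-way split).
p_tall_and_not_man = p_tall - p_tall_and_man              # about 0.05
p_tall_given_not_man = p_tall_and_not_man / (1 - p_man)   # about 0.10

print(f"P(tall | man)     = {p_tall_given_man:.2f}")
print(f"P(tall | not man) = {p_tall_given_not_man:.2f}")
print(f"P(tall)           = {p_tall:.2f}")
# Independence would require P(tall | man) == P(tall); here 0.30 != 0.20.
```

So 30% of men but only 10% of non-men are tall, which is exactly the one-sidedness the calculation measures.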

By the definition of conditional probability, P(B|A)P(A) = P(A&B) = P(A|B)P(B).

Divide through by P(A), then use the total probability formula on P(A), and that's Bayes.
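
Written out in LaTeX, the two steps look like this, with ¬B meaning "B does not happen" (the total-probability step splits P(A) over the two cases B and ¬B):

```latex
P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}
            = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B)}
```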

That's all it is: a trivial algebraic rearrangement of a definition. Of course, much can be said about its real-world application and meaning, but if you understand conditional probabilities, then none of that will be surprising to you.

You say you understand conditional probability, so if you have two events `A` and `B`, you know how to express the probability of `A` given that `B` has happened, and the probability of `B` given that `A` has happened:

* `P(A|B) = P(A∩B) / P(B)`
* `P(B|A) = P(A∩B) / P(A)`

Now, in the above, notice that `P(A∩B)` appears in both expressions. This lets us express the algebraic connection between `P(A|B)` and `P(B|A)`.

First, we solve each of the two equations above for `P(A∩B)`:

* `P(A∩B) = P(B) · P(A|B)`
* `P(A∩B) = P(A) · P(B|A)`

From here, we can equate the two expressions for `P(A∩B)` to get

`P(B) · P(A|B) = P(A) · P(B|A)`,

which lets us express one conditional probability in terms of the other:

`P(A|B) = P(A) · P(B|A) / P(B)`.
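
A small sketch that checks the identity by brute-force counting on a hypothetical example: one roll of a fair die, with `A` = "the roll is even" and `B` = "the roll is at least 5". The conditional probabilities are computed straight from the definitions above, using exact fractions so the equality test isn't disturbed by rounding.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {5, 6}      # event B: the roll is at least 5


def prob(event):
    """P(event) for equally likely outcomes, as an exact fraction."""
    return Fraction(len(event), len(outcomes))


def cond(x, given):
    """Conditional probability P(x | given) = P(x ∩ given) / P(given)."""
    return prob(x & given) / prob(given)


p_a_given_b = cond(A, B)   # P(A|B) from the definition
p_b_given_a = cond(B, A)   # P(B|A) from the definition

# Bayes' theorem: P(A|B) = P(A) · P(B|A) / P(B)
bayes = prob(A) * p_b_given_a / prob(B)

print(p_a_given_b, p_b_given_a, bayes)  # 1/2 1/3 1/2
assert bayes == p_a_given_b
```

Here `P(A|B)` = 1/2 while `P(B|A)` = 1/3, so the two conditional probabilities differ, yet the formula recovers one from the other.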

And that is what Bayes' theorem is: the algebraic connection between the following four probabilities:

* `P(A)` — the probability of event `A`;
* `P(B)` — the probability of event `B`;
* `P(A|B)` — the probability that `A` happens in the cases when `B` happens;
* `P(B|A)` — the probability that `B` happens in the cases when `A` happens.

So the theorem falls out of the definition of conditional probability after some very simple algebra, but it is extremely important! Being able to connect these four probabilities is something you need all the time when reasoning about real-world situations.

By the way, the theorem is named after Thomas Bayes, so it's **Bayes' theorem**, not *~~Baye's theorem~~*.

This is the odds interpretation, which I think is more intuitive.

Let's say you have a bunch of mutually exclusive and exhaustive possibilities. To each of them you assign a number saying how much you believe in it. The numbers range from -infinity to +infinity, and the more negative a number, the less you believe in that possibility. On this scale only the relative distances matter, so for convenience you can shift all the numbers by the same constant amount at any time.

When a new fact is observed, it moves every number on the scale to the left (more negative). The amount each number moves (how much more negative it gets) is the strength of the evidence from this new fact against that possibility. The shift is less negative when the chance of the fact happening, conditioned on that possibility, is high.
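
One way to make this concrete, assuming (the post doesn't say so explicitly) that each number is the natural log of an unnormalized probability, so that a fact shifts each possibility by log P(fact | possibility). The sketch reuses the tall/man figures from the first answer as a hypothetical example: equal starting belief in "man" and "not man", and a new fact "tall" with a 30% chance for men and a 10% chance otherwise (the 10% is what the first answer's figures imply for non-men).

```python
import math

# Hypothetical example: two mutually exclusive, exhaustive possibilities.
scores = {"man": 0.0, "not man": 0.0}  # equal belief to start; only differences matter

# Chance of the new fact ("this person is tall") under each possibility.
p_fact_given = {"man": 0.3, "not man": 0.1}

# Observing the fact shifts each score left by log P(fact | possibility),
# which is <= 0: the less likely the fact under a possibility, the bigger the shift.
for h in scores:
    scores[h] += math.log(p_fact_given[h])

# Shift everything so the best possibility sits at 0 (allowed, since only gaps matter).
top = max(scores.values())
scores = {h: s - top for h, s in scores.items()}

# Converting scores back to probabilities: exponentiate and normalize.
weights = {h: math.exp(s) for h, s in scores.items()}
total = sum(weights.values())
posterior = {h: w / total for h, w in weights.items()}

print(scores)
print(posterior)
```

After the update the two scores end up log 3 (about 1.10) apart, which corresponds to 3:1 odds in favor of "man", i.e. the same 75% that the formula gives.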