I think people overcomplicate this.
It's a way to swap which thing we take as given and which thing we're finding the probability of.
To use it, though, we need to know several other probabilities, which tends to be its downfall.
Suppose that we know that 30% of men are tall, i.e. the probability of being tall given that someone is a man, and we want to know the probability that someone tall is a man, which is the posterior probability. Those are different things, which sometimes trips people up. (E.g. the probability that someone who tested positive is sick is different from the probability that someone who is sick tests positive, even though those sound pretty similar.)
To do that, we need to know how common it is to be tall and how common it is to be a man. Suppose that 20% of our population is tall, and 50% are men. Then we make a fraction with our assumption (man) over our observation (tall) and get .5 / .2 = 2.5, and we multiply that by the .3 we started with to get .75. I.e. if someone is tall, then 3/4 of the time, they are a man. (In the standard vocabulary, the 50% chance of being a man before we know anything else is the prior probability, and the 30% chance that a man is tall is called the likelihood.) Note that the numbers have to be consistent for this to work out. If you kept the same 30% of men being tall but thought that 10% of the population was tall and 50% were men, you would get .5 / .1 = 5, times .3 = 1.5, a probability greater than one, so you know that your values were mutually inconsistent and impossible. (Tall men alone would make up 15% of the population, more than the 10% who are tall at all.)
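If you want to check the arithmetic above, here is a minimal Python sketch. The function name `posterior` and its parameter names are my own, not from any library; A stands for "man" and B for "tall". Besides applying the formula, it assumes two simple consistency checks: the tall-men group can't be bigger than the whole tall group, and the tall-non-men remainder can't be bigger than the whole non-men group.

```python
def posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B).

    Here A plays the role of "man" and B the role of "tall".
    Raises ValueError if the three inputs cannot all hold at once.
    """
    eps = 1e-12  # tolerance for floating-point round-off in the checks below
    for name, p in (("P(B|A)", p_b_given_a), ("P(A)", p_a), ("P(B)", p_b)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}={p} is not a probability")
    if p_b == 0.0:
        raise ValueError("P(B) must be positive to condition on B")

    joint = p_b_given_a * p_a  # P(A and B), e.g. the share of people who are tall men
    # Consistency: the A-and-B group can't be bigger than all of B, and the
    # not-A-and-B remainder can't be bigger than all of not-A.
    if joint > p_b + eps or (p_b - joint) > (1.0 - p_a) + eps:
        raise ValueError(
            f"inconsistent inputs: P(A and B)={joint:.2f}, P(B)={p_b:.2f}, P(A)={p_a:.2f}"
        )
    # Clamp so round-off within eps can't push the result past 1.
    return min(joint / p_b, 1.0)


# Numbers from the text: 30% of men are tall, 50% are men, 20% are tall.
print(round(posterior(0.3, 0.5, 0.2), 4))  # 0.75

# The inconsistent variant: only 10% of people are tall, so this raises.
try:
    posterior(0.3, 0.5, 0.1)
except ValueError as err:
    print(err)
```

The checks are there because the formula alone happily returns 1.5 for the inconsistent numbers; raising an error makes the "impossible inputs" case explicit instead of silently producing a nonsense probability.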
In terms of intuition vs. plug-and-chug: here we see that knowing someone is a man gave us a higher probability of being tall (30%) than for the population as a whole (20%), so the events "tall" and "man" are not independent. There must be more tall men than tall non-men in equal-size samples of each for the math to work out. With these numbers, 15% of the population are tall men, which leaves only 5% of the population as tall non-men, so 30% of men are tall versus 10% of non-men. The calculation tells us how lopsided that split is.
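To see that dependence concretely, here is a small sketch using Python's standard `fractions` module so the arithmetic stays exact. The function `tall_breakdown` and its parameter names are illustrative, not from any library, and it assumes the population splits into just two groups, men and non-men.

```python
from fractions import Fraction


def tall_breakdown(p_man: Fraction, p_tall: Fraction, p_tall_given_man: Fraction) -> dict:
    """Split the tall group into men and non-men using exact fractions."""
    p_tall_and_man = p_tall_given_man * p_man     # joint P(tall and man)
    p_tall_and_non_man = p_tall - p_tall_and_man  # the rest of the tall group
    return {
        "P(tall | man)": p_tall_given_man,
        "P(tall | non-man)": p_tall_and_non_man / (1 - p_man),
        "P(man | tall)": p_tall_and_man / p_tall,
        # Independence would require P(tall | man) to equal plain P(tall).
        "independent": p_tall_given_man == p_tall,
    }


# Numbers from the text: 50% men, 20% tall, 30% of men tall.
result = tall_breakdown(Fraction(1, 2), Fraction(1, 5), Fraction(3, 10))
for key, value in result.items():
    print(f"{key}: {value}")
# P(tall | man): 3/10
# P(tall | non-man): 1/10
# P(man | tall): 3/4
# independent: False
```

Using exact fractions keeps the 3/10 versus 1/10 comparison free of floating-point rounding, so the three-to-one lopsidedness is easy to read off directly.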