A simple way to understand Bayes' Theorem
Check out this group of faces:

The probability someone’s smiling is 5/15 = 1/3 (the top row of five, out of fifteen faces).
The probability someone has their mouth open is 4/15 (there are only four faces where we can see the whites of their open mouths).
Given someone’s smiling, what’s the probability they have their mouth open? We count five smilers. Out of those five, two have their mouths open: 2/5. Note that we typically say this as “probability of mouth open, conditional on smiling.” It’s written as P(mouth open | smiling) = 2/5.
What about the other way around? If someone has their mouth open, what’s the chance they’re smiling? We count four mouths open, two smilers among them. P(smiling | mouth open) = 2/4 = 1/2.
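If you’d like to check these counts in code, here is a minimal Python sketch. The `faces` list is a hypothetical encoding, built only to match the counts above (five smilers, four open mouths, two faces with both), and the `prob` helper is an illustrative name, not part of any library.

```python
from fractions import Fraction

# Hypothetical encoding of the fifteen faces as (smiling, mouth_open) pairs,
# constructed to match the counts given in the text.
faces = (
    [(True, True)] * 2      # smiling, mouth open
    + [(True, False)] * 3   # smiling, mouth closed
    + [(False, True)] * 2   # not smiling, mouth open
    + [(False, False)] * 8  # neither
)

def smiling(face):
    return face[0]

def mouth_open(face):
    return face[1]

def prob(event, given=lambda face: True):
    """P(event | given), found by counting faces."""
    pool = [f for f in faces if given(f)]      # restrict to the condition
    hits = [f for f in pool if event(f)]       # count matches inside it
    return Fraction(len(hits), len(pool))

print(prob(smiling))                    # 1/3
print(prob(mouth_open))                 # 4/15
print(prob(mouth_open, given=smiling))  # 2/5
print(prob(smiling, given=mouth_open))  # 1/2
```

Conditioning is nothing more than shrinking the pool you count over: the `given` filter runs first, and the event is counted only inside what remains.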
What’s the probability someone has their mouth open and is smiling, or P(smiling, mouth open)?
Well, we know the chance someone is smiling (1/3). And we know the chance they have their mouth open if they’re smiling (2/5). So if we multiply, we get the answer: 2/15.
There’s another way we can go about this. We know the chance someone has their mouth open (4/15), and the chance they’re smiling if their mouth is open (1/2). Multiplying: 2/15.
(Notice that we can also get P(smiling, mouth open) by just counting two open-mouthed smilers out of the fifteen total faces: 2/15).
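The three routes to 2/15 can be checked with exact fractions. This is just a sketch of the arithmetic above; the variable names are made up for illustration.

```python
from fractions import Fraction

# Values read off the group of faces.
p_smiling = Fraction(5, 15)
p_open = Fraction(4, 15)
p_open_given_smiling = Fraction(2, 5)
p_smiling_given_open = Fraction(2, 4)

# Route 1: start from the smilers, then look for open mouths among them.
joint_via_smiling = p_smiling * p_open_given_smiling

# Route 2: start from the open mouths, then look for smilers among them.
joint_via_open = p_open * p_smiling_given_open

# Route 3: count the two open-mouthed smilers directly.
joint_by_counting = Fraction(2, 15)

print(joint_via_smiling, joint_via_open, joint_by_counting)  # 2/15 2/15 2/15
```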
Thus we see P(smiling) * P(mouth open | smiling) = P(mouth open) * P(smiling | mouth open).
Dividing both sides by P(smiling), we get:
P(mouth open | smiling) = P(mouth open) * P(smiling | mouth open) / P(smiling)
Which, after substituting mouth open = A and smiling = B, gives us exactly Bayes’ Theorem:
P(A | B) = P(B | A) * P(A) / P(B)
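To see the formula at work on the faces, here is a small Python sketch. The function name `bayes` and its parameter names are illustrative choices, not a standard API.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be nonzero to condition on B")
    return p_b_given_a * p_a / p_b

# A = mouth open, B = smiling, with the values counted from the faces.
p_a_given_b = bayes(
    p_b_given_a=Fraction(1, 2),  # P(smiling | mouth open)
    p_a=Fraction(4, 15),         # P(mouth open)
    p_b=Fraction(1, 3),          # P(smiling)
)
print(p_a_given_b)  # 2/5
```

The result matches the 2/5 we got earlier by counting open mouths among the five smilers.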
I’ve found it easiest to understand Bayes’ Theorem in this way: the equation is just describing two different ways to calculate the chance of two events happening at the same time.
To drive the point home, it’s like filtering a group of people (some wearing hats, some wearing glasses, some wearing both). Assume your goal is to figure out how many are wearing both a hat and glasses.
You can do this by first finding everyone wearing a hat, then everyone wearing glasses within the hat group. Or you can do it by first finding all the glasses-wearers, then everyone wearing a hat within the glasses group. Both give you the same answer.
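The two filtering orders can be sketched in Python. The group below is entirely made up for illustration; any list of people would behave the same way.

```python
# Hypothetical group; names and accessories are invented for this example.
people = [
    {"name": "Ana", "hat": True, "glasses": True},
    {"name": "Ben", "hat": True, "glasses": False},
    {"name": "Cy", "hat": False, "glasses": True},
    {"name": "Dee", "hat": True, "glasses": True},
    {"name": "Eli", "hat": False, "glasses": False},
]

# Order 1: hats first, then glasses within the hat group.
hat_wearers = [p for p in people if p["hat"]]
hat_then_glasses = [p for p in hat_wearers if p["glasses"]]

# Order 2: glasses first, then hats within the glasses group.
glasses_wearers = [p for p in people if p["glasses"]]
glasses_then_hat = [p for p in glasses_wearers if p["hat"]]

print(len(hat_then_glasses), len(glasses_then_hat))  # 2 2
```

Either order lands on the same two people, which is the counting version of P(hat) * P(glasses | hat) = P(glasses) * P(hat | glasses).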