Help with understanding Wasserstein distance

5 Answers

The typical intuition is:

-    imagine your two probability distributions as two piles of sand containing the same total amount of sand
-    now think of one pile as the sand before you move it, and the other as the sand after you move it
-    ask yourself how you can move the one pile into the shape of the other with the least amount of work; that is, every time you carry sand you want to carry it the shortest possible distance, especially if you’re carrying a lot of it
-    if moving one unit of sand a distance d costs d^p (with d the ordinary distance between the two spots), then the smallest possible total work, raised to the power 1/p, is the p-Wasserstein distance
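To make the sand-pile picture concrete, here is a minimal Python sketch of the simplest case. Each pile is a handful of equally heavy grains at given points, and moving a grain a distance d costs d^p. With equal weights, an optimal plan can always be taken to be a one-to-one matching of grains, so trying every matching gives the exact answer for tiny piles. The function names are hypothetical and the brute force is only for illustration; practical solvers use linear programming or specialized algorithms. In one dimension, matching the sorted positions is optimal, which is what the second function assumes.

```python
import itertools
import math


# Hypothetical helper names, for illustration only.
def wasserstein_p_bruteforce(xs, ys, p=1.0):
    """p-Wasserstein distance between two equal-size piles of equally heavy grains.

    xs, ys: lists of points (tuples of coordinates); each grain has mass 1/n.
    Assumes p >= 1 and Euclidean distance. With uniform weights an optimal plan
    can be taken to be a one-to-one matching, so we try every matching
    (O(n!) work, tiny piles only).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both piles must hold the same number of grains")
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Work for this matching: grain i moves to ys[perm[i]] at cost distance**p.
        cost = sum(math.dist(xs[i], ys[j]) ** p for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best ** (1.0 / p)


def wasserstein_p_sorted_1d(xs, ys, p=1.0):
    """1-D shortcut: matching the sorted positions is an optimal plan for p >= 1."""
    if len(xs) != len(ys):
        raise ValueError("both piles must hold the same number of grains")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(a - b) ** p for a, b in pairs) / len(xs)) ** (1.0 / p)


# Same piles, once as 1-D points (tuples) and once as plain numbers.
print(wasserstein_p_bruteforce([(0.0,), (1.0,), (3.0,)], [(2.0,), (0.0,), (4.0,)], p=2))
print(wasserstein_p_sorted_1d([0.0, 1.0, 3.0], [2.0, 0.0, 4.0], p=2))
# both lines print sqrt(2/3), about 0.8165
```

The brute-force version works for points in any dimension; the sorted version is only valid on the real line.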

To summarize, the Wasserstein distance tells you how lazy you can be when moving stuff around while still getting it done. For a more thorough and less hand-wavy treatment, I recommend “Computational Optimal Transport” by Peyré and Cuturi, and “Optimal Transport: Old and New” by Villani.
Maybe limit yourself to a discrete example before thinking about the general definition. Imagine I have a line of concrete blocks on the ground. How much effort would it take to lift a block and lay it on top of another? If I'm laying it on the neighboring block, not much: I don't have to walk far, or maybe not at all; I just lift the block and put it down. Compare that to laying the block on top of another one further down the line: I have to walk over to where I want to place it, which takes more effort. So the quantity being measured is the effort needed to move a block, and the further I move it, the more effort I need to exert.
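For the blocks picture, the minimum total effort can be computed directly when the blocks sit at evenly spaced positions. This sketch assumes unit spacing, effort proportional to the distance walked, and ignores the cost of lifting; the function name is hypothetical. Dividing the result by the total number of blocks gives the 1-Wasserstein distance between the two arrangements viewed as probability distributions.

```python
# Hypothetical helper name, for illustration only.
def blocks_moving_effort(start, target):
    """Minimum total walking distance to turn one arrangement of blocks into another.

    start[k], target[k]: number of blocks at position k (assumed unit spacing).
    Both arrangements must contain the same total number of blocks.
    Scanning left to right, the running surplus is the number of blocks that
    must be carried across the gap between position k and k + 1 (to the right
    if positive, to the left if negative); each crossing costs one unit.
    """
    if len(start) != len(target) or sum(start) != sum(target):
        raise ValueError("arrangements must cover the same positions and block count")
    effort = 0
    surplus = 0
    for have, want in zip(start, target):
        surplus += have - want   # net blocks that still have to cross the next gap
        effort += abs(surplus)   # each of them walks one more unit of distance
    return effort


# One block stays at position 0, one moves 1 step, one moves 2 steps.
print(blocks_moving_effort([3, 0, 0, 0], [1, 1, 1, 0]))  # 3
```

Tracking the surplus across each gap avoids searching over all possible moves: on a line, no optimal plan ever carries blocks across the same gap in both directions.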
For probability and statistics, the Wasserstein metric defines a distance between probability distributions. You are really looking at _distances between random variables_ (more precisely, between their distributions) instead of distances between points in Euclidean space.
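For reference, the usual general definition, for probability measures μ and ν on a space with distance d, p ≥ 1, and finite p-th moments. The infimum runs over all couplings γ in Π(μ, ν), i.e. joint distributions whose marginals are μ and ν. The second form restates the same thing with random variables, which is the sense in which it is a distance "between random variables":

```latex
W_p(\mu,\nu) = \left( \inf_{\gamma \in \Pi(\mu,\nu)} \int d(x,y)^p \,\mathrm{d}\gamma(x,y) \right)^{1/p}
             = \left( \inf_{X \sim \mu,\; Y \sim \nu} \mathbb{E}\left[ d(X,Y)^p \right] \right)^{1/p}
```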

You basically apply p-norm ideas from analysis to the distribution functions of random variables, and rely on the nice properties of distribution functions to ensure that your geometric interpretation of distance jibes with reality.
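On the real line this description is literal. With F and G the CDFs of μ and ν, and F^{-1}, G^{-1} their (generalized inverse) quantile functions, W_1 is the L^1 distance between the CDFs, and W_p is the L^p distance between the quantile functions (the p = 1 case of the second formula agrees with the first):

```latex
W_1(\mu,\nu) = \int_{-\infty}^{\infty} \lvert F(x) - G(x) \rvert \,\mathrm{d}x,
\qquad
W_p(\mu,\nu) = \left( \int_0^1 \lvert F^{-1}(u) - G^{-1}(u) \rvert^p \,\mathrm{d}u \right)^{1/p}
```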

In practice, computing actual distances between things does mean doing a bunch of integrals, as you have noticed.
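In one dimension those integrals are easy to approximate numerically. Below is a minimal sketch using the standard library's `statistics.NormalDist` (which provides `cdf` and `inv_cdf`) and the midpoint rule. The function names, step counts, and the ±10 standard deviation truncation are illustrative assumptions, not a general-purpose implementation.

```python
from statistics import NormalDist


# Hypothetical helper names, for illustration only.
def w1_from_cdfs(a, b, n=20_000):
    """W_1 = integral of |F(x) - G(x)| dx, approximated with the midpoint rule.

    Assumption: both distributions put negligible mass outside
    [smaller mean - 10 sd, larger mean + 10 sd], which holds for normals.
    """
    spread = 10 * max(a.stdev, b.stdev)
    lo = min(a.mean, b.mean) - spread
    hi = max(a.mean, b.mean) + spread
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += abs(a.cdf(x) - b.cdf(x))
    return total * h


def wp_from_quantiles(a, b, p=2.0, n=20_000):
    """W_p = (integral over u in (0, 1) of |F^-1(u) - G^-1(u)|^p du)^(1/p), midpoint rule."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n  # midpoints never hit u = 0 or u = 1, where quantiles blow up
        total += abs(a.inv_cdf(u) - b.inv_cdf(u)) ** p
    return (total / n) ** (1.0 / p)


a, b, c = NormalDist(0, 1), NormalDist(3, 1), NormalDist(1, 2)
print(w1_from_cdfs(a, b))            # close to 3: equal spreads, so W_1 is the gap between means
print(wp_from_quantiles(a, c, p=2))  # close to sqrt(2), about 1.414
```

Normal distributions make convenient checks: with equal standard deviations W_1 is just the distance between the means, and W_2 between N(m1, s1²) and N(m2, s2²) is sqrt((m1 - m2)² + (s1 - s2)²).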

This is sort of related to how we can use _covariance_ as an inner product on zero-mean random variables, which gives the distance sqrt(E[(X - Y)^2]) between them, and then use that distance for ordinary least squares.
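Spelled out, for zero-mean, finite-variance X and Y, covariance is the inner product and the induced squared distance expands as in the first two lines. The last line shows the connection to Wasserstein: W_2 between the distributions μ and ν is the smallest value this distance can take over all joint distributions of (X, Y) with those marginals.

```latex
\langle X, Y \rangle = \operatorname{Cov}(X, Y) = \mathbb{E}[XY]
\lVert X - Y \rVert^2 = \mathbb{E}[(X - Y)^2] = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)
W_2(\mu, \nu)^2 = \inf_{X \sim \mu,\; Y \sim \nu} \mathbb{E}[(X - Y)^2]
```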
Thank you, everyone. The replies gave me a better intuition for the Wasserstein metric.
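In addition to the geometric explanations here, I also like the fact that, if you use the squared L2 distance as the cost, then the optimal transport “map” (strictly, a transport plan: a joint distribution with the two given marginals) is the joint distribution that maximizes the correlation. The reason is that E|X - Y|^2 = E|X|^2 + E|Y|^2 - 2E[X·Y], and the first two terms are fixed by the marginals, so minimizing the transport cost is the same as maximizing E[X·Y].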

OhhAskMe is a math solving hub where high school and university students ask and answer loads of math questions, discuss the latest in math, and share their knowledge. It’s 100% free!