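The total absolute residual to 2 data points a <= b, |c - a| + |c - b|, is minimized (it equals b - a) whenever the point c lies anywhere between them. So with many data points, pair the smallest with the largest, the second smallest with the second largest, and so on. The total absolute residual is minimized when you are between as many of these pairs as possible, and only the middle, i.e. the median, is between all of them. With an odd number of points, the leftover middle point contributes zero when you sit exactly on it.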
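A quick numerical check of the pairing argument, as a sketch: the helper names and the toy data below are made up for illustration. It brute-forces the total absolute residual on a fine grid and compares the result with the lower bound from pairing the outermost points.

```python
from statistics import median

def total_abs_residual(c, data):
    # Sum of |x_i - c| over all data points
    return sum(abs(x - c) for x in data)

def pair_lower_bound(data):
    # Pair smallest with largest, 2nd smallest with 2nd largest, ...
    # For any c, |c - a| + |c - b| >= b - a, with equality iff a <= c <= b.
    s = sorted(data)
    n = len(s)
    return sum(s[n - 1 - i] - s[i] for i in range(n // 2))

data = [1, 2, 4, 7, 11]
m = median(data)
# Brute-force scan on a 0.01 grid between min and max (assumes integer data)
grid = [k / 100 for k in range(min(data) * 100, max(data) * 100 + 1)]
best = min(grid, key=lambda c: total_abs_residual(c, data))

print(m, best)                      # 4 4.0
print(total_abs_residual(m, data))  # 15
print(pair_lower_bound(data))       # 15
```

The median hits the pairing lower bound exactly, so nothing can do better. With an even count such as [1, 2, 4, 7], every c between the two middle points (2 and 4) reaches the same bound of 8, which is why the median is not unique in that case.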

Think of the squared residual as potential energy. Then the point is attracted to each data point by a spring-like force, and the force scales in proportion to the distance. The minimizer of the potential energy is a stationary point, where the forces balance. That happens when the signed distances to all data points cancel each other out, i.e. the sum of (x_i - c) is 0, so c is the mean.
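To illustrate the force-balance picture, here is a minimal sketch that lets the point drift along the net spring force, which is plain gradient descent on the potential U(c) = sum of (c - x_i)^2. The function names, starting point, step size and iteration count are illustrative assumptions, not anything prescribed by the argument.

```python
from statistics import fmean

def net_force(c, data):
    # Force = -dU/dc = 2 * sum(x_i - c): each point pulls c toward itself
    # with strength proportional to the distance (like a spring).
    return 2 * sum(x - c for x in data)

def settle(data, c=0.0, step=0.01, iters=1000):
    # Move c a small step along the net force until it stops changing
    for _ in range(iters):
        c += step * net_force(c, data)
    return c

data = [1, 2, 4, 7, 11]
print(fmean(data))             # 5.0
print(round(settle(data), 6))  # 5.0
print(net_force(5.0, data))    # 0.0
```

The point settles at the mean, where the net force is exactly zero. The step size has to be small relative to 1 / len(data), otherwise the updates overshoot and diverge instead of settling.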