Quaternion Manifolds? SO(3) distributions? I try to keep up with all the papers posted around here, but I'm gonna need to take some more math classes before this one.
Does this have any special tricks for situations where some orientations are equivalent by symmetry to others? I'm thinking crystallography specifically.
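To make the question concrete, here is a generic sketch of how symmetry-equivalent orientations are often collapsed to a single representative. It is not taken from the paper. The names `qmul`, `canonicalize`, and `c4_group` are made up for illustration. Quaternions are assumed to be in (w, x, y, z) order, and the symmetry is assumed to act by right-multiplication, which depends on your frame convention.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def c4_group():
    """Hypothetical example group: 4-fold rotational symmetry about the z axis."""
    angles = np.arange(4) * (np.pi / 2)
    return [np.array([np.cos(t / 2), 0.0, 0.0, np.sin(t / 2)]) for t in angles]

def canonicalize(q, group):
    """Pick the equivalent orientation with the smallest rotation angle.

    All q * s (s in the group) describe the same physical orientation,
    and q and -q describe the same rotation, so we choose the candidate
    with the largest |w| and flip its sign so that w >= 0.
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    candidates = [qmul(q, s) for s in group]
    best = max(candidates, key=lambda c: abs(c[0]))
    return best if best[0] >= 0 else -best

group = c4_group()
q = np.array([0.9, 0.1, 0.2, 0.3])
q_equiv = qmul(q / np.linalg.norm(q), group[1])  # same orientation under C4
print(canonicalize(q, group))
print(canonicalize(q_equiv, group))  # should match the line above
```

With something like this, labels could be mapped to one representative before training, or predicted samples merged afterwards. Whether the method needs that, or handles it on its own, is exactly what I'm asking.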
Please help me understand: You train a transformer that takes image patches as input, as in ViT, and predicts the pose (i.e., the orientation) of the object. A single image can have multiple valid solutions, since some poses are ambiguous. By sampling 10,000 times you get the probability distribution over all possible solutions.

Hope this is about right. Very cool!

1) I don't quite get what the output of the transformer is. Instead of directly predicting the orientation, you input parts of the rotation (x and y of q) and classify the bins? There are over 50000 bins, so does this mean it is a multi-class classification with 50k classes?

2) How do you get the ground-truth (gt) labels? As I understand it, you have positive samples (image-orientation pairs) and negative samples that are assigned a random orientation? Or is there something about the math I'm overlooking?

3) Why do we need a start token?
Nice work! Can this method be used to rapidly sample from the trained distribution?