In my multivariate class we constantly used covariance matrices, but it felt like something was missing, since these only contain variances and pairwise covariances. In my understanding, associations can exist among 3 or more variables that are not visible from the bivariate relationships alone; e.g. pairwise independence does not imply mutual independence.
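
The pairwise-vs-mutual independence point can be checked by brute force with the classic XOR construction (a toy sketch of my own, not from any class material): X and Y are fair coins and Z = X XOR Y.

```python
from itertools import product

# Enumerate the joint distribution of (X, Y, Z) where X, Y are fair
# coins and Z = X XOR Y. Each of the 4 outcomes (x, y) is equally likely.
outcomes = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]
p = 1 / len(outcomes)  # each outcome has probability 1/4

def prob(cond):
    return sum(p for o in outcomes if cond(o))

# Every pair is independent: P(A=a, B=b) = P(A=a) * P(B=b) = 1/4.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in product([0, 1], repeat=2):
        assert prob(lambda o: o[i] == a and o[j] == b) == 0.25

# But the triple is not mutually independent:
print(prob(lambda o: o == (0, 0, 0)))  # 0.25, while independence would require 1/8
```

All three covariances here are zero as well, so no covariance matrix (or any collection of bivariate summaries) can detect the three-way dependence.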

So to fully describe the structure of a high-dimensional dataset, wouldn't you need something like a covariance tensor (a p-dimensional array, where p is the number of variables, instead of a 2-dimensional array like a matrix) that captures something like the multiple R between variables? Has any work been done on this?
In general, yes, covariances or correlations are insufficient; hence the use of copulas (see e.g. Wikipedia). Indeed, correlation is not adequate even for describing bivariate relationships: you can have many distinct bivariate relationships with the same correlation.

(If you didn't at least get mention of copulas in a class on multivariate statistics, you're missing both some helpful conceptual knowledge and some useful tools for modelling multivariate structure. It would be worth picking up some of the concepts.)

For a few particular multivariate distributional classes (one of which includes the multivariate Gaussian), the correlation matrix plus the information needed to determine the marginal distributions is sufficient.

Copulas allow us to decouple marginal distributions from their multivariate relationships, and just focus on the latter.
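
As a sketch of that decoupling (my own toy example with scipy; the Gaussian copula is just one possible choice): sample correlated normals, push them through the normal CDF to obtain the copula, then install whatever margins you like via inverse CDFs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A Gaussian copula: correlated normals pushed through their own CDF
# give dependent Uniform(0, 1) variables -- this is the copula itself...
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
u = stats.norm.cdf(z)            # uniform margins, Gaussian dependence

# ...and applying arbitrary inverse CDFs installs any margins we like
# while leaving the dependence structure untouched.
x = stats.expon.ppf(u[:, 0])     # exponential margin
y = stats.t.ppf(u[:, 1], df=3)   # Student-t margin

print(np.corrcoef(x, y)[0, 1])   # dependence clearly survives the transform
```

The same copula paired with different margins yields different joint distributions that all share one dependence structure, which is exactly the decoupling described above.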

Note that having a normal distribution for each of the variables (univariate marginal normality) is not enough. Nor is it sufficient that all lower-dimensional margins be multivariate normal, as you suggest.
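
To make the first point concrete, here is a standard counterexample sketched in Python (the seed and sample size are my own choices): take X standard normal and flip its sign with an independent fair coin.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X standard normal; Y = S * X with an independent random sign S.
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)
y = s * x

# Both margins are N(0, 1) and the covariance is zero...
print(np.cov(x, y)[0, 1])    # ~ 0

# ...yet X and Y are completely dependent (|X| == |Y| always), and the
# joint distribution is not bivariate normal: X + Y has an atom at 0,
# since it equals exactly 0 whenever S = -1.
print(np.mean(x + y == 0))   # ~ 0.5
```

A genuinely bivariate normal pair with zero covariance would be independent and X + Y would be continuous, so this joint distribution cannot be multivariate normal despite its normal margins.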

If I understand your intent correctly, though, the required information is not, in general, tensor-like.
(Joint) moments, when they exist, can help characterize the distribution of a random vector. Having all the same moments is a necessary but not sufficient condition for two random vectors to be identically distributed. As you know, the mean and covariance are first- and second-order (central) moments.

The moment-generating function, when it exists, uniquely determines the distribution. Note that for some distributions, moments of all orders exist and yet the MGF does not. In such cases, the collection of all moments need *not* uniquely determine the distribution.
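
The lognormal is the classic case of that last point (Heyde's example): perturbing its density by a sine term in log x leaves every integer moment unchanged, so the moments cannot pin down the distribution. A quick numerical check with scipy (my own sketch, working in y = log x coordinates):

```python
import numpy as np
from scipy.integrate import quad

# Lognormal (sigma = 1) in y = log(x) coordinates: y ~ N(0, 1).
# Heyde's example: the density f(x) * (1 + a*sin(2*pi*log(x))) has
# exactly the same integer moments as the lognormal itself, because
# the sine contribution integrates to zero for every integer n.

def moment_integrand(y, n):
    # x^n * f(x) dx rewritten in y = log(x): exp(n*y) * phi(y)
    return np.exp(n * y - y**2 / 2) / np.sqrt(2 * np.pi)

def perturbation_integrand(y, n):
    # the extra piece contributed by the sine term
    return moment_integrand(y, n) * np.sin(2 * np.pi * y)

for n in range(1, 4):
    m, _ = quad(moment_integrand, -np.inf, np.inf, args=(n,))
    p, _ = quad(perturbation_integrand, -np.inf, np.inf, args=(n,))
    print(n, m, np.exp(n**2 / 2), p)  # m matches exp(n^2/2); p ~ 0 up to quadrature error
```

So here moments of every order exist (E[X^n] = exp(n^2/2)), the MGF does not, and a whole family of distinct densities shares all of those moments.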

Equivalently, one can consider the cumulant-generating function, the natural logarithm of the MGF. In many ways cumulants are nicer to work with than moments, although they carry equivalent information. For instance, each cumulant of a sum of independent random variables is the sum of the corresponding cumulants of the summands, which is not true of central moments in general.
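
A small numerical check of that additivity, using scipy's conventions (`stats(moments='mvsk')` returns mean, variance, skewness, and excess kurtosis): Exp(1) has cumulants kappa_n = (n-1)!, and the sum of two independent Exp(1) variables is Gamma(2, 1), so its third and fourth cumulants should be 2*2! = 4 and 2*3! = 12.

```python
from scipy import stats

# X + Y ~ Gamma(2, 1) for independent X, Y ~ Exp(1).
mean, var, skew, kurt = stats.gamma(a=2).stats(moments='mvsk')

# Recover cumulants from the standardized quantities:
kappa3 = skew * var**1.5   # third cumulant from skewness
kappa4 = kurt * var**2     # fourth cumulant from excess kurtosis
print(float(kappa3), float(kappa4))  # 4.0 and 12.0, up to floating point
```

By contrast the fourth *central moment* is not additive here: Exp(1) has fourth central moment 9, but Gamma(2, 1) has 24, not 18.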
Perhaps your class uses the multivariate normal distribution frequently. In that case, the covariance matrix really does contain all the information about the dependence structure; this follows from the definition.

But you can have a multivariate distribution with normally distributed margins that is not a multivariate normal distribution. Its dependence structure must then be described in a different way, e.g. by copulas, as others have mentioned.

For specific purposes, e.g. the distribution of powers of quadratic forms of general multivariate distributions, "covariance tensors" based on higher-order product moments may be an adequate way to describe the dependence structure and to calculate with it.
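
For what it's worth, such a third-order "covariance tensor" is easy to estimate from data. A minimal numpy sketch (the shapes, seed, and chi-square example are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A sample "covariance tensor" of order 3: the p x p x p array of third
# central moments E[(X_i - mu_i)(X_j - mu_j)(X_k - mu_k)], estimated
# from an (n, p) data matrix with a single einsum contraction.
n, p = 50_000, 3
x = rng.standard_normal((n, p)) ** 2   # skewed chi-square(1) columns
xc = x - x.mean(axis=0)                # center each variable

m3 = np.einsum('ni,nj,nk->ijk', xc, xc, xc) / n

# Diagonal entries m3[i, i, i] are the univariate third central moments;
# for chi-square(1) the true value is 8.
print(m3[0, 0, 0])  # roughly 8
```

This is the direct generalization of `np.cov` to order 3 (the order-2 version of the same einsum gives the covariance matrix), and the construction extends to any order, though the storage cost grows as p^k.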
I think this is where Markov networks (undirected graphical models) can be used, though the continuous case is more complex than the discrete one.
I don't know if this is what you were looking for, but copulas are often used (at least in my field) to describe dependence between multiple random variables.