This isn’t really a math issue; it’s an issue of scientific methodology. Modeling, regardless of the model that you’re using, always consists of iterating between these two steps:

1. Make educated assumptions about how something works

2. Test those assumptions against new data
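A minimal sketch of one pass through that loop, assuming (purely for illustration) that the system is a straight line in time. The data values and the helper names `fit_line` and `mean_squared_error` are made up:

```python
def fit_line(ts, xs):
    """Least-squares fit of the assumed model x = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    x_mean = sum(xs) / n
    covariance = sum((t - t_mean) * (x - x_mean) for t, x in zip(ts, xs))
    variance = sum((t - t_mean) ** 2 for t in ts)
    slope = covariance / variance
    intercept = x_mean - slope * t_mean
    return slope, intercept


def mean_squared_error(slope, intercept, ts, xs):
    """Average squared gap between the model's predictions and the data."""
    return sum((slope * t + intercept - x) ** 2 for t, x in zip(ts, xs)) / len(ts)


# Step 1: make an assumption (linear growth) and fit it to the data you have.
old_t = [0.0, 1.0, 2.0, 3.0]
old_x = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(old_t, old_x)

# Step 2: test the assumption against data the fit has never seen.
new_t = [4.0, 5.0]
new_x = [9.0, 11.0]
error_on_new = mean_squared_error(slope, intercept, new_t, new_x)
```

If `error_on_new` comes out large, the assumption was probably wrong and you go back to step 1 with a different one.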

If you don’t have enough data to use statistical methods (e.g. ML), and you also don’t want to make any assumptions about how the system you’re analyzing works, then there’s nothing you can do.

Regarding the choice of causal models, any model of the following form will be causal:

x(t) = f(x(t’<t), t’<t)

Where I’m using “t’<t” as shorthand for “any and all times less than t”.
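As a rough sketch of that form in Python: each new value is computed from the history alone, so the model can never look into the future. The names `simulate_causal` and `mean_of_last_two` are hypothetical, and the averaging rule is just one arbitrary choice of f:

```python
from typing import Callable, List, Sequence


def simulate_causal(
    f: Callable[[Sequence[float], Sequence[float]], float],
    x_init: Sequence[float],
    t_init: Sequence[float],
    new_times: Sequence[float],
) -> List[float]:
    """Extend a history by repeatedly applying x(t) = f(past x, past t)."""
    xs = list(x_init)
    ts = list(t_init)
    for t in new_times:
        # f only ever sees values recorded at times strictly before t.
        xs.append(f(xs, ts))
        ts.append(t)
    return xs


def mean_of_last_two(past_x, past_t):
    # Hypothetical rule: the next value is the average of the two latest ones.
    return 0.5 * (past_x[-1] + past_x[-2])


history = simulate_causal(mean_of_last_two, [0.0, 1.0], [0.0, 1.0], [2.0, 3.0, 4.0])
```

Any other function of the past could be swapped in for `mean_of_last_two` and the model would still be causal.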

Differential equations are basically just the special case where

f(x(t’<t), t’<t)=x(t-dt)+dt*g(x(t-dt),t-dt)

for some function g(x, t) and a really small time interval dt.
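That update rule is the forward Euler step. Here is a small sketch of it, using exponential decay (g(x, t) = -x) as an assumed example system; the names `euler_step`, `integrate` and `decay` are illustrative only:

```python
import math


def euler_step(g, x_prev, t_prev, dt):
    """One update: x(t) = x(t - dt) + dt * g(x(t - dt), t - dt)."""
    return x_prev + dt * g(x_prev, t_prev)


def integrate(g, x0, t0, dt, n_steps):
    """Apply the update repeatedly, starting from x0 at time t0."""
    xs = [x0]
    t = t0
    for _ in range(n_steps):
        xs.append(euler_step(g, xs[-1], t, dt))
        t += dt
    return xs


def decay(x, t):
    # Assumed example system dx/dt = -x, whose exact solution is exp(-t).
    return -x


xs = integrate(decay, x0=1.0, t0=0.0, dt=0.001, n_steps=1000)
exact_at_t1 = math.exp(-1.0)  # compare with xs[-1], the estimate at t = 1
```

With dt this small, `xs[-1]` lands close to `exact_at_t1`; making dt bigger makes the approximation worse.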

But you can choose anything you want, really. If it’s not too complicated and it fits the data, then it’s a good model.