Would it be possible to model any possible curve to a mathematical model given data points only?

The main issue you have is that "best describes..." is ill-defined. You have to pick a norm to measure closeness.

In addition, if you have a finite set of data points, you have infinitely many functions that fit your points exactly.

Most of the theory on interpolation uses polynomial interpolants. This is more general than you seem to think. For example, sin x can be written as the power series:

sin x = x - x^(3)/3! + x^(5)/5! - x^(7)/7! + ...
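
As a quick sanity check, here is a short sketch (the evaluation point is arbitrary) showing how fast the partial sums of that series converge:

```python
import math

# Partial sums of the Taylor series sin(x) = x - x^3/3! + x^5/5! - ...
def sin_partial(x, terms):
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for terms in (1, 2, 3, 5):
    print(terms, sin_partial(1.0, terms))
print("math.sin:", math.sin(1.0))
```

With only a handful of terms the truncated series already agrees with math.sin to many digits on a modest interval.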

The best recommendation would be to read the chapters on interpolation/approximation in any numerical analysis text.

ETA: If you have a finite set of, say, n data points, you can at best uniquely determine a polynomial of degree n-1: ax^(n-1)+bx^(n-2)+... You then write a set of n simultaneous equations for the value of the polynomial at each of the data points, then solve for the unknown coefficients a, b, c, etc.
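
That system of simultaneous equations is a linear solve. A minimal numpy sketch (the three data points are made up for illustration):

```python
import numpy as np

# Made-up example: 3 data points determine a degree-2 polynomial ax^2 + bx + c.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 7.0])

# Vandermonde matrix: row i is [x_i^2, x_i, 1], one equation per data point.
V = np.vander(xs, N=len(xs))

# Solve V @ coeffs = ys for the unknown coefficients (highest degree first).
coeffs = np.linalg.solve(V, ys)
print(coeffs)  # -> [1. 1. 1.], i.e. x^2 + x + 1
```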
> I don't just mean linear or poly graphs etc, but using sections of sin, cos, tan, tanh, sin^(2) etc.

People fit all sorts of functions to data all the time. If you e.g. know you have data that should be some kind of sine wave you can do a least squares fit to find the best amplitude, wavelength, and phase-offset to use. (Or the same for whatever other kind of function.)
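
For instance, here is a sketch of that sine-wave fit using scipy.optimize.curve_fit; the signal, noise level, and starting guesses are all made up:

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up data: noisy samples of y = 2 sin(1.5 x + 0.3).
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 200)
y = 2.0 * np.sin(1.5 * x + 0.3) + rng.normal(0, 0.1, x.size)

def model(x, amplitude, freq, phase):
    return amplitude * np.sin(freq * x + phase)

# Nonlinear fits need a reasonable starting guess, especially for frequency.
params, _ = curve_fit(model, x, y, p0=[1.0, 1.4, 0.0])
print(params)  # should land near [2.0, 1.5, 0.3]
```

Note that if the initial frequency guess is far off, the optimizer can settle into a local minimum; that sensitivity is typical of nonlinear least squares.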

But unless you have some a priori knowledge of the function, it generally makes no sense to fit it via some large collection of other functions.

In practice people approximate arbitrary functions by fitting polynomials or rational functions (ratios of polynomials), because those are convenient to work with and computationally tractable (they involve only basic arithmetic), with a well developed theory. But if you e.g. have a function with a square-root singularity, a polynomial or rational approximant might be slow to converge; then you can try to approximate your function as the product of a square root and a rational function. Etc.

If the domain of your function is a periodic interval, you should use trigonometric polynomials (functions with terms like *a* sin(*kx*) or *b* cos(*mx*) for integers *k*, *m*) or trigonometric rational functions instead.
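
A nice property of trigonometric polynomials is that, unlike the sine fit above, the fit is *linear* in the unknown coefficients, so it reduces to one least-squares solve. A sketch (the sample signal and harmonic cutoff are made up):

```python
import numpy as np

# Made-up 2*pi-periodic signal, sampled on an equispaced grid.
x = np.linspace(0, 2 * np.pi, 50, endpoint=False)
y = 1.0 + 0.5 * np.cos(2 * x) - 0.25 * np.sin(3 * x)

# Design matrix with columns [1, cos(kx), sin(kx)] up to harmonic K.
K = 4
cols = [np.ones_like(x)]
for k in range(1, K + 1):
    cols += [np.cos(k * x), np.sin(k * x)]
A = np.column_stack(cols)

# Linear least squares recovers the coefficients exactly here.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 3))
```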
You have to determine whether your measured data points have noise.

As someone else mentioned, given n one-dimensional data points, you can fit them easily with a degree-(n-1) single-variable polynomial. But if the underlying phenomenon is actually closer to a low-degree polynomial and your measured data are noisy, then the high-degree polynomial model will match exactly at the n data points but be so wiggly in between (interpolation) and beyond (extrapolation) those points that it would be worthless.

It would be better to approximate your n-point data using a polynomial of low degree m, where m << n, and thus use a least-squares fit, since the system is now over-determined. But to choose m, you would need to know something about the underlying physics of the process you are trying to model.
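
A numpy sketch of the contrast (the underlying quadratic and noise level are made up):

```python
import numpy as np

# Made-up noisy samples of an underlying quadratic.
rng = np.random.default_rng(1)
n = 12
true = lambda t: 3 * t**2 - t + 0.5
x = np.linspace(-1, 1, n)
y = true(x) + rng.normal(0, 0.05, n)

low = np.polyfit(x, y, 2)        # m = 2: over-determined least squares
high = np.polyfit(x, y, n - 1)   # m = n-1: interpolates the noise exactly

# Evaluate between the sample points: the low-degree fit stays close to
# the true curve, while the degree-(n-1) interpolant oscillates wildly.
xx = np.linspace(-1, 1, 500)
err_low = np.max(np.abs(np.polyval(low, xx) - true(xx)))
err_high = np.max(np.abs(np.polyval(high, xx) - true(xx)))
print(err_low, err_high)
```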
A feedforward neural network is basically a giant curve-interpolation program. It has a specific form with tons of adjustable parameters, and the fitting is done through backpropagation: the error is reduced by moving the parameters against the gradient of the error.
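
To make that concrete, here is a toy sketch; the architecture, learning rate, and target curve are all arbitrary choices, with the backpropagation written out by hand:

```python
import numpy as np

# Fit a made-up 1-D curve with a one-hidden-layer tanh network.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100).reshape(-1, 1)
y = np.sin(3 * x)

H = 20                            # hidden units (arbitrary)
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)
lr = 0.05                         # learning rate (arbitrary)

for step in range(2000):
    h = np.tanh(x @ W1 + b1)      # forward pass
    pred = h @ W2 + b2
    grad = 2 * (pred - y) / len(x)         # d(mean sq. error)/d(pred)
    gW2 = h.T @ grad; gb2 = grad.sum(0)    # backpropagate through layer 2
    gh = grad @ W2.T * (1 - h**2)          # through the tanh
    gW1 = x.T @ gh; gb1 = gh.sum(0)        # through layer 1
    W1 -= lr * gW1; b1 -= lr * gb1         # move against the gradient
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
print(mse)
```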
Try symbolic regression?
There is a theorem which says, effectively, *the more complicated the functions you are allowed to use to approximate a target function, the more data points you need in order to have an accurate approximation*. This is also often called the *bias-variance tradeoff*. If you want to permit your models to be "any smooth function," with no limitations on what is called the "hypothesis space," then your selected model will be truly *useless*: you will have no confidence that it will generalize to unobserved data points. You are likely to "overfit" and end up with a model which is only applicable to the observed data points.

Instead of looking at a numerical analysis text, I might actually suggest a basic statistical learning theory text, for example Bishop's *Pattern Recognition and Machine Learning*.
I would say it's not possible for an arbitrary curve: there are infinitely many curves to choose from and none is objectively better. Also, data points on a curve mean there is missing data, and anything can be going on in between, such as a discontinuity/spike/oscillation/etc.

Also if you are dealing with real data it can be dependent on unpredictable, messy events which can't be simply modelled.

For example, at work I deal with a lot of data that spans decades. Recently I had a data set that seemed random but was actually strongly linked to things like the cost of living. The cost-of-living curve actually became the model curve (albeit translated), plus a normally distributed error term.
Not always:
- not enough data sometimes
- there might not be a compact way to express the curve (i.e. there are nearly infinite terms), and searching for a model might not be computationally feasible
- specific to statistical learning: many algorithms rely on curves having certain properties (e.g. continuity)

Also, it depends on what you mean by points; usually you need both the inputs and the outputs to fit a curve with any certainty.
If you know nothing about what does and doesn't make sense for your function, you will have more than one possible answer, many of them absolutely meaningless and no more useful (or even less) than any oracle.
One fun way to do this that's different from the other answers is to use genetic programming.

This takes care of some of the "issues" already mentioned by others. As one person said, there is no fixed list of "functions" that it makes sense to fit with, since pretty much anything can be a function. With the GP approach, you decide what "primitive" functions you want (e.g. basic arithmetic and trig functions) and the algorithm works with that. Want to include more functions? Just add 'em; no need to change anything else.

As others said, it's also not necessarily clear what "best describes" means. GP doesn't answer this question, no, but it confronts it head on: the fitness function. The nice part of the GP approach is that we can very easily include "length of expression" as part of the fitness, so that the system will tend to prefer "e^(x)" over "1 + x + x^(2)/2 + ...".

You then have the genes encode a tree structure where the leaves are random numbers or variables and all the other nodes are the functions. Binary functions get two children, unary functions get one, etc. If you Google "genetic programming function approximation" you can find some cool stuff.
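
A toy sketch of the idea; to keep it short this uses mutation only (no crossover), and the primitive set, fitness weights, and target curve are all made-up choices:

```python
import math
import random

random.seed(0)
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}

def random_tree(depth=3):
    # Leaves are the variable x or a random constant; inner nodes are ops.
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else round(random.uniform(-2, 2), 2)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def size(tree):
    return 1 if not isinstance(tree, tuple) else 1 + size(tree[1]) + size(tree[2])

def fitness(tree, xs, ys):
    # Squared error plus a small length penalty, so shorter expressions win ties.
    try:
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return float('inf')
    return err + 0.01 * size(tree)

def mutate(tree):
    # Replace a randomly chosen subtree with a fresh random one.
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

# Made-up target: y = x^2 + 1, sampled at a few points.
xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + 1 for x in xs]

# Evolve: keep the best 50, refill the population with their mutants.
pop = [random_tree() for _ in range(200)]
for gen in range(30):
    pop.sort(key=lambda t: fitness(t, xs, ys))
    pop = pop[:50] + [mutate(random.choice(pop[:50])) for _ in range(150)]

best = min(pop, key=lambda t: fitness(t, xs, ys))
print(best, fitness(best, xs, ys))
```

Real GP systems add crossover (swapping subtrees between two parents) and tournament selection, but the skeleton — random expression trees, a fitness that mixes error with expression length, and selection pressure — is the same.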
