I have hundreds of potential features to use in my DNN. Instead of doing a separate analysis to figure out which features are most important, can I just feed all of them to the DNN and let the model figure out which ones are most predictive? I have millions of training examples, so overfitting should not be a problem. I just wonder whether the bad features could make it harder for the model to make use of the good ones.

Not absolutely crucial but if there is a paper that discusses this topic, that would be super awesome as well. Thanks in advance.
Feature selection is always worth considering, if only to improve model performance in the hardware sense (faster training and inference) and to lower its footprint.
As a first attempt, you can add L1 regularization. Also look at mutual information (MI): a variable with zero MI is statistically independent of the target when considered on its own. That doesn't mean you can blindly discard it, since it may still be informative in combination with other variables, and MI estimated from finite data is rarely exactly zero. Those two should give you at least a very rough idea of which variables could be dropped or investigated. Sometimes a variable that is expected to be important turns out not to be, and that's worth looking into because there might be a problem with the data!
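To make the "don't blindly discard it" caveat concrete, here is a minimal sketch of a plug-in MI estimate for discrete variables, using only the standard library. The toy data and the names `a`, `b`, `noise` and `target` are made up for illustration. For real, continuous features you would need binning or a dedicated estimator, so treat this as a sketch rather than a recipe.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: every combination of three binary features, once each.
rows = list(product([0, 1], repeat=3))
a = [r[0] for r in rows]
b = [r[1] for r in rows]
noise = [r[2] for r in rows]
target = [x ^ y for x, y in zip(a, b)]  # target is the XOR of a and b

mi_a = mutual_information(a, target)
mi_b = mutual_information(b, target)
mi_noise = mutual_information(noise, target)
mi_ab = mutual_information(list(zip(a, b)), target)  # a and b taken jointly

print(mi_a, mi_b, mi_noise, mi_ab)
```

In this toy case `a`, `b` and `noise` all have zero MI with the target individually, yet `a` and `b` together determine it completely. A per-variable MI ranking alone would not tell the useful pair apart from the noise column.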
With DNNs, you don't need to do feature selection. DNN + SGD will learn what it needs anyway, especially if you have millions of training examples.
No feature selection approach you could run as preprocessing would be better than the "feature selection" the model does implicitly as part of the optimization process. This is because no preprocessing step can know better than the DNN itself which features are useful to it. The only reasons to add a separate feature selection step are that you don't have much data, so you can't rely on the DNN figuring it out by itself, or that you want to reduce the runtime/memory requirements of your model. With just a few hundred input features, though, the latter shouldn't be a concern.
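As a rough illustration of what "implicit feature selection" by the optimizer can look like, here is a toy sketch in plain Python. It is only a two-weight linear model trained with SGD, standing in for a DNN, on synthetic data where one input matters and the other is pure noise. The data, the learning rate and the names `x_good` and `x_bad` are all assumptions made for the example, and it says nothing definitive about deep networks.

```python
import random

random.seed(0)

# Hypothetical toy data: y depends only on x_good; x_bad is irrelevant noise.
data = []
for _ in range(2000):
    x_good = random.gauss(0.0, 1.0)
    x_bad = random.gauss(0.0, 1.0)
    y = 2.0 * x_good + random.gauss(0.0, 0.1)
    data.append((x_good, x_bad, y))

w_good, w_bad = 0.0, 0.0
lr = 0.01  # assumed learning rate, chosen small enough to be stable here

for epoch in range(5):
    random.shuffle(data)
    for x_good, x_bad, y in data:
        err = w_good * x_good + w_bad * x_bad - y
        # Gradient of 0.5 * err**2 with respect to each weight.
        w_good -= lr * err * x_good
        w_bad -= lr * err * x_bad

print(f"w_good={w_good:.3f}, w_bad={w_bad:.3f}")
```

With plenty of data, the weight on the irrelevant input ends up close to zero without any explicit selection step, while the weight on the useful input approaches its true value of 2.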