I was under the impression that two consecutive linear layers in a neural network would be no better than a single linear layer. But in practice the two layers gave better results than using only one. Each layer had its own dropout, though; could that have helped?

Technically, and I emphasize the "technically", the set of functions represented by such a network can be captured with only one layer. However, there is little guarantee that you can feasibly find the proper configuration or train that network accurately.

By adding another layer, you can reduce the training burden by spreading it across layers. The extra dropout also adds more regularization.

This is the part of deep learning where it's less science and more, "eh, sounds like it works."
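For concreteness, here's a minimal sketch of the two setups being compared, assuming PyTorch and the arrangement described in the question (Linear -> Dropout -> Linear, no activation in between); the dimensions and dropout rate are just placeholders:

```python
import torch
import torch.nn as nn

# Placeholder sizes, assumed only for illustration.
in_features, hidden, out_features, p_drop = 64, 32, 10, 0.5

# One linear layer with dropout.
single = nn.Sequential(
    nn.Linear(in_features, out_features),
    nn.Dropout(p_drop),
)

# Two stacked linear layers, each with its own dropout,
# and no activation in between (the setup in the question).
stacked = nn.Sequential(
    nn.Linear(in_features, hidden),
    nn.Dropout(p_drop),
    nn.Linear(hidden, out_features),
    nn.Dropout(p_drop),
)

x = torch.randn(8, in_features)
print(single(x).shape, stacked(x).shape)  # both: torch.Size([8, 10])
```
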
If you have Y = A B X, is M = A B full rank? If not, then the stacked layers aren't even equivalent to an arbitrary single linear layer.
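To see why the rank matters, here is a quick NumPy check; the hidden width of 4 is chosen purely for illustration as a bottleneck:

```python
import numpy as np

rng = np.random.default_rng(0)

# A maps hidden -> out, B maps in -> hidden.
# With a hidden width of 4, the product M = A @ B can have rank at most 4,
# so Y = A B X cannot match an arbitrary full-rank 10x10 single linear layer.
A = rng.standard_normal((10, 4))
B = rng.standard_normal((4, 10))
M = A @ B

print(np.linalg.matrix_rank(M))  # 4, not 10: rank(M) <= min(rank(A), rank(B))
```
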
Dropout is not a fixed linear function: for p > 0 each forward pass applies a fresh random mask, so during training the composition layer-dropout-layer is no longer equivalent to a single fixed linear layer. That stochastic regularization between the two layers probably made the difference.
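A quick way to see this, assuming PyTorch: in training mode each forward pass samples a new dropout mask, so the stacked Linear-Dropout-Linear map is not a fixed function of the input; in eval mode dropout is disabled and it collapses back to one deterministic linear map:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Linear -> Dropout -> Linear, the composition discussed above.
model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Dropout(p=0.5),
    nn.Linear(64, 64),
)
x = torch.randn(1, 64)

model.train()   # dropout active: a new random mask on every call
print(torch.allclose(model(x), model(x)))  # False (almost surely)

model.eval()    # dropout disabled: the map is a fixed linear one again
print(torch.allclose(model(x), model(x)))  # True
```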