4 Deep Neural Networks

Deep L-layer neural network

Over the last several years, the AI and machine learning community has realized that there are functions that very deep neural networks can learn and that shallower models often cannot. Although for any given problem it can be hard to predict in advance exactly how deep a network you will need, it is reasonable to try logistic regression first, then networks with one or two hidden layers, and to treat the number of layers as a hyperparameter to tune.

Symbol definition for deep learning :

[latex]L[/latex] : the number of layers in the network
[latex]n^{[l]}[/latex] : the number of nodes (units) in layer l, for example [latex]n^{[1]} = 5[/latex]
[latex]a^{[l]}[/latex] : the activations in layer l, computed as [latex]a^{[l]} = g^{[l]}(z^{[l]})[/latex]
[latex]W^{[l]}[/latex], [latex]b^{[l]}[/latex] : the weights and bias used to compute [latex]z^{[l]}[/latex] in layer l
[latex]x[/latex] : the input features, with [latex]x = a^{[0]}[/latex]
[latex]\hat {y} = a^{[L]}[/latex] : the activation of the final layer

Forward and backward propagation

forward propagation :
[latex]\begin{matrix} z^{[l]} = W^{[l]} \cdot a^{[l - 1]} + b^{[l]}\\ a^{[l]} = g^{[l]}(z^{[l]}) \end{matrix}[/latex]

vectorized version :
[latex]\begin{matrix} Z^{[l]} = W^{[l]} \cdot A^{[l - 1]} + b^{[l]}\\ A^{[l]} = g^{[l]}(Z^{[l]}) \end{matrix}[/latex]

backward propagation (here * is the element-wise product) :
[latex]\begin{matrix} \mathrm{d}z^{[l]} = \mathrm{d}a^{[l]} * g^{[l]\prime}(z^{[l]})\\ \mathrm{d}W^{[l]} = \mathrm{d}z^{[l]} \cdot a^{[l-1]T}\\ \mathrm{d}b^{[l]} = \mathrm{d}z^{[l]} \\ \mathrm{d}a^{[l-1]} = W^{[l]T} \cdot \mathrm{d}z^{[l]}\\ \mathrm{d}z^{[l]} = W^{[l+1]T} \cdot \mathrm{d}z^{[l+1]} * g^{[l]\prime}(z^{[l]}) \end{matrix}[/latex]

vectorized version :
[latex]\begin{matrix} \mathrm{d}Z^{[l]} = \mathrm{d}A^{[l]} * g^{[l]\prime}(Z^{[l]})\\ \mathrm{d}W^{[l]} = \frac{1}{m} \mathrm{d}Z^{[l]} \cdot A^{[l-1]T}\\ \mathrm{d}b^{[l]} = \frac{1}{m} np.sum(\mathrm{d}Z^{[l]} , axis = 1, keepdims = True)\\ \mathrm{d}A^{[l-1]} = W^{[l]T} \cdot \mathrm{d}Z^{[l]} \end{matrix}[/latex]
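The vectorized formulas for a single layer can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names (`layer_forward`, `layer_backward`), the choice of a sigmoid activation, and the random stand-in for the upstream gradient are all assumptions made for the example.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def sigmoid_prime(Z):
    s = sigmoid(Z)
    return s * (1.0 - s)

def layer_forward(A_prev, W, b, g=sigmoid):
    """Z = W . A_prev + b, A = g(Z). Returns A and a cache for the backward pass."""
    Z = W @ A_prev + b              # b (n_l, 1) broadcasts across the m columns
    A = g(Z)
    return A, (A_prev, W, Z)

def layer_backward(dA, cache, g_prime=sigmoid_prime):
    """Vectorized backward step for one layer, given dA for that layer."""
    A_prev, W, Z = cache
    m = A_prev.shape[1]             # number of training examples
    dZ = dA * g_prime(Z)            # element-wise product
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Hypothetical sizes: 3 units in layer l-1, 2 units in layer l, m = 4 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 4))
W = rng.standard_normal((2, 3)) * 0.01
b = np.zeros((2, 1))

A, cache = layer_forward(A_prev, W, b)
dA = rng.standard_normal(A.shape)   # stand-in for the gradient from layer l+1
dA_prev, dW, db = layer_backward(dA, cache)
```

Each gradient has the same shape as the quantity it belongs to: `dW` matches `W`, `db` matches `b`, and `dA_prev` matches `A_prev`.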

Forward propagation in a Deep Network

for a single training example :
[latex]z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}, \ a^{[l]} = g^{[l]}(z^{[l]})[/latex]

vectorized way :
[latex]Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}, \ A^{[l]} = g^{[l]}(Z^{[l]}) \ (A^{[0]} = X)[/latex]
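Applying the vectorized step layer by layer gives the full forward pass. The sketch below is only one way to write it. It assumes ReLU activations in the hidden layers and a sigmoid in the output layer, and it stores the parameters as a list of `(W, b)` pairs; neither choice comes from the notes above.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_propagation(X, parameters):
    """Run Z = W A_prev + b, A = g(Z) for layers 1..L.

    parameters: list of (W, b) pairs, one per layer (assumed layout).
    """
    A = X                                   # A^{[0]} = X
    L = len(parameters)
    for l, (W, b) in enumerate(parameters, start=1):
        Z = W @ A + b
        # Assumption: ReLU for hidden layers, sigmoid for the final layer.
        A = sigmoid(Z) if l == L else relu(Z)
    return A                                # A^{[L]}, the prediction y-hat

# Hypothetical network: 3 input features, hidden layers of 4 and 2 units, 1 output.
layer_dims = [3, 4, 2, 1]
rng = np.random.default_rng(0)
parameters = [
    (rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01,
     np.zeros((layer_dims[l], 1)))
    for l in range(1, len(layer_dims))
]

X = rng.standard_normal((3, 5))             # m = 5 training examples as columns
Y_hat = forward_propagation(X, parameters)
```

The explicit loop over layers is expected here: the computation is vectorized across training examples, not across layers.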

Getting your matrix dimensions right

One useful debugging tool for checking the correctness of the code is to work through the dimensions of every matrix. Making sure that all the matrix dimensions are consistent will usually go some way toward eliminating possible bugs.
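A dimension check of this kind might look like the sketch below. It relies on the shapes implied by the forward propagation formulas: [latex]W^{[l]}[/latex] is [latex](n^{[l]}, n^{[l-1]})[/latex], [latex]b^{[l]}[/latex] is [latex](n^{[l]}, 1)[/latex], and [latex]Z^{[l]}[/latex] and [latex]A^{[l]}[/latex] are [latex](n^{[l]}, m)[/latex]. The helper names and the dictionary layout (`"W1"`, `"b1"`, ...) are assumptions for illustration.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Create W^{[l]} and b^{[l]} for l = 1..L with the expected shapes."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def check_dimensions(params, layer_dims, m):
    """Assert that every W and b is consistent; return the shape of A^{[L]}."""
    A_shape = (layer_dims[0], m)                  # shape of A^{[0]} = X
    for l in range(1, len(layer_dims)):
        W, b = params[f"W{l}"], params[f"b{l}"]
        assert W.shape == (layer_dims[l], layer_dims[l - 1]), f"W{l} has shape {W.shape}"
        assert b.shape == (layer_dims[l], 1), f"b{l} has shape {b.shape}"
        assert W.shape[1] == A_shape[0], f"W{l} cannot multiply A{l - 1}"
        A_shape = (W.shape[0], m)                 # shape of Z^{[l]} and A^{[l]}
    return A_shape

layer_dims = [3, 4, 2, 1]                         # hypothetical layer sizes
params = initialize_parameters(layer_dims)
out_shape = check_dimensions(params, layer_dims, m=5)
```

A transposed or mis-sized weight matrix fails one of the assertions immediately, which is usually easier to diagnose than a broadcasting error deep inside training.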

Why deep representations?

Deep neural networks work really well for a lot of problems. It is not just that they need to be big neural networks; specifically, they need to be deep, that is, to have a lot of hidden layers.

The earlier layers learn simpler, low-level features, and the later, deeper layers then put together the simpler things that were detected in order to detect more complex things.
If you try to compute the same function with a shallow network, one that is not allowed enough hidden layers, then you might require exponentially more hidden units to compute it.

Starting out on a new problem :

Start out with even logistic regression, then try something with one or two hidden layers, and treat the number of hidden layers as a hyperparameter that you tune.
But over the last several years there has been a trend toward people finding that, for some applications, very deep neural networks can sometimes be the best model for a problem.

Building blocks of deep neural networks

Nothing ……

Parameters vs Hyperparameters

The settings below control the ultimate values of the parameters W and b, and so we call all of them hyperparameters :

[latex]\alpha[/latex] (learning rate)
iterations (the number of iterations of gradient descent)
L (the number of hidden layers)
[latex]n^{[l]}[/latex] (the number of hidden units)
choice of activation function

Find the best value :

Idea—Code—Experiment—Idea— ……

Try a few values for the hyperparameters and check whether there is a better value; as you do so, you slowly gain intuition about the hyperparameters as well.
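As a small illustration of this loop, the sketch below tries several learning rates on a tiny logistic regression problem and keeps the one with the lowest final cost. Everything in it is assumed for the example: the synthetic data, the candidate values of [latex]\alpha[/latex], the fixed iteration count, and the helper name `train_logistic_regression`.

```python
import numpy as np

def train_logistic_regression(X, Y, alpha, iterations):
    """Plain gradient descent; returns the final cross-entropy cost."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(iterations):
        A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
        dZ = A - Y
        w -= alpha * (X @ dZ.T) / m
        b -= alpha * np.sum(dZ) / m
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    tiny = 1e-12                                  # avoids log(0)
    cost = -np.mean(Y * np.log(A + tiny) + (1 - Y) * np.log(1 - A + tiny))
    return float(cost)

# Synthetic, linearly separable data (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)

# Idea -> code -> experiment: try a few values, then compare.
results = {
    alpha: train_logistic_regression(X, Y, alpha, iterations=200)
    for alpha in (0.001, 0.01, 0.1, 1.0)
}
best_alpha = min(results, key=results.get)
```

For brevity the comparison uses the training cost; in practice the candidates would normally be compared on held-out data.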

What does this have to do with the brain?

The analogy between neural networks and the brain may once have been useful, but the field has now moved to the point where that analogy is breaking down.