Basic calculus for deep learning with code (Part 1)
Many deep learning approaches like DNNs, CNNs, and RNNs have long been black boxes that produce amazing results. The math behind these built-in NN functions is often a less explored area because of the patience it requires, yet understanding it paves the way for many innovative approaches to using these networks.
Another difficulty is the lack of understanding of how a pure mathematical concept is converted to code, and the programming problems associated with that conversion. I will attempt to bridge this gap by explaining the mathematical concepts along with code.
General Notation:
Throughout this post, f(x) denotes a function of a single input and f′(x) its derivative; ∂z/∂x denotes the partial derivative of z with respect to x; σ is the sigmoid function; and X and W denote matrices of inputs and weights.
Basic calculus (skip to the code part if you already know this)
Consider the simple equation:

f(x) = x²

Graph of f(x) = x² (a parabola)

Let us say our aim is to calculate the value of f′(2). The intuitive way is to derive f′(x) first and then substitute: f′(x) = 2x, so f′(2) = 4.
But this approach of deriving the expression for f′(x) first and then substituting values becomes difficult to implement in code for non-polynomial functions like the sigmoid function, trigonometric functions, and so on.
So we dive deeper into the meaning of the derivative: the slope of the curve at the particular point, which is

f′(x) = lim(Δ→0) [f(x + Δ) − f(x − Δ)] / (2Δ)

Therefore, for a small Δ (say 0.001),

f′(2) ≈ [f(2 + Δ) − f(2 − Δ)] / (2Δ)
The code that implements f′(x) numerically for the functions f(x) = square(x) and f(x) = sigmoid(x) is as follows.
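A minimal sketch of such an implementation, assuming NumPy; the helper names square, sigmoid, and deriv and the step Δ = 0.001 are illustrative choices:

```python
import numpy as np

def square(x: np.ndarray) -> np.ndarray:
    # f(x) = x^2, applied element-wise
    return np.power(x, 2)

def sigmoid(x: np.ndarray) -> np.ndarray:
    # f(x) = 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

def deriv(func, input_: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # Central-difference approximation of f'(x) at every point of input_
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

print(deriv(square, np.array([2.0])))   # ~[4.0], since f'(x) = 2x
print(deriv(sigmoid, np.array([2.0])))  # ~[0.105] = sigmoid(2) * (1 - sigmoid(2))
```

Note that deriv works for any element-wise function without ever deriving a symbolic formula, which is the whole point of the numerical approach.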
Chain rule:
Let f(x) = u(v(x)), which is a function of a function (a composite function).
The code snippet for the same is
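One way to represent such a chain in code, reusing square and sigmoid from above (the Chain alias and chain_apply helper are illustrative names, not a standard API):

```python
from typing import Callable, List

ArrayFunction = Callable[[np.ndarray], np.ndarray]
Chain = List[ArrayFunction]

def chain_apply(chain: Chain, x: np.ndarray) -> np.ndarray:
    # Evaluate f(x) = u(v(x)) for a chain [v, u] of two functions
    assert len(chain) == 2, "this helper handles chains of length 2"
    v, u = chain
    return u(v(x))

# Example: f(x) = sigmoid(square(x))
print(chain_apply([square, sigmoid], np.array([2.0])))  # ~[0.982] = sigmoid(4)
```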
In order to calculate f′(x) in this case, the general formula (the chain rule) is

f′(x) = u′(v(x)) · v′(x)
The code snippet to calculate the same and the result is
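A sketch of how the formula translates to code, reusing deriv and the Chain alias from the earlier snippets:

```python
def chain_deriv_2(chain: Chain, x: np.ndarray) -> np.ndarray:
    # f(x) = u(v(x))  =>  f'(x) = u'(v(x)) * v'(x)
    assert len(chain) == 2
    v, u = chain
    v_of_x = v(x)              # v(x)
    dv_dx = deriv(v, x)        # v'(x)
    du_dv = deriv(u, v_of_x)   # u'(v(x))
    return du_dv * dv_dx

# Example: f(x) = sigmoid(square(x)), evaluated at x = 2
print(chain_deriv_2([square, sigmoid], np.array([2.0])))  # ~[0.0707]
```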
Similarly, it can be extended to a chain of length 3, as sketched below.
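A possible extension to three functions; every extra link in the chain contributes one extra factor to the product:

```python
def chain_deriv_3(chain: Chain, x: np.ndarray) -> np.ndarray:
    # f(x) = w(u(v(x)))  =>  f'(x) = w'(u(v(x))) * u'(v(x)) * v'(x)
    assert len(chain) == 3
    v, u, w = chain
    v_of_x = v(x)
    u_of_v = u(v_of_x)
    return deriv(w, u_of_v) * deriv(u, v_of_x) * deriv(v, x)

# Example: f(x) = square(sigmoid(square(x))), evaluated at x = 2
print(chain_deriv_3([square, sigmoid, square], np.array([2.0])))  # ~[0.139]
```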
Multiple inputs to a function:
Let z = f(x, y) be a function of two inputs. The value of the partial derivative of z with respect to x at the point (x₀, y₀) is calculated by holding y fixed at y₀:

∂z/∂x (x₀, y₀) = lim(Δ→0) [f(x₀ + Δ, y₀) − f(x₀ − Δ, y₀)] / (2Δ)
The code for the above is
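A sketch under the assumption that the two-input function is z = sigmoid(x + y); the function itself is a placeholder, and the pattern of perturbing x while holding y fixed is the point:

```python
def multi_input_func(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Hypothetical two-input example: z = f(x, y) = sigmoid(x + y)
    return sigmoid(x + y)

def partial_x(func, x0: np.ndarray, y0: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # Central-difference approximation of dz/dx at (x0, y0), holding y fixed at y0
    return (func(x0 + delta, y0) - func(x0 - delta, y0)) / (2 * delta)

print(partial_x(multi_input_func, np.array([1.0]), np.array([2.0])))  # ~[0.045] = sigmoid'(3)
```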
Matrix calculation
Since, for a 1 × 3 row vector X = [x₁ x₂ x₃] and a 3 × 1 column vector W = [w₁ w₂ w₃]ᵀ,

N = ν(X, W) = X × W = x₁w₁ + x₂w₂ + x₃w₃ (Equation 1)

each element xᵢ contributes to N at the rate wᵢ. Hence,

∂N/∂X = [∂N/∂x₁ ∂N/∂x₂ ∂N/∂x₃] = [w₁ w₂ w₃] = Wᵀ
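A quick numerical illustration of this result (the vector sizes and the helper names are assumptions for the example):

```python
def matmul_forward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # N = nu(X, W) = X x W
    return np.dot(X, W)

def matmul_backward_x(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # dN/dX = W transposed: one entry per element of X
    return np.transpose(W)

X = np.array([[1.0, 2.0, 3.0]])      # 1 x 3 row vector
W = np.array([[4.0], [5.0], [6.0]])  # 3 x 1 column vector
print(matmul_forward(X, W))          # [[32.]] = 1*4 + 2*5 + 3*6
print(matmul_backward_x(X, W))       # [[4. 5. 6.]] = W transposed
```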
Another example: when X and W are multidimensional arrays, the calculation of sigmoid(X × W) is as follows.
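A sketch assuming X is 3 × 3 and W is 3 × 2 (arbitrary shapes chosen for illustration):

```python
def matrix_forward_sigmoid(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # S = sigmoid(X x W): matrix multiply first, then apply sigmoid element-wise
    N = np.dot(X, W)
    return sigmoid(N)

X = np.random.randn(3, 3)  # 3 x 3, random values for the example
W = np.random.randn(3, 2)  # 3 x 2
print(matrix_forward_sigmoid(X, W).shape)  # (3, 2), the same shape as X x W
```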
In this case, where the matrices X, W and (X × W) are of different sizes, the derivative of the matrix (X × W) with respect to X is no longer a single number, so the calculation will be done with the Lambda function Λ, which sums all the elements of its input into one value. Hence,

L = Λ(σ(X × W)) = the sum of all the elements of σ(X × W) (Equation 2)
Finally, the summary of the composite functions from X and W to L is:

N = ν(X, W) = X × W,  S = σ(N),  L = Λ(S)

Now we have L, which returns a single value and can therefore be viewed as a function of X and W. We can compute how the value of L would change upon changing an element of these input matrices (x₁₁, w₁₁ and so on), which is nothing but its derivative with respect to X. That is represented as ∂L/∂X.

By the chain rule,

∂L/∂X = (∂L/∂S) × (∂S/∂N) × (∂N/∂X)
1. Take the first element (the partial derivative of L with respect to S). From Equation 2, L is simply the sum of all the elements of S, so each element of S contributes to L at the rate 1:

∂L/∂S = a matrix of ones, the same shape as S
2. Take the second element (the partial derivative of S with respect to N). Since S = σ(N) is applied element-wise:

∂S/∂N = σ′(N) = σ(N)(1 − σ(N))
3. Take the third and last element (the partial derivative of N with respect to X). From Equation 1, N = X × W, so as derived in the previous section:

∂N/∂X = Wᵀ

Hence,

∂L/∂X = (∂L/∂S · ∂S/∂N) × ∂N/∂X = σ′(X × W) × Wᵀ

where the first two factors multiply element-wise and the final factor is a matrix multiplication.
The code for all the above is
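A sketch that puts the whole forward and backward computation together; the function names and the 3 × 3 / 3 × 2 shapes are illustrative, and the final lines check the analytic gradient against a finite difference:

```python
def matrix_function_forward_sum(X: np.ndarray, W: np.ndarray) -> float:
    # Forward pass: L = Lambda(sigmoid(nu(X, W)))
    N = np.dot(X, W)   # Equation 1: N = X x W
    S = sigmoid(N)     # S = sigmoid(N), element-wise
    return np.sum(S)   # Equation 2: L = Lambda(S), the sum of all elements

def matrix_function_backward_sum_x(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Backward pass: dL/dX = (dL/dS * dS/dN) x dN/dX
    N = np.dot(X, W)
    S = sigmoid(N)
    dL_dS = np.ones_like(S)   # from Equation 2: the derivative of a sum is all ones
    dS_dN = S * (1 - S)       # sigmoid'(N), element-wise
    dN_dX = np.transpose(W)   # from Equation 1: dN/dX = W transposed
    return np.dot(dL_dS * dS_dN, dN_dX)  # same shape as X

np.random.seed(0)
X = np.random.randn(3, 3)
W = np.random.randn(3, 2)

grad = matrix_function_backward_sum_x(X, W)
print(grad)

# Sanity check: nudge x11 and compare against the (0, 0) entry of the gradient
delta = 0.001
X_plus, X_minus = X.copy(), X.copy()
X_plus[0, 0] += delta
X_minus[0, 0] -= delta
numeric = (matrix_function_forward_sum(X_plus, W)
           - matrix_function_forward_sum(X_minus, W)) / (2 * delta)
print(numeric, grad[0, 0])  # the two values should agree closely
```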
Since we have implemented basic matrix calculus in Python, we can now try to implement neural networks, which fundamentally rely on backpropagation and gradient descent, and those in turn rely on this chain rule from calculus.