Basic Calculus for Deep Learning with Code, Part 1

Rahuram Thiagarajan
5 min read · Oct 19, 2022


Many deep learning architectures like DNNs, CNNs, and RNNs have long been black boxes that produce amazing results. The math behind these built-in NN functions is often treated as a less interesting area to explore because of the patience it requires. But understanding the math behind these NNs paves the way for many innovative approaches to utilising them.

Another difficulty is the lack of understanding of how a pure mathematical concept is converted to code, and the programming problems associated with that conversion. I will attempt to bridge this gap by explaining the mathematical concepts along with code.

General Notation:

Basic calculus (skip to the code part if you already know this)

Consider the simple equation of a parabola:

f(x) = x²

The graph of y = x² is an upward-opening parabola with its vertex at the origin.

Let us say our aim is to calculate the value of f′(2). The intuitive way is to derive f′(x) first and then evaluate it at x = 2:

f′(x) = 2x, so f′(2) = 4

But this approach of calculating the derivative symbolically first and then substituting values becomes difficult to implement in code for some non-polynomial functions, like the sigmoid function, circular functions, and so on. The sigmoid function, for instance, is

σ(x) = 1 / (1 + e⁻ˣ)

So we go back to the meaning of the derivative as the slope of the curve at a particular point, which is

f′(x) = lim(Δx→0) [f(x + Δx) − f(x)] / Δx

Therefore, in code we approximate f′(x) ≈ [f(x + Δx) − f(x)] / Δx, where Δx is a very small number close to zero.

The code that implements f′(x) for the functions f(x) = square(x) and f(x) = sigmoid(x) is as follows.
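The original snippet was an image and did not survive the export, so here is a minimal sketch of what it likely contained, assuming NumPy; the names `square`, `sigmoid`, and `deriv` are my own, and I use a central difference rather than the one-sided form for slightly better accuracy:

```python
import numpy as np

def square(x):
    return x ** 2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv(f, x, delta=0.001):
    # Central-difference approximation of f'(x); a bit more accurate
    # than the one-sided (f(x + delta) - f(x)) / delta
    return (f(x + delta) - f(x - delta)) / (2 * delta)

print(deriv(square, 2.0))    # ≈ 4.0, matching f'(2) = 2x = 4
print(deriv(sigmoid, 0.0))   # ≈ 0.25
```

The same `deriv` works for any one-argument function, which is exactly why the numerical approach beats symbolic differentiation for code.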

Chain rule:

Let f(x) = u(v(x)), which is a function of a function (a composite function).

The code snippet for the same is

In order to calculate f′(x) in this case, the general formula (the chain rule) is

f′(x) = u′(v(x)) · v′(x)

The code snippet to calculate the same and the result is

Similarly, it can be extended to a chain of length 3: if f(x) = u(v(w(x))), then f′(x) = u′(v(w(x))) · v′(w(x)) · w′(x).
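The code snippets above were images; the following is a self-contained sketch of how the chain rule might look in code, with a numerical `deriv` helper (all function names are my own):

```python
import numpy as np

def square(x):
    return x ** 2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv(f, x, delta=0.001):
    # central-difference approximation of f'(x)
    return (f(x + delta) - f(x - delta)) / (2 * delta)

def chain_deriv_2(u, v, x):
    # f(x) = u(v(x))  =>  f'(x) = u'(v(x)) * v'(x)
    return deriv(u, v(x)) * deriv(v, x)

def chain_deriv_3(u, v, w, x):
    # f(x) = u(v(w(x)))  =>  f'(x) = u'(v(w(x))) * v'(w(x)) * w'(x)
    return deriv(u, v(w(x))) * deriv(v, w(x)) * deriv(w, x)

# Sanity check: differentiate f(x) = sigmoid(square(x)) both ways
direct = deriv(lambda x: sigmoid(square(x)), 1.0)
chained = chain_deriv_2(sigmoid, square, 1.0)
print(direct, chained)  # the two values should agree closely
```

The sanity check compares the chain-rule product against a direct numerical derivative of the composed function, which is a cheap way to catch mistakes in hand-derived gradients.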

Multiple inputs to a function:

Let z = f(x, y) be a function of two inputs. The value of the partial derivative of z with respect to x at the point (x₀, y₀) is calculated as

∂z/∂x = lim(Δx→0) [f(x₀ + Δx, y₀) − f(x₀, y₀)] / Δx

that is, y is held fixed at y₀ while only x varies.

The code for the above is
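The original code was an image; here is a sketch under the assumption of a concrete two-input example f(x, y) = sigmoid(x + y), which is my choice for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def f(x, y):
    # an assumed example of a two-input function
    return sigmoid(x + y)

def partial_x(f, x0, y0, delta=0.001):
    # slope of f in the x-direction at (x0, y0); y is held fixed at y0
    return (f(x0 + delta, y0) - f(x0 - delta, y0)) / (2 * delta)

def partial_y(f, x0, y0, delta=0.001):
    # slope of f in the y-direction at (x0, y0); x is held fixed at x0
    return (f(x0, y0 + delta) - f(x0, y0 - delta)) / (2 * delta)

print(partial_x(f, 0.0, 0.0))  # ≈ 0.25, since sigmoid'(0) = 0.25
```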

Matrix calculation

Let X be a 1×n row vector and W an n×1 column vector, and let N = X × W. N has only one element; let γ = N₁₁, so that

γ = x₁₁w₁₁ + x₁₂w₂₁ + … + x₁ₙwₙ₁

Since ∂γ/∂x₁ⱼ = wⱼ₁ for each j, hence

∂N/∂X = Wᵀ (Equation 1)
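Equation 1 is easy to verify numerically. A sketch, assuming a 1×3 row vector X and a 3×1 column vector W (shapes and values chosen purely for illustration):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0]])        # 1x3 row vector
W = np.array([[0.5], [0.7], [0.9]])    # 3x1 column vector

def nu(X, W):
    # N = X x W is a 1x1 matrix; gamma is its single element
    return (X @ W)[0, 0]

# Perturb each element of X in turn to approximate d(gamma)/dX
delta = 1e-4
grad = np.zeros_like(X)
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[0, j] += delta
    grad[0, j] = (nu(Xp, W) - nu(X, W)) / delta

print(grad)  # matches W.T, i.e. [[0.5, 0.7, 0.9]]
print(W.T)
```

Because γ is linear in each x₁ⱼ, the finite-difference slope recovers wⱼ₁ essentially exactly.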

Another example: where X and W are multidimensional arrays, the calculation of sigmoid(X × W) is as follows. In this case, where the matrices X, W, and (X × W) are of different sizes, the calculation of the partial derivative of the matrix (X × W) with respect to X will be done with the lambda function Λ, which collapses a matrix into a single number by summing all of its elements.

S = σ(X × W), L = Λ(S) (Equation 2)

Hence,

L = Λ(σ(X × W)) (Equation 3)

Finally, the summary of the composite functions from X and W to the output is L = Λ(σ(X × W)). Now we have L, which returns a single value and can therefore be viewed as a function of X and W. We can now compute how the value of L would change upon changing an element of these input matrices (x₁₁, w₁₁, and so on), which is nothing but its derivative with respect to X. That is represented as

∂L/∂X, the derivative of L with respect to X

By the chain rule,

∂L/∂X = (∂L/∂S) × (∂S/∂N) × (∂N/∂X) (Equation 4)
1. Take the first element (the partial derivative of L with respect to S): from Equation 2, L is the sum of all the elements of S, so

∂L/∂S = a matrix of ones with the same shape as S (Equation 5)

2. Take the second element (the partial derivative of S with respect to N): since S = σ(N) applied element-wise, ∂S/∂N = σ′(N) = σ(N)(1 − σ(N)).

3. Take the third and last element (the partial derivative of N with respect to X): from Equation 1,

∂N/∂X = Wᵀ

Hence, putting the three pieces together,

∂L/∂X = σ′(X × W) × Wᵀ

The code for all the above is
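The full code was again lost with the images; the following is a sketch of the forward pass and the hand-derived backward pass, checked against a numerical perturbation (the function names and example shapes are my own):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_sum(X, W):
    # L = Lambda(sigmoid(X x W)): sum every element of sigmoid(X @ W)
    N = X @ W
    S = sigmoid(N)
    return S.sum()

def backward_sum_dX(X, W):
    # dL/dX = (dL/dS * dS/dN) @ dN/dX = sigmoid'(X @ W) @ W.T
    N = X @ W
    dLdS = np.ones_like(N)                # L is a plain sum (Equation 5)
    dSdN = sigmoid(N) * (1 - sigmoid(N))  # element-wise sigmoid'(N)
    return (dLdS * dSdN) @ W.T            # dN/dX = W.T (Equation 1)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])           # 2x3
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])                # 3x2
grad = backward_sum_dX(X, W)              # 2x3, same shape as X

# Numerical check: nudge x11 and watch how L moves
delta = 1e-4
Xp = X.copy()
Xp[0, 0] += delta
numeric = (forward_sum(Xp, W) - forward_sum(X, W)) / delta
print(grad[0, 0], numeric)  # the two values should agree closely
```

Note that the gradient has the same shape as X, so each entry tells us how L responds to a nudge in the corresponding input element, which is exactly what gradient descent needs.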

Since we have now implemented basic matrix calculus in Python, we can try to implement neural networks, which fundamentally rely on backpropagation and gradient descent, which in turn rely on this chain rule from calculus.


Written by Rahuram Thiagarajan

Bachelor of Engineering (CS), philomath, Backend Engineer at C1X
