Training the Shallow Neural Network

Training using the Levenberg-Marquardt Backpropagation Algorithm  


In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting [1]. The shallow neural network used here has a single hidden layer of 10 neurons. 

Here's a step-by-step explanation of how the Levenberg-Marquardt backpropagation algorithm works (a minimal sketch of the core update is given after the list):

  1. Initialization: Initialize the weights and biases of the neural network with small random values.
  2. Forward Propagation: Feed a training input through the network and compute the corresponding output using the current weights and biases. This involves propagating the input forward through each layer of the network, applying activation functions and computing the output of each neuron.
  3. Compute the error: Compare the computed output with the desired output for the given training example and calculate the error. This error is typically defined as the difference between the desired output and the actual output.
  4. Backpropagation: Perform backward propagation to calculate the gradients of the error function with respect to the weights and biases. This involves calculating the partial derivatives of the error with respect to each weight and bias by applying the chain rule.
  5. Gauss-Newton approximation: The key idea of the LM algorithm is to approximate the Hessian matrix (which represents the second derivatives of the error function) using the Jacobian matrix (which contains the first derivatives of the error function) combined with a damped multiple of the identity matrix. The resulting approximate Hessian matrix is then used to update the weights and biases.
  6. Compute the update: Calculate the weight and bias update values using the approximate Hessian matrix, the gradients computed in the backpropagation step, and a damping factor. The damping factor controls the balance between the Gauss-Newton approximation and the gradient descent update. If the Gauss-Newton approximation is not providing good results, the damping factor can be increased to rely more on gradient descent.
  7. Update the weights and biases: Adjust the weights and biases of the network using the update values computed in the previous step. This update is performed to minimize the error function and improve the network's performance.
  8. Repeat: Iterate steps 2 to 7 for the entire training dataset or a subset of it. Continue these iterations until the network's performance converges or a predetermined stopping criterion is met, such as reaching a maximum number of iterations or achieving a desired level of error [2].
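
Steps 4 to 7 amount to the damped update Δw = −(JᵀJ + μI)⁻¹Jᵀe, where J is the Jacobian of the residual vector e and μ is the damping factor. The MATLAB fragment below is only a minimal sketch of one such update for a generic weight vector w; computeJacobian is a hypothetical helper standing in for the toolbox's internal derivative calculations, not an actual built-in function.

```matlab
% Minimal sketch of one Levenberg-Marquardt update for a weight vector w.
% computeJacobian (hypothetical) returns the residuals e and their Jacobian J.
mu = 0.001;                                  % damping factor
[e, J] = computeJacobian(w);                 % residuals and Jacobian w.r.t. w
g  = J' * e;                                 % gradient of 0.5 * sum(e.^2)
H  = J' * J;                                 % Gauss-Newton approximation of the Hessian
dw = -(H + mu * eye(numel(w))) \ g;          % damped Gauss-Newton step
wNew = w + dw;                               % candidate weights
% If the error at wNew is lower, accept it and decrease mu (closer to Gauss-Newton);
% otherwise increase mu (closer to gradient descent) and retry the step.
```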

How the code works in the MATLAB script

Figure 1: Flowchart representing the training process


The data were split into training, validation and testing sets in a 70:20:10 ratio, meaning 70% of the observations were used to train the model, 20% to validate the trained model and 10% to test whether the model generalises as expected. Another important initialization was the response array, which in this case was the discharge capacitance, since we were interested in whether the model can predict the amount of capacitance the supercapacitor can discharge after various cycles of usage. 
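
A minimal sketch of how this setup might look in MATLAB's fitting-network workflow is shown below; the variable names features and dischargeCapacitance are placeholders rather than the actual script's identifiers, and the real code may differ in detail.

```matlab
% Shallow fitting network: one hidden layer of 10 neurons, trained with
% Levenberg-Marquardt backpropagation ('trainlm').
net = fitnet(10, 'trainlm');

% 70:20:10 split between training, validation and testing observations.
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.20;
net.divideParam.testRatio  = 0.10;

% Predictors and the response (discharge capacitance), one column per observation.
X = features';               % placeholder predictor matrix
T = dischargeCapacitance';   % placeholder response vector

[net, tr] = train(net, X, T);   % tr records the training, validation and test indices
```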

Figure 2: Training parameters

From the performance variables of the training process, it can be seen that training stopped after five consecutive validation checks failed to improve the validation error. This indicates that the validation performance curve had reached its minimum and would begin to rise (increasing the error) beyond that point, so training is stopped to avoid overfitting. 
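
The number of permitted validation failures is controlled by the max_fail training parameter; continuing the sketch above, stopping after 5 failed validation checks would look roughly like this (the epoch limit shown is an assumed value for illustration, not taken from the actual script):

```matlab
% Stop once the validation error has failed to improve for 5 consecutive
% epochs (the "validation checks" reported in Figure 2).
net.trainParam.max_fail = 5;

% Other common stopping criteria (assumed values for illustration only).
net.trainParam.epochs = 1000;   % maximum number of training epochs
net.trainParam.goal   = 0;      % target performance (mean squared error)
```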



Figure 3: Fitting coefficients

From the fitting coefficients shown above, we can see that the model performed promisingly: the data points of all three subsets (training, validation and testing) mapped closely onto the bisector line, and the fitting coefficients were very close to 1. 
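
The fitting coefficients in Figure 3 are the regression values R between the network outputs and the targets for each subset; they can be reproduced with the toolbox's regression and plotregression functions. A short sketch, reusing net, X and T from the snippet above:

```matlab
Y = net(X);                     % network predictions for all observations
[r, m, b] = regression(T, Y);   % correlation r, plus slope m and intercept b of the fit
plotregression(T, Y);           % outputs vs. targets scatter with the fitted line
fprintf('R = %.4f\n', r);       % values close to 1 indicate points lying on the bisector
```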

Figure 4: Training Performance


Comparing the forecasted and actual capacitances

There were five models trained for the shallow neural network, each based on the training dataset it was exposed to. The first model was trained using the features of supercapacitor 2 from batch 1, the second on supercapacitor 8, then supercapacitors 13, 19 and, lastly, 25. Each of these models was tested using both its own dataset and foreign datasets such as supercapacitors 3, 6, 9, 12 and 15.

Some of the results of the predicted and actual responses are illustrated below along with their error values. 
Figure 5: Forecasted and actual capacitance values 

Figure 6: Closeup graph of the actual and forecasted capacitance graphs


As seen in Figure 5, the input shown in green covers the first 300 cycles, and the model was instructed to predict the capacitance values for the next 300 cycles. Only 300 cycles were given because the model needs a sufficient amount of data to learn how the curve behaves, and because we deem the supercapacitor faulty once its capacitance reaches 0.92 F; the next 300 cycles (301–600) contain this failure point, so 300 cycles of input were used to forecast the following 300 cycles of data.
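
A minimal sketch of this windowing and of the 0.92 F failure check is given below; capacitance is a placeholder for one supercapacitor's measured capacitance series, and forecastCapacitance is a hypothetical helper standing in for however the trained network is actually queried in the script.

```matlab
% Split one supercapacitor's series into the observed window (cycles 1-300)
% and the horizon to be forecast (cycles 301-600).
inputWindow  = capacitance(1:300);    % given to the model as context
targetWindow = capacitance(301:600);  % actual values, kept for comparison

% Hypothetical call that returns the model's forecast for cycles 301-600.
forecastWindow = forecastCapacitance(net, inputWindow);

% The supercapacitor is deemed faulty once capacitance drops below 0.92 F.
failIdx = find(forecastWindow < 0.92, 1);
if ~isempty(failIdx)
    failureCycle = 300 + failIdx;     % cycle at which failure is predicted
    fprintf('Predicted failure at cycle %d\n', failureCycle);
end
```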

 Table 1: Error values for SC2 Trained and Tested model 

Model                   MSE          RMSE         MAE          MAPE (%)
Batch 4 Trained SC2
  Tested SC2            1.1509e-07   0.00033925   0.00027829   0.030337


The table shown above lists the values of four error metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).
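
All four metrics can be computed directly from the actual and forecasted capacitance vectors; a short sketch, with actual and forecast as placeholder column vectors over the forecast horizon:

```matlab
err  = actual - forecast;               % per-cycle prediction error
MSE  = mean(err.^2);                    % Mean Squared Error
RMSE = sqrt(MSE);                       % Root Mean Squared Error
MAE  = mean(abs(err));                  % Mean Absolute Error
MAPE = mean(abs(err ./ actual)) * 100;  % Mean Absolute Percentage Error, in percent
```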
All of these errors are the prediction results of a model that was trained and tested using the same dataset, namely that of supercapacitor 2. What happens if the trained model is subjected to new, foreign data that it is not accustomed to and has never seen? The next figures and error values portray this scenario.

Figure 7: Capacitance curve of the batch 4 SC2 trained and SC9 tested data

By comparing the capacitance curves in Figures 5 and 7, it can be observed that when the same model is exposed to a new set of test data, the forecasted capacitance values deviate slightly from the actual values, but not by much. This shows that the trained model is capable of learning, understanding and adapting to new inputs while still forecasting fairly accurate capacitance values with only slight errors.

 Table 2: Error values for SC2 Trained and SC9 Tested model 

Model                   MSE          RMSE         MAE          MAPE (%)
Batch 4 Trained SC2
  Tested SC9            5.6447e-06   0.0023759    0.0021936    0.23897


Comparing the error values in Table 1 and Table 2, it can be seen that the errors have slightly increased but are still quite close to 0. In a similar manner, the shallow neural network models were trained and tested following the scheme shown previously in Figure 12, and the results are recorded in Table 3 below.

Table 3: Error values for each trained model tested on the different supercapacitor datasets

Model                   MSE          RMSE         MAE          MAPE (%)
Batch 4 Trained SC2
  Tested SC3            0.00067988   0.026075     0.025736     2.7052
  Tested SC6            0.0009954    0.031551     0.03128      3.2687
  Tested SC9            5.6447e-06   0.0023759    0.0021936    0.23897
  Tested SC12           7.617e-05    0.0087276    0.0087139    0.9389
  Tested SC15           0.001078     0.031746     0.031678     3.6199
Batch 4 Trained SC8
  Tested SC3            2.2997e-05   0.0047955    0.0047376    0.49904
  Tested SC6            1.5668e-06   0.0012517    0.0009977    0.10389
  Tested SC9            0.0005236    0.022882     0.022397     2.433
  Tested SC12           0.00030993   0.017605     0.017118     1.842
  Tested SC15           0.0033483    0.057865     0.057666     6.5907
Batch 4 Trained SC13
  Tested SC3            0.00059812   0.024456     0.024369     2.563
  Tested SC6            0.00094045   0.030667     0.030565     3.1952
  Tested SC9            9.7097e-06   0.003116     0.0026363    0.28651
  Tested SC12           5.0173e-05   0.0070833    0.0067123    0.72452
  Tested SC15           0.001192     0.034526     0.034403     3.9319
Batch 4 Trained SC19
  Tested SC3            9.3283e-05   0.0096583    0.0096418    1.0149
  Tested SC6            0.00024118   0.01553      0.015513     1.6222
  Tested SC9            0.00011698   0.010816     0.009237     1.0053
  Tested SC12           3.8524e-05   0.0062067    0.0044713    0.47941
  Tested SC15           0.0021151    0.04599      0.045738     5.2284
Batch 4 Trained SC25
  Tested SC3            4.4502e-05   0.006671     0.0061862    0.64905
  Tested SC6            0.00014434   0.012014     0.01169      1.2207
  Tested SC9            0.0001961    0.014004     0.013693     1.4874
  Tested SC12           6.9188e-05   0.0083179    0.008012     0.86201
  Tested SC15           0.0022639    0.04758      0.047469     5.4246


From the errors obtained after testing the different trained models with different test sets, it was seen that the error values varied from one combination to another, but the RMSE values remained quite close to zero, so the forecasts were deemed reasonably accurate.

References

[1] J. S. Smith, B. Wu and B. M. Wilamowski, "Neural Network Training With Levenberg–Marquardt and Adaptable Weight Compression," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 2, pp. 580-587, Feb. 2019, doi: 10.1109/TNNLS.2018.2846775.

[2] G. Lera and M. Pinzolas, "Neighborhood based Levenberg-Marquardt algorithm for neural network training," IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1200-1203, Sept. 2002, doi: 10.1109/TNN.2002.1031951.









Edited by Shahil and Henal

S11172483@student.usp.ac.fj 

S11085370@student.usp.ac.fj





