Training the Shallow Neural Network
Training using the Levenberg-Marquardt Backpropagation Algorithm
In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least-squares problems. These minimization problems arise especially in least-squares curve fitting [1]. The shallow neural network used here has a single hidden layer of 10 neurons.
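In its standard formulation, each LM step blends a Gauss-Newton step with gradient descent through a damping factor. With $\mathbf{J}$ the Jacobian of the error vector $\mathbf{e}$ with respect to the weights and $\mu$ the damping factor, the weight update can be written as:

$$
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
$$

For small $\mu$ this approaches the Gauss-Newton method; for large $\mu$ it approaches gradient descent with a small step size.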
Here's a step-by-step explanation of how the Levenberg-Marquardt backpropagation algorithm works (a code sketch of the full loop follows the list):
1. Initialization: Initialize the weights and biases of the neural network with small random values.
2. Forward propagation: Feed a training input through the network and compute the corresponding output using the current weights and biases. This involves propagating the input forward through each layer of the network, applying activation functions and computing the output of each neuron.
3. Compute the error: Compare the computed output with the desired output for the given training example and calculate the error, typically defined as the difference between the desired output and the actual output.
4. Backpropagation: Perform backward propagation to calculate the gradients of the error function with respect to the weights and biases. This involves applying the chain rule to obtain the partial derivatives of the error with respect to each weight and bias.
5. Gauss-Newton approximation: The key idea of the LM algorithm is to approximate the Hessian matrix (which contains the second derivatives of the error function) using the Jacobian matrix (which contains the first derivatives of the errors) combined with a scaled identity matrix. The resulting approximate Hessian is then used to update the weights and biases.
6. Compute the update: Calculate the weight and bias updates using the approximate Hessian, the gradients computed in the backpropagation step, and a damping factor. The damping factor controls the balance between the Gauss-Newton update and the gradient-descent update: if the Gauss-Newton approximation is not giving good results, the damping factor is increased to rely more on gradient descent.
7. Update the weights and biases: Adjust the weights and biases of the network using the update values computed in the previous step, so as to reduce the error function and improve the network's performance.
8. Repeat: Iterate steps 2 to 7 over the entire training dataset or a subset of it. Continue until the network's performance converges or a predetermined stopping criterion is met, such as reaching a maximum number of iterations or achieving a desired level of error [2].
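To make these steps concrete, here is a minimal NumPy sketch of the whole loop for a single-hidden-layer tanh network with 10 hidden neurons. The toy dataset, initialization scale, damping schedule, and stopping thresholds are illustrative assumptions, not the exact configuration used for the capacitance models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy curve-fitting problem (illustrative stand-in for the capacitance data).
x = np.linspace(-1.0, 1.0, 50)
t = np.sin(2.5 * x)  # targets

H = 10          # hidden neurons, matching the network described above
P = 3 * H + 1   # total number of parameters

def unpack(theta):
    return theta[:H], theta[H:2*H], theta[2*H:3*H], theta[3*H]

def forward(theta, x):
    W1, b1, W2, b2 = unpack(theta)
    a = np.tanh(np.outer(x, W1) + b1)   # (N, H) hidden activations
    return a @ W2 + b2, a               # (N,) network outputs

def jacobian(theta, x):
    """Analytic Jacobian of the residuals e = t - y w.r.t. all parameters."""
    W1, b1, W2, b2 = unpack(theta)
    _, a = forward(theta, x)
    da = 1.0 - a**2                      # tanh'(z)
    J = np.empty((x.size, P))
    J[:, :H]      = -(da * W2) * x[:, None]  # de/dW1
    J[:, H:2*H]   = -(da * W2)               # de/db1
    J[:, 2*H:3*H] = -a                       # de/dW2
    J[:, 3*H]     = -1.0                     # de/db2
    return J

theta = rng.normal(scale=0.5, size=P)   # step 1: small random init
mu = 1e-3                               # damping factor

for epoch in range(200):
    y, _ = forward(theta, x)            # step 2: forward propagation
    e = t - y                           # step 3: error
    sse = float(e @ e)
    if sse < 1e-9:
        break
    J = jacobian(theta, x)              # step 4: derivatives via the chain rule
    g = J.T @ e                         # gradient of 0.5 * ||e||^2
    while mu < 1e10:                    # steps 5-7: damped Gauss-Newton update
        A = J.T @ J + mu * np.eye(P)    # approximate Hessian
        step = np.linalg.solve(A, -g)
        e_new = t - forward(theta + step, x)[0]
        if e_new @ e_new < sse:         # step accepted: trust Gauss-Newton more
            theta += step
            mu = max(mu / 10.0, 1e-12)
            break
        mu *= 10.0                      # step rejected: lean toward gradient descent
    else:
        break                           # damping exhausted; stop training

print(f"final sum-squared error: {sse:.3e}")
```

Note how the damping factor mu is decreased after an accepted step (trusting the Gauss-Newton model) and increased after a rejected one (falling back toward gradient descent), exactly as described in steps 5 to 7.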
Comparing the forecasted and actual capacitances

| Model | MSE | RMSE | MAE | MAPE (%) |
| --- | --- | --- | --- | --- |
| Batch 4 Trained SC2 | | | | |
| Tested SC2 | 1.1509e-07 | 0.00033925 | 0.00027829 | 0.030337 |
| Tested SC3 | 0.00067988 | 0.026075 | 0.025736 | 2.7052 |
| Tested SC6 | 0.0009954 | 0.031551 | 0.03128 | 3.2687 |
| Tested SC9 | 5.6447e-06 | 0.0023759 | 0.0021936 | 0.23897 |
| Tested SC12 | 7.617e-05 | 0.0087276 | 0.0087139 | 0.9389 |
| Tested SC15 | 0.001078 | 0.031746 | 0.031678 | 3.6199 |
| Batch 4 Trained SC8 | | | | |
| Tested SC3 | 2.2997e-05 | 0.0047955 | 0.0047376 | 0.49904 |
| Tested SC6 | 1.5668e-06 | 0.0012517 | 0.0009977 | 0.10389 |
| Tested SC9 | 0.0005236 | 0.022882 | 0.022397 | 2.433 |
| Tested SC12 | 0.00030993 | 0.017605 | 0.017118 | 1.842 |
| Tested SC15 | 0.0033483 | 0.057865 | 0.057666 | 6.5907 |
| Batch 4 Trained SC13 | | | | |
| Tested SC3 | 0.00059812 | 0.024456 | 0.024369 | 2.563 |
| Tested SC6 | 0.00094045 | 0.030667 | 0.030565 | 3.1952 |
| Tested SC9 | 9.7097e-06 | 0.003116 | 0.0026363 | 0.28651 |
| Tested SC12 | 5.0173e-05 | 0.0070833 | 0.0067123 | 0.72452 |
| Tested SC15 | 0.001192 | 0.034526 | 0.034403 | 3.9319 |
| Batch 4 Trained SC19 | | | | |
| Tested SC3 | 9.3283e-05 | 0.0096583 | 0.0096418 | 1.0149 |
| Tested SC6 | 0.00024118 | 0.01553 | 0.015513 | 1.6222 |
| Tested SC9 | 0.00011698 | 0.010816 | 0.009237 | 1.0053 |
| Tested SC12 | 3.8524e-05 | 0.0062067 | 0.0044713 | 0.47941 |
| Tested SC15 | 0.0021151 | 0.04599 | 0.045738 | 5.2284 |
| Batch 4 Trained SC25 | | | | |
| Tested SC3 | 4.4502e-05 | 0.006671 | 0.0061862 | 0.64905 |
| Tested SC6 | 0.00014434 | 0.012014 | 0.01169 | 1.2207 |
| Tested SC9 | 0.0001961 | 0.014004 | 0.013693 | 1.4874 |
| Tested SC12 | 6.9188e-05 | 0.0083179 | 0.008012 | 0.86201 |
| Tested SC15 | 0.0022639 | 0.04758 | 0.047469 | 5.4246 |
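For reference, the four metrics reported above follow their standard definitions, with MAPE expressed in percent. The sketch below computes them for a pair of actual/forecast capacitance arrays; the function and argument names are placeholders, not code from the original experiment.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Standard error metrics as reported in the table (MAPE in percent)."""
    actual = np.asarray(actual, dtype=float)
    err = actual - np.asarray(predicted, dtype=float)
    mse = np.mean(err ** 2)          # mean squared error
    rmse = np.sqrt(mse)              # root mean squared error
    mae = np.mean(np.abs(err))       # mean absolute error
    mape = 100.0 * np.mean(np.abs(err / actual))  # mean absolute % error
    return mse, rmse, mae, mape
```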
References
Edited by Shahil and Henal
S11172483@student.usp.ac.fj
S11085370@student.usp.ac.fj