Gradient Descent Algorithm & Backpropagation

This article focuses on clarifying the relationship between these concepts and the rationale behind how the algorithms are constructed. Note that some of the mathematical expressions may not be rigorously formal; the author expresses them in a way that is accessible to readers without a strong mathematical background.

Introduction

Recently, I've been working with the backpropagation method in the context of model training. This technique allows us to propagate errors backward through the network, adjusting the parameters of each layer to optimize the model. The algorithm used for this optimization is known as the Gradient Descent Algorithm.

First, let's refresh our understanding of the concept of gradients from calculus, as it is crucial for understanding the gradient descent algorithm.

Prerequisite Knowledge

Gradient

The gradient collects a function's partial derivatives and can be represented as a vector (or a matrix, for matrix-valued parameters); it points in the direction of the steepest increase of the function at a particular point.
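For a scalar-valued function of several variables, this can be written as the vector of partial derivatives (a standard calculus definition, stated here only for reference):

$$
\nabla f(x_1, \dots, x_n) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)
$$

Moving a small step along $-\nabla f$ decreases $f$ as quickly as possible, which is exactly the property gradient descent exploits.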

Loss

In essence, model training is about fine-tuning parameters to better fit the data. To quantify how well our model fits the data, we need a metric to evaluate the quality of our predictions, which we call Loss.

There are many ways to define Loss, with Mean Squared Error (MSE) being one of the most commonly used metrics.
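For reference, MSE over $N$ predictions can be written as follows, where $\hat{y}_i$ denotes the model's prediction and $y_i$ the corresponding target (this notation is introduced here for illustration):

$$
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
$$

For example, predictions $(1, 2)$ against targets $(0, 2)$ give $\mathrm{MSE} = \frac{(1-0)^2 + (2-2)^2}{2} = 0.5$.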

From Loss to Parameter Updates


Now that we have defined the goal of minimizing Loss, let’s delve into how it is calculated and used (a small code sketch after this list ties the steps together):

1. Model Output:

- The final Loss value is directly tied to the model’s output, which we’ll denote as y_n.

- y_n can be a matrix, typically used to store the numerical values representing the output of the n-th layer.

2. Defining the Loss Function:

- Using a standard Loss function, such as Mean Squared Error (MSE), we define a function J(y_n) that calculates the Loss from the model's outputs.

3. Calculating the Gradient:

- To optimize the model, we calculate the gradient of J with respect to its arguments, which in this case are the outputs y_n of the n-th layer.

- This gradient indicates how we should adjust these outputs to reduce the Loss.

4. Propagation of the Gradient:

- The calculated gradient is then distributed (or propagated) to each neuron in the n-th layer, guiding how each neuron's output should be modified.

5. Updating Weights and Biases:

- The difference between the actual output and the optimized output is used to update the weights and biases in the network. This step optimizes the parameters of the n-th layer.

- This process is repeated layer by layer across the entire network.
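To tie these steps together, below is a minimal NumPy sketch of a single linear layer trained with gradient descent on an MSE loss. It is an illustration under simplifying assumptions, not the author's code: the synthetic data, the names (`X`, `W`, `b`, `lr`), and the single-layer setup are invented for the example. In a multi-layer network, the same chain-rule step that produces `grad_W` here would also pass a gradient back to the previous layer's outputs, and the process would repeat layer by layer.

```python
import numpy as np

# Minimal sketch: one linear layer fitted with gradient descent on an MSE loss.
# The data is synthetic and the layer/shape choices are illustrative only.
rng = np.random.default_rng(0)

X = rng.normal(size=(100, 3))                     # inputs (100 samples, 3 features)
true_W = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_W + 0.1 * rng.normal(size=(100, 1))  # noisy targets

W = np.zeros((3, 1))                              # parameters to learn
b = 0.0
lr = 0.1                                          # learning rate (step size)

for _ in range(200):
    y_n = X @ W + b                               # model output (the y_n in the text)
    error = y_n - y
    loss = np.mean(error ** 2)                    # J(y_n) = MSE

    # Gradient of J with respect to the layer's output, then propagated
    # back to the parameters via the chain rule.
    grad_y = 2 * error / len(X)                   # dJ/dy_n
    grad_W = X.T @ grad_y                         # dJ/dW
    grad_b = grad_y.sum()                         # dJ/db

    # Gradient descent update: step against the gradient to reduce the Loss.
    W -= lr * grad_W
    b -= lr * grad_b

print("learned W:", W.ravel(), "final loss:", float(loss))
```

The learning rate `lr` controls the step size: too small and training converges slowly, too large and the updates can overshoot the minimum.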

License: CC BY 4.0