We provided the proof of convergence of the gradient descent algorithm for the smooth, nonconvex case in our article on Convergence of Gradient Descent for Smooth and Nonconvex Case.
In this article, we take a different flavor and derive the convergence bound when the function is smooth and convex.
A quick recap:
Gradient descent is a widely used optimization algorithm that seeks the minimum of a given function by iteratively updating its parameters in the direction of the negative gradient. In recent years, gradient descent has become particularly popular in the context of training deep neural networks. Many of these problems involve smooth objective functions, both convex and nonconvex, and each setting presents its own challenges for convergence analysis.
Recall that the gradient descent algorithm is given as:
Algorithm 1 - Gradient Descent (GD)
Input: An arbitrary initial vector $x_0 \in \mathbb{R}^d$ and a step size $\eta > 0$;
1. for $t = 0, 1, \dots, T-1$ do
2. Evaluate the gradient mapping at $x_t$, i.e., evaluate $\nabla f(x_t)$;
3. Do the following update: $x_{t+1} = x_t - \eta \, \nabla f(x_t)$;
4. end for
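The loop above can be sketched in a few lines of Python; the quadratic objective and the step size used here are illustrative assumptions for demonstration, not part of the article:

```python
import numpy as np

def gradient_descent(grad_f, x0, eta, T):
    """Run T iterations of GD: x_{t+1} = x_t - eta * grad_f(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        x = x - eta * grad_f(x)  # move against the gradient direction
    return x

# Illustrative objective: f(x) = 0.5 * ||x||^2, so grad f(x) = x and x* = 0.
x_T = gradient_descent(lambda x: x, x0=[4.0, -2.0], eta=0.1, T=100)
```

With `eta=0.1`, each coordinate shrinks by a factor of 0.9 per step, so `x_T` is very close to the minimizer at the origin after 100 iterations.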
Prerequisites for convergence:
L-smooth: A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $L$-smooth if its gradient is $L$-Lipschitz, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathbb{R}^d$.
Descent Lemma: Let $f$ be $L$-smooth. Then for all $x, y \in \mathbb{R}^d$,
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2.$$
Subgradient Inequality: A differentiable and convex function $f$ satisfies
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle \quad \text{for all } x, y \in \mathbb{R}^d.$$
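As a quick sanity check (an illustration of ours, not part of the original article), both inequalities can be verified numerically for the convex function $f(x) = x^2$, which is $L$-smooth with $L = 2$:

```python
import numpy as np

# Sanity check on f(x) = x^2, which is convex and L-smooth with L = 2.
L = 2.0
f = lambda x: x ** 2
grad = lambda x: 2.0 * x

rng = np.random.default_rng(0)
xs = rng.uniform(-5, 5, size=1000)
ys = rng.uniform(-5, 5, size=1000)

# Descent Lemma: f(y) <= f(x) + grad(x)(y - x) + (L/2)(y - x)^2
descent_ok = np.all(f(ys) <= f(xs) + grad(xs) * (ys - xs) + (L / 2) * (ys - xs) ** 2 + 1e-9)
# Subgradient inequality: f(y) >= f(x) + grad(x)(y - x)
convex_ok = np.all(f(ys) >= f(xs) + grad(xs) * (ys - xs) - 1e-9)
```

For this particular $f$, the Descent Lemma holds with equality, since $f(y) = f(x) + 2x(y - x) + (y - x)^2$ exactly.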
Theorem for smooth and convex case
Let $f$ be $L$-smooth and convex, and let $x^*$ be a minimizer of $f$. Run GD (Algorithm 1) with the constant step size $\eta = 1/L$. Then for every $T \ge 1$,
$$f(x_T) - f(x^*) \le \frac{L \|x_0 - x^*\|^2}{2T}.$$
Proof:
We now use the Lipschitz continuity of the gradient, via the Descent Lemma, applied with $x = x_t$ and $y = x_{t+1} = x_t - \frac{1}{L} \nabla f(x_t)$:
$$f(x_{t+1}) \le f(x_t) + \langle \nabla f(x_t), x_{t+1} - x_t \rangle + \frac{L}{2} \|x_{t+1} - x_t\|^2 = f(x_t) - \frac{1}{2L} \|\nabla f(x_t)\|^2, \qquad (1)$$
where, in $(1)$, we substituted $x_{t+1} - x_t = -\frac{1}{L} \nabla f(x_t)$ and simplified.
We will now use the subgradient inequality, $f(x_t) \le f(x^*) + \langle \nabla f(x_t), x_t - x^* \rangle$, in $(1)$:
$$f(x_{t+1}) - f(x^*) \le \langle \nabla f(x_t), x_t - x^* \rangle - \frac{1}{2L} \|\nabla f(x_t)\|^2 = \frac{L}{2} \left( \|x_t - x^*\|^2 - \|x_{t+1} - x^*\|^2 \right), \qquad (2)$$
where, in $(2)$, the last equality follows by expanding $\|x_{t+1} - x^*\|^2 = \big\| x_t - x^* - \frac{1}{L} \nabla f(x_t) \big\|^2$.
Now summing $(2)$ over $t = 0, 1, \dots, T-1$, the right-hand side telescopes:
$$\sum_{t=0}^{T-1} \left( f(x_{t+1}) - f(x^*) \right) \le \frac{L}{2} \left( \|x_0 - x^*\|^2 - \|x_T - x^*\|^2 \right) \le \frac{L}{2} \|x_0 - x^*\|^2.$$
Note that, by $(1)$, the function values are non-increasing, $f(x_{t+1}) \le f(x_t)$, so the last iterate is no worse than the average:
$$f(x_T) - f(x^*) \le \frac{1}{T} \sum_{t=0}^{T-1} \left( f(x_{t+1}) - f(x^*) \right).$$
Hence, we have:
$$f(x_T) - f(x^*) \le \frac{L \|x_0 - x^*\|^2}{2T}.$$
This concludes the proof of convergence of the gradient descent algorithm when a function is L-smooth and convex.
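To see the $O(1/T)$ rate concretely, here is a small numerical check (an assumed quadratic example of ours, not from the article) that the suboptimality gap stays below $L \|x_0 - x^*\|^2 / (2T)$ at every iteration:

```python
import numpy as np

# f(x) = 0.5 x^T A x is convex and L-smooth with L = lambda_max(A); x* = 0, f(x*) = 0.
A = np.diag([1.0, 4.0])
L = 4.0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0 = np.array([3.0, -1.0])
x, eta = x0.copy(), 1.0 / L                 # constant step size 1/L, as in the theorem
gaps_within_bound = True
for T in range(1, 201):
    x = x - eta * grad(x)
    bound = L * np.dot(x0, x0) / (2 * T)    # L ||x_0 - x*||^2 / (2T), with x* = 0
    gaps_within_bound &= (f(x) <= bound + 1e-12)
```

In this run the bound is loose, which is expected: the theorem is a worst-case guarantee over all $L$-smooth convex functions, while a well-conditioned quadratic converges much faster.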