We proved the convergence of the gradient descent algorithm for the smooth and nonconvex case in our article Convergence of Gradient Descent for Smooth and Nonconvex Case, and for the smooth and convex case in Convergence of Gradient Descent for Smooth and Convex Case. This article continues the series with the convergence of gradient descent when the function is smooth and strongly convex.
A quick recap:
Gradient descent is a widely used optimization algorithm that aims to find the minimum of a given function by iteratively updating its parameters in the direction of the negative gradient. In recent years, gradient descent has become particularly popular in the context of training deep neural networks for machine learning tasks. However, many of these problems are characterized by smooth functions that may be convex, strongly convex, or nonconvex, and each case presents its own challenges for convergence analysis.
Algorithm 1 - Gradient Descent (GD)
Input: An arbitrary initial vector $x_0 \in \mathbb{R}^d$ and a (possibly decaying) step size $\eta_k > 0$;
1. for $k = 0, 1, 2, \ldots$ do
2. Evaluate the gradient mapping at $x_k$, i.e., evaluate $\nabla f(x_k)$;
3. Do the following update: $x_{k+1} = x_k - \eta_k \nabla f(x_k)$;
4. end for
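The loop above can be sketched in Python with NumPy; the quadratic objective, the constant step size, and the function names below are illustrative assumptions, not part of the original algorithm statement:

```python
import numpy as np

def gradient_descent(grad, x0, eta, num_iters):
    """Run the update x_{k+1} = x_k - eta * grad(x_k) for num_iters steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - eta * grad(x)
    return x

# Illustrative example: f(x) = 0.5 * ||x||^2, so grad f(x) = x and x* = 0.
x_final = gradient_descent(lambda x: x, x0=[4.0, -2.0], eta=0.1, num_iters=100)
```

With this choice each coordinate shrinks by a factor $(1 - \eta)$ per step, so the iterates approach the minimizer at the origin.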
Prerequisites for convergence analysis
Before jumping into the convergence analysis of the gradient descent algorithm for the smooth and strongly convex case, we first state some basic inequalities that will be used:
Lipschitz smoothness: By definition, a differentiable function $f: \mathbb{R}^d \to \mathbb{R}$ is $L$-smooth if its gradient is $L$-Lipschitz, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$. This implies the quadratic upper bound
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2.$$
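As a quick numerical sanity check of the quadratic upper bound (a sketch, assuming the illustrative quadratic $f(x) = \frac{L}{2}\|x\|^2$, whose gradient is $L$-Lipschitz):

```python
import numpy as np

# For f(x) = 0.5 * L * ||x||^2 the gradient L*x is L-Lipschitz, and the
# smoothness upper bound holds with equality; slack should be ~0, never negative.
L = 2.0
f = lambda x: 0.5 * L * x @ x
grad = lambda x: L * x

rng = np.random.default_rng(1)
xs, ys = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
slack = min(
    f(x) + grad(x) @ (y - x) + 0.5 * L * (y - x) @ (y - x) - f(y)
    for x, y in zip(xs, ys)
)
```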
Strong Convexity: Consider a continuously differentiable function $f$. It is $\mu$-strongly convex if for all $x, y$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|y - x\|^2.$$
By minimizing both sides of the above inequality over $y$ (the right-hand side is minimized at $y = x - \frac{1}{\mu} \nabla f(x)$, and the left-hand side is at least $f(x^*)$), we can say that for a strongly convex function with minimizer $x^*$, the following also holds:
$$\|\nabla f(x)\|^2 \ge 2\mu \left( f(x) - f(x^*) \right).$$
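This inequality can be checked numerically; the sketch below assumes the illustrative strongly convex quadratic $f(x) = \frac{1}{2} x^\top A x$ with $A = \mathrm{diag}(\mu, L)$ and $\mu = 1$, $L = 10$:

```python
import numpy as np

# Check ||grad f(x)||^2 >= 2*mu*(f(x) - f(x*)) on a strongly convex quadratic.
mu, L = 1.0, 10.0
A = np.diag([mu, L])
f = lambda x: 0.5 * x @ A @ x      # minimum value f(x*) = 0 at x* = 0
grad = lambda x: A @ x

rng = np.random.default_rng(0)
slack = min(
    grad(x) @ grad(x) - 2.0 * mu * f(x)   # should never be negative
    for x in rng.normal(size=(1000, 2))
)
```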
Theorem for the smooth and strongly convex case:
Let $f: \mathbb{R}^d \to \mathbb{R}$ be $L$-smooth and $\mu$-strongly convex with minimizer $x^*$, and let gradient descent be run with the constant step size $\eta = \frac{1}{L}$. Then for every $k \ge 0$,
$$f(x_k) - f(x^*) \le \left(1 - \frac{\mu}{L}\right)^k \left( f(x_0) - f(x^*) \right).$$
Proof:
Applying the smoothness upper bound with $y = x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k)$ gives
$$f(x_{k+1}) \le f(x_k) - \frac{1}{L}\|\nabla f(x_k)\|^2 + \frac{1}{2L}\|\nabla f(x_k)\|^2 = f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2.$$
Where, in the last step, we subtract $f(x^*)$ from both sides and apply the strong convexity inequality $\|\nabla f(x_k)\|^2 \ge 2\mu \left( f(x_k) - f(x^*) \right)$ to obtain
$$f(x_{k+1}) - f(x^*) \le \left(1 - \frac{\mu}{L}\right) \left( f(x_k) - f(x^*) \right).$$
By recursion, we obtain:
$$f(x_k) - f(x^*) \le \left(1 - \frac{\mu}{L}\right)^k \left( f(x_0) - f(x^*) \right).$$
This concludes the proof of convergence of gradient descent when the function is smooth and strongly convex.
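The linear rate established above can be observed empirically; the sketch below assumes the illustrative quadratic $f(x) = \frac{1}{2} x^\top A x$ with $A = \mathrm{diag}(\mu, L)$, $\mu = 1$, $L = 10$, and the step size $\eta = 1/L$ from the theorem:

```python
import numpy as np

# Empirically check f(x_k) - f(x*) <= (1 - mu/L)^k (f(x_0) - f(x*))
# on a strongly convex quadratic (mu = 1, L = 10 are assumptions).
mu, L = 1.0, 10.0
A = np.diag([mu, L])
f = lambda x: 0.5 * x @ A @ x      # minimizer x* = 0, f(x*) = 0
grad = lambda x: A @ x

eta = 1.0 / L                      # constant step size used in the theorem
x = np.array([3.0, 3.0])
gap0 = f(x)                        # initial optimality gap f(x_0) - f(x*)
violations = 0
for k in range(1, 51):
    x = x - eta * grad(x)
    if f(x) > (1.0 - mu / L) ** k * gap0 + 1e-12:
        violations += 1
```

After 50 iterations the optimality gap is far below its theoretical bound, and the bound is never violated along the trajectory.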