AI Glossary

Stochastic Gradient Descent (SGD)

A variant of Gradient Descent that updates a model's parameters at each step using a single example or a small random subset (a mini-batch) of the training data, rather than the full dataset.

Overview

As datasets grow larger, training a machine learning model can become increasingly computationally expensive.

One solution is to avoid processing the entire dataset at every learning step.

This idea forms the foundation of Stochastic Gradient Descent, often abbreviated as SGD.

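SGD is a variation of Gradient Descent that estimates the gradient from a small, randomly chosen portion of the data (a single example or a mini-batch) rather than from the entire training dataset, and then updates the model parameters using that estimate. Because each update is much cheaper to compute, many more updates fit into the same amount of computation.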
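To make the definition concrete, here is a minimal sketch of mini-batch SGD fitting a straight line with mean squared error. The function name sgd_linear_fit, the toy data, and the learning rate, batch size, and epoch count are all assumptions chosen for illustration; they are not prescribed by this glossary.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on mean squared error (illustrative)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, estimated on this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the estimated gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical toy data generated from y = 3x + 1 with no noise.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

On this toy data the returned w and b should land close to 3 and 1. Each inner-loop step looks at only four examples, which is the defining difference from full-batch Gradient Descent.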

A helpful way to think about SGD is learning from small practice sessions rather than reviewing an entire textbook before making adjustments.

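Each update is noisier than a full-dataset step because it is based on a sample, but the updates point in a useful direction on average. By making many of these cheap updates, the model can often reach good performance with far less computation.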
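A little arithmetic shows how much more frequent the updates become. The sketch below counts parameter updates in one pass over the data; the helper updates_per_epoch, the dataset size, and the batch size are hypothetical values picked only to illustrate the point.

```python
import math

def updates_per_epoch(num_examples, batch_size):
    """Number of parameter updates in one full pass over the data."""
    return math.ceil(num_examples / batch_size)

# Hypothetical dataset of one million examples.
full_batch = updates_per_epoch(1_000_000, 1_000_000)  # 1 update per pass
mini_batch = updates_per_epoch(1_000_000, 256)        # 3907 updates per pass
```

Under these assumed numbers, a single pass yields one update for full-batch Gradient Descent and 3907 updates for SGD with batches of 256.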

Because of its efficiency and scalability, SGD has become one of the most widely used optimization methods in machine learning and deep learning.

Why It Matters

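Because the cost of each update depends on the batch size rather than on the size of the dataset, SGD lets large machine learning models train efficiently on massive datasets, including datasets too large to process in a single step.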

Real-World Example

A social media platform training recommendation models may use SGD to efficiently process millions of user interactions.

Related Concepts

Gradient Descent
Backpropagation
Batch Size
Epoch
Neural Network