# Does SGD in TensorFlow make a move with each data point?

## Issue

I assumed the "stochastic" in Stochastic Gradient Descent came from the random selection of samples within each batch. But the articles I have read on the topic seem to indicate that SGD makes a small move (weight change) with every data point. How does TensorFlow implement it?

## Solution

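Yes, the "stochastic" part comes from random sampling, but that is separate from how often the weights move. TensorFlow applies one weight update per batch, so it only moves on every data point if your batch size is 1.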

The SGD optimizer itself doesn’t do the sampling. *You* do the sampling by splitting the data into batches and, ideally, reshuffling it at the start of each epoch.
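To make "you do the sampling" concrete, here is a minimal framework-free sketch in plain Python. The helper name `shuffled_batches` is made up for illustration and is not a TensorFlow function. In TensorFlow the same job is normally done by the input pipeline (for example `tf.data.Dataset.shuffle` followed by `.batch`, or the `batch_size` and `shuffle` arguments of Keras `Model.fit`), not by the optimizer.

```python
import random


def shuffled_batches(data, batch_size, rng):
    """Yield mini-batches from a freshly shuffled ordering of `data`.

    Hypothetical helper for illustration only.
    """
    indices = list(range(len(data)))
    rng.shuffle(indices)  # a new random order on every call, i.e. every epoch
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


data = list(range(10))
rng = random.Random(0)  # fixed seed so the run is repeatable
for epoch in range(2):
    batches = list(shuffled_batches(data, batch_size=4, rng=rng))
    print(f"epoch {epoch}: {batches}")
```

Each epoch sees every sample exactly once, but grouped into different random batches. That grouping is the only source of randomness; the update rule itself is deterministic.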

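GD (full-batch gradient descent) means you compute the gradients for every weight after forward-propagating the entire dataset (batch size = cardinality of the dataset, steps per epoch = 1). If your batch size is smaller than the cardinality of the dataset, then *you* are the one doing the sampling, and you are running SGD, not GD. The articles that describe a move on every data point are describing the special case of batch size = 1.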
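The relationship between batch size and the number of weight updates per epoch is simple arithmetic. A small sketch, assuming a dataset of 1000 samples (an arbitrary number chosen for illustration) and that a final partial batch still counts as one step:

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


num_samples = 1000  # assumed dataset size, for illustration only
for batch_size in (1000, 32, 1):
    print(f"batch_size={batch_size}: {steps_per_epoch(num_samples, batch_size)} update(s) per epoch")
```

A batch size of 1000 gives a single update per epoch (GD), a batch size of 32 gives 32 updates, and a batch size of 1 gives 1000 updates, one per data point.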

The implementation is pretty simple, something like:

1. Forward-propagate one batch (one step).
2. Compute the gradients of the loss with respect to the weights.
3. Update the weights with those gradients.
4. Go back to step 1 with the next batch.
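The loop above can be sketched without any framework. This is not TensorFlow's actual code, only a plain-Python illustration on a toy problem (fitting `y = w * x + b` with mean squared error). The function name `sgd_fit`, the data, and the hyperparameters are all assumptions made for the example.

```python
import random


def sgd_fit(xs, ys, batch_size, epochs=200, lr=0.1, seed=0):
    """Fit y = w * x + b by mini-batch SGD on mean squared error.

    Returns the fitted w, b and the total number of weight updates.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # the sampling happens here, not in the update rule
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Step 1: forward pass on this batch only (prediction minus target).
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Step 2: gradients of the batch-mean squared error.
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # Step 3: exactly one weight update per batch.
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1
            # Step 4: the inner loop continues with the next batch.
    return w, b, updates


# Toy data generated from w = 2, b = 1 with no noise.
xs = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
ys = [2.0 * x + 1.0 for x in xs]

for batch_size in (8, 4, 1):
    w, b, updates = sgd_fit(xs, ys, batch_size)
    print(f"batch_size={batch_size}: w={w:.4f}, b={b:.4f}, updates={updates}")
```

With 8 samples and 200 epochs, a batch size of 8 (plain GD) performs 200 updates, a batch size of 4 performs 400, and a batch size of 1 performs 1600, one per data point. The update code is identical in all three cases; only the batching changes.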

