# What is num_units in tensorflow BasicLSTMCell?

## Issue

In MNIST LSTM examples, I don’t understand what “hidden layer” means. Is it the imaginary layer formed when you represent an unrolled RNN over time?

Why is `num_units = 128` in most cases?

## Solution

The number of hidden units is a direct representation of the learning capacity of a neural network: it reflects the number of *learned parameters*. The value `128` was likely selected arbitrarily or empirically. You can change that value experimentally and rerun the program to see how it affects the training accuracy (you can get better than 90% test accuracy with *a lot* fewer hidden units). Using more units makes it more likely that the network will perfectly memorize the complete training set (although training will take longer, and you run the risk of over-fitting).
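To make the link between hidden units and learned parameters concrete, here is a minimal sketch that counts the trainable parameters of one standard LSTM cell (four gates, no peepholes or projection). The function name `lstm_param_count` is made up for illustration, and the input size of 28 assumes the MNIST example feeds one 28-pixel image row per time step.

```python
def lstm_param_count(input_dim: int, num_units: int) -> int:
    """Trainable parameters in one standard LSTM cell (no peepholes, no projection)."""
    # Four gates (input, candidate, forget, output), each with a weight matrix
    # applied to the concatenated [input, previous hidden state] vector.
    kernel = (input_dim + num_units) * 4 * num_units
    # One bias per hidden unit per gate.
    bias = 4 * num_units
    return kernel + bias


# Assumption: each time step sees one 28-pixel row of an MNIST image.
for n in (16, 32, 64, 128, 256):
    print(n, lstm_param_count(28, n))
```

With these assumptions, `num_units = 128` gives 80,384 parameters, and the count grows roughly quadratically as `num_units` increases.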

The key thing to understand, which is somewhat subtle in Colah’s famous blog post (find *“each line carries an entire vector”*), is that **X is an array of data** (nowadays often called a *tensor*); it is not meant to be a *scalar* value. Where, for example, the `tanh` function is shown, it is meant to imply that the function is *broadcast* across the entire array (an implicit `for` loop), and not simply performed once per time step.

As such, the *hidden units* represent tangible storage within the network, which is manifest primarily in the size of the *weights* array. And because an LSTM actually does have a bit of its own internal storage separate from the learned model parameters, it has to know how many units there are, which ultimately needs to agree with the size of the weights. In the simplest case, an RNN has no internal storage, so it doesn’t even need to know in advance how many “hidden units” it is being applied to.
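The sketch below writes out the loops that the vector notation hides, for one time step of a standard LSTM cell in plain Python. It is an illustration under stated assumptions, not TensorFlow's implementation: the names `lstm_step`, `W`, and `b` are hypothetical, the gate ordering inside `W` is an arbitrary choice, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step with every loop written out explicitly.

    x: input_dim floats; h_prev, c_prev: num_units floats each.
    W: (input_dim + num_units) rows by 4 * num_units columns.
    b: 4 * num_units floats.
    """
    num_units = len(h_prev)
    xh = x + h_prev  # concatenated [input, previous hidden state]
    # Pre-activations for all four gates: z = xh @ W + b
    z = [b[j] + sum(xh[i] * W[i][j] for i in range(len(xh)))
         for j in range(4 * num_units)]
    h, c = [], []
    # This is the "implicit for loop": tanh and sigmoid run once per hidden unit.
    for k in range(num_units):
        i_gate = sigmoid(z[k])                      # input gate
        g = math.tanh(z[num_units + k])             # candidate value
        f_gate = sigmoid(z[2 * num_units + k])      # forget gate
        o_gate = sigmoid(z[3 * num_units + k])      # output gate
        c_k = f_gate * c_prev[k] + i_gate * g       # new cell state for unit k
        c.append(c_k)
        h.append(o_gate * math.tanh(c_k))           # new hidden state for unit k
    return h, c


random.seed(0)
input_dim, num_units = 28, 8  # illustrative sizes
W = [[random.uniform(-0.1, 0.1) for _ in range(4 * num_units)]
     for _ in range(input_dim + num_units)]
b = [0.0] * (4 * num_units)

# The cell's internal storage: two vectors of length num_units.
h = [0.0] * num_units
c = [0.0] * num_units
for t in range(5):
    x = [random.random() for _ in range(input_dim)]
    h, c = lstm_step(x, h, c, W, b)

print(len(h), len(c))  # both equal num_units
```

Note that `num_units` fixes both the size of the state vectors `h` and `c` and the shape of `W`, which is why the cell needs the value up front and why it has to agree with the size of the weights.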

- A good answer to a similar question here.
- You can look at the source for BasicLSTMCell in TensorFlow to see exactly how this is used.

*Side note: This notation is very common in statistics and machine-learning, and other fields that process large batches of data with a common formula (3D graphics is another example). It takes a bit of getting used to for people who expect to see their for loops written out explicitly.*

Answered By – Brent Bradburn

**This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.**