## Stats: Mean, Variance, and Standard Deviation

These basic calculations are the building blocks of advanced machine learning methods.

Suppose I have three dogs whose ages are one, three, and five years.

Mean

What is the value of \(\mu_{age}\)?

$$\mu_{age} = \frac{1+3+5}{3}=3$$

Yep. That’s simple.
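If you want to check the arithmetic in code, here is a minimal Python sketch. The variable names (`ages`, `manual_mean`, `library_mean`) are my own picks for illustration; `statistics.mean` comes from Python's standard library.

```python
from statistics import mean

ages = [1, 3, 5]  # the three dog ages from the example

# Mean by hand: add up the values and divide by how many there are.
manual_mean = sum(ages) / len(ages)

# The standard library computes the same thing.
library_mean = mean(ages)

print(manual_mean, library_mean)
```

Both values come out equal to 3, matching the calculation above.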

Variance

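Variance measures how far the data points spread from the mean. It is the average of the squared differences between each data point \(x_i\) and the mean, where \(n\) is the number of data points.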

$$\sigma^2_{age}=\frac{\sum_{i=1}^{n}(x_{i}-\mu_{age})^2}{n}$$

So, in our case, variance is calculated as follows:

$$\sigma^2_{age}=\frac{(1-3)^2+(3-3)^2+(5-3)^2}{3}=\frac{8}{3}\approx 2.67$$
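The same population variance can be sketched in Python, step by step. The names `mu`, `squared_deviations`, and `manual_variance` are assumptions made for this example; `statistics.pvariance` is the standard library function for population variance (it divides by \(n\)).

```python
from statistics import pvariance

ages = [1, 3, 5]  # the three dog ages from the example

# Step 1: the population mean.
mu = sum(ages) / len(ages)

# Step 2: squared deviation of each age from the mean.
squared_deviations = [(x - mu) ** 2 for x in ages]

# Step 3: average the squared deviations (divide by n).
manual_variance = sum(squared_deviations) / len(ages)

# Cross-check with the standard library's population variance.
library_variance = pvariance(ages)

print(round(manual_variance, 2), round(library_variance, 2))
```

Both results round to 2.67, which is \(8/3\).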

Standard Deviation

Standard deviation is the square root of the variance. Taking the root brings the measure back to the same unit as the data (years, in our case).

$$\sigma_{age}=\sqrt{\frac{\sum_{i=1}^n(x_{i}-\mu_{age})^2}{n}}$$

Therefore, \(\sigma_{age}\) is calculated as follows:

$$\sigma_{age}=\sqrt{\frac{(1-3)^2+(3-3)^2+(5-3)^2}{3}}=\sqrt{\frac{8}{3}}\approx 1.63$$
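Here is a short Python sketch of the population standard deviation, again with illustrative variable names of my own choosing. `statistics.pstdev` is the standard library's population version (it divides by \(n\) before taking the root).

```python
import math
from statistics import pstdev

ages = [1, 3, 5]  # the three dog ages from the example

# Population variance: mean of the squared deviations from the mean.
mu = sum(ages) / len(ages)
population_variance = sum((x - mu) ** 2 for x in ages) / len(ages)

# Standard deviation is the square root of the variance.
manual_sd = math.sqrt(population_variance)

# Cross-check with the standard library's population standard deviation.
library_sd = pstdev(ages)

print(round(manual_sd, 2), round(library_sd, 2))
```

Both results round to 1.63.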

Population Or Sample

If I were to say, “I have five dogs and picked these three to represent all the dogs I have,” the three ages would be a sample rather than the whole population, and the calculation would be a bit different.

Mean

Although the calculation is the same, the sample mean is written \(\bar{x}\) instead of \(\mu\).

Variance

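The denominator is \((n-1)\) instead of \(n\). This adjustment, known as Bessel's correction, compensates for the fact that deviations measured from the sample mean \(\bar{x}\) tend to underestimate the spread around the true population mean. By convention, the sample variance is written \(s^2\) rather than \(\sigma^2\).

$$s^2_{age}=\frac{\sum_{i=1}^n(x_{i}-\bar{x})^2}{n-1}$$

$$s^2_{age}=\frac{(1-3)^2+(3-3)^2+(5-3)^2}{2}=4$$

Standard Deviation

Yep, SD also changes, because it is the square root of the sample variance.

$$s_{age}=\sqrt{\frac{\sum_{i=1}^n(x_{i}-\bar{x})^2}{n-1}}$$

$$s_{age}=\sqrt{\frac{(1-3)^2+(3-3)^2+(5-3)^2}{2}}=2$$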
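To see the sample versions next to the population versions, here is a minimal Python sketch. The variable names are assumptions for illustration. In the standard library, `statistics.variance` and `statistics.stdev` divide by \(n-1\), while `statistics.pvariance` and `statistics.pstdev` divide by \(n\).

```python
import math
from statistics import pvariance, variance, stdev

ages = [1, 3, 5]  # treated here as a sample of a larger group of dogs

# Sample mean (same arithmetic as the population mean).
x_bar = sum(ages) / len(ages)

# Sample variance: divide the sum of squared deviations by n - 1.
sum_of_squares = sum((x - x_bar) ** 2 for x in ages)
manual_sample_variance = sum_of_squares / (len(ages) - 1)

# Sample standard deviation: square root of the sample variance.
manual_sample_sd = math.sqrt(manual_sample_variance)

# Cross-check with the standard library's sample functions.
library_sample_variance = variance(ages)
library_sample_sd = stdev(ages)

print(manual_sample_variance, manual_sample_sd)

# The sample variance is larger than the population variance
# because the denominator is smaller.
print(library_sample_variance > pvariance(ages))
```

The sample variance is 4 and the sample standard deviation is 2, both larger than the population values computed from the same three numbers.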