How to Calculate Variance in Python with numpy and 2 Other Ways

In this short tutorial, I will show you how to calculate variance in Python in three ways: with the numpy module, with the statistics module and with built-in functions.

Doing statistics is fun for me, so I hope you will find this exercise enjoyable as well. Before jumping into the calculation, let’s refresh our memory with the definition of variance.

What is Variance?

Variance in statistics is a measurement of how far the numbers in a data set are spread out from their mean. It is calculated as the average of the squared differences from the mean. Here is an example to explain it.

Curry led the NBA in free-throw percentage with 90.31%. His free-throw shooting is very consistent, so I expect the variance of his results to be small, with most of them close to the mean. As for me, some days I make 10 free throws in a row. Other days, I make none. I am very inconsistent, so my variance is very large.

In other words, even without watching Curry shoot, you can tell how consistent he is at the free-throw line just by looking at the variance.
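
To make the idea concrete, here is a minimal sketch comparing two shooters. The numbers are made up for illustration (free throws made out of 10 attempts on five different days) and are not real statistics for any player. It uses pvariance from Python's standard statistics module.

```python
from statistics import pvariance

# Hypothetical data: free throws made out of 10 attempts on five days
consistent_shooter = [9, 9, 10, 9, 9]
streaky_shooter = [10, 0, 7, 2, 10]

# Population variance of each shooter's daily results
consistent_var = pvariance(consistent_shooter)
streaky_var = pvariance(streaky_shooter)

print(consistent_var)
print(streaky_var)

# The consistent shooter has the smaller variance
print(consistent_var < streaky_var)
```

The streaky shooter's results sit far from their mean, so the squared differences (and therefore the variance) are much larger.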

Calculate Variance of a List in Python

Read the examples below to see how to find variance in Python in different ways.

Using numpy module

The first and most popular way to calculate variance is to use numpy, the fundamental package for scientific computing with Python. Calculating variance takes a single call to np.var(). Note that np.var() computes the population variance (dividing by n) by default. Look at the code block below.

# using numpy
import numpy as np

number_list = [-4.803, -1.247, -1.579, -11.603, -0.339,
          2.587, 1.396, -1.692, -4.519, -0.247]

print(np.var(number_list))

# if you prefer the sample variance (dividing by n-1), use the ddof parameter
print(np.var(number_list, ddof=1))  # ddof default value is 0
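
As a quick check of what ddof does, the sketch below computes both versions on the same list and confirms that they differ only by the factor n / (n - 1). The variable names population_var and sample_var are just for illustration.

```python
import numpy as np

number_list = [-4.803, -1.247, -1.579, -11.603, -0.339,
               2.587, 1.396, -1.692, -4.519, -0.247]

n = len(number_list)

population_var = np.var(number_list)       # ddof=0: divide by n
sample_var = np.var(number_list, ddof=1)   # ddof=1: divide by n - 1

# The sample variance equals the population variance scaled by n / (n - 1)
print(np.isclose(sample_var, population_var * n / (n - 1)))
```

For a small list like this one the two values differ noticeably. The gap shrinks as the list gets longer.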

Using built-in functions

Sometimes, you cannot install any external Python module or package to calculate variance, so you must rely on built-in functions. In the example below, you will learn how to apply the variance formula directly in Python.

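# reuse number_list from the numpy example above

# calculate the mean
mean_of_number_list = sum(number_list) / len(number_list)

# calculate the population variance (divide by n; use len(number_list) - 1 for the sample variance)
variance_of_number_list = sum((xi - mean_of_number_list) ** 2 for xi in number_list) / len(number_list)

print(variance_of_number_list)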
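
If you need this calculation in more than one place, it can be wrapped in a small function. The sketch below is one possible way to do it. The name list_variance and its ddof parameter are my own choices for illustration (ddof mirrors the numpy parameter of the same name) and are not part of any library.

```python
def list_variance(data, ddof=0):
    """Return the variance of data using only built-in functions.

    ddof=0 gives the population variance (divide by n),
    ddof=1 gives the sample variance (divide by n - 1).
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)


print(list_variance([1, 2, 3, 4]))          # population variance
print(list_variance([1, 2, 3, 4], ddof=1))  # sample variance
```

The explicit check avoids a ZeroDivisionError on an empty list, or on a single-element list when ddof=1.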

Using statistics module

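If you are using Python 3.4 or later, you can use the variance function from the statistics module, which is part of the Python standard library. Note that variance computes the sample variance (dividing by n - 1), while pvariance computes the population variance (dividing by n).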

# variance() returns the sample variance (divides by n - 1)
from statistics import variance

print(variance(number_list))

# to calculate the population variance (divides by n), use pvariance()
from statistics import pvariance

print(pvariance(number_list))
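
To see how the statistics functions line up with the manual formula, here is a small sanity check using only the standard library. The names manual_population and manual_sample are illustrative. Floating-point results are compared with math.isclose rather than ==, since the two approaches may round slightly differently.

```python
import math
from statistics import pvariance, variance

number_list = [-4.803, -1.247, -1.579, -11.603, -0.339,
               2.587, 1.396, -1.692, -4.519, -0.247]

n = len(number_list)
mean = sum(number_list) / n
squared_diffs = sum((x - mean) ** 2 for x in number_list)

manual_population = squared_diffs / n    # divide by n
manual_sample = squared_diffs / (n - 1)  # divide by n - 1

# pvariance matches the "divide by n" formula
print(math.isclose(pvariance(number_list), manual_population))

# variance matches the "divide by n - 1" formula
print(math.isclose(variance(number_list), manual_sample))
```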

Conclusion

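And that is how to compute the variance of a data set in Python in 3 different ways: with the numpy module, with built-in functions and with the statistics module. Keep in mind that np.var computes the population variance by default, while statistics.variance computes the sample variance. I hope you find this short tutorial useful. If you know any other way to calculate the variance in Python, please share it in the comment section below. Have a great day!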