September 11, 2024

Python statistics module

The statistics module in Python provides functions for calculating mathematical statistics of numeric data. It includes functions for calculating measures of central tendency, measures of spread, and other statistical properties. This module is part of the Python standard library, so no additional installation is required.

1. Importing the statistics Module

To use the functions provided by the statistics module, you need to import it first:

import statistics

2. Measures of Central Tendency

Measures of central tendency describe the center of a data set. The statistics module provides functions to calculate the mean, median, and mode.

2.1. Mean (Average)

statistics.mean(data) returns the arithmetic mean (average) of the data.

import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mean_value = statistics.mean(data)
print("Mean:", mean_value)
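
As a quick sanity check of the definition, the sketch below compares statistics.mean with a hand-computed sum divided by count, and shows what happens with empty input. The names manual_mean and library_mean are introduced here purely for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Arithmetic mean by definition: sum of the values divided by their count.
manual_mean = sum(data) / len(data)
library_mean = statistics.mean(data)

print("Manual mean:", manual_mean)
print("Library mean:", library_mean)

# mean() raises StatisticsError when it is given no data at all.
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    print("Empty data:", exc)
```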

2.2. Median

statistics.median(data) returns the median (middle value) of the data. When the data has an even number of elements, it returns the average of the two middle values.

import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median_value = statistics.median(data)
print("Median:", median_value)
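
The difference between odd and even counts is easier to see side by side. This is a small sketch; the list names odd_data and even_data are made up for the example.

```python
import statistics

odd_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
even_data = [1, 2, 3, 4, 5, 6, 7, 8]

# Odd count: the single middle value is returned.
odd_median = statistics.median(odd_data)

# Even count: the two middle values (4 and 5) are averaged.
even_median = statistics.median(even_data)

# The input does not need to be sorted in advance.
shuffled_median = statistics.median([9, 1, 8, 2, 7, 3, 6, 4, 5])

print("Odd count median:", odd_median)
print("Even count median:", even_median)
print("Unsorted input median:", shuffled_median)
```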

2.3. Mode

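statistics.mode(data) returns the mode (most common value) of the data. If several values are equally common, Python 3.8 and later return the first one encountered, while earlier versions raise StatisticsError.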

import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5, 6]
mode_value = statistics.mode(data)
print("Mode:", mode_value)
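
mode also works on non-numeric data such as strings. The sketch below assumes Python 3.8 or later for the tie-breaking behavior; the sample lists are invented for illustration.

```python
import statistics

# mode() accepts any hashable values, not only numbers.
colors = ["red", "blue", "blue", "red", "green", "red"]
color_mode = statistics.mode(colors)
print("Most common color:", color_mode)

# Assumption: Python 3.8+. With a tie, the first value encountered wins.
tied = [1, 1, 2, 2]
tie_mode = statistics.mode(tied)
print("Mode with a tie:", tie_mode)
```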

3. Measures of Spread

Measures of spread describe how much the data varies. The statistics module provides functions to calculate the variance and standard deviation of a sample (variance and stdev), along with population counterparts (pvariance and pstdev).

3.1. Variance

statistics.variance(data) returns the sample variance of the data, a measure of how far the values spread out from the mean. It divides the sum of squared deviations by n - 1 and requires at least two data points.

import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
variance_value = statistics.variance(data)
print("Variance:", variance_value)
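
To make the n - 1 divisor concrete, the following sketch computes the sample variance by hand and compares it with statistics.variance, then shows statistics.pvariance, which divides by n instead. The intermediate names (squared_deviations, manual_sample_variance) are illustrative only.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(data)
mean_value = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean_value) ** 2 for x in data)

# Sample variance divides by n - 1; population variance divides by n.
manual_sample_variance = squared_deviations / (n - 1)
manual_population_variance = squared_deviations / n

sample_variance = statistics.variance(data)
population_variance = statistics.pvariance(data)

print("Sample variance:", sample_variance)
print("Population variance:", population_variance)
print("Sample matches manual:",
      math.isclose(sample_variance, manual_sample_variance))
print("Population matches manual:",
      math.isclose(population_variance, manual_population_variance))
```

Use variance when the data is a sample drawn from a larger population, and pvariance when the data is the entire population.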

3.2. Standard Deviation

statistics.stdev(data) returns the sample standard deviation of the data, which is the square root of the sample variance and measures the spread of the data around the mean in the same units as the data.

import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
stdev_value = statistics.stdev(data)
print("Standard Deviation:", stdev_value)
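
The square-root relationship can be checked directly. This short sketch also includes statistics.pstdev, the population counterpart; math.isclose is used because the two routes may differ in the last floating-point digit.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

sample_stdev = statistics.stdev(data)
population_stdev = statistics.pstdev(data)

# Each standard deviation is the square root of the matching variance.
sample_ok = math.isclose(sample_stdev,
                         math.sqrt(statistics.variance(data)))
population_ok = math.isclose(population_stdev,
                             math.sqrt(statistics.pvariance(data)))

print("Sample standard deviation:", sample_stdev)
print("Population standard deviation:", population_stdev)
print("Square-root relationship holds:", sample_ok and population_ok)
```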

4. Other Statistical Functions

The statistics module also includes several other useful functions for statistical analysis.

4.1. Median Low and Median High

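  • statistics.median_low(data): Returns the low median. When the data has an even number of elements, this is the smaller of the two middle values.
  • statistics.median_high(data): Returns the high median. When the data has an even number of elements, this is the larger of the two middle values.

With an odd number of elements, both functions return the single middle value. Unlike median, they always return a value that is actually present in the data.

import statistics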

data = [1, 2, 3, 4, 5, 6, 7, 8]
median_low_value = statistics.median_low(data)
median_high_value = statistics.median_high(data)
print("Median Low:", median_low_value)
print("Median High:", median_high_value)
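
A minimal sketch comparing the three median functions on even-length and odd-length data. The list names are invented for the example.

```python
import statistics

even_data = [1, 2, 3, 4, 5, 6, 7, 8]
odd_data = [1, 3, 5]

# Even count: median() interpolates, the other two pick a real data point.
even_results = (
    statistics.median_low(even_data),
    statistics.median(even_data),
    statistics.median_high(even_data),
)

# Odd count: all three functions agree on the single middle value.
odd_results = (
    statistics.median_low(odd_data),
    statistics.median(odd_data),
    statistics.median_high(odd_data),
)

print("Even count (low, median, high):", even_results)
print("Odd count (low, median, high):", odd_results)
```

median_low and median_high are handy for discrete data where an interpolated value such as 4.5 would not be meaningful.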

4.2. Median Grouped

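statistics.median_grouped(data, interval=1) estimates the median (50th percentile) of grouped continuous data by interpolation. Each value in data is treated as the midpoint of a class, and interval is the width of each class.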

import statistics

data = [1, 2, 2, 2, 3, 4, 4, 5, 6]
median_grouped_value = statistics.median_grouped(data)
print("Median Grouped:", median_grouped_value)
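
The interpolation can be sketched by hand using the textbook grouped-median formula L + interval * (n/2 - cf) / f, where L is the lower limit of the median class, cf is the number of values below that class, and f is the number of values inside it. The helper grouped_median_sketch below is hypothetical and written only to illustrate the formula; it is not the library's implementation.

```python
import math
import statistics
from bisect import bisect_left, bisect_right


def grouped_median_sketch(values, interval=1):
    """Hypothetical helper: interpolated median of grouped data."""
    ordered = sorted(values)
    n = len(ordered)
    x = ordered[n // 2]                  # midpoint of the median class
    lower = x - interval / 2             # lower limit of that class
    cf = bisect_left(ordered, x)         # count of values below the class
    f = bisect_right(ordered, x) - cf    # count of values inside the class
    return lower + interval * (n / 2 - cf) / f


data = [1, 2, 2, 2, 3, 4, 4, 5, 6]
sketch_value = grouped_median_sketch(data)
library_value = statistics.median_grouped(data)

print("Sketch:", sketch_value)
print("Library:", library_value)
print("Match:", math.isclose(sketch_value, library_value))
```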

4.3. Harmonic Mean

statistics.harmonic_mean(data) returns the harmonic mean of the data, which is the reciprocal of the arithmetic mean of the reciprocals of the data values.

import statistics

data = [40, 60, 80]
harmonic_mean_value = statistics.harmonic_mean(data)
print("Harmonic Mean:", harmonic_mean_value)
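
The definition translates almost directly into code. The sketch below computes the reciprocal of the mean of reciprocals by hand and compares it with the library result; manual_harmonic_mean is an illustrative name, not part of the module.

```python
import math
import statistics

data = [40, 60, 80]

# Reciprocal of the arithmetic mean of the reciprocals.
mean_of_reciprocals = sum(1 / x for x in data) / len(data)
manual_harmonic_mean = 1 / mean_of_reciprocals

library_harmonic_mean = statistics.harmonic_mean(data)

print("Manual harmonic mean:", manual_harmonic_mean)
print("Library harmonic mean:", library_harmonic_mean)
print("Arithmetic mean for comparison:", statistics.mean(data))
```

A typical use is averaging rates: if the values are speeds over three legs of equal distance, the harmonic mean gives the average speed for the whole trip, which is lower than the arithmetic mean.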

4.4. Geometric Mean

statistics.geometric_mean(data) returns the geometric mean of the data, which is the nth root of the product of n numbers. It is particularly useful for averaging growth rates and ratios, where values combine by multiplication. This function was added in Python 3.8.

import statistics

data = [1, 2, 3, 4, 5]
geometric_mean_value = statistics.geometric_mean(data)
print("Geometric Mean:", geometric_mean_value)
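
For comparison, here is a sketch of the nth-root definition computed by hand, along with an equivalent formulation using logarithms. It assumes Python 3.8 or later, since both math.prod and statistics.geometric_mean were added in that release. Results are compared with math.isclose because the approaches can differ by floating-point rounding.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]
n = len(data)

# Definition: nth root of the product of the n values.
root_of_product = math.prod(data) ** (1 / n)

# Equivalent form: exponential of the mean of the logarithms.
exp_of_mean_log = math.exp(sum(math.log(x) for x in data) / n)

library_value = statistics.geometric_mean(data)

print("Root of product:", root_of_product)
print("Exp of mean log:", exp_of_mean_log)
print("Library value:", library_value)
```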

5. Handling Data with Multiple Modes

If your data set has multiple modes, you can use statistics.multimode(data), added in Python 3.8, to return a list of all the modes in the order they first appear in the data:

import statistics

data = [1, 2, 2, 3, 3, 4, 4]
modes = statistics.multimode(data)
print("Modes:", modes)
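
The behavior can be approximated with collections.Counter, which helps show what multimode is counting. The helper multimode_sketch is hypothetical and assumes Python 3.7 or later, where Counter preserves first-seen order; it is a sketch, not the module's own code.

```python
import statistics
from collections import Counter


def multimode_sketch(values):
    """Hypothetical helper: every value that shares the highest count."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Counter keeps first-seen order, so the result does too.
    return [value for value, count in counts.items() if count == highest]


data = [1, 2, 2, 3, 3, 4, 4]
modes = statistics.multimode(data)
sketch_modes = multimode_sketch(data)

# When no value repeats, every value counts as a mode.
all_unique_modes = statistics.multimode([1, 2, 3])

# Empty input gives an empty list rather than raising an error.
empty_modes = statistics.multimode([])

print("Modes:", modes)
print("Sketch modes:", sketch_modes)
print("All unique:", all_unique_modes)
print("Empty input:", empty_modes)
```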