Measures of Central Tendency

When summarizing numerical data, it helps to have a single value that represents the entire dataset. In statistics, this is done using measures of central tendency. The three most widely used measures are the mean, the median, and the mode. Each one describes the “center” of the data in a different way, and together they give a clearer picture of the dataset.

Mean (The Average)

The mean, often called the average, is the most common measure of central tendency. It is calculated by adding up all the values in a dataset and dividing the sum by the number of values. The mean gives us an idea of the overall level of the data.

Example:
Suppose five students scored the following marks in a test:
50, 60, 70, 80, 90

Step 1: Add all scores → 50 + 60 + 70 + 80 + 90 = 350
Step 2: Divide by total number of students (5) → 350 ÷ 5 = 70

The mean score is 70, which represents the average performance of the group.
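The two steps above can be sketched in a few lines of Python. The function name `mean` and the variable `scores` are illustrative choices, not part of any particular library.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


scores = [50, 60, 70, 80, 90]
print(mean(scores))  # 70.0
```

Python's standard library also offers `statistics.mean`, which performs the same calculation.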

However, one drawback of the mean is that it is sensitive to extremely high or low values (outliers). For example, if the student who scored 90 had scored 0 instead, the mean would drop from 70 to 52, even though the other four students all scored 50 or higher.
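A quick way to see this sensitivity is to recompute the mean after changing a single score. The sketch below assumes, purely for illustration, that the top score of 90 is replaced by 0; the variable names are hypothetical.

```python
original = [50, 60, 70, 80, 90]
with_outlier = [50, 60, 70, 80, 0]  # assumption: the 90 is replaced by a 0

mean_original = sum(original) / len(original)
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean_original)      # 70.0
print(mean_with_outlier)  # 52.0
```

One changed value out of five moves the mean by 18 points.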

Median (The Middle Value)

The median is the middle value of an ordered dataset. Unlike the mean, the median is barely affected by a few very large or very small values, which makes it a better measure when the data is skewed.

Steps to find the Median:

  1. Arrange the data in ascending order.

  2. If the number of values is odd, the middle one is the median.

  3. If the number of values is even, the median is the average of the two middle numbers.

Example 1 (Odd case):
Data = 5, 7, 8, 12, 14
Median = 8 (the middle number)

Example 2 (Even case):
Data = 2, 4, 6, 8
Median = (4 + 6) ÷ 2 = 5
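The three steps and both examples can be sketched as a small Python function. The name `median` is an illustrative choice; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle values if the count is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)        # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 7, 8, 12, 14]))  # 8
print(median([2, 4, 6, 8]))       # 5.0
```

Sorting a copy with `sorted` leaves the caller's list unchanged, so the function also works on unordered input.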

The median is especially useful in income or property data, where a few very high values can distort the mean.
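To illustrate, the sketch below compares the mean and median of a small set of incomes. The figures are made up for this example and do not come from any real survey.

```python
from statistics import mean, median

# Hypothetical annual incomes: four typical values and one very high value
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print(f"Mean:   {mean(incomes):,.0f}")    # Mean:   230,000
print(f"Median: {median(incomes):,.0f}")  # Median: 40,000
```

The mean is pulled far above what four of the five people earn, while the median stays at a typical value.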

Mode (Most Frequent Value)

The mode represents the value that occurs most often in a dataset. Unlike the mean and median, which are based on calculations, the mode is based on frequency.

Example 1:
Data = 2, 3, 3, 5, 7
Mode = 3 (appears twice)

Example 2:
Data = 1, 1, 2, 2, 3, 3
Modes = 1, 2, and 3 (all appear twice → multimodal dataset; note that some textbooks instead say such a dataset has no mode, since no value is more frequent than the others)

Example 3:
Data = 10, 20, 30, 40
No mode (all numbers appear once)
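The three examples can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and returning an empty list when every value appears once is an assumption that mirrors the "no mode" convention used in Example 3.

```python
from collections import Counter


def modes(values):
    """Return the most frequent values, or an empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # convention used here: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 7]))     # [3]
print(modes([1, 1, 2, 2, 3, 3]))  # [1, 2, 3]
print(modes([10, 20, 30, 40]))    # []
```

By contrast, the standard library's `statistics.multimode` returns every value when all of them appear once, so the convention should be chosen deliberately.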

The mode is particularly useful in categorical data, like finding the most popular brand, product, or preference in a survey.
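Because the mode only counts occurrences, it works on labels as well as numbers. The sketch below uses `statistics.multimode` (available from Python 3.8) on a made-up list of survey responses; the brand names are placeholders.

```python
from statistics import multimode

# Hypothetical survey responses
responses = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]

print(multimode(responses))  # ['Brand A']
```

A mean or median would be meaningless here, since brand names cannot be added or ordered.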

Conclusion

  • Mean shows the average value but is affected by outliers.

  • Median gives the middle value of the ordered data and is more reliable for skewed data.

  • Mode highlights the most frequent occurrence and works best for categorical data.

Together, these three measures of central tendency give a compact summary of where the values in a dataset are centered.
