Have you ever worked with data that had gaps or missing values? I certainly have, and it's a common challenge in the world of data analysis. Let me share with you a simple way to handle missing data using the mean imputation technique in Python.
DATASET
In this thread, we will be working with a dataset called "Fill_na_in_panda", which contains columns such as name, height, and weight.
Step 1: Import Libraries
First things first, import the necessary libraries. You'll need pandas for handling data and calculations. If you haven't installed it yet, run pip install pandas first.
#Import Libraries
import pandas as pd
Step 2: Load Your Data
Load your dataset into a data frame using pd.read_csv() or any other appropriate function. Let's call it data.
#Load Your Data
data = pd.read_csv('/content/Fill_na_in_panda.csv')
data.head()
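If you don't have the CSV file at hand, you can still follow along with a small stand-in. The sketch below builds a DataFrame from an in-memory CSV string; the rows are invented for illustration and are not the contents of the real Fill_na_in_panda.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for Fill_na_in_panda.csv; these rows are invented.
csv_text = """name,height,weight
Ann,5.2,60
Ben,,72
Cara,4.8,55
Dan,,80
Eve,5.0,58
"""

# Empty fields in a numeric column are read as NaN (missing).
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
```

pd.read_csv() accepts a file path or a file-like object, so the rest of the steps work the same way on this stand-in.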
Step 3: Checking Dataset for Missing Values
Use the isnull() function together with sum() to count the missing values in each column before we fill them with the mean.
The result shows that our height column has 5 missing values.
#Checking Dataset for Missing Value
data.isnull().sum()
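To see why this works: isnull() returns True for every missing cell, and sum() counts the True values per column. Here is a minimal sketch on a made-up DataFrame (the values are assumptions, not the real dataset) that also lists only the columns with gaps.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dan"],
    "height": [5.2, np.nan, 4.8, np.nan],
    "weight": [60, 72, 55, 80],
})

# Count missing values per column.
missing_counts = data.isnull().sum()
print(missing_counts)

# Keep only the names of columns that have at least one missing value.
columns_with_gaps = missing_counts[missing_counts > 0].index.tolist()
print(columns_with_gaps)
```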
Step 4: Calculate Column 'height' Mean
Calculate the mean of the 'height' column, which contains the missing values. You can use the .mean() method on the specific column.
The result shows that the mean value for the 'height' column is 4.53.
#Calculate Column 'height' Mean
mean_height = data['height'].mean()
mean_height
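One detail worth knowing: .mean() skips missing values by default, so the gaps do not distort the result. A small sketch with made-up numbers shows the difference when skipping is turned off.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only.
heights = pd.Series([4.0, np.nan, 5.0, np.nan, 6.0])

# Default behaviour: NaN entries are ignored, so this is (4 + 5 + 6) / 3.
mean_height = heights.mean()

# With skipna=False, a single NaN makes the whole result NaN.
mean_with_nan = heights.mean(skipna=False)

print(mean_height, mean_with_nan)
```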
Step 5: Fill Missing Values
Now comes the magic part! Use the .fillna() method to fill the missing values in your column with the calculated mean.
#Fill Missing Values
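# Assign the result back: fillna(inplace=True) on a selected column may not update data in recent pandas versions
data['height'] = data['height'].fillna(mean_height)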
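Here is the fill step end to end on a tiny, made-up DataFrame, as a sketch rather than the exact dataset from this post. It assigns the filled column back to the DataFrame and shows that filling with the mean leaves the column's mean unchanged.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dan"],
    "height": [4.0, np.nan, 6.0, np.nan],
})

mean_height = data["height"].mean()  # NaN skipped: (4 + 6) / 2

# fillna() returns a new Series; assign it back to the column.
data["height"] = data["height"].fillna(mean_height)
print(data)
```

Keep in mind that every filled row gets the same value, so the spread of the column becomes smaller than it really is.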
Step 6: Checking Dataset for Missing Value Again
Use the isnull() function again to count the missing values in each column and confirm the fill worked.
This time the result shows that no value is missing in our dataset.
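The confirmation can be sketched as below. The DataFrame is a made-up example, and the assert line is an optional extra (not part of the steps above) that stops the script if anything is still missing.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame({"height": [4.0, np.nan, 6.0], "weight": [60, 72, 55]})
data["height"] = data["height"].fillna(data["height"].mean())

# Every column should now report 0 missing values.
remaining = data.isnull().sum()
print(remaining)

# Optional safety check: fail loudly if any gap is left.
assert remaining.sum() == 0, "some values are still missing"
```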
That's it! You've successfully replaced the missing values with the mean of the column. Your dataset is now more complete for analysis.
Optional - Save Your Changes
If you'd like to save your modified DataFrame, you can do so using to_csv() or any other relevant method.
#Save Your Changes
data.to_csv('filled_dataset.csv', index=False)
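If you are wondering what index=False is for, this sketch writes a made-up DataFrame to an in-memory buffer (instead of a real file) with and without it, then reads both back.

```python
import io
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame({"name": ["Ann", "Ben"], "height": [5.2, 4.8]})

# With index=False, only the real columns are written.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Without it, the row index is written as an extra unnamed column.
buffer_with_index = io.StringIO()
data.to_csv(buffer_with_index)
buffer_with_index.seek(0)
reloaded_with_index = pd.read_csv(buffer_with_index)

print(list(reloaded.columns))
print(list(reloaded_with_index.columns))
```

Leaving the index out keeps the saved file identical in shape to the DataFrame you worked on.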