This project demonstrates data cleaning and treatment with the Pandas library. It uses JSON data from a telecommunications company that includes customer churn history. The data is normalized, missing values and empty strings are identified and then filled or removed, and duplicate records are handled. Outliers are identified both graphically and analytically, and then treated. Finally, One Hot Encoder is used to convert categorical variables into binary indicator columns.
The project focuses on the following:
- Understanding the importance of cleaning a dataset before applying machine learning models
- Identifying impurities in a dataset
- Discovering ways to clean and treat impurities in a dataset
- Understanding how to perform data cleaning and treatment
- Building a step-by-step process to prepare the dataset for machine learning models
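
The steps described in the opening paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the sample records, every column name (`customerID`, `Churn`, `customer_tenure`, `account_Charges_Monthly`, and so on), and the helper `clean_telecom` are hypothetical. It also assumes that "normalized" refers to flattening nested JSON with `pd.json_normalize`, uses the 1.5 × IQR rule as one possible analytical outlier check (a box plot would be the usual graphical counterpart), and swaps in `pd.get_dummies` for the encoding step in place of a dedicated One Hot Encoder class.

```python
import numpy as np
import pandas as pd

# Hypothetical nested records standing in for the telecom JSON file.
records = [
    {"customerID": "A1", "Churn": "No",
     "customer": {"gender": "Female", "tenure": 12},
     "account": {"Contract": "One year", "Charges": {"Monthly": 60.0}}},
    {"customerID": "A2", "Churn": "Yes",
     "customer": {"gender": "Male", "tenure": None},      # missing value
     "account": {"Contract": "Month-to-month", "Charges": {"Monthly": 70.0}}},
    {"customerID": "A2", "Churn": "Yes",
     "customer": {"gender": "Male", "tenure": None},      # exact duplicate
     "account": {"Contract": "Month-to-month", "Charges": {"Monthly": 70.0}}},
    {"customerID": "A3", "Churn": "",                      # empty-string target
     "customer": {"gender": "Female", "tenure": 5},
     "account": {"Contract": "Month-to-month", "Charges": {"Monthly": 65.0}}},
    {"customerID": "A4", "Churn": "No",
     "customer": {"gender": "Male", "tenure": 24},
     "account": {"Contract": "Two year", "Charges": {"Monthly": 80.0}}},
    {"customerID": "A5", "Churn": "No",
     "customer": {"gender": "Female", "tenure": 36},
     "account": {"Contract": "One year", "Charges": {"Monthly": 75.0}}},
    {"customerID": "A6", "Churn": "Yes",
     "customer": {"gender": "Male", "tenure": 2},
     "account": {"Contract": "Month-to-month", "Charges": {"Monthly": 900.0}}},  # outlier
]


def clean_telecom(raw_records):
    """Hypothetical helper: flatten, clean, and encode the sample records."""
    # 1. Flatten nested JSON into columns such as "account_Charges_Monthly".
    df = pd.json_normalize(raw_records, sep="_")

    # 2. Treat empty or whitespace-only strings as missing values.
    df = df.replace(r"^\s*$", np.nan, regex=True)

    # Drop rows without a target label; fill missing tenure with the median.
    df = df.dropna(subset=["Churn"])
    df["customer_tenure"] = df["customer_tenure"].fillna(df["customer_tenure"].median())

    # 3. Remove exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    # 4. Flag outliers with the 1.5 * IQR rule and cap them at the bounds.
    col = "account_Charges_Monthly"
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # 5. One-hot encode the categorical columns into 0/1 indicator columns.
    return pd.get_dummies(
        df, columns=["customer_gender", "account_Contract"], dtype=int
    )


cleaned = clean_telecom(records)
print(cleaned)
```

Capping outliers at the IQR bounds is only one treatment option; removing the affected rows or replacing the values with a central statistic are common alternatives, and the right choice depends on the dataset.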