Machine Learning: Dealing with High Dimensional Data

This project focuses on using machine learning techniques to handle high-dimensional data. The dataset contains results from various medical exams used to determine whether a specific type of cancer is malignant or benign. The data is scaled, correlations between features are evaluated, and a classifier is used to identify the main parameters for assessing cancer type and to develop the best model for this problem.
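
The workflow described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: scikit-learn's bundled breast cancer dataset stands in for the project's data, and a random forest is one possible choice of classifier for ranking features.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the scikit-learn breast cancer dataset is a stand-in for the project's data.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Scale every feature to zero mean and unit variance.
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# Pairwise Pearson correlations between the scaled features.
corr = X_scaled.corr()

# Assumption: a random forest is used here; its impurity-based importances rank the features.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_scaled, y)
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(importances.head(5))
```

The `corr` DataFrame can be passed to `seaborn.heatmap(corr)` to draw the correlation matrix; strongly correlated feature pairs are candidates for removal before modeling.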

The goal of this project is to explore methods for handling and analyzing high-dimensional data to build an effective machine learning model for cancer classification.

The project focuses on the following:
- Understand high-dimensional data.
- Create a machine learning model for classification.
- Construct a correlation matrix using Pandas and Seaborn.
- Learn how to select features using data visualizations.
- Use Scikit-learn to create automatic feature selection models.
- Apply dimensionality reduction techniques (PCA and t-SNE).
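
As a rough sketch of the last two items, automatic feature selection and dimensionality reduction might look like the code below. The dataset (scikit-learn's bundled breast cancer data), the use of recursive feature elimination with logistic regression, and the choice of 10 features and 2 components are all assumptions made for illustration, not choices confirmed by the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumption: the scikit-learn breast cancer dataset is a stand-in for the project's data.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Automatic feature selection: recursive feature elimination keeps 10 features
# (the count is arbitrary, chosen only for this example).
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = selector.fit_transform(X_scaled, y)

# PCA: linear projection onto the two directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())

# t-SNE: nonlinear 2-D embedding, mainly useful for visualization.
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_scaled)
```

PCA yields a reusable transform that can be applied to new samples, whereas scikit-learn's `TSNE` only embeds the data it was fitted on, so it is better suited to exploring the data than to feeding a classifier.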

Developed: Sep 2023

Published: Jul 15, 2024