An article with clear steps for exploratory data analysis:
1. What question are you trying to solve (or prove wrong)?
Start with the simplest hypothesis possible. Add complexity as needed.

2. What kind of data do you have?
Is your data numerical, categorical or something else? How do you deal with each kind?
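
A minimal sketch of step 2, assuming the data is already in a pandas DataFrame. The toy rows and column names (`age`, `fare`, `sex`, `embarked`) are made up for illustration and are not taken from the article. It splits the columns by dtype so each kind can be summarized in its own way.

```python
import pandas as pd

# Hypothetical toy data; values and column names are illustrative only.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
    "embarked": ["S", "C", "S", "S"],
})

# Numerical columns: mean, spread and quantiles are meaningful.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric is treated as categorical in this sketch.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(df[numeric_cols].describe())
for col in categorical_cols:
    # For categorical columns, counts per category are the natural summary.
    print(df[col].value_counts())
```

Treating every non-numeric column as categorical is a simplification; dates and free text usually need their own handling.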

3. What’s missing from the data and how do you deal with it?
Why is the data missing? Missing data can be a sign in itself. You’ll never be able to replace it with anything as good as the original but you can try.
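
One possible way to act on step 3, sketched with pandas on made-up data. It measures how much is missing, keeps the "was missing" signal as its own column, and fills the gaps with the median. Median imputation is just one simple choice here, not necessarily what the article uses.

```python
import pandas as pd

# Hypothetical toy data with gaps in the "age" column.
df = pd.DataFrame({
    "age": [22.0, None, 26.0, None, 35.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 53.10],
})

# Share of missing values per column.
missing_share = df.isna().mean()
print(missing_share)

# Missingness can be informative, so keep it as a separate flag
# before overwriting anything.
df["age_missing"] = df["age"].isna()

# Fill the gaps with the median of the observed values.
age_median = df["age"].median()
df["age_filled"] = df["age"].fillna(age_median)
```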

4. Where are the outliers and why should you pay attention to them?
Distribution. Distribution. Distribution. Three times is enough for the summary. Where are the outliers in your data? Do you need them or are they damaging your model?
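
A rough sketch of one common way to flag outliers in step 4: the 1.5 × IQR rule. The rule and the toy `fare` values are assumptions for illustration; the article may look at distributions differently (for example, with plots).

```python
import pandas as pd

# Hypothetical fares with one extreme value.
fare = pd.Series([7.25, 7.92, 8.05, 8.46, 13.0, 26.0, 512.33], name="fare")

# Interquartile range: spread of the middle 50% of the data.
q1 = fare.quantile(0.25)
q3 = fare.quantile(0.75)
iqr = q3 - q1

# Points further than 1.5 * IQR outside the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = fare[(fare < lower) | (fare > upper)]
print(outliers)
```

Flagging is not the same as deleting. Whether a flagged value is an error or a real, useful extreme still has to be decided case by case.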

5. How can you add, change or remove features to get more out of your data?
The default rule of thumb is more data = good. And following this works well quite often. But is there anything you can remove and still get the same results? Start simple. Less but better.
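
A small sketch of step 5 with pandas, on made-up columns (`siblings`, `parents`, `ticket_class`, `source` are hypothetical names). It adds one combined feature and removes a column that carries no information because it has a single value.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "siblings": [1, 0, 0, 1],
    "parents": [0, 0, 2, 1],
    "ticket_class": [3, 1, 3, 2],
    "source": ["train", "train", "train", "train"],
})

# Add: combine two related columns into one feature
# (+1 counts the passenger themselves).
df["family_size"] = df["siblings"] + df["parents"] + 1

# Remove: a column with a single unique value cannot help a model.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
df = df.drop(columns=constant_cols)
print(df.columns.tolist())
```

After each change like this, it is worth re-checking the model's score to see whether the simpler feature set really gives the same results.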

The article walks through these steps using the Titanic passenger dataset.
Spoiler: CatBoost wins in the end.