deeceeo@alien.top 1 points 10 months ago

It likely won't matter. Most models (I'm guessing you're using something like xgboost) deal robustly with this kind of correlation between features.

If you like, you can combine the two into a single variable (0 for male, 1 for female, 2 for pregnant female) and may get slightly improved performance, assuming every row in the dataset fits that scheme (it breaks for e.g. pregnant trans men). This way, a tree-based model can draw a boundary between 0 and 1 based on gender, or between 1 and 2 based on pregnancy.
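
A rough sketch of that encoding in plain Python, in case it helps. The helper name `combine_sex_pregnancy` and the field names `is_female` / `is_pregnant` are made up for illustration, so swap in whatever your columns are actually called. It raises on rows that don't fit the three-level scheme rather than silently picking a code for them.

```python
def combine_sex_pregnancy(is_female: bool, is_pregnant: bool) -> int:
    """Hypothetical helper: 0 = male, 1 = female, 2 = pregnant female."""
    if is_pregnant and not is_female:
        # This row does not fit the three-level scheme; surface it instead of guessing.
        raise ValueError("pregnant but not female: row does not fit the encoding")
    return int(is_female) + int(is_pregnant)


# Toy rows (assumed field names, not from any real dataset).
rows = [
    {"is_female": False, "is_pregnant": False},
    {"is_female": True, "is_pregnant": False},
    {"is_female": True, "is_pregnant": True},
]
codes = [combine_sex_pregnancy(r["is_female"], r["is_pregnant"]) for r in rows]

# A tree split at 0.5 separates rows by gender; a split at 1.5 separates by pregnancy.
by_gender = [c > 0.5 for c in codes]
by_pregnancy = [c > 1.5 for c in codes]
```

The point is just that one ordered column lets a single threshold recover either of the original binary features.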