deeceeo@alien.top 1 points 11 months ago

It likely won't matter. Most models (I'm guessing you're using something like xgboost) deal robustly with this kind of correlation between features.

If you like, you can combine the two into a single variable (0 for male, 1 for female, 2 for pregnant female), which may slightly improve performance. This assumes every row in the dataset fits that scheme; it breaks if, e.g., the data includes pregnant trans men. With this encoding, a tree-based model can draw a boundary between 0 and 1 based on gender, or between 1 and 2 based on pregnancy.
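
Here's a rough sketch of that combined encoding in plain Python, in case it helps. The function name and the `(is_female, is_pregnant)` row layout are made up for illustration, so adapt them to however your data is actually stored. It raises an error on rows that don't fit the three-level scheme rather than silently guessing a value.

```python
def combine_gender_pregnancy(is_female: bool, is_pregnant: bool) -> int:
    """Merge two binary features into one ordinal feature.

    0 = male, 1 = female (not pregnant), 2 = pregnant female.
    """
    if is_pregnant and not is_female:
        # This row does not fit the 0/1/2 scheme; fail loudly instead of guessing.
        raise ValueError("pregnant non-female row does not fit the 0/1/2 encoding")
    return int(is_female) + int(is_pregnant)


# Hypothetical example rows: (is_female, is_pregnant)
rows = [(False, False), (True, False), (True, True)]
encoded = [combine_gender_pregnancy(f, p) for f, p in rows]
print(encoded)  # [0, 1, 2]
```

A tree can then split at 0.5 to separate on gender, or at 1.5 to separate on pregnancy. If the error fires on your data, you're probably better off keeping the two columns separate.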