Mon 20 Nov

The economic indicators dataset has a lot of variables, so I performed feature selection, which is a critical step in refining a dataset for modelling or analysis. I started with correlation analysis to identify redundant or highly correlated features, which indicate potential multicollinearity.
I calculated the correlation coefficients between features and set a threshold (e.g., 0.7 or 0.8) to identify pairs with high correlations. From each such pair, only one feature was retained.
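
The correlation filter described above could look roughly like the sketch below. It assumes pandas and NumPy, and the helper name `drop_highly_correlated`, the demo column names and the synthetic data are all made up for illustration; they are not the actual dataset or code used here.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.8):
    """Drop one feature from every pair whose |correlation| exceeds threshold.

    Hypothetical helper: for each correlated pair, the column that appears
    later in the DataFrame is the one that gets dropped.
    """
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once
    # and a column is never compared with itself.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic stand-in data (not the real economic indicators).
rng = np.random.default_rng(0)
gdp = rng.normal(size=200)
demo = pd.DataFrame({
    "gdp": gdp,
    "consumption": 2 * gdp + 0.01 * rng.normal(size=200),  # nearly a copy of gdp
    "unemployment": rng.normal(size=200),                  # independent
})

reduced, dropped = drop_highly_correlated(demo, threshold=0.8)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which feature of a pair to keep is a judgment call; this sketch simply keeps the earlier column, but domain knowledge about the indicators is usually a better guide.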
Addressing multicollinearity is also vital, so I used the Variance Inflation Factor (VIF) to assess how much the variance of a feature's estimated coefficient is inflated by that feature's correlation with the other features. As a rule of thumb, VIF scores above 5 or 10 signify multicollinearity.
Among the features with high VIF scores, a few were dropped and the others were combined into composite variables to mitigate multicollinearity.
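
As a rough sketch of the VIF step: each feature is regressed on all the others, and VIF = 1 / (1 - R²) of that regression. The code below does this with plain NumPy least squares. The function name `compute_vif`, the column names, the synthetic data and the way the composite is built (mean of z-scores) are assumptions for illustration only, not the exact procedure used on the real data.

```python
import numpy as np
import pandas as pd


def compute_vif(df):
    """Return the VIF of every column, computed as 1 / (1 - R^2)."""
    X = df.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Synthetic stand-in data (not the real economic indicators).
rng = np.random.default_rng(1)
gdp = rng.normal(size=300)
data = pd.DataFrame({
    "gdp": gdp,
    "consumption": gdp + 0.1 * rng.normal(size=300),  # strongly tied to gdp
    "unemployment": rng.normal(size=300),             # independent
})
vif_before = compute_vif(data)

# One possible composite (an assumption): average the z-scores of the
# collinear columns and use that single variable in their place.
cols = ["gdp", "consumption"]
z = (data[cols] - data[cols].mean()) / data[cols].std()
combined = data.drop(columns=cols).assign(activity_index=z.mean(axis=1))
vif_after = compute_vif(combined)

print(vif_before)
print(vif_after)
```

On data like this, the two collinear columns show VIFs far above 10, while the combined set falls back close to 1.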
Next, I will use this refined set of features for modelling and evaluate the model's performance on it.
