After analysing the “food establishment inspections” dataset, I determined that a decision tree would be the best technique: it can be used for both regression and classification, handles both numerical and categorical data, implicitly performs feature selection, and is robust to outliers and missing values.
Decision trees use a tree-like structure to represent decisions and their possible outcomes: internal nodes stand for features, branches for decision rules, and leaves for the result (the target variable).
At each node, the algorithm determines the optimal feature to split the dataset on according to a splitting criterion, such as information gain (computed from entropy) or the Gini index.
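As a concrete illustration of those criteria, here is a minimal sketch of Gini impurity, entropy, and information gain for a list of class labels (the example labels are made up, not drawn from the inspections dataset):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

labels = ["pass", "pass", "fail", "pass"]
print(round(gini(labels), 3))     # 0.375
print(round(entropy(labels), 3))  # 0.811
```

A split is chosen to minimise the children's weighted impurity (equivalently, to maximise information gain).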
The algorithm recursively splits the dataset on the selected features until a stopping criterion is met, such as a maximum depth, a minimum number of samples at a node, or another constraint that prevents overfitting.
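The recursive splitting with stopping criteria can be sketched in plain Python as follows. This is an illustrative toy implementation (exhaustive threshold search on numeric features, Gini criterion), not the production-grade algorithm a library like scikit-learn uses:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively split until a node is pure or a stopping criterion is met."""
    # Stopping criteria: pure node, depth limit, or too few samples.
    if gini(labels) == 0.0 or depth >= max_depth or len(rows) < min_samples:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in {r[f] for r in rows}:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split found
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [labels[i] for i in li],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                            depth + 1, max_depth, min_samples),
    }
```

On a trivially separable toy dataset, e.g. `build_tree([[1], [2], [8], [9]], ["pass", "pass", "fail", "fail"])`, a single split at the root already produces two pure leaves.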
After the tree is constructed, each new instance is routed through the tree according to its feature values until it reaches a leaf node, which yields the prediction.
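That traversal can be shown with a small sketch. The tree below is hand-written and the feature names (`violation_count`, `days_since_inspection`) are illustrative assumptions, not columns from the actual dataset:

```python
# Hypothetical fitted tree for inspection outcomes, stored as nested dicts:
# internal nodes test a feature against a threshold; leaves hold the outcome.
tree = {
    "feature": "violation_count",
    "threshold": 3,
    "left": {"leaf": "pass"},          # few violations -> pass
    "right": {                         # many violations: check recency
        "feature": "days_since_inspection",
        "threshold": 180,
        "left": {"leaf": "conditional pass"},
        "right": {"leaf": "fail"},
    },
}

def predict(node, instance):
    """Route an instance from the root to a leaf by comparing feature values."""
    while "leaf" not in node:
        key = "left" if instance[node["feature"]] <= node["threshold"] else "right"
        node = node[key]
    return node["leaf"]

print(predict(tree, {"violation_count": 2, "days_since_inspection": 30}))   # pass
print(predict(tree, {"violation_count": 7, "days_since_inspection": 400}))  # fail
```

Each comparison follows one branch, so a prediction costs at most one feature test per level of the tree.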
Next, I’ll analyse a few more datasets and build models accordingly.