Mon 6 nov
Hierarchical clustering is a valuable technique for analyzing geospatial data that includes latitude and longitude variables.
Since our data has these, I calculated the distance between them using distance metric, such as Euclidean to measure the dissimilarity between locations. Using this metric, calculated a pairwise distance matrix representing the differences between all pairs of locations. Then applied an agglomerative hierarchical clustering algorithm, coupled with a linkage method of choice to the distance matrix. The outcome was a dendrogram, visually displaying the hierarchical structure of clusters.
interpreted and analysed the results of spatial patterns within the identified clusters and investigated the geospatial implications of the clustering.
Fri 3 Nov
K-Means and DBSCAN are two clustering algorithms:
K-Means:
Partition-based clustering.
Requires the number of clusters (K) to be specified beforehand.
Assigns data points to the nearest cluster centroid.
Sensitive to initial centroid placement.
Performs hard clustering (each point belongs to one cluster).
Assumes spherical clusters.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
Density-based clustering.
Automatically finds clusters based on data density, no need to specify K.
Identifies clusters of arbitrary shapes.
Handles noise/outliers.
Accommodates variable cluster sizes.
Primarily performs hard clustering, but soft clustering can be achieved using extensions.
For the project i have used DBSCAN clustering algorithm.
Wed 1 Nov
Logistic regression is a statistical modelling technique used to analyse the relationship between a binary outcome variable and one or more predictor variables.
I have taken “manner_of_death” as the binary outcome variable and the other columns as predictor variables. The “manner_of_death” column indicates whether the death was “shot” or “shot and tasered,” and other columns, like “armed,” “age,” “gender,” and “race,” may be used as predictor variables to estimate the risk of a particular mode of death.
Then i have explored the data and preprocessed it by handling missing values, encoding the categorical variables. Later built a logistic regression model by fitting the data. The binary variable is the dependent variable and the predictor variable is the independent data.
Evaluated the model’s performance using accuracy, precision, recall, F1 score, R2 score which were all satisfactory.