vignettes/a05-UnsupervisedClustering.Rmd
Embeddings are feature representations learned by a convolutional neural network (CNN). Instead of performing classification, a CNN can also serve as a feature extractor for spectrogram images.
Below, we use a pre-trained model to extract embeddings and then apply unsupervised clustering to identify signals.
When unsupervised is set to TRUE, the function clusters the extracted embeddings with the HDBSCAN algorithm and then identifies the cluster that contains the most observations of the target_class.
ModelPath <- "inst/extdata/trainedresnetbinary/_imagesmalaysia_5_resnet18_model.pt"
result <- extract_embeddings(
  test_input = "inst/extdata/multiclass/test/",
  model_path = ModelPath,
  target_class = "female.gibbon",
  unsupervised = "TRUE"
)
result$EmbeddingsCombined
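The cluster-matching step described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the vectors `clusters` and `labels` are made-up stand-ins for HDBSCAN cluster ids and known class labels, and the use of 0 for noise points is an assumption borrowed from the convention of `dbscan::hdbscan`.

```r
# Hypothetical inputs: cluster ids (0 assumed to mark noise) and known labels.
clusters <- c(1, 1, 1, 2, 2, 0, 2, 1, 2, 2)
labels <- c("female.gibbon", "female.gibbon", "noise", "noise", "noise",
            "noise", "female.gibbon", "female.gibbon", "noise", "noise")
target_class <- "female.gibbon"

# Count how many target-class observations fall in each cluster.
target_counts <- table(clusters[labels == target_class])

# Pick the cluster that holds the most target-class observations.
best_cluster <- as.integer(names(target_counts)[which.max(target_counts)])
best_cluster
```

In this toy data, three of the four female.gibbon observations sit in cluster 1, so that cluster is the one matched to the target class.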
The function also calculates the Normalized Mutual Information (NMI) score between the cluster assignments and the ground-truth labels, and it generates a confusion matrix comparing the unsupervised clusters with the known class labels. Together, these show how well the unsupervised clustering aligns with the true class labels. The NMI score is stored in the NMI element of the result:
result$NMI
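To make the NMI score concrete, here is a small base R sketch that computes it from two label vectors. The helper `nmi()` is hypothetical and written only for illustration. It normalizes by the mean of the two entropies, which is one of several common conventions; the package may use a different normalization, so values need not match result$NMI exactly.

```r
# Hypothetical helper: NMI = 2 * I(X; Y) / (H(X) + H(Y)).
# Returns NaN if both inputs contain a single group (zero entropy).
nmi <- function(x, y) {
  joint <- table(x, y) / length(x)   # joint distribution of the two labelings
  px <- rowSums(joint)               # marginal distribution of x
  py <- colSums(joint)               # marginal distribution of y

  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  expected <- outer(px, py)          # joint distribution under independence
  nz <- joint > 0                    # skip empty cells to avoid log(0)
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))

  2 * mi / (entropy(px) + entropy(py))
}

# Clusters that reproduce the labels exactly (up to renaming).
nmi_identical <- nmi(c(1, 1, 2, 2), c("a", "a", "b", "b"))

# Clusters that carry no information about the labels.
nmi_independent <- nmi(c(1, 2, 1, 2), c("a", "a", "b", "b"))
```

Under this definition the score is 1 when the clusters match the labels perfectly and 0 when the two are independent, so higher values indicate closer agreement with the ground truth.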
The confusion matrix shows the result of using ‘hdbscan’ to match the target class to the cluster that contains the largest number of target-class observations.
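As a rough illustration of what such a confusion matrix contains, the base R sketch below cross-tabulates made-up cluster ids against known labels and then collapses the clusters into a binary prediction. The data, the variable names, and the "noise" label for non-target observations are all assumptions for this example, not output from the package.

```r
# Hypothetical inputs: cluster ids (0 assumed to mark noise) and known labels.
clusters <- c(1, 1, 1, 2, 2, 0, 2, 1, 2, 2)
labels <- c("female.gibbon", "female.gibbon", "noise", "noise", "noise",
            "noise", "female.gibbon", "female.gibbon", "noise", "noise")
target_class <- "female.gibbon"

# Rows are clusters, columns are true classes.
cluster_by_class <- table(Cluster = clusters, Class = labels)

# The matched cluster is the row with the largest count in the target column.
matched <- rownames(cluster_by_class)[which.max(cluster_by_class[, target_class])]

# Treat membership in the matched cluster as a prediction of the target class.
predicted <- ifelse(as.character(clusters) == matched, target_class, "noise")
confusion <- table(Predicted = predicted, Actual = labels)
confusion
```

Reading the collapsed table, the diagonal cells count observations where the matched cluster agrees with the known label, and the off-diagonal cells count the disagreements.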