-
Dear SLEAP team @talmo @roomrys, I have a dataset similar to the one in mice_of. I have downloaded it and was wondering what strategy was followed for selecting and labeling frames for training. From what I can understand, the initial frames were selected using ... Thanks!
-
Hi @auesro, Apologies for the late response. I don't believe we have documented how the suggested labels were generated. @talmo might have the intermediate datasets, from the initial labeling through to the final dataset, which could answer your final question. Sorry I can't be of more help!
-
I believe we used image features for the first round (though I don't think we stored which parameters were used), and for subsequent rounds we just labeled using the prediction score as a guide. We made sure to include many different kinds of data in the annotations with varying numbers of animals and social interactions. |
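For concreteness, the score-guided part of this loop can be as simple as ranking predicted frames by their mean instance score and queueing the lowest-scoring ones for correction. A minimal sketch, assuming per-frame scores have already been extracted (the `frame_scores` structure and helper name are illustrative, not SLEAP API):

```python
def lowest_score_frames(frame_scores, n=20):
    """Pick the n frame indices whose predictions were least confident.

    frame_scores: iterable of (frame_index, mean prediction score) pairs,
    e.g. averaged over the predicted instances in each frame.
    """
    ranked = sorted(frame_scores, key=lambda pair: pair[1])
    return [idx for idx, _ in ranked[:n]]

# Example: lowest_score_frames([(0, 0.92), (1, 0.41), (2, 0.77)], n=2) -> [1, 2]
```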
-
Hi again, I have more questions regarding the training of this dataset (we consider it a good benchmark for our own data, which is very similar except that we have 1 mouse per video): ... Thanks!
-
Hi @auesro, I don't believe we documented how many and which (labeled, suggested, or random) frames were predicted during the subsequent training iterations... ...but here is a brief description of each parameter:

**Brisk Keypoint Threshold**
The BRISK keypoint threshold determines the sensitivity of the keypoint detection process. If the threshold is set too low, the algorithm may include many keypoints that are not distinctive. If the threshold is set too high, the algorithm may miss some keypoints that actually are distinctive.

**Bag of Features Vocab Size**
"Bag of features" refers to a method of representing an image as a collection of local features. The vocab size of a bag of features is the number of visual words used to represent an image. A larger vocab size allows the bag of features to capture more detailed information about the visual features of the training images, but it also increases the computational complexity of the algorithm.

**PCA Components**
Principal component analysis (PCA) is a technique for reducing the dimensionality of a dataset by identifying the underlying patterns in the data and projecting the data onto a lower-dimensional space. The PCA components of an image are the vectors that represent its most important features in that lower-dimensional space. The number of PCA components is a hyperparameter that trades off accuracy against efficiency: a larger number of components captures more detailed information about the image, but it also increases the computational complexity of the algorithm.

**K-Means Clusters**
K-means clustering is a method for grouping a set of data points into a specified number of clusters. The algorithm first randomly selects K cluster centers, then iteratively assigns each data point to the closest center and updates each center to the mean of the points in its cluster. The number of clusters, K, is an important parameter: a larger K results in more, smaller clusters, while a smaller K results in fewer, larger clusters.

**Samples per Cluster**
The samples per cluster is the number of frames drawn from each cluster after the clustering has been applied. In general, a larger number of samples per cluster gives more detailed and accurate coverage of each cluster, but it also increases the computational cost.

Thanks,
Liezl
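To make these parameters concrete, here is a minimal sketch of how they fit together in a feature-based suggestion pipeline, written against OpenCV and scikit-learn rather than SLEAP's internal implementation. The function name, default values, and descriptor/histogram choices are illustrative assumptions, not the settings used for the reference dataset:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def suggest_frames(frames, brisk_threshold=40, vocab_size=20,
                   pca_components=5, n_clusters=10, samples_per_cluster=5,
                   seed=0):
    """Pick a visually diverse subset of frame indices to label (illustrative)."""
    rng = np.random.default_rng(seed)
    brisk = cv2.BRISK_create(thresh=brisk_threshold)

    # 1. BRISK keypoints/descriptors per frame; the threshold controls how
    #    selective the detector is about which keypoints count as distinctive.
    descs = []
    for frame in frames:
        gray = frame if frame.ndim == 2 else cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, d = brisk.detectAndCompute(gray, None)
        descs.append(d if d is not None else np.zeros((0, 64), np.uint8))

    # 2. Bag-of-features vocabulary: cluster all descriptors into
    #    `vocab_size` visual words.
    all_descs = np.vstack([d for d in descs if len(d)]).astype(np.float32)
    vocab = KMeans(n_clusters=vocab_size, random_state=seed).fit(all_descs)

    # 3. Represent each frame as a normalized histogram over the vocabulary.
    hists = np.zeros((len(frames), vocab_size), np.float32)
    for i, d in enumerate(descs):
        if len(d):
            words = vocab.predict(d.astype(np.float32))
            hists[i] = np.bincount(words, minlength=vocab_size) / len(words)

    # 4. PCA down to a low-dimensional embedding of the histograms.
    embedded = PCA(n_components=pca_components).fit_transform(hists)

    # 5. K-means over frames, then draw `samples_per_cluster` frames from
    #    each cluster so the suggestions cover all the discovered groups.
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(embedded)
    picks = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        take = min(samples_per_cluster, len(members))
        picks.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(picks)
```

Note there are two separate k-means runs: one over descriptors to build the visual vocabulary (the vocab size), and one over whole frames to form the suggestion groups (the K-means clusters).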
-
Thanks a lot, Liezl! That should be enough to get us going. In the reference dataset, I assume you used K-means clusters = 40, since there are 40 different groups of frames, right? How did you come up with that number? What would be a good strategy to follow here: go for a number of clusters high enough to capture the different poses found in the dataset? And is there any way to check whether you fell short of that number or, instead, went for too large a number of clusters?
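Not an answer from the thread, but one general way to sanity-check the cluster count is a silhouette sweep: refit k-means for several candidate values of K on the same embedded features (e.g. the `embedded` array from the sketch above) and compare silhouette scores, where higher means better-separated clusters. A minimal sketch:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def sweep_k(features, candidates=(10, 20, 40, 80), seed=0):
    """Return {K: silhouette score}; a peak suggests a reasonable cluster count."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores
```

If the score is still climbing at your largest candidate, you may have too few clusters; if it peaks well below 40, the extra clusters are likely splitting visually similar poses.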