I was trying to use BoT-SORT with ReID on a simple video in which a single person walks along a road, first getting occluded by a small tree and then by a billboard. This is drone footage, though not from a very high altitude.
For detection I am using YOLOX with weights from bytetrack_x_mot17.pth.tar, and the ReID model is mot17_sbs_S50.pth. These are what the paper and code base use by default.
BoT-SORT was able to correctly recognise the same person when he emerged from the tree. However, when he emerges from the billboard, he is assigned a new ID. I tried increasing track_buffer, proximity_threshold, appearance_threshold, as well as match_threshold, but no luck.
So, I tried to debug the code. Here are my observations. For long occlusions (like the billboard), the IoU similarity inside the matching.iou_distance() method is [0] (a single zero for the single-person detection). This makes ious_dists = [1] (line 6 in the code excerpt below, from the official BoT-SORT repo). For long occlusions, the appearance similarity also turns out to be [0], making emb_dists = [1] (line 13). This makes the overall dists = [1]. This dists is then passed to the matching function on line 30. Since I had set match_thresh to 0.6, which is less than 1, it did not match/associate any existing tracklet with the detection bounding box of the person emerging from the billboard, and thus assigned him a new ID.
1 class BoTSORT():
2 def update(self, output_results, img):
3     # ...
4
5 # Associate with high score detection boxes
6 ious_dists = matching.iou_distance(strack_pool, detections) # this is all 1s for long occlusions
7 ious_dists_mask = (ious_dists > self.proximity_thresh)
8
9 if not self.args.mot20:
10 ious_dists = matching.fuse_score(ious_dists, detections)
11
12 if self.args.with_reid:
13 emb_dists = matching.embedding_distance(strack_pool, detections) / 2.0 # this is all 1s for long occlusions
14 raw_emb_dists = emb_dists.copy()
15 emb_dists[emb_dists > self.appearance_thresh] = 1.0
16 emb_dists[ious_dists_mask] = 1.0
17 dists = np.minimum(ious_dists, emb_dists)
18
19 # Popular ReID method (JDE / FairMOT)
20 # raw_emb_dists = matching.embedding_distance(strack_pool, detections)
21 # dists = matching.fuse_motion(self.kalman_filter, raw_emb_dists, strack_pool, detections)
22 # emb_dists = dists
23
24 # IoU making ReID
25 # dists = matching.embedding_distance(strack_pool, detections)
26 # dists[ious_dists_mask] = 1.0
27 else:
28 dists = ious_dists
29
30 matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.args.match_thresh)
So I increased match_thresh to 1.1 and it started working. However, this is just a hack, since the threshold is meant to range between 0 and 1, and setting it to anything greater than 1 effectively means: if all dists have value 1, match the existing tracks with anything that appears in the scene. If a new person appears in the scene before the occluded person comes out from behind the billboard, that new person is assigned the older person's ID!
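To illustrate why the thresh >= 1 hack is dangerous, here is a toy sketch of the thresholded assignment step. (The repo's matching.linear_assignment uses lap.lapjv with a cost limit; I am substituting SciPy's Hungarian solver here purely for illustration, with hypothetical values.)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy stand-in for matching.linear_assignment: solve the assignment,
# then reject any pair whose cost exceeds the threshold.
def assign(cost, thresh):
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= thresh]

dists = np.array([[1.0]])          # the all-ones cost seen after a long occlusion
print(assign(dists, thresh=0.6))   # [] -> no match, a new ID is created
print(assign(dists, thresh=1.1))   # [(0, 0)] -> matches, but a cost of 1 means
                                   # it would match ANY detection, including a
                                   # brand-new person entering the scene
```

This makes the failure mode concrete: with thresh=1.1, a cost-1 pair is indistinguishable from a genuinely bad match, so the first detection to appear inherits the stale track's ID.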
I observed the same behaviour when there are multiple people in the scene occluded by some object: if a new person enters the scene before any of the occluded people re-appear, that person is assigned an occluded person's ID.
I have the following questions:
Q1. Why is ious_dists = [1]? Is it because there is no overlap between the bounding boxes before and after the occlusion?
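A quick check of this hypothesis with toy box coordinates (the boxes below are made up, not from my video): for two boxes in (x1, y1, x2, y2) format that do not overlap at all, IoU is exactly 0, so iou_distance = 1 - IoU = 1.

```python
def iou(a, b):
    # Intersection rectangle; width/height clamp to 0 when the boxes are disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

track_box = (100, 100, 150, 250)    # hypothetical last predicted box before the billboard
det_box = (400, 100, 450, 250)      # hypothetical detection when the person re-appears
print(1 - iou(track_box, det_box))  # 1.0
```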
Q2. Why is emb_dists = [1]? Is it because the ReID model is unable to generate similar features for the same person before and after the occlusion?
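For context on what an emb_dists of 1 would mean: as I understand it, embedding_distance is a cosine distance between the track's feature and the detection's feature, so identical features give 0 and orthogonal (completely dissimilar) features give 1. A toy sketch with made-up 2-D features:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity of the L2-normalised features.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

f_before = np.array([1.0, 0.0])   # toy feature before the occlusion
f_same = np.array([0.98, 0.2])    # same person, slightly different pose
f_other = np.array([0.0, 1.0])    # a completely different appearance

print(cosine_distance(f_before, f_same))   # small -> good appearance match
print(cosine_distance(f_before, f_other))  # 1.0 -> the emb_dists of 1 I observed
```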
Q3. If the answer to Q2 is yes, do I need to use a ReID model fine-tuned on my dataset? Just for reference, the BoT-SORT paper says this:
For the feature extractor, we trained FastReID’s [19] SBS-50 model for MOT17 and MOT20 with their default training strategy, for 60 epochs.
while the fast-reid paper says this:
We propose a cross-domain method FastReIDMLT that adopts mixture label transport to learn pseudo label by multi-granularity strategy. We first train a model with a source-domain dataset and then finetune on the pre-trained model with pseudo labels of the target-domain dataset.
Q4. If the answer to Q3 is yes, is there any approach/model that allows us to do re-identification without fine-tuning the ReID model?
Q5. I am also doubtful about the above part of the code. If ious_dists is all 1s (line 6), ious_dists_mask becomes all True (line 7), which makes emb_dists all 1s on line 16, making dists all 1s on line 17. My understanding was that we should be relying on appearance similarity for long occlusions, but here zero IoU similarity nullifies the appearance similarity for exactly those cases. Isn't that wrong? Or am I missing something?
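To make Q5 concrete, here is a minimal NumPy sketch of lines 6-17 with hypothetical numbers. Suppose the ReID features had still matched well after the occlusion (raw cosine distance 0.1): the IoU mask overrides that good appearance match anyway.

```python
import numpy as np

proximity_thresh = 0.5
appearance_thresh = 0.25

# Line 6: track and detection boxes no longer overlap -> iou_distance = 1.
ious_dists = np.array([[1.0]])

# Line 7: every entry above proximity_thresh is masked out.
ious_dists_mask = ious_dists > proximity_thresh   # [[True]]

# Line 13: hypothetically good appearance match (raw cosine distance 0.1, halved).
emb_dists = np.array([[0.1]]) / 2.0               # [[0.05]]

# Line 15: clip distances above appearance_thresh (no effect here, 0.05 < 0.25).
emb_dists[emb_dists > appearance_thresh] = 1.0

# Line 16: the IoU mask erases the good appearance match.
emb_dists[ious_dists_mask] = 1.0                  # [[1.0]]

# Line 17: the final cost is 1 regardless of appearance.
dists = np.minimum(ious_dists, emb_dists)
print(dists)  # [[1.]]
```

So with this gating, appearance can only rescue an association whose IoU distance is already below proximity_thresh, which seems to exclude exactly the long-occlusion case.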