You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What you get here is no error. It just works. I just wondered in my log that it doesn't seem to converge, but otherwise, all looked reasonable.
The bug is: loss here is [B,T+1,T], because T != T+1, so it gets broadcasted.
For other functions like rf.combine, rf.compare, rf.clip_by_value, rf.where, we have the arg allow_broadcast_all_sources. Or rf.concat has allow_broadcast. The default is False.
Note, the implementation of rf.cross_entropy in the general case just uses rf.gather or rf.matmul, which both don't have such an argument. For both matmul/gather, there are also many valid cases where broadcasting would be wanted (maybe broadcasting is also the wrong term, not sure). So if we would add such an argument, maybe the default would be True.
I'm not sure if there are many valid cases for rf.cross_entropy where broadcasting would make sense. So here the default should rather be False. I think it would break almost no setups, except my bugged code above, where it was unintentionally wrong anyway.
But I'm not sure.
The text was updated successfully, but these errors were encountered:
I had this bug:
What you get here is no error. It just works. I just wondered in my log that it doesn't seem to converge, but otherwise, all looked reasonable.
The bug is:
loss
here is [B,T+1,T], because T != T+1, so it gets broadcasted.For other functions like
rf.combine
,rf.compare
,rf.clip_by_value
,rf.where
, we have the argallow_broadcast_all_sources
. Orrf.concat
hasallow_broadcast
. The default is False.Note, the implementation of
rf.cross_entropy
in the general case just usesrf.gather
orrf.matmul
, which both don't have such an argument. For bothmatmul
/gather
, there are also many valid cases where broadcasting would be wanted (maybe broadcasting is also the wrong term, not sure). So if we would add such an argument, maybe the default would be True.I'm not sure if there are many valid cases for
rf.cross_entropy
where broadcasting would make sense. So here the default should rather be False. I think it would break almost no setups, except my bugged code above, where it was unintentionally wrong anyway.But I'm not sure.
The text was updated successfully, but these errors were encountered: