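import numpy

def softmax_sample(distribution, temperature: float):
    # Each row of distribution is (visit count, action index).
    if temperature == 0:
        temperature = 1
    distribution = numpy.array(distribution, dtype=float)
    # Apply the temperature to the visit counts only, not to the action indices.
    distribution[:,0] = distribution[:,0]**(1/temperature)
    p_sum = distribution[:,0].sum()
    sample_temp = distribution[:,0]/p_sum
    action = distribution[int(numpy.argmax(numpy.random.multinomial(1, sample_temp, 1)))][1]
    return 0, int(action)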
This is because distribution is a 2D array in which every row has two values, like this:
[[ 0.  0.]
 [ 4.  1.]
 [ 1.  2.]
 [ 0.  3.]
 [ 0.  4.]
 [ 0.  5.]
 [ 0.  6.]
 [ 0.  7.]
 [ 0.  8.]
 [ 0.  9.]
 [ 0. 10.]
 [ 0. 11.]
 ...
The first value in each row is the visit count and the second value is the action index.
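As a small illustration of that layout, the sketch below rebuilds the first few rows of the array shown above and splits it into its two columns. The names `visit_counts` and `actions` are only illustrative and do not come from the original code.

```python
import numpy

# First rows of the example array: (visit count, action index).
distribution = numpy.array([[0, 0], [4, 1], [1, 2], [0, 3]], dtype=float)

visit_counts = distribution[:, 0]  # column 0: how often each action was visited
actions = distribution[:, 1]       # column 1: the action index itself

print(visit_counts.sum())  # total number of visits across these rows
```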
p_sum should be computed from the first value only, so we use distribution[:,0].
When choosing the action, we should return the second value of the sampled row, so we use distribution[int(numpy.argmax(numpy.random.multinomial(1, sample_temp, 1)))][1].
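To check these two points, here is a minimal sketch that samples from the first rows of the example array. It swaps in `numpy.random.default_rng(...).choice` for the `multinomial` plus `argmax` combination, which is a different but equivalent way to draw one row index. The helper name `sample_action` and the fixed seed are assumptions made for the illustration.

```python
import numpy

def sample_action(distribution, temperature, rng):
    """Hypothetical helper: draw one action index in proportion to visit counts."""
    distribution = numpy.array(distribution, dtype=float)
    if temperature == 0:
        temperature = 1  # same fallback as in softmax_sample
    # The temperature is applied to the visit counts (column 0) only.
    weights = distribution[:, 0] ** (1 / temperature)
    probabilities = weights / weights.sum()
    # Pick a row index, then return the action index stored in column 1.
    row = rng.choice(len(distribution), p=probabilities)
    return int(distribution[row, 1])

rng = numpy.random.default_rng(0)  # fixed seed, only for reproducibility
rows = [[0, 0], [4, 1], [1, 2], [0, 3]]
samples = [sample_action(rows, 1.0, rng) for _ in range(1000)]
# Only actions with a nonzero visit count (1 and 2) can be drawn.
print(sorted(set(samples)))
```

With a temperature below 1, the draw concentrates further on action 1, the most visited one.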
In short, the function softmax_sample should be changed as shown above.