Chapter 8: Training loop and min_progress #34

Open
frank-roesler opened this issue Dec 8, 2021 · 1 comment

Comments

@frank-roesler

Unless I'm mistaken, there is something odd about the main training loop (Listing 8.13) for the Super Mario game in Chapter 8. The way that the current x-position is checked against the min_progress parameter makes no sense to me.
More precisely: in line 23 of the main training loop, the environment step is taken (repeated 6 times) and last_x_pos is set to the current x-position:

```python
state2, e_reward_, done, info = env.step(action)
last_x_pos = info['x_pos']
```

In the lines of code that follow, neither last_x_pos nor info['x_pos'] is changed. Then, in line 33, the two are compared to one another:

```python
if episode_length > params['max_episode_len']:
    if (info['x_pos'] - last_x_pos) < params['min_progress']:
        done = True
    else:
        last_x_pos = info['x_pos']
```

Isn't info['x_pos'] - last_x_pos always going to be zero here? This would always reset the environment as soon as episode_length > params['max_episode_len'].
What is the min_progress parameter meant to capture intuitively? The progress from the beginning to the end of one episode? The progress from step 0 to max_episode_len? Or the progress relative to some checkpoint within a certain number of steps? If so, how are these checkpoints chosen?
This hasn't become clear to me yet, either from the book or from the code.
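
For what it's worth, below is a minimal, self-contained sketch of one possible reading of the check, in which last_x_pos acts as a checkpoint that is only updated when the progress test passes, so min_progress would mean "advance at least this far every max_episode_len steps". The parameter values and the helper function are illustrative assumptions on my part, not taken from the book's listing.

```python
params = {'max_episode_len': 100, 'min_progress': 15}   # illustrative values

def steps_until_forced_reset(x_positions):
    """Return the step index at which the episode would be force-ended for
    lack of progress, or None if the position stream ends first."""
    last_x_pos = x_positions[0]        # checkpoint: x-position at the last check
    episode_length = 0
    for t, x_pos in enumerate(x_positions):
        episode_length += 1
        if episode_length > params['max_episode_len']:
            if (x_pos - last_x_pos) < params['min_progress']:
                return t               # done = True: too little progress since the checkpoint
            last_x_pos = x_pos         # enough progress: record a new checkpoint
            episode_length = 0         # and restart the step counter
    return None

# An agent that walks right and then stalls at x = 300 is reset only after it
# stops making progress, rather than after a fixed number of steps.
stalled_run = [min(3 * t, 300) for t in range(500)]
print(steps_until_forced_reset(stalled_run))   # 201 with these values
```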

@frank-roesler
Author

Addendum:
This also explains why in Figure 8.19 of the book the training time for each episode is always exactly the same (i.e., the horizontal distance between consecutive peaks is identical): the training loop always runs for params['max_episode_len'] steps and then resets.
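
A small self-contained check of this observation (with illustrative parameter values, not the book's hyperparameters): because last_x_pos is overwritten on every step, the progress difference is always zero, and the episode is force-ended on the first step after max_episode_len no matter how far the agent actually moves.

```python
params = {'max_episode_len': 100, 'min_progress': 15}   # illustrative values

episode_length = 0
x_pos = 0
done = False
while not done:
    episode_length += 1
    x_pos += 3                 # the agent advances steadily the whole time
    last_x_pos = x_pos         # overwritten every step, as in the quoted listing
    if episode_length > params['max_episode_len']:
        if (x_pos - last_x_pos) < params['min_progress']:   # always 0 < 15
            done = True

print(episode_length)          # prints 101, i.e. max_episode_len + 1, every time
```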
