Q-Learning pseudocode | Mathematical notation #432

fardinafdideh · 2023-12-09T15:05:22Z

Hi,
My remark is about the mathematical notation of Q-Learning pseudocode in unit2.ipynb.
I found the following notation a little bit confusing:
Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
The maximization should be taken over all possible values of the action variable (the second argument of the two-variable function Q). As written, however, max Q(s',a') reads as maximizing Q at the specific point (s', a'), where both arguments are already fixed. The distinction becomes clearer if general variables are written in lowercase and specific points in uppercase, e.g., the function Q(s, a) evaluated at the specific point s=S, a=A is written Q(S, A).
So:

Current version: max Q(s',a') implies maximization of the two-variable function Q at the specific point (s', a') (since s' has been defined to be a specific point).
Suggested version: max_a Q(S',a) implies maximization of Q over its second variable a, with its first variable fixed at the specific point S'.
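To illustrate the suggested reading, here is a minimal sketch of one tabular update in plain Python. It is not the code from unit2.ipynb: the function name `q_learning_update`, the list-of-lists Q-table, and the example numbers are all hypothetical. The point is only that S' stays fixed while the max runs over the action index.

```python
def q_learning_update(Q, S, A, R, S_next, lr=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new Q[S][A].

    Q is a hypothetical list-of-lists table indexed as Q[state][action].
    """
    # max_a Q(S', a): S_next is a fixed point, only the action a varies.
    best_next_value = max(Q[S_next])
    td_target = R + gamma * best_next_value
    # Q(S, A) <- Q(S, A) + lr * [R + gamma * max_a Q(S', a) - Q(S, A)]
    Q[S][A] += lr * (td_target - Q[S][A])
    return Q[S][A]


# Hypothetical table with 2 states and 3 actions (values chosen for illustration).
Q = [
    [0.0, 0.0, 0.0],
    [1.0, 5.0, 2.0],
]
new_value = q_learning_update(Q, S=0, A=1, R=1.0, S_next=1, lr=0.5, gamma=0.9)
```

Here the action A taken in S plays no role in the max: the maximization scans the whole row Q[S'], which is what the subscript in max_a makes explicit.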

Q-Learning pseudocode | Mathematical notation

604e5da

simoninithomas mentioned this pull request Dec 12, 2023

January Update #439

Closed

11 tasks

simoninithomas mentioned this pull request Mar 1, 2024

MARCH 2024 Big Update #496

Closed
