
Stock_NeurIPS2018_2_Train.ipynb: clarification on state space, action space #92

Open
ra9hur opened this issue Aug 5, 2024 · 6 comments

Comments


ra9hur commented Aug 5, 2024

I have a couple of questions. They should be trivial, but somehow I'm not getting them.

  1. Action Space
    As mentioned in the description, for a single share the action space should just be [buy, sell, hold], i.e. {-1, 0, 1}. For multiple shares, say 10, the action space is {-10, ..., -1, 0, 1, ..., 10},
    which should be equal to
    action_space = 2 * stock_dimension + 1

However, referring to env_kwargs in the workbook, "action_space": stock_dimension is used.
Can you please clarify?

  2. State Space
    Also, can you help me understand how you arrived at state_space?
    state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension

I could understand the state variables corresponding to len(INDICATORS)*stock_dimension.
Why is (1 + 2*stock_dimension) being added?
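
For reference, the sizing logic under discussion looks roughly like the sketch below; only the state_space formula and "action_space": stock_dimension are taken from the notebook as quoted here, while the indicator list and the remaining env_kwargs keys are placeholders of mine.

```python
# Rough sketch of the notebook's sizing logic (indicator list is a placeholder).
stock_dimension = 30                                  # e.g. the 30 Dow Jones tickers
INDICATORS = ["macd", "rsi_30", "cci_30", "dx_30"]    # hypothetical 4-indicator list

# 1 cash balance + one price and one holding per stock + one value per indicator per stock
state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension

env_kwargs = {
    "state_space": state_space,           # 181 with this 4-indicator list
    "action_space": stock_dimension,      # one continuous action per ticker
    # ... other keys (hmax, costs, reward scaling, etc.) omitted here
}
print(state_space)                        # 181
```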

ZiyiXia (Collaborator) commented Aug 6, 2024

  1. stock_dimension is used as the action space because the agent needs to make a decision in [buy, sell, hold], with a certain amount, for each ticker (30 for the Dow Jones Index).
  2. state_space = 1 (remaining balance in the account) + 2*stock_dimension (the prices of the 30 stocks plus the share holdings of the 30 stocks, so 2*stock_dimension in total) + len(INDICATORS)*stock_dimension (see the sketch below).

I'm not sure if I understand your question correctly, just lmk
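
To make that decomposition concrete, here is a minimal sketch of how such a state vector could be assembled for the 30-ticker case; the function and variable names are illustrative, not the notebook's own code:

```python
import numpy as np

def build_state(balance, prices, holdings, indicators):
    """Flatten the pieces listed above into one observation vector.

    balance    : float, remaining cash in the account        -> 1 entry
    prices     : shape (stock_dim,), current prices          -> stock_dim entries
    holdings   : shape (stock_dim,), shares currently held   -> stock_dim entries
    indicators : shape (stock_dim, n_indicators)              -> stock_dim * n_indicators entries
    """
    return np.concatenate(([balance], prices, holdings, indicators.ravel()))

stock_dim, n_indicators = 30, 8
state = build_state(
    balance=1_000_000.0,
    prices=np.ones(stock_dim),
    holdings=np.zeros(stock_dim),
    indicators=np.zeros((stock_dim, n_indicators)),
)
print(state.shape)   # (301,) == 1 + 2*30 + 8*30
```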

ra9hur (Author) commented Aug 6, 2024

Thanks for the response!! I understand state_space now.

ra9hur (Author) commented Aug 6, 2024

Here is the description provided for action space in the notebook.

Action: The action space describes the allowed actions that the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one stock. Also, an action can be carried upon multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively

Going by the above description:
If we are trading only 1 stock, the possible actions are [buy, sell, hold], so
action_space = 1 (for hold) + 2 * stock_dimension (buys and sells for 1 stock) = 3

If there are 30 stocks, the possible actions should be 30 sells, 30 buys, and a hold.
So, as I understand it, the formula for the action space should be
action_space = 1 (for hold) + 2 * stock_dimension (buys and sells for 30 stocks) = 61

However, in the notebook the action space is set equal to the stock dimension. So, for 30 stocks,
action_space = stock_dimension = 30

Can you please clarify why "action_space = stock_dimension" is used?

ZiyiXia (Collaborator) commented Aug 8, 2024

Because for each stock, the action is a scalar within a continuous space rather than a discrete space like {-1, 0, 1}. Put another way, we need just one action for each stock, and that action comes from a continuous space. Thus the action is always a vector with dimension 30, and the value of each element directly represents buy (+) / hold (0) / sell (-) for that stock.
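
A rough sketch of that interpretation, assuming the policy output is already clipped to [-1, 1] per stock and scaled by an hmax-style cap on shares per trade (the names and values here are illustrative, not the notebook's exact code):

```python
import numpy as np

def actions_to_shares(raw_actions, hmax=100):
    """Map one continuous action per stock to a signed share count.

    raw_actions : shape (stock_dim,), values in [-1, 1]
    Positive -> buy, negative -> sell, ~0 -> hold; the magnitude sets the trade size.
    """
    raw_actions = np.clip(raw_actions, -1.0, 1.0)
    return (raw_actions * hmax).astype(int)

# 3-stock toy example: buy 37 shares, sell 100 shares, hold
print(actions_to_shares(np.array([0.37, -1.0, 0.0])))   # [  37 -100    0]
```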

ra9hur (Author) commented Aug 9, 2024

Excellent!! Thanks for clarifying!!


ven7782 commented Aug 17, 2024

Thanks for opening this issue. I will use this thread rather than create a new one. I am studying the notebook Stock_NeurIPS2018.ipynb and experimenting with it, and I have noticed several issues, which I will list here in no particular order.

1.) I am using a single stock (say AAPL) for training. Here I notice that the DDPG agent is not learning at all. The actions passed to the step function converge quickly to -1 (all sell) or 1 (all buy), and the calculated reward is 0, which probably explains the convergence to one specific action. Has anyone else observed this?

2.) I see that the agent only buys and sells and never holds during learning. Negative action values mean sell and positive values mean buy. Shouldn't the action space be divided equally between sell, hold, and buy (see the sketch at the end of this comment)?
Sell: [-1, -0.33]
Hold: (-0.33, 0.33)
Buy: (0.33, 1]

3.) If the initial action is a sell, the number of shares held at the start will naturally be 0. In this case the agent is barred from selling (sell_num_shares stays 0) until some buy actions are generated; later sell actions go through because we have shares to sell. I feel this is too restrictive during the initial learning process. The agent should be allowed to sell or buy provided we have the funds.

Any comments or suggestions will be appreciated.
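
For point 2 above, a minimal sketch of the proposed hold band, assuming actions in [-1, 1] (the 0.33 thresholds come from that point; this is a suggestion, not how the notebook's environment currently behaves):

```python
def classify_action(a, hold_band=0.33):
    """Proposed split of [-1, 1] into sell / hold / buy regions."""
    if a <= -hold_band:
        return "sell"        # roughly Sell: [-1, -0.33]
    if a > hold_band:
        return "buy"         # roughly Buy:  (0.33, 1]
    return "hold"            # roughly Hold: (-0.33, 0.33]

for a in (-0.8, -0.33, 0.0, 0.33, 0.9):
    print(a, classify_action(a))   # sell, sell, hold, hold, buy
```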
