# IFT4055 - Journal
- Learned about MDPs and the Q-function (see MDP.pdf).
- Read the SMiRL paper up to page 6 (see Smirl.pdf).
Questions I need to answer:
- Auxiliary objective: what is this exactly?
- How does minimizing the R.H.S. give the maximum reward?
- Estimate of the state marginal (cannot seem to find a reference for that)
- How, and how fast, can we find the distribution that fits our p_{\theta_t}(s)?
- Maximum likelihood estimation: OK. But what is the maximum likelihood state density estimation process?
- We can't assume the states are independent, unlike in the examples I've seen. What is used for maximum likelihood in that case?
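
To make the maximum likelihood questions above more concrete, here is a minimal sketch of what "state density estimation" could look like in the simplest case. It assumes a Gaussian with independent dimensions, fitted to a buffer of visited states that are treated as i.i.d. samples (exactly the assumption I am unsure about). The names `fit_diag_gaussian`, `log_density` and `buffer` are my own and hypothetical, and the buffer is synthetic data. This is not necessarily the model the paper uses.

```python
import math
import random


def fit_diag_gaussian(states, eps=1e-6):
    """Maximum likelihood fit of a Gaussian with independent dimensions.

    The MLE is the per-dimension sample mean and the (biased, divide-by-n)
    sample variance. eps keeps the variance strictly positive.
    """
    n = len(states)
    dims = len(states[0])
    mean = [sum(s[d] for s in states) / n for d in range(dims)]
    var = [
        sum((s[d] - mean[d]) ** 2 for s in states) / n + eps
        for d in range(dims)
    ]
    return mean, var


def log_density(state, mean, var):
    """log p_theta(s) under the fitted Gaussian, summed over dimensions."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(state, mean, var)
    )


# Synthetic stand-in for a buffer of visited 2-D states (assumption).
random.seed(0)
buffer = [[random.gauss(1.0, 0.5), random.gauss(-2.0, 2.0)] for _ in range(5000)]

mean, var = fit_diag_gaussian(buffer)

# A state close to what was seen before gets a higher log-density
# than a state far from anything in the buffer.
familiar = log_density([1.0, -2.0], mean, var)
surprising = log_density([6.0, 10.0], mean, var)
print(mean, var, familiar, surprising)
```

Here the fit has a closed form, so "how fast" is just one pass over the buffer. A richer density model would presumably need gradient steps on the log-likelihood instead.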
What I (think I) need to do next:
- More reading/watching on maximum likelihood in a machine learning context
- Read the paper on the DQN algorithm: https://arxiv.org/pdf/1312.5602.pdf
- Read the paper on the TRPO algorithm
- Read the part on density estimation with learned representations?