Some basic examples for reinforcement learning

Installing Anaconda and Gymnasium

Download and install Anaconda.
Create a conda env for managing dependencies and activate it

conda create -n conda_env
conda activate conda_env

Install gymnasium (dependencies installed by pip will also go into the conda env)

pip install gymnasium[all]
pip install gymnasium[accept-rom-license]

# Try the next line if box2d-py fails to install.
conda install swig

Install ai2thor if you want to run navigation_agent.py

pip install ai2thor==2.4.10

Install torch with either conda or pip

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

pip install torch torchvision torchaudio

Install other dependencies

pip install numpy pandas matplotlib
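
After the install steps, a quick check of which packages are importable in the active conda env can save time. The sketch below uses only the standard library; the helper name find_missing and the package list are assumptions drawn from the install commands above (ai2thor is only needed for navigation_agent.py).

```python
import importlib.util

# Import names assumed from the install commands above.
PACKAGES = ["gymnasium", "torch", "numpy", "pandas", "matplotlib", "ai2thor"]


def find_missing(packages):
    """Return the packages that cannot be found in the active environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the conda env activated so that the check looks at the same interpreter that pip installed into.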

Examples

Play with the environment and visualize the agent behaviour

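import gymnasium as gym
render = True  # set to False to run without a visualization window
if render:
    env = gym.make('CartPole-v0', render_mode='human')
else:
    env = gym.make('CartPole-v0')
env.reset(seed=0)
for _ in range(1000):
    # take a random action
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        env.reset()  # start a new episode once the current one ends
env.close()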

Random play with CartPole-v0

import gymnasium as gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation, info = env.reset()  # reset() returns (observation, info)
    for t in range(100):
        print(observation)
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:  # stop stepping once the episode has ended
            break
env.close()
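
The random-play loop can also be wrapped in a small helper that reports the episode return, which is handy as a baseline for later agents. This is a minimal sketch, not code from the repository: the name run_random_episode is hypothetical, and env is assumed to follow the Gymnasium API (reset() returning (observation, info), step() returning a 5-tuple).

```python
def run_random_episode(env, max_steps=500, seed=None):
    """Play one episode with random actions; return (total_reward, steps).

    `env` is assumed to follow the Gymnasium reset/step API.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = env.action_space.sample()  # uniformly random action
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:  # episode ended or hit its time limit
            break
    return total_reward, steps
```

With Gymnasium installed, a call such as run_random_episode(gym.make('CartPole-v0'), seed=0) would be one way to use it.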

Example code for random playing (Pong-ram-v0, Acrobot-v1, Breakout-v0)

python my_random_agent.py Pong-ram-v0

Very naive learnable agent playing CartPole-v0 or Acrobot-v1

python my_learning_agent.py CartPole-v0
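
The contents of my_learning_agent.py are not reproduced here, so the following is only an illustration of what a very naive learnable agent could look like, not the script's actual method: random search over the weights of a linear policy. All names (linear_policy, episode_return, random_search) are hypothetical, env is assumed to follow the Gymnasium reset/step API, and the policy only handles two discrete actions, as in CartPole.

```python
import random


def linear_policy(weights, observation):
    """Return action 1 if the weighted sum of the observation is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, observation))
    return 1 if score > 0 else 0


def episode_return(env, weights, max_steps=200):
    """Run one episode with the linear policy and return the summed reward."""
    observation, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = linear_policy(weights, observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total += float(reward)
        if terminated or truncated:
            break
    return total


def random_search(env, obs_dim, n_trials=100, seed=0):
    """Sample random weight vectors and keep the one with the best episode return."""
    rng = random.Random(seed)
    best_weights, best_return = None, float("-inf")
    for _ in range(n_trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
        ret = episode_return(env, weights)
        if ret > best_return:
            best_weights, best_return = weights, ret
    return best_weights, best_return
```

With Gymnasium installed, random_search(gym.make('CartPole-v0'), obs_dim=4) would be one way to try it. Acrobot-v1 has three actions, so it would need a different action rule.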

Playing Pong on CPU (with a great blog). One pretrained model is pong_model_bolei.p (after training for 20,000 episodes), which you can load by replacing save_file in the script.

python pg-pong.py
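
Policy-gradient training of the kind the script name suggests typically weights each action by the discounted return that followed it. The sketch below shows that one ingredient in generic form; it is not taken from pg-pong.py, and the function name and the default gamma are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, discounted_returns([1, 1, 1], gamma=0.5) gives [1.75, 1.5, 1.0]: later rewards count for less the further they lie in the future.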

Random navigation agent in AI2THOR

python navigation_agent.py
