Project examples

Here are some examples of course projects.

Hearthstone/MTG/* bot

There is a popular genre of video and board games that involves collecting cards and playing them in a turn-based strategy fashion:

  • Hearthstone
  • Good old Magic the Gathering
  • Gwent
  • you'll find 100+ others if you google

Most of these games have a replay system and even an API, e.g. this.

The first challenge is to pre-train your bot on human expert sessions. The core problem to solve is how to efficiently generalize over a large action space of possible cards, only a fraction of which is available at any given time. Luckily for us, there's an article for this.
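
A minimal sketch of one common trick for such action spaces (an illustration, not necessarily the article's exact method): embed every card, score it against the current game state, and mask out the cards that cannot be played right now. All shapes and names below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CARDS, EMB_DIM, STATE_DIM = 500, 64, 128

# Hypothetical learned parameters: one embedding per card,
# plus a projection from the game state into the same space.
card_embeddings = rng.normal(size=(N_CARDS, EMB_DIM))
state_projection = rng.normal(size=(STATE_DIM, EMB_DIM))

def score_actions(state, available_mask):
    """Score every card against the current state, then mask
    the cards that are not currently playable."""
    query = state @ state_projection        # (EMB_DIM,)
    scores = card_embeddings @ query        # (N_CARDS,)
    return np.where(available_mask, scores, -np.inf)

state = rng.normal(size=STATE_DIM)
available = np.zeros(N_CARDS, dtype=bool)
available[[3, 42, 111]] = True              # e.g. the cards in hand
best_card = int(np.argmax(score_actions(state, available)))
```

Because every card shares the same embedding space, the bot can score cards it rarely saw during training, which is the point of generalizing over the action space.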

The things your bot may learn to do include:

  • building a deck that counters your opponent's (may involve heavy deep learning: metric learning / DSSM)
  • pre-training on human replay sessions
  • playing & training vs. itself or a human 'expert' (see the self-play sketch below)

[ysda/hse] Should you wish, we can offer extensive assistance with theory and coding for this project.

Scalable RL

There have been several successful attempts to speed up RL by introducing many parallel computation nodes.

The grand quest is to reproduce this approach for the newly developed PGQ and its continuous version. Trying out different kinds of parameter server would also be really nice.

  • The architecture is "many nodes that play, few nodes that train"
  • Tech stack: redis as a DB, any DL framework you want (see the sketch below).
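
A minimal sketch of that layout with redis as the shared store (the key names "weights" and "transitions" are made up; the ML parts are elided):

```python
import pickle
import redis

r = redis.Redis(host="localhost", port=6379)

# --- on each player node: pull the latest weights, push experience ---
def player_step(transition):
    raw = r.get("weights")                      # latest parameters, if published
    weights = pickle.loads(raw) if raw else None
    # ... act with `weights` in the game, collect `transition` ...
    r.rpush("transitions", pickle.dumps(transition))

# --- on the trainer node: consume experience, publish new weights ---
def trainer_step(weights, batch_size=32):
    batch = []
    for _ in range(batch_size):
        _key, raw = r.blpop("transitions")      # blocks until data arrives
        batch.append(pickle.loads(raw))
    # ... do a gradient step on `batch`, producing updated `weights` ...
    r.set("weights", pickle.dumps(weights))
```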

Taken by: udimus, bestxolodec

Benchmarking Q-learning with double/dueling/bootstrap/prioritized_er/constrained/soft_targetnet/PGQ on Doom

This project aims to figure out whether the numerous articles about improving DQN actually improve it :) We cover an array of these methods in the week 5 lecture.

Most of them are benchmarked on Atari envs, so it makes sense to treat non-Atari problems as a "private test set".

Luckily, there's a set of such problems called VizDoom. The simplest of the Doom problems was already covered in assignment 4.2.

So the goal is to reproduce those articles and see how well they generalize to Doom envs with minimal tuning.

It is not necessary to implement all of them, just the ones you like most. The project milestones will include one or a few methods at a time.
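
For a flavor of what one such tweak looks like, here's a minimal double-DQN target computation (plain numpy over tabular Q-values for clarity; with a network you'd use its outputs instead):

```python
import numpy as np

GAMMA = 0.99

def double_dqn_targets(q_online, q_target, rewards, next_states, dones):
    """Double DQN: the online net chooses the next action,
    the target net evaluates it, reducing overestimation bias."""
    best_actions = q_online[next_states].argmax(axis=1)
    next_values = q_target[next_states, best_actions]
    return rewards + GAMMA * next_values * (1.0 - dones)

# toy check: 5 states x 3 actions, a batch of 2 transitions
rng = np.random.default_rng(0)
q_online = rng.normal(size=(5, 3))
q_target = rng.normal(size=(5, 3))
targets = double_dqn_targets(
    q_online, q_target,
    rewards=np.array([1.0, 0.0]),
    next_states=np.array([2, 4]),
    dones=np.array([0.0, 1.0]),
)
```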

Deep attribution

The task is simple: given the episodic reward R(z), try to deduce per-step rewards r(s,a) such that maximizing them is equivalent to maximizing R(z), only easier :). We have a baseline that does so on tabular envs; the goal is to generalize it to the "deep RL" case.
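
A minimal sketch of the idea with a linear stand-in for the reward model (an illustration, not the course baseline): fit per-step rewards so that they sum to the observed episodic return.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EPISODES, T, FEAT_DIM = 200, 10, 8

# Hypothetical featurization phi(s_t, a_t) for each step of each episode.
features = rng.normal(size=(N_EPISODES, T, FEAT_DIM))
true_w = rng.normal(size=FEAT_DIM)

# Only the episodic return R(z) = sum_t r(s_t, a_t) is observed.
episodic_R = (features @ true_w).sum(axis=1)    # (N_EPISODES,)

# If r(s, a) = phi(s, a) . w, then sum_t phi_t . w = R(z),
# so w can be recovered by regression on per-episode summed features.
summed_phi = features.sum(axis=1)               # (N_EPISODES, FEAT_DIM)
w_hat, *_ = np.linalg.lstsq(summed_phi, episodic_R, rcond=None)

per_step_rewards = features @ w_hat             # recovered r(s_t, a_t)
```

In the "deep RL" case, the linear model would become a neural net trained under the same sum constraint.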