Introduction
In this article, we will see how the StarCraft 2 environment is used with Reinforcement Learning Coach. We will first set up the StarCraft 2 environment by installing the game. We will see how DeepMind's model works, and then use Coach to create a bot that learns to play the game well. As we train, we will save checkpoints, and these saved models will be the basis for the OpenVINO toolkit Model Optimizer. From the optimizer we will generate the XML and bin files needed for inference. We also save GIF files during training and inference to show how the simulation improves over the course of training. Finally, we will touch on the RL Coach dashboard.
The flow for StarCraft 2, RL Coach, and the OpenVINO toolkit
In the next figure, we will see the flow for RL Coach and the OpenVINO toolkit for StarCraft 2.
Let us discuss the flow:
- According to the flow, we first set up the StarCraft 2 client for the Reinforcement Learning process.
- Next, we connect Reinforcement Learning Coach to the StarCraft 2 API. The algorithms in RL Coach that support this environment are A3C and Dueling DDQN.
- The general objective of the game scenario is to mine minerals.
- After that, we start the training process in Reinforcement Learning Coach, saving the model as checkpoints.
- From the saved model, we run the Model Optimizer, which creates an XML and a bin file for further inference.
- Inference is then performed on the model we trained, using the latest checkpoint that was saved.
The StarCraft 2 learning environment has been released as an open-source application that helps in training Reinforcement Learning models.
With this open-source tool, we can use various StarCraft 2 scenarios as Reinforcement Learning environments.
The StarCraft 2 AI environment is a testing scenario for training an AI using the concepts of Deep Reinforcement Learning.
We will train the StarCraft 2 model using Reinforcement Learning Coach and optimize it using the OpenVINO toolkit.
We will be training a Deep Q model with Dueling DDQN, as well as A3C, on the Collect Mineral Shards mini-game of DeepMind's StarCraft 2 AI environment.
The general scenario for the game is to collect mineral shards.
How the Deep Q Learning approach evolved
DeepMind's first attempts to run the simulations were based on Atari games.
DeepMind's team created an algorithm known as the Deep Q Learner and used it to master a range of Atari games. More details are given in the paper below.
https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning
DeepMind combined two different ideas from Machine Learning. The first was deep learning, in this case learning useful features directly from the game screen: a CNN takes the raw pixels of the game window and learns dense representations from them. During training, the network's output was mapped to an action, such as the up, down, left, or right arrow keys.
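To make this concrete, a network of roughly that shape can be sketched in a few lines of Keras (an illustrative sketch only; the layer sizes and the four-action output are placeholders, not DeepMind's exact architecture):

import tensorflow as tf

# Sketch of a Deep Q network: stacked game frames in, one Q-value per action out.
# Layer sizes and the 4-action output are illustrative placeholders.
def build_q_network(num_actions=4):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=(84, 84, 4)),   # 4 stacked grayscale frames
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_actions)                 # Q(s, a) for each action
    ])

q_net = build_q_network()
q_net.summary()

At play time, the action with the highest predicted Q-value is the key the agent presses for that frame.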
More details on DeepMind's work can be found at the link below (https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/).
The flow of the process, as described by DeepMind for Atari games, is shown in the figure below.
The second idea was Q Learning: the system did not only take sensory input (the game screen) from the game, it also learned action values. Q Learning is a type of Reinforcement Learning that builds up a Q matrix of state-action values. The algorithm picks an action according to the current policy, observes the outcome in the game, receives a reward of +1 when the move pays off and -1 when it does not, and updates the Q matrix based on that result.
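In its simplest tabular form, that Q matrix update can be written in a few lines of Python (a minimal sketch; the learning rate, discount factor, and table sizes are placeholder values):

import numpy as np

num_states, num_actions = 16, 4
Q = np.zeros((num_states, num_actions))   # the Q matrix
alpha, gamma = 0.1, 0.99                  # learning rate and discount factor (placeholders)

def q_update(state, action, reward, next_state):
    # Move Q(s, a) towards the observed reward plus the best estimated future value.
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

# Example: taking action 2 in state 0 earned a reward of +1 and led to state 5.
q_update(state=0, action=2, reward=1.0, next_state=5)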
Alpha Go
The following figure shows how DeepMind built an AI capable of beating grandmasters at the Chinese game of Go.
More details on Alpha Go can be found here
The game is so complex that its search space is far too vast for an AI to brute-force through all the options; there are simply too many combinations of game states to compute.
DeepMind's algorithm was nevertheless good enough to beat a grandmaster at Go.
AlphaGo uses two different neural networks:
- one is a policy network
- one is a value network
Each of them computes a different quantity.
Using the Reinforcement Learning approach for both policy and value, the algorithm could move through the gigantic tree search much faster.
The tree search is called a Monte Carlo tree search.
Monte Carlo Tree Search simulates a search tree: at each time step the AI selects an action based on its action value, a prior probability that comes from the policy network, and an exploration parameter. In other words, the outputs of the policy network and the value network guide the search through the tree of possible moves at every time step, and this is how AlphaGo was trained to compete at an expert level.
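The selection step described above can be sketched roughly as follows (illustrative only; the exploration constant and the bookkeeping are simplified compared to the published AlphaGo search):

import math

def select_action(Q, N, P, c_puct=1.0):
    # Q: mean action value per move, N: visit counts, P: policy-network priors.
    total_visits = sum(N.values())
    def score(a):
        # Value term plus an exploration bonus scaled by the prior probability.
        exploration = c_puct * P[a] * math.sqrt(total_visits) / (1 + N[a])
        return Q[a] + exploration
    return max(P, key=score)

# Toy example with three candidate moves.
Q = {"a": 0.2, "b": 0.5, "c": 0.1}
N = {"a": 10, "b": 30, "c": 2}
P = {"a": 0.3, "b": 0.5, "c": 0.2}
print(select_action(Q, N, P))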
StarCraft 2 principles
When we build a strategy for a StarCraft 2 AI, we are obviously thinking in terms of building a world-class StarCraft 2 player.
The key considerations for the game are:
- When we should speed up our wealth-gathering process
- How we should build our army
DeepMind, in collaboration with Blizzard Entertainment, released this open-source StarCraft 2 learning environment, known as PySC2. PySC2 has the following components:
It has an API wrapper written entirely in Python, a dataset of anonymized replays, and a set of Reinforcement Learning mini-games.
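To give a feel for the Python API, a minimal agent that issues a no-op action at every step could look like the sketch below (written in the style of the PySC2 example agents; the class name is our own):

from pysc2.agents import base_agent
from pysc2.lib import actions

class IdleAgent(base_agent.BaseAgent):
    # Minimal PySC2 agent: do nothing at every game step.
    def step(self, obs):
        super(IdleAgent, self).step(obs)
        return actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])

An agent like this can typically be pointed at a map using the pysc2.bin.agent runner and its --agent flag.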
Steps required to install PySC2
Instructions for installing PySC2 are given at the following link:
https://github.com/deepmind/pysc2
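For reference, the package is usually installed from PyPI with a single pip command (see the repository above for the full, up-to-date instructions):
pip install pysc2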
After the installation is complete, we should test it.
From the terminal, we run the following command:
python -m pysc2.bin.agent --map CollectMineralShards
This starts the StarCraft 2 client and the agent process.
We get the following response in the terminal.
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
I0116 22:15:50.053973 140673773569792 sc_process.py:110] Launching SC2: /home/abhi/StarCraftII/Versions/Base55958/SC2_x64 -listen 127.0.0.1 -port 21071 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-bmdze23u/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50
I0116 22:15:50.056916 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 0, running: True
Version: B55958 (SC2.3.16)
Build: Jul 31 2017 13:19:41
Command Line: '"/home/abhi/StarCraftII/Versions/Base55958/SC2_x64" -listen 127.0.0.1 -port 21071 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-bmdze23u/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50'
Starting up...
Startup Phase 1 complete
I0116 22:15:51.060845 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 1, running: True
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:21071 (21071)
Startup Phase 3 complete. Ready for commands.
I0116 22:15:52.063097 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 2, running: True
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface enabled
Configure: score interface enabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
After closing the application or terminal, we will look at the RL Coach presets available for the StarCraft 2 environment. They are:
Starcraft_CollectMinerals_A3C
Starcraft_CollectMinerals_Dueling_DDQN
RL Coach and OpenVINO
In this section, we will see how RL Coach and the StarCraft 2 environment work together for the optimization process.
Let us start with the A3C algorithm option.
We pass the following command for training and saving the model. (We save the model so that the meta file is generated and can be used for the model optimization process; we also dump GIF files during training.)
coach -r -p Starcraft_CollectMinerals_A3C -s 300 -dg
The following output appears in the terminal window.
Creating graph - name: BasicRLGraphManager
Version: B55958 (SC2.3.16)
Build: Jul 31 2017 13:19:41
Command Line: '"/home/abhi/StarCraftII/Versions/Base55958/SC2_x64" -listen 127.0.0.1 -port 17180 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-jn9_1iuh/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50'
Starting up...
Startup Phase 1 complete
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:17180 (17180)
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface enabled
Configure: score interface enabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
In the experiment folder, we can see that checkpoint files are saved as training progresses.
For the model optimization process, we go to the OpenVINO deployment_tools folder again and use mo_tf.py to create the XML and bin files.
python mo_tf.py --input_meta_graph ~/experiments/16_01_2019-22_27/checkpoint/40_Step-40715
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: None
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/.
- IR output name: 40_Step-407151.ckpt
- Log level: SUCCESS
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
The generated XML and bin files are used for the inference part with OpenVINO.
We have successfully created the files needed for inference.
Inferring using our model
Now that we have generated the XML and bin files for the final inference, we pass the path to the generated XML/bin with the -m parameter and the Reinforcement Learning algorithm with the -i option, so that we can run the simulation with the best-performing checkpoints from RL Coach, using a build targeted at the CPU.
./rl_coach -m <xmlbin path> -i <algorithm> -d CPU
./rl_coach -m 0060.xml -i A3C -d CPU
As we run the inference, we can pull up the best possible result for the StarCraft 2 game. We save a GIF for each result we get, so in this case the best of all the GIF files is shown.
The GIFs below show the progress before and after training.
Now we will look at an additional feature of RL Coach known as the dashboard.
Debugging RL algorithms is a tough process, but RL Coach ships with a built-in tool named Dashboard that helps visualize the training signals. Importantly, the dashboard updates dynamically while the agent is training.
It also allows comparing signals, for example by overlaying one on another.
Let us see how the dashboard works
We can also open up dashboard files directly from the terminal using the following command.
dashboard -f ~/experiments/Cartpole_A3C/21_05_2019-00_05/worker_0.simple_rl_graph.main_level.main_level.agent_0.csv
It directly opens the CSV in GUI mode.
Whenever training is running or has completed for an environment, a CSV and a JSON file are saved in the experiments folder.
When we launch the RL Coach dashboard using the dashboard command in the terminal, we first have to select the CSV file to visualize how the algorithm is doing.
We activate the Anaconda environment where we installed Coach and run the command discussed above.
As soon as the command starts working, the tool opens in a browser.
We will go to the experiments folder and select the CSV file as shown in the figure.
After that, we will be in the interactive dashboard window.
The dashboard is built internally with the Bokeh interactive visualization library.
We can select from the data options to see the training signals.
We can also compare two signals by holding the Control key and selecting more than one signal.
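For quick checks outside the GUI, the same experiment CSV can also be loaded directly, for example with pandas (a sketch; the path and the column names such as "Episode #" and "Training Reward" are illustrative and may differ between Coach versions):

import os
import pandas as pd
import matplotlib.pyplot as plt

# Path and column names are illustrative; adjust them to your own experiments folder.
csv_path = os.path.expanduser(
    "~/experiments/Cartpole_A3C/21_05_2019-00_05/"
    "worker_0.simple_rl_graph.main_level.main_level.agent_0.csv")
df = pd.read_csv(csv_path)

print(df.columns.tolist())                   # list the available training signals
df.plot(x="Episode #", y="Training Reward")  # plot one signal against episodes
plt.show()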
In the second part of the article, we have covered how StarCraft 2 works.
We have seen how the StarCraft 2 environment is initialized for the training process.
We have touched on some interesting algorithms from DeepMind.
We then installed the StarCraft 2 AI client.
We configured the StarCraft 2 client to work with RL Coach.
We initiated the training process, with checkpoint files as well as GIF files generated at periodic intervals.
The generated checkpoint files were later optimized with the Intel OpenVINO toolkit.
For inference, we passed in the XML file, and the latest simulation was shown.
Finally, we touched on the dashboard, a visualization tool for Coach, and its important features.
This article has shown how to get started with the StarCraft 2 environment using RL Coach and OpenVINO.