"Ju, father of LSTM ̈ Rgen schmidhuber has built a worldview model for machines by learning from the model of human cognition of the world. This article takes you to read a new masterpiece by the father of LSTM. At the same time, it teaches you how to train an AI racer with a simple worldview.
Jürgen Schmidhuber, the father of LSTM, has a new piece of work!
This time, he drew on how humans build mental models of the world to give machines a world model of their own.
Much evidence suggests that, to cope with the flood of information in daily life, the human brain learns to abstract this spatial and temporal information, so that we can analyse the complex world around us quickly and accurately. What we "see" at any moment is also shaped by the brain's prediction of the future world.
For example, a baseball player can hit a 100-mile-per-hour fastball with ease because the brain accurately predicts the ball's trajectory.
So, can we teach machines to learn such a world model? And what will machines be capable of once they have one?
Today we walk you through the new work by the father of LSTM, and teach you to train an AI racer with a simple world model. Give it a try and see how powerful it is!
Posing the problem
Let's explore the question through a concrete case: how do we give a machine a world model?
Suppose we want to train an AI racer to drive well on a 2D track. An example is shown below.
At each time step, the AI racer observes its surroundings (a 64 × 64 pixel colour image) and then decides on and performs an action: steering (−1 to 1), acceleration (0 to 1), and braking (0 to 1). After it acts, the environment returns the next observation, and the process repeats.
Its goal is to finish the track in the shortest possible time.
Solution
We present a three-part solution.
Variational autoencoder (VAE)
When you make decisions while driving, you don't actively analyse every "pixel" in your view. Instead, your brain condenses the visual input into a small number of "latent" factors, such as how straight the road is, the upcoming bend, and your position on the road, and judges your next action from those.
This is the essence of the VAE: it compresses each 64 × 64 × 3 (RGB) input image into a latent feature vector (z) of length 32.
Our AI racer can therefore represent its surroundings with far less information, which improves learning efficiency.
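To make the idea concrete, here is a minimal sketch of such an encoder in Keras. It only illustrates the compression of a 64 × 64 × 3 frame into a 32-dimensional z; the repo's actual architecture lives in ./vae/arch.py and differs in its details.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

Z_DIM = 32  # length of the latent vector z

inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 4, strides=2, activation='relu')(inputs)
x = layers.Conv2D(64, 4, strides=2, activation='relu')(x)
x = layers.Conv2D(128, 4, strides=2, activation='relu')(x)
x = layers.Flatten()(x)
z_mean = layers.Dense(Z_DIM)(x)      # mean of the latent distribution
z_log_var = layers.Dense(Z_DIM)(x)   # log-variance of the latent distribution

# Reparameterisation trick: z = mean + sigma * epsilon
def sample_z(args):
    mu, log_var = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(log_var / 2) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])
encoder = tf.keras.Model(inputs, z)

# one 64x64x3 frame in, one 32-dimensional z out
frame = np.random.rand(1, 64, 64, 3).astype(np.float32)
print(encoder.predict(frame).shape)  # (1, 32)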
Recurrent neural network (RNN)
An AI racer without a recurrent neural network might drive like this...
Think back: when you drive, you are actually continuously predicting what might happen in the next second.
An RNN can simulate this kind of forward-looking thinking.
Like the VAE, the RNN tries to capture latent characteristics of the car's current state in its environment, but this time the purpose is to predict the next z based on the previous z and the previous action.
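A simplified sketch of this predictive model is below, assuming Keras. The paper's actual RNN outputs a mixture density over the next z; here a plain LSTM regresses the next z directly, which is enough to show the input and output shapes.

import tensorflow as tf
from tensorflow.keras import layers

Z_DIM, ACTION_DIM, HIDDEN = 32, 3, 256

# input: a sequence of concatenated [z, action] vectors
inputs = layers.Input(shape=(None, Z_DIM + ACTION_DIM))
h = layers.LSTM(HIDDEN, return_sequences=True)(inputs)  # hidden state h
next_z = layers.Dense(Z_DIM)(h)                         # predicted next z
rnn = tf.keras.Model(inputs, next_z)
rnn.compile(optimizer='adam', loss='mse')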
Controller
So far, we haven't said anything about choosing actions. Those choices are made by the controller.
The controller is a densely connected neural network. Its input is the concatenation of z (the VAE's current latent state, length 32) and h (the RNN's hidden state, length 256). Its three output neurons correspond to the three actions and are scaled to their appropriate ranges.
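Conceptually the controller can be as simple as a single dense layer. The sketch below is an illustration under that assumption, not the repo's exact code: it maps the 288-dimensional [z h] input to three numbers and squashes each into its valid range.

import numpy as np

def controller(z, h, W, b):
    x = np.concatenate([z, h])         # 32 + 256 = 288 inputs
    a = W @ x + b                      # W has shape (3, 288), b has shape (3,)
    steer = np.tanh(a[0])              # steering in [-1, 1]
    accel = (np.tanh(a[1]) + 1) / 2    # acceleration in [0, 1]
    brake = (np.tanh(a[2]) + 1) / 2    # braking in [0, 1]
    return np.array([steer, accel, brake])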
To understand the distinct roles of the three components and how they work together, we can imagine a dialogue between them:
World model architecture diagram
VAE: (looking at the latest 64 × 64 × 3 observation) It looks like a straight road with a slight bend to the left ahead, and the car is facing along the road (z).
RNN: Based on that description (z) and the acceleration the controller chose at the last time step (action), I'll update my hidden state (h) and predict that the next observation is still a straight road, but bending slightly to the left.
Controller: Based on the VAE's description (z) and the RNN's current hidden state (h), my neural network outputs the next action: [0.34, 0.8, 0].
This action is then passed to the environment, which returns an updated observation, and the cycle begins again.
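Written as Python-flavoured pseudocode, one rollout of this loop looks like the following. The method names (vae.encode, controller.act, rnn.step and so on) are hypothetical placeholders for the three trained components, and env stands for the Gym CarRacing environment.

obs = env.reset()
h = rnn.initial_state()        # hypothetical: fresh RNN hidden state
done = False
while not done:
    z = vae.encode(obs)                        # VAE: frame -> latent z
    action = controller.act(z, h)              # controller: (z, h) -> action
    obs, reward, done, info = env.step(action) # environment returns next frame
    h = rnn.step(z, action, h)                 # RNN: update hidden state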
Now, let's practice!
Here comes the implementation code
If you have a high-spec laptop you can run this solution locally, but I suggest running it on a more powerful machine on Google Cloud so that it finishes in a reasonable time.
The following steps have been tested on Linux (Ubuntu 16.04); on Mac or Windows you only need to change the package-installation commands.
Step 1: download the code
On the command line, enter the following:
git clone https://github.com/AppliedDataSciencePartners/WorldModels.git
Step 2: create a virtual environment
Create a Python 3 virtual environment (virtualenv and virtualenvwrapper are used here):
sudo apt-get install python-pip
sudo pip install virtualenv
sudo pip install virtualenvwrapper
export WORKON_HOME=~/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
mkvirtualenv --python=/usr/bin/python3 worldmodels
Step 3: install the package
sudo apt-get install cmake swig python3-dev zlib1g-dev python-opengl mpich xvfb xserver-xephyr vnc4server
cd WorldModels
pip install -r requirements.txt
Step 4: generate random training data
For this car-racing problem, both the VAE and the RNN can be trained on randomly generated data, that is, observations gathered by taking random actions at each time step. In fact, we use pseudo-random actions so that the car accelerates away from the starting line at the beginning of each episode.
Since the VAE and RNN are independent of the decision-making controller, we need to make sure we encounter a wide variety of observations and try a wide variety of actions, saving the results as training data.
To generate rollouts from a random policy, run the following command from the command line:
python 01_generate_data.py car_racing --total_episodes 2000 --start_batch 0 --time_steps 300
If your server has no display, run the following instead:
xvfb-run -a -s "-screen 0 1400x900x24" python 01_generate_data.py car_racing --total_episodes 2000 --start_batch 0 --time_steps 300
The command above generates 2,000 rollouts, saved in 200 batches (10 rollouts per batch).
In the ./data folder, you will see the following files (* is the batch number):
obs_data_*.npy (stores the 64 × 64 × 3 images as a numpy array)
action_data_*.npy (stores the 3-dimensional actions)
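To sanity-check a batch, you can load it with numpy; the exact array shapes shown here are an assumption based on the description above (10 episodes of 300 time steps per batch).

import numpy as np

obs = np.load('./data/obs_data_0.npy')        # e.g. (episodes, time_steps, 64, 64, 3)
actions = np.load('./data/action_data_0.npy') # e.g. (episodes, time_steps, 3)
print(obs.shape, actions.shape)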
Step 5: train VAE
We only need the obs_data_*.npy files to train the VAE. Make sure you have completed step 4; otherwise these files will not be in the ./data folder.
Run the following command on the command line:
python 02_train_vae.py --start_batch 0 --max_batch 9 --new_model
A new variational autoencoder (VAE) will be trained on each batch of data from 0 to 9. The model weights are saved to ./vae/weights.h5, and the --new_model flag tells the script to train the model from scratch.
If weights.h5 already exists in the folder and --new_model is not given, the script loads the weights from that file and continues training the existing model. This way you can train the model iteratively without re-running every batch of data.
The parameters of the VAE architecture are declared in the ./vae/arch.py file.
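Once trained, the VAE can turn any saved frame into a 32-dimensional z. The sketch below assumes a VAE class with a Keras encoder and a weight-loading helper, as suggested by the file layout; the actual names in vae/arch.py may differ.

import numpy as np
from vae.arch import VAE   # assumed class name

vae = VAE()
vae.set_weights('./vae/weights.h5')              # assumed loader method
frame = np.load('./data/obs_data_0.npy')[0, 0]   # first frame of batch 0
z = vae.encoder.predict(frame[np.newaxis])       # one 32-dimensional z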
Step 6: generate the RNN training data
Now we can use the trained VAE to generate the training set for the RNN.
The RNN takes the VAE-encoded image data (z) and the actions (a) as input, and the VAE-encoded image data one time step ahead as output.
Run this command to generate these data:
python 03_generate_rnn_data.py --start_batch 0 --max_batch 9
This step converts the obs_data_*.npy and action_data_*.npy files from batches 0 to 9 into the format the RNN needs for training.
The two sets of files are saved in ./data (* is the batch number):
rnn_input_*.npy (stores the concatenated [z a] vectors)
rnn_output_*.npy (stores the z vector one time step ahead)
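Conceptually, the pairing is just an offset by one time step: input [z_t a_t], target z_{t+1}. A self-contained illustration with stand-in arrays:

import numpy as np

T = 300
z = np.random.rand(T, 32)  # stand-in for the VAE-encoded frames of one episode
a = np.random.rand(T, 3)   # stand-in for the recorded actions

rnn_input = np.concatenate([z[:-1], a[:-1]], axis=1)  # [z_t a_t], shape (T-1, 35)
rnn_output = z[1:]                                    # z_{t+1}, shape (T-1, 32)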
Step 7: train RNN model
Training the RNN only requires the rnn_input_*.npy and rnn_output_*.npy files. Make sure you have completed step 6; otherwise these files will not be in the ./data folder.
Run on the command line:
python 04_train_rnn.py --start_batch 0 --max_batch 9 --new_model
A new RNN will be trained on each batch of data from 0 to 9. The model weights are saved to ./rnn/weights.h5, and the --new_model flag means the model is trained from scratch.
As with the VAE, if weights.h5 already exists in the folder and --new_model is not given, the script loads the weights from that file and continues training the existing model. This way you can train the RNN iteratively without re-running every batch of data.
The parameters of the RNN architecture are declared in the ./rnn/arch.py file.
Step 8: train the controller
This is the most interesting part!
So far, we have used deep learning to build the VAE and the RNN: the VAE reduces high-dimensional images to a low-dimensional latent representation, and the RNN predicts how that latent representation changes over time. Because we could create a training set for each of them from randomly gathered data, both can reach the performance we need.
To train the controller, we turn to reinforcement learning, using an evolutionary algorithm called CMA-ES (Covariance Matrix Adaptation Evolution Strategy).
The input is a 288-dimensional (32 + 256) vector and the output is 3-dimensional, so we need to train 288 × 3 weights + 3 biases = 867 parameters.
CMA-ES first randomly generates a population of candidate 867-parameter vectors, then tests each member of the population in the environment and records its average score. Just as in natural selection, the parameter vectors that produce the highest scores are allowed to "reproduce" and spawn the next generation.
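The ask/evaluate/tell cycle of CMA-ES is easy to see with the off-the-shelf cma package (pip install cma). In this sketch, evaluate() is a dummy stand-in for unpacking the 867 parameters into the controller, running several episodes, and returning the racer's average score.

import cma
import numpy as np

def evaluate(params):
    # stand-in fitness; in reality: build the controller from params,
    # run a few episodes, and return the average score
    return -np.sum(params ** 2)

es = cma.CMAEvolutionStrategy(867 * [0.0], 0.5)  # 867 parameters, initial sigma 0.5
for generation in range(10):
    candidates = es.ask()                        # sample a population
    # cma minimises, so we pass the negated scores we want to maximise
    es.tell(candidates, [-evaluate(np.asarray(c)) for c in candidates])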
Running the following code starts this process on your machine and searches for suitable parameter values.
python 05_train_controller.py car_racing --num_worker 16 --num_worker_trial 4 --num_episode 16 --max_length 1000 --eval_steps 25
Or, on a server without a display:
xvfb-run -s "-screen 0 1400x900x24" python 05_train_controller.py car_racing --num_worker 16 --num_worker_trial 2 --num_episode 4 --max_length 1000 --eval_steps 25
--num_worker 16: the number of workers; this should not exceed the number of available cores
--num_worker_trial 2: the number of population members each worker tests (num_worker × num_worker_trial gives the total population size per generation)
--num_episode 4: the number of episodes each member of the population is scored on (the score is the average reward over these episodes)
--max_length 1000: the maximum number of time steps in an episode