Development blog for TORCS project.
by Jimi
Although we successfully trained our baseline PPO model to race, the process was painfully slow. Reaching 1 million training steps and a lap time of 1:41.66 took several hours. This is largely because PPO is a sample-hungry reinforcement learning algorithm: it needs a huge amount of driving experience to improve and, being on-policy, it can only learn from experience gathered by its current policy, which it discards after each update.
To speed up the sample collection, I (Jimi) have branched off from the team to attempt to develop a parallel training environment.
This diagram shows the system at a high level.
So the system consists of:
At a high level, the system spins up multiple isolated TORCS instances, connects them all to a central algorithm, and manages their lifecycle.
The training script is the central brain of the system. Instead of watching over one car at a time, it uses a vectorised environment to oversee the behaviour of multiple cars at the same time.
This creates a parallel architecture where the main Python process holds the central AI, whilst individual background workers talk to the different network ports of our Docker containers. These workers collect batches of driving data and send them all back to the central AI, which updates the global network weights from all of them at once.
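As a rough illustration of the idea, here is a minimal sketch of a vectorised environment, not our actual training script. `ToyTorcsEnv`, `ToyVecEnv` and `collect_rollout` are hypothetical names, the toy environment stands in for a real TORCS instance, the ports are assumed values, and threads stand in for the background workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToyTorcsEnv:
    """Stand-in for one TORCS instance; the port is stored but unused here."""
    port: int
    max_steps: int = 5
    steps: int = 0

    def reset(self) -> float:
        self.steps = 0
        return 0.0

    def step(self, action: float):
        self.steps += 1
        obs = float(self.steps)               # fake observation
        reward = -abs(action)                 # fake reward
        done = self.steps >= self.max_steps   # episode ends on a step limit
        return obs, reward, done


class ToyVecEnv:
    """Steps several environments together, one worker thread per environment."""

    def __init__(self, ports):
        self.envs = [ToyTorcsEnv(port=p) for p in ports]
        self.pool = ThreadPoolExecutor(max_workers=len(self.envs))

    def reset(self):
        return list(self.pool.map(lambda env: env.reset(), self.envs))

    def step(self, actions):
        def step_one(pair):
            env, action = pair
            obs, reward, done = env.step(action)
            if done:
                obs = env.reset()  # restart only the environment that finished
            return obs, reward, done

        results = list(self.pool.map(step_one, zip(self.envs, actions)))
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

    def close(self):
        self.pool.shutdown()


def collect_rollout(vec_env, n_steps):
    """Gather one batch of (obs, action, reward, done) tuples from every car."""
    obs = vec_env.reset()
    batch = []
    for _ in range(n_steps):
        actions = [0.0 for _ in obs]  # placeholder policy: always act 0.0
        next_obs, rewards, dones = vec_env.step(actions)
        batch.extend(zip(obs, actions, rewards, dones))
        obs = next_obs
    return batch
```

With three environments and five steps each, one call to `collect_rollout` returns fifteen transitions, which the central process could then use for a single update.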
Since TORCS was not originally designed for reinforcement learning, we have had to implement suboptimal workarounds to turn it into an AI training ground, most notably having to kill the TORCS process whenever the car crashes or runs out of time.
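The kill-on-crash-or-timeout workaround could be sketched roughly as below. This is an illustration under assumptions rather than our real code: `run_episode` and the `crashed` callback are hypothetical names, and how a crash is actually detected is left to the caller.

```python
import subprocess
import sys
import time


def run_episode(cmd, time_limit_s, crashed, poll_s=0.05):
    """Run `cmd` until it exits, `crashed()` returns True, or the time limit
    passes, then make sure the process is dead. Returns the reason as a string."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    try:
        while True:
            if proc.poll() is not None:
                return "exited"       # the process ended on its own
            if crashed():
                return "crashed"      # the caller reported a crash
            if time.monotonic() - start > time_limit_s:
                return "timeout"      # the episode ran out of time
            time.sleep(poll_s)
    finally:
        if proc.poll() is None:
            proc.kill()               # the workaround: kill the process
        proc.wait()                   # reap it so no zombie is left behind
```

For a quick dry run without TORCS, a sleeping Python child can stand in for the game: `run_episode([sys.executable, "-c", "import time; time.sleep(30)"], 0.2, lambda: False)` stops the child once the 0.2 second limit passes.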
The main functionalities of the orchestration script are to manage the Docker containers’ lifecycle and reset individual TORCS instances when required by the training script.
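A minimal sketch of those two responsibilities, expressed as Docker CLI commands built in Python. The image name `torcs-headless`, the container naming scheme and the port numbers are all assumptions made for illustration, and restarting the whole container is just one possible way to reset an instance. The commands are only built here; pass one to `subprocess.run` to execute it.

```python
from typing import List

IMAGE = "torcs-headless"   # hypothetical image name
CONTAINER_PORT = 3001      # assumed UDP port TORCS listens on inside the container
BASE_HOST_PORT = 3001      # assumed first host port; instance i gets BASE_HOST_PORT + i


def container_name(index: int) -> str:
    return f"torcs-{index}"


def start_cmd(index: int) -> List[str]:
    """Start instance `index` detached, mapped to its own host port."""
    host_port = BASE_HOST_PORT + index
    return [
        "docker", "run", "-d",
        "--name", container_name(index),
        "-p", f"{host_port}:{CONTAINER_PORT}/udp",
        IMAGE,
    ]


def reset_cmd(index: int) -> List[str]:
    """Reset a single instance by restarting its container."""
    return ["docker", "restart", container_name(index)]


def stop_cmd(index: int) -> List[str]:
    """Force-remove the container when training is finished."""
    return ["docker", "rm", "-f", container_name(index)]
```

Because every instance has its own name and host port, resetting one car leaves the others running untouched.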
If you try to open five instances of TORCS natively on your computer, they will fight over your system resources and network ports until they crash. By containerising the environment, we can spin up completely isolated TORCS instances that don’t know the others exist.
Inside every Docker container sits an independent installation of TORCS. Because TORCS was originally built around a graphical display, running it inside a container, which has no display, could be problematic. However, by launching it in “results mode”, we can bypass the graphical rendering pipeline entirely.
This forces the game to run “headless” with zero graphical output. Not only does this solve the problem of needing a physical monitor, but it also saves a massive amount of CPU overhead, allowing the physics engine to simulate the race as fast as the processor allows.
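A minimal sketch of the headless launch, assuming “results mode” corresponds to TORCS's `-r` command-line option, which takes a race configuration XML file. The config path is a hypothetical placeholder, and the command is only built here, not executed.

```python
from typing import List

# Hypothetical path; the real race configuration file depends on your setup.
RACE_CONFIG = "/root/.torcs/config/raceman/quickrace.xml"


def headless_torcs_cmd(race_config: str = RACE_CONFIG) -> List[str]:
    """Build the command that runs a race without opening a window."""
    # Assumption: "-r <raceconfig>" is the results-only option described above.
    return ["torcs", "-r", race_config]
```

Inside the container, this would be the command the entrypoint runs, for example through `subprocess.Popen(headless_torcs_cmd())`.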
This parallel architecture is still very much an experimental setup. While it successfully bypasses the speed bottleneck we currently face, the communication between the many TORCS instances and the training script still needs work. This is what I will be working on in the future, alongside the development of a better driver.