23 April 2026

Week 7: The Parallel Pit Crew

by Jimi

Although we successfully trained our baseline PPO model to race, the process was painfully slow. Reaching 1 million training steps and a lap time of 1:41.66 took several hours. This is largely because PPO is a sample-hungry reinforcement learning algorithm: it needs a huge amount of driving experience to improve, and because it is on-policy, it can only learn from experience gathered by its current policy and discards that data after each update.

To speed up sample collection, I (Jimi) have branched off from the team to develop a parallel training environment.

Blog Overview:

System Overview
Training Script
Orchestration Script
Docker Containers Running TORCS
Conclusion

System Overview

Figure: New architecture diagram, showing the system at a high level.

The system consists of:

A Training Script
An Orchestration Script
Docker Containers Running TORCS

At a high level, the system spins up multiple isolated TORCS instances, connects them all to a central algorithm, and manages their lifecycle.

Training Script

The training script is the central brain of the system. Instead of watching over one car at a time, it uses a vectorised environment to oversee the behaviour of multiple cars at the same time.

This creates a parallel architecture in which the main Python process holds the central AI, while individual background workers talk to the different network ports of our Docker containers. These workers collect batches of driving data and send them back to the central AI, which uses them together to update the global network weights.
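To make the worker idea concrete, here is a minimal sketch of a vectorised environment, not our actual training script. It assumes a stand-in `FakeTorcsEnv` class in place of a real TORCS connection, uses threads from the standard library as the background workers, and every name in it (`SimpleVecEnv`, `collect_rollout`, the port numbers) is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


class FakeTorcsEnv:
    """Hypothetical stand-in for one TORCS instance reachable on a given port."""

    def __init__(self, port):
        self.port = port
        self.rng = random.Random(port)  # deterministic dummy data per port
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0, 0.0]  # dummy observation

    def step(self, action):
        self.steps += 1
        obs = [self.rng.random(), self.rng.random()]
        reward = -abs(action)   # dummy reward
        done = self.steps >= 5  # dummy episode length
        if done:
            obs = self.reset()  # auto-reset so the env keeps producing data
        return obs, reward, done


class SimpleVecEnv:
    """Steps several environments together, one worker thread per environment."""

    def __init__(self, ports):
        self.envs = [FakeTorcsEnv(p) for p in ports]
        self.pool = ThreadPoolExecutor(max_workers=len(self.envs))

    def reset(self):
        return list(self.pool.map(lambda env: env.reset(), self.envs))

    def step(self, actions):
        # map() preserves order, so result i belongs to environment i.
        results = list(
            self.pool.map(lambda pair: pair[0].step(pair[1]), zip(self.envs, actions))
        )
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

    def close(self):
        self.pool.shutdown()


def collect_rollout(vec_env, policy, n_steps):
    """Gather n_steps transitions from every environment into one batch."""
    obs = vec_env.reset()
    batch = []
    for _ in range(n_steps):
        actions = [policy(o) for o in obs]
        next_obs, rewards, dones = vec_env.step(actions)
        batch.extend(zip(obs, actions, rewards, dones))
        obs = next_obs
    return batch
```

With four environments and eight steps, `collect_rollout` returns 32 transitions in one batch, which is the data the central model would then learn from in a single update.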

Orchestration Script

Since TORCS was not originally designed for Reinforcement Learning, we have had to implement suboptimal workarounds to turn it into an AI training ground. Most notably, we have to kill the TORCS process whenever the car crashes or runs out of time.

The orchestration script has two main jobs: managing the Docker containers’ lifecycle and resetting individual TORCS instances when the training script requires it.
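As a rough illustration of those two jobs, the sketch below builds the Docker commands an orchestrator might issue. It is not our real script: the image name `torcs-headless:latest`, the `torcs-N` container names, and the UDP port 3001 are all assumptions, and restarting the whole container is just one simple way to get a fresh TORCS process. By default the commands are only printed rather than run.

```python
import subprocess

IMAGE = "torcs-headless:latest"  # hypothetical image name
CONTAINER_PORT = 3001            # assumed UDP port TORCS listens on in the container
BASE_HOST_PORT = 3001            # assumed first host port; instance i gets base + i


def container_name(index):
    return f"torcs-{index}"


def run_command(index):
    """Start instance `index` in the background on its own host port."""
    host_port = BASE_HOST_PORT + index
    return [
        "docker", "run", "--detach",
        "--name", container_name(index),
        "--publish", f"{host_port}:{CONTAINER_PORT}/udp",
        IMAGE,
    ]


def restart_command(index):
    """Reset one instance by restarting its container."""
    return ["docker", "restart", container_name(index)]


def remove_command(index):
    """Tear down one instance, stopping it first if it is still running."""
    return ["docker", "rm", "--force", container_name(index)]


def execute(command, dry_run=True):
    """Print the command in dry-run mode, otherwise run it and fail loudly."""
    if dry_run:
        print(" ".join(command))
        return 0
    return subprocess.run(command, check=True).returncode


if __name__ == "__main__":
    for i in range(4):
        execute(run_command(i))   # spin up four isolated instances
    execute(restart_command(2))   # the training script asked for a reset of car 2
    for i in range(4):
        execute(remove_command(i))
```

Giving each container its own host port is what lets each background worker in the training script address exactly one car.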

Docker Containers Running TORCS

If you try to open five instances of TORCS natively on your computer, they will fight over your system resources and network ports until they crash. By containerising the environment, we can spin up completely isolated TORCS instances that don’t know the others exist.

Inside every Docker container sits an independent installation of TORCS. As TORCS was originally built with visuals, running it in the container could be problematic. However, by launching it in “results mode”, we can bypass the graphical rendering pipeline entirely.

This forces the game to run “headless” with zero graphical output. Not only does this solve the problem of needing a physical monitor, but it also saves a massive amount of CPU overhead, allowing the physics engine to simulate the race as fast as the processor allows.
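For reference, results mode is selected with the `-r` command-line option, which takes the path of a race configuration file. The helper below is only a sketch of how a container's start-up code might build that command; the function name and the example path are hypothetical, and it assumes the `torcs` binary is on the container's PATH.

```python
def headless_torcs_command(race_config):
    """Build the command that runs TORCS in results mode (no graphics)."""
    if not race_config.endswith(".xml"):
        raise ValueError("expected the path of a race configuration .xml file")
    # -r runs the race described by the config file without opening a window.
    return ["torcs", "-r", race_config]


if __name__ == "__main__":
    # Hypothetical config path inside the container.
    print(" ".join(headless_torcs_command("/opt/torcs/quickrace.xml")))
```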

Conclusion

This parallel architecture is still very much an experimental setup. While it successfully bypasses the speed bottleneck we currently face, it needs more work to perfect the communication between the many TORCS instances and the training script. That is what I will be working on next, alongside the development of a better driver.
