A readable and well-documented implementation of the Proximal Policy Optimization (PPO-Clip) algorithm in PyTorch.
- Generalized Advantage Estimation (GAE): Reduces variance in policy updates (a sketch of the computation follows this list).
- Vectorized Environments: Uses `gym.vector.SyncVectorEnv` for efficient, parallel data collection (example below).
- KL-Divergence Early Stopping: Helps prevent destructive policy updates while still getting the most out of each rollout (see the update sketch below).
- Modern PyTorch Features: Leverages `torch.compile` for just-in-time compilation of the actor and critic networks (see below).
- Training Stability: Includes standard techniques like advantage normalization, gradient clipping, and entropy regularization, all of which appear in the update sketch below.
- Experiment Tracking: Logs key metrics to Weights & Biases.
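
The advantage computation follows the standard recursive GAE form. Below is a minimal sketch, assuming rewards, values, and done flags are stored as `(num_steps, num_envs)` float tensors; the function and argument names are illustrative rather than taken from `ppo.py`:

```python
import torch

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """GAE over one rollout.

    rewards, values, dones: float tensors of shape (num_steps, num_envs);
    dones[t] is 1.0 where the episode terminated at step t.
    last_value: V(s_T) used for bootstrapping, shape (num_envs,).
    """
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_value)
    for t in reversed(range(num_steps)):
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]  # zero out the bootstrap at episode ends
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * gae_lambda * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```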
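
Constructing the vectorized environment is a one-liner with Gymnasium's API (this assumes `import gymnasium as gym`; the environment count of 8 is illustrative, not necessarily the script's default):

```python
import gymnasium as gym

num_envs = 8  # illustrative; ppo.py's default may differ

# SyncVectorEnv steps all copies in lockstep, so a single step() call
# returns a batch of num_envs transitions.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("HalfCheetah-v5") for _ in range(num_envs)]
)
obs, info = envs.reset(seed=0)        # obs shape: (num_envs, 17)
actions = envs.action_space.sample()  # batched actions, shape (num_envs, 6)
obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()
```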
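
KL-divergence early stopping and the stability techniques all live in the policy-update loop. The sketch below is one way to put them together, not the exact code in `ppo.py`: it assumes a `policy` module that returns a diagonal-Gaussian action distribution and a value estimate, a `batch` dict holding the rollout, and illustrative hyperparameters; for brevity it updates on the full batch each epoch rather than on minibatches:

```python
import torch

def ppo_update(policy, optimizer, batch, clip_eps=0.2, vf_coef=0.5,
               ent_coef=0.01, max_grad_norm=0.5, target_kl=0.015,
               update_epochs=10):
    for _ in range(update_epochs):
        dist, values = policy(batch["obs"])                    # assumed interface
        new_logprob = dist.log_prob(batch["actions"]).sum(-1)  # sum over action dims
        logratio = new_logprob - batch["old_logprob"]
        ratio = logratio.exp()

        # Advantage normalization keeps the gradient scale consistent
        # across rollouts.
        adv = batch["advantages"]
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)

        # PPO-Clip surrogate objective.
        pg_loss = torch.max(
            -adv * ratio,
            -adv * torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps),
        ).mean()
        v_loss = 0.5 * (values.squeeze(-1) - batch["returns"]).pow(2).mean()
        entropy = dist.entropy().sum(-1).mean()  # entropy regularization
        loss = pg_loss + vf_coef * v_loss - ent_coef * entropy

        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping guards against occasional exploding updates.
        torch.nn.utils.clip_grad_norm_(policy.parameters(), max_grad_norm)
        optimizer.step()

        # KL early stopping: abandon the remaining epochs once the new
        # policy drifts too far from the one that collected the data
        # (low-variance estimator from http://joschu.net/blog/kl-approx.html).
        with torch.no_grad():
            approx_kl = ((ratio - 1.0) - logratio).mean()
        if approx_kl > target_kl:
            break
```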
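
Compiling the networks needs no changes beyond wrapping them. The layer sizes below match HalfCheetah-v5 (17-dimensional observations, 6-dimensional actions) but are only illustrative; the real architectures live in `ppo.py`:

```python
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
critic = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 1))

# torch.compile (PyTorch >= 2.0) JIT-compiles the forward pass on first
# call; subsequent calls reuse the optimized kernels.
actor = torch.compile(actor)
critic = torch.compile(critic)

obs = torch.randn(8, 17)   # a fake batch of observations
mean_actions = actor(obs)  # first call triggers compilation
values = critic(obs)
```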
Setup:

- Install Python & PyTorch:
  - python >= 3.7 (3.11 tested)
  - pytorch >= 2.0 (2.6 tested)
- System Dependencies (for video rendering):

  ```bash
  sudo apt-get install -y xvfb ffmpeg
  ```

- Python Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To start training on the default `HalfCheetah-v5` environment, run:

```bash
python ppo.py
```