6th November 2023

nanoPPO v0.15 has been released, bringing significant enhancements to its implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning tasks.

What's New in v0.15?

  • Actor/Critic Causal Attention Policy: A new policy framework to enhance decision-making processes.
  • Custom Learning Rate Scheduler: Adds a version number and a custom scheduler for adjusting the learning rate during agent training.
  • Gradient and Weight Inf/NaN Checks: Added safeguards against infinite and NaN values in gradients and weights to improve stability.
  • Enhanced Training Mechanism: The training script now uses average rewards and includes a new cosine learning rate scheduler that adjusts the learning rate as training progresses.
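
To illustrate what a cosine learning rate schedule does, here is a minimal sketch in plain Python. It is not nanoPPO's actual scheduler: the function name cosine_lr, its parameters, and the example values (3e-4 decaying to 1e-5 over 1000 steps) are assumptions chosen for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate (hypothetical helper, not nanoPPO's API).

    Starts at base_lr at step 0 and decays smoothly to min_lr at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay at the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed example values: decay from 3e-4 to 1e-5 over 1000 steps.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, total_steps=1000, base_lr=3e-4, min_lr=1e-5))
```

The schedule changes slowly near the start and end of training and fastest in the middle, which is the usual reason for preferring it over a linear decay.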

Additional Improvements:

  • Debug flag for NaN detection in model parameters.
  • Use of torch.nn.utils.clip_grad_norm_ for gradient clipping.
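
As a rough sketch of how Inf/NaN checks and gradient clipping can fit together in a PyTorch update step, consider the example below. It does not reproduce nanoPPO's code: the helpers has_nonfinite and guarded_step, the max_grad_norm value of 0.5, and the toy model are assumptions for illustration. Only torch.isfinite and torch.nn.utils.clip_grad_norm_ are actual PyTorch calls.

```python
import torch
from torch import nn


def has_nonfinite(tensors):
    """Return True if any tensor contains an Inf or NaN entry."""
    return any(not bool(torch.isfinite(t).all()) for t in tensors)


def guarded_step(model, optimizer, loss, max_grad_norm=0.5):
    """Backpropagate, check gradients, clip, and step (hypothetical helper).

    Returns False if the update was skipped because of non-finite gradients.
    """
    optimizer.zero_grad()
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    if has_nonfinite(grads):
        optimizer.zero_grad()  # discard the bad gradients and skip the update
        return False
    # Rescale gradients in place so their global L2 norm is at most max_grad_norm.
    nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    if has_nonfinite(p.detach() for p in model.parameters()):
        raise RuntimeError("non-finite weights after optimizer step")
    return True


# Toy usage with an assumed model and random data.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss = model(torch.randn(8, 4)).pow(2).mean()
print(guarded_step(model, optimizer, loss))
```

Skipping the update on non-finite gradients is one possible policy. Raising an error immediately, as a debug flag might do, makes the failure easier to trace at the cost of stopping training.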

Documentation:

For a full overview of the new features and improvements, please refer to the GitHub README and the detailed Changelog.