# AlphaCheckers-Zero
An implementation of the AlphaZero algorithm for Checkers (Brazilian/International rules), featuring a dedicated Battle Arena against Large Language Models (LLMs) and classical Minimax engines.
## About the Project
This repository contains a Deep Reinforcement Learning engine built from scratch using PyTorch and Monte Carlo Tree Search (MCTS). The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.
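During self-play, MCTS chooses which branch to explore using the AlphaZero-style PUCT rule. The sketch below is illustrative only: the function names, the dict-based node representation, and the `c_puct` default are assumptions, not code from this repository.

```python
import math

def puct_score(q, p, parent_visits, child_visits, c_puct=1.5):
    # AlphaZero-style PUCT: exploitation term (q, the mean value of the child)
    # plus an exploration bonus weighted by the network's prior p.
    return q + c_puct * p * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, parent_visits, c_puct=1.5):
    # children: list of dicts with 'q' (mean value), 'p' (prior), 'n' (visit count).
    # Returns the child maximizing the PUCT score.
    return max(children, key=lambda c: puct_score(c["q"], c["p"], parent_visits, c["n"], c_puct))
```

Early in search the prior term dominates (unvisited moves with high network probability are tried first); as visit counts grow, the empirical value `q` takes over.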
The project goes beyond standard implementation by including a unique Arena Mode, where the AlphaZero agent is benchmarked against:
- Classical Algorithms: Minimax with Alpha-Beta Pruning.
- Generative AI: State-of-the-art LLMs (via Groq API) to test logic vs. probabilistic generation.
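For reference, the core of the Minimax opponent's pruning logic can be sketched over an abstract game tree, where leaves are numeric scores and inner nodes are lists of children. This is a toy illustration, not the repository's actual evaluation code.

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    # node: a number (leaf score) or a list of child nodes.
    # In this toy, leaves are always numbers, so depth just bounds recursion.
    if depth == 0 or isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will never allow this branch
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break  # alpha cutoff: the maximizer already has a better option
        return best
```

At depth 8, pruning like this lets a classical engine discard large subtrees without evaluating them, which is what makes it a meaningful brute-force baseline.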
## Benchmarks & Performance
The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.
| Opponent | Type | Result | Notes |
|---|---|---|---|
| Human Player | Biological | ✅ Win | Surpassed the creator. |
| Llama 3.3 70b | LLM (Groq) | ✅ Win | Exploited the LLM's lack of spatial board consistency. |
| Llama-4-maverick-17b-128e | LLM | ✅ Win | Consistent tactical superiority. |
| Kimi k2 | LLM | ✅ Win | The LLM failed to maintain a long-term strategy. |
| Minimax (Depth 8) | Classical Algo | 🤝 Draw | Crucial result: indicates the neural network has converged to a robust, defensive policy, matching a brute-force engine that evaluates millions of positions. |
## Technical Architecture
- Neural Network: A ResNet-like architecture with a dual head:
  - Policy Head: outputs move probabilities ($p$).
  - Value Head: estimates the expected outcome ($v$) of the current state.
- Inference: Uses MCTS guided by the neural network to simulate future outcomes.
- Training: Continuous self-play loops with a replay buffer and data augmentation.
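A minimal PyTorch sketch of such a dual-head network follows. The layer sizes, number of input planes, and move-space size are illustrative assumptions, not the architecture actually trained in this repository.

```python
import torch
import torch.nn as nn

class DualHeadNet(nn.Module):
    """Shared convolutional trunk with separate policy and value heads."""

    def __init__(self, planes=4, channels=64, board=8, n_moves=512):
        super().__init__()
        # Trunk: stacked 3x3 convolutions over the board planes.
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        flat = channels * board * board
        # Policy head: logits over the move space, log-softmaxed in forward().
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(flat, n_moves))
        # Value head: scalar in [-1, 1] estimating the game outcome.
        self.value = nn.Sequential(nn.Flatten(), nn.Linear(flat, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(x)
        return torch.log_softmax(self.policy(h), dim=1), self.value(h)
```

During MCTS, the policy output seeds the priors of a node's children and the value output replaces a random rollout as the leaf evaluation.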
## Project Structure
Rename the source files to match the structure below for better organization:
- `AlphaCheckerTrainer.py`: Main training loop, MCTS logic, and network architecture.
- `eval.py`: Interface to play against the AI locally.
- `evalLLM.py`: Script to battle LLMs via the Groq API.
- `evalminimax.py`: Script to battle the Minimax algorithm.
- `checkers_master_final.pth`: The trained AlphaChecker weights.
## Testing
https://huggingface.co/spaces/Madras1/AlphaCherckerZero
## Credits
Developed by Gabriel Yogi. This project was created for research purposes in the field of Reinforcement Learning.