DQN Agent for French Solitaire (7×7) — Single-Solution Deterministic Policy
Model Description
This is a Deep Q-Network (DQN) agent trained to solve the French Solitaire puzzle (Peg Solitaire, 7×7 European variant) via a single deterministic trajectory learned during training. The published checkpoint represents a policy that, when evaluated in greedy mode (ε = 0), follows a canonical route to victory (32 → 1 peg in the center). It does not attempt to enumerate or diversify multiple winning solutions.
Game Rules
- Board: 7×7 grid with 33 valid positions arranged in a cross (32 pegs at the start)
- Initial state: All positions filled except the center (3,3)
- Objective: Jump pegs over adjacent pegs to remove them, leaving only 1 peg in the center
- Valid move: Jump horizontally or vertically over an adjacent peg into an empty space
Initial board:
    O O O
    O O O
O O O O O O O
O O O . O O O   ← Center empty
O O O O O O O
    O O O
    O O O
Goal:
    . . .
    . . .
. . . . . . .
. . . O . . .   ← One peg in center
. . . . . . .
    . . .
    . . .
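For illustration, here is a minimal sketch of the jump rule applied to a board like the one above, held as a 7×7 NumPy array (the encoding and the helper name are assumptions for this example, not the environment's actual representation):

import numpy as np

# Illustrative encoding: 1 = peg, 0 = empty hole, -1 = off-board corner cell.
def valid_jumps(board: np.ndarray):
    # Return every legal (from, over, to) jump on a 7×7 cross board.
    jumps = []
    for r in range(7):
        for c in range(7):
            if board[r, c] != 1:
                continue  # a jump must start from a peg
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                over, to = (r + dr, c + dc), (r + 2 * dr, c + 2 * dc)
                if not (0 <= to[0] < 7 and 0 <= to[1] < 7):
                    continue  # landing square falls outside the grid
                if board[over] == 1 and board[to] == 0:
                    jumps.append(((r, c), over, to))  # jump over a peg into an empty hole
    return jumps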
Model Architecture
- Algorithm: Double DQN with Experience Replay and Target Network
- Network: 3-layer fully connected neural network
- Input: 49-dimensional state (7×7 board flattened)
- Hidden layers: 128 neurons each with ReLU activation
- Output: 100 Q-values (action space)
- Framework: PyTorch 2.x
- Action masking: Only valid moves are considered during action selection
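The exact module lives in the repository's code; as a reference, here is a minimal sketch consistent with the sizes listed above (class and helper names are illustrative, not the repo's API):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Sketch of the 3-layer MLP described above: 49-d flattened board in, 100 Q-values out.
    def __init__(self, state_dim: int = 49, action_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Action masking at selection time: invalid moves are set to -inf before the argmax.
def masked_greedy_action(q_values: torch.Tensor, action_mask: torch.Tensor) -> int:
    q = q_values.masked_fill(~action_mask.bool(), float("-inf"))
    return int(q.argmax().item())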
Hyperparameters
- Learning rate: 5e-4
- Gamma (discount factor): 0.99
- Epsilon decay: 0.995 (start: 1.0, end: 0.01)
- Batch size: 64
- Replay buffer size: 10,000
- Target network update frequency: 100 steps
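For reference, a sketch of the Double DQN target these hyperparameters feed into (not the repository's exact update code; here done is a 0/1 float tensor, and the target network is re-synced from the online network every 100 steps as listed above):

import torch

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the next action,
    # the target network evaluates it.
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=-1, keepdim=True)
        next_q = target_net(next_state).gather(-1, next_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q

# Epsilon schedule: eps <- max(0.01, eps * 0.995) per episode reaches the 0.01
# floor after roughly ln(0.01) / ln(0.995) ≈ 919 of the 10,000 episodes.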
Reward Function
- +100: Victory (1 peg in center)
- +50: 1 peg remaining (but not in center)
- +1: Progress (peg removed)
- -10: Invalid move
- -50: No valid moves left (defeat)
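A minimal sketch of how these rules could be applied at each environment step (function and argument names are illustrative, not the environment's API):

def reward_for(step_valid: bool, pegs_after: int, center_occupied: bool, moves_left: int) -> float:
    # Illustrative reward shaping mirroring the list above.
    if not step_valid:
        return -10.0                                 # invalid move
    if pegs_after == 1:
        return 100.0 if center_occupied else 50.0    # perfect finish vs. plain 1-peg finish
    if moves_left == 0:
        return -50.0                                 # stuck with more than one peg: defeat
    return 1.0                                       # one peg removed: progress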
Training Details
- Episodes: 10,000
- Training time: ~20 minutes on an NVIDIA GPU (CUDA 12.1)
- Win rate: 100.0% (1 peg remaining)
- Center win rate: 100.0% (perfect victories)
- Average pegs remaining: 1.0
Training was logged with MLflow and tracked in ./mlruns.
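As an illustration of the kind of per-episode logging this produces (parameter and metric names here are assumptions, not necessarily those used by the training script):

import mlflow

def log_episode(episode: int, episode_reward: float, pegs_remaining: int, epsilon: float):
    # Illustrative MLflow logging; runs are written to ./mlruns by default.
    mlflow.log_metric("episode_reward", episode_reward, step=episode)
    mlflow.log_metric("pegs_remaining", pegs_remaining, step=episode)
    mlflow.log_metric("epsilon", epsilon, step=episode)

with mlflow.start_run():
    mlflow.log_params({"lr": 5e-4, "gamma": 0.99, "batch_size": 64,
                       "buffer_size": 10_000, "target_update": 100})
    # Inside the training loop, once per episode, e.g.:
    log_episode(0, 130.0, 1, 0.01)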
Determinism / Single-Solution Note
During evaluation we force epsilon = 0.0, yielding a greedy policy. Given fixed weights and the initial board, action selection is deterministic (ties are broken by argmax's first-index convention). If you need multiple trajectories or stochastic solution sampling, you should train or evaluate with a modified script (e.g. softmax over Q-values or ε > 0); such a script is not included in this single-solution release.
To reflect this focus, the repository is tagged single-solution and deterministic.
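If you do want to sample alternative trajectories, here is a minimal sketch of the softmax-over-Q variant mentioned above (illustrative only; not part of this release's scripts):

import torch

def sample_action(q_values: torch.Tensor, action_mask: torch.Tensor, temperature: float = 1.0) -> int:
    # Sample a valid action from a softmax over Q-values instead of taking the argmax.
    q = q_values.masked_fill(~action_mask.bool(), float("-inf"))
    probs = torch.softmax(q / temperature, dim=-1)  # masked moves get probability 0
    return int(torch.multinomial(probs, 1).item())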
Usage
Quick Start (Standalone from Hugging Face Hub)
# 1. Install Miniconda (if not already installed)
# Download from: https://docs.conda.io/en/latest/miniconda.html
# 2. Download model and code from Hugging Face Hub
hf download emiliodavola/french-solitaire-dqn-single-solution --local-dir ./french-solitaire-model
cd french-solitaire-model
# 3. Create conda environment from included environment.yml
conda env create -f environment.yml
conda activate french-solitaire
# 4. Evaluate model (100 episodes, no rendering)
python code/eval.py --checkpoint pytorch_model.pt --episodes 100
# 5. With visual rendering (shows all steps)
python code/eval.py --checkpoint pytorch_model.pt --episodes 1 --render
Note: Run these commands from the downloaded directory's root so that the relative path --checkpoint pytorch_model.pt resolves correctly. The environment.yml file is included in the Hugging Face repo and installs all dependencies automatically (PyTorch with CUDA 12.1, Gymnasium, NumPy, etc.).
Alternative: Clone from GitHub (for development)
# 1. Clone the full repository with all development tools
git clone https://github.com/emiliodavola/french-solitaire.git
cd french-solitaire
# 2. Create conda environment from environment.yml (recommended for full dev setup)
conda env create -f environment.yml
conda activate french-solitaire
# 3. Download checkpoint from Hugging Face
hf download emiliodavola/french-solitaire-dqn-single-solution pytorch_model.pt --local-dir ./checkpoints
# 4. Run evaluation with visual rendering (shows all steps)
python eval.py --checkpoint checkpoints/pytorch_model.pt --episodes 1 --render
Load in Python (minimal example)
import torch
import numpy as np
from huggingface_hub import hf_hub_download
import sys
from pathlib import Path

# Download model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="emiliodavola/french-solitaire-dqn-single-solution",
    filename="pytorch_model.pt"
)

# Locate the code directory (environment and agent) shipped with the model
code_dir = Path(model_path).parent / "code"
if not code_dir.exists():
    # If code/ doesn't exist locally, download the full repo
    from huggingface_hub import snapshot_download
    repo_path = snapshot_download(repo_id="emiliodavola/french-solitaire-dqn-single-solution")
    code_dir = Path(repo_path) / "code"

# Add the code directory to the Python path
sys.path.insert(0, str(code_dir))

# Import from the downloaded code
from envs.french_solitaire_env import FrenchSolitaireEnv
from agent.dqn import DQNAgent

# Create environment and agent
env = FrenchSolitaireEnv()
agent = DQNAgent(state_dim=49, action_dim=100)

# Load checkpoint
agent.load(model_path, load_optimizer=False)
agent.epsilon = 0.0  # Greedy (no exploration)

# Play one episode
state, info = env.reset()
done = False
truncated = False
while not (done or truncated):
    mask = info.get("action_mask")
    action = agent.select_action(state, action_mask=mask, training=False)
    state, reward, done, truncated, info = env.step(action)

print(f"Pegs remaining: {info['pegs_remaining']}")
print(f"Victory: {info.get('center_win', False)}")
Performance
| Metric | Value |
|---|---|
| Win rate (1 peg) | 100.0% |
| Center win rate (perfect) | 100.0% |
| Avg. reward per episode | 130.0 |
| Avg. pegs remaining | 1.0 |
| Avg. steps per episode | 31.0 |
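These numbers are mutually consistent under one natural reading (a sanity check, assuming the final winning jump earns only the +100 terminal reward rather than an extra +1):

# 31 jumps remove 31 pegs: 32 -> 1, matching "Avg. steps per episode = 31.0".
# Reward: 30 intermediate jumps × (+1) + final center-win jump (+100) = 130,
# matching "Avg. reward per episode = 130.0".
assert 32 - 31 == 1
assert 30 * 1 + 100 == 130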
Limitations
This release purposely reflects a single deterministic solution path:
- Trained specifically for the 7×7 French Solitaire variant.
- Does not expose multiple diverse winning trajectories.
- Action space is fixed at 100 pre-computed geometric moves.
- Performance metrics reported here correspond to greedy (ε=0) evaluation only.
Future Improvements / Multi-Solution Roadmap
Potential extensions (not part of this checkpoint):
- Stochastic evaluation (softmax over Q or ε-sampling) to collect multiple solution paths.
- Exact DFS solver for enumerating all distinct winning trajectories (with symmetry pruning).
- Dueling DQN architecture.
- Prioritized Experience Replay.
- Curriculum learning.
- Board symmetry data augmentation.
- Extended training (50k–100k episodes) and hyperparameter tuning (Ray Tune).
Citation
If you use this model or code, please cite:
@misc{french-solitaire-dqn-single-solution,
  author       = {Emilio Davola},
  title        = {DQN Agent for French Solitaire - Single-Solution Deterministic Policy},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/emiliodavola/french-solitaire-dqn-single-solution}}
}
License
MIT License - See LICENSE for details.
References
- Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)
- Human-level control through deep reinforcement learning (Mnih et al., 2015)
- Deep Reinforcement Learning with Double Q-learning (van Hasselt et al., 2015)
Repository
Full code and training scripts: https://github.com/emiliodavola/french-solitaire