DQN Agent for French Solitaire (7×7) — Single-Solution Deterministic Policy

Model Description

This is a Deep Q-Network (DQN) agent trained to solve the French Solitaire puzzle (Peg Solitaire, 7×7 European variant) via a single deterministic trajectory learned during training. The published checkpoint represents a policy that, when evaluated in greedy mode (ε = 0), follows a canonical route to victory (32 → 1 peg in the center). It does not attempt to enumerate or diversify multiple winning solutions.

Game Rules

  • Board: 7×7 grid with 33 valid positions arranged in a cross
  • Initial state: All positions filled except the center (3,3)
  • Objective: Jump pegs over adjacent pegs to remove them, leaving only 1 peg in the center
  • Valid move: Jump horizontally or vertically over an adjacent peg into an empty space; the jumped peg is removed (see the code sketch after the diagrams)

Initial board:
      O O O
      O O O
  O O O O O O O
  O O O . O O O  ← Center empty
  O O O O O O O
      O O O
      O O O

Goal:
      . . .
      . . .
  . . . . . . .
  . . . O . . .  ← One peg in center
  . . . . . . .
      . . .
      . . .
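
For concreteness, here is a minimal sketch of the board encoding and the jump rule, assuming a NumPy array with 1 = peg, 0 = empty hole, -1 = outside the cross; the actual environment lives in code/envs/french_solitaire_env.py and may encode the board differently.

import numpy as np

# Hypothetical encoding: 1 = peg, 0 = empty hole, -1 = outside the cross.
board = np.full((7, 7), -1, dtype=np.int8)
for r in range(7):
    for c in range(7):
        if 2 <= r <= 4 or 2 <= c <= 4:  # the 33-hole cross
            board[r, c] = 1
board[3, 3] = 0  # center starts empty (32 pegs)

DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def is_valid_jump(board, r, c, dr, dc):
    """A peg at (r, c) may jump an adjacent peg into an empty hole."""
    r2, c2 = r + 2 * dr, c + 2 * dc
    if not (0 <= r2 < 7 and 0 <= c2 < 7):
        return False
    return (board[r, c] == 1
            and board[r + dr, c + dc] == 1
            and board[r2, c2] == 0)

def apply_jump(board, r, c, dr, dc):
    """Move the jumper two cells over and remove the jumped peg."""
    board[r, c] = 0
    board[r + dr, c + dc] = 0
    board[r + 2 * dr, c + 2 * dc] = 1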

Model Architecture

  • Algorithm: Double DQN with Experience Replay and Target Network
  • Network: 3-layer fully connected neural network
    • Input: 49-dimensional state (7×7 board flattened)
    • Hidden layers: 2 × 128 neurons with ReLU activation
    • Output: 100 Q-values (one per action in the fixed action space)
  • Framework: PyTorch 2.x
  • Action masking: Only valid moves are considered during action selection
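
For reference, a minimal PyTorch sketch consistent with the description above (network plus greedy masked action selection; the Double DQN update is not shown). The released network lives in code/agent/dqn.py and may differ in detail.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """49 flattened board cells in, 100 Q-values out, two hidden ReLU layers."""
    def __init__(self, state_dim=49, action_dim=100, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x):
        return self.net(x)

def masked_argmax(q_values, action_mask):
    """Action masking: invalid actions get -inf so argmax never picks them."""
    q = q_values.masked_fill(~action_mask, float("-inf"))
    return int(torch.argmax(q))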

Hyperparameters

  • Learning rate: 5e-4
  • Gamma (discount factor): 0.99
  • Epsilon decay: 0.995 (start: 1.0, end: 0.01)
  • Batch size: 64
  • Replay buffer size: 10,000
  • Target network update frequency: 100 steps
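
Assuming one multiplicative decay per episode (the training script may decay per step instead), this schedule reaches its floor after roughly 900 episodes:

# 0.995^n <= 0.01  =>  n >= ln(0.01) / ln(0.995) ≈ 918.7
eps, n = 1.0, 0
while eps > 0.01:
    eps *= 0.995
    n += 1
print(n)  # 919 episodes until epsilon hits the 0.01 floor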

Reward Function

  • +100: Victory (1 peg in center)
  • +50: 1 peg remaining (but not in center)
  • +1: Progress (peg removed)
  • -10: Invalid move
  • -50: No valid moves left (defeat)
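
A restatement of that table as code, assuming the victory bonus replaces the per-peg reward on the final step; this matches the average episode reward of 130.0 reported below (30 progress steps at +1 plus a +100 winning step). The environment's exact logic may differ.

def compute_reward(valid, done, pegs_after, center_has_peg):
    """Hypothetical reward function mirroring the table above."""
    if not valid:
        return -10.0                               # invalid move
    if done and pegs_after == 1:
        return 100.0 if center_has_peg else 50.0   # victory / near-miss
    if done:
        return -50.0                               # no valid moves left
    return 1.0                                     # progress: one peg removed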

Training Details

  • Episodes: 10,000
  • Training time: ~20 minutes on an NVIDIA GPU (CUDA 12.1)
  • Win rate: 100.0% (1 peg remaining)
  • Center win rate: 100.0% (perfect victories)
  • Average pegs remaining: 1.0

Training was logged with MLflow and tracked in ./mlruns.
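
As a hypothetical sketch of that setup (the actual train script lives in the GitHub repository):

import mlflow

mlflow.set_tracking_uri("./mlruns")  # local file-based tracking
with mlflow.start_run(run_name="dqn-french-solitaire"):
    mlflow.log_params({"lr": 5e-4, "gamma": 0.99, "batch_size": 64,
                       "buffer_size": 10_000, "target_update": 100})
    # inside the training loop, per-episode metrics would be logged, e.g.:
    mlflow.log_metric("pegs_remaining", 1.0, step=0)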

Determinism / Single-Solution Note

During evaluation, epsilon is forced to 0.0, yielding a greedy policy. Given fixed weights and the initial board, action selection is fully deterministic (ties are broken by argmax, which returns the first maximal index). If you need multiple trajectories or stochastic solution sampling, train or evaluate with a modified script (e.g., softmax over Q-values, or ε > 0); no such script is included in this single-solution release.
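
For illustration, a minimal sketch of such stochastic sampling (not part of this release): sample valid actions in proportion to softmax(Q / T).

import torch

def sample_action(q_values, action_mask, temperature=1.0):
    """Softmax-over-Q sampling; higher temperature yields more diverse paths."""
    q = q_values / temperature
    q = q.masked_fill(~action_mask, float("-inf"))  # never sample invalid moves
    probs = torch.softmax(q, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))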

To make this scope explicit, the repository is tagged single-solution and deterministic.

Usage

Quick Start (Standalone from Hugging Face Hub)

# 1. Install Miniconda (if not already installed)
# Download from: https://docs.conda.io/en/latest/miniconda.html

# 2. Download model and code from Hugging Face Hub
hf download emiliodavola/french-solitaire-dqn-single-solution --local-dir ./french-solitaire-model
cd french-solitaire-model

# 3. Create conda environment from included environment.yml
conda env create -f environment.yml
conda activate french-solitaire

# 4. Evaluate model (100 episodes, no rendering)
python code/eval.py --checkpoint pytorch_model.pt --episodes 100

# 5. With visual rendering (shows all steps)
python code/eval.py --checkpoint pytorch_model.pt --episodes 1 --render

Note: Run the commands from the repository root; code/eval.py expects pytorch_model.pt in that root directory. The environment.yml included in the Hugging Face repo installs all dependencies automatically (PyTorch with CUDA 12.1, Gymnasium, NumPy, etc.).

Alternative: Clone from GitHub (for development)

# 1. Clone the full repository with all development tools
git clone https://github.com/emiliodavola/french-solitaire.git
cd french-solitaire

# 2. Create conda environment from environment.yml (recommended for full dev setup)
conda env create -f environment.yml
conda activate french-solitaire

# 3. Download checkpoint from Hugging Face
hf download emiliodavola/french-solitaire-dqn-single-solution pytorch_model.pt --local-dir ./checkpoints

# 4. Run evaluation with visual rendering (shows all steps)
python eval.py --checkpoint checkpoints/pytorch_model.pt --episodes 1 --render

Load in Python (minimal example)

import sys
from pathlib import Path

from huggingface_hub import hf_hub_download

# Download model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="emiliodavola/french-solitaire-dqn-single-solution",
    filename="pytorch_model.pt"
)

# Download code directory (environment and agent)
code_dir = Path(model_path).parent / "code"
if not code_dir.exists():
    # If code/ doesn't exist, download the full repo
    from huggingface_hub import snapshot_download
    repo_path = snapshot_download(repo_id="emiliodavola/french-solitaire-dqn-single-solution")
    code_dir = Path(repo_path) / "code"

# Add code directory to Python path
sys.path.insert(0, str(code_dir))

# Import from downloaded code
from envs.french_solitaire_env import FrenchSolitaireEnv
from agent.dqn import DQNAgent

# Create environment and agent
env = FrenchSolitaireEnv()
agent = DQNAgent(state_dim=49, action_dim=100)

# Load checkpoint
agent.load(model_path, load_optimizer=False)
agent.epsilon = 0.0  # Greedy (no exploration)

# Play one episode
state, info = env.reset()
done = False
truncated = False

while not (done or truncated):
    mask = info.get("action_mask")
    action = agent.select_action(state, action_mask=mask, training=False)
    state, reward, done, truncated, info = env.step(action)

print(f"Pegs remaining: {info['pegs_remaining']}")
print(f"Victory: {info.get('center_win', False)}")

Performance

Metric                        Value
Win rate (1 peg)              100.0%
Center win rate (perfect)     100.0%
Avg. reward per episode       130.0
Avg. pegs remaining           1.0
Avg. steps per episode        31.0

Limitations

This release purposely reflects a single deterministic solution path:

  • Trained specifically for the 7×7 French Solitaire variant.
  • Does not expose multiple diverse winning trajectories.
  • Action space is fixed at 100 pre-computed geometric moves.
  • Performance metrics reported here correspond to greedy (ε=0) evaluation only.

Future Improvements / Multi-Solution Roadmap

Potential extensions (not part of this checkpoint):

  • Stochastic evaluation (softmax over Q or ε-sampling) to collect multiple solution paths.
  • Exact DFS solver for enumerating all distinct winning trajectories (with symmetry pruning); a minimal sketch follows this list.
  • Dueling DQN architecture.
  • Prioritized Experience Replay.
  • Curriculum learning.
  • Board symmetry data augmentation.
  • Extended training (50k–100k episodes) and hyperparameter tuning (Ray Tune).
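
To illustrate the DFS item above, a self-contained sketch using the board encoding from the Game Rules section (1 = peg, 0 = empty, -1 = outside); no symmetry pruning is shown, so runtime grows exponentially with depth.

import numpy as np

DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def legal_jumps(board):
    """All (r, c, dr, dc) where a peg jumps an adjacent peg into an empty hole."""
    jumps = []
    for r in range(7):
        for c in range(7):
            if board[r, c] != 1:
                continue
            for dr, dc in DIRS:
                r2, c2 = r + 2 * dr, c + 2 * dc
                if (0 <= r2 < 7 and 0 <= c2 < 7
                        and board[r + dr, c + dc] == 1
                        and board[r2, c2] == 0):
                    jumps.append((r, c, dr, dc))
    return jumps

def enumerate_solutions(board, path, out, limit=5):
    """Depth-first search collecting up to `limit` winning jump sequences."""
    if len(out) >= limit:
        return
    if (board == 1).sum() == 1 and board[3, 3] == 1:
        out.append(list(path))                     # found a center win
        return
    for r, c, dr, dc in legal_jumps(board):
        nxt = board.copy()
        nxt[r, c] = 0
        nxt[r + dr, c + dc] = 0
        nxt[r + 2 * dr, c + 2 * dc] = 1
        path.append((r, c, dr, dc))
        enumerate_solutions(nxt, path, out, limit)
        path.pop()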

Citation

If you use this model or code, please cite:

@misc{french-solitaire-dqn-single-solution,
  author = {Emilio Davola},
  title = {DQN Agent for French Solitaire - Single-Solution Deterministic Policy},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/emiliodavola/french-solitaire-dqn-single-solution}}
}

License

MIT License - See LICENSE for details.

Repository

Full code and training scripts: https://github.com/emiliodavola/french-solitaire
