metadata
license: mit
tags:
  - chess
  - transformer
  - reinforcement-learning
  - game-playing
library_name: pytorch

ChessFormer-SL

ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.

Model Description

  • Model type: Transformer for chess position evaluation and move prediction
  • Language(s): Chess (FEN notation)
  • License: MIT
  • Parameters: 100.7M

Architecture

ChessFormer uses a custom transformer architecture optimized for chess:

  • Blocks: 20 transformer layers
  • Hidden size: 640
  • Attention heads: 8
  • Intermediate size: 1728
  • Features: RMSNorm, SwiGLU activation, custom FEN tokenizer
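RMSNorm and SwiGLU here follow their standard formulations. A minimal sketch using the sizes listed above (module and parameter names are illustrative, not taken from model.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(x W1) gates (x W3), then projects back down."""
    def __init__(self, dim=640, hidden=1728):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# One 75-token sequence with hidden size 640, as in this model
x = torch.randn(1, 75, 640)
y = SwiGLU()(RMSNorm(640)(x))
```

The intermediate size of 1728 applies inside the feed-forward block; the residual stream stays at 640 throughout.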

Input Format

The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:

  • 64 board square tokens (pieces + positional embeddings)
  • 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
  • 2 special tokens (action, value)
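The real tokenizer ships with model.py; purely as an illustration of how a FEN string plus a repetition count fills exactly 75 slots, a sketch (token names and field ordering are invented here):

```python
def tokenize_fen(fen, repetitions):
    """Illustrative 75-token split: 64 squares + 9 metadata + 2 special tokens."""
    placement, turn, castling, ep, halfmove, fullmove = fen.split()
    # 64 board tokens, rank 8 down to rank 1 as written in the FEN
    squares = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares += ["."] * int(ch)  # expand empty-square counts
            else:
                squares.append(ch)
    # 9 metadata tokens: turn, 4 castling rights, en passant, 2 clocks, repetitions
    meta = [turn,
            *(c if c in castling else "-" for c in "KQkq"),
            ep, halfmove, fullmove, str(repetitions)]
    # 2 special tokens whose final hidden states feed the policy and value heads
    return squares + meta + ["<action>", "<value>"]

tokens = tokenize_fen(
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", 1)
```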

Output Format

  • Policy head: Logits over 1,969 structurally valid chess moves
  • Value head: Position evaluation from current player's perspective

Training Details

Training Data

  • Dataset: kaupane/lichess-2023-01-stockfish-annotated (depth18 split)
  • Size: 56M positions with Stockfish evaluations
  • Validation: depth27 split

Training Procedure

  • Method: Supervised learning on Stockfish move recommendations and evaluations
  • Objective: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
  • Hardware: RTX 4060Ti 16GB
  • Duration: ~2 weeks
  • Checkpoints: 20 total; this model is the final checkpoint
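The combined objective above can be sketched as follows. The exact form and weight of the invalid-move penalty are assumptions (here: probability mass placed outside the legal-move mask); consult the training code for the real implementation.

```python
import torch
import torch.nn.functional as F

def chessformer_loss(move_logits, values, target_moves, target_values,
                     legal_mask, invalid_weight=0.1):
    # Cross-entropy over the 1,969-way move distribution
    action_loss = F.cross_entropy(move_logits, target_moves)
    # MSE between predicted and Stockfish-derived position values
    value_loss = F.mse_loss(values, target_values)
    # Assumed penalty: total probability assigned to moves outside the legal mask
    probs = F.softmax(move_logits, dim=-1)
    invalid_loss = (probs * (~legal_mask)).sum(dim=-1).mean()
    return action_loss + value_loss + invalid_weight * invalid_loss

# Toy batch of 4 positions
logits = torch.randn(4, 1969)
mask = torch.zeros(4, 1969, dtype=torch.bool)
mask[:, :20] = True  # pretend the first 20 move indices are legal
loss = chessformer_loss(logits, torch.randn(4), torch.randint(0, 20, (4,)),
                        torch.randn(4), mask)
```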

Training Metrics

  • Action Loss: 1.6985
  • Value Loss: 0.0407
  • Invalid Loss: 0.0303

Performance

Capabilities

  • βœ… Reasonable opening and endgame play
  • βœ… Fast inference without search
  • βœ… Better than next-token prediction chess models
  • βœ… Occasionally defeats Stockfish when search enhancement is enabled

Limitations

  • ❌ Frequent tactical blunders in midgame
  • ❌ Estimated Elo ~1500 (informal assessment)
  • ❌ Struggles with complex tactical combinations
  • ❌ Tends to give away pieces ("free captures")

Usage

Installation

```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```

Basic Usage

```python
import torch
from model import ChessFormerModel

# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()

# Analyze position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])

with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)

# Get best move (requires additional processing for legal moves)
print(f"Position value: {position_value.item():.3f}")
```
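The "additional processing" noted above amounts to restricting the 1,969 policy logits to the moves that are actually legal and taking the best one. A sketch with python-chess, where the move-to-index mapping is a stand-in (the real mapping is defined in model.py):

```python
import chess

def best_legal_move(board, move_logits, move_to_index):
    """Score each legal move by its logit and return the highest-scoring one.
    move_to_index: UCI string -> index into the 1,969-way policy head
    (a hypothetical helper; use the mapping shipped with model.py)."""
    legal = list(board.legal_moves)
    return max(legal, key=lambda m: float(move_logits[move_to_index(m.uci())]))

# Demo with a toy mapping and flat logits biased toward e2e4
board = chess.Board()
index = {m.uci(): i for i, m in enumerate(board.legal_moves)}
logits = [0.0] * 1969
logits[index["e2e4"]] = 5.0
move = best_legal_move(board, logits, index.__getitem__)
```

Because the policy head covers structurally valid moves rather than position-legal ones, this masking step is needed in every play loop.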

With Chess Engine Interface

```python
from engine import Engine, ChessformerConfig
import chess

# Create engine (model loaded as in Basic Usage above)
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2  # Enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)

# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```

Limitations and Bias

Technical Limitations

  • Tactical weakness: Prone to hanging pieces and missing simple tactics
  • Computational inefficiency: FEN tokenization creates a training bottleneck; preprocessing the entire dataset before training should be beneficial

Potential Biases

  • Trained exclusively on Stockfish evaluations, may inherit engine biases
  • May not generalize to unconventional openings or endgames

Known Issues

  • Piece embeddings have consistently lower norms than positional embeddings
  • Model still assigns a small amount of probability mass (~3%) to invalid moves despite the training penalty
  • Performance degrades without search enhancement

Ethical Considerations

This model is intended for:

  • βœ… Educational purposes and chess learning
  • βœ… Research into neural chess architectures
  • βœ… Developing chess training tools

Not recommended for:

  • ❌ Competitive chess tournaments
  • ❌ Production chess engines without extensive testing
  • ❌ Applications requiring reliable tactical calculation

Additional Information