---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---
# ChessFormer-SL

ChessFormer-SL is a transformer-based chess model trained with supervised learning on Stockfish evaluations. It explores training chess engines without Monte Carlo Tree Search (MCTS), using neural networks alone.
## Model Description
- Model type: Transformer for chess position evaluation and move prediction
- Language(s): Chess (FEN notation)
- License: MIT
- Parameters: 100.7M
## Architecture
ChessFormer uses a custom transformer architecture optimized for chess:
- Blocks: 20 transformer layers
- Hidden size: 640
- Attention heads: 8
- Intermediate size: 1728
- Features: RMSNorm, SwiGLU activation, custom FEN tokenizer
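A minimal sketch of one such layer, assuming a standard pre-norm layout (the class and attribute names are illustrative, not the actual `model.py`; `nn.RMSNorm` requires PyTorch ≥ 2.4):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Feed-forward block with SwiGLU activation (hidden 640 -> intermediate 1728)."""
    def __init__(self, hidden_size: int = 640, intermediate_size: int = 1728):
        super().__init__()
        self.gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU-gated branch multiplied elementwise with a linear branch
        return self.down(F.silu(self.gate(x)) * self.up(x))

class TransformerBlock(nn.Module):
    """One of the 20 pre-norm layers: RMSNorm -> attention -> RMSNorm -> SwiGLU."""
    def __init__(self, hidden_size: int = 640, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.RMSNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm2 = nn.RMSNorm(hidden_size)
        self.mlp = SwiGLU(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Full (non-causal) self-attention: all 75 tokens attend to each other
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))
```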
## Input Format
The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
- 64 board square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)
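For illustration, the 64 + 9 + 2 layout could be assembled with `python-chess` roughly as follows (the actual tokenizer in `model.py` likely differs; the piece, metadata, and special-token ids here are assumptions):

```python
import chess

def fen_to_tokens(fen: str, repetitions: int) -> list[int]:
    """Illustrative 75-token layout: 64 squares + 9 metadata + 2 special tokens."""
    board = chess.Board(fen)

    # 64 square tokens: 0 = empty, 1-12 = (color, piece type) pairs (assumed ids)
    piece_ids = {(c, t): i + 1 for i, (c, t) in enumerate(
        (c, t) for c in (chess.WHITE, chess.BLACK) for t in chess.PIECE_TYPES)}
    squares = []
    for sq in chess.SQUARES:
        piece = board.piece_at(sq)
        squares.append(0 if piece is None else piece_ids[(piece.color, piece.piece_type)])

    # 9 metadata tokens: turn, four castling rights, en passant, two clocks, repetitions
    meta = [
        int(board.turn),
        int(board.has_kingside_castling_rights(chess.WHITE)),
        int(board.has_queenside_castling_rights(chess.WHITE)),
        int(board.has_kingside_castling_rights(chess.BLACK)),
        int(board.has_queenside_castling_rights(chess.BLACK)),
        0 if board.ep_square is None else 1 + board.ep_square,
        board.halfmove_clock,
        board.fullmove_number,
        repetitions,
    ]

    ACTION, VALUE = 13, 14  # assumed special-token ids; their outputs feed the two heads
    tokens = squares + meta + [ACTION, VALUE]
    assert len(tokens) == 75
    return tokens
```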
## Output Format
- Policy head: Logits over 1,969 structurally valid chess moves
- Value head: Position evaluation from current player's perspective
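As a shape sanity check with stand-in tensors (the exact shapes and the [-1, 1] value range are assumptions; see Usage below for the real call):

```python
import torch

B = 4                                        # a batch of positions
move_logits = torch.randn(B, 1969)           # stand-in for the policy head output
position_value = torch.tanh(torch.randn(B))  # stand-in value; [-1, 1] range assumed

move_probs = torch.softmax(move_logits, dim=-1)  # distribution over the move vocabulary
assert move_probs.shape == (B, 1969)
```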
## Training Details

### Training Data
- Dataset: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- Size: 56M positions with Stockfish evaluations
- Validation: depth27 split
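The splits should be loadable with 🤗 Datasets; streaming avoids materializing all 56M positions at once, and printing one row shows the actual schema rather than guessing column names:

```python
from datasets import load_dataset

# Stream the depth18 training split; depth27 serves as validation
train = load_dataset(
    "kaupane/lichess-2023-01-stockfish-annotated",
    split="depth18",
    streaming=True,
)
print(next(iter(train)).keys())  # inspect the real column names before writing a loader
```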
### Training Procedure
- Method: Supervised learning on Stockfish move recommendations and evaluations
- Objective: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
- Hardware: NVIDIA RTX 4060 Ti (16 GB)
- Duration: ~2 weeks
- Checkpoints: 20 total; this model is the final checkpoint
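A sketch of how the three loss terms might combine (the equal weighting and the exact form of the invalid-move penalty are assumptions, not the actual training code):

```python
import torch
import torch.nn.functional as F

def training_loss(move_logits, value_pred, target_move, target_value, legal_mask,
                  invalid_weight: float = 1.0):
    """move_logits: (B, 1969); target_move: (B,) indices of Stockfish's choice;
    legal_mask: (B, 1969) bool, True where a move is legal in the position."""
    action_loss = F.cross_entropy(move_logits, target_move)
    value_loss = F.mse_loss(value_pred, target_value)
    # Penalize probability mass the policy puts on moves that are not legal here
    probs = torch.softmax(move_logits, dim=-1)
    invalid_loss = (probs * (~legal_mask)).sum(dim=-1).mean()
    return action_loss + value_loss + invalid_weight * invalid_loss
```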
### Training Metrics
- Action Loss: 1.6985
- Value Loss: 0.0407
- Invalid Loss: 0.0303
## Performance

### Capabilities
- ✅ Reasonable opening and endgame play
- ✅ Fast inference without search
- ✅ Stronger play than next-token-prediction chess models
- ✅ Occasionally defeats Stockfish when combined with search enhancement
### Limitations
- ❌ Frequent tactical blunders in the middlegame
- ❌ Estimated Elo of ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")
## Usage

### Installation
```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```
### Basic Usage
```python
import torch
from model import ChessFormerModel

# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()

# Analyze a position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])
with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)

# Getting the best move requires additional processing for legal moves (see below)
print(f"Position value: {position_value.item():.3f}")
```
### With Chess Engine Interface
```python
import chess
from engine import Engine, ChessformerConfig

# Create engine
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2,  # enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)

# Play a move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```
## Limitations and Bias

### Technical Limitations
- Tactical weakness: Prone to hanging pieces and missing simple tactics
- Computational inefficiency: FEN tokenization is a training bottleneck; pre-tokenizing the entire dataset before training should be beneficial
### Potential Biases
- Trained exclusively on Stockfish evaluations; may inherit that engine's biases
- May not generalize to unconventional openings or endgames
### Known Issues
- Piece embeddings have consistently lower norms than positional embeddings
- The model occasionally assigns a small amount of probability (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement
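The norm imbalance can be checked with a one-off diagnostic along these lines (the embedding attribute names are guesses; inspect `model.py` for the real module paths):

```python
import torch

with torch.no_grad():
    piece_norms = model.piece_embedding.weight.norm(dim=-1)       # hypothetical attribute
    pos_norms = model.positional_embedding.weight.norm(dim=-1)    # hypothetical attribute
print(f"mean piece-embedding norm: {piece_norms.mean():.3f}")
print(f"mean positional-embedding norm: {pos_norms.mean():.3f}")
```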
## Ethical Considerations
This model is intended for:
- ✅ Educational purposes and chess learning
- ✅ Research into neural chess architectures
- ✅ Developing chess training tools

Not recommended for:

- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation
## Additional Information
- Repository: GitHub link
- Demo: HuggingFace Space Demo
- Related: ChessFormer-RL (RL training experiment)