---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---
# ChessFormer-SL
ChessFormer-SL is a transformer-based chess model trained with supervised learning on Stockfish evaluations. The project explores training chess engines without Monte Carlo Tree Search (MCTS), relying on the neural network alone.
## Model Description
- **Model type**: Transformer for chess position evaluation and move prediction
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M
## Architecture
ChessFormer uses a custom transformer architecture optimized for chess:
- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
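For orientation, here is a minimal PyTorch sketch of one such block under the stated hyperparameters. The class names and wiring are illustrative assumptions, not the repository's actual implementation (`nn.RMSNorm` requires PyTorch >= 2.4):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated MLP matching the card's sizes (hidden 640, intermediate 1728)."""
    def __init__(self, hidden: int = 640, intermediate: int = 1728):
        super().__init__()
        self.gate = nn.Linear(hidden, intermediate, bias=False)
        self.up = nn.Linear(hidden, intermediate, bias=False)
        self.down = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class ChessFormerBlock(nn.Module):
    """One of the 20 transformer layers; pre-norm wiring is assumed."""
    def __init__(self, hidden: int = 640, heads: int = 8):
        super().__init__()
        self.attn_norm = nn.RMSNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.mlp_norm = nn.RMSNorm(hidden)
        self.mlp = SwiGLU(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.mlp_norm(x))
```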
### Input Format
The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
- 64 board square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)
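To make the 75-token layout concrete, here is a hedged sketch of one possible tokenization using `python-chess`. The piece vocabulary, metadata encoding, and special-token ids are assumptions; the repository's custom tokenizer defines the real scheme:
```python
import chess

PIECE_VOCAB = {None: 0, "P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
               "p": 7, "n": 8, "b": 9, "r": 10, "q": 11, "k": 12}
ACTION_TOKEN, VALUE_TOKEN = 13, 14  # hypothetical special-token ids

def tokenize(fen: str, repetitions: int) -> list[int]:
    board = chess.Board(fen)
    squares = []
    for sq in chess.SQUARES:  # 64 square tokens, a1 .. h8
        piece = board.piece_at(sq)
        squares.append(PIECE_VOCAB[piece.symbol() if piece else None])
    meta = [  # 9 metadata tokens
        int(board.turn),
        int(board.has_kingside_castling_rights(chess.WHITE)),
        int(board.has_queenside_castling_rights(chess.WHITE)),
        int(board.has_kingside_castling_rights(chess.BLACK)),
        int(board.has_queenside_castling_rights(chess.BLACK)),
        board.ep_square if board.ep_square is not None else 0,
        min(board.halfmove_clock, 99),
        min(board.fullmove_number, 255),
        repetitions,
    ]
    return squares + meta + [ACTION_TOKEN, VALUE_TOKEN]  # 64 + 9 + 2 = 75

assert len(tokenize(chess.STARTING_FEN, 1)) == 75
```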
### Output Format
- **Policy head**: Logits over 1,969 structurally valid chess moves
- **Value head**: Position evaluation from current player's perspective
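Because the policy spans every structurally valid move, moves that are illegal in the *current* position must be masked out before one is chosen. A minimal sketch, assuming a `move_to_index` mapping (UCI string to policy index) like the one the repository would need to define:
```python
import chess
import torch

def best_legal_move(board: chess.Board,
                    move_logits: torch.Tensor,
                    move_to_index: dict[str, int]) -> chess.Move:
    """Pick the highest-scoring move among those legal in `board`."""
    legal = list(board.legal_moves)
    scores = torch.stack([move_logits[move_to_index[m.uci()]] for m in legal])
    return legal[scores.argmax().item()]
```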
## Training Details
### Training Data
- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Size**: 56M positions with Stockfish evaluations
- **Validation**: depth27 split
### Training Procedure
- **Method**: Supervised learning on Stockfish move recommendations and evaluations
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid-move penalty, as sketched after this list
- **Hardware**: RTX 4060Ti 16GB
- **Duration**: ~2 weeks
- **Checkpoints**: 20 in total; this model is the final checkpoint
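A rough sketch of the combined objective; the loss weights and the exact form of the invalid-move penalty are assumptions, not the repository's actual formulation:
```python
import torch
import torch.nn.functional as F

def chessformer_loss(move_logits: torch.Tensor,   # (B, 1969)
                     target_move: torch.Tensor,   # (B,) Stockfish move index
                     value_pred: torch.Tensor,    # (B, 1)
                     target_value: torch.Tensor,  # (B,) Stockfish evaluation
                     valid_mask: torch.Tensor,    # (B, 1969) bool, legal moves
                     penalty_weight: float = 0.1  # assumed weighting
                     ) -> torch.Tensor:
    action_loss = F.cross_entropy(move_logits, target_move)
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_value)
    # Penalize probability mass placed on moves that are invalid
    # in the current position.
    probs = move_logits.softmax(dim=-1)
    invalid_loss = (probs * (~valid_mask)).sum(dim=-1).mean()
    return action_loss + value_loss + penalty_weight * invalid_loss
```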
### Training Metrics
- **Action Loss**: 1.6985
- **Value Loss**: 0.0407
- **Invalid Loss**: 0.0303
## Performance
### Capabilities
- ✅ Reasonable opening and endgame play
- ✅ Fast inference without search
- ✅ Better than next-token-prediction chess models
- ✅ Can occasionally defeat Stockfish with search enhancement
### Limitations
- ❌ Frequent tactical blunders in the middlegame
- ❌ Estimated Elo ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")
## Usage
### Installation
```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```
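`model.py` can also be fetched programmatically with `huggingface_hub` (assuming it sits at the repository root):
```python
from huggingface_hub import hf_hub_download

# Downloads model.py into the local HF cache and returns its path
path = hf_hub_download(repo_id="kaupane/ChessFormer-SL", filename="model.py")
print(path)
```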
### Basic Usage
```python
import torch
from model import ChessFormerModel
# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()
# Analyze position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])
with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)

# Choosing a move requires masking the logits to legal moves
# (see the sketch under "Output Format" above)
print(f"Position value: {position_value.item():.3f}")
```
### With Chess Engine Interface
```python
from engine import Engine, ChessformerConfig
import chess
# Create engine
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2,  # enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)
# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```
## Limitations and Bias
### Technical Limitations
- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- **Computational inefficiency**: FEN tokenization creates a training bottleneck; pre-tokenizing the entire dataset before training should be beneficial
### Potential Biases
- Trained exclusively on Stockfish evaluations, may inherit engine biases
- May not generalize to unconventional openings or endgames
### Known Issues
- Piece embeddings have consistently lower norms than positional embeddings
- The model sometimes assigns probability mass (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement
## Ethical Considerations
This model is intended for:
- ✅ Educational purposes and chess learning
- ✅ Research into neural chess architectures
- ✅ Developing chess training tools
Not recommended for:
- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation
## Additional Information
- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)