---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---
# ChessFormer-SL
ChessFormer-SL is a transformer-based chess model trained with supervised learning on Stockfish evaluations. The project explores training chess engines without Monte Carlo Tree Search (MCTS), relying on the neural network alone.
## Model Description
- **Model type**: Transformer for chess position evaluation and move prediction
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M
## Architecture
ChessFormer uses a custom transformer architecture optimized for chess:
- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
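For orientation, here is a minimal PyTorch sketch of one such block under the stated hyperparameters. The class names and wiring are illustrative assumptions, not the repository's actual implementation (`nn.RMSNorm` requires PyTorch >= 2.4):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated MLP matching the card's sizes (hidden 640, intermediate 1728)."""
    def __init__(self, hidden: int = 640, intermediate: int = 1728):
        super().__init__()
        self.gate = nn.Linear(hidden, intermediate, bias=False)
        self.up = nn.Linear(hidden, intermediate, bias=False)
        self.down = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class ChessFormerBlock(nn.Module):
    """One of the 20 transformer layers; pre-norm wiring is assumed."""
    def __init__(self, hidden: int = 640, heads: int = 8):
        super().__init__()
        self.attn_norm = nn.RMSNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.mlp_norm = nn.RMSNorm(hidden)
        self.mlp = SwiGLU(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.mlp_norm(x))
```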
### Input Format
The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
- 64 board square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)
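To make the 75-token layout concrete, here is a hedged sketch of one possible tokenization using `python-chess`. The piece vocabulary, metadata encoding, and special-token ids are assumptions; the repository's custom tokenizer defines the real scheme:
```python
import chess

PIECE_VOCAB = {None: 0, "P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
               "p": 7, "n": 8, "b": 9, "r": 10, "q": 11, "k": 12}
ACTION_TOKEN, VALUE_TOKEN = 13, 14  # hypothetical special-token ids

def tokenize(fen: str, repetitions: int) -> list[int]:
    board = chess.Board(fen)
    squares = []
    for sq in chess.SQUARES:  # 64 square tokens, a1 .. h8
        piece = board.piece_at(sq)
        squares.append(PIECE_VOCAB[piece.symbol() if piece else None])
    meta = [  # 9 metadata tokens
        int(board.turn),
        int(board.has_kingside_castling_rights(chess.WHITE)),
        int(board.has_queenside_castling_rights(chess.WHITE)),
        int(board.has_kingside_castling_rights(chess.BLACK)),
        int(board.has_queenside_castling_rights(chess.BLACK)),
        board.ep_square if board.ep_square is not None else 0,
        min(board.halfmove_clock, 99),
        min(board.fullmove_number, 255),
        repetitions,
    ]
    return squares + meta + [ACTION_TOKEN, VALUE_TOKEN]  # 64 + 9 + 2 = 75

assert len(tokenize(chess.STARTING_FEN, 1)) == 75
```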
### Output Format
- **Policy head**: Logits over 1,969 structurally valid chess moves
- **Value head**: Position evaluation from current player's perspective
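Because the policy spans every structurally valid move, moves that are illegal in the *current* position must be masked out before one is chosen. A minimal sketch, assuming a `move_to_index` mapping (UCI string to policy index) like the one the repository would need to define:
```python
import chess
import torch

def best_legal_move(board: chess.Board,
                    move_logits: torch.Tensor,
                    move_to_index: dict[str, int]) -> chess.Move:
    """Pick the highest-scoring move among those legal in `board`."""
    legal = list(board.legal_moves)
    scores = torch.stack([move_logits[move_to_index[m.uci()]] for m in legal])
    return legal[scores.argmax().item()]
```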
## Training Details
### Training Data
- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Size**: 56M positions with Stockfish evaluations
- **Validation**: depth27 split
### Training Procedure
- **Method**: Supervised learning on Stockfish move recommendations and evaluations
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid-move penalty, as sketched after this list
- **Hardware**: RTX 4060Ti 16GB
- **Duration**: ~2 weeks
- **Checkpoints**: 20 in total; this model is the final checkpoint
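A rough sketch of the combined objective; the loss weights and the exact form of the invalid-move penalty are assumptions, not the repository's actual formulation:
```python
import torch
import torch.nn.functional as F

def chessformer_loss(move_logits: torch.Tensor,   # (B, 1969)
                     target_move: torch.Tensor,   # (B,) Stockfish move index
                     value_pred: torch.Tensor,    # (B, 1)
                     target_value: torch.Tensor,  # (B,) Stockfish evaluation
                     valid_mask: torch.Tensor,    # (B, 1969) bool, legal moves
                     penalty_weight: float = 0.1  # assumed weighting
                     ) -> torch.Tensor:
    action_loss = F.cross_entropy(move_logits, target_move)
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_value)
    # Penalize probability mass placed on moves that are invalid
    # in the current position.
    probs = move_logits.softmax(dim=-1)
    invalid_loss = (probs * (~valid_mask)).sum(dim=-1).mean()
    return action_loss + value_loss + penalty_weight * invalid_loss
```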
### Training Metrics
- **Action Loss**: 1.6985
- **Value Loss**: 0.0407
- **Invalid Loss**: 0.0303
## Performance
### Capabilities
- ✅ Reasonable opening and endgame play
- ✅ Fast inference without search
- ✅ Better than next-token-prediction chess models
- ✅ Can occasionally defeat Stockfish with search enhancement
### Limitations
- ❌ Frequent tactical blunders in the middlegame
- ❌ Estimated Elo ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")
## Usage
### Installation
```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```
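`model.py` can also be fetched programmatically with `huggingface_hub` (assuming it sits at the repository root):
```python
from huggingface_hub import hf_hub_download

# Downloads model.py into the local HF cache and returns its path
path = hf_hub_download(repo_id="kaupane/ChessFormer-SL", filename="model.py")
print(path)
```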
### Basic Usage
```python
import torch
from model import ChessFormerModel
# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()
# Analyze position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])
with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)

# Choosing a move requires masking the logits to legal moves
# (see the sketch under "Output Format" above)
print(f"Position value: {position_value.item():.3f}")
```
### With Chess Engine Interface
```python
from engine import Engine, ChessformerConfig
import chess
# Create engine
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2,  # enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)
# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```
## Limitations and Bias
### Technical Limitations
- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- **Computational inefficiency**: FEN tokenization creates a training bottleneck; pre-tokenizing the entire dataset before training should be beneficial
### Potential Biases
- Trained exclusively on Stockfish evaluations, may inherit engine biases
- May not generalize to unconventional openings or endgames
### Known Issues
- Piece embeddings have consistently lower norms than positional embeddings
- The model sometimes assigns probability mass (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement
## Ethical Considerations
This model is intended for:
- ✅ Educational purposes and chess learning
- ✅ Research into neural chess architectures
- ✅ Developing chess training tools
Not recommended for:
- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation
## Additional Information
- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)