---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---

# ChessFormer-SL

ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.

## Model Description

- **Model type**: Transformer for chess position evaluation and move prediction
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M

## Architecture

ChessFormer uses a custom transformer architecture optimized for chess:

- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8  
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer

### Input Format

The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:

- 64 board square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)
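The actual tokenizer lives in `model.py`; the sketch below only illustrates how a FEN string and repetition count unpack into the 64 + 9 + 2 = 75 token layout. All names and token spellings here (e.g. `"<action>"`) are made up for illustration, not the repository's API.

```python
# Illustrative sketch of the 75-token layout. Not the repository's tokenizer;
# helper names and special-token spellings are hypothetical.

def sketch_tokenize(fen: str, repetitions: int) -> list:
    placement, turn, castling, en_passant, halfmove, fullmove = fen.split()

    # 64 board-square tokens: one piece symbol (or ".") per square, a1..h8
    board = []
    for rank in reversed(placement.split("/")):  # FEN lists rank 8 first
        for ch in rank:
            if ch.isdigit():
                board.extend(["."] * int(ch))  # digit = run of empty squares
            else:
                board.append(ch)

    # 9 metadata tokens: turn, 4 castling rights, en passant square,
    # halfmove/fullmove clocks, repetition count
    meta = [
        turn,
        *(flag if flag in castling else "-" for flag in "KQkq"),
        en_passant,
        halfmove,
        fullmove,
        str(repetitions),
    ]

    # 2 special tokens that gather the policy and value readouts
    return board + meta + ["<action>", "<value>"]

tokens = sketch_tokenize(
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", 1
)
print(len(tokens))  # 64 + 9 + 2 = 75
```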

### Output Format

- **Policy head**: Logits over 1,969 structurally valid chess moves
- **Value head**: Position evaluation from current player's perspective

## Training Details

### Training Data

- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Size**: 56M positions with Stockfish evaluations
- **Validation**: depth27 split

### Training Procedure

- **Method**: Supervised learning on Stockfish move recommendations and evaluations
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
- **Hardware**: RTX 4060Ti 16GB
- **Duration**: ~2 weeks
- **Checkpoints**: 20 total, this model is the final checkpoint
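The combined objective above can be sketched as follows. The loss weights and the exact form of the invalid-move penalty are assumptions for illustration, not the repository's values; here the penalty is the probability mass placed on structurally valid but illegal moves.

```python
import torch
import torch.nn.functional as F

def sketch_loss(move_logits, value_pred, target_move, target_value, legal_mask):
    """Illustrative combined objective; weighting is assumed, not the repo's.

    move_logits:  (B, 1969) policy logits
    value_pred:   (B,) scalar evaluations
    target_move:  (B,) index of Stockfish's recommended move
    target_value: (B,) Stockfish evaluation, current player's perspective
    legal_mask:   (B, 1969) bool, True where the move is legal
    """
    action_loss = F.cross_entropy(move_logits, target_move)
    value_loss = F.mse_loss(value_pred, target_value)
    # Penalize probability mass assigned to illegal moves
    probs = move_logits.softmax(dim=-1)
    invalid_loss = (probs * (~legal_mask)).sum(dim=-1).mean()
    return action_loss + value_loss + invalid_loss

# Dummy shapes only, to show the interface
B = 4
loss = sketch_loss(
    torch.randn(B, 1969), torch.randn(B),
    torch.randint(0, 1969, (B,)), torch.rand(B) * 2 - 1,
    torch.zeros(B, 1969, dtype=torch.bool),
)
```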

### Training Metrics

- **Action Loss**: 1.6985
- **Value Loss**: 0.0407
- **Invalid Loss**: 0.0303

## Performance

### Capabilities

- βœ… Reasonable opening and endgame play
- βœ… Fast inference without search
- βœ… Better than next-token prediction chess models
- βœ… Can occasionally defeat Stockfish when search enhancement is enabled

### Limitations  

- ❌ Frequent tactical blunders in midgame
- ❌ Estimated Elo ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")

## Usage

### Installation

```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```

### Basic Usage

```python
import torch
from model import ChessFormerModel

# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()

# Analyze position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])

with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)
    
# Get best move (requires additional processing for legal moves)
print(f"Position value: {position_value.item():.3f}")
```
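The "additional processing" mentioned in the comment above amounts to masking the 1,969-way policy down to the moves that are legal in the current position. A minimal sketch (how legal indices are obtained depends on the repository's move-index mapping, which is not reproduced here):

```python
import torch

def mask_to_legal(move_logits: torch.Tensor, legal_indices: list) -> torch.Tensor:
    """Suppress illegal moves and renormalize the policy.

    move_logits:   (1969,) raw policy logits for one position
    legal_indices: indices into the 1,969-move vocabulary that are legal;
                   producing these requires the repo's move-index mapping.
    """
    mask = torch.full_like(move_logits, float("-inf"))
    mask[legal_indices] = 0.0  # keep only legal entries
    return torch.softmax(move_logits + mask, dim=-1)

# Dummy example: pretend only 3 of the 1,969 moves are legal
logits = torch.randn(1969)
probs = mask_to_legal(logits, [0, 17, 42])
best = probs.argmax().item()  # index of the most likely legal move
```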

### With Chess Engine Interface

```python
from engine import Engine, ChessformerConfig
import chess

# Create engine (reuses the `model` loaded in Basic Usage above)
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2  # Enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)

# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```

## Limitations and Bias

### Technical Limitations

- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- **Computational inefficiency**: on-the-fly FEN tokenization is a training bottleneck; pre-tokenizing the entire dataset before training should be beneficial

### Potential Biases

- Trained exclusively on Stockfish evaluations, may inherit engine biases
- May not generalize to unconventional openings or endgames

### Known Issues

- Piece embeddings have consistently lower norms than positional embeddings
- Model sometimes assigns a small amount of probability mass (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement

## Ethical Considerations

This model is intended for:

- βœ… Educational purposes and chess learning
- βœ… Research into neural chess architectures
- βœ… Developing chess training tools

Not recommended for:

- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation

## Additional Information

- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)