---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---

# ChessFormer-SL

ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. The model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.

## Model Description

- **Model type**: Transformer for chess position evaluation and move prediction
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M

## Architecture

ChessFormer uses a custom transformer architecture optimized for chess:

- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer

### Input Format

The model takes FEN strings and repetition counts and tokenizes them into 75-token sequences:

- 64 board-square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)

### Output Format

- **Policy head**: Logits over 1,969 structurally valid chess moves
- **Value head**: Position evaluation from the current player's perspective

## Training Details

### Training Data

- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Size**: 56M positions with Stockfish evaluations
- **Validation**: depth27 split

### Training Procedure

- **Method**: Supervised learning on Stockfish move recommendations and evaluations
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid-move penalty
- **Hardware**: RTX 4060 Ti 16GB
- **Duration**: ~2 weeks
- **Checkpoints**: 20 total; this model is the final checkpoint

### Training Metrics

- **Action Loss**: 1.6985
- **Value Loss**: 0.0407
- **Invalid Loss**: 0.0303

## Performance

### Capabilities

- ✅ Reasonable opening and endgame play
- ✅ Fast inference without search
- ✅ Stronger than next-token-prediction chess models
- ✅ Can occasionally defeat Stockfish with search enhancement

### Limitations

- ❌ Frequent tactical blunders in the middlegame
- ❌ Estimated Elo ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")

## Usage

### Installation

```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```

### Basic Usage

```python
import torch
from model import ChessFormerModel

# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()

# Analyze a position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])

with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)

# Getting the best move requires masking the logits to legal moves (see sketch below)
print(f"Position value: {position_value.item():.3f}")
```
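Selecting a playable move requires restricting the policy logits to moves that are legal in the current position. Here is a minimal sketch of that step using `python-chess`; the `move_to_index` dict (UCI string → index into the 1,969-move policy) is a hypothetical stand-in for whatever mapping utility the repository's `model.py` actually provides:

```python
import torch
import chess

def best_legal_move(fen: str, logits: torch.Tensor, move_to_index: dict) -> str:
    """Mask policy logits to the legal moves of `fen` and return the
    highest-scoring move as a UCI string."""
    board = chess.Board(fen)
    masked = torch.full_like(logits, float("-inf"))
    for move in board.legal_moves:
        idx = move_to_index.get(move.uci())
        if idx is not None:  # skip moves outside the 1,969-move vocabulary
            masked[idx] = logits[idx]
    index_to_move = {v: k for k, v in move_to_index.items()}
    return index_to_move[masked.argmax().item()]

# e.g. best_legal_move(fens[0], move_logits[0], move_to_index)
```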
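For less deterministic play, the `argmax` line in the sketch above can be swapped for temperature sampling over the masked logits, mirroring the `temperature` option in the engine interface below; entries masked to `-inf` receive zero probability, so only legal moves can be drawn:

```python
    # Drop-in replacement for the final line of best_legal_move:
    # lower temperature = greedier, higher = more exploratory.
    temperature = 0.5
    probs = torch.softmax(masked / temperature, dim=-1)
    return index_to_move[torch.multinomial(probs, num_samples=1).item()]
```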
### With Chess Engine Interface

```python
from engine import Engine, ChessformerConfig
import chess

# Create engine
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2  # Enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)

# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```

## Limitations and Bias

### Technical Limitations

- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- **Computational inefficiency**: FEN tokenization creates a training bottleneck; preprocessing the entire dataset before training should be beneficial

### Potential Biases

- Trained exclusively on Stockfish evaluations, so it may inherit engine biases
- May not generalize to unconventional openings or endgames

### Known Issues

- Piece embeddings have consistently lower norms than positional embeddings
- The model sometimes assigns a small probability (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement

## Ethical Considerations

This model is intended for:

- ✅ Educational purposes and chess learning
- ✅ Research into neural chess architectures
- ✅ Developing chess training tools

Not recommended for:

- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation

## Additional Information

- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)