---
license: mit
language: en
tags:
- nethack
- reinforcement-learning
- variational-autoencoder
- representation-learning
- multimodal
- world-modeling
pipeline_tag: feature-extraction
---

# MultiModalHackVAE

A multi-modal variational autoencoder (VAE) trained on NetHack game states for representation learning.
## Model Description

MultiModalHackVAE learns compact latent representations of NetHack game states by jointly encoding several observation modalities:

- Game character grid (21 × 79)
- Color information for each grid cell
- Game statistics (`blstats`)
- Message text
- Bag of glyphs
- Hero information (role, race, gender, alignment)
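The modalities above map onto the model's input tensors. As a minimal sketch of that mapping, the snippet below converts a single observation dictionary into batched tensors; the dictionary keys and shapes (`chars`, `colors`, `blstats`, `message`) are assumptions modeled on the NetHack Learning Environment's observation space, not taken from this repository, and `obs_to_model_inputs` is a hypothetical helper:

```python
import numpy as np
import torch

# Synthetic stand-in for a single NetHack observation dict.
# Keys and shapes are assumptions, not confirmed by this repo.
obs = {
    "chars": np.random.randint(32, 127, size=(21, 79), dtype=np.uint8),
    "colors": np.random.randint(0, 16, size=(21, 79), dtype=np.uint8),
    "blstats": np.zeros(27, dtype=np.int64),
    "message": np.zeros(256, dtype=np.uint8),
}

def obs_to_model_inputs(obs, hero_info):
    """Convert one observation dict into batched model inputs (batch size 1)."""
    return {
        "glyph_chars": torch.from_numpy(obs["chars"]).long().unsqueeze(0),
        "glyph_colors": torch.from_numpy(obs["colors"]).long().unsqueeze(0),
        "blstats": torch.from_numpy(obs["blstats"]).float().unsqueeze(0),
        "msg_tokens": torch.from_numpy(obs["message"]).long().unsqueeze(0),
        "hero_info": torch.tensor(hero_info).long().unsqueeze(0),
    }

inputs = obs_to_model_inputs(obs, hero_info=[0, 0, 0, 0])
```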
## Model Details

- **Model Type**: Multi-modal variational autoencoder
- **Framework**: PyTorch
- **Dataset**: NetHack Learning Dataset
- **Latent Dimensions**: 96
- **Low-rank Dimensions**: 0
## Usage

```python
import torch

from train import load_model_from_huggingface

# Load the model and its configuration from the Hugging Face Hub
model, config = load_model_from_huggingface("CatkinChen/nethack-vae-hmm")
model.eval()

# Example usage with synthetic data shaped like NetHack observations
batch_size = 1
game_chars = torch.randint(32, 127, (batch_size, 21, 79))  # printable ASCII map characters
game_colors = torch.randint(0, 16, (batch_size, 21, 79))   # 16 terminal colors
blstats = torch.randn(batch_size, 27)                      # bottom-line statistics
msg_tokens = torch.randint(0, 128, (batch_size, 256))      # tokenized message text
hero_info = torch.randint(0, 10, (batch_size, 4))          # role, race, gender, alignment

with torch.no_grad():
    output = model(
        glyph_chars=game_chars,
        glyph_colors=game_colors,
        blstats=blstats,
        msg_tokens=msg_tokens,
        hero_info=hero_info,
    )

latent_mean = output['mu']
latent_logvar = output['logvar']
lowrank_factors = output['lowrank_factors']
```
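The `mu` and `logvar` outputs parameterize a diagonal Gaussian posterior over the 96-dimensional latent space. If you want a stochastic latent sample rather than the posterior mean, you can apply the standard VAE reparameterization trick. The sketch below uses synthetic `mu`/`logvar` tensors standing in for the encoder outputs; `reparameterize` is an illustrative helper, not a function exported by this repository:

```python
import torch

latent_dim = 96  # matches the model card's latent dimensionality

# Hypothetical encoder outputs (zeros give a standard normal posterior)
mu = torch.zeros(1, latent_dim)
logvar = torch.zeros(1, latent_dim)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, where sigma = exp(0.5 * logvar)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

z = reparameterize(mu, logvar)  # one latent sample of shape (1, 96)
```

For downstream feature extraction, using `mu` directly (the deterministic posterior mean) is the common choice; sampling is mainly useful when generating diverse reconstructions.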
## Training

The model was trained with an adaptive loss-weighting schedule:

- Embedding warm-up for quick convergence
- A gradual shift of emphasis toward raw reconstruction
- KL beta annealing for better latent structure
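KL beta annealing scales the KL-divergence term of the VAE loss by a coefficient that starts near zero and grows over training, so the model learns to reconstruct before the latent prior is enforced. The exact schedule used here is not documented; the function below is a generic linear-annealing sketch with assumed step counts:

```python
def kl_beta(step, warmup_steps=1000, anneal_steps=10000, beta_max=1.0):
    """Linear KL beta annealing (illustrative, not this model's exact schedule).

    Holds beta at 0 during warm-up, then ramps linearly to beta_max
    over anneal_steps, staying at beta_max afterwards.
    """
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / anneal_steps
    return min(beta_max, beta_max * progress)

# The annealed loss would then be: reconstruction_loss + kl_beta(step) * kl_loss
```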
## Citation

If you use this model, please consider citing:

```bibtex
@misc{nethack-vae,
  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
  author={Xu Chen},
  year={2025},
  url={https://huggingface.co/CatkinChen/nethack-vae-hmm}
}
```