---
license: mit
language: en
tags:
- nethack
- reinforcement-learning
- variational-autoencoder
- representation-learning
- multimodal
- world-modeling
pipeline_tag: feature-extraction
---

# MultiModalHackVAE

A multi-modal Variational Autoencoder trained on NetHack game states for representation learning.

## Model Description

This model is a MultiModalHackVAE that learns compact representations of NetHack game states by jointly encoding:
- Game character grids (21×79)
- Per-cell color information
- Bottom-line game statistics (blstats)
- Message text
- A bag-of-glyphs summary
- Hero information (role, race, gender, alignment)

## Model Details

- **Model Type**: Multi-modal Variational Autoencoder
- **Framework**: PyTorch
- **Dataset**: NetHack Learning Dataset
- **Latent Dimensions**: 96
- **Low-rank Dimensions**: 0

## Usage

```python
from train import load_model_from_huggingface
import torch

# Load the model
model, config = load_model_from_huggingface("CatkinChen/nethack-vae-hmm")

# Example usage with synthetic data
batch_size = 1
game_chars = torch.randint(32, 127, (batch_size, 21, 79))  # printable ASCII map characters
game_colors = torch.randint(0, 16, (batch_size, 21, 79))   # per-cell colors (0-15)
blstats = torch.randn(batch_size, 27)                      # bottom-line statistics
msg_tokens = torch.randint(0, 128, (batch_size, 256))      # tokenized message text
hero_info = torch.randint(0, 10, (batch_size, 4))          # role, race, gender, alignment

with torch.no_grad():
    output = model(
        glyph_chars=game_chars,
        glyph_colors=game_colors,
        blstats=blstats,
        msg_tokens=msg_tokens,
        hero_info=hero_info
    )
    latent_mean = output['mu']
    latent_logvar = output['logvar']
    lowrank_factors = output['lowrank_factors']
```
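Given `mu` and `logvar` from the forward pass, a latent sample can be drawn with the standard VAE reparameterization trick. This is a generic sketch, not a documented API of this model; the model may expose its own sampling helper:

```python
import torch

def sample_latent(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)  # logvar stores log(sigma^2)
    eps = torch.randn_like(std)
    return mu + std * eps

# With this model's 96-dimensional latent space:
mu = torch.zeros(1, 96)
logvar = torch.zeros(1, 96)
z = sample_latent(mu, logvar)
print(z.shape)  # torch.Size([1, 96])
```

Using `mu` directly gives a deterministic embedding for downstream feature extraction; sampling is mainly useful when generating reconstructions.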

## Training

This model was trained with adaptive loss weighting, including:
- An embedding warm-up phase for faster convergence
- A gradual shift of emphasis toward raw reconstruction
- KL beta annealing for better latent structure
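KL beta annealing scales the KL term from near zero up to its full weight over training, letting the decoder learn reconstruction before the latent prior is enforced. A minimal linear warm-up schedule might look like the following (the actual schedule and hyperparameters used in training are not specified here):

```python
def kl_beta(step: int, warmup_steps: int = 10_000, beta_max: float = 1.0) -> float:
    """Linearly anneal the KL weight from 0 to beta_max over warmup_steps."""
    return beta_max * min(1.0, step / warmup_steps)

# The annealed weight multiplies the KL term in the ELBO:
#   total_loss = recon_loss + kl_beta(step) * kl_divergence
print(kl_beta(0))       # 0.0
print(kl_beta(5_000))   # 0.5
print(kl_beta(20_000))  # 1.0
```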

## Citation

If you use this model, please consider citing:

```bibtex
@misc{nethack-vae,
  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
  author={Xu Chen},
  year={2025},
  url={https://huggingface.co/CatkinChen/nethack-vae-hmm}
}
```