CatkinChen/nethack-vae-hmm

Tags: Feature Extraction, MultiModalHackVAE_with_StickyHDPHMM, reinforcement-learning, variational-autoencoder, representation-learning


CatkinChen committed on Sep 4

Commit 13bd824 · verified · 1 Parent(s): a37587d

Add model card

Files changed (1)

README.md +84 -0

README.md ADDED

	@@ -0,0 +1,84 @@

+---
+license: mit
+language: en
+tags:
+- nethack
+- reinforcement-learning
+- variational-autoencoder
+- representation-learning
+- multimodal
+- world-modeling
+pipeline_tag: feature-extraction
+---
+# MultiModalHackVAE
+A multi-modal Variational Autoencoder trained on NetHack game states for representation learning.
+## Model Description
+This model is a MultiModalHackVAE that learns compact representations of NetHack game states by processing:
+- Game character grids (21x79)
+- Color information
+- Game statistics (blstats)
+- Message text
+- Bag of glyphs
+- Hero information (role, race, gender, alignment)
+## Model Details
+- **Model Type**: Multi-modal Variational Autoencoder
+- **Framework**: PyTorch
+- **Dataset**: NetHack Learning Dataset
+- **Latent Dimensions**: 96
+- **Low-rank Dimensions**: 0
+## Usage
+```python
+# Assumes the project's `train` module is on the Python path.
+from train import load_model_from_huggingface
+import torch
+# Load the model
+model = load_model_from_huggingface("CatkinChen/nethack-vae-hmm")
+# Example usage with synthetic data
+batch_size = 1
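+game_chars = torch.randint(32, 127, (batch_size, 21, 79))  # printable ASCII codes on the 21x79 grid
+game_colors = torch.randint(0, 16, (batch_size, 21, 79))   # color indices 0-15
+blstats = torch.randn(batch_size, 27)                      # 27 game statistics
+msg_tokens = torch.randint(0, 128, (batch_size, 256))      # 256 message tokens
+hero_info = torch.randint(0, 10, (batch_size, 4))          # role, race, gender, alignment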
+with torch.no_grad():
+    output = model(
+        glyph_chars=game_chars,
+        glyph_colors=game_colors,
+        blstats=blstats,
+        msg_tokens=msg_tokens,
+        hero_info=hero_info
+    )
+    latent_mean = output['mu']
+    latent_logvar = output['logvar']
+    # With "Low-rank Dimensions: 0" (see Model Details), this entry may be empty or None.
+    lowrank_factors = output['lowrank_factors']
+```
+## Training
+This model was trained using adaptive loss weighting with:
+- Embedding warm-up for quick convergence
+- A gradual shift of focus toward raw reconstruction
+- KL beta annealing for better latent structure
+## Citation
+If you use this model, please consider citing:
+```bibtex
+@misc{nethack-vae,
+  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
+  author={Xu Chen},
+  year={2025},
+  url={https://huggingface.co/CatkinChen/nethack-vae-hmm}
+}
+```
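
Note on the added README: the Training section mentions KL beta annealing, and the Usage example returns `mu` and `logvar`, but the card does not say how they fit together. The following is a minimal, dependency-free sketch of one common arrangement, not the author's implementation. It assumes a linear warm-up schedule and a diagonal Gaussian posterior (consistent with "Low-rank Dimensions: 0"). The names `kl_beta`, `gaussian_kl`, `reparameterize`, and `annealed_loss` are hypothetical and do not come from the repository.

```python
import math
import random


def kl_beta(step: int, warmup_steps: int, beta_max: float = 1.0) -> float:
    """Linear KL weight: 0 at step 0, rising to beta_max at warmup_steps (assumed schedule)."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, max(0.0, step / warmup_steps))


def gaussian_kl(mu: list[float], logvar: list[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu: list[float], logvar: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def annealed_loss(recon_loss: float, mu: list[float], logvar: list[float],
                  step: int, warmup_steps: int) -> float:
    """Reconstruction loss plus the beta-weighted KL term."""
    return recon_loss + kl_beta(step, warmup_steps) * gaussian_kl(mu, logvar)


# Example with a 96-dimensional latent, matching the card's "Latent Dimensions: 96".
rng = random.Random(0)
mu = [0.0] * 96
logvar = [0.0] * 96
z = reparameterize(mu, logvar, rng)
loss_early = annealed_loss(2.0, [1.0], [0.0], step=0, warmup_steps=10)
loss_late = annealed_loss(2.0, [1.0], [0.0], step=10, warmup_steps=10)
```

Early in training the KL term is weighted near zero, so the encoder is free to fit reconstructions. As `step` approaches `warmup_steps` the full KL penalty applies. In practice the same arithmetic would be written with PyTorch tensors on the `mu` and `logvar` outputs.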