---
license: mit
language: en
tags:
- nethack
- reinforcement-learning
- variational-autoencoder
- representation-learning
- multimodal
- world-modeling
pipeline_tag: feature-extraction
---

# MultiModalHackVAE

A multi-modal Variational Autoencoder trained on NetHack game states for representation learning.

## Model Description

This model is a MultiModalHackVAE that learns compact representations of NetHack game states by jointly processing:

- Game character grids (21x79)
- Color information
- Game statistics (blstats)
- Message text
- Bag of glyphs
- Hero information (role, race, gender, alignment)

## Model Details

- **Model Type**: Multi-modal Variational Autoencoder
- **Framework**: PyTorch
- **Dataset**: NetHack Learning Dataset
- **Latent Dimensions**: 96
- **Low-rank Dimensions**: 0

## Usage

```python
from train import load_model_from_huggingface
import torch

# Load the model
model, config = load_model_from_huggingface("CatkinChen/nethack-vae-hmm")

# Example usage with synthetic data
batch_size = 1
game_chars = torch.randint(32, 127, (batch_size, 21, 79))   # printable ASCII glyph chars
game_colors = torch.randint(0, 16, (batch_size, 21, 79))    # 16 terminal colors
blstats = torch.randn(batch_size, 27)                       # bottom-line statistics
msg_tokens = torch.randint(0, 128, (batch_size, 256))       # message token IDs
hero_info = torch.randint(0, 10, (batch_size, 4))           # role, race, gender, alignment

with torch.no_grad():
    output = model(
        glyph_chars=game_chars,
        glyph_colors=game_colors,
        blstats=blstats,
        msg_tokens=msg_tokens,
        hero_info=hero_info,
    )
    latent_mean = output['mu']
    latent_logvar = output['logvar']
    lowrank_factors = output['lowrank_factors']
```

## Training

This model was trained with adaptive loss weighting, including:

- Embedding warm-up for quick convergence
- A gradual shift of the loss toward raw reconstruction
- KL beta annealing for better latent structure

## Citation

If you use this model, please consider citing:

```bibtex
@misc{nethack-vae,
  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
  author={Xu Chen},
  year={2025},
  url={https://huggingface.co/CatkinChen/nethack-vae-hmm}
}
```
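
## Sampling from the Latent Space

The encoder returns `mu` and `logvar` for the approximate posterior. To obtain an actual latent sample rather than just the mean, you can apply the standard VAE reparameterization trick. The sketch below assumes a diagonal Gaussian posterior (consistent with the low-rank dimension of 0 listed above); `sample_latent` is a hypothetical helper, not part of the released code.

```python
import torch

def sample_latent(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, diag(exp(logvar))) via the reparameterization trick.

    mu, logvar: tensors of shape (batch_size, latent_dim), e.g. (1, 96).
    """
    std = torch.exp(0.5 * logvar)   # logvar stores log(sigma^2)
    eps = torch.randn_like(std)     # standard normal noise
    return mu + eps * std

# With the outputs from the Usage example:
# z = sample_latent(output['mu'], output['logvar'])  # shape (1, 96)
```

For downstream feature extraction (e.g. as an input to a policy or world model), using `mu` directly is the common deterministic choice; sampling is mainly useful during training or when modeling uncertainty.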