Model Card for MergeVLA-LIBERO

MergeVLA: Single-Skill Experts for the Spatial, Object, Goal, and Long-10 suites of the LIBERO benchmark. These models are the base expert checkpoints used by our MergeVLA merging pipeline.

Model Details

Each uploaded model is a 0.68B-parameter VLA model (excluding the vision backbone) composed of:

  • Qwen2.5-0.5B as the Vision-Language Model (VLM)
  • A lightweight 0.18B Action Expert
  • A two-layer Proprioceptive Projector MLP

βœ”οΈ Performance (Success Rates on LIBERO)

| Task Family | Success Rate (%) |
|---|---|
| Spatial | 98.0 |
| Object | 98.6 |
| Goal | 95.0 |
| Long-10 | 95.0 |
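For a single headline number, the unweighted mean of the four suite success rates (computed from the table above, not separately reported) is:

```python
# Per-suite success rates from the table.
success_rates = {"Spatial": 98.0, "Object": 98.6, "Goal": 95.0, "Long-10": 95.0}

# Unweighted mean across the four LIBERO suites.
mean_sr = sum(success_rates.values()) / len(success_rates)
print(f"{mean_sr:.2f}%")  # 96.65%
```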

🧠 Training Details

Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.

| Category | Value |
|---|---|
| LoRA | Enabled (rank = 64) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 |
| Batch Size | 8 (×2 grad accumulation) |
| num_images_in_input | 2 |
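The table above maps onto a standard PEFT fine-tuning setup. A minimal sketch, assuming Hugging Face `peft` and PyTorch; only the rank (64) and learning rate (2e-4) come from the card — the alpha, dropout, and target modules are assumptions, shown here as a common choice for Qwen-style attention layers:

```python
import torch
from peft import LoraConfig, get_peft_model

# Rank from the card; the remaining values are assumptions, not reported.
lora_config = LoraConfig(
    r=64,                       # LoRA rank (from the card)
    lora_alpha=128,             # assumed; often set to 2x the rank
    lora_dropout=0.05,          # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Hypothetical usage, where `base_vlm` is the Qwen2.5-0.5B-based VLM:
# model = get_peft_model(base_vlm, lora_config)
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # lr from the card
```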

Training Steps

  • Spatial β€” 30,000
  • Object β€” 20,000
  • Goal β€” 30,000
  • Long-10 β€” 50,000

Citation

```bibtex
@misc{fu2025mergevla,
      title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent},
      author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
      year={2025},
      eprint={2511.18810},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2511.18810},
}
```
