preference_comparisons

rewardfm/ant-rfm-qwen-4gpu-bs64-pref-prog-2frames-uniform-20251216-003917

Model Details

Base Model: Qwen/Qwen3-VL-4B-Instruct
Model Type: qwen3_vl

Training Run

Wandb Run: ant_rfm_qwen_4gpu_bs64_pref_prog_2frames_uniform
Wandb ID: 987at1tm
Project: rfm
Notes: progress-only training; uniform_sample strategy; 2 frames, each labeled with absolute progress with respect to the total number of frames; all data
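
The notes above are terse, so here is a minimal sketch of one plausible reading of "uniform_sample" with "absolute progress wrt total frames". It is not the training code for this run. The function name `sample_frames_with_progress`, the use of uniform random sampling without replacement, and the progress formula `(index + 1) / num_frames` are all assumptions made for illustration.

```python
import random


def sample_frames_with_progress(num_frames, num_samples=2, rng=None):
    """Pick frame indices uniformly at random and pair each with a progress label.

    Hypothetical helper: the actual sampling and labeling used in training
    are not documented in this model card.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if num_frames < num_samples:
        raise ValueError("trajectory has fewer frames than requested samples")

    rng = rng or random.Random()

    # Uniform sampling without replacement, returned in temporal order.
    indices = sorted(rng.sample(range(num_frames), num_samples))

    # Assumed definition of absolute progress: position of the frame within
    # the full trajectory, so that the final frame has progress 1.0.
    return [(i, (i + 1) / num_frames) for i in indices]


if __name__ == "__main__":
    # Example: draw 2 frames from a 40-frame trajectory with a fixed seed.
    for index, progress in sample_frames_with_progress(40, rng=random.Random(0)):
        print(f"frame {index}: progress {progress:.3f}")
```

Under this reading, the progress label depends only on where a frame sits in the full trajectory, not on which other frame it is paired with.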

Citation

If you use this model, please cite:

Safetensors

Model size: 4B params
Tensor type: BF16
