Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_spell_backward_len_3_5_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated 8 minutes ago
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_2048_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated 3 days ago • 37
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_correct_run_1p0_0p0_1p0_grpo_42_uniform Updated 3 days ago
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated 3 days ago • 57
Kazuki1450/Qwen2.5-1.5B-Instruct_lightr1_stage1_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated 4 days ago • 113
Kazuki1450/Qwen3-1.7B-Base_rgym_chain_sum_relerr0.1_0p0_1p0_1p0_grpo_42_targeted Text Generation • 2B • Updated 6 days ago • 14
Kazuki1450/Qwen3-1.7B-Base_rgym_chain_sum_relerr0.1_1p0_0p5_1p0_grpo_42_targeted Text Generation • 2B • Updated 6 days ago • 10
Kazuki1450/Qwen3-1.7B-Base_rgym_chain_sum_relerr0.1_1p0_1p0_1p0_grpo_42_targeted Text Generation • 2B • Updated 6 days ago • 14