arXiv:2410.15460
Shahradmz
AI & ML interests
LLMs, Graph Learning, Temporal Graph Learning, RL, Continual RL, Optimization
Models (115)
Shahradmz/Qwen2.5-0.5B-Instruct_cppo-reward_REWARD_1 • 0.5B
Shahradmz/Qwen2.5-0.5B-Instruct_cppo-reward_REWARD_0 • 0.5B
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_CPPO_1
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_CPPO_0
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_PPO_1
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_PPO_0
Shahradmz/Qwen2-1.5B-Instruct_cppo-reward_REWARD_0 • 2B
Shahradmz/Qwen2-1.5B-Instruct_cppo-reward_REWARD_1
Shahradmz/Qwen2-0.5B-Reward_debug_mas • Text Classification • 0.5B • 1
Shahradmz/Qwen2-0.5B-Reward
Datasets (12)
Shahradmz/education_qna_hinted_qwen05 • 1 • 3
Shahradmz/education_qna_hinted • 1 • 3
Shahradmz/education_summary_expert • 1 • 3
Shahradmz/education_qna_hinted_static • 1 • 3
Shahradmz/cppo_continual_dataset_rl_others • 75.7k • 5
Shahradmz/cppo_continual_dataset_rl_relationships • 93.9k • 12
Shahradmz/cppo_continual_dataset_reward_others • 78.5k • 11
Shahradmz/cppo_continual_dataset_reward_relationships • 97.4k • 3
Shahradmz/ca_constitution_1 • 33.7k • 6
Shahradmz/ca_constitution_2 • 35.8k • 4