caimz committed on
Commit 0ad0f1d · verified · 1 Parent(s): 51745fd

Add files using upload-large-folder tool

Files changed (50)
  1. 20250120235238/rank27.log +395 -0
  2. 20250120235238/rank30.log +395 -0
  3. 20250120235238/rank37.log +395 -0
  4. 20250120235238/rank50.log +395 -0
  5. 20250120235238/rank52.log +395 -0
  6. 20250121104251/rank12.log +294 -0
  7. 20250121104251/rank13.log +294 -0
  8. 20250121104251/rank15.log +294 -0
  9. 20250121104251/rank18.log +294 -0
  10. 20250121104251/rank19.log +294 -0
  11. 20250121104251/rank2.log +294 -0
  12. 20250121104251/rank20.log +294 -0
  13. 20250121104251/rank22.log +294 -0
  14. 20250121104251/rank26.log +294 -0
  15. 20250121104251/rank3.log +294 -0
  16. 20250121104251/rank32.log +294 -0
  17. 20250121104251/rank36.log +294 -0
  18. 20250121104251/rank39.log +294 -0
  19. 20250121104251/rank48.log +294 -0
  20. 20250121104251/rank53.log +294 -0
  21. 20250121104251/rank55.log +294 -0
  22. 20250121104251/rank58.log +294 -0
  23. 20250121104251/rank60.log +294 -0
  24. 20250121104251/rank63.log +294 -0
  25. 20250121165312/hf-593/generation_config.json +14 -0
  26. 20250121165312/hf-593/tokenizer_config.json +208 -0
  27. 20250121165312/rank11.log +0 -0
  28. 20250121165312/rank12.log +0 -0
  29. 20250121165312/rank15.log +0 -0
  30. 20250121165312/rank16.log +0 -0
  31. 20250121165312/rank19.log +0 -0
  32. 20250121165312/rank2.log +0 -0
  33. 20250121165312/rank20.log +0 -0
  34. 20250121165312/rank25.log +0 -0
  35. 20250121165312/rank26.log +0 -0
  36. 20250121165312/rank3.log +0 -0
  37. 20250121165312/rank30.log +0 -0
  38. 20250121165312/rank33.log +0 -0
  39. 20250121165312/rank38.log +0 -0
  40. 20250121165312/rank42.log +0 -0
  41. 20250121165312/rank43.log +0 -0
  42. 20250121165312/rank45.log +0 -0
  43. 20250121165312/rank48.log +0 -0
  44. 20250121165312/rank49.log +0 -0
  45. 20250121165312/rank5.log +0 -0
  46. 20250121165312/rank51.log +0 -0
  47. 20250121165312/rank55.log +0 -0
  48. 20250121165312/rank58.log +0 -0
  49. 20250121165312/rank6.log +0 -0
  50. 20250121165312/rank63.log +0 -0
20250120235238/rank27.log ADDED
@@ -0,0 +1,395 @@
1
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:52:42][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250120235238', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
2
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:52:42][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:53:37][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:54:31][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:55:25][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:56:18][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:57:14][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:58:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-20 23:59:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:00:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:00:05][INFO] [Dataset & Dataloader] Cost 443.15s
12
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
13
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
14
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
15
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
16
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:10:23][SUCCESS] [Parallelize LLM] Elapsed time 141.52 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:10:24][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:19:46][WARNING] [Step 0] The grad norm is NaN or Inf, skip this step. Skipped 1 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:19:46][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.258 loss(reduced): nan grad_norm: nan if_nan_skip: 1 max_memory: 33.1GB text_tokens: 31349.0 tgs: 57 data_time: 1.82s time: 547.03s eta: 3 days, 18:06:30
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:28:29][WARNING] [Step 1] The grad norm is NaN or Inf, skip this step. Skipped 2 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:28:29][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 2 max_memory: 33.1GB text_tokens: 32045.0 tgs: 61 data_time: 0.59s time: 523.25s eta: 3 days, 14:02:43
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:37:12][WARNING] [Step 2] The grad norm is NaN or Inf, skip this step. Skipped 3 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:37:12][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.267 loss(reduced): nan grad_norm: nan if_nan_skip: 3 max_memory: 33.0GB text_tokens: 31775.0 tgs: 60 data_time: 0.88s time: 522.85s eta: 3 days, 13:50:04
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:45:52][WARNING] [Step 3] The grad norm is NaN or Inf, skip this step. Skipped 4 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:45:52][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 4 max_memory: 33.0GB text_tokens: 31442.0 tgs: 60 data_time: 0.80s time: 520.30s eta: 3 days, 13:16:18
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:54:33][WARNING] [Step 4] The grad norm is NaN or Inf, skip this step. Skipped 5 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 00:54:33][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.252 loss(reduced): nan grad_norm: nan if_nan_skip: 5 max_memory: 33.0GB text_tokens: 31310.0 tgs: 60 data_time: 0.71s time: 520.99s eta: 3 days, 13:14:23
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:03:14][WARNING] [Step 5] The grad norm is NaN or Inf, skip this step. Skipped 6 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:03:14][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.276 loss(reduced): nan grad_norm: nan if_nan_skip: 6 max_memory: 33.1GB text_tokens: 31970.0 tgs: 61 data_time: 0.82s time: 520.89s eta: 3 days, 13:04:40
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:11:58][WARNING] [Step 6] The grad norm is NaN or Inf, skip this step. Skipped 7 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:11:58][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 7 max_memory: 33.1GB text_tokens: 32046.0 tgs: 61 data_time: 0.82s time: 523.39s eta: 3 days, 13:20:32
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:20:38][WARNING] [Step 7] The grad norm is NaN or Inf, skip this step. Skipped 8 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:20:38][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.245 loss(reduced): nan grad_norm: nan if_nan_skip: 8 max_memory: 33.1GB text_tokens: 32280.0 tgs: 61 data_time: 0.64s time: 520.67s eta: 3 days, 12:45:14
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:29:18][WARNING] [Step 8] The grad norm is NaN or Inf, skip this step. Skipped 9 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:29:18][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 9 max_memory: 33.1GB text_tokens: 32093.0 tgs: 61 data_time: 0.97s time: 520.18s eta: 3 days, 12:31:48
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:37:59][WARNING] [Step 9] The grad norm is NaN or Inf, skip this step. Skipped 10 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:37:59][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.286 loss(reduced): nan grad_norm: nan if_nan_skip: 10 max_memory: 33.0GB text_tokens: 32007.0 tgs: 61 data_time: 0.51s time: 520.37s eta: 3 days, 12:24:58
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:46:43][WARNING] [Step 10] The grad norm is NaN or Inf, skip this step. Skipped 11 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:46:43][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.277 loss(reduced): nan grad_norm: nan if_nan_skip: 11 max_memory: 33.1GB text_tokens: 32301.0 tgs: 61 data_time: 0.78s time: 524.54s eta: 3 days, 12:56:48
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:55:24][WARNING] [Step 11] The grad norm is NaN or Inf, skip this step. Skipped 12 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 01:55:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.285 loss(reduced): nan grad_norm: nan if_nan_skip: 12 max_memory: 32.3GB text_tokens: 30685.0 tgs: 58 data_time: 0.80s time: 520.66s eta: 3 days, 12:10:23
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:04:04][WARNING] [Step 12] The grad norm is NaN or Inf, skip this step. Skipped 13 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:04:04][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.282 loss(reduced): nan grad_norm: nan if_nan_skip: 13 max_memory: 33.1GB text_tokens: 30776.0 tgs: 59 data_time: 0.83s time: 519.93s eta: 3 days, 11:54:37
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:12:45][WARNING] [Step 13] The grad norm is NaN or Inf, skip this step. Skipped 14 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:12:45][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.328 loss(reduced): nan grad_norm: nan if_nan_skip: 14 max_memory: 33.0GB text_tokens: 31850.0 tgs: 61 data_time: 0.72s time: 521.30s eta: 3 days, 11:59:11
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:21:29][WARNING] [Step 14] The grad norm is NaN or Inf, skip this step. Skipped 15 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:21:29][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.318 loss(reduced): nan grad_norm: nan if_nan_skip: 15 max_memory: 33.0GB text_tokens: 32016.0 tgs: 61 data_time: 0.73s time: 524.13s eta: 3 days, 12:17:54
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:30:10][WARNING] [Step 15] The grad norm is NaN or Inf, skip this step. Skipped 16 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:30:10][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.269 loss(reduced): nan grad_norm: nan if_nan_skip: 16 max_memory: 33.0GB text_tokens: 31587.0 tgs: 60 data_time: 0.99s time: 520.55s eta: 3 days, 11:34:36
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:38:49][WARNING] [Step 16] The grad norm is NaN or Inf, skip this step. Skipped 17 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:38:49][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.237 loss(reduced): nan grad_norm: nan if_nan_skip: 17 max_memory: 32.9GB text_tokens: 30793.0 tgs: 59 data_time: 0.75s time: 518.80s eta: 3 days, 11:09:07
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:47:31][WARNING] [Step 17] The grad norm is NaN or Inf, skip this step. Skipped 18 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:47:31][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.245 loss(reduced): nan grad_norm: nan if_nan_skip: 18 max_memory: 33.0GB text_tokens: 31008.0 tgs: 59 data_time: 0.90s time: 522.11s eta: 3 days, 11:32:17
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:56:15][WARNING] [Step 18] The grad norm is NaN or Inf, skip this step. Skipped 19 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 02:56:15][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.283 loss(reduced): nan grad_norm: nan if_nan_skip: 19 max_memory: 33.0GB text_tokens: 32462.0 tgs: 61 data_time: 0.85s time: 523.84s eta: 3 days, 11:40:08
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:04:55][WARNING] [Step 19] The grad norm is NaN or Inf, skip this step. Skipped 20 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:04:55][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.288 loss(reduced): nan grad_norm: nan if_nan_skip: 20 max_memory: 33.0GB text_tokens: 32193.0 tgs: 61 data_time: 0.65s time: 520.44s eta: 3 days, 10:58:51
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:13:34][WARNING] [Step 20] The grad norm is NaN or Inf, skip this step. Skipped 21 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:13:34][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.236 loss(reduced): nan grad_norm: nan if_nan_skip: 21 max_memory: 33.0GB text_tokens: 31420.0 tgs: 60 data_time: 0.77s time: 518.44s eta: 3 days, 10:31:07
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:22:16][WARNING] [Step 21] The grad norm is NaN or Inf, skip this step. Skipped 22 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:22:16][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.343 loss(reduced): nan grad_norm: nan if_nan_skip: 22 max_memory: 33.0GB text_tokens: 31579.0 tgs: 60 data_time: 0.93s time: 522.85s eta: 3 days, 11:04:29
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:31:00][WARNING] [Step 22] The grad norm is NaN or Inf, skip this step. Skipped 23 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:31:00][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.220 loss(reduced): nan grad_norm: nan if_nan_skip: 23 max_memory: 33.0GB text_tokens: 31823.0 tgs: 60 data_time: 0.80s time: 523.52s eta: 3 days, 11:02:08
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:39:41][WARNING] [Step 23] The grad norm is NaN or Inf, skip this step. Skipped 24 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:39:41][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.307 loss(reduced): nan grad_norm: nan if_nan_skip: 24 max_memory: 32.8GB text_tokens: 30152.0 tgs: 57 data_time: 0.78s time: 520.97s eta: 3 days, 10:29:11
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:48:20][WARNING] [Step 24] The grad norm is NaN or Inf, skip this step. Skipped 25 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:48:20][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.252 loss(reduced): nan grad_norm: nan if_nan_skip: 25 max_memory: 32.8GB text_tokens: 31164.0 tgs: 60 data_time: 0.88s time: 519.11s eta: 3 days, 10:02:53
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:57:03][WARNING] [Step 25] The grad norm is NaN or Inf, skip this step. Skipped 26 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 03:57:03][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.259 loss(reduced): nan grad_norm: nan if_nan_skip: 26 max_memory: 32.9GB text_tokens: 31740.0 tgs: 60 data_time: 0.81s time: 523.30s eta: 3 days, 10:33:55
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:05:46][WARNING] [Step 26] The grad norm is NaN or Inf, skip this step. Skipped 27 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:05:46][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.280 loss(reduced): nan grad_norm: nan if_nan_skip: 27 max_memory: 33.1GB text_tokens: 32120.0 tgs: 61 data_time: 0.92s time: 522.91s eta: 3 days, 10:21:31
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:14:28][WARNING] [Step 27] The grad norm is NaN or Inf, skip this step. Skipped 28 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:14:28][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 28 max_memory: 32.5GB text_tokens: 30357.0 tgs: 58 data_time: 0.75s time: 521.32s eta: 3 days, 9:57:49
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:23:08][WARNING] [Step 28] The grad norm is NaN or Inf, skip this step. Skipped 29 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:23:08][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.269 loss(reduced): nan grad_norm: nan if_nan_skip: 29 max_memory: 32.9GB text_tokens: 32083.0 tgs: 61 data_time: 0.85s time: 520.65s eta: 3 days, 9:42:44
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:31:51][WARNING] [Step 29] The grad norm is NaN or Inf, skip this step. Skipped 30 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:31:51][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 30 max_memory: 32.9GB text_tokens: 31963.0 tgs: 61 data_time: 0.86s time: 522.39s eta: 3 days, 9:50:30
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:40:34][WARNING] [Step 30] The grad norm is NaN or Inf, skip this step. Skipped 31 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:40:34][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.298 loss(reduced): nan grad_norm: nan if_nan_skip: 31 max_memory: 33.1GB text_tokens: 31960.0 tgs: 61 data_time: 0.77s time: 523.76s eta: 3 days, 9:54:36
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:49:15][WARNING] [Step 31] The grad norm is NaN or Inf, skip this step. Skipped 32 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:49:15][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.269 loss(reduced): nan grad_norm: nan if_nan_skip: 32 max_memory: 33.0GB text_tokens: 31654.0 tgs: 60 data_time: 0.80s time: 520.44s eta: 3 days, 9:14:45
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:57:55][WARNING] [Step 32] The grad norm is NaN or Inf, skip this step. Skipped 33 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 04:57:55][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.210 loss(reduced): nan grad_norm: nan if_nan_skip: 33 max_memory: 33.0GB text_tokens: 31502.0 tgs: 60 data_time: 0.62s time: 520.69s eta: 3 days, 9:08:24
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:06:37][WARNING] [Step 33] The grad norm is NaN or Inf, skip this step. Skipped 34 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:06:37][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.241 loss(reduced): nan grad_norm: nan if_nan_skip: 34 max_memory: 32.9GB text_tokens: 31836.0 tgs: 61 data_time: 0.88s time: 521.26s eta: 3 days, 9:05:03
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:15:21][WARNING] [Step 34] The grad norm is NaN or Inf, skip this step. Skipped 35 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:15:21][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.285 loss(reduced): nan grad_norm: nan if_nan_skip: 35 max_memory: 32.9GB text_tokens: 31221.0 tgs: 59 data_time: 0.70s time: 524.12s eta: 3 days, 9:23:02
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:24:02][WARNING] [Step 35] The grad norm is NaN or Inf, skip this step. Skipped 36 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:24:02][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.269 loss(reduced): nan grad_norm: nan if_nan_skip: 36 max_memory: 33.0GB text_tokens: 31303.0 tgs: 60 data_time: 0.92s time: 520.97s eta: 3 days, 8:44:59
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:32:42][WARNING] [Step 36] The grad norm is NaN or Inf, skip this step. Skipped 37 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:32:42][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.307 loss(reduced): nan grad_norm: nan if_nan_skip: 37 max_memory: 33.1GB text_tokens: 31855.0 tgs: 61 data_time: 1.02s time: 520.16s eta: 3 days, 8:28:50
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:41:25][WARNING] [Step 37] The grad norm is NaN or Inf, skip this step. Skipped 38 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:41:25][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 38 max_memory: 32.8GB text_tokens: 30932.0 tgs: 59 data_time: 0.69s time: 522.57s eta: 3 days, 8:42:29
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:50:09][WARNING] [Step 38] The grad norm is NaN or Inf, skip this step. Skipped 39 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:50:09][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.339 loss(reduced): nan grad_norm: nan if_nan_skip: 39 max_memory: 33.1GB text_tokens: 31919.0 tgs: 60 data_time: 0.60s time: 524.34s eta: 3 days, 8:50:09
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:58:49][WARNING] [Step 39] The grad norm is NaN or Inf, skip this step. Skipped 40 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 05:58:49][INFO] [Train] (Epoch 1) Step 40/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 40 max_memory: 33.0GB text_tokens: 31714.0 tgs: 61 data_time: 0.61s time: 519.89s eta: 3 days, 8:00:21
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:07:29][WARNING] [Step 40] The grad norm is NaN or Inf, skip this step. Skipped 41 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:07:29][INFO] [Train] (Epoch 1) Step 41/593 lr: 0.000020 loss: 0.286 loss(reduced): nan grad_norm: nan if_nan_skip: 41 max_memory: 32.9GB text_tokens: 32069.0 tgs: 61 data_time: 0.71s time: 520.07s eta: 3 days, 7:53:19
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:16:11][WARNING] [Step 41] The grad norm is NaN or Inf, skip this step. Skipped 42 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:16:11][INFO] [Train] (Epoch 1) Step 42/593 lr: 0.000020 loss: 0.322 loss(reduced): nan grad_norm: nan if_nan_skip: 42 max_memory: 33.0GB text_tokens: 31550.0 tgs: 60 data_time: 0.60s time: 522.01s eta: 3 days, 8:02:26
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:24:55][WARNING] [Step 42] The grad norm is NaN or Inf, skip this step. Skipped 43 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:24:55][INFO] [Train] (Epoch 1) Step 43/593 lr: 0.000020 loss: 0.230 loss(reduced): nan grad_norm: nan if_nan_skip: 43 max_memory: 32.2GB text_tokens: 30417.0 tgs: 58 data_time: 0.51s time: 524.36s eta: 3 days, 8:15:23
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:33:36][WARNING] [Step 43] The grad norm is NaN or Inf, skip this step. Skipped 44 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:33:36][INFO] [Train] (Epoch 1) Step 44/593 lr: 0.000020 loss: 0.228 loss(reduced): nan grad_norm: nan if_nan_skip: 44 max_memory: 33.1GB text_tokens: 31874.0 tgs: 61 data_time: 0.84s time: 520.50s eta: 3 days, 7:31:13
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:42:14][WARNING] [Step 44] The grad norm is NaN or Inf, skip this step. Skipped 45 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:42:14][INFO] [Train] (Epoch 1) Step 45/593 lr: 0.000020 loss: 0.299 loss(reduced): nan grad_norm: nan if_nan_skip: 45 max_memory: 32.7GB text_tokens: 31865.0 tgs: 61 data_time: 0.59s time: 518.57s eta: 3 days, 7:04:56
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:50:57][WARNING] [Step 45] The grad norm is NaN or Inf, skip this step. Skipped 46 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:50:58][INFO] [Train] (Epoch 1) Step 46/593 lr: 0.000020 loss: 0.233 loss(reduced): nan grad_norm: nan if_nan_skip: 46 max_memory: 33.1GB text_tokens: 32131.0 tgs: 61 data_time: 0.93s time: 523.16s eta: 3 days, 7:38:13
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:59:41][WARNING] [Step 46] The grad norm is NaN or Inf, skip this step. Skipped 47 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 06:59:41][INFO] [Train] (Epoch 1) Step 47/593 lr: 0.000020 loss: 0.284 loss(reduced): nan grad_norm: nan if_nan_skip: 47 max_memory: 33.1GB text_tokens: 32443.0 tgs: 61 data_time: 0.65s time: 523.63s eta: 3 days, 7:33:47
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:08:21][WARNING] [Step 47] The grad norm is NaN or Inf, skip this step. Skipped 48 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:08:21][INFO] [Train] (Epoch 1) Step 48/593 lr: 0.000020 loss: 0.238 loss(reduced): nan grad_norm: nan if_nan_skip: 48 max_memory: 32.8GB text_tokens: 31045.0 tgs: 59 data_time: 0.72s time: 520.14s eta: 3 days, 6:53:16
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:17:01][WARNING] [Step 48] The grad norm is NaN or Inf, skip this step. Skipped 49 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:17:01][INFO] [Train] (Epoch 1) Step 49/593 lr: 0.000020 loss: 0.287 loss(reduced): nan grad_norm: nan if_nan_skip: 49 max_memory: 32.9GB text_tokens: 31846.0 tgs: 61 data_time: 0.73s time: 520.06s eta: 3 days, 6:43:54
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:25:45][WARNING] [Step 49] The grad norm is NaN or Inf, skip this step. Skipped 50 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:25:45][INFO] [Train] (Epoch 1) Step 50/593 lr: 0.000020 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 50 max_memory: 33.0GB text_tokens: 31945.0 tgs: 60 data_time: 0.94s time: 523.77s eta: 3 days, 7:08:50
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:34:29][WARNING] [Step 50] The grad norm is NaN or Inf, skip this step. Skipped 51 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:34:29][INFO] [Train] (Epoch 1) Step 51/593 lr: 0.000020 loss: 0.304 loss(reduced): nan grad_norm: nan if_nan_skip: 51 max_memory: 32.9GB text_tokens: 32011.0 tgs: 61 data_time: 1.05s time: 523.78s eta: 3 days, 7:00:11
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:43:09][WARNING] [Step 51] The grad norm is NaN or Inf, skip this step. Skipped 52 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:43:09][INFO] [Train] (Epoch 1) Step 52/593 lr: 0.000020 loss: 0.243 loss(reduced): nan grad_norm: nan if_nan_skip: 52 max_memory: 33.1GB text_tokens: 32547.0 tgs: 62 data_time: 0.95s time: 520.25s eta: 3 days, 6:19:34
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:51:48][WARNING] [Step 52] The grad norm is NaN or Inf, skip this step. Skipped 53 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 07:51:48][INFO] [Train] (Epoch 1) Step 53/593 lr: 0.000020 loss: 0.234 loss(reduced): nan grad_norm: nan if_nan_skip: 53 max_memory: 33.1GB text_tokens: 32049.0 tgs: 61 data_time: 0.84s time: 518.78s eta: 3 days, 5:57:42
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:00:31][WARNING] [Step 53] The grad norm is NaN or Inf, skip this step. Skipped 54 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:00:31][INFO] [Train] (Epoch 1) Step 54/593 lr: 0.000020 loss: 0.305 loss(reduced): nan grad_norm: nan if_nan_skip: 54 max_memory: 33.1GB text_tokens: 32269.0 tgs: 61 data_time: 0.82s time: 522.68s eta: 3 days, 6:24:04
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:09:13][WARNING] [Step 54] The grad norm is NaN or Inf, skip this step. Skipped 55 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:09:13][INFO] [Train] (Epoch 1) Step 55/593 lr: 0.000020 loss: 0.223 loss(reduced): nan grad_norm: nan if_nan_skip: 55 max_memory: 33.1GB text_tokens: 30865.0 tgs: 59 data_time: 1.05s time: 522.70s eta: 3 days, 6:15:37
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:17:54][WARNING] [Step 55] The grad norm is NaN or Inf, skip this step. Skipped 56 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:17:54][INFO] [Train] (Epoch 1) Step 56/593 lr: 0.000020 loss: 0.265 loss(reduced): nan grad_norm: nan if_nan_skip: 56 max_memory: 32.7GB text_tokens: 31114.0 tgs: 59 data_time: 1.07s time: 520.54s eta: 3 days, 5:47:28
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:26:34][WARNING] [Step 56] The grad norm is NaN or Inf, skip this step. Skipped 57 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:26:34][INFO] [Train] (Epoch 1) Step 57/593 lr: 0.000020 loss: 0.279 loss(reduced): nan grad_norm: nan if_nan_skip: 57 max_memory: 32.8GB text_tokens: 32034.0 tgs: 61 data_time: 0.81s time: 519.67s eta: 3 days, 5:31:00
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:35:15][WARNING] [Step 57] The grad norm is NaN or Inf, skip this step. Skipped 58 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:35:15][INFO] [Train] (Epoch 1) Step 58/593 lr: 0.000020 loss: 0.257 loss(reduced): nan grad_norm: nan if_nan_skip: 58 max_memory: 32.6GB text_tokens: 31575.0 tgs: 60 data_time: 1.23s time: 521.28s eta: 3 days, 5:36:47
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:43:58][WARNING] [Step 58] The grad norm is NaN or Inf, skip this step. Skipped 59 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:43:58][INFO] [Train] (Epoch 1) Step 59/593 lr: 0.000020 loss: 0.233 loss(reduced): nan grad_norm: nan if_nan_skip: 59 max_memory: 32.4GB text_tokens: 30677.0 tgs: 58 data_time: 0.76s time: 523.60s eta: 3 days, 5:48:45
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:52:38][WARNING] [Step 59] The grad norm is NaN or Inf, skip this step. Skipped 60 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 08:52:38][INFO] [Train] (Epoch 1) Step 60/593 lr: 0.000020 loss: 0.335 loss(reduced): nan grad_norm: nan if_nan_skip: 60 max_memory: 33.1GB text_tokens: 31925.0 tgs: 61 data_time: 0.91s time: 519.33s eta: 3 days, 5:02:03
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:01:18][WARNING] [Step 60] The grad norm is NaN or Inf, skip this step. Skipped 61 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:01:18][INFO] [Train] (Epoch 1) Step 61/593 lr: 0.000020 loss: 0.305 loss(reduced): nan grad_norm: nan if_nan_skip: 61 max_memory: 33.0GB text_tokens: 32154.0 tgs: 61 data_time: 0.61s time: 520.63s eta: 3 days, 5:04:53
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:10:00][WARNING] [Step 61] The grad norm is NaN or Inf, skip this step. Skipped 62 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:10:00][INFO] [Train] (Epoch 1) Step 62/593 lr: 0.000020 loss: 0.312 loss(reduced): nan grad_norm: nan if_nan_skip: 62 max_memory: 33.0GB text_tokens: 32123.0 tgs: 61 data_time: 0.80s time: 521.46s eta: 3 days, 5:03:36
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:18:44][WARNING] [Step 62] The grad norm is NaN or Inf, skip this step. Skipped 63 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:18:44][INFO] [Train] (Epoch 1) Step 63/593 lr: 0.000020 loss: 0.252 loss(reduced): nan grad_norm: nan if_nan_skip: 63 max_memory: 32.8GB text_tokens: 31246.0 tgs: 59 data_time: 0.89s time: 523.96s eta: 3 days, 5:17:00
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:27:23][WARNING] [Step 63] The grad norm is NaN or Inf, skip this step. Skipped 64 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:27:23][INFO] [Train] (Epoch 1) Step 64/593 lr: 0.000020 loss: 0.251 loss(reduced): nan grad_norm: nan if_nan_skip: 64 max_memory: 33.1GB text_tokens: 31119.0 tgs: 59 data_time: 0.84s time: 518.86s eta: 3 days, 4:23:15
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:36:02][WARNING] [Step 64] The grad norm is NaN or Inf, skip this step. Skipped 65 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:36:02][INFO] [Train] (Epoch 1) Step 65/593 lr: 0.000020 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 65 max_memory: 32.9GB text_tokens: 31196.0 tgs: 60 data_time: 0.58s time: 519.24s eta: 3 days, 4:17:55
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:44:43][WARNING] [Step 65] The grad norm is NaN or Inf, skip this step. Skipped 66 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:44:43][INFO] [Train] (Epoch 1) Step 66/593 lr: 0.000020 loss: 0.266 loss(reduced): nan grad_norm: nan if_nan_skip: 66 max_memory: 32.9GB text_tokens: 31947.0 tgs: 61 data_time: 0.67s time: 520.72s eta: 3 days, 4:22:21
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:53:27][WARNING] [Step 66] The grad norm is NaN or Inf, skip this step. Skipped 67 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 09:53:27][INFO] [Train] (Epoch 1) Step 67/593 lr: 0.000020 loss: 0.271 loss(reduced): nan grad_norm: nan if_nan_skip: 67 max_memory: 32.6GB text_tokens: 31639.0 tgs: 60 data_time: 0.82s time: 524.26s eta: 3 days, 4:44:46
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:02:07][WARNING] [Step 67] The grad norm is NaN or Inf, skip this step. Skipped 68 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:02:07][INFO] [Train] (Epoch 1) Step 68/593 lr: 0.000020 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 68 max_memory: 33.0GB text_tokens: 31322.0 tgs: 60 data_time: 1.03s time: 520.28s eta: 3 days, 4:01:06
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:10:47][WARNING] [Step 68] The grad norm is NaN or Inf, skip this step. Skipped 69 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:10:47][INFO] [Train] (Epoch 1) Step 69/593 lr: 0.000020 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 69 max_memory: 33.0GB text_tokens: 31116.0 tgs: 59 data_time: 0.80s time: 519.51s eta: 3 days, 3:45:40
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:19:29][WARNING] [Step 69] The grad norm is NaN or Inf, skip this step. Skipped 70 steps in total.
+ [XTuner][RANK 27][DP 6][SP 3][TP 0][2025-01-21 10:19:29][INFO] [Train] (Epoch 1) Step 70/593 lr: 0.000020 loss: 0.269 loss(reduced): nan grad_norm: nan if_nan_skip: 70 max_memory: 33.0GB text_tokens: 32184.0 tgs: 61 data_time: 0.63s time: 522.33s eta: 3 days, 4:01:39
20250120235238/rank30.log ADDED
@@ -0,0 +1,395 @@
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:52:42][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250120235238', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:52:42][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:53:37][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:54:31][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:55:25][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:56:18][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:57:14][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:58:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-20 23:59:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:00:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:00:05][INFO] [Dataset & Dataloader] Cost 443.12s
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:01][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:08:02][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:10:23][SUCCESS] [Parallelize LLM] Elapsed time 141.62 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:10:24][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:19:46][WARNING] [Step 0] The grad norm is NaN or Inf, skip this step. Skipped 1 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:19:46][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.315 loss(reduced): nan grad_norm: nan if_nan_skip: 1 max_memory: 33.1GB text_tokens: 32271.0 tgs: 58 data_time: 1.82s time: 547.03s eta: 3 days, 18:06:29
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:28:29][WARNING] [Step 1] The grad norm is NaN or Inf, skip this step. Skipped 2 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:28:29][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.239 loss(reduced): nan grad_norm: nan if_nan_skip: 2 max_memory: 33.1GB text_tokens: 32224.0 tgs: 61 data_time: 0.92s time: 523.25s eta: 3 days, 14:02:45
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:37:12][WARNING] [Step 2] The grad norm is NaN or Inf, skip this step. Skipped 3 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:37:12][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.300 loss(reduced): nan grad_norm: nan if_nan_skip: 3 max_memory: 32.8GB text_tokens: 31186.0 tgs: 59 data_time: 0.95s time: 522.85s eta: 3 days, 13:50:04
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:45:52][WARNING] [Step 3] The grad norm is NaN or Inf, skip this step. Skipped 4 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:45:52][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.267 loss(reduced): nan grad_norm: nan if_nan_skip: 4 max_memory: 33.1GB text_tokens: 32092.0 tgs: 61 data_time: 0.82s time: 520.30s eta: 3 days, 13:16:17
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:54:33][WARNING] [Step 4] The grad norm is NaN or Inf, skip this step. Skipped 5 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 00:54:33][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 5 max_memory: 32.8GB text_tokens: 31643.0 tgs: 60 data_time: 0.79s time: 520.99s eta: 3 days, 13:14:23
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:03:14][WARNING] [Step 5] The grad norm is NaN or Inf, skip this step. Skipped 6 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:03:14][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.317 loss(reduced): nan grad_norm: nan if_nan_skip: 6 max_memory: 33.0GB text_tokens: 31263.0 tgs: 60 data_time: 0.72s time: 520.89s eta: 3 days, 13:04:41
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:11:58][WARNING] [Step 6] The grad norm is NaN or Inf, skip this step. Skipped 7 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:11:58][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.205 loss(reduced): nan grad_norm: nan if_nan_skip: 7 max_memory: 32.7GB text_tokens: 30097.0 tgs: 57 data_time: 0.79s time: 523.39s eta: 3 days, 13:20:31
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:20:38][WARNING] [Step 7] The grad norm is NaN or Inf, skip this step. Skipped 8 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:20:38][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.377 loss(reduced): nan grad_norm: nan if_nan_skip: 8 max_memory: 33.0GB text_tokens: 32404.0 tgs: 62 data_time: 0.89s time: 520.67s eta: 3 days, 12:45:13
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:29:18][WARNING] [Step 8] The grad norm is NaN or Inf, skip this step. Skipped 9 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:29:18][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 9 max_memory: 32.9GB text_tokens: 30582.0 tgs: 58 data_time: 0.81s time: 520.19s eta: 3 days, 12:31:50
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:37:59][WARNING] [Step 9] The grad norm is NaN or Inf, skip this step. Skipped 10 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:37:59][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.212 loss(reduced): nan grad_norm: nan if_nan_skip: 10 max_memory: 33.1GB text_tokens: 32372.0 tgs: 62 data_time: 0.80s time: 520.38s eta: 3 days, 12:24:59
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:46:43][WARNING] [Step 10] The grad norm is NaN or Inf, skip this step. Skipped 11 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:46:43][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.267 loss(reduced): nan grad_norm: nan if_nan_skip: 11 max_memory: 33.1GB text_tokens: 32256.0 tgs: 61 data_time: 0.72s time: 524.54s eta: 3 days, 12:56:48
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:55:24][WARNING] [Step 11] The grad norm is NaN or Inf, skip this step. Skipped 12 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 01:55:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 12 max_memory: 33.0GB text_tokens: 31025.0 tgs: 59 data_time: 0.62s time: 520.66s eta: 3 days, 12:10:23
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:04:04][WARNING] [Step 12] The grad norm is NaN or Inf, skip this step. Skipped 13 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:04:04][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.285 loss(reduced): nan grad_norm: nan if_nan_skip: 13 max_memory: 33.1GB text_tokens: 31804.0 tgs: 61 data_time: 0.87s time: 519.93s eta: 3 days, 11:54:36
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:12:45][WARNING] [Step 13] The grad norm is NaN or Inf, skip this step. Skipped 14 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:12:45][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 14 max_memory: 33.1GB text_tokens: 32389.0 tgs: 62 data_time: 0.82s time: 521.30s eta: 3 days, 11:59:12
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:21:29][WARNING] [Step 14] The grad norm is NaN or Inf, skip this step. Skipped 15 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:21:29][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.314 loss(reduced): nan grad_norm: nan if_nan_skip: 15 max_memory: 33.0GB text_tokens: 30946.0 tgs: 59 data_time: 0.87s time: 524.13s eta: 3 days, 12:17:53
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:30:10][WARNING] [Step 15] The grad norm is NaN or Inf, skip this step. Skipped 16 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:30:10][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.292 loss(reduced): nan grad_norm: nan if_nan_skip: 16 max_memory: 33.1GB text_tokens: 31432.0 tgs: 60 data_time: 0.91s time: 520.55s eta: 3 days, 11:34:37
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:38:49][WARNING] [Step 16] The grad norm is NaN or Inf, skip this step. Skipped 17 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:38:49][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.340 loss(reduced): nan grad_norm: nan if_nan_skip: 17 max_memory: 33.1GB text_tokens: 32194.0 tgs: 62 data_time: 0.74s time: 518.80s eta: 3 days, 11:09:07
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:47:31][WARNING] [Step 17] The grad norm is NaN or Inf, skip this step. Skipped 18 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:47:31][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.367 loss(reduced): nan grad_norm: nan if_nan_skip: 18 max_memory: 33.1GB text_tokens: 31342.0 tgs: 60 data_time: 1.05s time: 522.11s eta: 3 days, 11:32:17
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:56:15][WARNING] [Step 18] The grad norm is NaN or Inf, skip this step. Skipped 19 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 02:56:15][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.300 loss(reduced): nan grad_norm: nan if_nan_skip: 19 max_memory: 33.0GB text_tokens: 32054.0 tgs: 61 data_time: 0.68s time: 523.84s eta: 3 days, 11:40:07
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:04:55][WARNING] [Step 19] The grad norm is NaN or Inf, skip this step. Skipped 20 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:04:55][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.276 loss(reduced): nan grad_norm: nan if_nan_skip: 20 max_memory: 33.1GB text_tokens: 32360.0 tgs: 62 data_time: 0.74s time: 520.44s eta: 3 days, 10:58:51
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:13:34][WARNING] [Step 20] The grad norm is NaN or Inf, skip this step. Skipped 21 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:13:34][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 21 max_memory: 33.1GB text_tokens: 31497.0 tgs: 60 data_time: 0.61s time: 518.44s eta: 3 days, 10:31:07
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:22:16][WARNING] [Step 21] The grad norm is NaN or Inf, skip this step. Skipped 22 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:22:16][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 22 max_memory: 33.1GB text_tokens: 32299.0 tgs: 61 data_time: 0.79s time: 522.85s eta: 3 days, 11:04:29
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:31:00][WARNING] [Step 22] The grad norm is NaN or Inf, skip this step. Skipped 23 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:31:00][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 23 max_memory: 33.1GB text_tokens: 32054.0 tgs: 61 data_time: 0.64s time: 523.52s eta: 3 days, 11:02:08
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:39:41][WARNING] [Step 23] The grad norm is NaN or Inf, skip this step. Skipped 24 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:39:41][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.230 loss(reduced): nan grad_norm: nan if_nan_skip: 24 max_memory: 33.1GB text_tokens: 31131.0 tgs: 59 data_time: 0.62s time: 520.97s eta: 3 days, 10:29:11
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:48:20][WARNING] [Step 24] The grad norm is NaN or Inf, skip this step. Skipped 25 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:48:20][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.320 loss(reduced): nan grad_norm: nan if_nan_skip: 25 max_memory: 33.0GB text_tokens: 32103.0 tgs: 61 data_time: 0.74s time: 519.11s eta: 3 days, 10:02:53
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:57:03][WARNING] [Step 25] The grad norm is NaN or Inf, skip this step. Skipped 26 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 03:57:03][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 26 max_memory: 33.0GB text_tokens: 32152.0 tgs: 61 data_time: 0.66s time: 523.30s eta: 3 days, 10:33:55
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:05:46][WARNING] [Step 26] The grad norm is NaN or Inf, skip this step. Skipped 27 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:05:46][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.282 loss(reduced): nan grad_norm: nan if_nan_skip: 27 max_memory: 32.9GB text_tokens: 31336.0 tgs: 59 data_time: 0.96s time: 522.91s eta: 3 days, 10:21:31
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:14:28][WARNING] [Step 27] The grad norm is NaN or Inf, skip this step. Skipped 28 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:14:28][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.265 loss(reduced): nan grad_norm: nan if_nan_skip: 28 max_memory: 33.0GB text_tokens: 31412.0 tgs: 60 data_time: 0.76s time: 521.32s eta: 3 days, 9:57:49
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:23:08][WARNING] [Step 28] The grad norm is NaN or Inf, skip this step. Skipped 29 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:23:08][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.293 loss(reduced): nan grad_norm: nan if_nan_skip: 29 max_memory: 33.1GB text_tokens: 32135.0 tgs: 61 data_time: 0.70s time: 520.65s eta: 3 days, 9:42:44
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:31:51][WARNING] [Step 29] The grad norm is NaN or Inf, skip this step. Skipped 30 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:31:51][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.314 loss(reduced): nan grad_norm: nan if_nan_skip: 30 max_memory: 32.9GB text_tokens: 31733.0 tgs: 60 data_time: 0.83s time: 522.39s eta: 3 days, 9:50:30
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:40:34][WARNING] [Step 30] The grad norm is NaN or Inf, skip this step. Skipped 31 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:40:34][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.215 loss(reduced): nan grad_norm: nan if_nan_skip: 31 max_memory: 33.1GB text_tokens: 31765.0 tgs: 60 data_time: 0.86s time: 523.76s eta: 3 days, 9:54:37
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:49:15][WARNING] [Step 31] The grad norm is NaN or Inf, skip this step. Skipped 32 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:49:15][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.313 loss(reduced): nan grad_norm: nan if_nan_skip: 32 max_memory: 33.1GB text_tokens: 31949.0 tgs: 61 data_time: 0.92s time: 520.43s eta: 3 days, 9:14:43
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:57:55][WARNING] [Step 32] The grad norm is NaN or Inf, skip this step. Skipped 33 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 04:57:55][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.282 loss(reduced): nan grad_norm: nan if_nan_skip: 33 max_memory: 32.9GB text_tokens: 31335.0 tgs: 60 data_time: 0.60s time: 520.69s eta: 3 days, 9:08:25
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:06:37][WARNING] [Step 33] The grad norm is NaN or Inf, skip this step. Skipped 34 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:06:37][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.304 loss(reduced): nan grad_norm: nan if_nan_skip: 34 max_memory: 33.0GB text_tokens: 31518.0 tgs: 60 data_time: 0.96s time: 521.26s eta: 3 days, 9:05:03
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:15:21][WARNING] [Step 34] The grad norm is NaN or Inf, skip this step. Skipped 35 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:15:21][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.272 loss(reduced): nan grad_norm: nan if_nan_skip: 35 max_memory: 33.0GB text_tokens: 31987.0 tgs: 61 data_time: 0.90s time: 524.12s eta: 3 days, 9:23:02
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:24:02][WARNING] [Step 35] The grad norm is NaN or Inf, skip this step. Skipped 36 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:24:02][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 36 max_memory: 33.0GB text_tokens: 32039.0 tgs: 61 data_time: 1.21s time: 520.97s eta: 3 days, 8:45:00
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:32:42][WARNING] [Step 36] The grad norm is NaN or Inf, skip this step. Skipped 37 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:32:42][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.227 loss(reduced): nan grad_norm: nan if_nan_skip: 37 max_memory: 33.0GB text_tokens: 32103.0 tgs: 61 data_time: 0.76s time: 520.16s eta: 3 days, 8:28:49
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:41:25][WARNING] [Step 37] The grad norm is NaN or Inf, skip this step. Skipped 38 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:41:25][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 38 max_memory: 33.1GB text_tokens: 32352.0 tgs: 61 data_time: 0.73s time: 522.57s eta: 3 days, 8:42:30
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:50:09][WARNING] [Step 38] The grad norm is NaN or Inf, skip this step. Skipped 39 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:50:09][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.294 loss(reduced): nan grad_norm: nan if_nan_skip: 39 max_memory: 33.0GB text_tokens: 31385.0 tgs: 59 data_time: 1.00s time: 524.34s eta: 3 days, 8:50:09
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:58:49][WARNING] [Step 39] The grad norm is NaN or Inf, skip this step. Skipped 40 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 05:58:49][INFO] [Train] (Epoch 1) Step 40/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 40 max_memory: 33.1GB text_tokens: 31598.0 tgs: 60 data_time: 0.72s time: 519.89s eta: 3 days, 8:00:20
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:07:29][WARNING] [Step 40] The grad norm is NaN or Inf, skip this step. Skipped 41 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:07:29][INFO] [Train] (Epoch 1) Step 41/593 lr: 0.000020 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 41 max_memory: 33.1GB text_tokens: 31223.0 tgs: 60 data_time: 0.75s time: 520.07s eta: 3 days, 7:53:19
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:16:11][WARNING] [Step 41] The grad norm is NaN or Inf, skip this step. Skipped 42 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:16:11][INFO] [Train] (Epoch 1) Step 42/593 lr: 0.000020 loss: 0.290 loss(reduced): nan grad_norm: nan if_nan_skip: 42 max_memory: 33.1GB text_tokens: 32577.0 tgs: 62 data_time: 1.08s time: 522.01s eta: 3 days, 8:02:27
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:24:55][WARNING] [Step 42] The grad norm is NaN or Inf, skip this step. Skipped 43 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:24:55][INFO] [Train] (Epoch 1) Step 43/593 lr: 0.000020 loss: 0.277 loss(reduced): nan grad_norm: nan if_nan_skip: 43 max_memory: 32.9GB text_tokens: 31503.0 tgs: 60 data_time: 0.76s time: 524.36s eta: 3 days, 8:15:23
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:33:36][WARNING] [Step 43] The grad norm is NaN or Inf, skip this step. Skipped 44 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:33:36][INFO] [Train] (Epoch 1) Step 44/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 44 max_memory: 32.9GB text_tokens: 31442.0 tgs: 60 data_time: 0.72s time: 520.50s eta: 3 days, 7:31:13
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:42:14][WARNING] [Step 44] The grad norm is NaN or Inf, skip this step. Skipped 45 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:42:14][INFO] [Train] (Epoch 1) Step 45/593 lr: 0.000020 loss: 0.258 loss(reduced): nan grad_norm: nan if_nan_skip: 45 max_memory: 33.0GB text_tokens: 31080.0 tgs: 59 data_time: 0.82s time: 518.57s eta: 3 days, 7:04:57
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:50:57][WARNING] [Step 45] The grad norm is NaN or Inf, skip this step. Skipped 46 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:50:58][INFO] [Train] (Epoch 1) Step 46/593 lr: 0.000020 loss: 0.311 loss(reduced): nan grad_norm: nan if_nan_skip: 46 max_memory: 33.1GB text_tokens: 31762.0 tgs: 60 data_time: 0.73s time: 523.16s eta: 3 days, 7:38:12
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:59:41][WARNING] [Step 46] The grad norm is NaN or Inf, skip this step. Skipped 47 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 06:59:41][INFO] [Train] (Epoch 1) Step 47/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 47 max_memory: 32.8GB text_tokens: 31859.0 tgs: 60 data_time: 0.84s time: 523.63s eta: 3 days, 7:33:47
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:08:21][WARNING] [Step 47] The grad norm is NaN or Inf, skip this step. Skipped 48 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:08:21][INFO] [Train] (Epoch 1) Step 48/593 lr: 0.000020 loss: 0.230 loss(reduced): nan grad_norm: nan if_nan_skip: 48 max_memory: 33.1GB text_tokens: 32052.0 tgs: 61 data_time: 0.62s time: 520.14s eta: 3 days, 6:53:16
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:17:01][WARNING] [Step 48] The grad norm is NaN or Inf, skip this step. Skipped 49 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:17:01][INFO] [Train] (Epoch 1) Step 49/593 lr: 0.000020 loss: 0.276 loss(reduced): nan grad_norm: nan if_nan_skip: 49 max_memory: 32.7GB text_tokens: 31795.0 tgs: 61 data_time: 0.76s time: 520.06s eta: 3 days, 6:43:54
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:25:45][WARNING] [Step 49] The grad norm is NaN or Inf, skip this step. Skipped 50 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:25:45][INFO] [Train] (Epoch 1) Step 50/593 lr: 0.000020 loss: 0.266 loss(reduced): nan grad_norm: nan if_nan_skip: 50 max_memory: 33.0GB text_tokens: 31424.0 tgs: 59 data_time: 0.75s time: 523.77s eta: 3 days, 7:08:49
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:34:29][WARNING] [Step 50] The grad norm is NaN or Inf, skip this step. Skipped 51 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:34:29][INFO] [Train] (Epoch 1) Step 51/593 lr: 0.000020 loss: 0.224 loss(reduced): nan grad_norm: nan if_nan_skip: 51 max_memory: 33.0GB text_tokens: 32138.0 tgs: 61 data_time: 0.70s time: 523.78s eta: 3 days, 7:00:10
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:43:09][WARNING] [Step 51] The grad norm is NaN or Inf, skip this step. Skipped 52 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:43:09][INFO] [Train] (Epoch 1) Step 52/593 lr: 0.000020 loss: 0.219 loss(reduced): nan grad_norm: nan if_nan_skip: 52 max_memory: 33.1GB text_tokens: 32221.0 tgs: 61 data_time: 0.91s time: 520.25s eta: 3 days, 6:19:34
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:51:48][WARNING] [Step 52] The grad norm is NaN or Inf, skip this step. Skipped 53 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 07:51:48][INFO] [Train] (Epoch 1) Step 53/593 lr: 0.000020 loss: 0.251 loss(reduced): nan grad_norm: nan if_nan_skip: 53 max_memory: 33.0GB text_tokens: 31787.0 tgs: 61 data_time: 0.87s time: 518.79s eta: 3 days, 5:57:42
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:00:31][WARNING] [Step 53] The grad norm is NaN or Inf, skip this step. Skipped 54 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:00:31][INFO] [Train] (Epoch 1) Step 54/593 lr: 0.000020 loss: 0.237 loss(reduced): nan grad_norm: nan if_nan_skip: 54 max_memory: 33.1GB text_tokens: 31727.0 tgs: 60 data_time: 0.65s time: 522.67s eta: 3 days, 6:24:03
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:09:13][WARNING] [Step 54] The grad norm is NaN or Inf, skip this step. Skipped 55 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:09:13][INFO] [Train] (Epoch 1) Step 55/593 lr: 0.000020 loss: 0.333 loss(reduced): nan grad_norm: nan if_nan_skip: 55 max_memory: 33.0GB text_tokens: 31831.0 tgs: 60 data_time: 0.89s time: 522.70s eta: 3 days, 6:15:36
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:17:54][WARNING] [Step 55] The grad norm is NaN or Inf, skip this step. Skipped 56 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:17:54][INFO] [Train] (Epoch 1) Step 56/593 lr: 0.000020 loss: 0.274 loss(reduced): nan grad_norm: nan if_nan_skip: 56 max_memory: 33.1GB text_tokens: 32054.0 tgs: 61 data_time: 0.94s time: 520.54s eta: 3 days, 5:47:28
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:26:34][WARNING] [Step 56] The grad norm is NaN or Inf, skip this step. Skipped 57 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:26:34][INFO] [Train] (Epoch 1) Step 57/593 lr: 0.000020 loss: 0.343 loss(reduced): nan grad_norm: nan if_nan_skip: 57 max_memory: 32.7GB text_tokens: 31492.0 tgs: 60 data_time: 0.83s time: 519.67s eta: 3 days, 5:31:00
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:35:15][WARNING] [Step 57] The grad norm is NaN or Inf, skip this step. Skipped 58 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:35:15][INFO] [Train] (Epoch 1) Step 58/593 lr: 0.000020 loss: 0.222 loss(reduced): nan grad_norm: nan if_nan_skip: 58 max_memory: 32.7GB text_tokens: 31059.0 tgs: 59 data_time: 0.69s time: 521.28s eta: 3 days, 5:36:47
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:43:58][WARNING] [Step 58] The grad norm is NaN or Inf, skip this step. Skipped 59 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:43:58][INFO] [Train] (Epoch 1) Step 59/593 lr: 0.000020 loss: 0.218 loss(reduced): nan grad_norm: nan if_nan_skip: 59 max_memory: 32.7GB text_tokens: 30232.0 tgs: 57 data_time: 0.77s time: 523.60s eta: 3 days, 5:48:45
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:52:38][WARNING] [Step 59] The grad norm is NaN or Inf, skip this step. Skipped 60 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 08:52:38][INFO] [Train] (Epoch 1) Step 60/593 lr: 0.000020 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 60 max_memory: 32.8GB text_tokens: 31767.0 tgs: 61 data_time: 0.79s time: 519.33s eta: 3 days, 5:02:03
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:01:18][WARNING] [Step 60] The grad norm is NaN or Inf, skip this step. Skipped 61 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:01:18][INFO] [Train] (Epoch 1) Step 61/593 lr: 0.000020 loss: 0.248 loss(reduced): nan grad_norm: nan if_nan_skip: 61 max_memory: 32.9GB text_tokens: 32058.0 tgs: 61 data_time: 0.58s time: 520.63s eta: 3 days, 5:04:54
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:10:00][WARNING] [Step 61] The grad norm is NaN or Inf, skip this step. Skipped 62 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:10:00][INFO] [Train] (Epoch 1) Step 62/593 lr: 0.000020 loss: 0.331 loss(reduced): nan grad_norm: nan if_nan_skip: 62 max_memory: 33.1GB text_tokens: 32164.0 tgs: 61 data_time: 0.65s time: 521.46s eta: 3 days, 5:03:36
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:18:44][WARNING] [Step 62] The grad norm is NaN or Inf, skip this step. Skipped 63 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:18:44][INFO] [Train] (Epoch 1) Step 63/593 lr: 0.000020 loss: 0.229 loss(reduced): nan grad_norm: nan if_nan_skip: 63 max_memory: 32.8GB text_tokens: 31753.0 tgs: 60 data_time: 0.43s time: 523.95s eta: 3 days, 5:16:59
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:27:23][WARNING] [Step 63] The grad norm is NaN or Inf, skip this step. Skipped 64 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:27:23][INFO] [Train] (Epoch 1) Step 64/593 lr: 0.000020 loss: 0.233 loss(reduced): nan grad_norm: nan if_nan_skip: 64 max_memory: 32.8GB text_tokens: 31116.0 tgs: 59 data_time: 0.67s time: 518.86s eta: 3 days, 4:23:16
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:36:02][WARNING] [Step 64] The grad norm is NaN or Inf, skip this step. Skipped 65 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:36:02][INFO] [Train] (Epoch 1) Step 65/593 lr: 0.000020 loss: 0.285 loss(reduced): nan grad_norm: nan if_nan_skip: 65 max_memory: 33.0GB text_tokens: 31233.0 tgs: 60 data_time: 0.68s time: 519.23s eta: 3 days, 4:17:54
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:44:43][WARNING] [Step 65] The grad norm is NaN or Inf, skip this step. Skipped 66 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:44:43][INFO] [Train] (Epoch 1) Step 66/593 lr: 0.000020 loss: 0.270 loss(reduced): nan grad_norm: nan if_nan_skip: 66 max_memory: 33.0GB text_tokens: 32074.0 tgs: 61 data_time: 0.71s time: 520.72s eta: 3 days, 4:22:22
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:53:27][WARNING] [Step 66] The grad norm is NaN or Inf, skip this step. Skipped 67 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 09:53:27][INFO] [Train] (Epoch 1) Step 67/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 67 max_memory: 33.0GB text_tokens: 31823.0 tgs: 60 data_time: 0.79s time: 524.26s eta: 3 days, 4:44:45
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:02:07][WARNING] [Step 67] The grad norm is NaN or Inf, skip this step. Skipped 68 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:02:07][INFO] [Train] (Epoch 1) Step 68/593 lr: 0.000020 loss: 0.307 loss(reduced): nan grad_norm: nan if_nan_skip: 68 max_memory: 33.1GB text_tokens: 32008.0 tgs: 61 data_time: 0.71s time: 520.28s eta: 3 days, 4:01:07
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:10:47][WARNING] [Step 68] The grad norm is NaN or Inf, skip this step. Skipped 69 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:10:47][INFO] [Train] (Epoch 1) Step 69/593 lr: 0.000020 loss: 0.238 loss(reduced): nan grad_norm: nan if_nan_skip: 69 max_memory: 33.0GB text_tokens: 32207.0 tgs: 61 data_time: 0.73s time: 519.50s eta: 3 days, 3:45:39
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:19:29][WARNING] [Step 69] The grad norm is NaN or Inf, skip this step. Skipped 70 steps in total.
+ [XTuner][RANK 30][DP 7][SP 2][TP 0][2025-01-21 10:19:29][INFO] [Train] (Epoch 1) Step 70/593 lr: 0.000020 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 70 max_memory: 32.7GB text_tokens: 31813.0 tgs: 60 data_time: 0.65s time: 522.33s eta: 3 days, 4:01:39
20250120235238/rank37.log ADDED
@@ -0,0 +1,395 @@
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:52:42][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250120235238', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:52:42][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:53:37][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:54:30][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:55:25][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:56:18][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:57:14][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:58:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-20 23:59:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:00:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:00:05][INFO] [Dataset & Dataloader] Cost 443.11s
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:10:23][SUCCESS] [Parallelize LLM] Elapsed time 147.29 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:10:24][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:19:46][WARNING] [Step 0] The grad norm is NaN or Inf, skip this step. Skipped 1 steps in total.
257
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:19:46][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.296 loss(reduced): nan grad_norm: nan if_nan_skip: 1 max_memory: 33.0GB text_tokens: 32281.0 tgs: 58 data_time: 2.27s time: 547.90s eta: 3 days, 18:15:04
258
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:28:29][WARNING] [Step 1] The grad norm is NaN or Inf, skip this step. Skipped 2 steps in total.
259
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:28:29][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.259 loss(reduced): nan grad_norm: nan if_nan_skip: 2 max_memory: 33.1GB text_tokens: 32206.0 tgs: 61 data_time: 0.74s time: 523.21s eta: 3 days, 14:02:22
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:37:12][WARNING] [Step 2] The grad norm is NaN or Inf, skip this step. Skipped 3 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:37:12][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.270 loss(reduced): nan grad_norm: nan if_nan_skip: 3 max_memory: 33.1GB text_tokens: 32213.0 tgs: 61 data_time: 1.05s time: 522.91s eta: 3 days, 13:50:39
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:45:52][WARNING] [Step 3] The grad norm is NaN or Inf, skip this step. Skipped 4 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:45:52][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.239 loss(reduced): nan grad_norm: nan if_nan_skip: 4 max_memory: 32.1GB text_tokens: 29331.0 tgs: 56 data_time: 0.93s time: 520.28s eta: 3 days, 13:16:07
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:54:33][WARNING] [Step 4] The grad norm is NaN or Inf, skip this step. Skipped 5 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 00:54:33][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.283 loss(reduced): nan grad_norm: nan if_nan_skip: 5 max_memory: 33.1GB text_tokens: 31870.0 tgs: 61 data_time: 0.88s time: 520.98s eta: 3 days, 13:14:15
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:03:14][WARNING] [Step 5] The grad norm is NaN or Inf, skip this step. Skipped 6 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:03:14][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.262 loss(reduced): nan grad_norm: nan if_nan_skip: 6 max_memory: 32.4GB text_tokens: 29472.0 tgs: 56 data_time: 0.70s time: 520.89s eta: 3 days, 13:04:45
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:11:58][WARNING] [Step 6] The grad norm is NaN or Inf, skip this step. Skipped 7 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:11:58][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.205 loss(reduced): nan grad_norm: nan if_nan_skip: 7 max_memory: 33.0GB text_tokens: 32120.0 tgs: 61 data_time: 0.71s time: 523.38s eta: 3 days, 13:20:23
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:20:38][WARNING] [Step 7] The grad norm is NaN or Inf, skip this step. Skipped 8 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:20:38][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.338 loss(reduced): nan grad_norm: nan if_nan_skip: 8 max_memory: 33.0GB text_tokens: 30854.0 tgs: 59 data_time: 0.62s time: 520.66s eta: 3 days, 12:45:04
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:29:18][WARNING] [Step 8] The grad norm is NaN or Inf, skip this step. Skipped 9 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:29:18][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.276 loss(reduced): nan grad_norm: nan if_nan_skip: 9 max_memory: 32.8GB text_tokens: 31900.0 tgs: 61 data_time: 0.72s time: 520.17s eta: 3 days, 12:31:40
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:37:59][WARNING] [Step 9] The grad norm is NaN or Inf, skip this step. Skipped 10 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:37:59][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.249 loss(reduced): nan grad_norm: nan if_nan_skip: 10 max_memory: 33.0GB text_tokens: 32109.0 tgs: 61 data_time: 0.74s time: 520.43s eta: 3 days, 12:25:30
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:46:43][WARNING] [Step 10] The grad norm is NaN or Inf, skip this step. Skipped 11 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:46:43][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.225 loss(reduced): nan grad_norm: nan if_nan_skip: 11 max_memory: 33.1GB text_tokens: 31161.0 tgs: 59 data_time: 0.63s time: 524.53s eta: 3 days, 12:56:39
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:55:24][WARNING] [Step 11] The grad norm is NaN or Inf, skip this step. Skipped 12 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 01:55:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 12 max_memory: 32.7GB text_tokens: 31536.0 tgs: 60 data_time: 0.76s time: 520.64s eta: 3 days, 12:10:14
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:04:04][WARNING] [Step 12] The grad norm is NaN or Inf, skip this step. Skipped 13 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:04:04][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.314 loss(reduced): nan grad_norm: nan if_nan_skip: 13 max_memory: 32.9GB text_tokens: 32004.0 tgs: 61 data_time: 0.76s time: 519.96s eta: 3 days, 11:54:58
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:12:45][WARNING] [Step 13] The grad norm is NaN or Inf, skip this step. Skipped 14 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:12:45][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.302 loss(reduced): nan grad_norm: nan if_nan_skip: 14 max_memory: 33.1GB text_tokens: 32359.0 tgs: 62 data_time: 0.74s time: 521.28s eta: 3 days, 11:59:03
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:21:29][WARNING] [Step 14] The grad norm is NaN or Inf, skip this step. Skipped 15 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:21:29][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.243 loss(reduced): nan grad_norm: nan if_nan_skip: 15 max_memory: 33.0GB text_tokens: 32408.0 tgs: 61 data_time: 0.97s time: 524.12s eta: 3 days, 12:17:45
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:30:10][WARNING] [Step 15] The grad norm is NaN or Inf, skip this step. Skipped 16 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:30:10][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.229 loss(reduced): nan grad_norm: nan if_nan_skip: 16 max_memory: 33.1GB text_tokens: 31413.0 tgs: 60 data_time: 0.92s time: 520.53s eta: 3 days, 11:34:27
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:38:49][WARNING] [Step 16] The grad norm is NaN or Inf, skip this step. Skipped 17 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:38:49][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.288 loss(reduced): nan grad_norm: nan if_nan_skip: 17 max_memory: 33.1GB text_tokens: 31857.0 tgs: 61 data_time: 0.76s time: 518.85s eta: 3 days, 11:09:33
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:47:31][WARNING] [Step 17] The grad norm is NaN or Inf, skip this step. Skipped 18 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:47:31][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.267 loss(reduced): nan grad_norm: nan if_nan_skip: 18 max_memory: 32.9GB text_tokens: 31570.0 tgs: 60 data_time: 0.71s time: 522.10s eta: 3 days, 11:32:09
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:56:15][WARNING] [Step 18] The grad norm is NaN or Inf, skip this step. Skipped 19 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 02:56:15][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.238 loss(reduced): nan grad_norm: nan if_nan_skip: 19 max_memory: 33.1GB text_tokens: 31566.0 tgs: 60 data_time: 0.64s time: 523.82s eta: 3 days, 11:39:58
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:04:55][WARNING] [Step 19] The grad norm is NaN or Inf, skip this step. Skipped 20 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:04:55][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 20 max_memory: 33.1GB text_tokens: 32203.0 tgs: 61 data_time: 0.78s time: 520.47s eta: 3 days, 10:59:09
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:13:34][WARNING] [Step 20] The grad norm is NaN or Inf, skip this step. Skipped 21 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:13:34][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.280 loss(reduced): nan grad_norm: nan if_nan_skip: 21 max_memory: 33.0GB text_tokens: 30573.0 tgs: 58 data_time: 0.63s time: 518.43s eta: 3 days, 10:30:58
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:22:16][WARNING] [Step 21] The grad norm is NaN or Inf, skip this step. Skipped 22 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:22:16][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.225 loss(reduced): nan grad_norm: nan if_nan_skip: 22 max_memory: 32.5GB text_tokens: 30164.0 tgs: 57 data_time: 0.67s time: 522.83s eta: 3 days, 11:04:20
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:31:00][WARNING] [Step 22] The grad norm is NaN or Inf, skip this step. Skipped 23 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:31:00][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.190 loss(reduced): nan grad_norm: nan if_nan_skip: 23 max_memory: 33.1GB text_tokens: 31721.0 tgs: 60 data_time: 0.76s time: 523.53s eta: 3 days, 11:02:17
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:39:41][WARNING] [Step 23] The grad norm is NaN or Inf, skip this step. Skipped 24 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:39:41][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.214 loss(reduced): nan grad_norm: nan if_nan_skip: 24 max_memory: 32.7GB text_tokens: 31235.0 tgs: 59 data_time: 0.75s time: 520.97s eta: 3 days, 10:29:10
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:48:20][WARNING] [Step 24] The grad norm is NaN or Inf, skip this step. Skipped 25 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:48:20][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.254 loss(reduced): nan grad_norm: nan if_nan_skip: 25 max_memory: 32.9GB text_tokens: 32083.0 tgs: 61 data_time: 0.67s time: 519.10s eta: 3 days, 10:02:45
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:57:03][WARNING] [Step 25] The grad norm is NaN or Inf, skip this step. Skipped 26 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 03:57:03][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.414 loss(reduced): nan grad_norm: nan if_nan_skip: 26 max_memory: 32.7GB text_tokens: 31111.0 tgs: 59 data_time: 0.88s time: 523.29s eta: 3 days, 10:33:46
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:05:46][WARNING] [Step 26] The grad norm is NaN or Inf, skip this step. Skipped 27 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:05:46][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.299 loss(reduced): nan grad_norm: nan if_nan_skip: 27 max_memory: 33.0GB text_tokens: 31722.0 tgs: 60 data_time: 0.81s time: 522.95s eta: 3 days, 10:21:54
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:14:28][WARNING] [Step 27] The grad norm is NaN or Inf, skip this step. Skipped 28 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:14:28][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.301 loss(reduced): nan grad_norm: nan if_nan_skip: 28 max_memory: 32.8GB text_tokens: 30903.0 tgs: 59 data_time: 0.76s time: 521.31s eta: 3 days, 9:57:40
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:23:08][WARNING] [Step 28] The grad norm is NaN or Inf, skip this step. Skipped 29 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:23:08][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.314 loss(reduced): nan grad_norm: nan if_nan_skip: 29 max_memory: 33.1GB text_tokens: 31566.0 tgs: 60 data_time: 0.99s time: 520.63s eta: 3 days, 9:42:36
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:31:51][WARNING] [Step 29] The grad norm is NaN or Inf, skip this step. Skipped 30 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:31:51][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.245 loss(reduced): nan grad_norm: nan if_nan_skip: 30 max_memory: 33.1GB text_tokens: 28740.0 tgs: 55 data_time: 0.78s time: 522.43s eta: 3 days, 9:50:50
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:40:34][WARNING] [Step 30] The grad norm is NaN or Inf, skip this step. Skipped 31 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:40:34][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.350 loss(reduced): nan grad_norm: nan if_nan_skip: 31 max_memory: 33.0GB text_tokens: 31943.0 tgs: 60 data_time: 0.60s time: 523.75s eta: 3 days, 9:54:32
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:49:15][WARNING] [Step 31] The grad norm is NaN or Inf, skip this step. Skipped 32 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:49:15][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.309 loss(reduced): nan grad_norm: nan if_nan_skip: 32 max_memory: 33.1GB text_tokens: 31226.0 tgs: 60 data_time: 0.81s time: 520.42s eta: 3 days, 9:14:36
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:57:55][WARNING] [Step 32] The grad norm is NaN or Inf, skip this step. Skipped 33 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 04:57:55][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.252 loss(reduced): nan grad_norm: nan if_nan_skip: 33 max_memory: 33.0GB text_tokens: 31930.0 tgs: 61 data_time: 0.64s time: 520.67s eta: 3 days, 9:08:16
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:06:37][WARNING] [Step 33] The grad norm is NaN or Inf, skip this step. Skipped 34 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:06:37][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.240 loss(reduced): nan grad_norm: nan if_nan_skip: 34 max_memory: 33.1GB text_tokens: 31763.0 tgs: 60 data_time: 0.88s time: 521.30s eta: 3 days, 9:05:30
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:15:21][WARNING] [Step 34] The grad norm is NaN or Inf, skip this step. Skipped 35 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:15:21][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 35 max_memory: 33.0GB text_tokens: 32155.0 tgs: 61 data_time: 0.80s time: 524.10s eta: 3 days, 9:22:54
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:24:02][WARNING] [Step 35] The grad norm is NaN or Inf, skip this step. Skipped 36 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:24:02][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.314 loss(reduced): nan grad_norm: nan if_nan_skip: 36 max_memory: 33.0GB text_tokens: 31819.0 tgs: 61 data_time: 0.87s time: 520.95s eta: 3 days, 8:44:51
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:32:42][WARNING] [Step 36] The grad norm is NaN or Inf, skip this step. Skipped 37 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:32:42][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.255 loss(reduced): nan grad_norm: nan if_nan_skip: 37 max_memory: 33.1GB text_tokens: 32555.0 tgs: 62 data_time: 0.62s time: 520.17s eta: 3 days, 8:28:52
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:41:25][WARNING] [Step 37] The grad norm is NaN or Inf, skip this step. Skipped 38 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:41:25][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 38 max_memory: 33.0GB text_tokens: 31507.0 tgs: 60 data_time: 0.73s time: 522.56s eta: 3 days, 8:42:21
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:50:09][WARNING] [Step 38] The grad norm is NaN or Inf, skip this step. Skipped 39 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:50:09][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 39 max_memory: 33.1GB text_tokens: 32485.0 tgs: 61 data_time: 0.85s time: 524.33s eta: 3 days, 8:50:00
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:58:49][WARNING] [Step 39] The grad norm is NaN or Inf, skip this step. Skipped 40 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 05:58:49][INFO] [Train] (Epoch 1) Step 40/593 lr: 0.000020 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 40 max_memory: 33.1GB text_tokens: 32286.0 tgs: 62 data_time: 0.90s time: 519.88s eta: 3 days, 8:00:12
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:07:29][WARNING] [Step 40] The grad norm is NaN or Inf, skip this step. Skipped 41 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:07:29][INFO] [Train] (Epoch 1) Step 41/593 lr: 0.000020 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 41 max_memory: 33.1GB text_tokens: 32270.0 tgs: 62 data_time: 0.74s time: 520.13s eta: 3 days, 7:53:51
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:16:11][WARNING] [Step 41] The grad norm is NaN or Inf, skip this step. Skipped 42 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:16:11][INFO] [Train] (Epoch 1) Step 42/593 lr: 0.000020 loss: 0.581 loss(reduced): nan grad_norm: nan if_nan_skip: 42 max_memory: 32.9GB text_tokens: 31886.0 tgs: 61 data_time: 0.92s time: 521.99s eta: 3 days, 8:02:19
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:24:55][WARNING] [Step 42] The grad norm is NaN or Inf, skip this step. Skipped 43 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:24:55][INFO] [Train] (Epoch 1) Step 43/593 lr: 0.000020 loss: 0.286 loss(reduced): nan grad_norm: nan if_nan_skip: 43 max_memory: 32.6GB text_tokens: 31139.0 tgs: 59 data_time: 0.73s time: 524.35s eta: 3 days, 8:15:14
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:33:36][WARNING] [Step 43] The grad norm is NaN or Inf, skip this step. Skipped 44 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:33:36][INFO] [Train] (Epoch 1) Step 44/593 lr: 0.000020 loss: 0.262 loss(reduced): nan grad_norm: nan if_nan_skip: 44 max_memory: 33.0GB text_tokens: 31827.0 tgs: 61 data_time: 0.66s time: 520.52s eta: 3 days, 7:31:28
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:42:14][WARNING] [Step 44] The grad norm is NaN or Inf, skip this step. Skipped 45 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:42:14][INFO] [Train] (Epoch 1) Step 45/593 lr: 0.000020 loss: 0.287 loss(reduced): nan grad_norm: nan if_nan_skip: 45 max_memory: 33.0GB text_tokens: 32014.0 tgs: 61 data_time: 0.84s time: 518.56s eta: 3 days, 7:04:48
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:50:57][WARNING] [Step 45] The grad norm is NaN or Inf, skip this step. Skipped 46 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:50:57][INFO] [Train] (Epoch 1) Step 46/593 lr: 0.000020 loss: 0.291 loss(reduced): nan grad_norm: nan if_nan_skip: 46 max_memory: 33.0GB text_tokens: 31950.0 tgs: 61 data_time: 0.97s time: 523.15s eta: 3 days, 7:38:04
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:59:41][WARNING] [Step 46] The grad norm is NaN or Inf, skip this step. Skipped 47 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 06:59:41][INFO] [Train] (Epoch 1) Step 47/593 lr: 0.000020 loss: 0.259 loss(reduced): nan grad_norm: nan if_nan_skip: 47 max_memory: 32.6GB text_tokens: 31159.0 tgs: 59 data_time: 0.58s time: 523.62s eta: 3 days, 7:33:38
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:08:21][WARNING] [Step 47] The grad norm is NaN or Inf, skip this step. Skipped 48 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:08:21][INFO] [Train] (Epoch 1) Step 48/593 lr: 0.000020 loss: 0.309 loss(reduced): nan grad_norm: nan if_nan_skip: 48 max_memory: 33.0GB text_tokens: 31876.0 tgs: 61 data_time: 0.93s time: 520.18s eta: 3 days, 6:53:40
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:17:01][WARNING] [Step 48] The grad norm is NaN or Inf, skip this step. Skipped 49 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:17:01][INFO] [Train] (Epoch 1) Step 49/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 49 max_memory: 33.1GB text_tokens: 31681.0 tgs: 60 data_time: 0.96s time: 520.05s eta: 3 days, 6:43:46
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:25:45][WARNING] [Step 49] The grad norm is NaN or Inf, skip this step. Skipped 50 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:25:45][INFO] [Train] (Epoch 1) Step 50/593 lr: 0.000020 loss: 0.317 loss(reduced): nan grad_norm: nan if_nan_skip: 50 max_memory: 33.0GB text_tokens: 31594.0 tgs: 60 data_time: 1.15s time: 523.75s eta: 3 days, 7:08:40
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:34:29][WARNING] [Step 50] The grad norm is NaN or Inf, skip this step. Skipped 51 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:34:29][INFO] [Train] (Epoch 1) Step 51/593 lr: 0.000020 loss: 0.239 loss(reduced): nan grad_norm: nan if_nan_skip: 51 max_memory: 32.9GB text_tokens: 31032.0 tgs: 59 data_time: 1.03s time: 523.80s eta: 3 days, 7:00:25
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:43:09][WARNING] [Step 51] The grad norm is NaN or Inf, skip this step. Skipped 52 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:43:09][INFO] [Train] (Epoch 1) Step 52/593 lr: 0.000020 loss: 0.294 loss(reduced): nan grad_norm: nan if_nan_skip: 52 max_memory: 33.0GB text_tokens: 31014.0 tgs: 59 data_time: 0.94s time: 520.23s eta: 3 days, 6:19:26
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:51:48][WARNING] [Step 52] The grad norm is NaN or Inf, skip this step. Skipped 53 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 07:51:48][INFO] [Train] (Epoch 1) Step 53/593 lr: 0.000020 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 53 max_memory: 33.1GB text_tokens: 32168.0 tgs: 62 data_time: 0.94s time: 518.77s eta: 3 days, 5:57:34
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:00:31][WARNING] [Step 53] The grad norm is NaN or Inf, skip this step. Skipped 54 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:00:31][INFO] [Train] (Epoch 1) Step 54/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 54 max_memory: 32.9GB text_tokens: 31813.0 tgs: 60 data_time: 0.72s time: 522.68s eta: 3 days, 6:24:07
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:09:13][WARNING] [Step 54] The grad norm is NaN or Inf, skip this step. Skipped 55 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:09:13][INFO] [Train] (Epoch 1) Step 55/593 lr: 0.000020 loss: 0.241 loss(reduced): nan grad_norm: nan if_nan_skip: 55 max_memory: 33.0GB text_tokens: 32389.0 tgs: 61 data_time: 0.90s time: 522.73s eta: 3 days, 6:15:52
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:17:54][WARNING] [Step 55] The grad norm is NaN or Inf, skip this step. Skipped 56 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:17:54][INFO] [Train] (Epoch 1) Step 56/593 lr: 0.000020 loss: 0.289 loss(reduced): nan grad_norm: nan if_nan_skip: 56 max_memory: 32.8GB text_tokens: 31439.0 tgs: 60 data_time: 0.67s time: 520.52s eta: 3 days, 5:47:21
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:26:33][WARNING] [Step 56] The grad norm is NaN or Inf, skip this step. Skipped 57 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:26:33][INFO] [Train] (Epoch 1) Step 57/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 57 max_memory: 33.1GB text_tokens: 31610.0 tgs: 60 data_time: 0.56s time: 519.65s eta: 3 days, 5:30:51
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:35:15][WARNING] [Step 57] The grad norm is NaN or Inf, skip this step. Skipped 58 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:35:15][INFO] [Train] (Epoch 1) Step 58/593 lr: 0.000020 loss: 0.258 loss(reduced): nan grad_norm: nan if_nan_skip: 58 max_memory: 33.1GB text_tokens: 31493.0 tgs: 60 data_time: 0.85s time: 521.32s eta: 3 days, 5:37:06
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:43:58][WARNING] [Step 58] The grad norm is NaN or Inf, skip this step. Skipped 59 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:43:58][INFO] [Train] (Epoch 1) Step 59/593 lr: 0.000020 loss: 0.228 loss(reduced): nan grad_norm: nan if_nan_skip: 59 max_memory: 33.1GB text_tokens: 31578.0 tgs: 60 data_time: 0.78s time: 523.58s eta: 3 days, 5:48:37
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:52:38][WARNING] [Step 59] The grad norm is NaN or Inf, skip this step. Skipped 60 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 08:52:38][INFO] [Train] (Epoch 1) Step 60/593 lr: 0.000020 loss: 0.280 loss(reduced): nan grad_norm: nan if_nan_skip: 60 max_memory: 33.1GB text_tokens: 32478.0 tgs: 62 data_time: 1.00s time: 519.32s eta: 3 days, 5:01:55
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:01:18][WARNING] [Step 60] The grad norm is NaN or Inf, skip this step. Skipped 61 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:01:18][INFO] [Train] (Epoch 1) Step 61/593 lr: 0.000020 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 61 max_memory: 32.8GB text_tokens: 29943.0 tgs: 57 data_time: 0.81s time: 520.66s eta: 3 days, 5:05:12
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:10:00][WARNING] [Step 61] The grad norm is NaN or Inf, skip this step. Skipped 62 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:10:00][INFO] [Train] (Epoch 1) Step 62/593 lr: 0.000020 loss: 0.302 loss(reduced): nan grad_norm: nan if_nan_skip: 62 max_memory: 33.0GB text_tokens: 31038.0 tgs: 59 data_time: 0.55s time: 521.45s eta: 3 days, 5:03:29
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:18:44][WARNING] [Step 62] The grad norm is NaN or Inf, skip this step. Skipped 63 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:18:44][INFO] [Train] (Epoch 1) Step 63/593 lr: 0.000020 loss: 0.254 loss(reduced): nan grad_norm: nan if_nan_skip: 63 max_memory: 32.8GB text_tokens: 31043.0 tgs: 59 data_time: 0.80s time: 523.94s eta: 3 days, 5:16:51
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:27:23][WARNING] [Step 63] The grad norm is NaN or Inf, skip this step. Skipped 64 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:27:23][INFO] [Train] (Epoch 1) Step 64/593 lr: 0.000020 loss: 0.271 loss(reduced): nan grad_norm: nan if_nan_skip: 64 max_memory: 33.0GB text_tokens: 32383.0 tgs: 62 data_time: 0.85s time: 518.84s eta: 3 days, 4:23:07
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:36:02][WARNING] [Step 64] The grad norm is NaN or Inf, skip this step. Skipped 65 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:36:02][INFO] [Train] (Epoch 1) Step 65/593 lr: 0.000020 loss: 0.276 loss(reduced): nan grad_norm: nan if_nan_skip: 65 max_memory: 33.0GB text_tokens: 31991.0 tgs: 61 data_time: 0.67s time: 519.27s eta: 3 days, 4:18:11
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:44:43][WARNING] [Step 65] The grad norm is NaN or Inf, skip this step. Skipped 66 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:44:43][INFO] [Train] (Epoch 1) Step 66/593 lr: 0.000020 loss: 0.262 loss(reduced): nan grad_norm: nan if_nan_skip: 66 max_memory: 32.5GB text_tokens: 30412.0 tgs: 58 data_time: 0.81s time: 520.71s eta: 3 days, 4:22:14
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:53:27][WARNING] [Step 66] The grad norm is NaN or Inf, skip this step. Skipped 67 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 09:53:27][INFO] [Train] (Epoch 1) Step 67/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 67 max_memory: 33.1GB text_tokens: 31377.0 tgs: 59 data_time: 1.01s time: 524.25s eta: 3 days, 4:44:37
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:02:07][WARNING] [Step 67] The grad norm is NaN or Inf, skip this step. Skipped 68 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:02:07][INFO] [Train] (Epoch 1) Step 68/593 lr: 0.000020 loss: 0.246 loss(reduced): nan grad_norm: nan if_nan_skip: 68 max_memory: 32.9GB text_tokens: 31712.0 tgs: 60 data_time: 0.74s time: 520.32s eta: 3 days, 4:01:29
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:10:47][WARNING] [Step 68] The grad norm is NaN or Inf, skip this step. Skipped 69 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:10:47][INFO] [Train] (Epoch 1) Step 69/593 lr: 0.000020 loss: 0.304 loss(reduced): nan grad_norm: nan if_nan_skip: 69 max_memory: 33.1GB text_tokens: 32481.0 tgs: 62 data_time: 0.96s time: 519.49s eta: 3 days, 3:45:31
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:19:29][WARNING] [Step 69] The grad norm is NaN or Inf, skip this step. Skipped 70 steps in total.
+ [XTuner][RANK 37][DP 9][SP 1][TP 0][2025-01-21 10:19:29][INFO] [Train] (Epoch 1) Step 70/593 lr: 0.000020 loss: 0.270 loss(reduced): nan grad_norm: nan if_nan_skip: 70 max_memory: 33.1GB text_tokens: 31743.0 tgs: 60 data_time: 0.98s time: 522.31s eta: 3 days, 4:01:31
20250120235238/rank50.log ADDED
@@ -0,0 +1,395 @@
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:52:42][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250120235238', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:52:42][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:53:37][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:54:31][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:55:25][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:56:18][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:57:14][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:58:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-20 23:59:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:00:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:00:05][INFO] [Dataset & Dataloader] Cost 443.11s
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:10:23][SUCCESS] [Parallelize LLM] Elapsed time 147.33 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:10:24][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:19:46][WARNING] [Step 0] The grad norm is NaN or Inf, skip this step. Skipped 1 steps in total.
257
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:19:46][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 1 max_memory: 33.1GB text_tokens: 31357.0 tgs: 57 data_time: 1.94s time: 548.09s eta: 3 days, 18:16:58
258
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:28:29][WARNING] [Step 1] The grad norm is NaN or Inf, skip this step. Skipped 2 steps in total.
259
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:28:29][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 2 max_memory: 33.0GB text_tokens: 32248.0 tgs: 61 data_time: 0.88s time: 523.23s eta: 3 days, 14:02:31
260
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:37:12][WARNING] [Step 2] The grad norm is NaN or Inf, skip this step. Skipped 3 steps in total.
261
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:37:12][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.284 loss(reduced): nan grad_norm: nan if_nan_skip: 3 max_memory: 32.8GB text_tokens: 31853.0 tgs: 60 data_time: 1.04s time: 522.87s eta: 3 days, 13:50:17
262
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:45:52][WARNING] [Step 3] The grad norm is NaN or Inf, skip this step. Skipped 4 steps in total.
263
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:45:52][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.298 loss(reduced): nan grad_norm: nan if_nan_skip: 4 max_memory: 33.0GB text_tokens: 32083.0 tgs: 61 data_time: 0.87s time: 520.29s eta: 3 days, 13:16:12
264
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:54:33][WARNING] [Step 4] The grad norm is NaN or Inf, skip this step. Skipped 5 steps in total.
265
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 00:54:33][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.274 loss(reduced): nan grad_norm: nan if_nan_skip: 5 max_memory: 32.9GB text_tokens: 32125.0 tgs: 61 data_time: 0.99s time: 520.98s eta: 3 days, 13:14:19
266
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:03:14][WARNING] [Step 5] The grad norm is NaN or Inf, skip this step. Skipped 6 steps in total.
267
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:03:14][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 6 max_memory: 32.9GB text_tokens: 31521.0 tgs: 60 data_time: 0.78s time: 520.89s eta: 3 days, 13:04:45
268
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:11:58][WARNING] [Step 6] The grad norm is NaN or Inf, skip this step. Skipped 7 steps in total.
269
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:11:58][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.295 loss(reduced): nan grad_norm: nan if_nan_skip: 7 max_memory: 33.1GB text_tokens: 31512.0 tgs: 60 data_time: 0.88s time: 523.39s eta: 3 days, 13:20:27
270
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:20:38][WARNING] [Step 7] The grad norm is NaN or Inf, skip this step. Skipped 8 steps in total.
271
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:20:38][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.246 loss(reduced): nan grad_norm: nan if_nan_skip: 8 max_memory: 33.1GB text_tokens: 31822.0 tgs: 61 data_time: 1.00s time: 520.66s eta: 3 days, 12:45:08
272
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:29:18][WARNING] [Step 8] The grad norm is NaN or Inf, skip this step. Skipped 9 steps in total.
273
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:29:18][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.310 loss(reduced): nan grad_norm: nan if_nan_skip: 9 max_memory: 33.1GB text_tokens: 31908.0 tgs: 61 data_time: 0.80s time: 520.18s eta: 3 days, 12:31:45
274
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:37:59][WARNING] [Step 9] The grad norm is NaN or Inf, skip this step. Skipped 10 steps in total.
275
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:37:59][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.266 loss(reduced): nan grad_norm: nan if_nan_skip: 10 max_memory: 33.1GB text_tokens: 32433.0 tgs: 62 data_time: 0.83s time: 520.40s eta: 3 days, 12:25:13
276
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:46:43][WARNING] [Step 10] The grad norm is NaN or Inf, skip this step. Skipped 11 steps in total.
277
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:46:43][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 11 max_memory: 33.1GB text_tokens: 32333.0 tgs: 61 data_time: 0.75s time: 524.54s eta: 3 days, 12:56:44
278
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:55:24][WARNING] [Step 11] The grad norm is NaN or Inf, skip this step. Skipped 12 steps in total.
279
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 01:55:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.218 loss(reduced): nan grad_norm: nan if_nan_skip: 12 max_memory: 33.0GB text_tokens: 30295.0 tgs: 58 data_time: 0.98s time: 520.65s eta: 3 days, 12:10:19
280
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:04:04][WARNING] [Step 12] The grad norm is NaN or Inf, skip this step. Skipped 13 steps in total.
281
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:04:04][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.283 loss(reduced): nan grad_norm: nan if_nan_skip: 13 max_memory: 33.1GB text_tokens: 31362.0 tgs: 60 data_time: 0.80s time: 519.96s eta: 3 days, 11:54:56
282
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:12:45][WARNING] [Step 13] The grad norm is NaN or Inf, skip this step. Skipped 14 steps in total.
283
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:12:45][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.343 loss(reduced): nan grad_norm: nan if_nan_skip: 14 max_memory: 33.1GB text_tokens: 32181.0 tgs: 61 data_time: 0.88s time: 521.29s eta: 3 days, 11:59:08
284
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:21:29][WARNING] [Step 14] The grad norm is NaN or Inf, skip this step. Skipped 15 steps in total.
285
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:21:29][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.238 loss(reduced): nan grad_norm: nan if_nan_skip: 15 max_memory: 33.1GB text_tokens: 31770.0 tgs: 60 data_time: 1.08s time: 524.13s eta: 3 days, 12:17:49
286
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:30:10][WARNING] [Step 15] The grad norm is NaN or Inf, skip this step. Skipped 16 steps in total.
287
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:30:10][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.296 loss(reduced): nan grad_norm: nan if_nan_skip: 16 max_memory: 33.1GB text_tokens: 32107.0 tgs: 61 data_time: 1.04s time: 520.54s eta: 3 days, 11:34:32
288
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:38:49][WARNING] [Step 16] The grad norm is NaN or Inf, skip this step. Skipped 17 steps in total.
289
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:38:49][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 17 max_memory: 33.0GB text_tokens: 31655.0 tgs: 61 data_time: 0.68s time: 518.79s eta: 3 days, 11:09:03
290
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:47:31][WARNING] [Step 17] The grad norm is NaN or Inf, skip this step. Skipped 18 steps in total.
291
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:47:31][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.322 loss(reduced): nan grad_norm: nan if_nan_skip: 18 max_memory: 32.9GB text_tokens: 31529.0 tgs: 60 data_time: 0.90s time: 522.11s eta: 3 days, 11:32:13
292
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:56:15][WARNING] [Step 18] The grad norm is NaN or Inf, skip this step. Skipped 19 steps in total.
293
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 02:56:15][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.254 loss(reduced): nan grad_norm: nan if_nan_skip: 19 max_memory: 33.1GB text_tokens: 31277.0 tgs: 59 data_time: 0.77s time: 523.83s eta: 3 days, 11:40:04
294
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:04:55][WARNING] [Step 19] The grad norm is NaN or Inf, skip this step. Skipped 20 steps in total.
295
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:04:55][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 20 max_memory: 33.1GB text_tokens: 31388.0 tgs: 60 data_time: 0.63s time: 520.47s eta: 3 days, 10:59:09
296
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:13:34][WARNING] [Step 20] The grad norm is NaN or Inf, skip this step. Skipped 21 steps in total.
297
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:13:34][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.287 loss(reduced): nan grad_norm: nan if_nan_skip: 21 max_memory: 33.0GB text_tokens: 32144.0 tgs: 62 data_time: 0.74s time: 518.43s eta: 3 days, 10:31:02
298
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:22:16][WARNING] [Step 21] The grad norm is NaN or Inf, skip this step. Skipped 22 steps in total.
299
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:22:16][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.233 loss(reduced): nan grad_norm: nan if_nan_skip: 22 max_memory: 32.8GB text_tokens: 31305.0 tgs: 59 data_time: 0.69s time: 522.84s eta: 3 days, 11:04:25
300
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:31:00][WARNING] [Step 22] The grad norm is NaN or Inf, skip this step. Skipped 23 steps in total.
301
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:31:00][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.195 loss(reduced): nan grad_norm: nan if_nan_skip: 23 max_memory: 33.1GB text_tokens: 32167.0 tgs: 61 data_time: 0.89s time: 523.52s eta: 3 days, 11:02:11
302
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:39:41][WARNING] [Step 23] The grad norm is NaN or Inf, skip this step. Skipped 24 steps in total.
303
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:39:41][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.266 loss(reduced): nan grad_norm: nan if_nan_skip: 24 max_memory: 33.1GB text_tokens: 32301.0 tgs: 62 data_time: 0.79s time: 520.96s eta: 3 days, 10:29:08
304
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:48:20][WARNING] [Step 24] The grad norm is NaN or Inf, skip this step. Skipped 25 steps in total.
305
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:48:20][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.315 loss(reduced): nan grad_norm: nan if_nan_skip: 25 max_memory: 33.1GB text_tokens: 31599.0 tgs: 60 data_time: 0.70s time: 519.10s eta: 3 days, 10:02:48
306
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:57:03][WARNING] [Step 25] The grad norm is NaN or Inf, skip this step. Skipped 26 steps in total.
307
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 03:57:03][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.282 loss(reduced): nan grad_norm: nan if_nan_skip: 26 max_memory: 33.1GB text_tokens: 32431.0 tgs: 61 data_time: 0.74s time: 523.30s eta: 3 days, 10:33:51
308
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:05:46][WARNING] [Step 26] The grad norm is NaN or Inf, skip this step. Skipped 27 steps in total.
309
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:05:46][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.260 loss(reduced): nan grad_norm: nan if_nan_skip: 27 max_memory: 33.0GB text_tokens: 31812.0 tgs: 60 data_time: 0.85s time: 522.95s eta: 3 days, 10:21:50
310
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:14:28][WARNING] [Step 27] The grad norm is NaN or Inf, skip this step. Skipped 28 steps in total.
311
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:14:28][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.294 loss(reduced): nan grad_norm: nan if_nan_skip: 28 max_memory: 33.0GB text_tokens: 30722.0 tgs: 58 data_time: 0.64s time: 521.32s eta: 3 days, 9:57:46
312
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:23:08][WARNING] [Step 28] The grad norm is NaN or Inf, skip this step. Skipped 29 steps in total.
313
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:23:08][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.198 loss(reduced): nan grad_norm: nan if_nan_skip: 29 max_memory: 33.0GB text_tokens: 29912.0 tgs: 57 data_time: 0.86s time: 520.64s eta: 3 days, 9:42:40
314
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:31:51][WARNING] [Step 29] The grad norm is NaN or Inf, skip this step. Skipped 30 steps in total.
315
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:31:51][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 30 max_memory: 33.0GB text_tokens: 31300.0 tgs: 59 data_time: 1.02s time: 522.39s eta: 3 days, 9:50:28
316
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:40:34][WARNING] [Step 30] The grad norm is NaN or Inf, skip this step. Skipped 31 steps in total.
317
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:40:34][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 31 max_memory: 33.1GB text_tokens: 32379.0 tgs: 61 data_time: 1.05s time: 523.75s eta: 3 days, 9:54:32
318
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:49:15][WARNING] [Step 31] The grad norm is NaN or Inf, skip this step. Skipped 32 steps in total.
319
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:49:15][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.272 loss(reduced): nan grad_norm: nan if_nan_skip: 32 max_memory: 32.8GB text_tokens: 31647.0 tgs: 60 data_time: 0.71s time: 520.43s eta: 3 days, 9:14:40
320
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:57:55][WARNING] [Step 32] The grad norm is NaN or Inf, skip this step. Skipped 33 steps in total.
321
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 04:57:55][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.242 loss(reduced): nan grad_norm: nan if_nan_skip: 33 max_memory: 32.7GB text_tokens: 31134.0 tgs: 59 data_time: 0.80s time: 520.68s eta: 3 days, 9:08:20
322
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:06:37][WARNING] [Step 33] The grad norm is NaN or Inf, skip this step. Skipped 34 steps in total.
323
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:06:37][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.197 loss(reduced): nan grad_norm: nan if_nan_skip: 34 max_memory: 33.1GB text_tokens: 31309.0 tgs: 60 data_time: 0.66s time: 521.29s eta: 3 days, 9:05:22
324
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:15:21][WARNING] [Step 34] The grad norm is NaN or Inf, skip this step. Skipped 35 steps in total.
325
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:15:21][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.272 loss(reduced): nan grad_norm: nan if_nan_skip: 35 max_memory: 32.9GB text_tokens: 31520.0 tgs: 60 data_time: 0.62s time: 524.11s eta: 3 days, 9:22:58
326
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:24:02][WARNING] [Step 35] The grad norm is NaN or Inf, skip this step. Skipped 36 steps in total.
327
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:24:02][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 36 max_memory: 33.1GB text_tokens: 32247.0 tgs: 61 data_time: 0.90s time: 520.96s eta: 3 days, 8:44:55
328
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:32:42][WARNING] [Step 36] The grad norm is NaN or Inf, skip this step. Skipped 37 steps in total.
329
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:32:42][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.300 loss(reduced): nan grad_norm: nan if_nan_skip: 37 max_memory: 33.1GB text_tokens: 31693.0 tgs: 60 data_time: 0.92s time: 520.17s eta: 3 days, 8:28:54
330
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:41:25][WARNING] [Step 37] The grad norm is NaN or Inf, skip this step. Skipped 38 steps in total.
331
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:41:25][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.259 loss(reduced): nan grad_norm: nan if_nan_skip: 38 max_memory: 33.0GB text_tokens: 32127.0 tgs: 61 data_time: 0.92s time: 522.56s eta: 3 days, 8:42:25
332
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:50:09][WARNING] [Step 38] The grad norm is NaN or Inf, skip this step. Skipped 39 steps in total.
333
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:50:09][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.265 loss(reduced): nan grad_norm: nan if_nan_skip: 39 max_memory: 33.0GB text_tokens: 30901.0 tgs: 58 data_time: 0.67s time: 524.33s eta: 3 days, 8:50:04
334
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:58:49][WARNING] [Step 39] The grad norm is NaN or Inf, skip this step. Skipped 40 steps in total.
335
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 05:58:49][INFO] [Train] (Epoch 1) Step 40/593 lr: 0.000020 loss: 0.312 loss(reduced): nan grad_norm: nan if_nan_skip: 40 max_memory: 33.0GB text_tokens: 31581.0 tgs: 60 data_time: 0.86s time: 519.89s eta: 3 days, 8:00:17
336
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:07:29][WARNING] [Step 40] The grad norm is NaN or Inf, skip this step. Skipped 41 steps in total.
337
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:07:29][INFO] [Train] (Epoch 1) Step 41/593 lr: 0.000020 loss: 0.244 loss(reduced): nan grad_norm: nan if_nan_skip: 41 max_memory: 33.1GB text_tokens: 31785.0 tgs: 61 data_time: 0.79s time: 520.10s eta: 3 days, 7:53:33
338
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:16:11][WARNING] [Step 41] The grad norm is NaN or Inf, skip this step. Skipped 42 steps in total.
339
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:16:11][INFO] [Train] (Epoch 1) Step 42/593 lr: 0.000020 loss: 0.333 loss(reduced): nan grad_norm: nan if_nan_skip: 42 max_memory: 32.8GB text_tokens: 31496.0 tgs: 60 data_time: 0.89s time: 522.00s eta: 3 days, 8:02:24
340
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:24:55][WARNING] [Step 42] The grad norm is NaN or Inf, skip this step. Skipped 43 steps in total.
341
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:24:55][INFO] [Train] (Epoch 1) Step 43/593 lr: 0.000020 loss: 0.310 loss(reduced): nan grad_norm: nan if_nan_skip: 43 max_memory: 33.1GB text_tokens: 32186.0 tgs: 61 data_time: 0.72s time: 524.35s eta: 3 days, 8:15:18
342
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:33:36][WARNING] [Step 43] The grad norm is NaN or Inf, skip this step. Skipped 44 steps in total.
343
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:33:36][INFO] [Train] (Epoch 1) Step 44/593 lr: 0.000020 loss: 0.284 loss(reduced): nan grad_norm: nan if_nan_skip: 44 max_memory: 33.1GB text_tokens: 31986.0 tgs: 61 data_time: 0.81s time: 520.51s eta: 3 days, 7:31:21
344
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:42:14][WARNING] [Step 44] The grad norm is NaN or Inf, skip this step. Skipped 45 steps in total.
345
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:42:14][INFO] [Train] (Epoch 1) Step 45/593 lr: 0.000020 loss: 0.277 loss(reduced): nan grad_norm: nan if_nan_skip: 45 max_memory: 33.0GB text_tokens: 31430.0 tgs: 60 data_time: 0.83s time: 518.57s eta: 3 days, 7:04:52
346
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:50:57][WARNING] [Step 45] The grad norm is NaN or Inf, skip this step. Skipped 46 steps in total.
347
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:50:57][INFO] [Train] (Epoch 1) Step 46/593 lr: 0.000020 loss: 0.333 loss(reduced): nan grad_norm: nan if_nan_skip: 46 max_memory: 33.0GB text_tokens: 31889.0 tgs: 60 data_time: 0.71s time: 523.16s eta: 3 days, 7:38:09
348
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:59:41][WARNING] [Step 46] The grad norm is NaN or Inf, skip this step. Skipped 47 steps in total.
349
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 06:59:41][INFO] [Train] (Epoch 1) Step 47/593 lr: 0.000020 loss: 0.286 loss(reduced): nan grad_norm: nan if_nan_skip: 47 max_memory: 32.7GB text_tokens: 31343.0 tgs: 59 data_time: 0.93s time: 523.62s eta: 3 days, 7:33:42
350
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:08:21][WARNING] [Step 47] The grad norm is NaN or Inf, skip this step. Skipped 48 steps in total.
351
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:08:21][INFO] [Train] (Epoch 1) Step 48/593 lr: 0.000020 loss: 0.214 loss(reduced): nan grad_norm: nan if_nan_skip: 48 max_memory: 32.9GB text_tokens: 31765.0 tgs: 61 data_time: 0.80s time: 520.16s eta: 3 days, 6:53:29
352
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:17:01][WARNING] [Step 48] The grad norm is NaN or Inf, skip this step. Skipped 49 steps in total.
353
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:17:01][INFO] [Train] (Epoch 1) Step 49/593 lr: 0.000020 loss: 0.248 loss(reduced): nan grad_norm: nan if_nan_skip: 49 max_memory: 33.0GB text_tokens: 31720.0 tgs: 60 data_time: 0.95s time: 520.06s eta: 3 days, 6:43:50
354
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:25:45][WARNING] [Step 49] The grad norm is NaN or Inf, skip this step. Skipped 50 steps in total.
355
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:25:45][INFO] [Train] (Epoch 1) Step 50/593 lr: 0.000020 loss: 0.392 loss(reduced): nan grad_norm: nan if_nan_skip: 50 max_memory: 33.1GB text_tokens: 31630.0 tgs: 60 data_time: 1.15s time: 523.76s eta: 3 days, 7:08:45
356
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:34:29][WARNING] [Step 50] The grad norm is NaN or Inf, skip this step. Skipped 51 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:34:29][INFO] [Train] (Epoch 1) Step 51/593 lr: 0.000020 loss: 0.338 loss(reduced): nan grad_norm: nan if_nan_skip: 51 max_memory: 33.0GB text_tokens: 31547.0 tgs: 60 data_time: 0.87s time: 523.80s eta: 3 days, 7:00:20
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:43:09][WARNING] [Step 51] The grad norm is NaN or Inf, skip this step. Skipped 52 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:43:09][INFO] [Train] (Epoch 1) Step 52/593 lr: 0.000020 loss: 0.283 loss(reduced): nan grad_norm: nan if_nan_skip: 52 max_memory: 33.1GB text_tokens: 31899.0 tgs: 61 data_time: 0.85s time: 520.24s eta: 3 days, 6:19:30
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:51:48][WARNING] [Step 52] The grad norm is NaN or Inf, skip this step. Skipped 53 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 07:51:48][INFO] [Train] (Epoch 1) Step 53/593 lr: 0.000020 loss: 0.200 loss(reduced): nan grad_norm: nan if_nan_skip: 53 max_memory: 32.7GB text_tokens: 30536.0 tgs: 58 data_time: 0.86s time: 518.78s eta: 3 days, 5:57:38
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:00:31][WARNING] [Step 53] The grad norm is NaN or Inf, skip this step. Skipped 54 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:00:31][INFO] [Train] (Epoch 1) Step 54/593 lr: 0.000020 loss: 0.258 loss(reduced): nan grad_norm: nan if_nan_skip: 54 max_memory: 33.1GB text_tokens: 31859.0 tgs: 60 data_time: 0.75s time: 522.67s eta: 3 days, 6:24:00
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:09:13][WARNING] [Step 54] The grad norm is NaN or Inf, skip this step. Skipped 55 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:09:13][INFO] [Train] (Epoch 1) Step 55/593 lr: 0.000020 loss: 0.251 loss(reduced): nan grad_norm: nan if_nan_skip: 55 max_memory: 32.8GB text_tokens: 31761.0 tgs: 60 data_time: 0.81s time: 522.72s eta: 3 days, 6:15:46
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:17:54][WARNING] [Step 55] The grad norm is NaN or Inf, skip this step. Skipped 56 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:17:54][INFO] [Train] (Epoch 1) Step 56/593 lr: 0.000020 loss: 0.260 loss(reduced): nan grad_norm: nan if_nan_skip: 56 max_memory: 33.0GB text_tokens: 31082.0 tgs: 59 data_time: 0.81s time: 520.53s eta: 3 days, 5:47:24
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:26:33][WARNING] [Step 56] The grad norm is NaN or Inf, skip this step. Skipped 57 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:26:34][INFO] [Train] (Epoch 1) Step 57/593 lr: 0.000020 loss: 0.260 loss(reduced): nan grad_norm: nan if_nan_skip: 57 max_memory: 32.9GB text_tokens: 32012.0 tgs: 61 data_time: 0.78s time: 519.66s eta: 3 days, 5:30:56
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:35:15][WARNING] [Step 57] The grad norm is NaN or Inf, skip this step. Skipped 58 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:35:15][INFO] [Train] (Epoch 1) Step 58/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 58 max_memory: 32.8GB text_tokens: 31850.0 tgs: 61 data_time: 0.93s time: 521.31s eta: 3 days, 5:36:59
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:43:58][WARNING] [Step 58] The grad norm is NaN or Inf, skip this step. Skipped 59 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:43:58][INFO] [Train] (Epoch 1) Step 59/593 lr: 0.000020 loss: 0.308 loss(reduced): nan grad_norm: nan if_nan_skip: 59 max_memory: 32.9GB text_tokens: 30425.0 tgs: 58 data_time: 0.97s time: 523.59s eta: 3 days, 5:48:41
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:52:38][WARNING] [Step 59] The grad norm is NaN or Inf, skip this step. Skipped 60 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 08:52:38][INFO] [Train] (Epoch 1) Step 60/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 60 max_memory: 33.1GB text_tokens: 31774.0 tgs: 61 data_time: 0.59s time: 519.32s eta: 3 days, 5:01:59
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:01:18][WARNING] [Step 60] The grad norm is NaN or Inf, skip this step. Skipped 61 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:01:18][INFO] [Train] (Epoch 1) Step 61/593 lr: 0.000020 loss: 0.265 loss(reduced): nan grad_norm: nan if_nan_skip: 61 max_memory: 32.9GB text_tokens: 30693.0 tgs: 58 data_time: 0.84s time: 520.64s eta: 3 days, 5:05:03
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:10:00][WARNING] [Step 61] The grad norm is NaN or Inf, skip this step. Skipped 62 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:10:00][INFO] [Train] (Epoch 1) Step 62/593 lr: 0.000020 loss: 0.299 loss(reduced): nan grad_norm: nan if_nan_skip: 62 max_memory: 33.1GB text_tokens: 31736.0 tgs: 60 data_time: 0.94s time: 521.45s eta: 3 days, 5:03:31
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:18:44][WARNING] [Step 62] The grad norm is NaN or Inf, skip this step. Skipped 63 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:18:44][INFO] [Train] (Epoch 1) Step 63/593 lr: 0.000020 loss: 0.296 loss(reduced): nan grad_norm: nan if_nan_skip: 63 max_memory: 33.1GB text_tokens: 30661.0 tgs: 58 data_time: 0.68s time: 523.95s eta: 3 days, 5:16:56
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:27:23][WARNING] [Step 63] The grad norm is NaN or Inf, skip this step. Skipped 64 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:27:23][INFO] [Train] (Epoch 1) Step 64/593 lr: 0.000020 loss: 0.364 loss(reduced): nan grad_norm: nan if_nan_skip: 64 max_memory: 33.1GB text_tokens: 32237.0 tgs: 62 data_time: 1.01s time: 518.85s eta: 3 days, 4:23:12
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:36:02][WARNING] [Step 64] The grad norm is NaN or Inf, skip this step. Skipped 65 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:36:02][INFO] [Train] (Epoch 1) Step 65/593 lr: 0.000020 loss: 0.274 loss(reduced): nan grad_norm: nan if_nan_skip: 65 max_memory: 33.0GB text_tokens: 31226.0 tgs: 60 data_time: 1.10s time: 519.24s eta: 3 days, 4:18:00
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:44:43][WARNING] [Step 65] The grad norm is NaN or Inf, skip this step. Skipped 66 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:44:43][INFO] [Train] (Epoch 1) Step 66/593 lr: 0.000020 loss: 0.292 loss(reduced): nan grad_norm: nan if_nan_skip: 66 max_memory: 32.9GB text_tokens: 31462.0 tgs: 60 data_time: 0.81s time: 520.72s eta: 3 days, 4:22:18
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:53:27][WARNING] [Step 66] The grad norm is NaN or Inf, skip this step. Skipped 67 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 09:53:27][INFO] [Train] (Epoch 1) Step 67/593 lr: 0.000020 loss: 0.347 loss(reduced): nan grad_norm: nan if_nan_skip: 67 max_memory: 32.9GB text_tokens: 31003.0 tgs: 59 data_time: 0.89s time: 524.25s eta: 3 days, 4:44:41
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:02:07][WARNING] [Step 67] The grad norm is NaN or Inf, skip this step. Skipped 68 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:02:07][INFO] [Train] (Epoch 1) Step 68/593 lr: 0.000020 loss: 0.228 loss(reduced): nan grad_norm: nan if_nan_skip: 68 max_memory: 33.0GB text_tokens: 30989.0 tgs: 59 data_time: 0.59s time: 520.30s eta: 3 days, 4:01:18
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:10:47][WARNING] [Step 68] The grad norm is NaN or Inf, skip this step. Skipped 69 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:10:47][INFO] [Train] (Epoch 1) Step 69/593 lr: 0.000020 loss: 0.308 loss(reduced): nan grad_norm: nan if_nan_skip: 69 max_memory: 33.1GB text_tokens: 31722.0 tgs: 61 data_time: 0.70s time: 519.50s eta: 3 days, 3:45:35
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:19:29][WARNING] [Step 69] The grad norm is NaN or Inf, skip this step. Skipped 70 steps in total.
+ [XTuner][RANK 50][DP 12][SP 2][TP 0][2025-01-21 10:19:29][INFO] [Train] (Epoch 1) Step 70/593 lr: 0.000020 loss: 0.292 loss(reduced): nan grad_norm: nan if_nan_skip: 70 max_memory: 33.1GB text_tokens: 32510.0 tgs: 62 data_time: 0.84s time: 522.32s eta: 3 days, 4:01:35
20250120235238/rank52.log ADDED
@@ -0,0 +1,395 @@
1
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:52:42][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250120235238', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
2
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:52:42][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:53:37][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:54:31][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:55:25][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:56:18][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:57:14][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:58:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-20 23:59:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:00:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:00:05][INFO] [Dataset & Dataloader] Cost 443.11s
12
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
13
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
14
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
15
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
16
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:55][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:07:56][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:10:23][SUCCESS] [Parallelize LLM] Elapsed time 147.33 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:10:24][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:19:46][WARNING] [Step 0] The grad norm is NaN or Inf, skip this step. Skipped 1 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:19:46][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 1 max_memory: 33.1GB text_tokens: 31804.0 tgs: 58 data_time: 1.89s time: 547.83s eta: 3 days, 18:14:23
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:28:29][WARNING] [Step 1] The grad norm is NaN or Inf, skip this step. Skipped 2 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:28:29][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.260 loss(reduced): nan grad_norm: nan if_nan_skip: 2 max_memory: 33.1GB text_tokens: 31938.0 tgs: 61 data_time: 0.75s time: 523.22s eta: 3 days, 14:02:28
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:37:12][WARNING] [Step 2] The grad norm is NaN or Inf, skip this step. Skipped 3 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:37:12][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.371 loss(reduced): nan grad_norm: nan if_nan_skip: 3 max_memory: 33.1GB text_tokens: 31420.0 tgs: 60 data_time: 0.78s time: 522.87s eta: 3 days, 13:50:17
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:45:52][WARNING] [Step 3] The grad norm is NaN or Inf, skip this step. Skipped 4 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:45:52][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.247 loss(reduced): nan grad_norm: nan if_nan_skip: 4 max_memory: 33.1GB text_tokens: 32320.0 tgs: 62 data_time: 0.92s time: 520.29s eta: 3 days, 13:16:12
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:54:33][WARNING] [Step 4] The grad norm is NaN or Inf, skip this step. Skipped 5 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 00:54:33][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.270 loss(reduced): nan grad_norm: nan if_nan_skip: 5 max_memory: 32.9GB text_tokens: 31755.0 tgs: 60 data_time: 0.88s time: 520.99s eta: 3 days, 13:14:20
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:03:14][WARNING] [Step 5] The grad norm is NaN or Inf, skip this step. Skipped 6 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:03:14][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.246 loss(reduced): nan grad_norm: nan if_nan_skip: 6 max_memory: 33.0GB text_tokens: 31116.0 tgs: 59 data_time: 0.95s time: 520.89s eta: 3 days, 13:04:45
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:11:58][WARNING] [Step 6] The grad norm is NaN or Inf, skip this step. Skipped 7 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:11:58][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.246 loss(reduced): nan grad_norm: nan if_nan_skip: 7 max_memory: 33.1GB text_tokens: 32297.0 tgs: 61 data_time: 0.82s time: 523.39s eta: 3 days, 13:20:27
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:20:38][WARNING] [Step 7] The grad norm is NaN or Inf, skip this step. Skipped 8 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:20:38][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.227 loss(reduced): nan grad_norm: nan if_nan_skip: 8 max_memory: 33.0GB text_tokens: 32099.0 tgs: 61 data_time: 0.87s time: 520.66s eta: 3 days, 12:45:08
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:29:18][WARNING] [Step 8] The grad norm is NaN or Inf, skip this step. Skipped 9 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:29:18][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.247 loss(reduced): nan grad_norm: nan if_nan_skip: 9 max_memory: 33.1GB text_tokens: 31943.0 tgs: 61 data_time: 0.95s time: 520.18s eta: 3 days, 12:31:45
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:37:59][WARNING] [Step 9] The grad norm is NaN or Inf, skip this step. Skipped 10 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:37:59][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 10 max_memory: 33.1GB text_tokens: 32321.0 tgs: 62 data_time: 0.82s time: 520.40s eta: 3 days, 12:25:13
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:46:43][WARNING] [Step 10] The grad norm is NaN or Inf, skip this step. Skipped 11 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:46:43][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.284 loss(reduced): nan grad_norm: nan if_nan_skip: 11 max_memory: 33.0GB text_tokens: 31685.0 tgs: 60 data_time: 1.22s time: 524.54s eta: 3 days, 12:56:44
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:55:24][WARNING] [Step 11] The grad norm is NaN or Inf, skip this step. Skipped 12 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 01:55:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.320 loss(reduced): nan grad_norm: nan if_nan_skip: 12 max_memory: 33.0GB text_tokens: 31570.0 tgs: 60 data_time: 1.09s time: 520.65s eta: 3 days, 12:10:18
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:04:04][WARNING] [Step 12] The grad norm is NaN or Inf, skip this step. Skipped 13 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:04:04][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.329 loss(reduced): nan grad_norm: nan if_nan_skip: 13 max_memory: 33.1GB text_tokens: 32271.0 tgs: 62 data_time: 1.03s time: 519.96s eta: 3 days, 11:54:57
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:12:45][WARNING] [Step 13] The grad norm is NaN or Inf, skip this step. Skipped 14 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:12:45][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.288 loss(reduced): nan grad_norm: nan if_nan_skip: 14 max_memory: 32.9GB text_tokens: 31826.0 tgs: 61 data_time: 0.85s time: 521.29s eta: 3 days, 11:59:07
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:21:29][WARNING] [Step 14] The grad norm is NaN or Inf, skip this step. Skipped 15 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:21:29][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.215 loss(reduced): nan grad_norm: nan if_nan_skip: 15 max_memory: 33.0GB text_tokens: 31152.0 tgs: 59 data_time: 0.75s time: 524.13s eta: 3 days, 12:17:49
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:30:10][WARNING] [Step 15] The grad norm is NaN or Inf, skip this step. Skipped 16 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:30:10][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.242 loss(reduced): nan grad_norm: nan if_nan_skip: 16 max_memory: 33.1GB text_tokens: 31903.0 tgs: 61 data_time: 0.86s time: 520.54s eta: 3 days, 11:34:31
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:38:49][WARNING] [Step 16] The grad norm is NaN or Inf, skip this step. Skipped 17 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:38:49][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.260 loss(reduced): nan grad_norm: nan if_nan_skip: 17 max_memory: 32.9GB text_tokens: 30941.0 tgs: 59 data_time: 0.67s time: 518.79s eta: 3 days, 11:09:04
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:47:31][WARNING] [Step 17] The grad norm is NaN or Inf, skip this step. Skipped 18 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:47:31][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.246 loss(reduced): nan grad_norm: nan if_nan_skip: 18 max_memory: 33.1GB text_tokens: 29085.0 tgs: 55 data_time: 0.67s time: 522.11s eta: 3 days, 11:32:13
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:56:15][WARNING] [Step 18] The grad norm is NaN or Inf, skip this step. Skipped 19 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 02:56:15][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.318 loss(reduced): nan grad_norm: nan if_nan_skip: 19 max_memory: 33.1GB text_tokens: 32477.0 tgs: 61 data_time: 1.11s time: 523.83s eta: 3 days, 11:40:03
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:04:55][WARNING] [Step 19] The grad norm is NaN or Inf, skip this step. Skipped 20 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:04:55][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.206 loss(reduced): nan grad_norm: nan if_nan_skip: 20 max_memory: 32.9GB text_tokens: 31064.0 tgs: 59 data_time: 0.74s time: 520.47s eta: 3 days, 10:59:09
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:13:34][WARNING] [Step 20] The grad norm is NaN or Inf, skip this step. Skipped 21 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:13:34][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.248 loss(reduced): nan grad_norm: nan if_nan_skip: 21 max_memory: 33.1GB text_tokens: 31960.0 tgs: 61 data_time: 1.00s time: 518.43s eta: 3 days, 10:31:03
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:22:16][WARNING] [Step 21] The grad norm is NaN or Inf, skip this step. Skipped 22 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:22:16][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.236 loss(reduced): nan grad_norm: nan if_nan_skip: 22 max_memory: 33.0GB text_tokens: 31106.0 tgs: 59 data_time: 1.00s time: 522.84s eta: 3 days, 11:04:25
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:31:00][WARNING] [Step 22] The grad norm is NaN or Inf, skip this step. Skipped 23 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:31:00][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.263 loss(reduced): nan grad_norm: nan if_nan_skip: 23 max_memory: 33.0GB text_tokens: 31337.0 tgs: 59 data_time: 1.02s time: 523.52s eta: 3 days, 11:02:11
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:39:41][WARNING] [Step 23] The grad norm is NaN or Inf, skip this step. Skipped 24 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:39:41][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.293 loss(reduced): nan grad_norm: nan if_nan_skip: 24 max_memory: 32.8GB text_tokens: 31611.0 tgs: 60 data_time: 1.00s time: 520.96s eta: 3 days, 10:29:08
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:48:20][WARNING] [Step 24] The grad norm is NaN or Inf, skip this step. Skipped 25 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:48:20][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.280 loss(reduced): nan grad_norm: nan if_nan_skip: 25 max_memory: 32.9GB text_tokens: 31573.0 tgs: 60 data_time: 0.68s time: 519.10s eta: 3 days, 10:02:50
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:57:03][WARNING] [Step 25] The grad norm is NaN or Inf, skip this step. Skipped 26 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 03:57:03][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.301 loss(reduced): nan grad_norm: nan if_nan_skip: 26 max_memory: 33.1GB text_tokens: 31897.0 tgs: 60 data_time: 1.07s time: 523.29s eta: 3 days, 10:33:49
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:05:46][WARNING] [Step 26] The grad norm is NaN or Inf, skip this step. Skipped 27 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:05:46][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 27 max_memory: 33.1GB text_tokens: 32058.0 tgs: 61 data_time: 0.94s time: 522.95s eta: 3 days, 10:21:50
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:14:28][WARNING] [Step 27] The grad norm is NaN or Inf, skip this step. Skipped 28 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:14:28][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 28 max_memory: 33.1GB text_tokens: 31119.0 tgs: 59 data_time: 0.81s time: 521.32s eta: 3 days, 9:57:46
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:23:08][WARNING] [Step 28] The grad norm is NaN or Inf, skip this step. Skipped 29 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:23:08][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.264 loss(reduced): nan grad_norm: nan if_nan_skip: 29 max_memory: 32.7GB text_tokens: 31356.0 tgs: 60 data_time: 0.58s time: 520.64s eta: 3 days, 9:42:40
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:31:51][WARNING] [Step 29] The grad norm is NaN or Inf, skip this step. Skipped 30 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:31:51][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.268 loss(reduced): nan grad_norm: nan if_nan_skip: 30 max_memory: 33.1GB text_tokens: 32225.0 tgs: 61 data_time: 1.08s time: 522.39s eta: 3 days, 9:50:28
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:40:34][WARNING] [Step 30] The grad norm is NaN or Inf, skip this step. Skipped 31 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:40:34][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 31 max_memory: 33.0GB text_tokens: 31625.0 tgs: 60 data_time: 0.76s time: 523.75s eta: 3 days, 9:54:32
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:49:15][WARNING] [Step 31] The grad norm is NaN or Inf, skip this step. Skipped 32 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:49:15][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.272 loss(reduced): nan grad_norm: nan if_nan_skip: 32 max_memory: 33.1GB text_tokens: 31942.0 tgs: 61 data_time: 0.87s time: 520.43s eta: 3 days, 9:14:40
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:57:55][WARNING] [Step 32] The grad norm is NaN or Inf, skip this step. Skipped 33 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 04:57:55][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.271 loss(reduced): nan grad_norm: nan if_nan_skip: 33 max_memory: 32.8GB text_tokens: 32061.0 tgs: 61 data_time: 0.89s time: 520.68s eta: 3 days, 9:08:20
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:06:37][WARNING] [Step 33] The grad norm is NaN or Inf, skip this step. Skipped 34 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:06:37][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.249 loss(reduced): nan grad_norm: nan if_nan_skip: 34 max_memory: 32.4GB text_tokens: 30275.0 tgs: 58 data_time: 0.96s time: 521.29s eta: 3 days, 9:05:21
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:15:21][WARNING] [Step 34] The grad norm is NaN or Inf, skip this step. Skipped 35 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:15:21][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.277 loss(reduced): nan grad_norm: nan if_nan_skip: 35 max_memory: 32.7GB text_tokens: 31108.0 tgs: 59 data_time: 0.78s time: 524.11s eta: 3 days, 9:22:58
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:24:02][WARNING] [Step 35] The grad norm is NaN or Inf, skip this step. Skipped 36 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:24:02][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.285 loss(reduced): nan grad_norm: nan if_nan_skip: 36 max_memory: 32.9GB text_tokens: 31957.0 tgs: 61 data_time: 0.79s time: 520.96s eta: 3 days, 8:44:55
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:32:42][WARNING] [Step 36] The grad norm is NaN or Inf, skip this step. Skipped 37 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:32:42][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.274 loss(reduced): nan grad_norm: nan if_nan_skip: 37 max_memory: 32.5GB text_tokens: 29801.0 tgs: 57 data_time: 0.98s time: 520.17s eta: 3 days, 8:28:53
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:41:25][WARNING] [Step 37] The grad norm is NaN or Inf, skip this step. Skipped 38 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:41:25][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.321 loss(reduced): nan grad_norm: nan if_nan_skip: 38 max_memory: 33.0GB text_tokens: 31102.0 tgs: 59 data_time: 0.84s time: 522.56s eta: 3 days, 8:42:25
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:50:09][WARNING] [Step 38] The grad norm is NaN or Inf, skip this step. Skipped 39 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:50:09][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.197 loss(reduced): nan grad_norm: nan if_nan_skip: 39 max_memory: 33.1GB text_tokens: 32173.0 tgs: 61 data_time: 0.81s time: 524.33s eta: 3 days, 8:50:05
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:58:49][WARNING] [Step 39] The grad norm is NaN or Inf, skip this step. Skipped 40 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 05:58:49][INFO] [Train] (Epoch 1) Step 40/593 lr: 0.000020 loss: 0.250 loss(reduced): nan grad_norm: nan if_nan_skip: 40 max_memory: 33.0GB text_tokens: 31492.0 tgs: 60 data_time: 0.92s time: 519.89s eta: 3 days, 8:00:16
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:07:29][WARNING] [Step 40] The grad norm is NaN or Inf, skip this step. Skipped 41 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:07:29][INFO] [Train] (Epoch 1) Step 41/593 lr: 0.000020 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 41 max_memory: 33.0GB text_tokens: 31539.0 tgs: 60 data_time: 0.90s time: 520.10s eta: 3 days, 7:53:33
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:16:11][WARNING] [Step 41] The grad norm is NaN or Inf, skip this step. Skipped 42 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:16:11][INFO] [Train] (Epoch 1) Step 42/593 lr: 0.000020 loss: 0.317 loss(reduced): nan grad_norm: nan if_nan_skip: 42 max_memory: 33.1GB text_tokens: 32436.0 tgs: 62 data_time: 1.19s time: 522.00s eta: 3 days, 8:02:23
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:24:55][WARNING] [Step 42] The grad norm is NaN or Inf, skip this step. Skipped 43 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:24:55][INFO] [Train] (Epoch 1) Step 43/593 lr: 0.000020 loss: 0.299 loss(reduced): nan grad_norm: nan if_nan_skip: 43 max_memory: 33.1GB text_tokens: 32092.0 tgs: 61 data_time: 0.73s time: 524.35s eta: 3 days, 8:15:18
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:33:36][WARNING] [Step 43] The grad norm is NaN or Inf, skip this step. Skipped 44 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:33:36][INFO] [Train] (Epoch 1) Step 44/593 lr: 0.000020 loss: 0.251 loss(reduced): nan grad_norm: nan if_nan_skip: 44 max_memory: 33.0GB text_tokens: 31585.0 tgs: 60 data_time: 0.87s time: 520.51s eta: 3 days, 7:31:21
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:42:14][WARNING] [Step 44] The grad norm is NaN or Inf, skip this step. Skipped 45 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:42:14][INFO] [Train] (Epoch 1) Step 45/593 lr: 0.000020 loss: 0.317 loss(reduced): nan grad_norm: nan if_nan_skip: 45 max_memory: 32.1GB text_tokens: 30729.0 tgs: 59 data_time: 0.47s time: 518.57s eta: 3 days, 7:04:52
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:50:57][WARNING] [Step 45] The grad norm is NaN or Inf, skip this step. Skipped 46 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:50:57][INFO] [Train] (Epoch 1) Step 46/593 lr: 0.000020 loss: 0.277 loss(reduced): nan grad_norm: nan if_nan_skip: 46 max_memory: 33.1GB text_tokens: 31914.0 tgs: 61 data_time: 0.77s time: 523.15s eta: 3 days, 7:38:08
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:59:41][WARNING] [Step 46] The grad norm is NaN or Inf, skip this step. Skipped 47 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 06:59:41][INFO] [Train] (Epoch 1) Step 47/593 lr: 0.000020 loss: 0.228 loss(reduced): nan grad_norm: nan if_nan_skip: 47 max_memory: 32.9GB text_tokens: 31429.0 tgs: 60 data_time: 0.96s time: 523.62s eta: 3 days, 7:33:42
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:08:21][WARNING] [Step 47] The grad norm is NaN or Inf, skip this step. Skipped 48 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:08:21][INFO] [Train] (Epoch 1) Step 48/593 lr: 0.000020 loss: 0.287 loss(reduced): nan grad_norm: nan if_nan_skip: 48 max_memory: 32.7GB text_tokens: 31767.0 tgs: 61 data_time: 0.96s time: 520.16s eta: 3 days, 6:53:28
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:17:01][WARNING] [Step 48] The grad norm is NaN or Inf, skip this step. Skipped 49 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:17:01][INFO] [Train] (Epoch 1) Step 49/593 lr: 0.000020 loss: 0.278 loss(reduced): nan grad_norm: nan if_nan_skip: 49 max_memory: 33.1GB text_tokens: 31842.0 tgs: 61 data_time: 0.87s time: 520.06s eta: 3 days, 6:43:50
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:25:45][WARNING] [Step 49] The grad norm is NaN or Inf, skip this step. Skipped 50 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:25:45][INFO] [Train] (Epoch 1) Step 50/593 lr: 0.000020 loss: 0.361 loss(reduced): nan grad_norm: nan if_nan_skip: 50 max_memory: 33.1GB text_tokens: 31119.0 tgs: 59 data_time: 0.90s time: 523.76s eta: 3 days, 7:08:45
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:34:29][WARNING] [Step 50] The grad norm is NaN or Inf, skip this step. Skipped 51 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:34:29][INFO] [Train] (Epoch 1) Step 51/593 lr: 0.000020 loss: 0.258 loss(reduced): nan grad_norm: nan if_nan_skip: 51 max_memory: 32.7GB text_tokens: 31541.0 tgs: 60 data_time: 1.24s time: 523.80s eta: 3 days, 7:00:21
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:43:09][WARNING] [Step 51] The grad norm is NaN or Inf, skip this step. Skipped 52 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:43:09][INFO] [Train] (Epoch 1) Step 52/593 lr: 0.000020 loss: 0.198 loss(reduced): nan grad_norm: nan if_nan_skip: 52 max_memory: 32.8GB text_tokens: 31567.0 tgs: 60 data_time: 0.79s time: 520.24s eta: 3 days, 6:19:30
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:51:48][WARNING] [Step 52] The grad norm is NaN or Inf, skip this step. Skipped 53 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 07:51:48][INFO] [Train] (Epoch 1) Step 53/593 lr: 0.000020 loss: 0.251 loss(reduced): nan grad_norm: nan if_nan_skip: 53 max_memory: 33.0GB text_tokens: 32276.0 tgs: 62 data_time: 1.00s time: 518.78s eta: 3 days, 5:57:38
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:00:31][WARNING] [Step 53] The grad norm is NaN or Inf, skip this step. Skipped 54 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:00:31][INFO] [Train] (Epoch 1) Step 54/593 lr: 0.000020 loss: 0.203 loss(reduced): nan grad_norm: nan if_nan_skip: 54 max_memory: 33.1GB text_tokens: 31104.0 tgs: 59 data_time: 0.71s time: 522.67s eta: 3 days, 6:24:00
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:09:13][WARNING] [Step 54] The grad norm is NaN or Inf, skip this step. Skipped 55 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:09:13][INFO] [Train] (Epoch 1) Step 55/593 lr: 0.000020 loss: 0.296 loss(reduced): nan grad_norm: nan if_nan_skip: 55 max_memory: 33.0GB text_tokens: 32112.0 tgs: 61 data_time: 0.78s time: 522.72s eta: 3 days, 6:15:46
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:17:54][WARNING] [Step 55] The grad norm is NaN or Inf, skip this step. Skipped 56 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:17:54][INFO] [Train] (Epoch 1) Step 56/593 lr: 0.000020 loss: 0.241 loss(reduced): nan grad_norm: nan if_nan_skip: 56 max_memory: 32.7GB text_tokens: 31195.0 tgs: 59 data_time: 0.77s time: 520.53s eta: 3 days, 5:47:23
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:26:33][WARNING] [Step 56] The grad norm is NaN or Inf, skip this step. Skipped 57 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:26:34][INFO] [Train] (Epoch 1) Step 57/593 lr: 0.000020 loss: 0.325 loss(reduced): nan grad_norm: nan if_nan_skip: 57 max_memory: 33.1GB text_tokens: 31401.0 tgs: 60 data_time: 0.65s time: 519.66s eta: 3 days, 5:30:57
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:35:15][WARNING] [Step 57] The grad norm is NaN or Inf, skip this step. Skipped 58 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:35:15][INFO] [Train] (Epoch 1) Step 58/593 lr: 0.000020 loss: 0.272 loss(reduced): nan grad_norm: nan if_nan_skip: 58 max_memory: 32.9GB text_tokens: 31338.0 tgs: 60 data_time: 0.84s time: 521.31s eta: 3 days, 5:37:00
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:43:58][WARNING] [Step 58] The grad norm is NaN or Inf, skip this step. Skipped 59 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:43:58][INFO] [Train] (Epoch 1) Step 59/593 lr: 0.000020 loss: 0.297 loss(reduced): nan grad_norm: nan if_nan_skip: 59 max_memory: 33.0GB text_tokens: 31585.0 tgs: 60 data_time: 0.82s time: 523.59s eta: 3 days, 5:48:41
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:52:38][WARNING] [Step 59] The grad norm is NaN or Inf, skip this step. Skipped 60 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 08:52:38][INFO] [Train] (Epoch 1) Step 60/593 lr: 0.000020 loss: 0.259 loss(reduced): nan grad_norm: nan if_nan_skip: 60 max_memory: 32.6GB text_tokens: 31441.0 tgs: 60 data_time: 0.79s time: 519.32s eta: 3 days, 5:01:59
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:01:18][WARNING] [Step 60] The grad norm is NaN or Inf, skip this step. Skipped 61 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:01:18][INFO] [Train] (Epoch 1) Step 61/593 lr: 0.000020 loss: 0.298 loss(reduced): nan grad_norm: nan if_nan_skip: 61 max_memory: 33.0GB text_tokens: 32230.0 tgs: 61 data_time: 1.04s time: 520.64s eta: 3 days, 5:05:02
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:10:00][WARNING] [Step 61] The grad norm is NaN or Inf, skip this step. Skipped 62 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:10:00][INFO] [Train] (Epoch 1) Step 62/593 lr: 0.000020 loss: 0.261 loss(reduced): nan grad_norm: nan if_nan_skip: 62 max_memory: 32.9GB text_tokens: 32203.0 tgs: 61 data_time: 0.84s time: 521.45s eta: 3 days, 5:03:32
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:18:44][WARNING] [Step 62] The grad norm is NaN or Inf, skip this step. Skipped 63 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:18:44][INFO] [Train] (Epoch 1) Step 63/593 lr: 0.000020 loss: 0.281 loss(reduced): nan grad_norm: nan if_nan_skip: 63 max_memory: 33.1GB text_tokens: 31194.0 tgs: 59 data_time: 0.83s time: 523.95s eta: 3 days, 5:16:55
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:27:23][WARNING] [Step 63] The grad norm is NaN or Inf, skip this step. Skipped 64 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:27:23][INFO] [Train] (Epoch 1) Step 64/593 lr: 0.000020 loss: 0.324 loss(reduced): nan grad_norm: nan if_nan_skip: 64 max_memory: 33.1GB text_tokens: 31045.0 tgs: 59 data_time: 0.77s time: 518.85s eta: 3 days, 4:23:12
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:36:02][WARNING] [Step 64] The grad norm is NaN or Inf, skip this step. Skipped 65 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:36:02][INFO] [Train] (Epoch 1) Step 65/593 lr: 0.000020 loss: 0.335 loss(reduced): nan grad_norm: nan if_nan_skip: 65 max_memory: 33.1GB text_tokens: 32341.0 tgs: 62 data_time: 0.82s time: 519.25s eta: 3 days, 4:18:00
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:44:43][WARNING] [Step 65] The grad norm is NaN or Inf, skip this step. Skipped 66 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:44:43][INFO] [Train] (Epoch 1) Step 66/593 lr: 0.000020 loss: 0.275 loss(reduced): nan grad_norm: nan if_nan_skip: 66 max_memory: 33.1GB text_tokens: 32008.0 tgs: 61 data_time: 0.78s time: 520.72s eta: 3 days, 4:22:17
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:53:27][WARNING] [Step 66] The grad norm is NaN or Inf, skip this step. Skipped 67 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 09:53:27][INFO] [Train] (Epoch 1) Step 67/593 lr: 0.000020 loss: 0.287 loss(reduced): nan grad_norm: nan if_nan_skip: 67 max_memory: 33.0GB text_tokens: 31320.0 tgs: 59 data_time: 1.25s time: 524.25s eta: 3 days, 4:44:42
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:02:07][WARNING] [Step 67] The grad norm is NaN or Inf, skip this step. Skipped 68 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:02:07][INFO] [Train] (Epoch 1) Step 68/593 lr: 0.000020 loss: 0.273 loss(reduced): nan grad_norm: nan if_nan_skip: 68 max_memory: 32.8GB text_tokens: 31671.0 tgs: 60 data_time: 0.43s time: 520.30s eta: 3 days, 4:01:18
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:10:47][WARNING] [Step 68] The grad norm is NaN or Inf, skip this step. Skipped 69 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:10:47][INFO] [Train] (Epoch 1) Step 69/593 lr: 0.000020 loss: 0.256 loss(reduced): nan grad_norm: nan if_nan_skip: 69 max_memory: 33.1GB text_tokens: 31891.0 tgs: 61 data_time: 0.76s time: 519.50s eta: 3 days, 3:45:36
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:19:29][WARNING] [Step 69] The grad norm is NaN or Inf, skip this step. Skipped 70 steps in total.
+ [XTuner][RANK 52][DP 13][SP 0][TP 0][2025-01-21 10:19:29][INFO] [Train] (Epoch 1) Step 70/593 lr: 0.000020 loss: 0.383 loss(reduced): nan grad_norm: nan if_nan_skip: 70 max_memory: 33.0GB text_tokens: 31509.0 tgs: 60 data_time: 0.91s time: 522.32s eta: 3 days, 4:01:35
20250121104251/rank12.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.82s
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.81 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.333 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.3GB text_tokens: 30603.0 tgs: 55 data_time: 1.94s time: 551.48s eta: 3 days, 18:50:28
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.293 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31849.0 tgs: 60 data_time: 0.76s time: 529.55s eta: 3 days, 15:04:53
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.187 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31059.0 tgs: 58 data_time: 0.98s time: 529.09s eta: 3 days, 14:51:31
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.192 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32411.0 tgs: 61 data_time: 0.88s time: 529.95s eta: 3 days, 14:51:12
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.203 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.67s time: 529.05s eta: 3 days, 14:33:29
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.229 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31449.0 tgs: 59 data_time: 0.89s time: 529.85s eta: 3 days, 14:32:33
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.222 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31590.0 tgs: 59 data_time: 0.82s time: 529.62s eta: 3 days, 14:21:28
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.202 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31967.0 tgs: 59 data_time: 0.79s time: 535.77s eta: 3 days, 15:12:42
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.208 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31084.0 tgs: 58 data_time: 0.91s time: 534.04s eta: 3 days, 14:46:51
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.319 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31694.0 tgs: 59 data_time: 0.87s time: 533.28s eta: 3 days, 14:30:33
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.228 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32132.0 tgs: 60 data_time: 0.89s time: 529.43s eta: 3 days, 13:44:15
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.240 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31935.0 tgs: 60 data_time: 0.76s time: 529.04s eta: 3 days, 13:31:38
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.163 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31576.0 tgs: 59 data_time: 0.81s time: 533.80s eta: 3 days, 14:09:00
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.224 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31631.0 tgs: 59 data_time: 0.78s time: 528.94s eta: 3 days, 13:13:05
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.179 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31904.0 tgs: 60 data_time: 0.80s time: 530.39s eta: 3 days, 13:18:13
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.182 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32061.0 tgs: 59 data_time: 0.87s time: 543.37s eta: 3 days, 15:14:30
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.169 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30810.0 tgs: 57 data_time: 0.73s time: 536.34s eta: 3 days, 13:57:50
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.199 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.90s time: 530.95s eta: 3 days, 12:57:07
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.210 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31557.0 tgs: 59 data_time: 1.21s time: 529.95s eta: 3 days, 12:38:41
275
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.268 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31964.0 tgs: 59 data_time: 0.77s time: 534.34s eta: 3 days, 13:11:52
276
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.147 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31859.0 tgs: 60 data_time: 0.71s time: 529.94s eta: 3 days, 12:20:54
277
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.233 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32340.0 tgs: 60 data_time: 0.92s time: 537.08s eta: 3 days, 13:20:08
278
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.145 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31453.0 tgs: 59 data_time: 0.50s time: 529.34s eta: 3 days, 11:57:32
279
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.141 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31779.0 tgs: 59 data_time: 0.61s time: 535.73s eta: 3 days, 12:49:28
280
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.155 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31169.0 tgs: 57 data_time: 0.82s time: 538.83s eta: 3 days, 13:09:52
281
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.239 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 29708.0 tgs: 55 data_time: 0.96s time: 533.66s eta: 3 days, 12:12:01
282
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.222 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31114.0 tgs: 58 data_time: 1.01s time: 529.59s eta: 3 days, 11:24:38
283
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32372.0 tgs: 60 data_time: 0.96s time: 531.66s eta: 3 days, 11:35:17
284
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.196 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31554.0 tgs: 59 data_time: 0.99s time: 529.53s eta: 3 days, 11:06:23
285
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.148 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31042.0 tgs: 58 data_time: 1.08s time: 529.50s eta: 3 days, 10:57:19
286
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31728.0 tgs: 59 data_time: 0.84s time: 529.82s eta: 3 days, 10:51:27
287
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.180 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31816.0 tgs: 59 data_time: 0.74s time: 534.70s eta: 3 days, 11:28:22
288
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30775.0 tgs: 58 data_time: 0.67s time: 529.78s eta: 3 days, 10:33:25
289
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31569.0 tgs: 59 data_time: 0.50s time: 530.08s eta: 3 days, 10:27:25
290
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.169 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31161.0 tgs: 58 data_time: 0.70s time: 529.94s eta: 3 days, 10:17:15
291
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.261 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31655.0 tgs: 59 data_time: 1.00s time: 533.32s eta: 3 days, 10:39:52
292
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31369.0 tgs: 59 data_time: 0.81s time: 529.20s eta: 3 days, 9:52:46
293
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32447.0 tgs: 61 data_time: 0.87s time: 529.65s eta: 3 days, 9:48:04
294
+ [XTuner][RANK 12][DP 3][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31783.0 tgs: 59 data_time: 0.94s time: 530.42s eta: 3 days, 9:46:21
20250121104251/rank13.log ADDED
@@ -0,0 +1,294 @@
1
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
2
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.85s
12
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
13
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
14
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
15
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
16
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.51 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.252 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.3GB text_tokens: 30603.0 tgs: 55 data_time: 1.96s time: 551.45s eta: 3 days, 18:50:08
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.242 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31849.0 tgs: 60 data_time: 0.71s time: 529.59s eta: 3 days, 15:05:20
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.233 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31059.0 tgs: 58 data_time: 0.93s time: 529.09s eta: 3 days, 14:51:31
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.219 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32411.0 tgs: 61 data_time: 0.86s time: 529.95s eta: 3 days, 14:51:13
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.188 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.70s time: 529.05s eta: 3 days, 14:33:29
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.244 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31449.0 tgs: 59 data_time: 0.84s time: 529.85s eta: 3 days, 14:32:33
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.215 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31590.0 tgs: 59 data_time: 0.85s time: 529.62s eta: 3 days, 14:21:28
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.252 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31967.0 tgs: 59 data_time: 0.78s time: 535.77s eta: 3 days, 15:12:42
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.169 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31084.0 tgs: 58 data_time: 0.93s time: 534.04s eta: 3 days, 14:46:51
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.210 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31694.0 tgs: 59 data_time: 0.87s time: 533.28s eta: 3 days, 14:30:33
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.212 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32132.0 tgs: 60 data_time: 0.89s time: 529.43s eta: 3 days, 13:44:15
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.190 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31935.0 tgs: 60 data_time: 0.79s time: 529.03s eta: 3 days, 13:31:36
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.182 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31576.0 tgs: 59 data_time: 0.81s time: 533.80s eta: 3 days, 14:08:59
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.252 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31631.0 tgs: 59 data_time: 0.81s time: 528.94s eta: 3 days, 13:13:07
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.218 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31904.0 tgs: 60 data_time: 0.85s time: 530.38s eta: 3 days, 13:18:12
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.225 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32061.0 tgs: 59 data_time: 0.83s time: 543.37s eta: 3 days, 15:14:28
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30810.0 tgs: 57 data_time: 0.71s time: 536.34s eta: 3 days, 13:57:50
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.221 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.92s time: 530.95s eta: 3 days, 12:57:06
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.240 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31557.0 tgs: 59 data_time: 1.19s time: 529.95s eta: 3 days, 12:38:41
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.242 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31964.0 tgs: 59 data_time: 0.73s time: 534.34s eta: 3 days, 13:11:51
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.204 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31859.0 tgs: 60 data_time: 0.70s time: 529.94s eta: 3 days, 12:20:56
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32340.0 tgs: 60 data_time: 0.87s time: 537.08s eta: 3 days, 13:20:09
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.138 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31453.0 tgs: 59 data_time: 0.48s time: 529.34s eta: 3 days, 11:57:30
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.199 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31779.0 tgs: 59 data_time: 0.61s time: 535.73s eta: 3 days, 12:49:28
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.187 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31169.0 tgs: 57 data_time: 0.83s time: 538.83s eta: 3 days, 13:09:52
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.215 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 29708.0 tgs: 55 data_time: 0.95s time: 533.66s eta: 3 days, 12:11:58
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31114.0 tgs: 58 data_time: 1.02s time: 529.59s eta: 3 days, 11:24:39
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.212 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32372.0 tgs: 60 data_time: 0.95s time: 531.66s eta: 3 days, 11:35:17
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.217 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31554.0 tgs: 59 data_time: 1.00s time: 529.53s eta: 3 days, 11:06:23
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31042.0 tgs: 58 data_time: 1.06s time: 529.51s eta: 3 days, 10:57:21
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.250 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31728.0 tgs: 59 data_time: 0.83s time: 529.82s eta: 3 days, 10:51:26
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.282 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31816.0 tgs: 59 data_time: 0.73s time: 534.70s eta: 3 days, 11:28:24
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30775.0 tgs: 58 data_time: 0.66s time: 529.78s eta: 3 days, 10:33:24
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31569.0 tgs: 59 data_time: 0.50s time: 530.08s eta: 3 days, 10:27:24
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31161.0 tgs: 58 data_time: 0.71s time: 529.94s eta: 3 days, 10:17:16
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31655.0 tgs: 59 data_time: 1.00s time: 533.32s eta: 3 days, 10:39:51
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31369.0 tgs: 59 data_time: 0.80s time: 529.20s eta: 3 days, 9:52:45
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.169 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32447.0 tgs: 61 data_time: 0.85s time: 529.65s eta: 3 days, 9:48:06
+ [XTuner][RANK 13][DP 3][SP 1][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.134 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31783.0 tgs: 59 data_time: 0.94s time: 530.42s eta: 3 days, 9:46:21
20250121104251/rank15.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.84s
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.09 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.263 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.3GB text_tokens: 30603.0 tgs: 55 data_time: 2.00s time: 551.60s eta: 3 days, 18:51:39
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.284 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31849.0 tgs: 60 data_time: 0.73s time: 529.47s eta: 3 days, 15:04:07
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.261 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31059.0 tgs: 58 data_time: 0.96s time: 529.09s eta: 3 days, 14:51:34
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.193 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32411.0 tgs: 61 data_time: 0.86s time: 529.95s eta: 3 days, 14:51:09
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.237 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.68s time: 529.05s eta: 3 days, 14:33:28
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.259 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31449.0 tgs: 59 data_time: 0.85s time: 529.85s eta: 3 days, 14:32:33
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.202 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31590.0 tgs: 59 data_time: 0.83s time: 529.63s eta: 3 days, 14:21:30
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.234 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31967.0 tgs: 59 data_time: 0.79s time: 535.77s eta: 3 days, 15:12:40
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.330 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31084.0 tgs: 58 data_time: 0.90s time: 534.04s eta: 3 days, 14:46:51
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.146 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31694.0 tgs: 59 data_time: 0.86s time: 533.28s eta: 3 days, 14:30:33
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.175 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32132.0 tgs: 60 data_time: 0.89s time: 529.42s eta: 3 days, 13:44:11
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.218 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31935.0 tgs: 60 data_time: 0.77s time: 529.04s eta: 3 days, 13:31:42
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.173 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31576.0 tgs: 59 data_time: 0.81s time: 533.80s eta: 3 days, 14:08:58
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31631.0 tgs: 59 data_time: 0.78s time: 528.94s eta: 3 days, 13:13:03
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31904.0 tgs: 60 data_time: 0.82s time: 530.39s eta: 3 days, 13:18:16
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32061.0 tgs: 59 data_time: 0.81s time: 543.37s eta: 3 days, 15:14:30
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.251 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30810.0 tgs: 57 data_time: 0.71s time: 536.34s eta: 3 days, 13:57:48
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31204.0 tgs: 58 data_time: 0.93s time: 530.95s eta: 3 days, 12:57:05
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.213 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31557.0 tgs: 59 data_time: 1.17s time: 529.95s eta: 3 days, 12:38:42
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.234 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31964.0 tgs: 59 data_time: 0.72s time: 534.34s eta: 3 days, 13:11:51
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31859.0 tgs: 60 data_time: 0.68s time: 529.94s eta: 3 days, 12:20:54
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.208 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32340.0 tgs: 60 data_time: 0.87s time: 537.08s eta: 3 days, 13:20:10
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.214 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31453.0 tgs: 59 data_time: 0.47s time: 529.34s eta: 3 days, 11:57:31
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31779.0 tgs: 59 data_time: 0.61s time: 535.73s eta: 3 days, 12:49:27
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31169.0 tgs: 57 data_time: 0.82s time: 538.83s eta: 3 days, 13:09:52
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.160 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 29708.0 tgs: 55 data_time: 0.95s time: 533.66s eta: 3 days, 12:11:59
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.271 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31114.0 tgs: 58 data_time: 1.01s time: 529.60s eta: 3 days, 11:24:41
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32372.0 tgs: 60 data_time: 0.94s time: 531.65s eta: 3 days, 11:35:15
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.215 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31554.0 tgs: 59 data_time: 0.96s time: 529.53s eta: 3 days, 11:06:23
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.205 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 31042.0 tgs: 58 data_time: 1.02s time: 529.50s eta: 3 days, 10:57:20
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31728.0 tgs: 59 data_time: 0.82s time: 529.82s eta: 3 days, 10:51:29
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31816.0 tgs: 59 data_time: 0.72s time: 534.71s eta: 3 days, 11:28:26
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.224 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30775.0 tgs: 58 data_time: 0.65s time: 529.77s eta: 3 days, 10:33:21
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31569.0 tgs: 59 data_time: 0.50s time: 530.09s eta: 3 days, 10:27:28
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.206 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31161.0 tgs: 58 data_time: 0.71s time: 529.93s eta: 3 days, 10:17:10
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31655.0 tgs: 59 data_time: 1.00s time: 533.32s eta: 3 days, 10:39:53
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31369.0 tgs: 59 data_time: 0.81s time: 529.20s eta: 3 days, 9:52:44
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.124 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32447.0 tgs: 61 data_time: 0.85s time: 529.66s eta: 3 days, 9:48:08
+ [XTuner][RANK 15][DP 3][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31783.0 tgs: 59 data_time: 0.92s time: 530.42s eta: 3 days, 9:46:22
20250121104251/rank18.log ADDED
@@ -0,0 +1,294 @@
1
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
2
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.84s
12
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
13
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
14
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
15
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
16
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 121.40 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.227 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31743.0 tgs: 57 data_time: 1.81s time: 550.06s eta: 3 days, 18:36:28
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.331 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32110.0 tgs: 60 data_time: 1.05s time: 529.60s eta: 3 days, 15:05:20
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.201 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31536.0 tgs: 59 data_time: 0.89s time: 529.10s eta: 3 days, 14:51:40
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.218 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31549.0 tgs: 59 data_time: 1.10s time: 529.97s eta: 3 days, 14:51:20
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.181 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31125.0 tgs: 58 data_time: 0.88s time: 529.03s eta: 3 days, 14:33:20
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.292 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31488.0 tgs: 59 data_time: 1.18s time: 529.87s eta: 3 days, 14:32:43
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.190 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30800.0 tgs: 58 data_time: 0.79s time: 529.63s eta: 3 days, 14:21:32
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.211 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32301.0 tgs: 60 data_time: 0.94s time: 535.78s eta: 3 days, 15:12:48
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.240 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31885.0 tgs: 59 data_time: 0.97s time: 534.00s eta: 3 days, 14:46:31
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.187 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32246.0 tgs: 60 data_time: 0.87s time: 533.29s eta: 3 days, 14:30:40
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.196 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32255.0 tgs: 60 data_time: 0.75s time: 529.43s eta: 3 days, 13:44:19
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.215 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31976.0 tgs: 60 data_time: 0.92s time: 529.01s eta: 3 days, 13:31:25
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.198 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32057.0 tgs: 60 data_time: 0.77s time: 533.82s eta: 3 days, 14:09:08
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31783.0 tgs: 60 data_time: 0.86s time: 528.95s eta: 3 days, 13:13:09
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.178 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30276.0 tgs: 57 data_time: 0.77s time: 530.40s eta: 3 days, 13:18:21
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.178 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32240.0 tgs: 59 data_time: 0.78s time: 543.34s eta: 3 days, 15:14:12
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32111.0 tgs: 59 data_time: 0.97s time: 536.35s eta: 3 days, 13:57:55
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.135 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30872.0 tgs: 58 data_time: 0.80s time: 530.96s eta: 3 days, 12:57:14
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31286.0 tgs: 59 data_time: 0.99s time: 529.92s eta: 3 days, 12:38:25
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31340.0 tgs: 58 data_time: 0.77s time: 534.36s eta: 3 days, 13:12:02
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31187.0 tgs: 58 data_time: 0.89s time: 529.94s eta: 3 days, 12:20:58
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31884.0 tgs: 59 data_time: 0.91s time: 537.05s eta: 3 days, 13:19:50
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32216.0 tgs: 60 data_time: 0.85s time: 529.35s eta: 3 days, 11:57:40
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.155 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31832.0 tgs: 59 data_time: 0.65s time: 535.75s eta: 3 days, 12:49:35
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.229 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31974.0 tgs: 59 data_time: 1.02s time: 538.84s eta: 3 days, 13:09:58
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.179 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31429.0 tgs: 58 data_time: 0.83s time: 533.64s eta: 3 days, 12:11:48
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.204 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31130.0 tgs: 58 data_time: 0.98s time: 529.61s eta: 3 days, 11:24:46
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.169 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30936.0 tgs: 58 data_time: 0.60s time: 531.66s eta: 3 days, 11:35:22
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.154 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31633.0 tgs: 59 data_time: 0.75s time: 529.50s eta: 3 days, 11:06:07
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31970.0 tgs: 60 data_time: 1.03s time: 529.52s eta: 3 days, 10:57:29
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 25656.0 tgs: 48 data_time: 0.86s time: 529.83s eta: 3 days, 10:51:33
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31702.0 tgs: 59 data_time: 0.97s time: 534.68s eta: 3 days, 11:28:11
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31282.0 tgs: 59 data_time: 0.69s time: 529.79s eta: 3 days, 10:33:30
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.178 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31475.0 tgs: 59 data_time: 0.66s time: 530.08s eta: 3 days, 10:27:26
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31004.0 tgs: 58 data_time: 0.96s time: 529.95s eta: 3 days, 10:17:23
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.205 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31063.0 tgs: 58 data_time: 0.91s time: 533.29s eta: 3 days, 10:39:34
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31013.0 tgs: 58 data_time: 1.01s time: 529.22s eta: 3 days, 9:52:53
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32430.0 tgs: 61 data_time: 0.79s time: 529.66s eta: 3 days, 9:48:11
+ [XTuner][RANK 18][DP 4][SP 2][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.179 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32259.0 tgs: 60 data_time: 0.61s time: 530.39s eta: 3 days, 9:46:05
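The `[Train]` lines above carry the per-step metrics (step, lr, loss, reduced loss, grad norm, tgs). A minimal sketch of pulling those fields out of such a line for monitoring, assuming only the log format shown here (the regex and function names are mine, not part of XTuner):

```python
import re
from typing import Optional

# Pattern for one XTuner "[Train]" log line, derived from the format above.
# Lazy ".*?" skips fields we do not extract (timestamps, if_nan_skip, memory).
LINE_RE = re.compile(
    r"\[XTuner\]\[RANK (?P<rank>\d+)\].*?"
    r"Step (?P<step>\d+)/(?P<total>\d+) "
    r"lr: (?P<lr>[\d.e-]+) "
    r"loss: (?P<loss>[\d.]+) "
    r"loss\(reduced\): (?P<loss_reduced>[\d.]+) "
    r"grad_norm: (?P<grad_norm>[\d.]+).*?"
    r"tgs: (?P<tgs>\d+) "
)


def parse_train_line(line: str) -> Optional[dict]:
    """Return the numeric fields of a [Train] log line, or None if it doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "rank": int(d["rank"]),
        "step": int(d["step"]),
        "total": int(d["total"]),
        "lr": float(d["lr"]),
        "loss": float(d["loss"]),
        "loss_reduced": float(d["loss_reduced"]),
        "grad_norm": float(d["grad_norm"]),
        "tgs": int(d["tgs"]),
    }
```

Feeding each log line through `parse_train_line` and keeping the non-`None` results gives a step-indexed series of loss and throughput that can be plotted or tailed while the run is in progress.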
20250121104251/rank19.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.84s
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 121.39 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.262 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31743.0 tgs: 57 data_time: 1.80s time: 550.15s eta: 3 days, 18:37:20
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.223 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32110.0 tgs: 60 data_time: 0.93s time: 529.50s eta: 3 days, 15:04:22
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.242 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31536.0 tgs: 59 data_time: 0.81s time: 529.10s eta: 3 days, 14:51:40
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.208 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31549.0 tgs: 59 data_time: 1.05s time: 529.97s eta: 3 days, 14:51:19
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.228 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31125.0 tgs: 58 data_time: 0.85s time: 529.03s eta: 3 days, 14:33:20
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.206 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31488.0 tgs: 59 data_time: 1.17s time: 529.86s eta: 3 days, 14:32:40
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.243 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30800.0 tgs: 58 data_time: 0.76s time: 529.63s eta: 3 days, 14:21:35
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.202 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32301.0 tgs: 60 data_time: 0.93s time: 535.78s eta: 3 days, 15:12:49
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.185 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31885.0 tgs: 59 data_time: 0.90s time: 534.00s eta: 3 days, 14:46:30
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.208 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32246.0 tgs: 60 data_time: 0.80s time: 533.29s eta: 3 days, 14:30:40
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.190 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32255.0 tgs: 60 data_time: 0.73s time: 529.43s eta: 3 days, 13:44:19
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.253 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31976.0 tgs: 60 data_time: 0.83s time: 529.02s eta: 3 days, 13:31:27
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.223 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32057.0 tgs: 60 data_time: 0.71s time: 533.82s eta: 3 days, 14:09:06
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.229 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31783.0 tgs: 60 data_time: 0.76s time: 528.95s eta: 3 days, 13:13:08
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.254 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30276.0 tgs: 57 data_time: 0.74s time: 530.40s eta: 3 days, 13:18:22
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32240.0 tgs: 59 data_time: 0.74s time: 543.34s eta: 3 days, 15:14:12
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.204 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32111.0 tgs: 59 data_time: 0.87s time: 536.36s eta: 3 days, 13:57:58
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30872.0 tgs: 58 data_time: 0.74s time: 530.96s eta: 3 days, 12:57:11
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31286.0 tgs: 59 data_time: 0.94s time: 529.93s eta: 3 days, 12:38:27
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31340.0 tgs: 58 data_time: 0.67s time: 534.35s eta: 3 days, 13:11:57
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.211 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31187.0 tgs: 58 data_time: 0.83s time: 529.95s eta: 3 days, 12:21:03
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31884.0 tgs: 59 data_time: 0.88s time: 537.05s eta: 3 days, 13:19:51
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32216.0 tgs: 60 data_time: 0.84s time: 529.35s eta: 3 days, 11:57:38
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31832.0 tgs: 59 data_time: 0.62s time: 535.75s eta: 3 days, 12:49:35
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.202 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31974.0 tgs: 59 data_time: 0.92s time: 538.84s eta: 3 days, 13:09:59
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31429.0 tgs: 58 data_time: 0.73s time: 533.64s eta: 3 days, 12:11:46
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31130.0 tgs: 58 data_time: 0.91s time: 529.60s eta: 3 days, 11:24:44
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.145 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30936.0 tgs: 58 data_time: 0.57s time: 531.67s eta: 3 days, 11:35:23
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.142 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31633.0 tgs: 59 data_time: 0.69s time: 529.50s eta: 3 days, 11:06:07
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31970.0 tgs: 60 data_time: 0.96s time: 529.52s eta: 3 days, 10:57:28
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.136 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 25656.0 tgs: 48 data_time: 0.78s time: 529.83s eta: 3 days, 10:51:32
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31702.0 tgs: 59 data_time: 0.86s time: 534.68s eta: 3 days, 11:28:07
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31282.0 tgs: 59 data_time: 0.64s time: 529.79s eta: 3 days, 10:33:31
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.164 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31475.0 tgs: 59 data_time: 0.59s time: 530.10s eta: 3 days, 10:27:33
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.178 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31004.0 tgs: 58 data_time: 0.86s time: 529.95s eta: 3 days, 10:17:20
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31063.0 tgs: 58 data_time: 0.88s time: 533.29s eta: 3 days, 10:39:36
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.136 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31013.0 tgs: 58 data_time: 0.92s time: 529.22s eta: 3 days, 9:52:53
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.202 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32430.0 tgs: 61 data_time: 0.74s time: 529.66s eta: 3 days, 9:48:11
+ [XTuner][RANK 19][DP 4][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32259.0 tgs: 60 data_time: 0.56s time: 530.39s eta: 3 days, 9:46:05
20250121104251/rank2.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:13][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.87s
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 131.04 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.293 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.7GB text_tokens: 31548.0 tgs: 57 data_time: 1.97s time: 551.44s eta: 3 days, 18:50:01
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.292 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32129.0 tgs: 60 data_time: 0.78s time: 529.64s eta: 3 days, 15:05:48
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.243 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31947.0 tgs: 60 data_time: 0.85s time: 529.07s eta: 3 days, 14:51:21
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.235 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31471.0 tgs: 59 data_time: 0.69s time: 529.94s eta: 3 days, 14:51:03
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.203 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32503.0 tgs: 61 data_time: 0.73s time: 529.06s eta: 3 days, 14:33:36
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.206 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31133.0 tgs: 58 data_time: 0.85s time: 529.86s eta: 3 days, 14:32:37
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.225 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31847.0 tgs: 60 data_time: 1.17s time: 529.60s eta: 3 days, 14:21:15
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.200 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32030.0 tgs: 59 data_time: 0.85s time: 535.76s eta: 3 days, 15:12:34
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.237 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31786.0 tgs: 59 data_time: 0.70s time: 534.08s eta: 3 days, 14:47:16
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.209 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32078.0 tgs: 60 data_time: 1.05s time: 533.26s eta: 3 days, 14:30:24
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.184 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32240.0 tgs: 60 data_time: 0.81s time: 529.41s eta: 3 days, 13:44:03
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.221 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31632.0 tgs: 59 data_time: 0.78s time: 529.07s eta: 3 days, 13:31:59
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.273 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 40.6GB text_tokens: 30691.0 tgs: 57 data_time: 0.74s time: 533.79s eta: 3 days, 14:08:50
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.222 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32119.0 tgs: 60 data_time: 0.83s time: 528.92s eta: 3 days, 13:12:54
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.231 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31154.0 tgs: 58 data_time: 0.65s time: 530.38s eta: 3 days, 13:18:07
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.206 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31896.0 tgs: 58 data_time: 0.61s time: 543.41s eta: 3 days, 15:14:50
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.210 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31357.0 tgs: 58 data_time: 0.88s time: 536.33s eta: 3 days, 13:57:42
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30112.0 tgs: 56 data_time: 0.71s time: 530.93s eta: 3 days, 12:56:55
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31775.0 tgs: 59 data_time: 0.87s time: 529.99s eta: 3 days, 12:39:04
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32263.0 tgs: 60 data_time: 0.79s time: 534.33s eta: 3 days, 13:11:45
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.231 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31621.0 tgs: 59 data_time: 1.07s time: 529.92s eta: 3 days, 12:20:42
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.221 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32165.0 tgs: 59 data_time: 1.05s time: 537.11s eta: 3 days, 13:20:24
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31731.0 tgs: 59 data_time: 0.59s time: 529.32s eta: 3 days, 11:57:23
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31953.0 tgs: 59 data_time: 0.72s time: 535.72s eta: 3 days, 12:49:19
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.242 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32071.0 tgs: 59 data_time: 0.80s time: 538.81s eta: 3 days, 13:09:42
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32232.0 tgs: 60 data_time: 0.61s time: 533.72s eta: 3 days, 12:12:31
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31872.0 tgs: 60 data_time: 0.70s time: 529.58s eta: 3 days, 11:24:32
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.164 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31715.0 tgs: 59 data_time: 0.81s time: 531.64s eta: 3 days, 11:35:05
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.220 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31142.0 tgs: 58 data_time: 0.70s time: 529.56s eta: 3 days, 11:06:42
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31246.0 tgs: 59 data_time: 1.07s time: 529.49s eta: 3 days, 10:57:09
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.135 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31427.0 tgs: 59 data_time: 0.88s time: 529.81s eta: 3 days, 10:51:21
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31009.0 tgs: 57 data_time: 0.83s time: 534.69s eta: 3 days, 11:28:15
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.192 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31125.0 tgs: 58 data_time: 0.70s time: 529.82s eta: 3 days, 10:33:47
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.217 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32138.0 tgs: 60 data_time: 0.91s time: 530.06s eta: 3 days, 10:27:15
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.224 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31588.0 tgs: 59 data_time: 0.93s time: 529.92s eta: 3 days, 10:17:05
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31748.0 tgs: 59 data_time: 0.93s time: 533.36s eta: 3 days, 10:40:13
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32016.0 tgs: 60 data_time: 0.93s time: 529.18s eta: 3 days, 9:52:34
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.172 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31540.0 tgs: 59 data_time: 0.74s time: 529.64s eta: 3 days, 9:47:57
+ [XTuner][RANK 2][DP 0][SP 2][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 30666.0 tgs: 57 data_time: 0.80s time: 530.46s eta: 3 days, 9:46:43
20250121104251/rank20.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.90s
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 121.38 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.229 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.3GB text_tokens: 29620.0 tgs: 53 data_time: 1.86s time: 550.07s eta: 3 days, 18:36:33
257
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.290 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32369.0 tgs: 61 data_time: 0.93s time: 529.48s eta: 3 days, 15:04:13
258
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.260 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32546.0 tgs: 61 data_time: 0.82s time: 529.10s eta: 3 days, 14:51:40
259
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.162 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32411.0 tgs: 61 data_time: 1.14s time: 529.97s eta: 3 days, 14:51:20
260
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.194 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30927.0 tgs: 58 data_time: 0.91s time: 529.03s eta: 3 days, 14:33:20
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.318 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32427.0 tgs: 61 data_time: 1.24s time: 529.86s eta: 3 days, 14:32:38
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.172 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31527.0 tgs: 59 data_time: 1.15s time: 529.64s eta: 3 days, 14:21:37
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.146 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31799.0 tgs: 59 data_time: 0.69s time: 535.78s eta: 3 days, 15:12:47
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.284 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30776.0 tgs: 57 data_time: 0.74s time: 534.00s eta: 3 days, 14:46:30
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.201 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31699.0 tgs: 59 data_time: 0.75s time: 533.29s eta: 3 days, 14:30:40
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.274 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31472.0 tgs: 59 data_time: 0.95s time: 529.43s eta: 3 days, 13:44:19
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.211 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31345.0 tgs: 59 data_time: 0.70s time: 529.01s eta: 3 days, 13:31:26
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.203 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30795.0 tgs: 57 data_time: 0.81s time: 533.81s eta: 3 days, 14:09:06
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31328.0 tgs: 59 data_time: 0.75s time: 528.95s eta: 3 days, 13:13:10
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32265.0 tgs: 60 data_time: 0.88s time: 530.41s eta: 3 days, 13:18:29
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 29937.0 tgs: 55 data_time: 0.80s time: 543.33s eta: 3 days, 15:14:05
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.219 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30786.0 tgs: 57 data_time: 0.79s time: 536.35s eta: 3 days, 13:57:55
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31670.0 tgs: 59 data_time: 0.96s time: 530.96s eta: 3 days, 12:57:12
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31374.0 tgs: 59 data_time: 0.65s time: 529.92s eta: 3 days, 12:38:26
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.196 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32080.0 tgs: 60 data_time: 0.80s time: 534.36s eta: 3 days, 13:12:02
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.133 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 30100.0 tgs: 56 data_time: 0.81s time: 529.95s eta: 3 days, 12:21:00
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.177 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31525.0 tgs: 58 data_time: 0.93s time: 537.05s eta: 3 days, 13:19:49
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.264 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32200.0 tgs: 60 data_time: 0.91s time: 529.35s eta: 3 days, 11:57:40
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31533.0 tgs: 58 data_time: 1.07s time: 535.74s eta: 3 days, 12:49:33
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.232 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31572.0 tgs: 58 data_time: 0.83s time: 538.84s eta: 3 days, 13:09:59
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31711.0 tgs: 59 data_time: 0.61s time: 533.64s eta: 3 days, 12:11:46
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31027.0 tgs: 58 data_time: 0.82s time: 529.61s eta: 3 days, 11:24:46
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31174.0 tgs: 58 data_time: 1.12s time: 531.67s eta: 3 days, 11:35:23
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31786.0 tgs: 60 data_time: 0.91s time: 529.50s eta: 3 days, 11:06:07
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31831.0 tgs: 60 data_time: 0.95s time: 529.52s eta: 3 days, 10:57:28
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31822.0 tgs: 60 data_time: 0.95s time: 529.83s eta: 3 days, 10:51:34
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31024.0 tgs: 58 data_time: 0.76s time: 534.67s eta: 3 days, 11:28:06
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.215 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31792.0 tgs: 60 data_time: 1.08s time: 529.79s eta: 3 days, 10:33:33
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.145 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31319.0 tgs: 59 data_time: 0.66s time: 530.09s eta: 3 days, 10:27:30
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.148 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32256.0 tgs: 60 data_time: 0.64s time: 529.95s eta: 3 days, 10:17:22
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.174 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32359.0 tgs: 60 data_time: 0.92s time: 533.29s eta: 3 days, 10:39:36
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31433.0 tgs: 59 data_time: 0.99s time: 529.21s eta: 3 days, 9:52:51
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.120 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30457.0 tgs: 57 data_time: 0.89s time: 529.67s eta: 3 days, 9:48:15
+ [XTuner][RANK 20][DP 5][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32061.0 tgs: 60 data_time: 0.93s time: 530.39s eta: 3 days, 9:46:03
20250121104251/rank22.log ADDED
@@ -0,0 +1,294 @@
1
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
2
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.84s
12
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
13
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
14
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
15
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
16
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
17
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
18
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
19
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
20
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
21
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
22
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
23
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
24
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
25
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
26
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
27
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
28
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
29
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
30
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
31
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
32
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
33
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
34
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
35
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
36
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
37
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
38
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
39
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
40
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
41
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:37][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:44:38][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 121.39 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.256 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.3GB text_tokens: 29620.0 tgs: 53 data_time: 1.81s time: 550.13s eta: 3 days, 18:37:07
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.266 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32369.0 tgs: 61 data_time: 0.77s time: 529.52s eta: 3 days, 15:04:35
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.213 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32546.0 tgs: 61 data_time: 0.76s time: 529.10s eta: 3 days, 14:51:40
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.210 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32411.0 tgs: 61 data_time: 1.04s time: 529.96s eta: 3 days, 14:51:18
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.225 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30927.0 tgs: 58 data_time: 0.88s time: 529.04s eta: 3 days, 14:33:22
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.236 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32427.0 tgs: 61 data_time: 1.15s time: 529.86s eta: 3 days, 14:32:40
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.182 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31527.0 tgs: 59 data_time: 1.09s time: 529.64s eta: 3 days, 14:21:36
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.300 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31799.0 tgs: 59 data_time: 0.66s time: 535.78s eta: 3 days, 15:12:47
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.210 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30776.0 tgs: 57 data_time: 0.71s time: 534.00s eta: 3 days, 14:46:31
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.226 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31699.0 tgs: 59 data_time: 0.69s time: 533.29s eta: 3 days, 14:30:40
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.242 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31472.0 tgs: 59 data_time: 0.92s time: 529.43s eta: 3 days, 13:44:19
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.185 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31345.0 tgs: 59 data_time: 0.67s time: 529.02s eta: 3 days, 13:31:27
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.161 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30795.0 tgs: 57 data_time: 0.73s time: 533.81s eta: 3 days, 14:09:04
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.224 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31328.0 tgs: 59 data_time: 0.71s time: 528.95s eta: 3 days, 13:13:10
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.231 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32265.0 tgs: 60 data_time: 0.81s time: 530.40s eta: 3 days, 13:18:22
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.158 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 29937.0 tgs: 55 data_time: 0.76s time: 543.34s eta: 3 days, 15:14:12
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.220 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30786.0 tgs: 57 data_time: 0.72s time: 536.35s eta: 3 days, 13:57:56
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31670.0 tgs: 59 data_time: 0.81s time: 530.96s eta: 3 days, 12:57:14
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.219 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31374.0 tgs: 59 data_time: 0.57s time: 529.92s eta: 3 days, 12:38:24
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32080.0 tgs: 60 data_time: 0.74s time: 534.36s eta: 3 days, 13:12:01
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 30100.0 tgs: 56 data_time: 0.76s time: 529.95s eta: 3 days, 12:20:59
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.216 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31525.0 tgs: 58 data_time: 0.86s time: 537.05s eta: 3 days, 13:19:55
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32200.0 tgs: 60 data_time: 0.88s time: 529.35s eta: 3 days, 11:57:37
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31533.0 tgs: 58 data_time: 0.99s time: 535.74s eta: 3 days, 12:49:33
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.213 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31572.0 tgs: 58 data_time: 0.73s time: 538.84s eta: 3 days, 13:10:01
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.218 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31711.0 tgs: 59 data_time: 0.56s time: 533.64s eta: 3 days, 12:11:46
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31027.0 tgs: 58 data_time: 0.78s time: 529.61s eta: 3 days, 11:24:46
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.248 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31174.0 tgs: 58 data_time: 1.01s time: 531.66s eta: 3 days, 11:35:22
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.138 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31786.0 tgs: 60 data_time: 0.88s time: 529.52s eta: 3 days, 11:06:15
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31831.0 tgs: 60 data_time: 0.90s time: 529.50s eta: 3 days, 10:57:19
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31822.0 tgs: 60 data_time: 0.84s time: 529.83s eta: 3 days, 10:51:32
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31024.0 tgs: 58 data_time: 0.73s time: 534.68s eta: 3 days, 11:28:07
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31792.0 tgs: 60 data_time: 1.00s time: 529.79s eta: 3 days, 10:33:31
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.143 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31319.0 tgs: 59 data_time: 0.61s time: 530.10s eta: 3 days, 10:27:33
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.132 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32256.0 tgs: 60 data_time: 0.62s time: 529.95s eta: 3 days, 10:17:21
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32359.0 tgs: 60 data_time: 0.85s time: 533.29s eta: 3 days, 10:39:36
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31433.0 tgs: 59 data_time: 0.90s time: 529.21s eta: 3 days, 9:52:51
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.195 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30457.0 tgs: 57 data_time: 0.84s time: 529.66s eta: 3 days, 9:48:12
+ [XTuner][RANK 22][DP 5][SP 2][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.160 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32061.0 tgs: 60 data_time: 0.85s time: 530.39s eta: 3 days, 9:46:06
20250121104251/rank26.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:13][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.84s
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 129.26 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.223 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31349.0 tgs: 56 data_time: 1.81s time: 551.06s eta: 3 days, 18:46:17
257
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.206 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32045.0 tgs: 60 data_time: 0.57s time: 529.53s eta: 3 days, 15:04:41
258
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.232 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31775.0 tgs: 60 data_time: 0.92s time: 529.11s eta: 3 days, 14:51:41
259
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.195 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31442.0 tgs: 59 data_time: 0.84s time: 529.96s eta: 3 days, 14:51:18
260
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.184 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31310.0 tgs: 59 data_time: 0.75s time: 529.03s eta: 3 days, 14:33:20
261
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.193 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31970.0 tgs: 60 data_time: 0.85s time: 529.86s eta: 3 days, 14:32:39
262
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.193 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32046.0 tgs: 60 data_time: 0.86s time: 529.63s eta: 3 days, 14:21:35
263
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.188 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32280.0 tgs: 60 data_time: 0.67s time: 535.79s eta: 3 days, 15:12:51
264
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.224 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32093.0 tgs: 60 data_time: 1.00s time: 534.00s eta: 3 days, 14:46:29
265
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.189 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32007.0 tgs: 60 data_time: 0.54s time: 533.29s eta: 3 days, 14:30:40
266
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.176 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32301.0 tgs: 61 data_time: 0.77s time: 529.44s eta: 3 days, 13:44:26
267
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.196 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 40.8GB text_tokens: 30685.0 tgs: 58 data_time: 0.81s time: 529.00s eta: 3 days, 13:31:17
268
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.144 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30776.0 tgs: 57 data_time: 0.87s time: 533.83s eta: 3 days, 14:09:12
269
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.187 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31850.0 tgs: 60 data_time: 0.72s time: 528.94s eta: 3 days, 13:13:06
270
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32016.0 tgs: 60 data_time: 0.76s time: 530.40s eta: 3 days, 13:18:22
271
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31587.0 tgs: 58 data_time: 1.02s time: 543.34s eta: 3 days, 15:14:10
272
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30793.0 tgs: 57 data_time: 0.75s time: 536.36s eta: 3 days, 13:57:59
273
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31008.0 tgs: 58 data_time: 0.91s time: 530.96s eta: 3 days, 12:57:11
274
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.217 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32462.0 tgs: 61 data_time: 0.88s time: 529.91s eta: 3 days, 12:38:19
275
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.232 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32193.0 tgs: 60 data_time: 0.67s time: 534.36s eta: 3 days, 13:12:02
276
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.164 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31420.0 tgs: 59 data_time: 0.79s time: 529.95s eta: 3 days, 12:21:02
277
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.249 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31579.0 tgs: 58 data_time: 0.93s time: 537.04s eta: 3 days, 13:19:49
278
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.208 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31823.0 tgs: 60 data_time: 0.81s time: 529.35s eta: 3 days, 11:57:39
279
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30152.0 tgs: 56 data_time: 0.81s time: 535.75s eta: 3 days, 12:49:36
280
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.160 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31164.0 tgs: 57 data_time: 0.93s time: 538.84s eta: 3 days, 13:10:00
281
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.158 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31740.0 tgs: 59 data_time: 0.84s time: 533.64s eta: 3 days, 12:11:46
282
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.227 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32120.0 tgs: 60 data_time: 0.93s time: 529.61s eta: 3 days, 11:24:49
283
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30357.0 tgs: 57 data_time: 0.77s time: 531.66s eta: 3 days, 11:35:22
284
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32083.0 tgs: 60 data_time: 0.89s time: 529.49s eta: 3 days, 11:06:04
285
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.170 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31963.0 tgs: 60 data_time: 0.87s time: 529.52s eta: 3 days, 10:57:26
286
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.153 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31960.0 tgs: 60 data_time: 0.81s time: 529.83s eta: 3 days, 10:51:34
287
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31654.0 tgs: 59 data_time: 0.82s time: 534.67s eta: 3 days, 11:28:05
288
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.177 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31502.0 tgs: 59 data_time: 0.65s time: 529.79s eta: 3 days, 10:33:34
289
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31836.0 tgs: 60 data_time: 0.88s time: 530.09s eta: 3 days, 10:27:30
290
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.174 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31221.0 tgs: 58 data_time: 0.71s time: 529.95s eta: 3 days, 10:17:23
291
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31303.0 tgs: 58 data_time: 0.94s time: 533.29s eta: 3 days, 10:39:35
292
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.214 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31855.0 tgs: 60 data_time: 1.02s time: 529.22s eta: 3 days, 9:52:53
293
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.202 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30932.0 tgs: 58 data_time: 0.71s time: 529.66s eta: 3 days, 9:48:12
294
+ [XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.213 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31919.0 tgs: 60 data_time: 0.62s time: 530.39s eta: 3 days, 9:46:05
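The per-step `[Train]` lines above use a fixed `key: value` layout, so they are easy to parse for plotting or monitoring. A minimal sketch (the regex is written against the exact line format shown in this log; the relation `tgs ≈ text_tokens / time` is an observation from these specific lines, not a documented XTuner formula):

```python
import re

# One step line copied verbatim from the rank 26 log above.
line = ("[XTuner][RANK 26][DP 6][SP 2][TP 0][2025-01-21 10:56:05][INFO] "
        "[Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.223 loss(reduced): 0.273 "
        "grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31349.0 "
        "tgs: 56 data_time: 1.81s time: 551.06s eta: 3 days, 18:46:17")

# Regex mirroring the log's key: value layout, one named group per field.
pattern = re.compile(
    r"Step (?P<step>\d+)/(?P<total>\d+) lr: (?P<lr>[\d.]+) loss: (?P<loss>[\d.]+) "
    r"loss\(reduced\): (?P<loss_red>[\d.]+) grad_norm: (?P<grad_norm>[\d.]+) "
    r"if_nan_skip: (?P<nan_skip>\d+) max_memory: (?P<mem_gb>[\d.]+)GB "
    r"text_tokens: (?P<tok>[\d.]+) tgs: (?P<tgs>\d+) "
    r"data_time: (?P<data_time>[\d.]+)s time: (?P<time>[\d.]+)s"
)

step = pattern.search(line).groupdict()
# tgs (tokens/GPU/s) appears to equal text_tokens / step time, floored:
print(int(float(step["tok"]) / float(step["time"])))  # 56, matching tgs: 56
```

The same pattern applies to every `Step k/593` line in these logs; only the values differ.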
20250121104251/rank3.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:13][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.86s
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:44:28][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 130.89 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.241 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.7GB text_tokens: 31548.0 tgs: 57 data_time: 1.95s time: 550.99s eta: 3 days, 18:45:36
257
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.246 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32129.0 tgs: 60 data_time: 0.77s time: 529.59s eta: 3 days, 15:05:17
258
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.237 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31947.0 tgs: 60 data_time: 0.83s time: 529.08s eta: 3 days, 14:51:24
259
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.184 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31471.0 tgs: 59 data_time: 0.70s time: 529.94s eta: 3 days, 14:51:01
260
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.211 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32503.0 tgs: 61 data_time: 0.74s time: 529.06s eta: 3 days, 14:33:35
261
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.322 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31133.0 tgs: 58 data_time: 0.87s time: 529.85s eta: 3 days, 14:32:31
262
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.259 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31847.0 tgs: 60 data_time: 1.20s time: 529.61s eta: 3 days, 14:21:19
263
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.196 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32030.0 tgs: 59 data_time: 0.87s time: 535.76s eta: 3 days, 15:12:32
264
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.208 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31786.0 tgs: 59 data_time: 0.72s time: 534.09s eta: 3 days, 14:47:20
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.197 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32078.0 tgs: 60 data_time: 1.06s time: 533.26s eta: 3 days, 14:30:22
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.198 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32240.0 tgs: 60 data_time: 0.83s time: 529.41s eta: 3 days, 13:44:05
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.219 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31632.0 tgs: 59 data_time: 0.80s time: 529.07s eta: 3 days, 13:32:00
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.183 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 40.6GB text_tokens: 30691.0 tgs: 57 data_time: 0.74s time: 533.78s eta: 3 days, 14:08:49
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.215 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32119.0 tgs: 60 data_time: 0.85s time: 528.92s eta: 3 days, 13:12:54
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31154.0 tgs: 58 data_time: 0.66s time: 530.37s eta: 3 days, 13:18:05
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31896.0 tgs: 58 data_time: 0.63s time: 543.41s eta: 3 days, 15:14:52
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.130 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31357.0 tgs: 58 data_time: 0.91s time: 536.33s eta: 3 days, 13:57:40
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30112.0 tgs: 56 data_time: 0.70s time: 530.93s eta: 3 days, 12:56:57
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31775.0 tgs: 59 data_time: 0.86s time: 529.99s eta: 3 days, 12:39:02
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.229 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32263.0 tgs: 60 data_time: 0.81s time: 534.33s eta: 3 days, 13:11:43
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.271 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31621.0 tgs: 59 data_time: 1.06s time: 529.92s eta: 3 days, 12:20:45
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.279 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32165.0 tgs: 59 data_time: 1.07s time: 537.11s eta: 3 days, 13:20:24
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.205 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31731.0 tgs: 59 data_time: 0.60s time: 529.32s eta: 3 days, 11:57:23
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31953.0 tgs: 59 data_time: 0.72s time: 535.72s eta: 3 days, 12:49:18
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.196 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32071.0 tgs: 59 data_time: 0.82s time: 538.82s eta: 3 days, 13:09:46
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.123 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32232.0 tgs: 60 data_time: 0.62s time: 533.71s eta: 3 days, 12:12:28
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31872.0 tgs: 60 data_time: 0.69s time: 529.58s eta: 3 days, 11:24:30
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31715.0 tgs: 59 data_time: 0.79s time: 531.64s eta: 3 days, 11:35:10
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31142.0 tgs: 58 data_time: 0.69s time: 529.56s eta: 3 days, 11:06:38
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31246.0 tgs: 59 data_time: 1.11s time: 529.49s eta: 3 days, 10:57:10
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31427.0 tgs: 59 data_time: 0.90s time: 529.80s eta: 3 days, 10:51:19
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.218 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31009.0 tgs: 57 data_time: 0.84s time: 534.69s eta: 3 days, 11:28:17
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31125.0 tgs: 58 data_time: 0.70s time: 529.82s eta: 3 days, 10:33:46
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32138.0 tgs: 60 data_time: 0.89s time: 530.06s eta: 3 days, 10:27:15
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.153 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31588.0 tgs: 59 data_time: 0.93s time: 529.92s eta: 3 days, 10:17:07
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31748.0 tgs: 59 data_time: 0.89s time: 533.35s eta: 3 days, 10:40:09
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.206 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32016.0 tgs: 60 data_time: 0.92s time: 529.18s eta: 3 days, 9:52:35
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31540.0 tgs: 59 data_time: 0.75s time: 529.64s eta: 3 days, 9:47:57
+ [XTuner][RANK 3][DP 0][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.163 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 30666.0 tgs: 57 data_time: 0.80s time: 530.46s eta: 3 days, 9:46:44
20250121104251/rank32.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.82s
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
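The dispatch lines above swap every `Qwen2RMSNorm` module's forward for a fused `rms_norm_forward` implementation. As a reference for what such a replacement computes, here is a minimal NumPy sketch of RMSNorm (the actual XTuner kernel is a fused GPU implementation; the function name and eps value here are illustrative):

```python
import numpy as np

def rms_norm_forward(x, weight, eps=1e-6):
    # RMSNorm normalizes by the root-mean-square over the last dimension
    # and applies a learned per-channel scale. Unlike LayerNorm, it does
    # not subtract the mean and has no bias term.
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight

x = np.array([[3.0, 4.0]])
w = np.ones(2)
out = rms_norm_forward(x, w)
# RMS of [3, 4] is sqrt(12.5) ~= 3.5355, so out ~= [[0.8485, 1.1314]]
```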
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.81 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.283 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31323.0 tgs: 56 data_time: 1.93s time: 550.86s eta: 3 days, 18:44:22
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.230 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31388.0 tgs: 59 data_time: 0.81s time: 529.55s eta: 3 days, 15:04:50
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.231 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31981.0 tgs: 60 data_time: 1.01s time: 529.10s eta: 3 days, 14:51:39
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.195 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31503.0 tgs: 59 data_time: 1.26s time: 529.96s eta: 3 days, 14:51:16
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.162 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31606.0 tgs: 59 data_time: 0.65s time: 529.04s eta: 3 days, 14:33:22
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.203 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31195.0 tgs: 58 data_time: 0.76s time: 529.87s eta: 3 days, 14:32:40
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.206 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31593.0 tgs: 59 data_time: 0.74s time: 529.63s eta: 3 days, 14:21:32
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.185 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31176.0 tgs: 58 data_time: 0.65s time: 535.78s eta: 3 days, 15:12:48
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.225 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32152.0 tgs: 60 data_time: 0.68s time: 534.01s eta: 3 days, 14:46:35
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.187 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 40.5GB text_tokens: 29940.0 tgs: 56 data_time: 0.65s time: 533.29s eta: 3 days, 14:30:38
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.293 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31618.0 tgs: 59 data_time: 0.88s time: 529.43s eta: 3 days, 13:44:20
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.202 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31523.0 tgs: 59 data_time: 0.63s time: 529.02s eta: 3 days, 13:31:26
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.220 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31105.0 tgs: 58 data_time: 0.82s time: 533.81s eta: 3 days, 14:09:05
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31143.0 tgs: 58 data_time: 0.50s time: 528.95s eta: 3 days, 13:13:09
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.200 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31851.0 tgs: 60 data_time: 0.57s time: 530.40s eta: 3 days, 13:18:20
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31775.0 tgs: 58 data_time: 0.86s time: 543.35s eta: 3 days, 15:14:15
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.301 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31815.0 tgs: 59 data_time: 0.90s time: 536.35s eta: 3 days, 13:57:54
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31366.0 tgs: 59 data_time: 0.77s time: 530.96s eta: 3 days, 12:57:11
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.226 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31364.0 tgs: 59 data_time: 0.83s time: 529.93s eta: 3 days, 12:38:27
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.180 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 32004.0 tgs: 59 data_time: 0.94s time: 534.35s eta: 3 days, 13:11:58
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.158 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31624.0 tgs: 59 data_time: 0.83s time: 529.95s eta: 3 days, 12:20:58
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.132 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31087.0 tgs: 57 data_time: 0.67s time: 537.06s eta: 3 days, 13:19:58
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.132 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31662.0 tgs: 59 data_time: 0.57s time: 529.35s eta: 3 days, 11:57:37
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 31237.0 tgs: 58 data_time: 0.81s time: 535.74s eta: 3 days, 12:49:33
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31187.0 tgs: 57 data_time: 0.75s time: 538.84s eta: 3 days, 13:09:57
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.140 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31088.0 tgs: 58 data_time: 0.66s time: 533.64s eta: 3 days, 12:11:46
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31662.0 tgs: 59 data_time: 0.77s time: 529.60s eta: 3 days, 11:24:44
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.267 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32039.0 tgs: 60 data_time: 0.96s time: 531.66s eta: 3 days, 11:35:21
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.155 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32252.0 tgs: 60 data_time: 0.55s time: 529.51s eta: 3 days, 11:06:11
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31895.0 tgs: 60 data_time: 0.97s time: 529.51s eta: 3 days, 10:57:26
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31516.0 tgs: 59 data_time: 0.57s time: 529.84s eta: 3 days, 10:51:39
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.140 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 29894.0 tgs: 55 data_time: 0.79s time: 534.67s eta: 3 days, 11:28:06
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31299.0 tgs: 59 data_time: 0.85s time: 529.79s eta: 3 days, 10:33:34
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.288 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31641.0 tgs: 59 data_time: 0.82s time: 530.08s eta: 3 days, 10:27:27
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31053.0 tgs: 58 data_time: 0.54s time: 529.95s eta: 3 days, 10:17:19
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.180 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31452.0 tgs: 58 data_time: 0.83s time: 533.29s eta: 3 days, 10:39:38
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.143 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31537.0 tgs: 59 data_time: 0.80s time: 529.21s eta: 3 days, 9:52:50
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32333.0 tgs: 61 data_time: 0.84s time: 529.66s eta: 3 days, 9:48:10
+ [XTuner][RANK 32][DP 8][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.144 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32264.0 tgs: 60 data_time: 0.76s time: 530.40s eta: 3 days, 9:46:14
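Each per-step line above reports a throughput figure (`tgs`, tokens per GPU per second) and a remaining-time estimate (`eta`). A minimal sketch of how such fields can be derived from the logged quantities; this is an assumption about the formulas, and the logged `eta` uses a smoothed step time, so the estimate here will not match the log to the second:

```python
import datetime

def step_metrics(text_tokens, step_time, current_step, total_steps):
    # Throughput: tokens processed this step divided by wall-clock step time.
    tgs = int(text_tokens / step_time)
    # Naive ETA: remaining steps times the current step time.
    eta = datetime.timedelta(seconds=int((total_steps - current_step) * step_time))
    return tgs, eta

# Step 1/593 from the log above: 31323 tokens in 550.86 s.
tgs, eta = step_metrics(31323.0, 550.86, 1, 593)
# tgs == 56, matching the logged value; eta lands near the logged
# "3 days, 18:44:22" but differs because the logger averages step times.
```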
20250121104251/rank36.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
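The Namespace above fixes `global_batch_size=64`, `mirco_batch_size=1`, and `sp_size=4`, and the rank/DP indices in these logs (e.g. RANK 36 -> DP 9) imply 64 ranks split into 16 data-parallel groups. A hypothetical helper showing how those settings typically determine the gradient-accumulation count; the function and the assumption that all 64 ranks participate are illustrative, not XTuner's actual API:

```python
def grad_accum_steps(global_batch_size, micro_batch_size, world_size, sp_size):
    # Ranks in the same sequence-parallel group share one data shard,
    # so the effective data-parallel size is world_size // sp_size.
    dp_size = world_size // sp_size
    per_micro_step = micro_batch_size * dp_size  # samples consumed per forward/backward
    assert global_batch_size % per_micro_step == 0
    return global_batch_size // per_micro_step

# With the logged settings (64 ranks, sp_size=4, micro batch 1, global batch 64):
steps = grad_accum_steps(64, 1, 64, 4)  # -> 4 accumulation steps per optimizer update
```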
2
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
3
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
4
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
5
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
6
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
7
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
8
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
9
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
10
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
11
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.85s
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.71 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.339 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.0GB text_tokens: 32281.0 tgs: 58 data_time: 2.26s time: 550.90s eta: 3 days, 18:44:46
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.237 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32206.0 tgs: 60 data_time: 0.74s time: 529.50s eta: 3 days, 15:04:26
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.217 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32213.0 tgs: 60 data_time: 0.90s time: 529.10s eta: 3 days, 14:51:36
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.188 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 40.6GB text_tokens: 29331.0 tgs: 55 data_time: 0.81s time: 529.97s eta: 3 days, 14:51:20
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.222 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31870.0 tgs: 60 data_time: 0.84s time: 529.03s eta: 3 days, 14:33:21
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.192 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 29472.0 tgs: 55 data_time: 0.67s time: 529.87s eta: 3 days, 14:32:40
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.152 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32120.0 tgs: 60 data_time: 0.70s time: 529.63s eta: 3 days, 14:21:32
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.351 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30854.0 tgs: 57 data_time: 0.62s time: 535.78s eta: 3 days, 15:12:48
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.177 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31900.0 tgs: 59 data_time: 0.74s time: 534.01s eta: 3 days, 14:46:35
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.216 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32109.0 tgs: 60 data_time: 0.73s time: 533.29s eta: 3 days, 14:30:38
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.220 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31161.0 tgs: 58 data_time: 0.57s time: 529.43s eta: 3 days, 13:44:18
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.185 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31536.0 tgs: 59 data_time: 0.76s time: 529.02s eta: 3 days, 13:31:28
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.155 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32004.0 tgs: 59 data_time: 0.74s time: 533.81s eta: 3 days, 14:09:05
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.214 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32359.0 tgs: 61 data_time: 0.72s time: 528.95s eta: 3 days, 13:13:08
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.252 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32408.0 tgs: 61 data_time: 0.97s time: 530.40s eta: 3 days, 13:18:21
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.222 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31413.0 tgs: 57 data_time: 0.92s time: 543.35s eta: 3 days, 15:14:15
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.209 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31857.0 tgs: 59 data_time: 0.75s time: 536.35s eta: 3 days, 13:57:55
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.216 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31570.0 tgs: 59 data_time: 0.78s time: 530.96s eta: 3 days, 12:57:10
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31566.0 tgs: 59 data_time: 0.65s time: 529.92s eta: 3 days, 12:38:26
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32203.0 tgs: 60 data_time: 0.78s time: 534.35s eta: 3 days, 13:11:58
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.129 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30573.0 tgs: 57 data_time: 0.65s time: 529.95s eta: 3 days, 12:20:59
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.240 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30164.0 tgs: 56 data_time: 0.67s time: 537.06s eta: 3 days, 13:19:58
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31721.0 tgs: 59 data_time: 0.75s time: 529.35s eta: 3 days, 11:57:36
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31235.0 tgs: 58 data_time: 0.74s time: 535.74s eta: 3 days, 12:49:33
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32083.0 tgs: 59 data_time: 0.66s time: 538.84s eta: 3 days, 13:09:57
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31111.0 tgs: 58 data_time: 0.88s time: 533.64s eta: 3 days, 12:11:47
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31722.0 tgs: 59 data_time: 0.80s time: 529.61s eta: 3 days, 11:24:46
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30903.0 tgs: 58 data_time: 0.78s time: 531.66s eta: 3 days, 11:35:20
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.175 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31566.0 tgs: 59 data_time: 0.98s time: 529.51s eta: 3 days, 11:06:11
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.153 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 28740.0 tgs: 54 data_time: 0.78s time: 529.51s eta: 3 days, 10:57:25
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.209 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31943.0 tgs: 60 data_time: 0.59s time: 529.83s eta: 3 days, 10:51:36
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.108 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31226.0 tgs: 58 data_time: 0.81s time: 534.68s eta: 3 days, 11:28:08
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.179 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31930.0 tgs: 60 data_time: 0.64s time: 529.79s eta: 3 days, 10:33:32
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.152 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31763.0 tgs: 59 data_time: 0.84s time: 530.09s eta: 3 days, 10:27:30
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32155.0 tgs: 60 data_time: 0.80s time: 529.94s eta: 3 days, 10:17:17
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.195 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31819.0 tgs: 59 data_time: 0.86s time: 533.29s eta: 3 days, 10:39:37
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32555.0 tgs: 61 data_time: 0.61s time: 529.21s eta: 3 days, 9:52:50
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.163 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31507.0 tgs: 59 data_time: 0.72s time: 529.66s eta: 3 days, 9:48:11
+ [XTuner][RANK 36][DP 9][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32485.0 tgs: 61 data_time: 0.85s time: 530.40s eta: 3 days, 9:46:14
20250121104251/rank39.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:06][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.86s
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:26][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:44:27][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 132.57 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.300 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.0GB text_tokens: 32281.0 tgs: 58 data_time: 2.31s time: 550.97s eta: 3 days, 18:45:23
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.233 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32206.0 tgs: 60 data_time: 0.77s time: 529.47s eta: 3 days, 15:04:06
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.283 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32213.0 tgs: 60 data_time: 0.91s time: 529.10s eta: 3 days, 14:51:38
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.188 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 40.6GB text_tokens: 29331.0 tgs: 55 data_time: 0.84s time: 529.97s eta: 3 days, 14:51:20
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.198 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31870.0 tgs: 60 data_time: 0.87s time: 529.04s eta: 3 days, 14:33:23
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.213 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 29472.0 tgs: 55 data_time: 0.69s time: 529.86s eta: 3 days, 14:32:36
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.186 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32120.0 tgs: 60 data_time: 0.72s time: 529.63s eta: 3 days, 14:21:33
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.185 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30854.0 tgs: 57 data_time: 0.65s time: 535.78s eta: 3 days, 15:12:49
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.209 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31900.0 tgs: 59 data_time: 0.73s time: 534.02s eta: 3 days, 14:46:42
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.180 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32109.0 tgs: 60 data_time: 0.75s time: 533.27s eta: 3 days, 14:30:31
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.125 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31161.0 tgs: 58 data_time: 0.60s time: 529.43s eta: 3 days, 13:44:17
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.204 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31536.0 tgs: 59 data_time: 0.79s time: 529.02s eta: 3 days, 13:31:28
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.196 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32004.0 tgs: 59 data_time: 0.77s time: 533.82s eta: 3 days, 14:09:09
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32359.0 tgs: 61 data_time: 0.75s time: 528.95s eta: 3 days, 13:13:09
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.242 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32408.0 tgs: 61 data_time: 1.01s time: 530.39s eta: 3 days, 13:18:17
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.216 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31413.0 tgs: 57 data_time: 0.96s time: 543.35s eta: 3 days, 15:14:18
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.204 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31857.0 tgs: 59 data_time: 0.77s time: 536.35s eta: 3 days, 13:57:54
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.218 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31570.0 tgs: 59 data_time: 0.74s time: 530.95s eta: 3 days, 12:57:09
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31566.0 tgs: 59 data_time: 0.68s time: 529.93s eta: 3 days, 12:38:30
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.179 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32203.0 tgs: 60 data_time: 0.79s time: 534.35s eta: 3 days, 13:11:57
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.133 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30573.0 tgs: 57 data_time: 0.65s time: 529.95s eta: 3 days, 12:21:00
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30164.0 tgs: 56 data_time: 0.70s time: 537.06s eta: 3 days, 13:19:59
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31721.0 tgs: 59 data_time: 0.78s time: 529.34s eta: 3 days, 11:57:34
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.192 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31235.0 tgs: 58 data_time: 0.80s time: 535.75s eta: 3 days, 12:49:35
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32083.0 tgs: 59 data_time: 0.68s time: 538.84s eta: 3 days, 13:09:58
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.243 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31111.0 tgs: 58 data_time: 0.89s time: 533.64s eta: 3 days, 12:11:45
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.210 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31722.0 tgs: 59 data_time: 0.83s time: 529.61s eta: 3 days, 11:24:48
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30903.0 tgs: 58 data_time: 0.78s time: 531.66s eta: 3 days, 11:35:21
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.214 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31566.0 tgs: 59 data_time: 1.03s time: 529.51s eta: 3 days, 11:06:12
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 28740.0 tgs: 54 data_time: 0.80s time: 529.51s eta: 3 days, 10:57:24
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31943.0 tgs: 60 data_time: 0.60s time: 529.83s eta: 3 days, 10:51:32
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.143 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31226.0 tgs: 58 data_time: 0.84s time: 534.68s eta: 3 days, 11:28:10
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.145 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31930.0 tgs: 60 data_time: 0.65s time: 529.79s eta: 3 days, 10:33:33
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.172 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31763.0 tgs: 59 data_time: 0.88s time: 530.09s eta: 3 days, 10:27:28
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32155.0 tgs: 60 data_time: 0.82s time: 529.95s eta: 3 days, 10:17:22
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.165 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31819.0 tgs: 59 data_time: 0.88s time: 533.29s eta: 3 days, 10:39:35
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.169 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32555.0 tgs: 61 data_time: 0.64s time: 529.21s eta: 3 days, 9:52:49
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.196 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31507.0 tgs: 59 data_time: 0.75s time: 529.66s eta: 3 days, 9:48:12
+ [XTuner][RANK 39][DP 9][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32485.0 tgs: 61 data_time: 0.87s time: 530.41s eta: 3 days, 9:46:14
20250121104251/rank48.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:42:57][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 23.01s
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:29][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:44:30][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 129.28 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.308 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31357.0 tgs: 57 data_time: 1.92s time: 550.03s eta: 3 days, 18:36:06
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.216 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32248.0 tgs: 60 data_time: 0.90s time: 529.58s eta: 3 days, 15:05:14
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.236 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31853.0 tgs: 60 data_time: 1.03s time: 529.08s eta: 3 days, 14:51:27
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.214 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32083.0 tgs: 60 data_time: 0.86s time: 529.95s eta: 3 days, 14:51:10
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.197 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32125.0 tgs: 60 data_time: 0.96s time: 529.05s eta: 3 days, 14:33:30
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.178 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31521.0 tgs: 59 data_time: 0.77s time: 529.85s eta: 3 days, 14:32:30
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.167 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31512.0 tgs: 59 data_time: 0.87s time: 529.62s eta: 3 days, 14:21:26
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.386 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31822.0 tgs: 59 data_time: 0.95s time: 535.77s eta: 3 days, 15:12:38
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.233 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31908.0 tgs: 59 data_time: 0.74s time: 534.06s eta: 3 days, 14:47:03
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.185 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32433.0 tgs: 60 data_time: 0.79s time: 533.27s eta: 3 days, 14:30:28
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.176 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32333.0 tgs: 61 data_time: 0.68s time: 529.42s eta: 3 days, 13:44:10
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.186 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30295.0 tgs: 57 data_time: 0.93s time: 529.05s eta: 3 days, 13:31:44
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.175 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31362.0 tgs: 58 data_time: 0.77s time: 533.80s eta: 3 days, 14:08:57
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32181.0 tgs: 60 data_time: 0.84s time: 528.93s eta: 3 days, 13:13:00
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.219 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31770.0 tgs: 59 data_time: 1.02s time: 530.39s eta: 3 days, 13:18:13
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32107.0 tgs: 59 data_time: 0.98s time: 543.38s eta: 3 days, 15:14:36
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.235 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31655.0 tgs: 59 data_time: 0.64s time: 536.35s eta: 3 days, 13:57:52
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31529.0 tgs: 59 data_time: 0.85s time: 530.93s eta: 3 days, 12:56:57
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.144 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31277.0 tgs: 59 data_time: 0.75s time: 529.96s eta: 3 days, 12:38:49
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31388.0 tgs: 58 data_time: 0.60s time: 534.34s eta: 3 days, 13:11:50
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.196 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32144.0 tgs: 60 data_time: 0.71s time: 529.93s eta: 3 days, 12:20:49
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.242 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31305.0 tgs: 58 data_time: 0.67s time: 537.09s eta: 3 days, 13:20:17
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.144 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32167.0 tgs: 60 data_time: 0.84s time: 529.33s eta: 3 days, 11:57:29
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.232 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32301.0 tgs: 60 data_time: 0.78s time: 535.73s eta: 3 days, 12:49:25
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31599.0 tgs: 58 data_time: 0.67s time: 538.83s eta: 3 days, 13:09:51
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.244 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32431.0 tgs: 60 data_time: 0.72s time: 533.67s eta: 3 days, 12:12:04
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.325 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31812.0 tgs: 60 data_time: 0.83s time: 529.59s eta: 3 days, 11:24:37
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.243 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30722.0 tgs: 57 data_time: 0.61s time: 531.65s eta: 3 days, 11:35:14
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.141 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 29912.0 tgs: 56 data_time: 0.84s time: 529.54s eta: 3 days, 11:06:30
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.186 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31300.0 tgs: 59 data_time: 0.99s time: 529.50s eta: 3 days, 10:57:17
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.177 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32379.0 tgs: 61 data_time: 1.03s time: 529.81s eta: 3 days, 10:51:24
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.236 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31647.0 tgs: 59 data_time: 0.69s time: 534.71s eta: 3 days, 11:28:25
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 31134.0 tgs: 58 data_time: 0.77s time: 529.78s eta: 3 days, 10:33:27
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31309.0 tgs: 59 data_time: 0.63s time: 530.08s eta: 3 days, 10:27:23
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.210 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31520.0 tgs: 59 data_time: 0.59s time: 529.93s eta: 3 days, 10:17:09
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32247.0 tgs: 60 data_time: 0.81s time: 533.35s eta: 3 days, 10:40:07
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31693.0 tgs: 59 data_time: 0.88s time: 529.19s eta: 3 days, 9:52:41
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32127.0 tgs: 60 data_time: 0.91s time: 529.65s eta: 3 days, 9:48:05
+ [XTuner][RANK 48][DP 12][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30901.0 tgs: 58 data_time: 0.63s time: 530.41s eta: 3 days, 9:46:18
20250121104251/rank53.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.82s
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.54 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.390 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31804.0 tgs: 57 data_time: 1.76s time: 550.00s eta: 3 days, 18:35:49
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.235 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31938.0 tgs: 60 data_time: 0.64s time: 529.57s eta: 3 days, 15:05:04
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.220 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31420.0 tgs: 59 data_time: 0.73s time: 529.08s eta: 3 days, 14:51:27
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.170 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32320.0 tgs: 60 data_time: 0.85s time: 529.95s eta: 3 days, 14:51:09
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.203 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31755.0 tgs: 60 data_time: 0.79s time: 529.05s eta: 3 days, 14:33:30
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.284 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31116.0 tgs: 58 data_time: 0.89s time: 529.85s eta: 3 days, 14:32:30
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.246 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32297.0 tgs: 60 data_time: 0.82s time: 529.62s eta: 3 days, 14:21:27
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.236 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32099.0 tgs: 59 data_time: 0.87s time: 535.76s eta: 3 days, 15:12:37
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.254 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31943.0 tgs: 59 data_time: 0.90s time: 534.05s eta: 3 days, 14:47:01
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.356 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32321.0 tgs: 60 data_time: 0.81s time: 533.28s eta: 3 days, 14:30:32
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.205 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31685.0 tgs: 59 data_time: 1.10s time: 529.41s eta: 3 days, 13:44:08
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.203 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31570.0 tgs: 59 data_time: 1.03s time: 529.05s eta: 3 days, 13:31:45
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.228 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32271.0 tgs: 60 data_time: 1.00s time: 533.80s eta: 3 days, 14:08:56
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.178 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31826.0 tgs: 60 data_time: 0.80s time: 528.93s eta: 3 days, 13:13:01
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31152.0 tgs: 58 data_time: 0.71s time: 530.38s eta: 3 days, 13:18:12
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.160 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31903.0 tgs: 58 data_time: 0.81s time: 543.39s eta: 3 days, 15:14:37
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.243 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30941.0 tgs: 57 data_time: 0.63s time: 536.34s eta: 3 days, 13:57:46
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 29085.0 tgs: 54 data_time: 0.63s time: 530.94s eta: 3 days, 12:57:03
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.202 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32477.0 tgs: 61 data_time: 1.04s time: 529.96s eta: 3 days, 12:38:47
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31064.0 tgs: 58 data_time: 0.70s time: 534.34s eta: 3 days, 13:11:50
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.224 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31960.0 tgs: 60 data_time: 0.93s time: 529.93s eta: 3 days, 12:20:51
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.205 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31106.0 tgs: 57 data_time: 0.93s time: 537.09s eta: 3 days, 13:20:17
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.225 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31337.0 tgs: 59 data_time: 0.96s time: 529.33s eta: 3 days, 11:57:29
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.285 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31611.0 tgs: 59 data_time: 1.02s time: 535.73s eta: 3 days, 12:49:28
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31573.0 tgs: 58 data_time: 0.65s time: 538.82s eta: 3 days, 13:09:47
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.202 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31897.0 tgs: 59 data_time: 1.00s time: 533.67s eta: 3 days, 12:12:04
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.183 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32058.0 tgs: 60 data_time: 0.89s time: 529.59s eta: 3 days, 11:24:36
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31119.0 tgs: 58 data_time: 0.77s time: 531.65s eta: 3 days, 11:35:14
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.155 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31356.0 tgs: 59 data_time: 0.54s time: 529.54s eta: 3 days, 11:06:31
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.243 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32225.0 tgs: 60 data_time: 1.02s time: 529.50s eta: 3 days, 10:57:18
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31625.0 tgs: 59 data_time: 0.73s time: 529.81s eta: 3 days, 10:51:23
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.203 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31942.0 tgs: 59 data_time: 0.82s time: 534.71s eta: 3 days, 11:28:27
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.200 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32061.0 tgs: 60 data_time: 0.85s time: 529.78s eta: 3 days, 10:33:25
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.164 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 30275.0 tgs: 57 data_time: 0.91s time: 530.08s eta: 3 days, 10:27:22
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.233 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31108.0 tgs: 58 data_time: 0.76s time: 529.93s eta: 3 days, 10:17:10
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.195 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31957.0 tgs: 59 data_time: 0.73s time: 533.34s eta: 3 days, 10:40:06
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.154 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 29801.0 tgs: 56 data_time: 0.91s time: 529.20s eta: 3 days, 9:52:44
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31102.0 tgs: 58 data_time: 0.80s time: 529.64s eta: 3 days, 9:48:01
+ [XTuner][RANK 53][DP 13][SP 1][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32173.0 tgs: 60 data_time: 0.76s time: 530.41s eta: 3 days, 9:46:20
20250121104251/rank55.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:42:58][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:00][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:01][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:03][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.85s
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.54 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.329 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 31804.0 tgs: 57 data_time: 1.73s time: 549.88s eta: 3 days, 18:34:38
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.326 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31938.0 tgs: 60 data_time: 0.65s time: 529.70s eta: 3 days, 15:06:21
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.226 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31420.0 tgs: 59 data_time: 0.72s time: 529.08s eta: 3 days, 14:51:27
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.245 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32320.0 tgs: 60 data_time: 0.88s time: 529.95s eta: 3 days, 14:51:10
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.206 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31755.0 tgs: 60 data_time: 0.79s time: 529.05s eta: 3 days, 14:33:30
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.240 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31116.0 tgs: 58 data_time: 0.89s time: 529.85s eta: 3 days, 14:32:30
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.232 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32297.0 tgs: 60 data_time: 0.77s time: 529.62s eta: 3 days, 14:21:25
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.272 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32099.0 tgs: 59 data_time: 0.81s time: 535.77s eta: 3 days, 15:12:42
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.256 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31943.0 tgs: 59 data_time: 0.87s time: 534.05s eta: 3 days, 14:46:59
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.225 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32321.0 tgs: 60 data_time: 0.77s time: 533.28s eta: 3 days, 14:30:33
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.216 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31685.0 tgs: 59 data_time: 1.09s time: 529.41s eta: 3 days, 13:44:08
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.263 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31570.0 tgs: 59 data_time: 1.00s time: 529.05s eta: 3 days, 13:31:47
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.214 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32271.0 tgs: 60 data_time: 0.97s time: 533.79s eta: 3 days, 14:08:54
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.187 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31826.0 tgs: 60 data_time: 0.79s time: 528.94s eta: 3 days, 13:13:02
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.149 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31152.0 tgs: 58 data_time: 0.70s time: 530.38s eta: 3 days, 13:18:11
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.268 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31903.0 tgs: 58 data_time: 0.80s time: 543.39s eta: 3 days, 15:14:36
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30941.0 tgs: 57 data_time: 0.62s time: 536.34s eta: 3 days, 13:57:46
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.192 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 29085.0 tgs: 54 data_time: 0.63s time: 530.94s eta: 3 days, 12:57:03
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32477.0 tgs: 61 data_time: 1.03s time: 529.96s eta: 3 days, 12:38:47
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31064.0 tgs: 58 data_time: 0.69s time: 534.34s eta: 3 days, 13:11:52
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31960.0 tgs: 60 data_time: 0.92s time: 529.93s eta: 3 days, 12:20:49
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31106.0 tgs: 57 data_time: 0.91s time: 537.09s eta: 3 days, 13:20:17
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.201 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31337.0 tgs: 59 data_time: 0.94s time: 529.34s eta: 3 days, 11:57:31
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.204 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31611.0 tgs: 59 data_time: 0.92s time: 535.73s eta: 3 days, 12:49:26
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31573.0 tgs: 58 data_time: 0.63s time: 538.82s eta: 3 days, 13:09:47
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.213 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31897.0 tgs: 59 data_time: 0.99s time: 533.67s eta: 3 days, 12:12:04
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.213 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32058.0 tgs: 60 data_time: 0.89s time: 529.59s eta: 3 days, 11:24:36
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31119.0 tgs: 58 data_time: 0.76s time: 531.65s eta: 3 days, 11:35:14
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.255 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31356.0 tgs: 59 data_time: 0.54s time: 529.54s eta: 3 days, 11:06:31
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.227 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32225.0 tgs: 60 data_time: 1.01s time: 529.50s eta: 3 days, 10:57:16
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.143 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31625.0 tgs: 59 data_time: 0.73s time: 529.81s eta: 3 days, 10:51:24
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31942.0 tgs: 59 data_time: 0.82s time: 534.71s eta: 3 days, 11:28:25
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.154 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32061.0 tgs: 60 data_time: 0.84s time: 529.78s eta: 3 days, 10:33:25
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.193 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 30275.0 tgs: 57 data_time: 0.90s time: 530.08s eta: 3 days, 10:27:22
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.163 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31108.0 tgs: 58 data_time: 0.74s time: 529.93s eta: 3 days, 10:17:13
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.215 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31957.0 tgs: 59 data_time: 0.72s time: 533.34s eta: 3 days, 10:40:04
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.233 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 29801.0 tgs: 56 data_time: 0.90s time: 529.20s eta: 3 days, 9:52:42
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.140 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31102.0 tgs: 58 data_time: 0.76s time: 529.65s eta: 3 days, 9:48:03
+ [XTuner][RANK 55][DP 13][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.160 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32173.0 tgs: 60 data_time: 0.75s time: 530.41s eta: 3 days, 9:46:20
20250121104251/rank58.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.82s
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
228
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
229
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
230
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
231
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
232
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
233
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
234
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
235
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
236
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
237
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
238
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
239
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
240
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
241
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
242
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
243
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
244
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
245
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
246
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
247
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
248
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
249
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
250
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
251
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
252
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
253
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
254
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.63 seconds, peak gpu memory 13.4G
255
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
256
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.300 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 33.1GB text_tokens: 32438.0 tgs: 58 data_time: 2.33s time: 551.28s eta: 3 days, 18:48:29
257
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.225 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31049.0 tgs: 58 data_time: 0.92s time: 529.60s eta: 3 days, 15:05:21
258
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.225 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31400.0 tgs: 59 data_time: 1.01s time: 529.09s eta: 3 days, 14:51:33
259
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.177 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31528.0 tgs: 59 data_time: 0.76s time: 529.95s eta: 3 days, 14:51:12
260
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.201 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31808.0 tgs: 60 data_time: 0.99s time: 529.06s eta: 3 days, 14:33:34
261
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.228 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32312.0 tgs: 60 data_time: 0.78s time: 529.85s eta: 3 days, 14:32:33
262
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.198 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31349.0 tgs: 59 data_time: 0.75s time: 529.63s eta: 3 days, 14:21:30
263
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.297 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31993.0 tgs: 59 data_time: 0.78s time: 535.77s eta: 3 days, 15:12:43
264
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.226 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32365.0 tgs: 60 data_time: 0.70s time: 534.03s eta: 3 days, 14:46:45
265
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.160 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31497.0 tgs: 59 data_time: 0.91s time: 533.28s eta: 3 days, 14:30:34
266
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.189 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31507.0 tgs: 59 data_time: 0.83s time: 529.43s eta: 3 days, 13:44:16
267
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.195 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 40.8GB text_tokens: 29899.0 tgs: 56 data_time: 0.79s time: 529.03s eta: 3 days, 13:31:37
268
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.205 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32129.0 tgs: 60 data_time: 0.87s time: 533.81s eta: 3 days, 14:09:02
269
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 31562.0 tgs: 59 data_time: 0.89s time: 528.94s eta: 3 days, 13:13:04
270
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.240 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 30596.0 tgs: 57 data_time: 0.83s time: 530.39s eta: 3 days, 13:18:17
271
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.158 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31446.0 tgs: 57 data_time: 0.90s time: 543.36s eta: 3 days, 15:14:24
272
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.156 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30935.0 tgs: 57 data_time: 0.62s time: 536.34s eta: 3 days, 13:57:50
273
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.188 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31582.0 tgs: 59 data_time: 0.72s time: 530.95s eta: 3 days, 12:57:07
274
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32037.0 tgs: 60 data_time: 0.79s time: 529.94s eta: 3 days, 12:38:35
275
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.162 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32131.0 tgs: 60 data_time: 0.70s time: 534.35s eta: 3 days, 13:11:54
276
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.162 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32342.0 tgs: 61 data_time: 0.73s time: 529.94s eta: 3 days, 12:20:54
277
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31716.0 tgs: 59 data_time: 0.70s time: 537.08s eta: 3 days, 13:20:08
278
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31956.0 tgs: 60 data_time: 0.95s time: 529.34s eta: 3 days, 11:57:34
279
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.189 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32071.0 tgs: 59 data_time: 1.11s time: 535.73s eta: 3 days, 12:49:27
280
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.184 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31325.0 tgs: 58 data_time: 0.77s time: 538.83s eta: 3 days, 13:09:54
281
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30772.0 tgs: 57 data_time: 1.04s time: 533.67s eta: 3 days, 12:12:05
282
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.221 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31620.0 tgs: 59 data_time: 1.11s time: 529.60s eta: 3 days, 11:24:42
283
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.231 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31970.0 tgs: 60 data_time: 1.08s time: 531.66s eta: 3 days, 11:35:17
284
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.181 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32184.0 tgs: 60 data_time: 0.68s time: 529.52s eta: 3 days, 11:06:16
285
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.177 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31024.0 tgs: 58 data_time: 0.98s time: 529.50s eta: 3 days, 10:57:20
286
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.201 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 40.9GB text_tokens: 30697.0 tgs: 57 data_time: 0.95s time: 529.82s eta: 3 days, 10:51:30
287
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32236.0 tgs: 60 data_time: 0.84s time: 534.70s eta: 3 days, 11:28:21
288
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31376.0 tgs: 59 data_time: 0.73s time: 529.78s eta: 3 days, 10:33:28
289
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31253.0 tgs: 58 data_time: 0.72s time: 530.09s eta: 3 days, 10:27:30
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.182 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 40.1GB text_tokens: 27839.0 tgs: 52 data_time: 0.70s time: 529.93s eta: 3 days, 10:17:11
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.162 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31971.0 tgs: 59 data_time: 0.93s time: 533.32s eta: 3 days, 10:39:51
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.207 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32205.0 tgs: 60 data_time: 0.83s time: 529.20s eta: 3 days, 9:52:46
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30720.0 tgs: 58 data_time: 0.86s time: 529.65s eta: 3 days, 9:48:05
+ [XTuner][RANK 58][DP 14][SP 2][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.154 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32017.0 tgs: 60 data_time: 0.93s time: 530.41s eta: 3 days, 9:46:19
20250121104251/rank60.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:09][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:11][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.82s
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.60 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.246 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.9GB text_tokens: 31861.0 tgs: 57 data_time: 1.74s time: 551.39s eta: 3 days, 18:49:32
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.219 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31345.0 tgs: 59 data_time: 0.62s time: 529.49s eta: 3 days, 15:04:19
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.213 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30897.0 tgs: 58 data_time: 0.81s time: 529.09s eta: 3 days, 14:51:32
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.169 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30727.0 tgs: 57 data_time: 0.82s time: 529.96s eta: 3 days, 14:51:14
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.173 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31993.0 tgs: 60 data_time: 0.56s time: 529.06s eta: 3 days, 14:33:33
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.198 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32528.0 tgs: 61 data_time: 0.72s time: 529.85s eta: 3 days, 14:32:34
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.241 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31985.0 tgs: 60 data_time: 0.58s time: 529.62s eta: 3 days, 14:21:29
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.143 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32411.0 tgs: 60 data_time: 0.77s time: 535.77s eta: 3 days, 15:12:43
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.182 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30816.0 tgs: 57 data_time: 0.67s time: 534.03s eta: 3 days, 14:46:45
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.211 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32133.0 tgs: 60 data_time: 1.05s time: 533.28s eta: 3 days, 14:30:35
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.188 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31934.0 tgs: 60 data_time: 0.76s time: 529.43s eta: 3 days, 13:44:15
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.255 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31332.0 tgs: 59 data_time: 0.82s time: 529.03s eta: 3 days, 13:31:37
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.177 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32182.0 tgs: 60 data_time: 0.81s time: 533.81s eta: 3 days, 14:09:00
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.195 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30812.0 tgs: 58 data_time: 0.56s time: 528.94s eta: 3 days, 13:13:07
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.163 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32580.0 tgs: 61 data_time: 0.66s time: 530.39s eta: 3 days, 13:18:14
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30706.0 tgs: 56 data_time: 0.75s time: 543.37s eta: 3 days, 15:14:26
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.195 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31982.0 tgs: 59 data_time: 0.79s time: 536.34s eta: 3 days, 13:57:50
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.166 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32212.0 tgs: 60 data_time: 0.83s time: 530.95s eta: 3 days, 12:57:07
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31965.0 tgs: 60 data_time: 0.89s time: 529.94s eta: 3 days, 12:38:35
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.176 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31887.0 tgs: 59 data_time: 0.83s time: 534.35s eta: 3 days, 13:11:54
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.148 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31686.0 tgs: 59 data_time: 0.96s time: 529.94s eta: 3 days, 12:20:54
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.145 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31729.0 tgs: 59 data_time: 0.70s time: 537.07s eta: 3 days, 13:20:06
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31873.0 tgs: 60 data_time: 0.74s time: 529.34s eta: 3 days, 11:57:33
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.211 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31838.0 tgs: 59 data_time: 0.76s time: 535.73s eta: 3 days, 12:49:28
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.177 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31923.0 tgs: 59 data_time: 0.85s time: 538.83s eta: 3 days, 13:09:54
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.185 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31822.0 tgs: 59 data_time: 0.67s time: 533.67s eta: 3 days, 12:12:04
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.194 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32115.0 tgs: 60 data_time: 1.12s time: 529.59s eta: 3 days, 11:24:40
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.206 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32433.0 tgs: 61 data_time: 0.81s time: 531.66s eta: 3 days, 11:35:18
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.208 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31453.0 tgs: 59 data_time: 0.89s time: 529.52s eta: 3 days, 11:06:17
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32050.0 tgs: 60 data_time: 0.85s time: 529.51s eta: 3 days, 10:57:20
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32259.0 tgs: 60 data_time: 0.64s time: 529.82s eta: 3 days, 10:51:28
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32000.0 tgs: 59 data_time: 0.69s time: 534.70s eta: 3 days, 11:28:23
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.157 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30717.0 tgs: 57 data_time: 0.73s time: 529.78s eta: 3 days, 10:33:26
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.228 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31683.0 tgs: 59 data_time: 0.81s time: 530.09s eta: 3 days, 10:27:29
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.279 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32425.0 tgs: 61 data_time: 1.07s time: 529.93s eta: 3 days, 10:17:12
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.151 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31127.0 tgs: 58 data_time: 0.62s time: 533.32s eta: 3 days, 10:39:50
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31857.0 tgs: 60 data_time: 0.75s time: 529.20s eta: 3 days, 9:52:45
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31766.0 tgs: 59 data_time: 0.75s time: 529.66s eta: 3 days, 9:48:08
+ [XTuner][RANK 60][DP 15][SP 0][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.198 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32158.0 tgs: 60 data_time: 0.85s time: 530.41s eta: 3 days, 9:46:16
20250121104251/rank63.log ADDED
@@ -0,0 +1,294 @@
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:42:55][INFO] Namespace(llm='/mnt/hwfile/opendatalab/panzhuoshi/huggingface/hub/models--Qwen--Qwen2.5-72B-Instruct/snapshots/d3d951150c1e5848237cd6a7ad11df4836aee842', tokenizer=None, chat_template='qwen2', use_lora=False, lora_targets=None, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_bias='none', dtype='auto', selective_recompute=1.0, shard_strategy='full', cpu_offload=False, sp_size=4, datasets=['/mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2'], dset_file_types=dict_keys(['.jsonl', '.json']), dset_sources=['local'], dset_formats=['openai'], dset_sample_ratios=[1.0], dset_cache_dir='/mnt/petrelfs/caimengzhang/cached_data/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2', dset_pack_level='soft', global_pack=True, max_length=32768, num_workers=1, file_pattern=None, group_by_length=True, mirco_batch_size=1, global_batch_size=64, lr=2e-05, lr_min=6e-06, wd=0.01, max_grad_norm=1, epochs=1, warmup_ratio=0.025, config=None, work_dir='checkpoints/qwen25_72b_inst_base50v2-new-zh-en30w-combinev9-mls-chatbeta2/20250121104251', feishu_webhook=None, gc_interval=100, checkpoint_interval=200000.0, checkpoint_max_keep=1, checkpoint_drop_optimizer=True, log_interval=1, resume=False, seed=0, debug=False)
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:42:56][INFO] Found 8 files in /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:42:59][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_1.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:02][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_2.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:04][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_3.jsonl has 5 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:05][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_4.jsonl has 6 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:07][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_5.jsonl has 2 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:08][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_6.jsonl has 4 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:10][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_7.jsonl has 3 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:12][WARNING] /mnt/petrelfs/caimengzhang/data/20b_data//base50v2-new-zh-en30w-combinev9-mls-chatbeta2/data_part_8.jsonl has 1 prompt length>32768, discard.
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:43:18][INFO] [Dataset & Dataloader] Cost 22.85s
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch (Qwen2ForCausalLM) forward to `qwen2_casual_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.0.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.1.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.2.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.3.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.4.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.5.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.6.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.7.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.8.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
42
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.9.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
43
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
44
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
45
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.10.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
46
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
47
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
48
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.11.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
49
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
50
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
51
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.12.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
52
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
53
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
54
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.13.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
55
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
56
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
57
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.14.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
58
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
59
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
60
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.15.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
61
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
62
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
63
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.16.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
64
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
65
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
66
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:24][DEBUG] Dispatch model.layers.17.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
67
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
68
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
69
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.18.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
70
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
71
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
72
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.19.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
73
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
74
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
75
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.20.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
76
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
77
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
78
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.21.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
79
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
80
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
81
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.22.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
82
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
83
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
84
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.23.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
85
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
86
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
87
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.24.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
88
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
89
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
90
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.25.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
91
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
92
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
93
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.26.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
94
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
95
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
96
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.27.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
97
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
98
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
99
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.28.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
100
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
101
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
102
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.29.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
103
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
104
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
105
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.30.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
106
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
107
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
108
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.31.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
109
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
110
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
111
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.32.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
112
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
113
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
114
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.33.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
115
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
116
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
117
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.34.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
118
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
119
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
120
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.35.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
121
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
122
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
123
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.36.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
124
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
125
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
126
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.37.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
127
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
128
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
129
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.38.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
130
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
131
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
132
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.39.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
133
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
134
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
135
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.40.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
136
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
137
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
138
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.41.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
139
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
140
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
141
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.42.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
142
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
143
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
144
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.43.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
145
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
146
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
147
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.44.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
148
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
149
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
150
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.45.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
151
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
152
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
153
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.46.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
154
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
155
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
156
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.47.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
157
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
158
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
159
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.48.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
160
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
161
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
162
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.49.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
163
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
164
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
165
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.50.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
166
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
167
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
168
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.51.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
169
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
170
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
171
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.52.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
172
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
173
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
174
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.53.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
175
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
176
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
177
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.54.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
178
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
179
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
180
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.55.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
181
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
182
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
183
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.56.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
184
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
185
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
186
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.57.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
187
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
188
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
189
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.58.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
190
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
191
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
192
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.59.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
193
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
194
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
195
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.60.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
196
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
197
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
198
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.61.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
199
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
200
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
201
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.62.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
202
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
203
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
204
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.63.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
205
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
206
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
207
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.64.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
208
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
209
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
210
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.65.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
211
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
212
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
213
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.66.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
214
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
215
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
216
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.67.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
217
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
218
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
219
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.68.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
220
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
221
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
222
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.69.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
223
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
224
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
225
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.70.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
226
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
227
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.71.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.72.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.73.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.74.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.75.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.76.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.77.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.78.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.self_attn(Qwen2FlashAttention2) forward to `qwen2_attn_flash_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.input_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.layers.79.post_attention_layernorm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:44:25][DEBUG] Dispatch model.norm(Qwen2RMSNorm) forward to `rms_norm_forward`
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:46:39][SUCCESS] [Parallelize LLM] Elapsed time 134.32 seconds, peak gpu memory 13.4G
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:46:40][INFO] [Train] Begin Train Loop. The current GPU memory is 4.2GB
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:56:05][INFO] [Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.335 loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 max_memory: 32.9GB text_tokens: 31861.0 tgs: 57 data_time: 1.74s time: 551.31s eta: 3 days, 18:48:46
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:04:55][INFO] [Train] (Epoch 1) Step 2/593 lr: 0.000003 loss: 0.236 loss(reduced): 0.257 grad_norm: 2.31 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31345.0 tgs: 59 data_time: 0.62s time: 529.57s eta: 3 days, 15:05:06
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:13:44][INFO] [Train] (Epoch 1) Step 3/593 lr: 0.000004 loss: 0.225 loss(reduced): 0.236 grad_norm: 1.21 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 30897.0 tgs: 58 data_time: 0.80s time: 529.09s eta: 3 days, 14:51:32
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:22:34][INFO] [Train] (Epoch 1) Step 4/593 lr: 0.000006 loss: 0.209 loss(reduced): 0.212 grad_norm: 0.33 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30727.0 tgs: 57 data_time: 0.81s time: 529.96s eta: 3 days, 14:51:14
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:31:23][INFO] [Train] (Epoch 1) Step 5/593 lr: 0.000007 loss: 0.222 loss(reduced): 0.214 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31993.0 tgs: 60 data_time: 0.57s time: 529.06s eta: 3 days, 14:33:34
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:40:13][INFO] [Train] (Epoch 1) Step 6/593 lr: 0.000009 loss: 0.207 loss(reduced): 0.221 grad_norm: 0.44 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32528.0 tgs: 61 data_time: 0.71s time: 529.85s eta: 3 days, 14:32:34
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:49:03][INFO] [Train] (Epoch 1) Step 7/593 lr: 0.000010 loss: 0.235 loss(reduced): 0.205 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31985.0 tgs: 60 data_time: 0.58s time: 529.63s eta: 3 days, 14:21:30
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 11:57:58][INFO] [Train] (Epoch 1) Step 8/593 lr: 0.000011 loss: 0.210 loss(reduced): 0.232 grad_norm: 0.34 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32411.0 tgs: 60 data_time: 0.76s time: 535.77s eta: 3 days, 15:12:43
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:06:52][INFO] [Train] (Epoch 1) Step 9/593 lr: 0.000013 loss: 0.183 loss(reduced): 0.215 grad_norm: 0.29 if_nan_skip: 0 max_memory: 41.0GB text_tokens: 30816.0 tgs: 57 data_time: 0.67s time: 534.03s eta: 3 days, 14:46:45
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:15:46][INFO] [Train] (Epoch 1) Step 10/593 lr: 0.000014 loss: 0.198 loss(reduced): 0.210 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32133.0 tgs: 60 data_time: 1.05s time: 533.28s eta: 3 days, 14:30:34
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:24:35][INFO] [Train] (Epoch 1) Step 11/593 lr: 0.000016 loss: 0.237 loss(reduced): 0.204 grad_norm: 0.26 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31934.0 tgs: 60 data_time: 0.77s time: 529.42s eta: 3 days, 13:44:13
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:33:24][INFO] [Train] (Epoch 1) Step 12/593 lr: 0.000017 loss: 0.179 loss(reduced): 0.201 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31332.0 tgs: 59 data_time: 0.81s time: 529.04s eta: 3 days, 13:31:39
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:42:18][INFO] [Train] (Epoch 1) Step 13/593 lr: 0.000019 loss: 0.228 loss(reduced): 0.206 grad_norm: 0.25 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32182.0 tgs: 60 data_time: 0.82s time: 533.81s eta: 3 days, 14:09:01
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:51:07][INFO] [Train] (Epoch 1) Step 14/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.205 grad_norm: 0.20 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 30812.0 tgs: 58 data_time: 0.56s time: 528.94s eta: 3 days, 13:13:03
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 12:59:57][INFO] [Train] (Epoch 1) Step 15/593 lr: 0.000020 loss: 0.190 loss(reduced): 0.198 grad_norm: 0.23 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32580.0 tgs: 61 data_time: 0.66s time: 530.39s eta: 3 days, 13:18:17
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:09:01][INFO] [Train] (Epoch 1) Step 16/593 lr: 0.000020 loss: 0.167 loss(reduced): 0.191 grad_norm: 0.22 if_nan_skip: 0 max_memory: 41.2GB text_tokens: 30706.0 tgs: 56 data_time: 0.76s time: 543.36s eta: 3 days, 15:14:24
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:17:57][INFO] [Train] (Epoch 1) Step 17/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.205 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31982.0 tgs: 59 data_time: 0.78s time: 536.35s eta: 3 days, 13:57:52
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:26:48][INFO] [Train] (Epoch 1) Step 18/593 lr: 0.000020 loss: 0.159 loss(reduced): 0.192 grad_norm: 0.18 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32212.0 tgs: 60 data_time: 0.82s time: 530.95s eta: 3 days, 12:57:05
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:35:38][INFO] [Train] (Epoch 1) Step 19/593 lr: 0.000020 loss: 0.171 loss(reduced): 0.192 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31965.0 tgs: 60 data_time: 0.89s time: 529.94s eta: 3 days, 12:38:35
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:44:32][INFO] [Train] (Epoch 1) Step 20/593 lr: 0.000020 loss: 0.225 loss(reduced): 0.198 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31887.0 tgs: 59 data_time: 0.83s time: 534.35s eta: 3 days, 13:11:54
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 13:53:22][INFO] [Train] (Epoch 1) Step 21/593 lr: 0.000020 loss: 0.180 loss(reduced): 0.189 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31686.0 tgs: 59 data_time: 0.96s time: 529.94s eta: 3 days, 12:20:55
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:02:19][INFO] [Train] (Epoch 1) Step 22/593 lr: 0.000020 loss: 0.168 loss(reduced): 0.198 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31729.0 tgs: 59 data_time: 0.69s time: 537.08s eta: 3 days, 13:20:07
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:11:09][INFO] [Train] (Epoch 1) Step 23/593 lr: 0.000020 loss: 0.199 loss(reduced): 0.187 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31873.0 tgs: 60 data_time: 0.73s time: 529.34s eta: 3 days, 11:57:33
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:20:04][INFO] [Train] (Epoch 1) Step 24/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.190 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31838.0 tgs: 59 data_time: 0.76s time: 535.73s eta: 3 days, 12:49:28
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:29:03][INFO] [Train] (Epoch 1) Step 25/593 lr: 0.000020 loss: 0.197 loss(reduced): 0.186 grad_norm: 0.12 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 31923.0 tgs: 59 data_time: 0.84s time: 538.83s eta: 3 days, 13:09:54
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:37:57][INFO] [Train] (Epoch 1) Step 26/593 lr: 0.000020 loss: 0.161 loss(reduced): 0.188 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31822.0 tgs: 59 data_time: 0.68s time: 533.67s eta: 3 days, 12:12:06
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:46:47][INFO] [Train] (Epoch 1) Step 27/593 lr: 0.000020 loss: 0.191 loss(reduced): 0.195 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 32115.0 tgs: 60 data_time: 1.10s time: 529.60s eta: 3 days, 11:24:42
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 14:55:38][INFO] [Train] (Epoch 1) Step 28/593 lr: 0.000020 loss: 0.201 loss(reduced): 0.184 grad_norm: 0.13 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32433.0 tgs: 61 data_time: 0.82s time: 531.66s eta: 3 days, 11:35:16
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:04:28][INFO] [Train] (Epoch 1) Step 29/593 lr: 0.000020 loss: 0.163 loss(reduced): 0.183 grad_norm: 0.17 if_nan_skip: 0 max_memory: 41.3GB text_tokens: 31453.0 tgs: 59 data_time: 0.90s time: 529.52s eta: 3 days, 11:06:16
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:13:17][INFO] [Train] (Epoch 1) Step 30/593 lr: 0.000020 loss: 0.246 loss(reduced): 0.187 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32050.0 tgs: 60 data_time: 0.86s time: 529.51s eta: 3 days, 10:57:22
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:22:07][INFO] [Train] (Epoch 1) Step 31/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.188 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 32259.0 tgs: 60 data_time: 0.65s time: 529.82s eta: 3 days, 10:51:26
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:31:02][INFO] [Train] (Epoch 1) Step 32/593 lr: 0.000020 loss: 0.211 loss(reduced): 0.183 grad_norm: 0.16 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32000.0 tgs: 59 data_time: 0.71s time: 534.70s eta: 3 days, 11:28:23
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:39:52][INFO] [Train] (Epoch 1) Step 33/593 lr: 0.000020 loss: 0.220 loss(reduced): 0.182 grad_norm: 0.15 if_nan_skip: 0 max_memory: 41.1GB text_tokens: 30717.0 tgs: 57 data_time: 0.73s time: 529.78s eta: 3 days, 10:33:28
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:48:42][INFO] [Train] (Epoch 1) Step 34/593 lr: 0.000020 loss: 0.264 loss(reduced): 0.182 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31683.0 tgs: 59 data_time: 0.81s time: 530.08s eta: 3 days, 10:27:24
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 15:57:32][INFO] [Train] (Epoch 1) Step 35/593 lr: 0.000020 loss: 0.150 loss(reduced): 0.182 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 32425.0 tgs: 61 data_time: 1.08s time: 529.94s eta: 3 days, 10:17:16
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 16:06:25][INFO] [Train] (Epoch 1) Step 36/593 lr: 0.000020 loss: 0.114 loss(reduced): 0.182 grad_norm: 0.09 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31127.0 tgs: 58 data_time: 0.62s time: 533.32s eta: 3 days, 10:39:52
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 16:15:14][INFO] [Train] (Epoch 1) Step 37/593 lr: 0.000020 loss: 0.206 loss(reduced): 0.177 grad_norm: 0.11 if_nan_skip: 0 max_memory: 41.6GB text_tokens: 31857.0 tgs: 60 data_time: 0.75s time: 529.20s eta: 3 days, 9:52:46
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 16:24:04][INFO] [Train] (Epoch 1) Step 38/593 lr: 0.000020 loss: 0.173 loss(reduced): 0.183 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.5GB text_tokens: 31766.0 tgs: 59 data_time: 0.75s time: 529.65s eta: 3 days, 9:48:07
+ [XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 16:32:54][INFO] [Train] (Epoch 1) Step 39/593 lr: 0.000020 loss: 0.192 loss(reduced): 0.175 grad_norm: 0.10 if_nan_skip: 0 max_memory: 41.4GB text_tokens: 32158.0 tgs: 60 data_time: 0.85s time: 530.41s eta: 3 days, 9:46:16
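Each `[Train]` step record above packs loss, throughput (`tgs`) and ETA into a single line. A minimal stdlib sketch for pulling those fields out (the regex and the `parse_step_line` helper are hypothetical, written against the exact field layout shown in these logs):

```python
import re

# Hypothetical parser for XTuner "[Train] ... Step N/M ..." lines.
# Field names mirror the log text; assumes the layout shown in these logs.
STEP_RE = re.compile(
    r"Step (?P<step>\d+)/(?P<total>\d+) "
    r"lr: (?P<lr>[\d.]+) "
    r"loss: (?P<loss>[\d.]+).*?"
    r"tgs: (?P<tgs>\d+).*?"
    r"eta: (?P<eta>.+)$"
)

def parse_step_line(line: str) -> dict:
    m = STEP_RE.search(line)
    if m is None:
        raise ValueError("not a training-step line")
    d = m.groupdict()
    return {
        "step": int(d["step"]),
        "total": int(d["total"]),
        "lr": float(d["lr"]),
        "loss": float(d["loss"]),
        "tgs": int(d["tgs"]),
        "eta": d["eta"],
    }

# Sample taken verbatim from the Step 1 record above.
line = ("[XTuner][RANK 63][DP 15][SP 3][TP 0][2025-01-21 10:56:05][INFO] "
        "[Train] (Epoch 1) Step 1/593 lr: 0.000001 loss: 0.335 "
        "loss(reduced): 0.273 grad_norm: 2.54 if_nan_skip: 0 "
        "max_memory: 32.9GB text_tokens: 31861.0 tgs: 57 data_time: 1.74s "
        "time: 551.31s eta: 3 days, 18:48:46")
rec = parse_step_line(line)
```

Fed every `rank*.log` in a run directory, this kind of parser is enough to plot loss or throughput per step.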
20250121165312/hf-593/generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.45.1"
+ }
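These are the sampling settings `transformers` reads at generation time (typically via `GenerationConfig.from_pretrained`); note that `eos_token_id` can be a single id or, as here, a list. A small stdlib-only sketch (the `is_eos` helper is hypothetical) showing how a stopping check has to normalize that field:

```python
import json

# The generation_config.json above, inlined for illustration.
cfg = json.loads("""{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.45.1"
}""")

def is_eos(token_id: int, cfg: dict) -> bool:
    """eos_token_id may be an int or a list of ints; normalize before checking."""
    eos = cfg["eos_token_id"]
    ids = eos if isinstance(eos, list) else [eos]
    return token_id in ids
```

Listing both `<|im_end|>` (151645) and `<|endoftext|>` (151643) means decoding stops at either token, which matches the chat-style fine-tune in this run.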
20250121165312/hf-593/tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
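The `chat_template` field is a Jinja2 template that `tokenizer.apply_chat_template` renders into the model's prompt format. A simplified pure-Python approximation of its no-tools ChatML branch (the `render_chatml` helper is hypothetical and ignores tool calls entirely):

```python
# Simplified approximation of the no-tools branch of the chat_template above:
# plain ChatML framing with the default Qwen system prompt. The real template
# is Jinja2 and also handles <tool_call>/<tool_response> turns.
DEFAULT_SYSTEM = ("You are Qwen, created by Alibaba Cloud. "
                  "You are a helpful assistant.")

def render_chatml(messages, add_generation_prompt=True):
    parts = []
    # The template injects a default system turn when none is supplied.
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Hi"}])
```

Since `eos_token` is `<|im_end|>`, generation naturally stops at the end of the assistant turn this prompt opens.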
20250121165312/rank11.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank12.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank15.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank16.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank19.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank2.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank20.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank25.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank26.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank3.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank30.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank33.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank38.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank42.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank43.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank45.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank48.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank49.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank5.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank51.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank55.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank58.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank6.log ADDED
The diff for this file is too large to render. See raw diff
 
20250121165312/rank63.log ADDED
The diff for this file is too large to render. See raw diff