YuchenLi01 committed on
Commit 5771919 · verified · 1 parent: 21d4f75

Training in progress, step 2

logs/amlt_code_runner.txt CHANGED
@@ -1,22 +1,22 @@
- 2025-03-01 06:51:57,456:amlt-code-runner:INFO - SINGULARITY_LOCATION: westus2
- 2025-03-01 06:51:57,456:amlt-code-runner:INFO - AISC_INSTANCE_TYPE: Singularity.NC96ad_A100_v4
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - Not removing AzureML's cd commands from /etc/profile due to an error: [Errno 13] Permission denied: '/etc/profile'
- 2025-03-01 06:51:58,735:amlt-code-runner:WARNING - Environment variable 'NCCL_SOCKET_IFNAME' already set to 'eth0', not changing to '^docker0,lo'
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - RANK = 0
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - LOCAL_RANK = None
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - WORLD_SIZE = 1
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - MASTER_ADDR = node-0
- 2025-03-01 06:51:58,735:amlt-code-runner:INFO - MASTER_PORT = 9500
- 2025-03-01 06:51:58,737:amlt-code-runner:WARNING - Installing amlt runtime dependencies: ['wrapt', 'azure-identity', 'python-dateutil', 'pytz'] into /tmp/amlt-user-base
- 2025-03-01 06:51:59,965:amlt-code-runner:INFO - Setting WANDB_RUN_ID to 'kind_onion_8sfvnlnwfk_0'
- 2025-03-01 06:51:59,965:amlt-code-runner:INFO - Expanding HyperDrive arguments into /tmp/amlt_run_hd.sh
- 2025-03-01 06:52:00,311:amlt-code-runner:INFO - Parsing tracking uri /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws
- 2025-03-01 06:52:00,311:amlt-code-runner:INFO - Tracking uri /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws has sub id 2aac527a-de5a-4fe3-95e9-5c8b9d48ed62, resource group cyrilzhang, and workspace cyrilzhangws
- 2025-03-01 06:52:00,311:aml_token_auth:WARNING - The AzureMLTokenAuthentication created will not be updated due to missing params. The token expires on 2025-03-20 19:29:14.
+ 2025-03-02 16:04:22,406:amlt-code-runner:INFO - SINGULARITY_LOCATION: eastus
+ 2025-03-02 16:04:22,406:amlt-code-runner:INFO - AISC_INSTANCE_TYPE: Singularity.NC96ad_A100_v4
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - Not removing AzureML's cd commands from /etc/profile due to an error: [Errno 13] Permission denied: '/etc/profile'
+ 2025-03-02 16:04:23,711:amlt-code-runner:WARNING - Environment variable 'NCCL_SOCKET_IFNAME' already set to 'eth0', not changing to '^docker0,lo'
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - RANK = 0
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - LOCAL_RANK = None
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - WORLD_SIZE = 1
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - MASTER_ADDR = node-0
+ 2025-03-02 16:04:23,711:amlt-code-runner:INFO - MASTER_PORT = 9500
+ 2025-03-02 16:04:23,713:amlt-code-runner:WARNING - Installing amlt runtime dependencies: ['wrapt', 'azure-identity', 'python-dateutil', 'pytz'] into /tmp/amlt-user-base
+ 2025-03-02 16:04:24,867:amlt-code-runner:INFO - Setting WANDB_RUN_ID to 'kind_onion_8sfvnlnwfk_8'
+ 2025-03-02 16:04:24,867:amlt-code-runner:INFO - Expanding HyperDrive arguments into /tmp/amlt_run_hd.sh
+ 2025-03-02 16:04:25,173:amlt-code-runner:INFO - Parsing tracking uri /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws
+ 2025-03-02 16:04:25,174:amlt-code-runner:INFO - Tracking uri /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws has sub id 2aac527a-de5a-4fe3-95e9-5c8b9d48ed62, resource group cyrilzhang, and workspace cyrilzhangws
+ 2025-03-02 16:04:25,174:aml_token_auth:WARNING - The AzureMLTokenAuthentication created will not be updated due to missing params. The token expires on 2025-03-20 19:29:14.
 
- 2025-03-01 06:52:00,313:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eastus.api.azureml.ms:443
- 2025-03-01 06:52:00,700:urllib3.connectionpool:DEBUG - https://eastus.api.azureml.ms:443 "POST /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws/api/2.0/mlflow/runs/set-tag HTTP/11" 403 0
- 2025-03-01 06:52:00,700:amlt-code-runner:WARNING - Failed to rename job according to the amulet job name template. Run 'amlt list' client side to set the display name according to the amulet job template name. The error we encountered was: Failed to update display name:
- 2025-03-01 06:52:00,724:amlt-code-runner:INFO - Executing ./amlt_setup.sh, /tmp/amlt_run_hd.sh
- 2025-03-01 06:52:00,794:background_dirsync:INFO - Starting directory syncer from '/scratch/amlt_code/outputs' to '/mnt/output/projects/lmpref/amlt-results/kind_onion_8sfvnlnwfk_0', every 30.000000s
- 2025-03-01 06:52:00,797:background_dirsync:INFO - Starting directory syncer from '/scratch/azureml/cr/j/0dbe7baf319148b0a4f7cd7a47d94a60/exe/wd/logs' to '/scratch/amlt_code/outputs/logs', every 30.000000s
+ 2025-03-02 16:04:25,176:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): eastus.api.azureml.ms:443
+ 2025-03-02 16:04:25,236:urllib3.connectionpool:DEBUG - https://eastus.api.azureml.ms:443 "POST /mlflow/v1.0/subscriptions/2aac527a-de5a-4fe3-95e9-5c8b9d48ed62/resourceGroups/cyrilzhang/providers/Microsoft.MachineLearningServices/workspaces/cyrilzhangws/api/2.0/mlflow/runs/set-tag HTTP/11" 403 0
+ 2025-03-02 16:04:25,236:amlt-code-runner:WARNING - Failed to rename job according to the amulet job name template. Run 'amlt list' client side to set the display name according to the amulet job template name. The error we encountered was: Failed to update display name:
+ 2025-03-02 16:04:25,258:amlt-code-runner:INFO - Executing ./amlt_setup.sh, /tmp/amlt_run_hd.sh
+ 2025-03-02 16:04:25,324:background_dirsync:INFO - Starting directory syncer from '/scratch/amlt_code/outputs' to '/mnt/output/projects/lmpref/amlt-results/kind_onion_8sfvnlnwfk_8', every 30.000000s
+ 2025-03-02 16:04:25,327:background_dirsync:INFO - Starting directory syncer from '/scratch/azureml/cr/j/e98bae802613438391006af4bb841ebd/exe/wd/logs' to '/scratch/amlt_code/outputs/logs', every 30.000000s
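The removed and added log lines differ mainly in timestamps and a few run-specific values (region, WANDB_RUN_ID suffix, job directories). A minimal Python sketch for pulling the fields out of lines like these, assuming the layout `<YYYY-MM-DD HH:MM:SS,mmm>:<logger>:<LEVEL> - <message>`; that layout is inferred from the lines above, not taken from amlt documentation, and `parse_line` and `changed_messages` are hypothetical helpers.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Assumed layout, inferred from the log lines above (not from amlt docs):
#   "<YYYY-MM-DD HH:MM:SS,mmm>:<logger>:<LEVEL> - <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r":(?P<logger>[^:]+):(?P<level>[A-Z]+) - (?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    timestamp: datetime
    logger: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    # "%f" accepts the 3-digit millisecond field and right-pads it to microseconds.
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return LogRecord(ts, m["logger"], m["level"], m["msg"])


def changed_messages(
    old_lines: Iterable[str], new_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Pair lines positionally and yield (old, new) messages that differ,
    ignoring the timestamp prefix."""
    for old, new in zip(old_lines, new_lines):
        a, b = parse_line(old), parse_line(new)
        if a is not None and b is not None and a.message != b.message:
            yield a.message, b.message


rec = parse_line(
    "2025-03-02 16:04:22,406:amlt-code-runner:INFO - SINGULARITY_LOCATION: eastus"
)
print(rec.logger, rec.level)  # amlt-code-runner INFO
```

Matching the logger as "everything up to the next colon" works here because none of the logger names in this log (`amlt-code-runner`, `aml_token_auth`, `urllib3.connectionpool`, `background_dirsync`) contain a colon, while messages such as URLs with `:443` are captured whole by the final group.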
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:630b856300ae45b2dc44b511ae21517df925530a84d058cc61eb786921431f86
+ oid sha256:0e30df2a90b7f2d7d1e2122918db1a0850dda71d0bc0c53cd620fa838bc7e3fa
  size 4943162336
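Each `.safetensors` entry is a Git LFS v1 pointer rather than the weights themselves: a `version` line, an `oid` line naming the hash algorithm and the hex digest of the real file, and a `size` line in bytes. In this commit the `oid` changes while `size` stays the same. A minimal Python sketch of reading such a pointer; `LfsPointer` and `parse_lfs_pointer` are hypothetical names, and only `sha256` oids are accepted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str   # spec URL from the "version" line
    oid_algo: str  # hash algorithm named in the "oid" line, e.g. "sha256"
    oid_hex: str   # hex digest of the real file's contents
    size: int      # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS v1 pointer made of "key value" lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Key and value are separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"unexpected oid: {fields['oid']!r}")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0e30df2a90b7f2d7d1e2122918db1a0850dda71d0bc0c53cd620fa838bc7e3fa\n"
    "size 4943162336\n"
)
print(new_pointer.size)  # 4943162336
```

Summing the `size` lines of the three model shards gives 4,943,162,336 + 4,999,819,336 + 4,540,516,344 = 14,483,498,016 bytes (about 14.5 GB) of safetensors weights per checkpoint.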
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d41c737b7b9ba77f20e82cc830410e3fae5b592f84407aaba0a5ef8d89d1054c
+ oid sha256:1d6a6f638568e44a83093f14f6f0d9b542eae3043cd855f7e8a83490d0d08f3c
  size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8011c92e680b285561e96a2359d759e5ce309f5bad32778216db3244d5862dae
+ oid sha256:20bf231ec19175b143043bfce5c11db21800789da91b31aab1790f6f6d822c1d
  size 4540516344
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a36eb6931628f55ea1ca86d9290f057cc31180ab0b9cf9f0d4836052c9aaed87
+ oid sha256:0e179a67bc9e9c730f8a78d87169fb9714c0050f18e51176b5b183038ce88196
  size 7736
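Because a Git LFS `oid` is the SHA-256 of the file's contents and `size` is its byte length, a downloaded file can be checked against its pointer. A minimal Python sketch, assuming a hypothetical local checkout where the real `training_args.bin` has been fetched; `verify_against_pointer` is a hypothetical helper, not part of any Git LFS or Hugging Face tool.

```python
import hashlib
import os


def verify_against_pointer(
    path: str, oid_hex: str, size: int, chunk_size: int = 1 << 20
) -> bool:
    """Return True if the file at `path` matches an LFS pointer's size and sha256 oid."""
    # Cheap check first: the byte length must match the pointer's "size" line.
    if os.path.getsize(path) != size:
        return False
    # Stream the file in chunks so multi-GB shards are not loaded into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == oid_hex


# Hypothetical usage against the new training_args.bin pointer in this commit:
# verify_against_pointer(
#     "training_args.bin",
#     "0e179a67bc9e9c730f8a78d87169fb9714c0050f18e51176b5b183038ce88196",
#     7736,
# )
```

Checking the size before hashing lets a truncated or wrong file fail immediately, which matters for the roughly 5 GB model shards.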