File size: 135,121 Bytes
commit c3b20da
import os
import sys
with open(sys.argv[0]) as f:
    code = f.read() # read the code of this file ASAP, for logging
import uuid
import time
import copy
import glob
from dataclasses import dataclass
from functools import lru_cache, partial # Added partial for hook registration
from pathlib import Path
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch
torch.empty(1, device="cuda", requires_grad=True).backward() # prevents a bug on some systems
from torch import Tensor, nn, autocast
import torch.nn.functional as F
import torch.distributed as dist
# use of FlexAttention contributed by @KoszarskyB
from torch.nn.attention.flex_attention import BlockMask, flex_attention
#torch._inductor.config.coordinate_descent_tuning = True # we have banned this flag for new records because it causes compilation to take 30min
#import wandb
# -----------------------------------------------------------------------------
# Custom operators: FP8 matmul by @YouJiacheng
@torch.library.custom_op("nanogpt::mm", mutates_args=())
def mm_op(x: Tensor, w: Tensor, x_s: float, w_s: float, grad_s: float) -> tuple[Tensor, Tensor, Tensor]:
    @torch.compile
    def impl(x: Tensor, w: Tensor):
        assert x.is_contiguous() and w.is_contiguous()
        x_f8 = x.div(x_s).to(torch.float8_e4m3fn)
        w_f8 = w.div(w_s).to(torch.float8_e4m3fn)
        out = torch._scaled_mm(
            x_f8,
            w_f8.T,
            out_dtype=torch.bfloat16,
            scale_a=x.new_tensor(x_s, dtype=torch.float32),
            scale_b=x.new_tensor(w_s, dtype=torch.float32),
            use_fast_accum=True,
        )
        return out, x_f8, w_f8
    return impl(x, w)
@mm_op.register_fake
def _(x: Tensor, w: Tensor, *_):
    assert x.ndim == w.ndim == 2
    assert x.shape[1] == w.shape[1]
    assert x.device == w.device
    assert x.is_contiguous() and w.is_contiguous()
    return x @ w.T, x.to(torch.float8_e4m3fn), w.to(torch.float8_e4m3fn)
@torch.library.custom_op("nanogpt::mm_backward", mutates_args=())
def mm_backward_op(g: Tensor, x_f8: Tensor, w_f8: Tensor, x_s: float, w_s: float, grad_s: float) -> tuple[Tensor, Tensor]:
    @torch.compile
    def impl(grad: Tensor, x_f8: Tensor, w_f8: Tensor):
        assert grad.is_contiguous()
        x_inv_s = grad.new_tensor(x_s, dtype=torch.float32)
        w_inv_s = grad.new_tensor(w_s, dtype=torch.float32)
        grad_inv_s = grad.new_tensor(grad_s, dtype=torch.float32)
        grad_f8 = grad.div(grad_s).to(torch.float8_e5m2)
        grad_x = torch._scaled_mm(
            grad_f8,
            w_f8.T.contiguous().T,
            out_dtype=torch.bfloat16,
            scale_a=grad_inv_s,
            scale_b=w_inv_s,
            use_fast_accum=False,
        )
        # faster than grad_f8_t @ x_f8, for (d_out, d_in) == (50304, 768)
        grad_w = torch._scaled_mm(
            x_f8.T.contiguous(),
            grad_f8.T.contiguous().T,
            out_dtype=torch.float32,
            scale_a=x_inv_s,
            scale_b=grad_inv_s,
            use_fast_accum=False,
        ).T
        return grad_x, grad_w
    return impl(g, x_f8, w_f8)
@mm_backward_op.register_fake
def _(g: Tensor, x_f8: Tensor, w_f8: Tensor, *_):
    return x_f8.to(torch.bfloat16), w_f8.T.contiguous().T.to(torch.float32)
def backward(ctx, grad_out: Tensor, *_):
    x_f8, w_f8 = ctx.saved_tensors
    x_s, w_s, grad_s = ctx.scales
    grad_x, grad_w = torch.ops.nanogpt.mm_backward(
        grad_out, x_f8, w_f8, x_s, w_s, grad_s
    )
    return grad_x, grad_w, None, None, None
def setup_context(ctx: torch.autograd.function.FunctionCtx, inputs, output):
    *_, x_s, w_s, grad_s = inputs
    _, x_f8, w_f8 = output
    ctx.save_for_backward(x_f8, w_f8)
    ctx.scales = x_s, w_s, grad_s
    ctx.set_materialize_grads(False)
mm_op.register_autograd(backward, setup_context=setup_context)
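# Illustrative usage sketch (comments only, not executed as part of training):
# dividing by x_s/w_s before the FP8 cast and passing the same values as
# scale_a/scale_b to torch._scaled_mm makes the two scalings cancel, so
#   out, x_f8, w_f8 = torch.ops.nanogpt.mm(x, w, x_s, w_s, grad_s)
# returns out ~= x @ w.T in bfloat16, plus the FP8 casts that the registered
# backward reuses to avoid re-quantizing x and w.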
# -----------------------------------------------------------------------------
# Muon optimizer
@torch.compile(mode="reduce-overhead", fullgraph=True, dynamic=False)
def zeropower_via_newtonschulz5(G: Tensor, steps: int) -> Tensor:
    """
    Newton-Schulz iteration to compute the zeroth power / orthogonalization of G. We opt to use a
    quintic iteration whose coefficients are selected to maximize the slope at zero. For the purpose
    of minimizing steps, it turns out to be empirically effective to keep increasing the slope at
    zero even beyond the point where the iteration no longer converges all the way to one everywhere
    on the interval. This iteration therefore does not produce UV^T but rather something like US'V^T
    where S' is diagonal with S_{ii}' ~ Uniform(0.5, 1.5), which turns out not to hurt model
    performance at all relative to UV^T, where USV^T = G is the SVD.
    """
    assert G.ndim >= 2 # batched Muon implementation by @scottjmaddox, and put into practice in the record by @YouJiacheng
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G.bfloat16()
    if G.size(-2) > G.size(-1):
        X = X.mT
    # Ensure spectral norm is at most 1
    X = X / (X.norm(dim=(-2, -1), keepdim=True) + 1e-7)
    # Perform the NS iterations
    for _ in range(steps):
        A = X @ X.mT
        B = b * A + c * A @ A # quintic computation strategy adapted from suggestion by @jxbz, @leloykun, and @YouJiacheng
        X = a * X + B @ X
    if G.size(-2) > G.size(-1):
        X = X.mT
    return X.type_as(G)
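# Sketch of the property this buys us (comments only): for G with SVD U S V^T,
# zeropower_via_newtonschulz5(G, 5) approximates U S' V^T with the entries of S'
# roughly in (0.5, 1.5), i.e. a cheap, bfloat16-stable stand-in for the exact
# orthogonalization U V^T described in the docstring above.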
class Muon(torch.optim.Optimizer):
    """
    Muon - MomentUm Orthogonalized by Newton-schulz
    https://kellerjordan.github.io/posts/muon/
    Muon internally runs standard SGD-momentum, and then performs an orthogonalization post-
    processing step, in which each 2D parameter's update is replaced with the nearest orthogonal
    matrix. To efficiently orthogonalize each update, we use a Newton-Schulz iteration, which has
    the advantage that it can be stably run in bfloat16 on the GPU.
    Warning: This optimizer should not be used for the embedding layer, the final fully connected layer,
    or any {0,1}-D parameters; those should all be optimized by a standard method (e.g., AdamW).
    """
    def __init__(self, params, lr=0.02, weight_decay=0.01, momentum=0.95, rank=0, world_size=1):
        self.rank = rank
        self.world_size = world_size
        defaults = dict(lr=lr, weight_decay=weight_decay, momentum=momentum)
        params = list(params)
        sizes = {p.shape for p in params}
        # create one param group per unique parameter shape
        param_groups = []
        for size in sizes:
            group_params = [p for p in params if p.shape == size]
            param_groups.append(dict(params=group_params))
        super().__init__(param_groups, defaults)
    @torch.no_grad()
    def step(self):
        futures: list[torch.Future] = []
        reduce_scatter_futures: list[torch.Future] = []
        for group in self.param_groups:
            params: list[Tensor] = group["params"]
            grad = torch.empty_like(params[-1])
            grad_pad = [param.grad for param in params] + [torch.zeros_like(params[-1])] * self.world_size
            for base_i in range(0, len(params), self.world_size):
                if base_i + self.rank < len(params):
                    grad = params[base_i + self.rank].grad
                # This gives strange dynamo warnings
                reduce_scatter_futures.append(dist.reduce_scatter(grad, grad_pad[base_i:base_i + self.world_size], op=dist.ReduceOp.AVG, async_op=True).get_future())
        idx = 0
        for group in self.param_groups:
            params: list[Tensor] = group["params"]
            params_pad = params + [torch.empty_like(params[-1])] * self.world_size
            momentum = group["momentum"]
            for base_i in range(0, len(params), self.world_size):
                reduce_scatter_futures[idx].wait()
                if base_i + self.rank < len(params):
                    p = params[base_i + self.rank]
                    grad = p.grad
                    eff_lr = group["lr"] * max(1, p.size(-2) / p.size(-1)) ** 0.5 * getattr(p, "lr_mul", 1.0)
                    eff_weight_decay = group["lr"] * group["weight_decay"] * getattr(p, "wd_mul", 1.0)
                    state = self.state[p]
                    if len(state) == 0:
                        state["momentum_buffer"] = torch.zeros_like(grad)
                    momentum_buffer = state["momentum_buffer"]
                    p.mul_(1 - eff_weight_decay)
                    momentum_buffer.lerp_(grad, 1 - momentum)
                    grad = grad.lerp_(momentum_buffer, momentum)
                    v = zeropower_via_newtonschulz5(grad, 5)
                    p.add_(other=v, alpha=-eff_lr)
                idx += 1
                futures.append(dist.all_gather(params_pad[base_i:base_i + self.world_size], params_pad[base_i + self.rank], async_op=True).get_future())
        # TODO: check whether this final wait can be removed safely
        torch.futures.collect_all(futures).wait()
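# Note on the update scale above: eff_lr multiplies the base lr by
# max(1, rows/cols) ** 0.5, so tall matrices (e.g. the 3072x768 MLP input
# projection, factor 2) take proportionally larger steps than square ones.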
class DistAdam(torch.optim.Optimizer):
    def __init__(self, params, lr: float = 1e-3, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-8, weight_decay: float = 0.01, rank: int = 0, world_size: int = 1):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        params = list(params)
        sizes = {p.shape for p in params}
        self.rank = rank
        self.world_size = world_size
        # create one param group per unique parameter shape
        param_groups = []
        for size in sizes:
            group_params = [p for p in params if p.shape == size]
            param_groups.append(dict(
                params=group_params,
            ))
        super().__init__(param_groups, defaults)
    @torch.no_grad()
    def step(self):
        futures: list[torch.Future] = []
        reduce_scatter_futures: list[torch.Future] = []
        grad_slices = []
        for group in self.param_groups:
            params: list[Tensor] = group["params"]
            grad = torch.empty_like(params[-1])
            for base_i in range(len(params)):
                grad = params[base_i].grad
                rank_size = grad.shape[0] // self.world_size
                grad_slice = torch.empty_like(grad[:rank_size])
                reduce_scatter_futures.append(dist.reduce_scatter_tensor(grad_slice, grad, op=dist.ReduceOp.AVG, async_op=True).get_future())
                grad_slices.append(grad_slice)
        idx = 0
        for group in self.param_groups:
            beta1, beta2 = group['betas']
            eps = group['eps']
            wd = group['weight_decay']
            params = group['params']
            for base in range(len(params)):
                reduce_scatter_futures[idx].wait()
                p = params[base]
                rank_size = p.shape[0] // self.world_size
                p_slice = p[self.rank * rank_size:(self.rank + 1) * rank_size]
                lr = group['lr'] * getattr(p, "lr_mul", 1.0)
                state = self.state[p]
                g_slice = grad_slices[idx]
                # State init
                if not state:
                    state['step'] = torch.tensor(0, dtype=torch.int64, device=p.device)
                    state['exp_avg'] = torch.zeros_like(p_slice)
                    state['exp_avg_sq'] = torch.zeros_like(p_slice)
                exp_avg = state['exp_avg']
                exp_avg_sq = state['exp_avg_sq']
                state['step'] += 1
                t = state['step']
                # weight decay
                if wd != 0:
                    eff_weight_decay = lr * wd * getattr(p, "wd_mul", 1.0)
                    p_slice.mul_(1 - eff_weight_decay)
                # update running averages
                exp_avg.mul_(beta1).add_(g_slice, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(g_slice, g_slice, value=1 - beta2)
                # bias corrections
                bias1 = 1 - beta1 ** t
                bias2 = 1 - beta2 ** t
                # compute step
                denom = exp_avg_sq.sqrt().add_(eps)
                step_size = lr * (torch.sqrt(bias2) / bias1)
                update = exp_avg.div(denom).mul_(step_size)
                p_slice.add_(other=update, alpha=-1.0)
                idx += 1
                futures.append(dist.all_gather_into_tensor(p, p_slice, async_op=True).get_future())
        # TODO: check whether this final wait can be removed safely
        torch.futures.collect_all(futures).wait()
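# Sharding scheme used by the two optimizers above: DistAdam reduce-scatters each
# gradient along dim 0 so every rank owns (and keeps Adam state for) a
# 1/world_size slice, then all-gathers the updated weights; Muon instead
# round-robins whole same-shape parameters across ranks and all-gathers the
# updated parameters afterwards.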
# -----------------------------------------------------------------------------
# PyTorch nn.Module definitions for the model
def norm(x: Tensor):
    return F.rms_norm(x, (x.size(-1),))
class CastedLinear(nn.Linear):
    def __init__(self, in_features: int, out_features: int, use_fp8=False, x_s=1.0, w_s=1.0, grad_s=1.0):
        super().__init__(in_features, out_features, bias=False)
        self.use_fp8 = use_fp8
        self.x_s = x_s
        self.w_s = w_s
        self.grad_s = grad_s
    def reset_parameters(self) -> None:
        std = 0.5 * (self.in_features ** -0.5) # 0.5 is a bit better than the default 1/sqrt(3)
        bound = (3 ** 0.5) * std
        with torch.no_grad():
            self.weight.uniform_(-bound, bound)
    def forward(self, x: Tensor):
        if self.use_fp8 and self.training:
            _x = x.flatten(0, -2)
            out: Tensor = torch.ops.nanogpt.mm(_x, self.weight, x_s=self.x_s, w_s=self.w_s, grad_s=self.grad_s)[0]
            return out.reshape(*x.shape[:-1], -1)
        else:
            return F.linear(x, self.weight)
class Rotary(nn.Module):
    def __init__(self, dim: int, max_seq_len: int):
        super().__init__()
        # half-truncate RoPE by @YouJiacheng (w/ base freq tuning)
        angular_freq = (1 / 1024) ** torch.linspace(0, 1, steps=dim//4, dtype=torch.float32)
        angular_freq = torch.cat([angular_freq, angular_freq.new_zeros(dim//4)])
        t = torch.arange(max_seq_len, dtype=torch.float32)
        theta = torch.einsum("i,j -> ij", t, angular_freq)
        self.cos = nn.Buffer(theta.cos(), persistent=False)
        self.sin = nn.Buffer(theta.sin(), persistent=False)
    def forward(self, x_BTHD: Tensor):
        assert self.cos.size(0) >= x_BTHD.size(-3)
        cos, sin = self.cos[None, :x_BTHD.size(-3), None, :], self.sin[None, :x_BTHD.size(-3), None, :]
        x1, x2 = x_BTHD.to(dtype=torch.float32).chunk(2, dim=-1)
        y1 = x1 * cos + x2 * sin
        y2 = x1 * (-sin) + x2 * cos
        return torch.cat((y1, y2), 3).type_as(x_BTHD)
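# Why half of angular_freq is zeroed ("half-truncate RoPE"): for those channels
# theta = 0, so cos = 1 and sin = 0 and the rotation above is the identity; only
# the first dim//4 frequency pairs actually rotate with position.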
class CausalSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, max_seq_len: int, head_dim=128):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        hdim = num_heads * head_dim
        std = 0.5 * (dim ** -0.5)
        bound = (3 ** 0.5) * std # improved init scale by @YouJiacheng
        # merged QKV weights: suggested by many, implemented by @fernbear.bsky.social, and further improved by @YouJiacheng
        # https://x.com/hi_tysam/status/1879699187107033311
        self.qkv_w = nn.Parameter(torch.empty(3, hdim, dim).uniform_(-bound, bound))
        self.rotary = Rotary(head_dim, max_seq_len)
        self.c_proj = CastedLinear(hdim, dim)
        self.c_proj.weight.detach().zero_() # zero init suggested by @Grad62304977
        # scale the attention logits by given constant, instead of the default head_dim**-0.5, by @leloykun
        # inspired by learnable scalars used by @brendanh0gan https://x.com/hi_tysam/status/1879693583898591283
        self.attn_scale = 0.12
    def forward(self, x: Tensor, ve: Tensor | None, lambdas: Tensor, block_mask: BlockMask):
        B, T = x.size(0), x.size(1) # batch size, sequence length
        assert B == 1, "Must use batch size = 1 for FlexAttention"
        q, k, v = F.linear(x, self.qkv_w.flatten(end_dim=1)).view(B, T, 3 * self.num_heads, self.head_dim).chunk(3, dim=-2)
        q, k = norm(q), norm(k) # QK norm @Grad62304977
        q, k = self.rotary(q), self.rotary(k)
        if ve is not None:
            v = lambdas[0] * v + lambdas[1] * ve.view_as(v) # @KoszarskyB & @Grad62304977
        else: # skip mid-layers token value embeddings by @YouJiacheng
            v = lambdas[0] * v
        y = flex_attention(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), block_mask=block_mask, scale=self.attn_scale).transpose(1, 2)
        y = y.contiguous().view(B, T, self.num_heads * self.head_dim) # re-assemble all head outputs side by side
        y = self.c_proj(y)
        return y
class MLP(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        hdim = 4 * dim
        self.c_fc = CastedLinear(dim, hdim)
        self.c_proj = CastedLinear(hdim, dim)
        self.c_proj.weight.detach().zero_() # zero init suggested by @Grad62304977
    def forward(self, x: Tensor):
        x = self.c_fc(x)
        x = F.relu(x).square() # https://arxiv.org/abs/2109.08668v2; ~1-2% better than GELU; suggested by @SKYLINEZ007 and @Grad62304977
        x = self.c_proj(x)
        return x
class Block(nn.Module):
    def __init__(self, dim: int, num_heads: int, max_seq_len: int, layer_idx: int):
        super().__init__()
        # skip attention of blocks.7 (the 8th layer) by @YouJiacheng
        self.attn = CausalSelfAttention(dim, num_heads, max_seq_len) if layer_idx != 7 else None
        self.mlp = MLP(dim)
    def forward(self, x: Tensor, ve: Tensor | None, x0: Tensor, lambdas: Tensor, sa_lambdas: Tensor, block_mask: BlockMask):
        x = lambdas[0] * x + lambdas[1] * x0
        if self.attn is not None:
            x = x + self.attn(norm(x), ve, sa_lambdas, block_mask)
        x = x + self.mlp(norm(x))
        return x
# -----------------------------------------------------------------------------
# The main model
def next_multiple_of_n(v: float | int, *, n: int):
    return next(x for x in range(n, int(v) + 1 + n, n) if x >= v)
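# e.g. next_multiple_of_n(50257, n=128) == 50304, the padded vocab size used below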
class GPT(nn.Module):
    def __init__(self, vocab_size: int, num_layers: int, num_heads: int, model_dim: int, max_seq_len: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, model_dim)
        for param in self.embed.parameters():
            param.lr_mul = 75.
        # token value embeddings by @KoszarskyB - inspired by @Grad62304977's value residual implementation following https://arxiv.org/abs/2410.17897
        # value embedding code simplification inspired by @ragulpr https://github.com/KellerJordan/modded-nanogpt/pull/78
        self.value_embeds = nn.ModuleList([nn.Embedding(vocab_size, model_dim) for _ in range(3)])
        for embeds in self.value_embeds:
            for param in embeds.parameters():
                param.lr_mul = 75.
        self.blocks = nn.ModuleList([Block(model_dim, num_heads, max_seq_len, i) for i in range(num_layers)])
        # there are only 50257 unique GPT-2 tokens; we extend to the nearest multiple of 128 for efficiency.
        # suggested to me by @Grad62304977. this originates from Karpathy's experiments.
        self.lm_head = CastedLinear(model_dim, next_multiple_of_n(vocab_size, n=128), use_fp8=True, x_s=(model_dim**0.5)/448, w_s=24/448, grad_s=1/448)
        self.lm_head.weight.lr_mul = 27.5
        self.lm_head.weight.detach().zero_() # @Grad62304977
        # Add learnable skip connection weights for decoder layers
        assert num_layers % 2 == 0
        pad = (-num_layers * 5) % world_size
        self.scalars = nn.Parameter(torch.cat([
            torch.ones(num_layers), # skip_weights
            *[torch.tensor([1.0, 0.0]) for _ in range(num_layers)], # block lambdas
            *[torch.tensor([0.5, 0.5]) for _ in range(num_layers)], # SA lambdas
            torch.ones(pad),
        ]))
        self.scalars.lr_mul = 5.0
    def create_blockmasks(self, input_seq: Tensor, sliding_window_num_blocks: Tensor):
        BLOCK_SIZE = 128
        docs = (input_seq == 50256).cumsum(0)
        def document_causal(b, h, q_idx, kv_idx):
            causal_mask = q_idx >= kv_idx
            document_mask = docs[q_idx] == docs[kv_idx]
            return causal_mask & document_mask
        def dense_to_ordered(dense_blockmask: Tensor):
            num_blocks = dense_blockmask.sum(dim=-1, dtype=torch.int32)
            indices = dense_blockmask.argsort(dim=-1, descending=False, stable=True).flip(-1).to(torch.int32)
            return num_blocks[None, None].contiguous(), indices[None, None].contiguous()
        # manual block mask creation by @YouJiacheng
        assert len(input_seq) % BLOCK_SIZE == 0
        NUM_BLOCKS = len(input_seq) // BLOCK_SIZE
        block_idx = torch.arange(NUM_BLOCKS, dtype=torch.int32, device="cuda")
        causal_blockmask_any = block_idx[:, None] >= block_idx
        causal_blockmask_all = block_idx[:, None] > block_idx
        docs_low = docs.view(-1, BLOCK_SIZE)[:, 0].contiguous()
        docs_high = docs.view(-1, BLOCK_SIZE)[:, -1].contiguous()
        document_blockmask_any = (docs_low[:, None] <= docs_high) & (docs_high[:, None] >= docs_low)
        document_blockmask_all = (docs_low[:, None] == docs_high) & (docs_high[:, None] == docs_low)
        blockmask_any = causal_blockmask_any & document_blockmask_any
        blockmask_all = causal_blockmask_all & document_blockmask_all
        partial_kv_num_blocks, partial_kv_indices = dense_to_ordered(blockmask_any & ~blockmask_all)
        full_kv_num_blocks, full_kv_indices = dense_to_ordered(blockmask_all)
        def build_bm(window_size_blocks: Tensor) -> BlockMask:
            return BlockMask.from_kv_blocks(
                torch.clamp_max(partial_kv_num_blocks, torch.clamp_min(window_size_blocks - full_kv_num_blocks, 1)),
                partial_kv_indices,
                torch.clamp_max(full_kv_num_blocks, window_size_blocks - 1),
                full_kv_indices,
                BLOCK_SIZE=BLOCK_SIZE,
                mask_mod=document_causal,
            )
        # Long-short SWA block masks by @leloykun & @YouJiacheng, adapted from a suggestion by @Grad62304977, following the Gemma 2 paper
        return build_bm(sliding_window_num_blocks), build_bm(sliding_window_num_blocks // 2)
    def forward(self, input_seq: Tensor, target_seq: Tensor, sliding_window_num_blocks: Tensor):
        assert input_seq.ndim == 1
        ve = [value_embed(input_seq) for value_embed in self.value_embeds]
        # 012 ... 012 structure on token value embeddings by @YouJiacheng, improved on @leloykun's U-net structure
        ve = [ve[0], ve[1], ve[2]] + [None] * (len(self.blocks) - 6) + [ve[0], ve[1], ve[2]]
        assert len(ve) == len(self.blocks)
        long_bm, short_bm = self.create_blockmasks(input_seq, sliding_window_num_blocks)
        block_masks = [long_bm, short_bm, short_bm, short_bm, long_bm, short_bm, short_bm, long_bm, short_bm, short_bm, short_bm, long_bm]
        assert len(block_masks) == len(self.blocks)
        x = x0 = norm(self.embed(input_seq)[None]) # use of norm here by @Grad62304977
        # U-net design by @brendanh0gan
        skip_connections = []
        skip_weights = self.scalars[:(len(self.blocks) // 2)]
        lambdas = self.scalars[1 * len(self.blocks): 3 * len(self.blocks)].view(-1, 2)
        sa_lambdas = self.scalars[3 * len(self.blocks): 5 * len(self.blocks)].view(-1, 2)
        n = len(self.blocks) // 2
        for i in range(len(self.blocks)):
            if i >= n:
                x = x + skip_weights[i - n] * skip_connections.pop()
            x = self.blocks[i](x, ve[i], x0, lambdas[i], sa_lambdas[i], block_masks[i])
            if i < n:
                skip_connections.append(x)
        x = norm(x)
        logits = self.lm_head(x).float()
        # @Grad62304977 added tanh softcapping following the Gemma 2 paper, @KoszarskyB reduced it from 30 to 15, @YouJiacheng shifted it by +15 (2*sigmoid(2*x)=tanh(x)+1)
        logits = 30 * torch.sigmoid(logits / (7.5 * x.size(-1)**0.5))
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_seq, reduction='sum' if self.training else 'mean')
        return loss
# -----------------------------------------------------------------------------
# Our own simple Distributed Data Loader
def _load_data_shard(file: Path):
    header = torch.from_file(str(file), False, 256, dtype=torch.int32) # header is 256 int32
    assert header[0] == 20240520, "magic number mismatch in the data .bin file"
    assert header[1] == 1, "unsupported version"
    num_tokens = int(header[2]) # number of tokens (claimed)
    with file.open("rb", buffering=0) as f:
        tokens = torch.empty(num_tokens, dtype=torch.uint16, pin_memory=True) # avoid pin_memory copy by @YouJiacheng
        f.seek(256 * 4)
        nbytes = f.readinto(tokens.numpy()) # avoid bytes->array copy by @YouJiacheng
        assert nbytes == 2 * num_tokens, "number of tokens read does not match header"
    return tokens
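# Shard layout implied by the reader above: a 256-int32 header (magic 20240520,
# version 1, token count) followed by the tokens themselves as uint16.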
def distributed_data_generator(filename_pattern: str, batch_size: int, rank: int, world_size: int):
    files = [Path(file) for file in sorted(glob.glob(filename_pattern))]
    assert batch_size % world_size == 0
    local_batch_size = batch_size // world_size
    file_iter = iter(files) # use itertools.cycle(files) instead if you want to do multi-epoch training
    tokens, pos = _load_data_shard(next(file_iter)), 0
    while True:
        if pos + batch_size + 1 >= len(tokens):
            tokens, pos = _load_data_shard(next(file_iter)), 0
        buf = tokens[pos + rank * local_batch_size:][:local_batch_size + 1]
        inputs = buf[:-1].to(device="cuda", dtype=torch.int32, non_blocking=True) # no sync on host side;
        targets = buf[1:].to(device="cuda", dtype=torch.int64, non_blocking=True) # H2D in another stream isn't helpful.
        pos += batch_size
        yield inputs, targets
# -----------------------------------------------------------------------------
# int main
@dataclass
class Hyperparameters:
    # data
    train_files = "data/fineweb10B/fineweb_train_*.bin" # input .bin to train on
    val_files = "data/fineweb10B/fineweb_val_*.bin" # input .bin to eval validation loss on
    val_tokens = 10485760 # how many tokens of validation data? it's important to keep this fixed for consistent comparisons
    # optimization
    num_iterations = 1770 # number of iterations to run
    cooldown_frac = 0.4 # fraction of training spent cooling down the learning rate
    # evaluation and logging
    val_loss_every = 125 # every how many steps to evaluate val loss? 0 for only at the end
    # implementation
    seq_len = 48*1024 # FlexAttention sequence length
    val_seq_len = 4*64*1024 # FlexAttention sequence length for validation
    save_checkpoint = False
args = Hyperparameters()
# torchrun sets these env variables
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
assert torch.cuda.is_available()
device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
torch.cuda.set_device(device)
dist.init_process_group(backend="nccl", device_id=device)
dist.barrier()
master_process = (rank == 0) # this process will do logging, checkpointing etc.
#if master_process:
# wandb.init(project="modded-nanogpt-tiny", name=f"run-{os.path.basename(__file__)}", save_code=True)
# begin logging
logfile = None
if master_process:
    run_id = uuid.uuid4()
    os.makedirs("logs", exist_ok=True)
    logfile = f"logs/{run_id}.txt"
    print(logfile)
def print0(s, console=True):
    if master_process:
        with open(logfile, "a") as f:
            if console:
                print(s)
            print(s, file=f)
# begin by printing this file (the Python code)
print0(code)
print0("="*100)
# log information about the hardware/software environment this is running on
print0(f"Running Python {sys.version}")
print0(f"Running PyTorch {torch.version.__version__} compiled for CUDA {torch.version.cuda}")
def nvidia_smi():
    import subprocess # avoid top level import
    return subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True).stdout
if master_process:
    print0(nvidia_smi())
print0("="*100)
model: nn.Module = GPT(vocab_size=next_multiple_of_n(50257, n=128), num_layers=12, num_heads=6, model_dim=768, max_seq_len=max(args.seq_len, args.val_seq_len)).cuda()
for m in model.modules():
    if isinstance(m, nn.Embedding):
        m.bfloat16()
for param in model.parameters():
    dist.broadcast(param.detach(), 0)
# collect the parameters to optimize
hidden_matrix_params = [p for n, p in model.blocks.named_parameters() if p.ndim >= 2 and "embed" not in n]
embed_params = [p for n, p in model.named_parameters() if "embed" in n]
scalar_params = [p for p in model.parameters() if p.ndim < 2]
head_params = [model.lm_head.weight]
# init the optimizer(s)
# small adam epsilon by @YouJiacheng. this is an alternate method of fixing the world_size dependence
# discovered by @fernbear.bsky.social https://x.com/hi_tysam/status/1879692937589875094
optimizer1 = DistAdam(scalar_params + head_params + embed_params, lr=0.008, betas=(0.8, 0.95), eps=1e-10, weight_decay=0.0, rank=rank, world_size=world_size)
optimizer2 = Muon(hidden_matrix_params, lr=0.05, momentum=0.95, rank=rank, world_size=world_size, weight_decay=0.0)
optimizers = [optimizer1, optimizer2]
for opt in optimizers:
    for group in opt.param_groups:
        group["initial_lr"] = group["lr"]
for n, p in model.named_parameters():
    wd_mul = getattr(p, "wd_mul", 1.0)
    lr_mul = getattr(p, "lr_mul", 1.0)
    print0(f"{n}: {p.shape} {p.dtype} {wd_mul} {lr_mul}")
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
embedding_params = sum(p.numel() for n, p in model.named_parameters() if "embed" in n)
non_embedding_params = total_params - embedding_params
print0(f"")
print0(f"Model parameters:")
print0(f" Total parameters: {total_params:,}")
print0(f" Embedding parameters: {embedding_params:,}")
print0(f" Non-embedding parameters: {non_embedding_params:,}")
# learning rate schedule: stable then decay
def get_lr(step: int):
    x = step / args.num_iterations # progress in training
    assert 0 <= x <= 1
    w = min((1 - x) / args.cooldown_frac, 1.0) # 1 -> 0
    return w * 1.0 + (1 - w) * 0.1
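# e.g. with num_iterations=1770 and cooldown_frac=0.4: get_lr stays at 1.0 for
# the first 1062 steps, then decays linearly to 0.1 at the final step.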
@lru_cache(1)
def get_window_size_blocks_helper(window_size: int):
    return torch.tensor(window_size // 128, dtype=torch.int32, pin_memory=True).cuda(non_blocking=True)
def get_window_size_blocks(step: int):
    x = step / args.num_iterations # progress in training
    assert 0 <= x <= 1
    # Linearly increase the block-wise sliding window size over training 128 -> 1792
    # increase by @fernbear.bsky.social; block-wise by @YouJiacheng
    window_size = next_multiple_of_n(1728 * x, n=128)
    return get_window_size_blocks_helper(window_size)
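# e.g. at step 0 this yields a 128-token (1-block) window; at the final step
# next_multiple_of_n(1728, n=128) == 1792 tokens, i.e. 14 blocks of 128.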
model: nn.Module = torch.compile(model, mode="reduce-overhead", fullgraph=True, dynamic=False)
# Warmup the training kernels, then re-initialize the state so we aren't cheating
warmup_steps = 10
initial_state = dict(model=copy.deepcopy(model.state_dict()),
                     optimizers=[copy.deepcopy(opt.state_dict()) for opt in optimizers]) # save the initial state
train_loader = distributed_data_generator(args.train_files, world_size * args.seq_len, rank, world_size)
for _ in range(warmup_steps):
    inputs, targets = next(train_loader)
    torch.compiler.cudagraph_mark_step_begin()
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(inputs, targets, get_window_size_blocks(1))
    loss.backward()
    for opt in optimizers:
        opt.step()
    model.zero_grad(set_to_none=True)
torch.cuda.synchronize()
dist.barrier()
with torch.profiler.profile() as prof:
    for _ in range(warmup_steps):
        torch.compiler.cudagraph_mark_step_begin()
        inputs, targets = next(train_loader)
        with autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(inputs, targets, get_window_size_blocks(1))
        loss.backward()
        for opt in optimizers:
            opt.step()
        model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    dist.barrier()
os.makedirs("traces", exist_ok=True)
prof.export_chrome_trace(f"traces/trace_{rank}.json")
model.load_state_dict(initial_state['model'])
for opt, opt_state in zip(optimizers, initial_state['optimizers']):
    opt.load_state_dict(opt_state)
del train_loader, initial_state
train_loader = distributed_data_generator(args.train_files, world_size * args.seq_len, rank, world_size)
training_time_ms = 0
# start the clock
torch.cuda.synchronize()
t0 = time.perf_counter()
# begin training
train_steps = args.num_iterations
for step in range(train_steps + 1):
    last_step = (step == train_steps)
    torch.compiler.cudagraph_mark_step_begin()
    # --------------- VALIDATION SECTION -----------------
    if last_step or (args.val_loss_every > 0 and step % args.val_loss_every == 0):
        # stop the clock
        torch.cuda.synchronize()
        training_time_ms += 1000 * (time.perf_counter() - t0)
        model.eval()
        val_batch_size = world_size * args.val_seq_len
        assert args.val_tokens % val_batch_size == 0
        val_steps = args.val_tokens // val_batch_size
        val_loader = distributed_data_generator(args.val_files, val_batch_size, rank, world_size)
        val_loss = 0
        with torch.no_grad():
            for _ in range(val_steps):
                inputs, targets = next(val_loader)
                with autocast(device_type="cuda", dtype=torch.bfloat16):
                    val_loss += model(inputs, targets, get_window_size_blocks(step))
        val_loss /= val_steps
        del val_loader
        dist.all_reduce(val_loss, op=dist.ReduceOp.AVG)
        #if master_process:
        #    wandb.log({"val/loss": val_loss}, step=step)
        print0(f"step:{step}/{train_steps} val_loss:{val_loss:.4f} train_time:{training_time_ms:.0f}ms step_avg:{training_time_ms/max(step, 1):.2f}ms", console=True)
        model.train()
        # start the clock again
        torch.cuda.synchronize()
        t0 = time.perf_counter()
    if last_step:
        if master_process and args.save_checkpoint:
            log = dict(step=step, code=code, model=model.state_dict(), optimizers=[opt.state_dict() for opt in optimizers])
            os.makedirs(f"logs/{run_id}", exist_ok=True)
            torch.save(log, f"logs/{run_id}/state_step{step:06d}.pt")
        # the last step only has the validation loop, so break to avoid training
        break
    # --------------- TRAINING SECTION -----------------
    inputs, targets = next(train_loader)
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(inputs, targets, get_window_size_blocks(step))
    loss.backward()
    # set optimization hyperparameters
    for opt in optimizers:
        for group in opt.param_groups:
            group["lr"] = group["initial_lr"] * get_lr(step)
    frac = min(step / 300, 1)
    for group in optimizer2.param_groups:
        group["momentum"] = (1 - frac) * 0.85 + frac * 0.95
    # step the optimizers and schedulers
    for opt in optimizers:
        opt.step()
    # null the gradients
    model.zero_grad(set_to_none=True)
    # logging
    approx_training_time_ms = training_time_ms + 1000 * (time.perf_counter() - t0)
    print0(f"step:{step+1}/{train_steps} train_time:{approx_training_time_ms:.0f}ms step_avg:{approx_training_time_ms/(step + 1):.2f}ms", console=True)
print0(f"peak memory allocated: {torch.cuda.max_memory_allocated() // 1024 // 1024} MiB "
       f"reserved: {torch.cuda.max_memory_reserved() // 1024 // 1024} MiB", console=True)
dist.destroy_process_group()
====================================================================================================
Running Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]
Running PyTorch 2.7.0a0+79aa17489c.nv25.04 compiled for CUDA 12.9
Fri May 30 12:25:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:04:00.0 Off | 0 |
| N/A 44C P0 129W / 700W | 5856MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:05:00.0 Off | 0 |
| N/A 39C P0 126W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:0B:00.0 Off | 0 |
| N/A 45C P0 132W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:0C:00.0 Off | 0 |
| N/A 38C P0 124W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:84:00.0 Off | 0 |
| N/A 44C P0 139W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:85:00.0 Off | 0 |
| N/A 37C P0 117W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:8B:00.0 Off | 0 |
| N/A 41C P0 119W / 700W | 1518MiB / 81559MiB | 1% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:8C:00.0 Off | 0 |
| N/A 38C P0 117W / 700W | 1518MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
====================================================================================================
scalars: torch.Size([64]) torch.float32 1.0 5.0
embed.weight: torch.Size([50304, 768]) torch.bfloat16 1.0 75.0
value_embeds.0.weight: torch.Size([50304, 768]) torch.bfloat16 1.0 75.0
value_embeds.1.weight: torch.Size([50304, 768]) torch.bfloat16 1.0 75.0
value_embeds.2.weight: torch.Size([50304, 768]) torch.bfloat16 1.0 75.0
blocks.0.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.0.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.0.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.0.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.1.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.1.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.1.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.1.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.2.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.2.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.2.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.2.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.3.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.3.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.3.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.3.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.4.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.4.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.4.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.4.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.5.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.5.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.5.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.5.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.6.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.6.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.6.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.6.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.7.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.7.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.8.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.8.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.8.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.8.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.9.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.9.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.9.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.9.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.10.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.10.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.10.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.10.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
blocks.11.attn.qkv_w: torch.Size([3, 768, 768]) torch.float32 1.0 1.0
blocks.11.attn.c_proj.weight: torch.Size([768, 768]) torch.float32 1.0 1.0
blocks.11.mlp.c_fc.weight: torch.Size([3072, 768]) torch.float32 1.0 1.0
blocks.11.mlp.c_proj.weight: torch.Size([768, 3072]) torch.float32 1.0 1.0
lm_head.weight: torch.Size([50304, 768]) torch.float32 1.0 27.5
Model parameters:
Total parameters: 275,742,784
Embedding parameters: 154,533,888
Non-embedding parameters: 121,208,896
step:0/1770 val_loss:10.8258 train_time:0ms step_avg:0.03ms
step:1/1770 train_time:157ms step_avg:157.17ms
step:2/1770 train_time:170ms step_avg:84.81ms
step:3/1770 train_time:178ms step_avg:59.47ms
step:4/1770 train_time:187ms step_avg:46.80ms
step:5/1770 train_time:260ms step_avg:52.10ms
step:6/1770 train_time:353ms step_avg:58.84ms
step:7/1770 train_time:446ms step_avg:63.76ms
step:8/1770 train_time:539ms step_avg:67.37ms
step:9/1770 train_time:633ms step_avg:70.32ms
step:10/1770 train_time:726ms step_avg:72.56ms
step:11/1770 train_time:818ms step_avg:74.41ms
step:12/1770 train_time:912ms step_avg:75.96ms
step:13/1770 train_time:1005ms step_avg:77.32ms
step:14/1770 train_time:1102ms step_avg:78.72ms
step:15/1770 train_time:1199ms step_avg:79.96ms
step:16/1770 train_time:1295ms step_avg:80.92ms
step:17/1770 train_time:1389ms step_avg:81.69ms
step:18/1770 train_time:1483ms step_avg:82.41ms
step:19/1770 train_time:1577ms step_avg:82.99ms
step:20/1770 train_time:1670ms step_avg:83.52ms
step:21/1770 train_time:1764ms step_avg:84.00ms
step:22/1770 train_time:1857ms step_avg:84.41ms
step:23/1770 train_time:1951ms step_avg:84.81ms
step:24/1770 train_time:2046ms step_avg:85.25ms
step:25/1770 train_time:2142ms step_avg:85.67ms
step:26/1770 train_time:2238ms step_avg:86.06ms
step:27/1770 train_time:2333ms step_avg:86.42ms
step:28/1770 train_time:2428ms step_avg:86.71ms
step:29/1770 train_time:2522ms step_avg:86.97ms
step:30/1770 train_time:2616ms step_avg:87.20ms
step:31/1770 train_time:2709ms step_avg:87.40ms
step:32/1770 train_time:2803ms step_avg:87.60ms
step:33/1770 train_time:2897ms step_avg:87.78ms
step:34/1770 train_time:2992ms step_avg:87.99ms
step:35/1770 train_time:3086ms step_avg:88.18ms
step:36/1770 train_time:3181ms step_avg:88.35ms
step:37/1770 train_time:3276ms step_avg:88.55ms
step:38/1770 train_time:3372ms step_avg:88.73ms
step:39/1770 train_time:3466ms step_avg:88.88ms
step:40/1770 train_time:3560ms step_avg:89.01ms
step:41/1770 train_time:3654ms step_avg:89.13ms
step:42/1770 train_time:3748ms step_avg:89.25ms
step:43/1770 train_time:3841ms step_avg:89.33ms
step:44/1770 train_time:3936ms step_avg:89.46ms
step:45/1770 train_time:4030ms step_avg:89.57ms
step:46/1770 train_time:4125ms step_avg:89.68ms
step:47/1770 train_time:4219ms step_avg:89.77ms
step:48/1770 train_time:4315ms step_avg:89.89ms
step:49/1770 train_time:4411ms step_avg:90.01ms
step:50/1770 train_time:4506ms step_avg:90.12ms
step:51/1770 train_time:4599ms step_avg:90.17ms
step:52/1770 train_time:4693ms step_avg:90.26ms
step:53/1770 train_time:4788ms step_avg:90.33ms
step:54/1770 train_time:4882ms step_avg:90.41ms
step:55/1770 train_time:4976ms step_avg:90.48ms
step:56/1770 train_time:5072ms step_avg:90.57ms
step:57/1770 train_time:5166ms step_avg:90.63ms
step:58/1770 train_time:5260ms step_avg:90.69ms
step:59/1770 train_time:5355ms step_avg:90.77ms
step:60/1770 train_time:5451ms step_avg:90.85ms
step:61/1770 train_time:5545ms step_avg:90.91ms
step:62/1770 train_time:5639ms step_avg:90.95ms
step:63/1770 train_time:5733ms step_avg:91.01ms
step:64/1770 train_time:5827ms step_avg:91.05ms
step:65/1770 train_time:5921ms step_avg:91.09ms
step:66/1770 train_time:6016ms step_avg:91.15ms
step:67/1770 train_time:6110ms step_avg:91.20ms
step:68/1770 train_time:6204ms step_avg:91.24ms
step:69/1770 train_time:6299ms step_avg:91.28ms
step:70/1770 train_time:6394ms step_avg:91.34ms
step:71/1770 train_time:6489ms step_avg:91.40ms
step:72/1770 train_time:6583ms step_avg:91.43ms
step:73/1770 train_time:6677ms step_avg:91.47ms
step:74/1770 train_time:6772ms step_avg:91.51ms
step:75/1770 train_time:6866ms step_avg:91.55ms
step:76/1770 train_time:6960ms step_avg:91.58ms
step:77/1770 train_time:7056ms step_avg:91.63ms
step:78/1770 train_time:7151ms step_avg:91.67ms
step:79/1770 train_time:7244ms step_avg:91.70ms
step:80/1770 train_time:7338ms step_avg:91.72ms
step:81/1770 train_time:7434ms step_avg:91.77ms
step:82/1770 train_time:7528ms step_avg:91.80ms
step:83/1770 train_time:7621ms step_avg:91.82ms
step:84/1770 train_time:7717ms step_avg:91.87ms
step:85/1770 train_time:7812ms step_avg:91.91ms
step:86/1770 train_time:7907ms step_avg:91.94ms
step:87/1770 train_time:8001ms step_avg:91.96ms
step:88/1770 train_time:8095ms step_avg:91.99ms
step:89/1770 train_time:8191ms step_avg:92.03ms
step:90/1770 train_time:8284ms step_avg:92.05ms
step:91/1770 train_time:8379ms step_avg:92.08ms
step:92/1770 train_time:8474ms step_avg:92.11ms
step:93/1770 train_time:8568ms step_avg:92.13ms
step:94/1770 train_time:8663ms step_avg:92.16ms
step:95/1770 train_time:8757ms step_avg:92.18ms
step:96/1770 train_time:8851ms step_avg:92.20ms
step:97/1770 train_time:8944ms step_avg:92.21ms
step:98/1770 train_time:9038ms step_avg:92.23ms
step:99/1770 train_time:9134ms step_avg:92.26ms
step:100/1770 train_time:9228ms step_avg:92.28ms
step:101/1770 train_time:9322ms step_avg:92.30ms
step:102/1770 train_time:9416ms step_avg:92.32ms
step:103/1770 train_time:9511ms step_avg:92.34ms
step:104/1770 train_time:9605ms step_avg:92.36ms
step:105/1770 train_time:9699ms step_avg:92.37ms
step:106/1770 train_time:9794ms step_avg:92.39ms
step:107/1770 train_time:9887ms step_avg:92.41ms
step:108/1770 train_time:9982ms step_avg:92.42ms
step:109/1770 train_time:10076ms step_avg:92.44ms
step:110/1770 train_time:10171ms step_avg:92.46ms
step:111/1770 train_time:10266ms step_avg:92.48ms
step:112/1770 train_time:10360ms step_avg:92.50ms
step:113/1770 train_time:10455ms step_avg:92.52ms
step:114/1770 train_time:10549ms step_avg:92.54ms
step:115/1770 train_time:10643ms step_avg:92.55ms
step:116/1770 train_time:10737ms step_avg:92.56ms
step:117/1770 train_time:10832ms step_avg:92.58ms
step:118/1770 train_time:10926ms step_avg:92.60ms
step:119/1770 train_time:11020ms step_avg:92.61ms
step:120/1770 train_time:11115ms step_avg:92.63ms
step:121/1770 train_time:11210ms step_avg:92.64ms
step:122/1770 train_time:11305ms step_avg:92.66ms
step:123/1770 train_time:11399ms step_avg:92.68ms
step:124/1770 train_time:11494ms step_avg:92.69ms
step:125/1770 train_time:11589ms step_avg:92.71ms
step:125/1770 val_loss:4.6449 train_time:11860ms step_avg:94.88ms
step:126/1770 train_time:11962ms step_avg:94.94ms
step:127/1770 train_time:11986ms step_avg:94.38ms
step:128/1770 train_time:12059ms step_avg:94.21ms
step:129/1770 train_time:12100ms step_avg:93.80ms
step:130/1770 train_time:12157ms step_avg:93.52ms
step:131/1770 train_time:12217ms step_avg:93.26ms
step:132/1770 train_time:12258ms step_avg:92.86ms
step:133/1770 train_time:12351ms step_avg:92.86ms
step:134/1770 train_time:12444ms step_avg:92.87ms
step:135/1770 train_time:12537ms step_avg:92.87ms
step:136/1770 train_time:12631ms step_avg:92.88ms
step:137/1770 train_time:12727ms step_avg:92.90ms
step:138/1770 train_time:12826ms step_avg:92.94ms
step:139/1770 train_time:12923ms step_avg:92.97ms
step:140/1770 train_time:13018ms step_avg:92.99ms
step:141/1770 train_time:13114ms step_avg:93.00ms
step:142/1770 train_time:13208ms step_avg:93.02ms
step:143/1770 train_time:13303ms step_avg:93.03ms
step:144/1770 train_time:13397ms step_avg:93.04ms
step:145/1770 train_time:13491ms step_avg:93.04ms
step:146/1770 train_time:13585ms step_avg:93.05ms
step:147/1770 train_time:13680ms step_avg:93.06ms
step:148/1770 train_time:13776ms step_avg:93.08ms
step:149/1770 train_time:13873ms step_avg:93.11ms
step:150/1770 train_time:13971ms step_avg:93.14ms
step:151/1770 train_time:14068ms step_avg:93.17ms
step:152/1770 train_time:14163ms step_avg:93.18ms
step:153/1770 train_time:14257ms step_avg:93.18ms
step:154/1770 train_time:14352ms step_avg:93.19ms
step:155/1770 train_time:14447ms step_avg:93.21ms
step:156/1770 train_time:14541ms step_avg:93.21ms
step:157/1770 train_time:14635ms step_avg:93.22ms
step:158/1770 train_time:14731ms step_avg:93.23ms
step:159/1770 train_time:14828ms step_avg:93.26ms
step:160/1770 train_time:14924ms step_avg:93.27ms
step:161/1770 train_time:15018ms step_avg:93.28ms
step:162/1770 train_time:15113ms step_avg:93.29ms
step:163/1770 train_time:15209ms step_avg:93.31ms
step:164/1770 train_time:15304ms step_avg:93.32ms
step:165/1770 train_time:15399ms step_avg:93.33ms
step:166/1770 train_time:15493ms step_avg:93.33ms
step:167/1770 train_time:15588ms step_avg:93.34ms
step:168/1770 train_time:15683ms step_avg:93.35ms
step:169/1770 train_time:15778ms step_avg:93.36ms
step:170/1770 train_time:15873ms step_avg:93.37ms
step:171/1770 train_time:15971ms step_avg:93.39ms
step:172/1770 train_time:16066ms step_avg:93.41ms
step:173/1770 train_time:16161ms step_avg:93.42ms
step:174/1770 train_time:16256ms step_avg:93.42ms
step:175/1770 train_time:16351ms step_avg:93.43ms
step:176/1770 train_time:16447ms step_avg:93.45ms
step:177/1770 train_time:16541ms step_avg:93.45ms
step:178/1770 train_time:16635ms step_avg:93.46ms
step:179/1770 train_time:16731ms step_avg:93.47ms
step:180/1770 train_time:16828ms step_avg:93.49ms
step:181/1770 train_time:16924ms step_avg:93.50ms
step:182/1770 train_time:17019ms step_avg:93.51ms
step:183/1770 train_time:17114ms step_avg:93.52ms
step:184/1770 train_time:17210ms step_avg:93.53ms
step:185/1770 train_time:17305ms step_avg:93.54ms
step:186/1770 train_time:17399ms step_avg:93.54ms
step:187/1770 train_time:17494ms step_avg:93.55ms
step:188/1770 train_time:17589ms step_avg:93.56ms
step:189/1770 train_time:17684ms step_avg:93.57ms
step:190/1770 train_time:17779ms step_avg:93.57ms
step:191/1770 train_time:17874ms step_avg:93.58ms
step:192/1770 train_time:17970ms step_avg:93.59ms
step:193/1770 train_time:18066ms step_avg:93.61ms
step:194/1770 train_time:18162ms step_avg:93.62ms
step:195/1770 train_time:18257ms step_avg:93.62ms
step:196/1770 train_time:18351ms step_avg:93.63ms
step:197/1770 train_time:18447ms step_avg:93.64ms
step:198/1770 train_time:18542ms step_avg:93.65ms
step:199/1770 train_time:18637ms step_avg:93.65ms
step:200/1770 train_time:18731ms step_avg:93.65ms
step:201/1770 train_time:18827ms step_avg:93.67ms
step:202/1770 train_time:18923ms step_avg:93.68ms
step:203/1770 train_time:19018ms step_avg:93.68ms
step:204/1770 train_time:19113ms step_avg:93.69ms
step:205/1770 train_time:19210ms step_avg:93.71ms
step:206/1770 train_time:19305ms step_avg:93.71ms
step:207/1770 train_time:19400ms step_avg:93.72ms
step:208/1770 train_time:19495ms step_avg:93.72ms
step:209/1770 train_time:19590ms step_avg:93.73ms
step:210/1770 train_time:19685ms step_avg:93.74ms
step:211/1770 train_time:19779ms step_avg:93.74ms
step:212/1770 train_time:19874ms step_avg:93.75ms
step:213/1770 train_time:19970ms step_avg:93.76ms
step:214/1770 train_time:20067ms step_avg:93.77ms
step:215/1770 train_time:20163ms step_avg:93.78ms
step:216/1770 train_time:20258ms step_avg:93.79ms
step:217/1770 train_time:20352ms step_avg:93.79ms
step:218/1770 train_time:20449ms step_avg:93.80ms
step:219/1770 train_time:20544ms step_avg:93.81ms
step:220/1770 train_time:20640ms step_avg:93.82ms
step:221/1770 train_time:20735ms step_avg:93.82ms
step:222/1770 train_time:20831ms step_avg:93.83ms
step:223/1770 train_time:20926ms step_avg:93.84ms
step:224/1770 train_time:21022ms step_avg:93.85ms
step:225/1770 train_time:21117ms step_avg:93.85ms
step:226/1770 train_time:21212ms step_avg:93.86ms
step:227/1770 train_time:21309ms step_avg:93.87ms
step:228/1770 train_time:21404ms step_avg:93.88ms
step:229/1770 train_time:21498ms step_avg:93.88ms
step:230/1770 train_time:21593ms step_avg:93.88ms
step:231/1770 train_time:21688ms step_avg:93.89ms
step:232/1770 train_time:21784ms step_avg:93.89ms
step:233/1770 train_time:21878ms step_avg:93.90ms
step:234/1770 train_time:21974ms step_avg:93.91ms
step:235/1770 train_time:22070ms step_avg:93.92ms
step:236/1770 train_time:22166ms step_avg:93.92ms
step:237/1770 train_time:22261ms step_avg:93.93ms
step:238/1770 train_time:22356ms step_avg:93.93ms
step:239/1770 train_time:22450ms step_avg:93.93ms
step:240/1770 train_time:22546ms step_avg:93.94ms
step:241/1770 train_time:22640ms step_avg:93.94ms
step:242/1770 train_time:22735ms step_avg:93.95ms
step:243/1770 train_time:22831ms step_avg:93.96ms
step:244/1770 train_time:22928ms step_avg:93.97ms
step:245/1770 train_time:23024ms step_avg:93.98ms
step:246/1770 train_time:23119ms step_avg:93.98ms
step:247/1770 train_time:23214ms step_avg:93.98ms
step:248/1770 train_time:23311ms step_avg:93.99ms
step:249/1770 train_time:23407ms step_avg:94.00ms
step:250/1770 train_time:23502ms step_avg:94.01ms
step:250/1770 val_loss:4.1038 train_time:23775ms step_avg:95.10ms
step:251/1770 train_time:23786ms step_avg:94.76ms
step:252/1770 train_time:23795ms step_avg:94.42ms
step:253/1770 train_time:23804ms step_avg:94.09ms
step:254/1770 train_time:23891ms step_avg:94.06ms
step:255/1770 train_time:23988ms step_avg:94.07ms
step:256/1770 train_time:24085ms step_avg:94.08ms
step:257/1770 train_time:24182ms step_avg:94.09ms
step:258/1770 train_time:24277ms step_avg:94.10ms
step:259/1770 train_time:24370ms step_avg:94.09ms
step:260/1770 train_time:24465ms step_avg:94.09ms
step:261/1770 train_time:24560ms step_avg:94.10ms
step:262/1770 train_time:24653ms step_avg:94.10ms
step:263/1770 train_time:24749ms step_avg:94.10ms
step:264/1770 train_time:24847ms step_avg:94.12ms
step:265/1770 train_time:24945ms step_avg:94.13ms
step:266/1770 train_time:25042ms step_avg:94.14ms
step:267/1770 train_time:25139ms step_avg:94.15ms
step:268/1770 train_time:25235ms step_avg:94.16ms
step:269/1770 train_time:25331ms step_avg:94.17ms
step:270/1770 train_time:25425ms step_avg:94.17ms
step:271/1770 train_time:25520ms step_avg:94.17ms
step:272/1770 train_time:25616ms step_avg:94.18ms
step:273/1770 train_time:25711ms step_avg:94.18ms
step:274/1770 train_time:25807ms step_avg:94.19ms
step:275/1770 train_time:25906ms step_avg:94.20ms
step:276/1770 train_time:26004ms step_avg:94.22ms
step:277/1770 train_time:26100ms step_avg:94.23ms
step:278/1770 train_time:26197ms step_avg:94.23ms
step:279/1770 train_time:26292ms step_avg:94.24ms
step:280/1770 train_time:26388ms step_avg:94.24ms
step:281/1770 train_time:26484ms step_avg:94.25ms
step:282/1770 train_time:26581ms step_avg:94.26ms
step:283/1770 train_time:26677ms step_avg:94.26ms
step:284/1770 train_time:26773ms step_avg:94.27ms
step:285/1770 train_time:26870ms step_avg:94.28ms
step:286/1770 train_time:26967ms step_avg:94.29ms
step:287/1770 train_time:27064ms step_avg:94.30ms
step:288/1770 train_time:27160ms step_avg:94.31ms
step:289/1770 train_time:27257ms step_avg:94.32ms
step:290/1770 train_time:27352ms step_avg:94.32ms
step:291/1770 train_time:27448ms step_avg:94.32ms
step:292/1770 train_time:27544ms step_avg:94.33ms
step:293/1770 train_time:27640ms step_avg:94.33ms
step:294/1770 train_time:27736ms step_avg:94.34ms
step:295/1770 train_time:27832ms step_avg:94.35ms
step:296/1770 train_time:27928ms step_avg:94.35ms
step:297/1770 train_time:28025ms step_avg:94.36ms
step:298/1770 train_time:28122ms step_avg:94.37ms
step:299/1770 train_time:28219ms step_avg:94.38ms
step:300/1770 train_time:28314ms step_avg:94.38ms
step:301/1770 train_time:28409ms step_avg:94.38ms
step:302/1770 train_time:28505ms step_avg:94.39ms
step:303/1770 train_time:28602ms step_avg:94.40ms
step:304/1770 train_time:28697ms step_avg:94.40ms
step:305/1770 train_time:28793ms step_avg:94.40ms
step:306/1770 train_time:28888ms step_avg:94.41ms
step:307/1770 train_time:28985ms step_avg:94.41ms
step:308/1770 train_time:29082ms step_avg:94.42ms
step:309/1770 train_time:29180ms step_avg:94.43ms
step:310/1770 train_time:29277ms step_avg:94.44ms
step:311/1770 train_time:29372ms step_avg:94.44ms
step:312/1770 train_time:29468ms step_avg:94.45ms
step:313/1770 train_time:29564ms step_avg:94.45ms
step:314/1770 train_time:29661ms step_avg:94.46ms
step:315/1770 train_time:29757ms step_avg:94.47ms
step:316/1770 train_time:29853ms step_avg:94.47ms
step:317/1770 train_time:29949ms step_avg:94.48ms
step:318/1770 train_time:30045ms step_avg:94.48ms
step:319/1770 train_time:30142ms step_avg:94.49ms
step:320/1770 train_time:30238ms step_avg:94.49ms
step:321/1770 train_time:30333ms step_avg:94.50ms
step:322/1770 train_time:30429ms step_avg:94.50ms
step:323/1770 train_time:30525ms step_avg:94.51ms
step:324/1770 train_time:30622ms step_avg:94.51ms
step:325/1770 train_time:30718ms step_avg:94.52ms
step:326/1770 train_time:30813ms step_avg:94.52ms
step:327/1770 train_time:30909ms step_avg:94.52ms
step:328/1770 train_time:31006ms step_avg:94.53ms
step:329/1770 train_time:31103ms step_avg:94.54ms
step:330/1770 train_time:31200ms step_avg:94.54ms
step:331/1770 train_time:31296ms step_avg:94.55ms
step:332/1770 train_time:31391ms step_avg:94.55ms
step:333/1770 train_time:31487ms step_avg:94.55ms
step:334/1770 train_time:31583ms step_avg:94.56ms
step:335/1770 train_time:31679ms step_avg:94.56ms
step:336/1770 train_time:31774ms step_avg:94.57ms
step:337/1770 train_time:31870ms step_avg:94.57ms
step:338/1770 train_time:31966ms step_avg:94.57ms
step:339/1770 train_time:32063ms step_avg:94.58ms
step:340/1770 train_time:32159ms step_avg:94.59ms
step:341/1770 train_time:32255ms step_avg:94.59ms
step:342/1770 train_time:32351ms step_avg:94.59ms
step:343/1770 train_time:32447ms step_avg:94.60ms
step:344/1770 train_time:32544ms step_avg:94.61ms
step:345/1770 train_time:32641ms step_avg:94.61ms
step:346/1770 train_time:32738ms step_avg:94.62ms
step:347/1770 train_time:32833ms step_avg:94.62ms
step:348/1770 train_time:32930ms step_avg:94.63ms
step:349/1770 train_time:33026ms step_avg:94.63ms
step:350/1770 train_time:33124ms step_avg:94.64ms
step:351/1770 train_time:33221ms step_avg:94.65ms
step:352/1770 train_time:33318ms step_avg:94.65ms
step:353/1770 train_time:33414ms step_avg:94.66ms
step:354/1770 train_time:33509ms step_avg:94.66ms
step:355/1770 train_time:33606ms step_avg:94.66ms
step:356/1770 train_time:33703ms step_avg:94.67ms
step:357/1770 train_time:33799ms step_avg:94.68ms
step:358/1770 train_time:33895ms step_avg:94.68ms
step:359/1770 train_time:33990ms step_avg:94.68ms
step:360/1770 train_time:34086ms step_avg:94.68ms
step:361/1770 train_time:34183ms step_avg:94.69ms
step:362/1770 train_time:34279ms step_avg:94.69ms
step:363/1770 train_time:34376ms step_avg:94.70ms
step:364/1770 train_time:34471ms step_avg:94.70ms
step:365/1770 train_time:34567ms step_avg:94.70ms
step:366/1770 train_time:34664ms step_avg:94.71ms
step:367/1770 train_time:34761ms step_avg:94.72ms
step:368/1770 train_time:34857ms step_avg:94.72ms
step:369/1770 train_time:34953ms step_avg:94.72ms
step:370/1770 train_time:35048ms step_avg:94.73ms
step:371/1770 train_time:35145ms step_avg:94.73ms
step:372/1770 train_time:35242ms step_avg:94.74ms
step:373/1770 train_time:35338ms step_avg:94.74ms
step:374/1770 train_time:35435ms step_avg:94.74ms
step:375/1770 train_time:35529ms step_avg:94.75ms
step:375/1770 val_loss:3.8967 train_time:35806ms step_avg:95.48ms
step:376/1770 train_time:35817ms step_avg:95.26ms
step:377/1770 train_time:35826ms step_avg:95.03ms
step:378/1770 train_time:35835ms step_avg:94.80ms
step:379/1770 train_time:35919ms step_avg:94.77ms
step:380/1770 train_time:36018ms step_avg:94.78ms
step:381/1770 train_time:36113ms step_avg:94.78ms
step:382/1770 train_time:36208ms step_avg:94.79ms
step:383/1770 train_time:36304ms step_avg:94.79ms
step:384/1770 train_time:36399ms step_avg:94.79ms
step:385/1770 train_time:36494ms step_avg:94.79ms
step:386/1770 train_time:36589ms step_avg:94.79ms
step:387/1770 train_time:36685ms step_avg:94.79ms
step:388/1770 train_time:36784ms step_avg:94.80ms
step:389/1770 train_time:36883ms step_avg:94.81ms
step:390/1770 train_time:36981ms step_avg:94.82ms
step:391/1770 train_time:37078ms step_avg:94.83ms
step:392/1770 train_time:37173ms step_avg:94.83ms
step:393/1770 train_time:37269ms step_avg:94.83ms
step:394/1770 train_time:37365ms step_avg:94.84ms
step:395/1770 train_time:37461ms step_avg:94.84ms
step:396/1770 train_time:37556ms step_avg:94.84ms
step:397/1770 train_time:37652ms step_avg:94.84ms
step:398/1770 train_time:37750ms step_avg:94.85ms
step:399/1770 train_time:37850ms step_avg:94.86ms
step:400/1770 train_time:37949ms step_avg:94.87ms
step:401/1770 train_time:38048ms step_avg:94.88ms
step:402/1770 train_time:38147ms step_avg:94.89ms
step:403/1770 train_time:38247ms step_avg:94.91ms
step:404/1770 train_time:38345ms step_avg:94.91ms
step:405/1770 train_time:38444ms step_avg:94.92ms
step:406/1770 train_time:38542ms step_avg:94.93ms
step:407/1770 train_time:38641ms step_avg:94.94ms
step:408/1770 train_time:38739ms step_avg:94.95ms
step:409/1770 train_time:38838ms step_avg:94.96ms
step:410/1770 train_time:38936ms step_avg:94.96ms
step:411/1770 train_time:39034ms step_avg:94.97ms
step:412/1770 train_time:39132ms step_avg:94.98ms
step:413/1770 train_time:39229ms step_avg:94.99ms
step:414/1770 train_time:39326ms step_avg:94.99ms
step:415/1770 train_time:39425ms step_avg:95.00ms
step:416/1770 train_time:39523ms step_avg:95.01ms
step:417/1770 train_time:39621ms step_avg:95.01ms
step:418/1770 train_time:39719ms step_avg:95.02ms
step:419/1770 train_time:39818ms step_avg:95.03ms
step:420/1770 train_time:39916ms step_avg:95.04ms
step:421/1770 train_time:40015ms step_avg:95.05ms
step:422/1770 train_time:40114ms step_avg:95.06ms
step:423/1770 train_time:40212ms step_avg:95.06ms
step:424/1770 train_time:40310ms step_avg:95.07ms
step:425/1770 train_time:40408ms step_avg:95.08ms
step:426/1770 train_time:40507ms step_avg:95.09ms
step:427/1770 train_time:40605ms step_avg:95.09ms
step:428/1770 train_time:40704ms step_avg:95.10ms
step:429/1770 train_time:40803ms step_avg:95.11ms
step:430/1770 train_time:40901ms step_avg:95.12ms
step:431/1770 train_time:41000ms step_avg:95.13ms
step:432/1770 train_time:41099ms step_avg:95.14ms
step:433/1770 train_time:41196ms step_avg:95.14ms
step:434/1770 train_time:41294ms step_avg:95.15ms
step:435/1770 train_time:41391ms step_avg:95.15ms
step:436/1770 train_time:41489ms step_avg:95.16ms
step:437/1770 train_time:41588ms step_avg:95.17ms
step:438/1770 train_time:41687ms step_avg:95.18ms
step:439/1770 train_time:41786ms step_avg:95.18ms
step:440/1770 train_time:41885ms step_avg:95.19ms
step:441/1770 train_time:41985ms step_avg:95.20ms
step:442/1770 train_time:42084ms step_avg:95.21ms
step:443/1770 train_time:42183ms step_avg:95.22ms
step:444/1770 train_time:42283ms step_avg:95.23ms
step:445/1770 train_time:42383ms step_avg:95.24ms
step:446/1770 train_time:42482ms step_avg:95.25ms
step:447/1770 train_time:42579ms step_avg:95.26ms
step:448/1770 train_time:42677ms step_avg:95.26ms
step:449/1770 train_time:42774ms step_avg:95.27ms
step:450/1770 train_time:42872ms step_avg:95.27ms
step:451/1770 train_time:42970ms step_avg:95.28ms
step:452/1770 train_time:43069ms step_avg:95.28ms
step:453/1770 train_time:43167ms step_avg:95.29ms
step:454/1770 train_time:43267ms step_avg:95.30ms
step:455/1770 train_time:43367ms step_avg:95.31ms
step:456/1770 train_time:43466ms step_avg:95.32ms
step:457/1770 train_time:43565ms step_avg:95.33ms
step:458/1770 train_time:43663ms step_avg:95.33ms
step:459/1770 train_time:43762ms step_avg:95.34ms
step:460/1770 train_time:43861ms step_avg:95.35ms
step:461/1770 train_time:43959ms step_avg:95.36ms
step:462/1770 train_time:44058ms step_avg:95.36ms
step:463/1770 train_time:44157ms step_avg:95.37ms
step:464/1770 train_time:44255ms step_avg:95.38ms
step:465/1770 train_time:44353ms step_avg:95.38ms
step:466/1770 train_time:44451ms step_avg:95.39ms
step:467/1770 train_time:44549ms step_avg:95.39ms
step:468/1770 train_time:44648ms step_avg:95.40ms
step:469/1770 train_time:44745ms step_avg:95.41ms
step:470/1770 train_time:44844ms step_avg:95.41ms
step:471/1770 train_time:44943ms step_avg:95.42ms
step:472/1770 train_time:45042ms step_avg:95.43ms
step:473/1770 train_time:45140ms step_avg:95.43ms
step:474/1770 train_time:45238ms step_avg:95.44ms
step:475/1770 train_time:45338ms step_avg:95.45ms
step:476/1770 train_time:45437ms step_avg:95.46ms
step:477/1770 train_time:45535ms step_avg:95.46ms
step:478/1770 train_time:45632ms step_avg:95.46ms
step:479/1770 train_time:45729ms step_avg:95.47ms
step:480/1770 train_time:45829ms step_avg:95.48ms
step:481/1770 train_time:45928ms step_avg:95.48ms
step:482/1770 train_time:46027ms step_avg:95.49ms
step:483/1770 train_time:46127ms step_avg:95.50ms
step:484/1770 train_time:46226ms step_avg:95.51ms
step:485/1770 train_time:46326ms step_avg:95.52ms
step:486/1770 train_time:46424ms step_avg:95.52ms
step:487/1770 train_time:46523ms step_avg:95.53ms
step:488/1770 train_time:46621ms step_avg:95.54ms
step:489/1770 train_time:46719ms step_avg:95.54ms
step:490/1770 train_time:46817ms step_avg:95.55ms
step:491/1770 train_time:46915ms step_avg:95.55ms
step:492/1770 train_time:47013ms step_avg:95.55ms
step:493/1770 train_time:47111ms step_avg:95.56ms
step:494/1770 train_time:47210ms step_avg:95.57ms
step:495/1770 train_time:47308ms step_avg:95.57ms
step:496/1770 train_time:47407ms step_avg:95.58ms
step:497/1770 train_time:47505ms step_avg:95.58ms
step:498/1770 train_time:47604ms step_avg:95.59ms
step:499/1770 train_time:47703ms step_avg:95.60ms
step:500/1770 train_time:47801ms step_avg:95.60ms
step:500/1770 val_loss:3.7501 train_time:48083ms step_avg:96.17ms
step:501/1770 train_time:48093ms step_avg:95.99ms
step:502/1770 train_time:48102ms step_avg:95.82ms
step:503/1770 train_time:48111ms step_avg:95.65ms
step:504/1770 train_time:48199ms step_avg:95.63ms
step:505/1770 train_time:48296ms step_avg:95.64ms
step:506/1770 train_time:48393ms step_avg:95.64ms
step:507/1770 train_time:48492ms step_avg:95.64ms
step:508/1770 train_time:48591ms step_avg:95.65ms
step:509/1770 train_time:48689ms step_avg:95.66ms
step:510/1770 train_time:48786ms step_avg:95.66ms
step:511/1770 train_time:48884ms step_avg:95.66ms
step:512/1770 train_time:48982ms step_avg:95.67ms
step:513/1770 train_time:49083ms step_avg:95.68ms
step:514/1770 train_time:49183ms step_avg:95.69ms
step:515/1770 train_time:49281ms step_avg:95.69ms
step:516/1770 train_time:49379ms step_avg:95.70ms
step:517/1770 train_time:49476ms step_avg:95.70ms
step:518/1770 train_time:49574ms step_avg:95.70ms
step:519/1770 train_time:49672ms step_avg:95.71ms
step:520/1770 train_time:49769ms step_avg:95.71ms
step:521/1770 train_time:49867ms step_avg:95.71ms
step:522/1770 train_time:49965ms step_avg:95.72ms
step:523/1770 train_time:50064ms step_avg:95.72ms
step:524/1770 train_time:50162ms step_avg:95.73ms
step:525/1770 train_time:50261ms step_avg:95.73ms
step:526/1770 train_time:50359ms step_avg:95.74ms
step:527/1770 train_time:50456ms step_avg:95.74ms
step:528/1770 train_time:50553ms step_avg:95.74ms
step:529/1770 train_time:50651ms step_avg:95.75ms
step:530/1770 train_time:50749ms step_avg:95.75ms
step:531/1770 train_time:50846ms step_avg:95.76ms
step:532/1770 train_time:50945ms step_avg:95.76ms
step:533/1770 train_time:51044ms step_avg:95.77ms
step:534/1770 train_time:51142ms step_avg:95.77ms
step:535/1770 train_time:51242ms step_avg:95.78ms
step:536/1770 train_time:51342ms step_avg:95.79ms
step:537/1770 train_time:51440ms step_avg:95.79ms
step:538/1770 train_time:51538ms step_avg:95.80ms
step:539/1770 train_time:51635ms step_avg:95.80ms
step:540/1770 train_time:51732ms step_avg:95.80ms
step:541/1770 train_time:51830ms step_avg:95.80ms
step:542/1770 train_time:51930ms step_avg:95.81ms
step:543/1770 train_time:52030ms step_avg:95.82ms
step:544/1770 train_time:52130ms step_avg:95.83ms
step:545/1770 train_time:52231ms step_avg:95.84ms
step:546/1770 train_time:52332ms step_avg:95.85ms
step:547/1770 train_time:52432ms step_avg:95.85ms
step:548/1770 train_time:52532ms step_avg:95.86ms
step:549/1770 train_time:52631ms step_avg:95.87ms
step:550/1770 train_time:52728ms step_avg:95.87ms
step:551/1770 train_time:52825ms step_avg:95.87ms
step:552/1770 train_time:52924ms step_avg:95.88ms
step:553/1770 train_time:53021ms step_avg:95.88ms
step:554/1770 train_time:53120ms step_avg:95.88ms
step:555/1770 train_time:53218ms step_avg:95.89ms
step:556/1770 train_time:53317ms step_avg:95.89ms
step:557/1770 train_time:53414ms step_avg:95.90ms
step:558/1770 train_time:53512ms step_avg:95.90ms
step:559/1770 train_time:53612ms step_avg:95.91ms
step:560/1770 train_time:53712ms step_avg:95.91ms
step:561/1770 train_time:53811ms step_avg:95.92ms
step:562/1770 train_time:53909ms step_avg:95.92ms
step:563/1770 train_time:54009ms step_avg:95.93ms
step:564/1770 train_time:54108ms step_avg:95.94ms
step:565/1770 train_time:54208ms step_avg:95.94ms
step:566/1770 train_time:54307ms step_avg:95.95ms
step:567/1770 train_time:54405ms step_avg:95.95ms
step:568/1770 train_time:54505ms step_avg:95.96ms
step:569/1770 train_time:54603ms step_avg:95.96ms
step:570/1770 train_time:54701ms step_avg:95.97ms
step:571/1770 train_time:54798ms step_avg:95.97ms
step:572/1770 train_time:54896ms step_avg:95.97ms
step:573/1770 train_time:54994ms step_avg:95.98ms
step:574/1770 train_time:55093ms step_avg:95.98ms
step:575/1770 train_time:55193ms step_avg:95.99ms
step:576/1770 train_time:55292ms step_avg:95.99ms
step:577/1770 train_time:55393ms step_avg:96.00ms
step:578/1770 train_time:55493ms step_avg:96.01ms
step:579/1770 train_time:55592ms step_avg:96.01ms
step:580/1770 train_time:55690ms step_avg:96.02ms
step:581/1770 train_time:55789ms step_avg:96.02ms
step:582/1770 train_time:55887ms step_avg:96.02ms
step:583/1770 train_time:55985ms step_avg:96.03ms
step:584/1770 train_time:56083ms step_avg:96.03ms
step:585/1770 train_time:56182ms step_avg:96.04ms
step:586/1770 train_time:56280ms step_avg:96.04ms
step:587/1770 train_time:56379ms step_avg:96.05ms
step:588/1770 train_time:56477ms step_avg:96.05ms
step:589/1770 train_time:56575ms step_avg:96.05ms
step:590/1770 train_time:56674ms step_avg:96.06ms
step:591/1770 train_time:56773ms step_avg:96.06ms
step:592/1770 train_time:56872ms step_avg:96.07ms
step:593/1770 train_time:56973ms step_avg:96.08ms
step:594/1770 train_time:57072ms step_avg:96.08ms
step:595/1770 train_time:57170ms step_avg:96.08ms
step:596/1770 train_time:57269ms step_avg:96.09ms
step:597/1770 train_time:57368ms step_avg:96.09ms
step:598/1770 train_time:57468ms step_avg:96.10ms
step:599/1770 train_time:57567ms step_avg:96.11ms
step:600/1770 train_time:57666ms step_avg:96.11ms
step:601/1770 train_time:57765ms step_avg:96.12ms
step:602/1770 train_time:57863ms step_avg:96.12ms
step:603/1770 train_time:57961ms step_avg:96.12ms
step:604/1770 train_time:58059ms step_avg:96.12ms
step:605/1770 train_time:58157ms step_avg:96.13ms
step:606/1770 train_time:58255ms step_avg:96.13ms
step:607/1770 train_time:58355ms step_avg:96.14ms
step:608/1770 train_time:58454ms step_avg:96.14ms
step:609/1770 train_time:58554ms step_avg:96.15ms
step:610/1770 train_time:58652ms step_avg:96.15ms
step:611/1770 train_time:58752ms step_avg:96.16ms
step:612/1770 train_time:58852ms step_avg:96.16ms
step:613/1770 train_time:58951ms step_avg:96.17ms
step:614/1770 train_time:59052ms step_avg:96.18ms
step:615/1770 train_time:59151ms step_avg:96.18ms
step:616/1770 train_time:59252ms step_avg:96.19ms
step:617/1770 train_time:59351ms step_avg:96.19ms
step:618/1770 train_time:59450ms step_avg:96.20ms
step:619/1770 train_time:59549ms step_avg:96.20ms
step:620/1770 train_time:59649ms step_avg:96.21ms
step:621/1770 train_time:59749ms step_avg:96.21ms
step:622/1770 train_time:59849ms step_avg:96.22ms
step:623/1770 train_time:59947ms step_avg:96.22ms
step:624/1770 train_time:60046ms step_avg:96.23ms
step:625/1770 train_time:60144ms step_avg:96.23ms
step:625/1770 val_loss:3.6622 train_time:60426ms step_avg:96.68ms
step:626/1770 train_time:60436ms step_avg:96.54ms
step:627/1770 train_time:60444ms step_avg:96.40ms
step:628/1770 train_time:60452ms step_avg:96.26ms
step:629/1770 train_time:60545ms step_avg:96.26ms
step:630/1770 train_time:60643ms step_avg:96.26ms
step:631/1770 train_time:60742ms step_avg:96.26ms
step:632/1770 train_time:60839ms step_avg:96.26ms
step:633/1770 train_time:60937ms step_avg:96.27ms
step:634/1770 train_time:61035ms step_avg:96.27ms
step:635/1770 train_time:61133ms step_avg:96.27ms
step:636/1770 train_time:61230ms step_avg:96.27ms
step:637/1770 train_time:61329ms step_avg:96.28ms
step:638/1770 train_time:61431ms step_avg:96.29ms
step:639/1770 train_time:61533ms step_avg:96.30ms
step:640/1770 train_time:61633ms step_avg:96.30ms
step:641/1770 train_time:61733ms step_avg:96.31ms
step:642/1770 train_time:61831ms step_avg:96.31ms
step:643/1770 train_time:61929ms step_avg:96.31ms
step:644/1770 train_time:62027ms step_avg:96.32ms
step:645/1770 train_time:62125ms step_avg:96.32ms
step:646/1770 train_time:62222ms step_avg:96.32ms
step:647/1770 train_time:62320ms step_avg:96.32ms
step:648/1770 train_time:62420ms step_avg:96.33ms
step:649/1770 train_time:62520ms step_avg:96.33ms
step:650/1770 train_time:62620ms step_avg:96.34ms
step:651/1770 train_time:62719ms step_avg:96.34ms
step:652/1770 train_time:62818ms step_avg:96.35ms
step:653/1770 train_time:62915ms step_avg:96.35ms
step:654/1770 train_time:63014ms step_avg:96.35ms
step:655/1770 train_time:63114ms step_avg:96.36ms
step:656/1770 train_time:63213ms step_avg:96.36ms
step:657/1770 train_time:63313ms step_avg:96.37ms
step:658/1770 train_time:63411ms step_avg:96.37ms
step:659/1770 train_time:63511ms step_avg:96.38ms
step:660/1770 train_time:63613ms step_avg:96.38ms
step:661/1770 train_time:63715ms step_avg:96.39ms
step:662/1770 train_time:63816ms step_avg:96.40ms
step:663/1770 train_time:63916ms step_avg:96.40ms
step:664/1770 train_time:64016ms step_avg:96.41ms
step:665/1770 train_time:64117ms step_avg:96.42ms
step:666/1770 train_time:64217ms step_avg:96.42ms
step:667/1770 train_time:64318ms step_avg:96.43ms
step:668/1770 train_time:64418ms step_avg:96.43ms
step:669/1770 train_time:64519ms step_avg:96.44ms
step:670/1770 train_time:64620ms step_avg:96.45ms
step:671/1770 train_time:64720ms step_avg:96.45ms
step:672/1770 train_time:64821ms step_avg:96.46ms
step:673/1770 train_time:64921ms step_avg:96.46ms
step:674/1770 train_time:65020ms step_avg:96.47ms
step:675/1770 train_time:65120ms step_avg:96.47ms
step:676/1770 train_time:65220ms step_avg:96.48ms
step:677/1770 train_time:65320ms step_avg:96.48ms
step:678/1770 train_time:65421ms step_avg:96.49ms
step:679/1770 train_time:65521ms step_avg:96.50ms
step:680/1770 train_time:65621ms step_avg:96.50ms
step:681/1770 train_time:65721ms step_avg:96.51ms
step:682/1770 train_time:65821ms step_avg:96.51ms
step:683/1770 train_time:65920ms step_avg:96.52ms
step:684/1770 train_time:66020ms step_avg:96.52ms
step:685/1770 train_time:66121ms step_avg:96.53ms
step:686/1770 train_time:66220ms step_avg:96.53ms
step:687/1770 train_time:66320ms step_avg:96.54ms
step:688/1770 train_time:66420ms step_avg:96.54ms
step:689/1770 train_time:66521ms step_avg:96.55ms
step:690/1770 train_time:66621ms step_avg:96.55ms
step:691/1770 train_time:66721ms step_avg:96.56ms
step:692/1770 train_time:66821ms step_avg:96.56ms
step:693/1770 train_time:66920ms step_avg:96.57ms
step:694/1770 train_time:67020ms step_avg:96.57ms
step:695/1770 train_time:67120ms step_avg:96.58ms
step:696/1770 train_time:67222ms step_avg:96.58ms
step:697/1770 train_time:67321ms step_avg:96.59ms
step:698/1770 train_time:67421ms step_avg:96.59ms
step:699/1770 train_time:67521ms step_avg:96.60ms
step:700/1770 train_time:67621ms step_avg:96.60ms
step:701/1770 train_time:67721ms step_avg:96.61ms
step:702/1770 train_time:67820ms step_avg:96.61ms
step:703/1770 train_time:67921ms step_avg:96.62ms
step:704/1770 train_time:68020ms step_avg:96.62ms
step:705/1770 train_time:68121ms step_avg:96.63ms
step:706/1770 train_time:68221ms step_avg:96.63ms
step:707/1770 train_time:68320ms step_avg:96.63ms
step:708/1770 train_time:68420ms step_avg:96.64ms
step:709/1770 train_time:68520ms step_avg:96.64ms
step:710/1770 train_time:68621ms step_avg:96.65ms
step:711/1770 train_time:68721ms step_avg:96.65ms
step:712/1770 train_time:68821ms step_avg:96.66ms
step:713/1770 train_time:68922ms step_avg:96.66ms
step:714/1770 train_time:69021ms step_avg:96.67ms
step:715/1770 train_time:69121ms step_avg:96.67ms
step:716/1770 train_time:69221ms step_avg:96.68ms
step:717/1770 train_time:69320ms step_avg:96.68ms
step:718/1770 train_time:69421ms step_avg:96.69ms
step:719/1770 train_time:69520ms step_avg:96.69ms
step:720/1770 train_time:69620ms step_avg:96.69ms
step:721/1770 train_time:69720ms step_avg:96.70ms
step:722/1770 train_time:69820ms step_avg:96.70ms
step:723/1770 train_time:69920ms step_avg:96.71ms
step:724/1770 train_time:70021ms step_avg:96.71ms
step:725/1770 train_time:70121ms step_avg:96.72ms
step:726/1770 train_time:70221ms step_avg:96.72ms
step:727/1770 train_time:70321ms step_avg:96.73ms
step:728/1770 train_time:70420ms step_avg:96.73ms
step:729/1770 train_time:70520ms step_avg:96.74ms
step:730/1770 train_time:70621ms step_avg:96.74ms
step:731/1770 train_time:70721ms step_avg:96.74ms
step:732/1770 train_time:70821ms step_avg:96.75ms
step:733/1770 train_time:70921ms step_avg:96.75ms
step:734/1770 train_time:71021ms step_avg:96.76ms
step:735/1770 train_time:71121ms step_avg:96.76ms
step:736/1770 train_time:71221ms step_avg:96.77ms
step:737/1770 train_time:71321ms step_avg:96.77ms
step:738/1770 train_time:71420ms step_avg:96.78ms
step:739/1770 train_time:71520ms step_avg:96.78ms
step:740/1770 train_time:71620ms step_avg:96.78ms
step:741/1770 train_time:71720ms step_avg:96.79ms
step:742/1770 train_time:71820ms step_avg:96.79ms
step:743/1770 train_time:71919ms step_avg:96.80ms
step:744/1770 train_time:72020ms step_avg:96.80ms
step:745/1770 train_time:72120ms step_avg:96.81ms
step:746/1770 train_time:72221ms step_avg:96.81ms
step:747/1770 train_time:72320ms step_avg:96.81ms
step:748/1770 train_time:72420ms step_avg:96.82ms
step:749/1770 train_time:72520ms step_avg:96.82ms
step:750/1770 train_time:72620ms step_avg:96.83ms
step:750/1770 val_loss:3.5996 train_time:72907ms step_avg:97.21ms
step:751/1770 train_time:72916ms step_avg:97.09ms
step:752/1770 train_time:72925ms step_avg:96.97ms
step:753/1770 train_time:72934ms step_avg:96.86ms
step:754/1770 train_time:73027ms step_avg:96.85ms
step:755/1770 train_time:73126ms step_avg:96.86ms
step:756/1770 train_time:73225ms step_avg:96.86ms
step:757/1770 train_time:73325ms step_avg:96.86ms
step:758/1770 train_time:73425ms step_avg:96.87ms
step:759/1770 train_time:73525ms step_avg:96.87ms
step:760/1770 train_time:73625ms step_avg:96.87ms
step:761/1770 train_time:73724ms step_avg:96.88ms
step:762/1770 train_time:73825ms step_avg:96.88ms
step:763/1770 train_time:73928ms step_avg:96.89ms
step:764/1770 train_time:74030ms step_avg:96.90ms
step:765/1770 train_time:74130ms step_avg:96.90ms
step:766/1770 train_time:74230ms step_avg:96.91ms
step:767/1770 train_time:74330ms step_avg:96.91ms
step:768/1770 train_time:74429ms step_avg:96.91ms
step:769/1770 train_time:74528ms step_avg:96.92ms
step:770/1770 train_time:74627ms step_avg:96.92ms
step:771/1770 train_time:74726ms step_avg:96.92ms
step:772/1770 train_time:74826ms step_avg:96.92ms
step:773/1770 train_time:74928ms step_avg:96.93ms
step:774/1770 train_time:75029ms step_avg:96.94ms
step:775/1770 train_time:75129ms step_avg:96.94ms
step:776/1770 train_time:75228ms step_avg:96.94ms
step:777/1770 train_time:75328ms step_avg:96.95ms
step:778/1770 train_time:75428ms step_avg:96.95ms
step:779/1770 train_time:75527ms step_avg:96.95ms
step:780/1770 train_time:75627ms step_avg:96.96ms
step:781/1770 train_time:75726ms step_avg:96.96ms
step:782/1770 train_time:75826ms step_avg:96.96ms
step:783/1770 train_time:75927ms step_avg:96.97ms
step:784/1770 train_time:76027ms step_avg:96.97ms
step:785/1770 train_time:76128ms step_avg:96.98ms
step:786/1770 train_time:76228ms step_avg:96.98ms
step:787/1770 train_time:76328ms step_avg:96.99ms
step:788/1770 train_time:76428ms step_avg:96.99ms
step:789/1770 train_time:76527ms step_avg:96.99ms
step:790/1770 train_time:76627ms step_avg:97.00ms
step:791/1770 train_time:76727ms step_avg:97.00ms
step:792/1770 train_time:76827ms step_avg:97.00ms
step:793/1770 train_time:76928ms step_avg:97.01ms
step:794/1770 train_time:77029ms step_avg:97.01ms
step:795/1770 train_time:77129ms step_avg:97.02ms
step:796/1770 train_time:77230ms step_avg:97.02ms
step:797/1770 train_time:77330ms step_avg:97.03ms
step:798/1770 train_time:77429ms step_avg:97.03ms
step:799/1770 train_time:77529ms step_avg:97.03ms
step:800/1770 train_time:77628ms step_avg:97.04ms
step:801/1770 train_time:77728ms step_avg:97.04ms
step:802/1770 train_time:77828ms step_avg:97.04ms
step:803/1770 train_time:77929ms step_avg:97.05ms
step:804/1770 train_time:78030ms step_avg:97.05ms
step:805/1770 train_time:78131ms step_avg:97.06ms
step:806/1770 train_time:78230ms step_avg:97.06ms
step:807/1770 train_time:78331ms step_avg:97.06ms
step:808/1770 train_time:78430ms step_avg:97.07ms
step:809/1770 train_time:78530ms step_avg:97.07ms
step:810/1770 train_time:78630ms step_avg:97.07ms
step:811/1770 train_time:78729ms step_avg:97.08ms
step:812/1770 train_time:78830ms step_avg:97.08ms
step:813/1770 train_time:78929ms step_avg:97.08ms
step:814/1770 train_time:79029ms step_avg:97.09ms
step:815/1770 train_time:79129ms step_avg:97.09ms
step:816/1770 train_time:79228ms step_avg:97.09ms
step:817/1770 train_time:79329ms step_avg:97.10ms
step:818/1770 train_time:79429ms step_avg:97.10ms
step:819/1770 train_time:79529ms step_avg:97.11ms
step:820/1770 train_time:79630ms step_avg:97.11ms
step:821/1770 train_time:79730ms step_avg:97.11ms
step:822/1770 train_time:79829ms step_avg:97.12ms
step:823/1770 train_time:79928ms step_avg:97.12ms
step:824/1770 train_time:80027ms step_avg:97.12ms
step:825/1770 train_time:80127ms step_avg:97.12ms
step:826/1770 train_time:80226ms step_avg:97.13ms
step:827/1770 train_time:80327ms step_avg:97.13ms
step:828/1770 train_time:80428ms step_avg:97.13ms
step:829/1770 train_time:80528ms step_avg:97.14ms
step:830/1770 train_time:80628ms step_avg:97.14ms
step:831/1770 train_time:80729ms step_avg:97.15ms
step:832/1770 train_time:80829ms step_avg:97.15ms
step:833/1770 train_time:80928ms step_avg:97.15ms
step:834/1770 train_time:81028ms step_avg:97.16ms
step:835/1770 train_time:81129ms step_avg:97.16ms
step:836/1770 train_time:81229ms step_avg:97.16ms
step:837/1770 train_time:81328ms step_avg:97.17ms
step:838/1770 train_time:81429ms step_avg:97.17ms
step:839/1770 train_time:81530ms step_avg:97.17ms
step:840/1770 train_time:81629ms step_avg:97.18ms
step:841/1770 train_time:81730ms step_avg:97.18ms
step:842/1770 train_time:81829ms step_avg:97.18ms
step:843/1770 train_time:81930ms step_avg:97.19ms
step:844/1770 train_time:82028ms step_avg:97.19ms
step:845/1770 train_time:82128ms step_avg:97.19ms
step:846/1770 train_time:82227ms step_avg:97.20ms
step:847/1770 train_time:82327ms step_avg:97.20ms
step:848/1770 train_time:82428ms step_avg:97.20ms
step:849/1770 train_time:82528ms step_avg:97.21ms
step:850/1770 train_time:82629ms step_avg:97.21ms
step:851/1770 train_time:82729ms step_avg:97.21ms
step:852/1770 train_time:82829ms step_avg:97.22ms
step:853/1770 train_time:82929ms step_avg:97.22ms
step:854/1770 train_time:83029ms step_avg:97.22ms
step:855/1770 train_time:83128ms step_avg:97.23ms
step:856/1770 train_time:83228ms step_avg:97.23ms
step:857/1770 train_time:83328ms step_avg:97.23ms
step:858/1770 train_time:83428ms step_avg:97.24ms
step:859/1770 train_time:83528ms step_avg:97.24ms
step:860/1770 train_time:83628ms step_avg:97.24ms
step:861/1770 train_time:83728ms step_avg:97.25ms
step:862/1770 train_time:83828ms step_avg:97.25ms
step:863/1770 train_time:83928ms step_avg:97.25ms
step:864/1770 train_time:84028ms step_avg:97.25ms
step:865/1770 train_time:84129ms step_avg:97.26ms
step:866/1770 train_time:84228ms step_avg:97.26ms
step:867/1770 train_time:84328ms step_avg:97.26ms
step:868/1770 train_time:84428ms step_avg:97.27ms
step:869/1770 train_time:84529ms step_avg:97.27ms
step:870/1770 train_time:84629ms step_avg:97.27ms
step:871/1770 train_time:84729ms step_avg:97.28ms
step:872/1770 train_time:84828ms step_avg:97.28ms
step:873/1770 train_time:84929ms step_avg:97.28ms
step:874/1770 train_time:85028ms step_avg:97.29ms
step:875/1770 train_time:85128ms step_avg:97.29ms
step:875/1770 val_loss:3.5489 train_time:85415ms step_avg:97.62ms
step:876/1770 train_time:85425ms step_avg:97.52ms
step:877/1770 train_time:85433ms step_avg:97.41ms
step:878/1770 train_time:85441ms step_avg:97.31ms
step:879/1770 train_time:85532ms step_avg:97.31ms
step:880/1770 train_time:85633ms step_avg:97.31ms
step:881/1770 train_time:85732ms step_avg:97.31ms
step:882/1770 train_time:85831ms step_avg:97.31ms
step:883/1770 train_time:85930ms step_avg:97.32ms
step:884/1770 train_time:86029ms step_avg:97.32ms
step:885/1770 train_time:86129ms step_avg:97.32ms
step:886/1770 train_time:86229ms step_avg:97.32ms
step:887/1770 train_time:86331ms step_avg:97.33ms
step:888/1770 train_time:86435ms step_avg:97.34ms
step:889/1770 train_time:86536ms step_avg:97.34ms
step:890/1770 train_time:86636ms step_avg:97.34ms
step:891/1770 train_time:86736ms step_avg:97.35ms
step:892/1770 train_time:86836ms step_avg:97.35ms
step:893/1770 train_time:86935ms step_avg:97.35ms
step:894/1770 train_time:87034ms step_avg:97.35ms
step:895/1770 train_time:87133ms step_avg:97.36ms
step:896/1770 train_time:87232ms step_avg:97.36ms
step:897/1770 train_time:87333ms step_avg:97.36ms
step:898/1770 train_time:87434ms step_avg:97.36ms
step:899/1770 train_time:87535ms step_avg:97.37ms
step:900/1770 train_time:87636ms step_avg:97.37ms
step:901/1770 train_time:87736ms step_avg:97.38ms
step:902/1770 train_time:87837ms step_avg:97.38ms
step:903/1770 train_time:87936ms step_avg:97.38ms
step:904/1770 train_time:88035ms step_avg:97.38ms
step:905/1770 train_time:88135ms step_avg:97.39ms
step:906/1770 train_time:88236ms step_avg:97.39ms
step:907/1770 train_time:88337ms step_avg:97.39ms
step:908/1770 train_time:88437ms step_avg:97.40ms
step:909/1770 train_time:88538ms step_avg:97.40ms
step:910/1770 train_time:88638ms step_avg:97.40ms
step:911/1770 train_time:88738ms step_avg:97.41ms
step:912/1770 train_time:88838ms step_avg:97.41ms
step:913/1770 train_time:88938ms step_avg:97.41ms
step:914/1770 train_time:89039ms step_avg:97.42ms
step:915/1770 train_time:89138ms step_avg:97.42ms
step:916/1770 train_time:89240ms step_avg:97.42ms
step:917/1770 train_time:89341ms step_avg:97.43ms
step:918/1770 train_time:89441ms step_avg:97.43ms
step:919/1770 train_time:89542ms step_avg:97.43ms
step:920/1770 train_time:89643ms step_avg:97.44ms
step:921/1770 train_time:89744ms step_avg:97.44ms
step:922/1770 train_time:89846ms step_avg:97.45ms
step:923/1770 train_time:89949ms step_avg:97.45ms
step:924/1770 train_time:90051ms step_avg:97.46ms
step:925/1770 train_time:90153ms step_avg:97.46ms
step:926/1770 train_time:90253ms step_avg:97.47ms
step:927/1770 train_time:90354ms step_avg:97.47ms
step:928/1770 train_time:90454ms step_avg:97.47ms
step:929/1770 train_time:90555ms step_avg:97.48ms
step:930/1770 train_time:90656ms step_avg:97.48ms
step:931/1770 train_time:90757ms step_avg:97.48ms
step:932/1770 train_time:90860ms step_avg:97.49ms
step:933/1770 train_time:90961ms step_avg:97.49ms
step:934/1770 train_time:91063ms step_avg:97.50ms
step:935/1770 train_time:91164ms step_avg:97.50ms
step:936/1770 train_time:91268ms step_avg:97.51ms
step:937/1770 train_time:91371ms step_avg:97.51ms
step:938/1770 train_time:91473ms step_avg:97.52ms
step:939/1770 train_time:91574ms step_avg:97.52ms
step:940/1770 train_time:91675ms step_avg:97.53ms
step:941/1770 train_time:91776ms step_avg:97.53ms
step:942/1770 train_time:91877ms step_avg:97.53ms
step:943/1770 train_time:91978ms step_avg:97.54ms
step:944/1770 train_time:92080ms step_avg:97.54ms
step:945/1770 train_time:92182ms step_avg:97.55ms
step:946/1770 train_time:92285ms step_avg:97.55ms
step:947/1770 train_time:92387ms step_avg:97.56ms
step:948/1770 train_time:92491ms step_avg:97.56ms
step:949/1770 train_time:92593ms step_avg:97.57ms
step:950/1770 train_time:92694ms step_avg:97.57ms
step:951/1770 train_time:92794ms step_avg:97.58ms
step:952/1770 train_time:92895ms step_avg:97.58ms
step:953/1770 train_time:92995ms step_avg:97.58ms
step:954/1770 train_time:93097ms step_avg:97.59ms
step:955/1770 train_time:93199ms step_avg:97.59ms
step:956/1770 train_time:93301ms step_avg:97.60ms
step:957/1770 train_time:93405ms step_avg:97.60ms
step:958/1770 train_time:93508ms step_avg:97.61ms
step:959/1770 train_time:93611ms step_avg:97.61ms
step:960/1770 train_time:93712ms step_avg:97.62ms
step:961/1770 train_time:93814ms step_avg:97.62ms
step:962/1770 train_time:93915ms step_avg:97.62ms
step:963/1770 train_time:94015ms step_avg:97.63ms
step:964/1770 train_time:94117ms step_avg:97.63ms
step:965/1770 train_time:94218ms step_avg:97.63ms
step:966/1770 train_time:94321ms step_avg:97.64ms
step:967/1770 train_time:94425ms step_avg:97.65ms
step:968/1770 train_time:94527ms step_avg:97.65ms
step:969/1770 train_time:94631ms step_avg:97.66ms
step:970/1770 train_time:94732ms step_avg:97.66ms
step:971/1770 train_time:94833ms step_avg:97.67ms
step:972/1770 train_time:94934ms step_avg:97.67ms
step:973/1770 train_time:95035ms step_avg:97.67ms
step:974/1770 train_time:95135ms step_avg:97.67ms
step:975/1770 train_time:95237ms step_avg:97.68ms
step:976/1770 train_time:95338ms step_avg:97.68ms
step:977/1770 train_time:95440ms step_avg:97.69ms
step:978/1770 train_time:95544ms step_avg:97.69ms
step:979/1770 train_time:95646ms step_avg:97.70ms
step:980/1770 train_time:95748ms step_avg:97.70ms
step:981/1770 train_time:95851ms step_avg:97.71ms
step:982/1770 train_time:95952ms step_avg:97.71ms
step:983/1770 train_time:96054ms step_avg:97.72ms
step:984/1770 train_time:96154ms step_avg:97.72ms
step:985/1770 train_time:96255ms step_avg:97.72ms
step:986/1770 train_time:96357ms step_avg:97.72ms
step:987/1770 train_time:96458ms step_avg:97.73ms
step:988/1770 train_time:96560ms step_avg:97.73ms
step:989/1770 train_time:96662ms step_avg:97.74ms
step:990/1770 train_time:96764ms step_avg:97.74ms
step:991/1770 train_time:96868ms step_avg:97.75ms
step:992/1770 train_time:96971ms step_avg:97.75ms
step:993/1770 train_time:97073ms step_avg:97.76ms
step:994/1770 train_time:97173ms step_avg:97.76ms
step:995/1770 train_time:97274ms step_avg:97.76ms
step:996/1770 train_time:97374ms step_avg:97.77ms
step:997/1770 train_time:97476ms step_avg:97.77ms
step:998/1770 train_time:97577ms step_avg:97.77ms
step:999/1770 train_time:97679ms step_avg:97.78ms
step:1000/1770 train_time:97781ms step_avg:97.78ms
step:1000/1770 val_loss:3.5120 train_time:98079ms step_avg:98.08ms
step:1001/1770 train_time:98088ms step_avg:97.99ms
step:1002/1770 train_time:98097ms step_avg:97.90ms
step:1003/1770 train_time:98105ms step_avg:97.81ms
step:1004/1770 train_time:98198ms step_avg:97.81ms
step:1005/1770 train_time:98301ms step_avg:97.81ms
step:1006/1770 train_time:98405ms step_avg:97.82ms
step:1007/1770 train_time:98507ms step_avg:97.82ms
step:1008/1770 train_time:98608ms step_avg:97.83ms
step:1009/1770 train_time:98709ms step_avg:97.83ms
step:1010/1770 train_time:98810ms step_avg:97.83ms
step:1011/1770 train_time:98910ms step_avg:97.83ms
step:1012/1770 train_time:99011ms step_avg:97.84ms
step:1013/1770 train_time:99114ms step_avg:97.84ms
step:1014/1770 train_time:99216ms step_avg:97.85ms
step:1015/1770 train_time:99318ms step_avg:97.85ms
step:1016/1770 train_time:99420ms step_avg:97.85ms
step:1017/1770 train_time:99522ms step_avg:97.86ms
step:1018/1770 train_time:99626ms step_avg:97.86ms
step:1019/1770 train_time:99727ms step_avg:97.87ms
step:1020/1770 train_time:99828ms step_avg:97.87ms
step:1021/1770 train_time:99929ms step_avg:97.87ms
step:1022/1770 train_time:100030ms step_avg:97.88ms
step:1023/1770 train_time:100131ms step_avg:97.88ms
step:1024/1770 train_time:100233ms step_avg:97.88ms
step:1025/1770 train_time:100334ms step_avg:97.89ms
step:1026/1770 train_time:100435ms step_avg:97.89ms
step:1027/1770 train_time:100538ms step_avg:97.89ms
step:1028/1770 train_time:100640ms step_avg:97.90ms
step:1029/1770 train_time:100742ms step_avg:97.90ms
step:1030/1770 train_time:100844ms step_avg:97.91ms
step:1031/1770 train_time:100948ms step_avg:97.91ms
step:1032/1770 train_time:101049ms step_avg:97.92ms
step:1033/1770 train_time:101150ms step_avg:97.92ms
step:1034/1770 train_time:101251ms step_avg:97.92ms
step:1035/1770 train_time:101352ms step_avg:97.93ms
step:1036/1770 train_time:101453ms step_avg:97.93ms
step:1037/1770 train_time:101555ms step_avg:97.93ms
step:1038/1770 train_time:101656ms step_avg:97.93ms
step:1039/1770 train_time:101758ms step_avg:97.94ms
step:1040/1770 train_time:101862ms step_avg:97.94ms
step:1041/1770 train_time:101965ms step_avg:97.95ms
step:1042/1770 train_time:102067ms step_avg:97.95ms
step:1043/1770 train_time:102169ms step_avg:97.96ms
step:1044/1770 train_time:102270ms step_avg:97.96ms
step:1045/1770 train_time:102371ms step_avg:97.96ms
step:1046/1770 train_time:102472ms step_avg:97.97ms
step:1047/1770 train_time:102572ms step_avg:97.97ms
step:1048/1770 train_time:102673ms step_avg:97.97ms
step:1049/1770 train_time:102775ms step_avg:97.97ms
step:1050/1770 train_time:102877ms step_avg:97.98ms
step:1051/1770 train_time:102979ms step_avg:97.98ms
step:1052/1770 train_time:103082ms step_avg:97.99ms
step:1053/1770 train_time:103186ms step_avg:97.99ms
step:1054/1770 train_time:103288ms step_avg:98.00ms
step:1055/1770 train_time:103390ms step_avg:98.00ms
step:1056/1770 train_time:103490ms step_avg:98.00ms
step:1057/1770 train_time:103591ms step_avg:98.00ms
step:1058/1770 train_time:103691ms step_avg:98.01ms
step:1059/1770 train_time:103792ms step_avg:98.01ms
step:1060/1770 train_time:103894ms step_avg:98.01ms
step:1061/1770 train_time:103996ms step_avg:98.02ms
step:1062/1770 train_time:104098ms step_avg:98.02ms
step:1063/1770 train_time:104202ms step_avg:98.03ms
step:1064/1770 train_time:104306ms step_avg:98.03ms
step:1065/1770 train_time:104409ms step_avg:98.04ms
step:1066/1770 train_time:104510ms step_avg:98.04ms
step:1067/1770 train_time:104612ms step_avg:98.04ms
step:1068/1770 train_time:104712ms step_avg:98.04ms
step:1069/1770 train_time:104812ms step_avg:98.05ms
step:1070/1770 train_time:104914ms step_avg:98.05ms
step:1071/1770 train_time:105015ms step_avg:98.05ms
step:1072/1770 train_time:105117ms step_avg:98.06ms
step:1073/1770 train_time:105219ms step_avg:98.06ms
step:1074/1770 train_time:105323ms step_avg:98.07ms
step:1075/1770 train_time:105426ms step_avg:98.07ms
step:1076/1770 train_time:105528ms step_avg:98.07ms
step:1077/1770 train_time:105629ms step_avg:98.08ms
step:1078/1770 train_time:105731ms step_avg:98.08ms
step:1079/1770 train_time:105832ms step_avg:98.08ms
step:1080/1770 train_time:105932ms step_avg:98.09ms
step:1081/1770 train_time:106033ms step_avg:98.09ms
step:1082/1770 train_time:106136ms step_avg:98.09ms
step:1083/1770 train_time:106238ms step_avg:98.10ms
step:1084/1770 train_time:106340ms step_avg:98.10ms
step:1085/1770 train_time:106444ms step_avg:98.10ms
step:1086/1770 train_time:106546ms step_avg:98.11ms
step:1087/1770 train_time:106649ms step_avg:98.11ms
step:1088/1770 train_time:106751ms step_avg:98.12ms
step:1089/1770 train_time:106852ms step_avg:98.12ms
step:1090/1770 train_time:106952ms step_avg:98.12ms
step:1091/1770 train_time:107053ms step_avg:98.12ms
step:1092/1770 train_time:107155ms step_avg:98.13ms
step:1093/1770 train_time:107257ms step_avg:98.13ms
step:1094/1770 train_time:107360ms step_avg:98.14ms
step:1095/1770 train_time:107462ms step_avg:98.14ms
step:1096/1770 train_time:107565ms step_avg:98.14ms
step:1097/1770 train_time:107669ms step_avg:98.15ms
step:1098/1770 train_time:107770ms step_avg:98.15ms
step:1099/1770 train_time:107873ms step_avg:98.16ms
step:1100/1770 train_time:107975ms step_avg:98.16ms
step:1101/1770 train_time:108075ms step_avg:98.16ms
step:1102/1770 train_time:108176ms step_avg:98.16ms
step:1103/1770 train_time:108278ms step_avg:98.17ms
step:1104/1770 train_time:108381ms step_avg:98.17ms
step:1105/1770 train_time:108484ms step_avg:98.18ms
step:1106/1770 train_time:108589ms step_avg:98.18ms
step:1107/1770 train_time:108690ms step_avg:98.18ms
step:1108/1770 train_time:108792ms step_avg:98.19ms
step:1109/1770 train_time:108894ms step_avg:98.19ms
step:1110/1770 train_time:108995ms step_avg:98.19ms
step:1111/1770 train_time:109096ms step_avg:98.20ms
step:1112/1770 train_time:109198ms step_avg:98.20ms
step:1113/1770 train_time:109299ms step_avg:98.20ms
step:1114/1770 train_time:109402ms step_avg:98.21ms
step:1115/1770 train_time:109505ms step_avg:98.21ms
step:1116/1770 train_time:109608ms step_avg:98.22ms
step:1117/1770 train_time:109711ms step_avg:98.22ms
step:1118/1770 train_time:109811ms step_avg:98.22ms
step:1119/1770 train_time:109914ms step_avg:98.23ms
step:1120/1770 train_time:110015ms step_avg:98.23ms
step:1121/1770 train_time:110116ms step_avg:98.23ms
step:1122/1770 train_time:110218ms step_avg:98.23ms
step:1123/1770 train_time:110319ms step_avg:98.24ms
step:1124/1770 train_time:110423ms step_avg:98.24ms
step:1125/1770 train_time:110526ms step_avg:98.24ms
step:1125/1770 val_loss:3.4715 train_time:110820ms step_avg:98.51ms
step:1126/1770 train_time:110829ms step_avg:98.43ms
step:1127/1770 train_time:110838ms step_avg:98.35ms
step:1128/1770 train_time:110846ms step_avg:98.27ms
step:1129/1770 train_time:110937ms step_avg:98.26ms
step:1130/1770 train_time:111039ms step_avg:98.26ms
step:1131/1770 train_time:111139ms step_avg:98.27ms
step:1132/1770 train_time:111240ms step_avg:98.27ms
step:1133/1770 train_time:111340ms step_avg:98.27ms
step:1134/1770 train_time:111440ms step_avg:98.27ms
step:1135/1770 train_time:111540ms step_avg:98.27ms
step:1136/1770 train_time:111641ms step_avg:98.28ms
step:1137/1770 train_time:111745ms step_avg:98.28ms
step:1138/1770 train_time:111850ms step_avg:98.29ms
step:1139/1770 train_time:111955ms step_avg:98.29ms
step:1140/1770 train_time:112056ms step_avg:98.29ms
step:1141/1770 train_time:112160ms step_avg:98.30ms
step:1142/1770 train_time:112261ms step_avg:98.30ms
step:1143/1770 train_time:112361ms step_avg:98.30ms
step:1144/1770 train_time:112462ms step_avg:98.31ms
step:1145/1770 train_time:112562ms step_avg:98.31ms
step:1146/1770 train_time:112663ms step_avg:98.31ms
step:1147/1770 train_time:112764ms step_avg:98.31ms
step:1148/1770 train_time:112869ms step_avg:98.32ms
step:1149/1770 train_time:112973ms step_avg:98.32ms
step:1150/1770 train_time:113077ms step_avg:98.33ms
step:1151/1770 train_time:113179ms step_avg:98.33ms
step:1152/1770 train_time:113281ms step_avg:98.33ms
step:1153/1770 train_time:113382ms step_avg:98.34ms
step:1154/1770 train_time:113483ms step_avg:98.34ms
step:1155/1770 train_time:113584ms step_avg:98.34ms
step:1156/1770 train_time:113686ms step_avg:98.34ms
step:1157/1770 train_time:113789ms step_avg:98.35ms
step:1158/1770 train_time:113891ms step_avg:98.35ms
step:1159/1770 train_time:113994ms step_avg:98.36ms
step:1160/1770 train_time:114097ms step_avg:98.36ms
step:1161/1770 train_time:114199ms step_avg:98.36ms
step:1162/1770 train_time:114300ms step_avg:98.36ms
step:1163/1770 train_time:114400ms step_avg:98.37ms
step:1164/1770 train_time:114502ms step_avg:98.37ms
step:1165/1770 train_time:114603ms step_avg:98.37ms
step:1166/1770 train_time:114703ms step_avg:98.37ms
step:1167/1770 train_time:114805ms step_avg:98.38ms
step:1168/1770 train_time:114907ms step_avg:98.38ms
step:1169/1770 train_time:115010ms step_avg:98.38ms
step:1170/1770 train_time:115114ms step_avg:98.39ms
step:1171/1770 train_time:115218ms step_avg:98.39ms
step:1172/1770 train_time:115320ms step_avg:98.40ms
step:1173/1770 train_time:115420ms step_avg:98.40ms
step:1174/1770 train_time:115521ms step_avg:98.40ms
step:1175/1770 train_time:115622ms step_avg:98.40ms
step:1176/1770 train_time:115723ms step_avg:98.40ms
step:1177/1770 train_time:115825ms step_avg:98.41ms
step:1178/1770 train_time:115926ms step_avg:98.41ms
step:1179/1770 train_time:116029ms step_avg:98.41ms
step:1180/1770 train_time:116131ms step_avg:98.42ms
step:1181/1770 train_time:116234ms step_avg:98.42ms
step:1182/1770 train_time:116337ms step_avg:98.42ms
step:1183/1770 train_time:116439ms step_avg:98.43ms
step:1184/1770 train_time:116540ms step_avg:98.43ms
step:1185/1770 train_time:116642ms step_avg:98.43ms
step:1186/1770 train_time:116744ms step_avg:98.43ms
step:1187/1770 train_time:116846ms step_avg:98.44ms
step:1188/1770 train_time:116950ms step_avg:98.44ms
step:1189/1770 train_time:117054ms step_avg:98.45ms
step:1190/1770 train_time:117158ms step_avg:98.45ms
step:1191/1770 train_time:117261ms step_avg:98.46ms
step:1192/1770 train_time:117365ms step_avg:98.46ms
step:1193/1770 train_time:117469ms step_avg:98.46ms
step:1194/1770 train_time:117572ms step_avg:98.47ms
step:1195/1770 train_time:117675ms step_avg:98.47ms
step:1196/1770 train_time:117778ms step_avg:98.48ms
step:1197/1770 train_time:117881ms step_avg:98.48ms
step:1198/1770 train_time:117983ms step_avg:98.48ms
step:1199/1770 train_time:118087ms step_avg:98.49ms
step:1200/1770 train_time:118189ms step_avg:98.49ms
step:1201/1770 train_time:118293ms step_avg:98.50ms
step:1202/1770 train_time:118397ms step_avg:98.50ms
step:1203/1770 train_time:118501ms step_avg:98.50ms
step:1204/1770 train_time:118603ms step_avg:98.51ms
step:1205/1770 train_time:118705ms step_avg:98.51ms
step:1206/1770 train_time:118808ms step_avg:98.51ms
step:1207/1770 train_time:118910ms step_avg:98.52ms
step:1208/1770 train_time:119017ms step_avg:98.52ms
step:1209/1770 train_time:119121ms step_avg:98.53ms
step:1210/1770 train_time:119223ms step_avg:98.53ms
step:1211/1770 train_time:119326ms step_avg:98.54ms
step:1212/1770 train_time:119429ms step_avg:98.54ms
step:1213/1770 train_time:119533ms step_avg:98.54ms
step:1214/1770 train_time:119638ms step_avg:98.55ms
step:1215/1770 train_time:119740ms step_avg:98.55ms
step:1216/1770 train_time:119843ms step_avg:98.56ms
step:1217/1770 train_time:119947ms step_avg:98.56ms
step:1218/1770 train_time:120053ms step_avg:98.57ms
step:1219/1770 train_time:120156ms step_avg:98.57ms
step:1220/1770 train_time:120258ms step_avg:98.57ms
step:1221/1770 train_time:120361ms step_avg:98.58ms
step:1222/1770 train_time:120463ms step_avg:98.58ms
step:1223/1770 train_time:120565ms step_avg:98.58ms
step:1224/1770 train_time:120669ms step_avg:98.59ms
step:1225/1770 train_time:120772ms step_avg:98.59ms
step:1226/1770 train_time:120877ms step_avg:98.59ms
step:1227/1770 train_time:120981ms step_avg:98.60ms
step:1228/1770 train_time:121083ms step_avg:98.60ms
step:1229/1770 train_time:121186ms step_avg:98.61ms
step:1230/1770 train_time:121290ms step_avg:98.61ms
step:1231/1770 train_time:121393ms step_avg:98.61ms
step:1232/1770 train_time:121497ms step_avg:98.62ms
step:1233/1770 train_time:121599ms step_avg:98.62ms
step:1234/1770 train_time:121701ms step_avg:98.62ms
step:1235/1770 train_time:121803ms step_avg:98.63ms
step:1236/1770 train_time:121906ms step_avg:98.63ms
step:1237/1770 train_time:122010ms step_avg:98.63ms
step:1238/1770 train_time:122115ms step_avg:98.64ms
step:1239/1770 train_time:122218ms step_avg:98.64ms
step:1240/1770 train_time:122319ms step_avg:98.64ms
step:1241/1770 train_time:122421ms step_avg:98.65ms
step:1242/1770 train_time:122523ms step_avg:98.65ms
step:1243/1770 train_time:122627ms step_avg:98.65ms
step:1244/1770 train_time:122731ms step_avg:98.66ms
step:1245/1770 train_time:122834ms step_avg:98.66ms
step:1246/1770 train_time:122939ms step_avg:98.67ms
step:1247/1770 train_time:123041ms step_avg:98.67ms
step:1248/1770 train_time:123145ms step_avg:98.67ms
step:1249/1770 train_time:123247ms step_avg:98.68ms
step:1250/1770 train_time:123350ms step_avg:98.68ms
step:1250/1770 val_loss:3.4239 train_time:123648ms step_avg:98.92ms
step:1251/1770 train_time:123658ms step_avg:98.85ms
step:1252/1770 train_time:123667ms step_avg:98.78ms
step:1253/1770 train_time:123675ms step_avg:98.70ms
step:1254/1770 train_time:123766ms step_avg:98.70ms
step:1255/1770 train_time:123868ms step_avg:98.70ms
step:1256/1770 train_time:123970ms step_avg:98.70ms
step:1257/1770 train_time:124074ms step_avg:98.71ms
step:1258/1770 train_time:124177ms step_avg:98.71ms
step:1259/1770 train_time:124280ms step_avg:98.71ms
step:1260/1770 train_time:124383ms step_avg:98.72ms
step:1261/1770 train_time:124485ms step_avg:98.72ms
step:1262/1770 train_time:124592ms step_avg:98.73ms
step:1263/1770 train_time:124699ms step_avg:98.73ms
step:1264/1770 train_time:124802ms step_avg:98.74ms
step:1265/1770 train_time:124904ms step_avg:98.74ms
step:1266/1770 train_time:125007ms step_avg:98.74ms
step:1267/1770 train_time:125109ms step_avg:98.74ms
step:1268/1770 train_time:125212ms step_avg:98.75ms
step:1269/1770 train_time:125315ms step_avg:98.75ms
step:1270/1770 train_time:125419ms step_avg:98.76ms
step:1271/1770 train_time:125522ms step_avg:98.76ms
step:1272/1770 train_time:125625ms step_avg:98.76ms
step:1273/1770 train_time:125729ms step_avg:98.77ms
step:1274/1770 train_time:125833ms step_avg:98.77ms
step:1275/1770 train_time:125938ms step_avg:98.78ms
step:1276/1770 train_time:126040ms step_avg:98.78ms
step:1277/1770 train_time:126142ms step_avg:98.78ms
step:1278/1770 train_time:126244ms step_avg:98.78ms
step:1279/1770 train_time:126346ms step_avg:98.79ms
step:1280/1770 train_time:126450ms step_avg:98.79ms
step:1281/1770 train_time:126555ms step_avg:98.79ms
step:1282/1770 train_time:126659ms step_avg:98.80ms
step:1283/1770 train_time:126762ms step_avg:98.80ms
step:1284/1770 train_time:126866ms step_avg:98.81ms
step:1285/1770 train_time:126969ms step_avg:98.81ms
step:1286/1770 train_time:127072ms step_avg:98.81ms
step:1287/1770 train_time:127175ms step_avg:98.82ms
step:1288/1770 train_time:127279ms step_avg:98.82ms
step:1289/1770 train_time:127382ms step_avg:98.82ms
step:1290/1770 train_time:127483ms step_avg:98.82ms
step:1291/1770 train_time:127586ms step_avg:98.83ms
step:1292/1770 train_time:127690ms step_avg:98.83ms
step:1293/1770 train_time:127794ms step_avg:98.84ms
step:1294/1770 train_time:127899ms step_avg:98.84ms
step:1295/1770 train_time:128002ms step_avg:98.84ms
step:1296/1770 train_time:128104ms step_avg:98.85ms
step:1297/1770 train_time:128207ms step_avg:98.85ms
step:1298/1770 train_time:128310ms step_avg:98.85ms
step:1299/1770 train_time:128412ms step_avg:98.85ms
step:1300/1770 train_time:128516ms step_avg:98.86ms
step:1301/1770 train_time:128620ms step_avg:98.86ms
step:1302/1770 train_time:128723ms step_avg:98.87ms
step:1303/1770 train_time:128826ms step_avg:98.87ms
step:1304/1770 train_time:128930ms step_avg:98.87ms
step:1305/1770 train_time:129033ms step_avg:98.88ms
step:1306/1770 train_time:129136ms step_avg:98.88ms
step:1307/1770 train_time:129240ms step_avg:98.88ms
step:1308/1770 train_time:129341ms step_avg:98.88ms
step:1309/1770 train_time:129443ms step_avg:98.89ms
step:1310/1770 train_time:129545ms step_avg:98.89ms
step:1311/1770 train_time:129648ms step_avg:98.89ms
step:1312/1770 train_time:129751ms step_avg:98.90ms
step:1313/1770 train_time:129856ms step_avg:98.90ms
step:1314/1770 train_time:129961ms step_avg:98.90ms
step:1315/1770 train_time:130063ms step_avg:98.91ms
step:1316/1770 train_time:130166ms step_avg:98.91ms
step:1317/1770 train_time:130268ms step_avg:98.91ms
step:1318/1770 train_time:130371ms step_avg:98.92ms
step:1319/1770 train_time:130475ms step_avg:98.92ms
step:1320/1770 train_time:130580ms step_avg:98.92ms
step:1321/1770 train_time:130681ms step_avg:98.93ms
step:1322/1770 train_time:130785ms step_avg:98.93ms
step:1323/1770 train_time:130889ms step_avg:98.93ms
step:1324/1770 train_time:130994ms step_avg:98.94ms
step:1325/1770 train_time:131098ms step_avg:98.94ms
step:1326/1770 train_time:131200ms step_avg:98.94ms
step:1327/1770 train_time:131303ms step_avg:98.95ms
step:1328/1770 train_time:131405ms step_avg:98.95ms
step:1329/1770 train_time:131512ms step_avg:98.96ms
step:1330/1770 train_time:131614ms step_avg:98.96ms
step:1331/1770 train_time:131718ms step_avg:98.96ms
step:1332/1770 train_time:131822ms step_avg:98.97ms
step:1333/1770 train_time:131924ms step_avg:98.97ms
step:1334/1770 train_time:132028ms step_avg:98.97ms
step:1335/1770 train_time:132131ms step_avg:98.97ms
step:1336/1770 train_time:132234ms step_avg:98.98ms
step:1337/1770 train_time:132338ms step_avg:98.98ms
step:1338/1770 train_time:132441ms step_avg:98.98ms
step:1339/1770 train_time:132543ms step_avg:98.99ms
step:1340/1770 train_time:132646ms step_avg:98.99ms
step:1341/1770 train_time:132749ms step_avg:98.99ms
step:1342/1770 train_time:132852ms step_avg:99.00ms
step:1343/1770 train_time:132956ms step_avg:99.00ms
step:1344/1770 train_time:133060ms step_avg:99.00ms
step:1345/1770 train_time:133164ms step_avg:99.01ms
step:1346/1770 train_time:133267ms step_avg:99.01ms
step:1347/1770 train_time:133369ms step_avg:99.01ms
step:1348/1770 train_time:133472ms step_avg:99.01ms
step:1349/1770 train_time:133575ms step_avg:99.02ms
step:1350/1770 train_time:133679ms step_avg:99.02ms
step:1351/1770 train_time:133780ms step_avg:99.02ms
step:1352/1770 train_time:133884ms step_avg:99.03ms
step:1353/1770 train_time:133987ms step_avg:99.03ms
step:1354/1770 train_time:134090ms step_avg:99.03ms
step:1355/1770 train_time:134194ms step_avg:99.04ms
step:1356/1770 train_time:134298ms step_avg:99.04ms
step:1357/1770 train_time:134401ms step_avg:99.04ms
step:1358/1770 train_time:134503ms step_avg:99.04ms
step:1359/1770 train_time:134606ms step_avg:99.05ms
step:1360/1770 train_time:134709ms step_avg:99.05ms
step:1361/1770 train_time:134813ms step_avg:99.05ms
step:1362/1770 train_time:134918ms step_avg:99.06ms
step:1363/1770 train_time:135022ms step_avg:99.06ms
step:1364/1770 train_time:135125ms step_avg:99.06ms
step:1365/1770 train_time:135228ms step_avg:99.07ms
step:1366/1770 train_time:135331ms step_avg:99.07ms
step:1367/1770 train_time:135434ms step_avg:99.07ms
step:1368/1770 train_time:135538ms step_avg:99.08ms
step:1369/1770 train_time:135641ms step_avg:99.08ms
step:1370/1770 train_time:135743ms step_avg:99.08ms
step:1371/1770 train_time:135847ms step_avg:99.09ms
step:1372/1770 train_time:135951ms step_avg:99.09ms
step:1373/1770 train_time:136055ms step_avg:99.09ms
step:1374/1770 train_time:136161ms step_avg:99.10ms
step:1375/1770 train_time:136264ms step_avg:99.10ms
step:1375/1770 val_loss:3.3804 train_time:136563ms step_avg:99.32ms
step:1376/1770 train_time:136573ms step_avg:99.25ms
step:1377/1770 train_time:136581ms step_avg:99.19ms
step:1378/1770 train_time:136590ms step_avg:99.12ms
step:1379/1770 train_time:136684ms step_avg:99.12ms
step:1380/1770 train_time:136785ms step_avg:99.12ms
step:1381/1770 train_time:136887ms step_avg:99.12ms
step:1382/1770 train_time:136989ms step_avg:99.12ms
step:1383/1770 train_time:137091ms step_avg:99.13ms
step:1384/1770 train_time:137195ms step_avg:99.13ms
step:1385/1770 train_time:137297ms step_avg:99.13ms
step:1386/1770 train_time:137400ms step_avg:99.13ms
step:1387/1770 train_time:137507ms step_avg:99.14ms
step:1388/1770 train_time:137614ms step_avg:99.15ms
step:1389/1770 train_time:137718ms step_avg:99.15ms
step:1390/1770 train_time:137821ms step_avg:99.15ms
step:1391/1770 train_time:137923ms step_avg:99.15ms
step:1392/1770 train_time:138024ms step_avg:99.16ms
step:1393/1770 train_time:138127ms step_avg:99.16ms
step:1394/1770 train_time:138229ms step_avg:99.16ms
step:1395/1770 train_time:138333ms step_avg:99.16ms
step:1396/1770 train_time:138438ms step_avg:99.17ms
step:1397/1770 train_time:138542ms step_avg:99.17ms
step:1398/1770 train_time:138648ms step_avg:99.18ms
step:1399/1770 train_time:138753ms step_avg:99.18ms
step:1400/1770 train_time:138857ms step_avg:99.18ms
step:1401/1770 train_time:138960ms step_avg:99.19ms
step:1402/1770 train_time:139061ms step_avg:99.19ms
step:1403/1770 train_time:139163ms step_avg:99.19ms
step:1404/1770 train_time:139265ms step_avg:99.19ms
step:1405/1770 train_time:139369ms step_avg:99.19ms
step:1406/1770 train_time:139475ms step_avg:99.20ms
step:1407/1770 train_time:139578ms step_avg:99.20ms
step:1408/1770 train_time:139683ms step_avg:99.21ms
step:1409/1770 train_time:139786ms step_avg:99.21ms
step:1410/1770 train_time:139890ms step_avg:99.21ms
step:1411/1770 train_time:139995ms step_avg:99.22ms
step:1412/1770 train_time:140097ms step_avg:99.22ms
step:1413/1770 train_time:140200ms step_avg:99.22ms
step:1414/1770 train_time:140302ms step_avg:99.22ms
step:1415/1770 train_time:140406ms step_avg:99.23ms
step:1416/1770 train_time:140511ms step_avg:99.23ms
step:1417/1770 train_time:140616ms step_avg:99.24ms
step:1418/1770 train_time:140720ms step_avg:99.24ms
step:1419/1770 train_time:140822ms step_avg:99.24ms
step:1420/1770 train_time:140925ms step_avg:99.24ms
step:1421/1770 train_time:141029ms step_avg:99.25ms
step:1422/1770 train_time:141133ms step_avg:99.25ms
step:1423/1770 train_time:141235ms step_avg:99.25ms
step:1424/1770 train_time:141337ms step_avg:99.25ms
step:1425/1770 train_time:141440ms step_avg:99.26ms
step:1426/1770 train_time:141543ms step_avg:99.26ms
step:1427/1770 train_time:141646ms step_avg:99.26ms
step:1428/1770 train_time:141751ms step_avg:99.27ms
step:1429/1770 train_time:141855ms step_avg:99.27ms
step:1430/1770 train_time:141958ms step_avg:99.27ms
step:1431/1770 train_time:142060ms step_avg:99.27ms
step:1432/1770 train_time:142163ms step_avg:99.28ms
step:1433/1770 train_time:142266ms step_avg:99.28ms
step:1434/1770 train_time:142370ms step_avg:99.28ms
step:1435/1770 train_time:142474ms step_avg:99.29ms
step:1436/1770 train_time:142578ms step_avg:99.29ms
step:1437/1770 train_time:142681ms step_avg:99.29ms
step:1438/1770 train_time:142788ms step_avg:99.30ms
step:1439/1770 train_time:142891ms step_avg:99.30ms
step:1440/1770 train_time:142994ms step_avg:99.30ms
step:1441/1770 train_time:143096ms step_avg:99.30ms
step:1442/1770 train_time:143199ms step_avg:99.31ms
step:1443/1770 train_time:143303ms step_avg:99.31ms
step:1444/1770 train_time:143405ms step_avg:99.31ms
step:1445/1770 train_time:143509ms step_avg:99.31ms
step:1446/1770 train_time:143616ms step_avg:99.32ms
step:1447/1770 train_time:143720ms step_avg:99.32ms
step:1448/1770 train_time:143824ms step_avg:99.33ms
step:1449/1770 train_time:143928ms step_avg:99.33ms
step:1450/1770 train_time:144033ms step_avg:99.33ms
step:1451/1770 train_time:144140ms step_avg:99.34ms
step:1452/1770 train_time:144243ms step_avg:99.34ms
step:1453/1770 train_time:144347ms step_avg:99.34ms
step:1454/1770 train_time:144451ms step_avg:99.35ms
step:1455/1770 train_time:144558ms step_avg:99.35ms
step:1456/1770 train_time:144662ms step_avg:99.36ms
step:1457/1770 train_time:144766ms step_avg:99.36ms
step:1458/1770 train_time:144871ms step_avg:99.36ms
step:1459/1770 train_time:144978ms step_avg:99.37ms
step:1460/1770 train_time:145082ms step_avg:99.37ms
step:1461/1770 train_time:145185ms step_avg:99.37ms
step:1462/1770 train_time:145289ms step_avg:99.38ms
step:1463/1770 train_time:145394ms step_avg:99.38ms
step:1464/1770 train_time:145500ms step_avg:99.39ms
step:1465/1770 train_time:145605ms step_avg:99.39ms
step:1466/1770 train_time:145709ms step_avg:99.39ms
step:1467/1770 train_time:145814ms step_avg:99.40ms
step:1468/1770 train_time:145920ms step_avg:99.40ms
step:1469/1770 train_time:146025ms step_avg:99.40ms
step:1470/1770 train_time:146129ms step_avg:99.41ms
step:1471/1770 train_time:146232ms step_avg:99.41ms
step:1472/1770 train_time:146336ms step_avg:99.41ms
step:1473/1770 train_time:146439ms step_avg:99.42ms
step:1474/1770 train_time:146544ms step_avg:99.42ms
step:1475/1770 train_time:146651ms step_avg:99.42ms
step:1476/1770 train_time:146755ms step_avg:99.43ms
step:1477/1770 train_time:146859ms step_avg:99.43ms
step:1478/1770 train_time:146964ms step_avg:99.43ms
step:1479/1770 train_time:147068ms step_avg:99.44ms
step:1480/1770 train_time:147172ms step_avg:99.44ms
step:1481/1770 train_time:147279ms step_avg:99.45ms
step:1482/1770 train_time:147384ms step_avg:99.45ms
step:1483/1770 train_time:147491ms step_avg:99.45ms
step:1484/1770 train_time:147595ms step_avg:99.46ms
step:1485/1770 train_time:147699ms step_avg:99.46ms
step:1486/1770 train_time:147804ms step_avg:99.46ms
step:1487/1770 train_time:147908ms step_avg:99.47ms
step:1488/1770 train_time:148013ms step_avg:99.47ms
step:1489/1770 train_time:148116ms step_avg:99.47ms
step:1490/1770 train_time:148222ms step_avg:99.48ms
step:1491/1770 train_time:148328ms step_avg:99.48ms
step:1492/1770 train_time:148431ms step_avg:99.48ms
step:1493/1770 train_time:148535ms step_avg:99.49ms
step:1494/1770 train_time:148639ms step_avg:99.49ms
step:1495/1770 train_time:148746ms step_avg:99.50ms
step:1496/1770 train_time:148854ms step_avg:99.50ms
step:1497/1770 train_time:148959ms step_avg:99.51ms
step:1498/1770 train_time:149062ms step_avg:99.51ms
step:1499/1770 train_time:149165ms step_avg:99.51ms
step:1500/1770 train_time:149269ms step_avg:99.51ms
step:1500/1770 val_loss:3.3426 train_time:149570ms step_avg:99.71ms
step:1501/1770 train_time:149581ms step_avg:99.65ms
step:1502/1770 train_time:149590ms step_avg:99.59ms
step:1503/1770 train_time:149598ms step_avg:99.53ms
step:1504/1770 train_time:149690ms step_avg:99.53ms
step:1505/1770 train_time:149796ms step_avg:99.53ms
step:1506/1770 train_time:149899ms step_avg:99.53ms
step:1507/1770 train_time:150004ms step_avg:99.54ms
step:1508/1770 train_time:150107ms step_avg:99.54ms
step:1509/1770 train_time:150211ms step_avg:99.54ms
step:1510/1770 train_time:150314ms step_avg:99.55ms
step:1511/1770 train_time:150418ms step_avg:99.55ms
step:1512/1770 train_time:150524ms step_avg:99.55ms
step:1513/1770 train_time:150632ms step_avg:99.56ms
step:1514/1770 train_time:150737ms step_avg:99.56ms
step:1515/1770 train_time:150840ms step_avg:99.56ms
step:1516/1770 train_time:150943ms step_avg:99.57ms
step:1517/1770 train_time:151047ms step_avg:99.57ms
step:1518/1770 train_time:151151ms step_avg:99.57ms
step:1519/1770 train_time:151254ms step_avg:99.57ms
step:1520/1770 train_time:151359ms step_avg:99.58ms
step:1521/1770 train_time:151464ms step_avg:99.58ms
step:1522/1770 train_time:151569ms step_avg:99.59ms
step:1523/1770 train_time:151676ms step_avg:99.59ms
step:1524/1770 train_time:151780ms step_avg:99.59ms
step:1525/1770 train_time:151885ms step_avg:99.60ms
step:1526/1770 train_time:151989ms step_avg:99.60ms
step:1527/1770 train_time:152093ms step_avg:99.60ms
step:1528/1770 train_time:152196ms step_avg:99.60ms
step:1529/1770 train_time:152301ms step_avg:99.61ms
step:1530/1770 train_time:152407ms step_avg:99.61ms
step:1531/1770 train_time:152513ms step_avg:99.62ms
step:1532/1770 train_time:152619ms step_avg:99.62ms
step:1533/1770 train_time:152722ms step_avg:99.62ms
step:1534/1770 train_time:152827ms step_avg:99.63ms
step:1535/1770 train_time:152933ms step_avg:99.63ms
step:1536/1770 train_time:153036ms step_avg:99.63ms
step:1537/1770 train_time:153140ms step_avg:99.64ms
step:1538/1770 train_time:153244ms step_avg:99.64ms
step:1539/1770 train_time:153349ms step_avg:99.64ms
step:1540/1770 train_time:153456ms step_avg:99.65ms
step:1541/1770 train_time:153560ms step_avg:99.65ms
step:1542/1770 train_time:153666ms step_avg:99.65ms
step:1543/1770 train_time:153771ms step_avg:99.66ms
step:1544/1770 train_time:153877ms step_avg:99.66ms
step:1545/1770 train_time:153981ms step_avg:99.66ms
step:1546/1770 train_time:154085ms step_avg:99.67ms
step:1547/1770 train_time:154191ms step_avg:99.67ms
step:1548/1770 train_time:154296ms step_avg:99.67ms
step:1549/1770 train_time:154398ms step_avg:99.68ms
step:1550/1770 train_time:154502ms step_avg:99.68ms
step:1551/1770 train_time:154607ms step_avg:99.68ms
step:1552/1770 train_time:154712ms step_avg:99.69ms
step:1553/1770 train_time:154815ms step_avg:99.69ms
step:1554/1770 train_time:154920ms step_avg:99.69ms
step:1555/1770 train_time:155024ms step_avg:99.69ms
step:1556/1770 train_time:155127ms step_avg:99.70ms
step:1557/1770 train_time:155232ms step_avg:99.70ms
step:1558/1770 train_time:155336ms step_avg:99.70ms
step:1559/1770 train_time:155440ms step_avg:99.70ms
step:1560/1770 train_time:155544ms step_avg:99.71ms
step:1561/1770 train_time:155649ms step_avg:99.71ms
step:1562/1770 train_time:155754ms step_avg:99.71ms
step:1563/1770 train_time:155860ms step_avg:99.72ms
step:1564/1770 train_time:155963ms step_avg:99.72ms
step:1565/1770 train_time:156067ms step_avg:99.72ms
step:1566/1770 train_time:156172ms step_avg:99.73ms
step:1567/1770 train_time:156276ms step_avg:99.73ms
step:1568/1770 train_time:156380ms step_avg:99.73ms
step:1569/1770 train_time:156485ms step_avg:99.74ms
step:1570/1770 train_time:156589ms step_avg:99.74ms
step:1571/1770 train_time:156696ms step_avg:99.74ms
step:1572/1770 train_time:156801ms step_avg:99.75ms
step:1573/1770 train_time:156906ms step_avg:99.75ms
step:1574/1770 train_time:157010ms step_avg:99.75ms
step:1575/1770 train_time:157116ms step_avg:99.76ms
step:1576/1770 train_time:157220ms step_avg:99.76ms
step:1577/1770 train_time:157325ms step_avg:99.76ms
step:1578/1770 train_time:157429ms step_avg:99.76ms
step:1579/1770 train_time:157536ms step_avg:99.77ms
step:1580/1770 train_time:157642ms step_avg:99.77ms
step:1581/1770 train_time:157745ms step_avg:99.78ms
step:1582/1770 train_time:157851ms step_avg:99.78ms
step:1583/1770 train_time:157956ms step_avg:99.78ms
step:1584/1770 train_time:158060ms step_avg:99.79ms
step:1585/1770 train_time:158164ms step_avg:99.79ms
step:1586/1770 train_time:158270ms step_avg:99.79ms
step:1587/1770 train_time:158373ms step_avg:99.79ms
step:1588/1770 train_time:158479ms step_avg:99.80ms
step:1589/1770 train_time:158583ms step_avg:99.80ms
step:1590/1770 train_time:158687ms step_avg:99.80ms
step:1591/1770 train_time:158793ms step_avg:99.81ms
step:1592/1770 train_time:158897ms step_avg:99.81ms
step:1593/1770 train_time:158999ms step_avg:99.81ms
step:1594/1770 train_time:159104ms step_avg:99.81ms
step:1595/1770 train_time:159209ms step_avg:99.82ms
step:1596/1770 train_time:159314ms step_avg:99.82ms
step:1597/1770 train_time:159418ms step_avg:99.82ms
step:1598/1770 train_time:159522ms step_avg:99.83ms
step:1599/1770 train_time:159625ms step_avg:99.83ms
step:1600/1770 train_time:159731ms step_avg:99.83ms
step:1601/1770 train_time:159836ms step_avg:99.84ms
step:1602/1770 train_time:159942ms step_avg:99.84ms
step:1603/1770 train_time:160046ms step_avg:99.84ms
step:1604/1770 train_time:160151ms step_avg:99.84ms
step:1605/1770 train_time:160255ms step_avg:99.85ms
step:1606/1770 train_time:160359ms step_avg:99.85ms
step:1607/1770 train_time:160464ms step_avg:99.85ms
step:1608/1770 train_time:160569ms step_avg:99.86ms
step:1609/1770 train_time:160679ms step_avg:99.86ms
step:1610/1770 train_time:160784ms step_avg:99.87ms
step:1611/1770 train_time:160890ms step_avg:99.87ms
step:1612/1770 train_time:160995ms step_avg:99.87ms
step:1613/1770 train_time:161100ms step_avg:99.88ms
step:1614/1770 train_time:161204ms step_avg:99.88ms
step:1615/1770 train_time:161308ms step_avg:99.88ms
step:1616/1770 train_time:161413ms step_avg:99.88ms
step:1617/1770 train_time:161519ms step_avg:99.89ms
step:1618/1770 train_time:161624ms step_avg:99.89ms
step:1619/1770 train_time:161729ms step_avg:99.89ms
step:1620/1770 train_time:161836ms step_avg:99.90ms
step:1621/1770 train_time:161940ms step_avg:99.90ms
step:1622/1770 train_time:162046ms step_avg:99.90ms
step:1623/1770 train_time:162150ms step_avg:99.91ms
step:1624/1770 train_time:162254ms step_avg:99.91ms
step:1625/1770 train_time:162361ms step_avg:99.91ms
step:1625/1770 val_loss:3.3080 train_time:162662ms step_avg:100.10ms
step:1626/1770 train_time:162673ms step_avg:100.04ms
step:1627/1770 train_time:162682ms step_avg:99.99ms
step:1628/1770 train_time:162690ms step_avg:99.93ms
step:1629/1770 train_time:162782ms step_avg:99.93ms
step:1630/1770 train_time:162885ms step_avg:99.93ms
step:1631/1770 train_time:162988ms step_avg:99.93ms
step:1632/1770 train_time:163090ms step_avg:99.93ms
step:1633/1770 train_time:163193ms step_avg:99.93ms
step:1634/1770 train_time:163295ms step_avg:99.94ms
step:1635/1770 train_time:163399ms step_avg:99.94ms
step:1636/1770 train_time:163502ms step_avg:99.94ms
step:1637/1770 train_time:163611ms step_avg:99.95ms
step:1638/1770 train_time:163718ms step_avg:99.95ms
step:1639/1770 train_time:163824ms step_avg:99.95ms
step:1640/1770 train_time:163927ms step_avg:99.96ms
step:1641/1770 train_time:164031ms step_avg:99.96ms
step:1642/1770 train_time:164135ms step_avg:99.96ms
step:1643/1770 train_time:164238ms step_avg:99.96ms
step:1644/1770 train_time:164341ms step_avg:99.96ms
step:1645/1770 train_time:164446ms step_avg:99.97ms
step:1646/1770 train_time:164552ms step_avg:99.97ms
step:1647/1770 train_time:164658ms step_avg:99.97ms
step:1648/1770 train_time:164767ms step_avg:99.98ms
step:1649/1770 train_time:164873ms step_avg:99.98ms
step:1650/1770 train_time:164976ms step_avg:99.99ms
step:1651/1770 train_time:165080ms step_avg:99.99ms
step:1652/1770 train_time:165184ms step_avg:99.99ms
step:1653/1770 train_time:165286ms step_avg:99.99ms
step:1654/1770 train_time:165391ms step_avg:99.99ms
step:1655/1770 train_time:165495ms step_avg:100.00ms
step:1656/1770 train_time:165602ms step_avg:100.00ms
step:1657/1770 train_time:165708ms step_avg:100.00ms
step:1658/1770 train_time:165813ms step_avg:100.01ms
step:1659/1770 train_time:165919ms step_avg:100.01ms
step:1660/1770 train_time:166024ms step_avg:100.01ms
step:1661/1770 train_time:166129ms step_avg:100.02ms
step:1662/1770 train_time:166231ms step_avg:100.02ms
step:1663/1770 train_time:166335ms step_avg:100.02ms
step:1664/1770 train_time:166439ms step_avg:100.02ms
step:1665/1770 train_time:166544ms step_avg:100.03ms
step:1666/1770 train_time:166650ms step_avg:100.03ms
step:1667/1770 train_time:166755ms step_avg:100.03ms
step:1668/1770 train_time:166860ms step_avg:100.04ms
step:1669/1770 train_time:166965ms step_avg:100.04ms
step:1670/1770 train_time:167069ms step_avg:100.04ms
step:1671/1770 train_time:167172ms step_avg:100.04ms
step:1672/1770 train_time:167276ms step_avg:100.05ms
step:1673/1770 train_time:167381ms step_avg:100.05ms
step:1674/1770 train_time:167486ms step_avg:100.05ms
step:1675/1770 train_time:167592ms step_avg:100.05ms
step:1676/1770 train_time:167695ms step_avg:100.06ms
step:1677/1770 train_time:167801ms step_avg:100.06ms
step:1678/1770 train_time:167909ms step_avg:100.06ms
step:1679/1770 train_time:168016ms step_avg:100.07ms
step:1680/1770 train_time:168119ms step_avg:100.07ms
step:1681/1770 train_time:168223ms step_avg:100.07ms
step:1682/1770 train_time:168326ms step_avg:100.08ms
step:1683/1770 train_time:168430ms step_avg:100.08ms
step:1684/1770 train_time:168537ms step_avg:100.08ms
step:1685/1770 train_time:168642ms step_avg:100.08ms
step:1686/1770 train_time:168748ms step_avg:100.09ms
step:1687/1770 train_time:168853ms step_avg:100.09ms
step:1688/1770 train_time:168960ms step_avg:100.09ms
step:1689/1770 train_time:169068ms step_avg:100.10ms
step:1690/1770 train_time:169172ms step_avg:100.10ms
step:1691/1770 train_time:169275ms step_avg:100.10ms
step:1692/1770 train_time:169379ms step_avg:100.11ms
step:1693/1770 train_time:169484ms step_avg:100.11ms
step:1694/1770 train_time:169590ms step_avg:100.11ms
step:1695/1770 train_time:169694ms step_avg:100.11ms
step:1696/1770 train_time:169799ms step_avg:100.12ms
step:1697/1770 train_time:169905ms step_avg:100.12ms
step:1698/1770 train_time:170011ms step_avg:100.12ms
step:1699/1770 train_time:170116ms step_avg:100.13ms
step:1700/1770 train_time:170220ms step_avg:100.13ms
step:1701/1770 train_time:170324ms step_avg:100.13ms
step:1702/1770 train_time:170427ms step_avg:100.13ms
step:1703/1770 train_time:170531ms step_avg:100.14ms
step:1704/1770 train_time:170634ms step_avg:100.14ms
step:1705/1770 train_time:170739ms step_avg:100.14ms
step:1706/1770 train_time:170848ms step_avg:100.15ms
step:1707/1770 train_time:170953ms step_avg:100.15ms
step:1708/1770 train_time:171058ms step_avg:100.15ms
step:1709/1770 train_time:171165ms step_avg:100.15ms
step:1710/1770 train_time:171269ms step_avg:100.16ms
step:1711/1770 train_time:171374ms step_avg:100.16ms
step:1712/1770 train_time:171485ms step_avg:100.17ms
step:1713/1770 train_time:171594ms step_avg:100.17ms
step:1714/1770 train_time:171698ms step_avg:100.17ms
step:1715/1770 train_time:171805ms step_avg:100.18ms
step:1716/1770 train_time:171909ms step_avg:100.18ms
step:1717/1770 train_time:172015ms step_avg:100.18ms
step:1718/1770 train_time:172121ms step_avg:100.19ms
step:1719/1770 train_time:172226ms step_avg:100.19ms
step:1720/1770 train_time:172330ms step_avg:100.19ms
step:1721/1770 train_time:172437ms step_avg:100.20ms
step:1722/1770 train_time:172545ms step_avg:100.20ms
step:1723/1770 train_time:172651ms step_avg:100.20ms
step:1724/1770 train_time:172758ms step_avg:100.21ms
step:1725/1770 train_time:172864ms step_avg:100.21ms
step:1726/1770 train_time:172971ms step_avg:100.21ms
step:1727/1770 train_time:173078ms step_avg:100.22ms
step:1728/1770 train_time:173185ms step_avg:100.22ms
step:1729/1770 train_time:173291ms step_avg:100.23ms
step:1730/1770 train_time:173397ms step_avg:100.23ms
step:1731/1770 train_time:173503ms step_avg:100.23ms
step:1732/1770 train_time:173608ms step_avg:100.24ms
step:1733/1770 train_time:173713ms step_avg:100.24ms
step:1734/1770 train_time:173819ms step_avg:100.24ms
step:1735/1770 train_time:173926ms step_avg:100.25ms
step:1736/1770 train_time:174030ms step_avg:100.25ms
step:1737/1770 train_time:174135ms step_avg:100.25ms
step:1738/1770 train_time:174240ms step_avg:100.25ms
step:1739/1770 train_time:174347ms step_avg:100.26ms
step:1740/1770 train_time:174452ms step_avg:100.26ms
step:1741/1770 train_time:174556ms step_avg:100.26ms
step:1742/1770 train_time:174662ms step_avg:100.26ms
step:1743/1770 train_time:174769ms step_avg:100.27ms
step:1744/1770 train_time:174876ms step_avg:100.27ms
step:1745/1770 train_time:174982ms step_avg:100.28ms
step:1746/1770 train_time:175088ms step_avg:100.28ms
step:1747/1770 train_time:175193ms step_avg:100.28ms
step:1748/1770 train_time:175301ms step_avg:100.29ms
step:1749/1770 train_time:175406ms step_avg:100.29ms
step:1750/1770 train_time:175514ms step_avg:100.29ms
step:1750/1770 val_loss:3.2813 train_time:175816ms step_avg:100.47ms
step:1751/1770 train_time:175826ms step_avg:100.41ms
step:1752/1770 train_time:175836ms step_avg:100.36ms
step:1753/1770 train_time:175844ms step_avg:100.31ms
step:1754/1770 train_time:175935ms step_avg:100.31ms
step:1755/1770 train_time:176040ms step_avg:100.31ms
step:1756/1770 train_time:176144ms step_avg:100.31ms
step:1757/1770 train_time:176248ms step_avg:100.31ms
step:1758/1770 train_time:176352ms step_avg:100.31ms
step:1759/1770 train_time:176457ms step_avg:100.32ms
step:1760/1770 train_time:176561ms step_avg:100.32ms
step:1761/1770 train_time:176667ms step_avg:100.32ms
step:1762/1770 train_time:176778ms step_avg:100.33ms
step:1763/1770 train_time:176885ms step_avg:100.33ms
step:1764/1770 train_time:176995ms step_avg:100.34ms
step:1765/1770 train_time:177099ms step_avg:100.34ms
step:1766/1770 train_time:177202ms step_avg:100.34ms
step:1767/1770 train_time:177307ms step_avg:100.34ms
step:1768/1770 train_time:177415ms step_avg:100.35ms
step:1769/1770 train_time:177517ms step_avg:100.35ms
step:1770/1770 train_time:177622ms step_avg:100.35ms
step:1770/1770 val_loss:3.2782 train_time:177930ms step_avg:100.53ms
peak memory allocated: 29784 MiB reserved: 40536 MiB