Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/nxd_model/token_generation_model/_tp0_bk0/model.MODULE_00d39c882e9e437ead19+747527b0.hlo_module.pb', '--output', '/tmp/nxd_model/token_generation_model/_tp0_bk0/model.MODULE_00d39c882e9e437ead19+747527b0.neff', '--target=trn2', '--auto-cast=none', '--model-type=transformer', '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma ', '-O2', '--lnc=2', '--logfile=/tmp/nxd_model/token_generation_model/_tp0_bk0/log-neuron-cc.txt', '--enable-internal-neff-wrapper', '--verbose=35']: 2025-10-17T07:11:57Z 2025-10-17 07:11:57.671531: F hilo/hlo_passes/NeuronHloVerifier.cc:504] [ERROR] [NCC_VRF009] Memory requirement exceeds target architecture's HBM limit. Needed 17455884400 bytes (16 GB) vs. available 17179869184 bytes (16 GB). TIP: Consider using smaller batches or applying model parallelism