Add batch size 4 configurations for Llama 1B and 3B models (3b6312a, dacorvo, committed on Jun 25)
Rename inference-cache-config/llama-3.1-8B.json to inference-cache-config/llama.json (14844a0, dacorvo, committed on Sep 26, 2024)
Rename inference-cache-config/llama.json to inference-cache-config/llama2.json (f06a55a, dacorvo, committed on Apr 19, 2024)
Added Llama-70b batch_size 4 to inference cache (593822e, dacorvo, committed on Mar 8, 2024)