Is the LLM part still the granite-3.1-2b-instruct model?

#3
by jcvasquez - opened

I would like to point out that the release notes mention improvements on the visual side. Alas, I would like to discuss the LLM part, since in the past I have found the output of granite-vision-3.2 inconsistent for my use cases, and that might be due to its LLM backbone. I suspect the LLM part is the culprit because it has one of the worst hallucination rates on the Vectara HHEM leaderboard hosted on HF (https://huggingface.co/spaces/vectara/leaderboard), at 15.7%, while top models are below the 2% mark and even an older Granite model like granite-3.0-8b has only around a third of that rate, at 6%.
[Screenshot: Vectara HHEM leaderboard results]
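For anyone who wants to verify the backbone themselves, here is a minimal sketch using transformers' `AutoConfig`. The repo ID is a placeholder for whichever granite-vision release you are checking, and the exact field layout depends on the architecture:

```python
from transformers import AutoConfig

# Repo ID is an assumption; substitute the granite-vision release in question.
config = AutoConfig.from_pretrained("ibm-granite/granite-vision-3.2-2b")

# LLaVA-style vision-language configs keep the LLM settings under `text_config`.
text_cfg = getattr(config, "text_config", None)
if text_cfg is not None:
    # Compare these against the granite-3.1-2b-instruct config to see
    # whether the same language backbone is being reused.
    print(text_cfg.model_type)
    print(text_cfg.num_hidden_layers, text_cfg.hidden_size)
```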

From a brief exploration of this leaderboard, some popular models aren't that great either:

[Screenshots: HHEM leaderboard entries for several popular models]
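For context on where these numbers come from: the leaderboard scores each (source, summary) pair with Vectara's HHEM classifier, which outputs a factual-consistency score. Here is a minimal sketch, assuming the sentence-transformers `CrossEncoder` interface that earlier HHEM releases documented (the current HHEM-2.1 card may use a different loading path):

```python
from sentence_transformers import CrossEncoder

# Assumption: the model loads through the CrossEncoder interface, as
# earlier HHEM releases documented; HHEM-2.1 may require its own path.
hhem = CrossEncoder("vectara/hallucination_evaluation_model")

# Each pair is (source passage, model-generated summary).
pairs = [
    ["The plant opened in 1962 and employs 3,000 people.",
     "The plant, opened in 1962, has a workforce of 3,000."],
    ["The plant opened in 1962 and employs 3,000 people.",
     "The plant opened in 1970 and employs 5,000 people."],
]

# Scores near 1.0 mean the summary is consistent with the source;
# scores near 0.0 flag a likely hallucination.
print(hhem.predict(pairs))
```

Roughly speaking, a model's hallucination rate on the leaderboard is the fraction of its summaries that fall below a consistency threshold.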

On the Granite side, we don't track this leaderboard. We look at RAGTruth, which focuses on hallucinations in RAG response generation. On that benchmark, our 8B models are not far off from the top models.
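For anyone who wants to reproduce that kind of number, here is a minimal sketch of a per-model, response-level hallucination rate over RAGTruth's annotations. It assumes the `response.jsonl` file from the official GitHub release (github.com/ParticleMedia/RAGTruth), and the field names (`model`, `labels`) are assumptions based on that release:

```python
import json
from collections import defaultdict

# Assumes RAGTruth's response.jsonl from the official release;
# field names ("model", "labels") are assumptions.
counts = defaultdict(lambda: [0, 0])  # model -> [total, hallucinated]

with open("response.jsonl") as f:
    for line in f:
        r = json.loads(line)
        counts[r["model"]][0] += 1
        # A response counts as hallucinated if annotators marked any span.
        if r.get("labels"):
            counts[r["model"]][1] += 1

for model, (total, bad) in sorted(counts.items()):
    print(f"{model}: {bad}/{total} = {bad/total:.1%} responses with annotated hallucinations")
```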
