Is the LLM part still the granite-3.1-2b-instruct model?

#3
by jcvasquez - opened

I would like to point out that the release notes mention improvements on the visual side. Alas, I would like to discuss the LLM part, since in the past I have found the output of granite-vision-3.2 inconsistent for my use cases, and that might be due to its LLM backbone. I suspect the LLM part is the culprit because it has one of the worst hallucination rates on the Vectara HHEM leaderboard hosted on HF (https://huggingface.co/spaces/vectara/leaderboard), at 15.7%, while top models are below the 2% mark and even an older Granite model like granite-3.0-8b has only around a third of that rate, at 6%.
[Screenshot: Vectara HHEM leaderboard results]
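For anyone who wants to verify the backbone themselves, here is a minimal sketch using transformers' `AutoConfig`. The repo ID is a placeholder for whichever granite-vision release you are checking, and the exact field layout depends on the architecture:

```python
from transformers import AutoConfig

# Repo ID is an assumption; substitute the granite-vision release in question.
config = AutoConfig.from_pretrained("ibm-granite/granite-vision-3.2-2b")

# LLaVA-style vision-language configs keep the LLM settings under `text_config`.
text_cfg = getattr(config, "text_config", None)
if text_cfg is not None:
    # Compare these against the granite-3.1-2b-instruct config to see
    # whether the same language backbone is being reused.
    print(text_cfg.model_type)
    print(text_cfg.num_hidden_layers, text_cfg.hidden_size)
```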

From a brief exploration of this leaderboard, some popular models aren't that great either:

[Screenshots: HHEM leaderboard entries for several popular models]
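For context on where these numbers come from: the leaderboard scores each (source, summary) pair with Vectara's HHEM classifier, which outputs a factual-consistency score. Here is a minimal sketch, assuming the sentence-transformers `CrossEncoder` interface that earlier HHEM releases documented (the current HHEM-2.1 card may use a different loading path):

```python
from sentence_transformers import CrossEncoder

# Assumption: the model loads through the CrossEncoder interface, as
# earlier HHEM releases documented; HHEM-2.1 may require its own path.
hhem = CrossEncoder("vectara/hallucination_evaluation_model")

# Each pair is (source passage, model-generated summary).
pairs = [
    ["The plant opened in 1962 and employs 3,000 people.",
     "The plant, opened in 1962, has a workforce of 3,000."],
    ["The plant opened in 1962 and employs 3,000 people.",
     "The plant opened in 1970 and employs 5,000 people."],
]

# Scores near 1.0 mean the summary is consistent with the source;
# scores near 0.0 flag a likely hallucination.
print(hhem.predict(pairs))
```

Roughly speaking, a model's hallucination rate on the leaderboard is the fraction of its summaries that fall below a consistency threshold.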

On the Granite side, we don't track this leaderboard. We look at RAGTruth, which focuses on hallucinations in RAG response generation. On that benchmark, our 8B models are not far off from the top models.
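For anyone who wants to reproduce that kind of number, here is a minimal sketch of a per-model, response-level hallucination rate over RAGTruth's annotations. It assumes the `response.jsonl` file from the official GitHub release (github.com/ParticleMedia/RAGTruth), and the field names (`model`, `labels`) are assumptions based on that release:

```python
import json
from collections import defaultdict

# Assumes RAGTruth's response.jsonl from the official release;
# field names ("model", "labels") are assumptions.
counts = defaultdict(lambda: [0, 0])  # model -> [total, hallucinated]

with open("response.jsonl") as f:
    for line in f:
        r = json.loads(line)
        counts[r["model"]][0] += 1
        # A response counts as hallucinated if annotators marked any span.
        if r.get("labels"):
            counts[r["model"]][1] += 1

for model, (total, bad) in sorted(counts.items()):
    print(f"{model}: {bad}/{total} = {bad/total:.1%} responses with annotated hallucinations")
```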
