It works!

#1
by uzvisa - opened

Hi there!

Thanks a bunch for your help and attention!

I just checked and it seems to be working perfectly!

By the way, how hard would it be to remove the vision feature from the model?
For example, if I don't use vision, could the model be smaller?
I mostly write short texts, descriptions, and stories.
My laptop isn’t up for heavy models.
It’s an M1 MacBook Pro with 16GB of RAM, and when I run LLMs I end up having to unload everything else from RAM.

Here’s my list of models:
[screenshot: installed models]

No, not at this time. Because MXFP4 quantization isn't natively supported in MLX-VLM (yet), I have to bolt that support on, and quantizing the embeddings doesn't shrink them as much as a regular 4-bit vision model would. I can't remove the vision capability without damaging the model overall.
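
To give a rough idea of what that special-casing looks like, here's a sketch on the text-only side using mlx_lm's convert. The quant_predicate hook, its exact signature, and the repo path are from memory, so treat this as an illustration rather than the exact script I run:

```python
# Sketch only: quantize most of a text-only model to 4-bit with mlx_lm while
# keeping the embeddings at higher precision -- the kind of special-casing
# MXFP4 currently needs. The quant_predicate hook and its signature are my
# recollection of mlx_lm's Python API; double-check against your version.
from mlx_lm import convert

def keep_embeddings_higher(path, module, config):
    # Per-layer override: embeddings/lm_head at 8-bit, everything else default.
    if "embed" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 64}
    return True  # fall back to the q_bits / q_group_size passed to convert()

convert(
    "allenai/Olmo-3-7B-Instruct",        # check the exact HF repo name
    mlx_path="olmo3-7b-instruct-4bit-mixed",
    quantize=True,
    q_bits=4,
    q_group_size=64,
    quant_predicate=keep_embeddings_higher,
)
```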

You could look at alternative models and see whether they fit your needs; there are plenty of good text-only models. Olmo 3 has a good 7B model that even at 8-bit is only 7.76GB (https://huggingface.co/mlx-community/Olmo-3-7B-Instruct-8bit), and I use it for a lot of things. I could see what an MXFP4 quant of that comes out to; my guess is somewhere between 4 and 4.5GB. They also have a 7B reasoning/thinking model. Their 32B Think is great, but I've only tried the 7B Instruct (which is good), not the 7B Think.
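
For what it's worth, the 4-4.5GB guess is just back-of-the-envelope math; the parameter count and per-weight overhead below are rough assumptions, not measured numbers:

```python
# Back-of-the-envelope size estimate for a ~4-bit quant of a ~7B model.
# MXFP4 stores 4-bit values plus a shared scale per small block, so the
# effective cost lands a bit above 4 bits per weight (~4.5 bits assumed).
params = 7.3e9                 # ~7B parameters (roughly what 7.76GB at 8-bit implies)
bits_per_weight = 4.5          # 4-bit values + per-block scales (assumed)
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")    # ~4.1 GB, in line with the 4-4.5GB guess
```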
