It works!

#1
by uzvisa - opened

Hi there!

Thanks a bunch for your help and attention!

I just checked and it seems to be working perfectly!

By the way, how hard would it be to remove the vision feature from the model?
For example, if I don't use vision, could the model be smaller?
I mostly write short texts, descriptions, and stories.
My laptop isn’t up for heavy models.
It’s an M1 MacBook Pro with 16GB of RAM, and when I run LLMs I end up having to unload everything else from RAM.

Here’s my list of models:
[screenshot: installed models]

No, not at this time. Because MXFP4 quantization isn't natively supported in MLX-VLM (yet), I have to bolt that support on, and quantizing the embeddings doesn't shrink them as much as a regular 4-bit vision model would. I can't remove the vision capability without damaging the model overall.
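
To give a rough idea of what that special-casing looks like, here's a sketch on the text-only side using mlx_lm's convert. The quant_predicate hook, its exact signature, and the repo path are from memory, so treat this as an illustration rather than the exact script I run:

```python
# Sketch only: quantize most of a text-only model to 4-bit with mlx_lm while
# keeping the embeddings at higher precision -- the kind of special-casing
# MXFP4 currently needs. The quant_predicate hook and its signature are my
# recollection of mlx_lm's Python API; double-check against your version.
from mlx_lm import convert

def keep_embeddings_higher(path, module, config):
    # Per-layer override: embeddings/lm_head at 8-bit, everything else default.
    if "embed" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 64}
    return True  # fall back to the q_bits / q_group_size passed to convert()

convert(
    "allenai/Olmo-3-7B-Instruct",        # check the exact HF repo name
    mlx_path="olmo3-7b-instruct-4bit-mixed",
    quantize=True,
    q_bits=4,
    q_group_size=64,
    quant_predicate=keep_embeddings_higher,
)
```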

You could look at alternative models and see whether they fit your needs; there are plenty of good text-only models. Olmo 3 has a good 7B model that even at 8-bit is only 7.76GB (https://huggingface.co/mlx-community/Olmo-3-7B-Instruct-8bit), and I use it for a lot of things. I could see what an MXFP4 quant of that comes out to; my guess is somewhere between 4 and 4.5GB. They also have a 7B reasoning/thinking model. Their 32B Think is great, but I've only tried the 7B Instruct (which is good), not the 7B Think.
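
For what it's worth, the 4-4.5GB guess is just back-of-the-envelope math; the parameter count and per-weight overhead below are rough assumptions, not measured numbers:

```python
# Back-of-the-envelope size estimate for a ~4-bit quant of a ~7B model.
# MXFP4 stores 4-bit values plus a shared scale per small block, so the
# effective cost lands a bit above 4 bits per weight (~4.5 bits assumed).
params = 7.3e9                 # ~7B parameters (roughly what 7.76GB at 8-bit implies)
bits_per_weight = 4.5          # 4-bit values + per-block scales (assumed)
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")    # ~4.1 GB, in line with the 4-4.5GB guess
```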
