Activity Feed

AI & ML interests

Request to join this organization to beta-test notebooks on Hugging Face!

Recent Activity

mrfakenameย 
posted an update 10 days ago
view post
Post
3088
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.

Still very early and does have an issue with hallucinating but results seem pretty good so far, given that it is very early into the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn ๐Ÿ”Š on to hear audio samples)
ยท
merveย 
posted an update 17 days ago
view post
Post
4988
deepseek-ai/DeepSeek-OCR is out! ๐Ÿ”ฅ my take โคต๏ธ
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages
  • 2 replies
ยท
lbourdoisย 
posted an update 24 days ago
merveย 
posted an update about 2 months ago
view post
Post
6620
large AI labs open-sourced a ton of models last week ๐Ÿ”ฅ
here's few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 ๐Ÿค
> IBM released a new Docling model with 258M params based on Granite (A2.0) ๐Ÿ“ ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana ๐ŸŒ (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset ๐Ÿ’ป OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash ๐Ÿ’ญ meituan-longcat/LongCat-Flash-Thinking
  • 2 replies
ยท
merveย 
posted an update about 2 months ago
view post
Post
3272
IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face ๐Ÿ”ฅ

> not only a document converter but also can do document question answering, understand multiple languages ๐Ÿคฏ
> best part: released with Apache 2.0 license ๐Ÿ‘ use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! ๐Ÿค—
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo ๐Ÿ’—
merveย 
posted an update about 2 months ago
view post
Post
1129
a ton of image/video generation models and LLMs from big labs ๐Ÿ”ฅ

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use ๐Ÿ’ฌ
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR ๐Ÿ“
> ByteDance released bytedance-research/HuMo, video generation from any input โฏ๏ธ

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merveย 
posted an update about 2 months ago
view post
Post
947
fan-favorite vision LM Florence-2 is now officially supported in transformers ๐Ÿค—

find all the models in florence-community org ๐Ÿซก
ariG23498ย 
posted an update about 2 months ago
view post
Post
983
New post is live!

This time we cover some major updates to transformers.

๐Ÿค—
  • 1 reply
ยท
merveย 
posted an update about 2 months ago
merveย 
posted an update about 2 months ago
davanstrienย 
posted an update 2 months ago
merveย 
posted an update 2 months ago
view post
Post
6259
large AI labs have dropped so many open models last week ๐Ÿ”ฅ don't miss out on them

โ†’ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
โ†’ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
โ†’ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more herehttps://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
  • 1 reply
ยท
merveย 
posted an update 2 months ago
view post
Post
6043
first vision language model built off openai/gpt-oss-20b just dropped! ๐Ÿ”ฅ

InternVL3.5 comes with 32 models ๐Ÿคฏ pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for LLM part โคต๏ธ
  • 1 reply
ยท
merveย 
posted an update 3 months ago
view post
Post
3297
GPT-4.1-mini level model right in your iPhone ๐Ÿคฏ

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks ๐Ÿ”ฅ

allows commercial use as well!