Update: We have released the MGM-Omni technical report. We also introduce Long-TTS-Eval, a benchmark for evaluating TTS on long-form and complex cases.
arXiv: https://arxiv.org/abs/2509.25131
Benchmark: wcy1122/Long-TTS-Eval
-------------------------
Introducing MGM-Omni, an omni-chatbot that processes text, image, video, and speech inputs and generates both text and speech responses.
MGM-Omni supports hour-long audio understanding.
MGM-Omni supports 10-minute speech generation and voice cloning.
For more details, please check:
Blog: https://mgm-omni.notion.site/MGM-Omni-An-Open-source-Omni-Chatbot-2395728e0b0180149ac9f24683fc9907
Code: https://github.com/dvlab-research/MGM-Omni
Model: wcy1122/mgm-omni-6896075e97317a88825032e1
Demo: wcy1122/MGM-Omni