Alibaba's Qwen team just dropped a flurry of new models, ensuring there will be a state-of-the-art Qwen model for every application out there:
- Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
- Qwen2.5-Coder: 1.5B, 7B, and a 32B on the way
- Qwen2.5-Math: 1.5B, 7B, and 72B
And they didn't cut corners: performance is top of its class in every weight category!
**Key insights:**
- All models have **128k token context length**
- Models pre-trained on 18T tokens, even more than the 15T of Llama-3
- 🇫🇷 On top of this, it **takes the #1 spot on multilingual tasks**, so it might become my standard for French
- 💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder-33B-Instruct). Let's wait for their 32B to come out!
- 🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."
- Technical report to be released "very soon"
- All models are under the permissive Apache 2.0 license, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"
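
If you want to kick the tires, here is a minimal sketch for chatting with one of the new checkpoints via transformers (assuming a recent transformers release; swap in any of the sizes listed above):

```python
# Minimal sketch: chat with one of the new Qwen2.5 checkpoints.
# Assumes a recent transformers release and enough memory for a 7B model;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # one of the released checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build the prompt with the model's own chat template.
messages = [{"role": "user", "content": "Summarize Qwen2.5 in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```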
Hey everyone! This is my first post here and I'm super excited to start with not just one, but two awesome updates!
Some of you might already know that I recently started my internship at Hugging Face. I'm grateful to be a part of the LLM evaluation team and the Open LLM Leaderboard! 🤗
Next, I'm excited to share a cool new feature: you can now search for models on the Open LLM Leaderboard by their licenses! 🕵️‍♀️ This feature will help you find the perfect model for your projects way faster. Just type "license: MIT" as a test run!
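
The same license metadata is also exposed through the Hub API, so if you prefer scripting to the Leaderboard UI, a sketch along these lines should give similar results (assuming a recent huggingface_hub release; this filters on the license:mit model tag and queries the Hub directly, not the Leaderboard backend):

```python
# Sketch: find MIT-licensed models programmatically via the Hub API.
# Mirrors the Leaderboard's "license: MIT" search using model tags,
# but queries the Hub itself rather than the Leaderboard.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="license:mit", sort="downloads", direction=-1, limit=5)
for model in models:
    print(model.id)
```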
I'd be super happy if you'd follow me here for more updates on the Leaderboard and other exciting developments. Can't wait to share more with you soon! ✨