Alibaba's Qwen team just dropped a flurry of new models, ensuring there will be a state-of-the-art Qwen model for every application out there:
- Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
- Qwen2.5-Coder: 1.5B, 7B, and a 32B on the way
- Qwen2.5-Math: 1.5B, 7B, and 72B
And they didn't cut corners: performance is top of its class in every weight category!
**Key insights:**
- All models have **128k token context length**
- Models pre-trained on 18T tokens, even more than the 15T of Llama-3
- 🇫🇷 On top of this, it **takes the #1 spot on multilingual tasks**, so it might become my standard for French
- 💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder-33B-Instruct). Let's wait for their 32B to come out!
- 🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."
- Technical report to be released "very soon"
- All models are under the permissive Apache 2.0 license, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"
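
If you want to kick the tires, here is a minimal sketch for chatting with one of the new checkpoints via transformers (assuming a recent transformers release; swap in any of the sizes listed above):

```python
# Minimal sketch: chat with one of the new Qwen2.5 checkpoints.
# Assumes a recent transformers release and enough memory for a 7B model;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # one of the released checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build the prompt with the model's own chat template.
messages = [{"role": "user", "content": "Summarize Qwen2.5 in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```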
Hey everyone! This is my first post here and I'm super excited to start with not just one, but two awesome updates!
Some of you might already know that I recently started my internship at Hugging Face. I'm grateful to be a part of the LLM evaluation team and the Open LLM Leaderboard! 🤗
Next, I'm excited to share a cool new feature: you can now search for models on the Open LLM Leaderboard by their licenses! 🕵️‍♀️ This feature will help you find the perfect model for your projects way faster. Just type "license: MIT" as a test run!
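
The same license metadata is also exposed through the Hub API, so if you prefer scripting to the Leaderboard UI, a sketch along these lines should give similar results (assuming a recent huggingface_hub release; this filters on the license:mit model tag and queries the Hub directly, not the Leaderboard backend):

```python
# Sketch: find MIT-licensed models programmatically via the Hub API.
# Mirrors the Leaderboard's "license: MIT" search using model tags,
# but queries the Hub itself rather than the Leaderboard.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="license:mit", sort="downloads", direction=-1, limit=5)
for model in models:
    print(model.id)
```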
I'd be super happy if you'd follow me here for more updates on the Leaderboard and other exciting developments. Can't wait to share more with you soon! ✨