Enhance model card: Add project page, abstract, and transformers sample usage
This PR improves the model card by:
- Adding an explicit link to the Hugging Face collection project page.
- Including the full paper abstract for better context.
- Adding a `transformers`-specific sample usage section, taken directly from the GitHub repository, to help users get started with the model quickly.
- Adding relevant tags: `llm-agent` and `summarization` (resulting metadata excerpt shown below).
- Renaming the existing 'Quick Start' section to 'Quick Start (MiniCPM4-Survey Agent)' for clarity.
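
For reference, the tag additions live in the model card's YAML front matter; after this change the relevant excerpt (mirroring the diff below) reads:

```yaml
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- llm-agent
- summarization
---
```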
README.md
CHANGED
@@ -5,6 +5,9 @@ language:
 library_name: transformers
 license: apache-2.0
 pipeline_tag: text-generation
+tags:
+- llm-agent
+- summarization
 ---
 
 <div align="center">

@@ -12,9 +15,10 @@ pipeline_tag: text-generation
 </div>
 
 <p align="center">
-<a href="https://github.com/OpenBMB/MiniCPM
+<a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
+<a href="https://huggingface.co/papers/2506.07900" target="_blank">Paper</a> |
 <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a> |
-<a href="https://huggingface.co/
+<a href="https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b" target="_blank">Project Page</a>
 </p>
 <p align="center">
 👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>

@@ -22,6 +26,10 @@ pipeline_tag: text-generation
 
 This repository contains the model described in the paper [MiniCPM4: Ultra-Efficient LLMs on End Devices](https://huggingface.co/papers/2506.07900).
 
+## Paper Abstract
+
+This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and a data-efficient ternary LLM, BitCPM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Furthermore, we construct a hybrid reasoning model, MiniCPM4.1, which can be used in both deep reasoning mode and non-reasoning mode. Evaluation results demonstrate that MiniCPM4 and MiniCPM4.1 outperform similar-sized open-source models across benchmarks, with the 8B variants showing significant speed improvements on long sequence understanding and generation.
+
 ## What's New
 
 * [2025-06-05] 🎉🎉🎉 We have open-sourced **MiniCPM4-Survey**, a model built upon MiniCPM4-8B that is capable of generating trustworthy, long-form survey papers while maintaining competitive performance relative to significantly larger models.

@@ -54,7 +62,50 @@ Key features include:
 - **Multi-Step RL Training Strategy** – We propose a *Context Manager* to ensure retention of essential information while facilitating efficient reasoning, and we construct *Parallel Environment* to maintain efficient RL training cycles.
 
 
-## Quick Start
+## Sample Usage (Transformers)
+
+The model is compatible with the `transformers` library. Here's how to use it for text generation:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+torch.manual_seed(0)
+
+path = 'openbmb/MiniCPM4-8B'  # Note: This example uses MiniCPM4-8B, but can be adapted for MiniCPM4-Survey.
+device = "cuda"
+tokenizer = AutoTokenizer.from_pretrained(path)
+model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
+
+# User can directly use the chat interface
+# responds, history = model.chat(tokenizer, "Write an article about Artificial Intelligence.", temperature=0.7, top_p=0.7)
+# print(responds)
+
+# User can also use the generate interface
+messages = [
+    {"role": "user", "content": "Write an article about Artificial Intelligence."},
+]
+prompt_text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([prompt_text], return_tensors="pt").to(device)
+
+model_outputs = model.generate(
+    **model_inputs,
+    max_new_tokens=1024,
+    top_p=0.7,
+    temperature=0.7
+)
+output_token_ids = [
+    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs['input_ids']))
+]
+
+responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
+print(responses)
+```
+
+## Quick Start (MiniCPM4-Survey Agent)
 
 ### Download the model
 

@@ -105,7 +156,7 @@ Then you can visit `http://localhost:5173` in your browser to use the model.
 | Naive RAG (driven by G2FT) | 3.25 | 2.95 | 3.35 | 2.60 | 3.04 | 43.68 |
 | AutoSurvey (driven by G2FT) | 3.10 | 3.25 | 3.15 | **3.15** | 3.16 | 46.56 |
 | Webthinker (driven by WTR1-7B) | 3.30 | 3.00 | 2.75 | 2.50 | 2.89 | -- |
-| Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | --
+| Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |
 | OpenAI Deep Research (driven by GPT-4o) | 3.50 | **3.95** | 3.55 | 3.00 | **3.50** | -- |
 | MiniCPM4-Survey | 3.45 | 3.70 | **3.85** | 3.00 | **3.50** | **68.73** |
 | *w/o* RL | **3.55** | 3.35 | 3.30 | 2.25 | 3.11 | 50.24 |