nielsr (HF Staff) committed · verified
Commit bb12f6e · 1 parent: adfe3fb

Enhance model card: Add project page, abstract, and transformers sample usage


This PR improves the model card by:
- Adding an explicit link to the Hugging Face collection project page.
- Including the full paper abstract for better context.
- Adding a `transformers`-specific sample usage section, taken directly from the GitHub repository, to help users get started with the model quickly.
- Adding relevant tags: `llm-agent` and `summarization`.
- Renaming the existing 'Quick Start' section to 'Quick Start (MiniCPM4-Survey Agent)' for clarity.

Files changed (1)

README.md (+55 -4)
README.md CHANGED
@@ -5,6 +5,9 @@ language:
  library_name: transformers
  license: apache-2.0
  pipeline_tag: text-generation
+ tags:
+ - llm-agent
+ - summarization
  ---

  <div align="center">
@@ -12,9 +15,10 @@ pipeline_tag: text-generation
  </div>

  <p align="center">
- <a href="https://github.com/OpenBMB/MiniCPM/\" target="_blank">GitHub Repo</a> |
+ <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
+ <a href="https://huggingface.co/papers/2506.07900" target="_blank">Paper</a> |
  <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a> |
- <a href="https://huggingface.co/papers/2506.07900" target="_blank">Paper</a>
+ <a href="https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b" target="_blank">Project Page</a>
  </p>
  <p align="center">
  👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
@@ -22,6 +26,10 @@ pipeline_tag: text-generation

  This repository contains the model described in the paper [MiniCPM4: Ultra-Efficient LLMs on End Devices](https://huggingface.co/papers/2506.07900).

+ ## Paper Abstract
+
+ This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and data-efficient ternary LLM, BitCPM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Furthermore, we construct a hybrid reasoning model, MiniCPM4.1, which can be used in both deep reasoning mode and non-reasoning mode. Evaluation results demonstrate that MiniCPM4 and MiniCPM4.1 outperform similar-sized open-source models across benchmarks, with the 8B variants showing significant speed improvements on long sequence understanding and generation.
+
  ## What's New

  * [2025-06-05] 🚀🚀🚀 We have open-sourced **MiniCPM4-Survey**, a model built upon MiniCPM4-8B that is capable of generating trustworthy, long-form survey papers while maintaining competitive performance relative to significantly larger models.
@@ -54,7 +62,50 @@ Key features include:
  - **Multi-Step RL Training Strategy** — We propose a *Context Manager* to ensure retention of essential information while facilitating efficient reasoning, and we construct *Parallel Environment* to maintain efficient RL training cycles.


- ## Quick Start
+ ## Sample Usage (Transformers)
+
+ The model is compatible with the `transformers` library. Here's how to use it for text generation:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ torch.manual_seed(0)
+
+ path = 'openbmb/MiniCPM4-8B' # Note: This example uses MiniCPM4-8B, but can be adapted for MiniCPM4-Survey.
+ device = "cuda"
+ tokenizer = AutoTokenizer.from_pretrained(path)
+ model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
+
+ # User can directly use the chat interface
+ # responds, history = model.chat(tokenizer, "Write an article about Artificial Intelligence.", temperature=0.7, top_p=0.7)
+ # print(responds)
+
+ # User can also use the generate interface
+ messages = [
+     {"role": "user", "content": "Write an article about Artificial Intelligence."},
+ ]
+ prompt_text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ model_inputs = tokenizer([prompt_text], return_tensors="pt").to(device)
+
+ model_outputs = model.generate(
+     **model_inputs,
+     max_new_tokens=1024,
+     top_p=0.7,
+     temperature=0.7
+ )
+ output_token_ids = [
+     model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs['input_ids']))
+ ]
+
+ responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
+ print(responses)
+ ```
+
+ ## Quick Start (MiniCPM4-Survey Agent)

  ### Download the model

@@ -105,7 +156,7 @@ Then you can visit `http://localhost:5173` in your browser to use the model.
  | Naive RAG (driven by G2FT) | 3.25 | 2.95 | 3.35 | 2.60 | 3.04 | 43.68 |
  | AutoSurvey (driven by G2FT) | 3.10 | 3.25 | 3.15 | **3.15**| 3.16 | 46.56 |
  | Webthinker (driven by WTR1-7B) | 3.30 | 3.00 | 2.75 | 2.50 | 2.89 | -- |
- | Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |
+ | Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |\
  | OpenAI Deep Research (driven by GPT-4o) | 3.50 |**3.95** | 3.55 | 3.00 | **3.50** | -- |
  | MiniCPM4-Survey | 3.45 | 3.70 | **3.85** | 3.00 | **3.50** | **68.73** |
  | &nbsp;&nbsp;&nbsp;*w/o* RL | **3.55** | 3.35 | 3.30 | 2.25 | 3.11 | 50.24 |
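
For anyone who wants to try the newly added sample usage against this repository rather than `openbmb/MiniCPM4-8B`, the sketch below swaps in this model's repo id and otherwise mirrors the added example. It is a minimal sketch, not something the diff confirms: the repo id `openbmb/MiniCPM4-Survey`, the need for `trust_remote_code=True`, and the availability of a CUDA device are assumptions, and for full survey generation the agent pipeline under 'Quick Start (MiniCPM4-Survey Agent)' is likely the better entry point.

```python
# Minimal sketch of adapting the added sample usage to this model card.
# Assumptions: repo id "openbmb/MiniCPM4-Survey", trust_remote_code is required, CUDA may be absent.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM4-Survey"  # assumed repo id for this card
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map=device,
    trust_remote_code=True,
)

# Build a chat-formatted prompt with the tokenizer's chat template, as in the added example.
messages = [{"role": "user", "content": "Write a survey about efficient inference for large language models."}]
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([prompt_text], return_tensors="pt").to(model.device)

# Sample a completion; do_sample=True makes top_p/temperature take effect.
outputs = model.generate(**model_inputs, max_new_tokens=1024, do_sample=True, top_p=0.7, temperature=0.7)

# Drop the prompt tokens so only the newly generated text is decoded.
generated = outputs[0][model_inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

If the weights need to be fetched ahead of time, `huggingface_hub.snapshot_download("openbmb/MiniCPM4-Survey")` is one option, again assuming that repo id.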