Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ library_name: transformers
 > FlexOlmo-7x7B-1T (without router training) is a Mixture-of-Experts with 33B total parameters, combining independently trained experts on public-mix, news, math, code, academic texts, creative writing, and Reddit data. The public-mix expert is trained on 1T tokens of public data while the other experts are branched from the public-mix expert and trained on 50B tokens of their respective data.
 
 This information and more can also be found:
-- **Paper**: https://allenai.org/papers/
+- **Paper**: https://allenai.org/papers/flexolmo
 - **Code**: https://github.com/allenai/FlexOlmo
 - **Blog**: https://allenai.org/blog/flexolmo
 - **Data and corresponding models**:

@@ -72,6 +72,6 @@ print(tokenizer.decode(out[0]))
   eprint={2507.00000},
   archivePrefix={arXiv},
   primaryClass={cs.CL},
-  url={https://allenai.org/papers/
+  url={https://allenai.org/papers/flexolmo},
 }
 ```
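The context line of the second hunk, `print(tokenizer.decode(out[0]))`, indicates that the README already contains a transformers-based usage snippet. For readers of this diff, a minimal sketch of such usage follows; the model id `allenai/FlexOlmo-7x7B-1T`, the prompt, and the generation settings are assumptions for illustration, not the repository's exact snippet.

```python
# Minimal usage sketch, not the README's own snippet.
# Assumptions: the checkpoint is published as "allenai/FlexOlmo-7x7B-1T" and loads
# through the standard transformers AutoModelForCausalLM / AutoTokenizer API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/FlexOlmo-7x7B-1T"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt, generate a continuation, and decode it,
# ending with the same decode call the diff context references.
inputs = tokenizer("Mixture-of-Experts models combine", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0]))
```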