Asankhaya Sharma
codelion
334 followers · 21 following
http://asankhaya.github.io/
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
liked a model about 10 hours ago: patched-codes/Llama-3.2-1B-FixVulns
liked a dataset about 11 hours ago: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors
reacted to their post with 🤗 1 day ago:
The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

We trained a GPT-2 model to 90%+ performance using just 1/10th the training data through 50+ systematic experiments on dataset mixing strategies.

Key Finding: A static mix of 50% finePDFs + 30% DCLM-baseline + 20% FineWeb-Edu consistently outperforms complex curriculum learning approaches. Static mixing is simpler, faster, and avoids catastrophic failures from hard distribution shifts.

Results: Our GPT-2-70M model (70M parameters, 1B tokens) scores 38.15% on benchmarks vs GPT-2's 39.13%, only 0.98 points behind despite 10x less data and 44% fewer parameters. It even beats GPT-2 on TruthfulQA (47.31% vs 40.69%).

The takeaway: careful dataset curation matters more than total data volume.

Model: https://huggingface.co/codelion/gpt-2-70m
Datasets: https://huggingface.co/collections/codelion/pre-training-dataset-samples
Full blog: https://huggingface.co/blog/codelion/optimal-dataset-mixing
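To make the idea of a static mix concrete, here is a minimal sketch in plain Python. It is not the training pipeline from the post. It only illustrates drawing each document from one of three sources with fixed probabilities of 50/30/20. The function names `static_mix` and `toy_corpus`, the seed, and the toy corpora are all assumptions made for illustration. The policy of stopping when a chosen source runs out is also an assumption.

```python
import itertools
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Mixing weights quoted in the post. The keys are labels only; no real
# dataset is loaded in this sketch.
MIX_WEIGHTS = {"finepdfs": 0.5, "dclm-baseline": 0.3, "fineweb-edu": 0.2}


def static_mix(
    sources: Dict[str, Iterable[str]],
    weights: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, document) pairs using fixed sampling weights.

    The weights never change during iteration, which is what makes the mix
    "static" (as opposed to a curriculum that shifts the distribution).
    Iteration stops as soon as the chosen source is exhausted; that stopping
    rule is an assumption of this sketch.
    """
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    iterators = {name: iter(sources[name]) for name in names}
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            doc = next(iterators[name])
        except StopIteration:
            return
        yield name, doc


def toy_corpus(name: str, n_docs: int) -> Iterator[str]:
    """Hypothetical stand-in for a real corpus: yields placeholder strings."""
    return (f"{name}-doc-{i}" for i in range(n_docs))


if __name__ == "__main__":
    sources = {name: toy_corpus(name, 10_000) for name in MIX_WEIGHTS}
    sample = itertools.islice(static_mix(sources, MIX_WEIGHTS, seed=0), 5_000)
    counts = Counter(name for name, _ in sample)
    for name, weight in MIX_WEIGHTS.items():
        print(f"{name}: target {weight:.0%}, observed {counts[name] / 5_000:.1%}")
```

Sampling per document keeps the observed proportions close to the target weights over a long run. A real pipeline would more likely mix by token count than by document count, which this sketch does not attempt.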
codelion's datasets (35)
codelion/finewiki-1B • Viewer • Updated 2 days ago • 52.7k • 33
codelion/finewiki-10M • Viewer • Updated 2 days ago • 4.91k • 22
codelion/finewiki-100M • Viewer • Updated 2 days ago • 68k • 20
codelion/fineweb-edu-1B • Viewer • Updated 2 days ago • 970k • 94
codelion/fineweb-edu-100M • Viewer • Updated 2 days ago • 115k • 31
codelion/fineweb-edu-10M • Viewer • Updated 2 days ago • 9.46k • 18
codelion/dclm-baseline-1B • Viewer • Updated 2 days ago • 774k • 108
codelion/dclm-baseline-100M • Viewer • Updated 2 days ago • 77.2k • 31
codelion/dclm-baseline-10M • Viewer • Updated 2 days ago • 7.95k • 13
codelion/finepdfs-1B • Viewer • Updated 2 days ago • 186k • 147
codelion/finepdfs-100M • Viewer • Updated 2 days ago • 18.6k • 29 • 2
codelion/finepdfs-10M • Viewer • Updated 2 days ago • 7.54k • 13
codelion/execution-world-model-dataset • Viewer • Updated 22 days ago • 621 • 59
codelion/SimpleQA-Verified • Viewer • Updated Sep 11 • 1k • 127 • 1
codelion/ifeval-high-quality-dpo • Viewer • Updated Sep 9 • 501 • 14
codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference • Viewer • Updated Aug 2 • 245 • 16
codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context • Viewer • Updated Jul 20 • 400 • 16
codelion/Llama-3.2-1B-Instruct-magpie-tool-calling • Viewer • Updated Jul 18 • 1.2k • 20 • 1
codelion/Qwen3-0.6B-icm-dpo-pairs • Viewer • Updated Jul 18 • 122 • 39
codelion/Qwen3-0.6B-icm • Viewer • Updated Jul 18 • 500 • 23 • 1
codelion/gemma-3-1b-it-magpie-reasoning • Viewer • Updated Jul 18 • 131 • 27 • 2
codelion/Qwen3-0.6B-magpie • Viewer • Updated Jul 12 • 735 • 63 • 1
codelion/Qwen3-0.6B-pts-thought-anchors • Viewer • Updated Jul 10 • 148 • 35 • 2
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors • Viewer • Updated Jul 10 • 110 • 35 • 2
codelion/Qwen3-0.6B-pts-dpo-pairs • Viewer • Updated May 19 • 681 • 31 • 2
codelion/Qwen3-0.6B-pts-steering-vectors • Viewer • Updated May 19 • 1.38k • 38 • 4
codelion/Qwen3-0.6B-pts • Viewer • Updated May 19 • 1.38k • 12 • 2
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors • Preview • Updated May 13 • 34 • 1
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts • Preview • Updated May 13 • 18 • 1
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-dpo-pairs • Preview • Updated May 13 • 12 • 1