Asankhaya Sharma
codelion
388 followers · 21 following
http://asankhaya.github.io/
asankhaya
codelion
asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
Reacted to their post with ➕, about 8 hours ago:
Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% of parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
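The 50-30-20 mix described in the post can be sketched as weighted sampling over three source families. The mapping of weights onto the `finepdfs` / `dclm-baseline` / `fineweb-edu` dataset families listed on this profile is an assumption for illustration; the actual Dhara-70M pipeline may weight or name its sources differently.

```python
import random

# Assumed mapping of the post's 50-30-20 mix onto three source families
# from the dataset list below; the real pipeline may differ.
MIX = {
    "finepdfs": 0.50,       # PDFs
    "dclm-baseline": 0.30,  # filtered web
    "fineweb-edu": 0.20,    # educational content
}

def sample_sources(n: int, seed: int = 0) -> list[str]:
    """Draw n document-source labels according to the mix weights."""
    rng = random.Random(seed)
    names = list(MIX)
    return rng.choices(names, weights=[MIX[k] for k in names], k=n)

counts = {name: 0 for name in MIX}
for src in sample_sources(100_000):
    counts[src] += 1
# counts now approximates the 50/30/20 split across the three sources.
```

With a seeded generator the draw is reproducible, and over 100k samples the empirical proportions land close to the target weights.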
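The "depth beats width" finding in the post can be made concrete with a back-of-the-envelope parameter count: at a fixed budget, a 32-layer model must be narrower than a 12-layer one. The per-block estimate of ~12·d² parameters (attention plus a 4x-expansion MLP, embeddings ignored) and the 60M budget are illustrative assumptions, not Dhara-70M's actual configuration.

```python
def block_params(d_model: int) -> int:
    """Approximate non-embedding parameters in one transformer block:
    attention (~4 d^2) + 4x-expansion MLP (~8 d^2) = 12 d^2."""
    return 12 * d_model * d_model

def width_for_budget(n_layers: int, budget: int) -> int:
    """Largest d_model such that n_layers blocks fit within `budget` params."""
    return int((budget / (12 * n_layers)) ** 0.5)

BUDGET = 60_000_000  # illustrative ~60M non-embedding parameter budget
narrow_deep = width_for_budget(32, BUDGET)   # width of a 32-layer model
wide_shallow = width_for_budget(12, BUDGET)  # width of a 12-layer model
# The deeper model trades width for depth to stay on the same budget.
```

The post's claim is that, at equal parameter count, the deeper-narrower configuration wins on quality, not that it is cheaper.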
codelion's datasets (38)
codelion/synth-1B • Updated Nov 11 • 822k • 151
codelion/synth-100M • Updated Nov 11 • 100k • 39
codelion/synth-10M • Updated Nov 11 • 13.3k • 76
codelion/finewiki-1B • Updated Nov 2 • 52.7k • 206 • 2
codelion/finewiki-10M • Updated Nov 2 • 4.91k • 1.73k • 2
codelion/finewiki-100M • Updated Nov 2 • 68k • 63 • 2
codelion/fineweb-edu-1B • Updated Nov 2 • 970k • 684 • 6
codelion/fineweb-edu-100M • Updated Nov 2 • 115k • 267 • 3
codelion/fineweb-edu-10M • Updated Nov 2 • 9.46k • 339 • 2
codelion/dclm-baseline-1B • Updated Nov 2 • 774k • 381 • 4
codelion/dclm-baseline-100M • Updated Nov 2 • 77.2k • 58 • 2
codelion/dclm-baseline-10M • Updated Nov 2 • 7.95k • 138 • 2
codelion/finepdfs-1B • Updated Nov 2 • 186k • 673 • 3
codelion/finepdfs-100M • Updated Nov 2 • 18.6k • 37 • 2
codelion/finepdfs-10M • Updated Nov 2 • 7.54k • 145 • 2
codelion/execution-world-model-dataset • Updated Oct 14 • 621 • 33
codelion/SimpleQA-Verified • Updated Sep 11 • 1k • 249 • 1
codelion/ifeval-high-quality-dpo • Updated Sep 9 • 501 • 58
codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference • Updated Aug 2 • 245 • 25
codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context • Updated Jul 20 • 400 • 38
codelion/Llama-3.2-1B-Instruct-magpie-tool-calling • Updated Jul 18 • 1.2k • 42 • 1
codelion/Qwen3-0.6B-icm-dpo-pairs • Updated Jul 18 • 122 • 37
codelion/Qwen3-0.6B-icm • Updated Jul 18 • 500 • 67 • 1
codelion/gemma-3-1b-it-magpie-reasoning • Updated Jul 18 • 131 • 43 • 2
codelion/Qwen3-0.6B-magpie • Updated Jul 12 • 735 • 43 • 1
codelion/Qwen3-0.6B-pts-thought-anchors • Updated Jul 10 • 148 • 51 • 2
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors • Updated Jul 10 • 110 • 37 • 2
codelion/Qwen3-0.6B-pts-dpo-pairs • Updated May 19 • 681 • 38 • 2
codelion/Qwen3-0.6B-pts-steering-vectors • Updated May 19 • 1.38k • 63 • 4
codelion/Qwen3-0.6B-pts • Updated May 19 • 1.38k • 55 • 2