Asankhaya Sharma
codelion
388 followers · 21 following
http://asankhaya.github.io/
asankhaya
codelion
asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
Reacted to their post with ➕, about 8 hours ago:
Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% of parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
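The 50-30-20 mix described in the post can be sketched as weighted sampling over three source families. The mapping of weights onto the `finepdfs` / `dclm-baseline` / `fineweb-edu` dataset families listed on this profile is an assumption for illustration; the actual Dhara-70M pipeline may weight or name its sources differently.

```python
import random

# Assumed mapping of the post's 50-30-20 mix onto three source families
# from the dataset list below; the real pipeline may differ.
MIX = {
    "finepdfs": 0.50,       # PDFs
    "dclm-baseline": 0.30,  # filtered web
    "fineweb-edu": 0.20,    # educational content
}

def sample_sources(n: int, seed: int = 0) -> list[str]:
    """Draw n document-source labels according to the mix weights."""
    rng = random.Random(seed)
    names = list(MIX)
    return rng.choices(names, weights=[MIX[k] for k in names], k=n)

counts = {name: 0 for name in MIX}
for src in sample_sources(100_000):
    counts[src] += 1
# counts now approximates the 50/30/20 split across the three sources.
```

With a seeded generator the draw is reproducible, and over 100k samples the empirical proportions land close to the target weights.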
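The "depth beats width" finding in the post can be made concrete with a back-of-the-envelope parameter count: at a fixed budget, a 32-layer model must be narrower than a 12-layer one. The per-block estimate of ~12·d² parameters (attention plus a 4x-expansion MLP, embeddings ignored) and the 60M budget are illustrative assumptions, not Dhara-70M's actual configuration.

```python
def block_params(d_model: int) -> int:
    """Approximate non-embedding parameters in one transformer block:
    attention (~4 d^2) + 4x-expansion MLP (~8 d^2) = 12 d^2."""
    return 12 * d_model * d_model

def width_for_budget(n_layers: int, budget: int) -> int:
    """Largest d_model such that n_layers blocks fit within `budget` params."""
    return int((budget / (12 * n_layers)) ** 0.5)

BUDGET = 60_000_000  # illustrative ~60M non-embedding parameter budget
narrow_deep = width_for_budget(32, BUDGET)   # width of a 32-layer model
wide_shallow = width_for_budget(12, BUDGET)  # width of a 12-layer model
# The deeper model trades width for depth to stay on the same budget.
```

The post's claim is that, at equal parameter count, the deeper-narrower configuration wins on quality, not that it is cheaper.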
codelion's datasets (38)
codelion/synth-1B • Updated Nov 11 • 822k • 151
codelion/synth-100M • Updated Nov 11 • 100k • 39
codelion/synth-10M • Updated Nov 11 • 13.3k • 76
codelion/finewiki-1B • Updated Nov 2 • 52.7k • 206 • 2
codelion/finewiki-10M • Updated Nov 2 • 4.91k • 1.73k • 2
codelion/finewiki-100M • Updated Nov 2 • 68k • 63 • 2
codelion/fineweb-edu-1B • Updated Nov 2 • 970k • 684 • 6
codelion/fineweb-edu-100M • Updated Nov 2 • 115k • 267 • 3
codelion/fineweb-edu-10M • Updated Nov 2 • 9.46k • 339 • 2
codelion/dclm-baseline-1B • Updated Nov 2 • 774k • 381 • 4
codelion/dclm-baseline-100M • Updated Nov 2 • 77.2k • 58 • 2
codelion/dclm-baseline-10M • Updated Nov 2 • 7.95k • 138 • 2
codelion/finepdfs-1B • Updated Nov 2 • 186k • 673 • 3
codelion/finepdfs-100M • Updated Nov 2 • 18.6k • 37 • 2
codelion/finepdfs-10M • Updated Nov 2 • 7.54k • 145 • 2
codelion/execution-world-model-dataset • Updated Oct 14 • 621 • 33
codelion/SimpleQA-Verified • Updated Sep 11 • 1k • 249 • 1
codelion/ifeval-high-quality-dpo • Updated Sep 9 • 501 • 58
codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference • Updated Aug 2 • 245 • 25
codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context • Updated Jul 20 • 400 • 38
codelion/Llama-3.2-1B-Instruct-magpie-tool-calling • Updated Jul 18 • 1.2k • 42 • 1
codelion/Qwen3-0.6B-icm-dpo-pairs • Updated Jul 18 • 122 • 37
codelion/Qwen3-0.6B-icm • Updated Jul 18 • 500 • 67 • 1
codelion/gemma-3-1b-it-magpie-reasoning • Updated Jul 18 • 131 • 43 • 2
codelion/Qwen3-0.6B-magpie • Updated Jul 12 • 735 • 43 • 1
codelion/Qwen3-0.6B-pts-thought-anchors • Updated Jul 10 • 148 • 51 • 2
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors • Updated Jul 10 • 110 • 37 • 2
codelion/Qwen3-0.6B-pts-dpo-pairs • Updated May 19 • 681 • 38 • 2
codelion/Qwen3-0.6B-pts-steering-vectors • Updated May 19 • 1.38k • 63 • 4
codelion/Qwen3-0.6B-pts • Updated May 19 • 1.38k • 55 • 2