Asankhaya Sharma

codelion

http://asankhaya.github.io/

AI & ML interests

Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.

Recent Activity

liked a model about 10 hours ago

patched-codes/Llama-3.2-1B-FixVulns

liked a dataset about 11 hours ago

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors

reacted to their post with 🤗 1 day ago

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

We trained a GPT-2 model to 90%+ performance using just 1/10th the training data through 50+ systematic experiments on dataset mixing strategies.

Key Finding: A static mix of 50% finePDFs + 30% DCLM-baseline + 20% FineWeb-Edu consistently outperforms complex curriculum learning approaches. Static mixing is simpler, faster, and avoids catastrophic failures from hard distribution shifts.

Results: Our GPT-2-70M model (70M parameters, 1B tokens) scores 38.15% on benchmarks vs GPT-2's 39.13% - only 0.98 points behind despite 10x less data and 44% fewer parameters. It even beats GPT-2 on TruthfulQA (47.31% vs 40.69%).

The takeaway: careful dataset curation matters more than total data volume.

Model: https://huggingface.co/codelion/gpt-2-70m
Datasets: https://huggingface.co/collections/codelion/pre-training-dataset-samples
Full blog: https://huggingface.co/blog/codelion/optimal-dataset-mixing
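To make the static mix in the post concrete, here is a minimal sketch of sampling from three sources with fixed 50/30/20 weights. It is not the author's training pipeline: the function name `static_mix`, the source labels, and the placeholder document generator are hypothetical, and the sketch mixes by document count, whereas the post's percentages presumably refer to token budgets.

```python
import itertools
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Weights from the post; the keys are illustrative labels, not dataset IDs.
MIX_WEIGHTS = {"finepdfs": 0.5, "dclm-baseline": 0.3, "fineweb-edu": 0.2}


def static_mix(
    sources: Dict[str, Iterable[str]],
    weights: Dict[str, float],
    n_docs: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, document) pairs drawn with fixed probabilities.

    The weights never change during sampling, which is what makes the mix
    "static" (as opposed to a curriculum that shifts weights over time).
    Sampling stops early if the chosen source runs out of documents.
    """
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    iterators = {name: iter(sources[name]) for name in names}
    for _ in range(n_docs):
        name = rng.choices(names, weights=probs, k=1)[0]
        doc = next(iterators[name], None)
        if doc is None:  # chosen source exhausted
            return
        yield name, doc


def fake_source(name: str) -> Iterator[str]:
    """Hypothetical stand-in for a real corpus: endless placeholder docs."""
    for i in itertools.count():
        yield f"{name}-doc-{i}"


if __name__ == "__main__":
    sources = {name: fake_source(name) for name in MIX_WEIGHTS}
    counts = Counter(name for name, _ in static_mix(sources, MIX_WEIGHTS, 10_000))
    for name, count in counts.items():
        print(f"{name}: {count / 10_000:.1%}")
```

With real corpora, the placeholder generators would be replaced by streamed datasets. The Hugging Face `datasets` library offers `interleave_datasets` with a `probabilities` argument for the same purpose, though whether the experiments in the post used it is not stated here.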

codelion's datasets (35)

codelion/finewiki-1B

Viewer • Updated 2 days ago • 52.7k • 33

codelion/finewiki-10M

Viewer • Updated 2 days ago • 4.91k • 22

codelion/finewiki-100M

Viewer • Updated 2 days ago • 68k • 20

codelion/fineweb-edu-1B

Viewer • Updated 2 days ago • 970k • 94

codelion/fineweb-edu-100M

Viewer • Updated 2 days ago • 115k • 31

codelion/fineweb-edu-10M

Viewer • Updated 2 days ago • 9.46k • 18

codelion/dclm-baseline-1B

Viewer • Updated 2 days ago • 774k • 108

codelion/dclm-baseline-100M

Viewer • Updated 2 days ago • 77.2k • 31

codelion/dclm-baseline-10M

Viewer • Updated 2 days ago • 7.95k • 13

codelion/finepdfs-1B

Viewer • Updated 2 days ago • 186k • 147

codelion/finepdfs-100M

Viewer • Updated 2 days ago • 18.6k • 29 • 2

codelion/finepdfs-10M

Viewer • Updated 2 days ago • 7.54k • 13

codelion/execution-world-model-dataset

Viewer • Updated 22 days ago • 621 • 59

codelion/SimpleQA-Verified

Viewer • Updated Sep 11 • 1k • 127 • 1

codelion/ifeval-high-quality-dpo

Viewer • Updated Sep 9 • 501 • 14

codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference

Viewer • Updated Aug 2 • 245 • 16

codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context

Viewer • Updated Jul 20 • 400 • 16

codelion/Llama-3.2-1B-Instruct-magpie-tool-calling

Viewer • Updated Jul 18 • 1.2k • 20 • 1

codelion/Qwen3-0.6B-icm-dpo-pairs

Viewer • Updated Jul 18 • 122 • 39

codelion/Qwen3-0.6B-icm

Viewer • Updated Jul 18 • 500 • 23 • 1

codelion/gemma-3-1b-it-magpie-reasoning

Viewer • Updated Jul 18 • 131 • 27 • 2

codelion/Qwen3-0.6B-magpie

Viewer • Updated Jul 12 • 735 • 63 • 1

codelion/Qwen3-0.6B-pts-thought-anchors

Viewer • Updated Jul 10 • 148 • 35 • 2

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors

Viewer • Updated Jul 10 • 110 • 35 • 2

codelion/Qwen3-0.6B-pts-dpo-pairs

Viewer • Updated May 19 • 681 • 31 • 2

codelion/Qwen3-0.6B-pts-steering-vectors

Viewer • Updated May 19 • 1.38k • 38 • 4

codelion/Qwen3-0.6B-pts

Viewer • Updated May 19 • 1.38k • 12 • 2

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors

Preview • Updated May 13 • 34 • 1

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Preview • Updated May 13 • 18 • 1

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-dpo-pairs

Preview • Updated May 13 • 12 • 1