Akshay Nuthanapati

a0308

AI & ML interests

Neural Networks, Large Language Models

Recent Activity

upvoted an article 1 day ago

Deriving the PPO Loss from First Principles

new activity 15 days ago

huggingface/InferenceSupport:haykgrigorian/v2mini-eval1

upvoted an article about 1 month ago

Exploring Quantization Backends in Diffusers

Organizations

None yet

upvoted an article 1 day ago

Article

Deriving the PPO Loss from First Principles

2 days ago

upvoted 3 articles about 1 month ago

Article

Exploring Quantization Backends in Diffusers

May 21

Article

Diffusers welcomes FLUX-2

Nov 25 • 165

Article

Continuous batching from first principles

Nov 25 • 286

upvoted an article about 2 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16, 2024 • 436

upvoted 5 articles 2 months ago

Article

Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face

Feb 11

Article

KV Cache from scratch in nanoVLM

Jun 4 • 106

Article

Proximal Policy Optimization (PPO)

Aug 5, 2022

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

Jan 30 • 202

Article

Get your VLM running in 3 simple steps on Intel CPUs

Oct 15

upvoted a paper 2 months ago

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 97

upvoted an article 2 months ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Mar 12 • 479

upvoted an article 3 months ago

Article

Vision Language Models (Better, faster, stronger)

May 12 • 573

upvoted a paper 3 months ago

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29 • 34

upvoted 6 articles 3 months ago

Article

Introducing Würstchen: Fast Diffusion for Image Generation

Sep 13, 2023

Article

How 🤗 Accelerate runs very large models thanks to PyTorch

Sep 27, 2022

Article

From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease

Oct 21, 2022

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Oct 7, 2024

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7 • 263

Article

There is no such thing as a tokenizer-free lunch

Sep 25