Ankit Shah

ankits0052

https://ankitshah009.github.io/

AI & ML interests

Artificial Intelligence, Deep Learning, Machine Perception, Machine Learning, Audio Processing

Recent Activity

liked a model 21 days ago

zai-org/GLM-4.6-FP8

upvoted 2 papers 13 days ago

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123

upvoted a collection about 2 months ago

Cosmos-Predict2

World Foundation Model for Future Prediction • 13 items • Updated about 24 hours ago • 29

upvoted a paper 2 months ago

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28 • 63

upvoted an article 3 months ago

Small Language Models (SLM): A Comprehensive Overview

Article • Feb 22 • 99

upvoted a paper 3 months ago

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55

upvoted 3 collections 3 months ago

Physical AI

Collection of open, commercial-grade datasets for physical AI developers • 23 items • Updated about 24 hours ago • 89

AceReason

Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated about 24 hours ago • 18

Reward Models

Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated about 24 hours ago • 21

upvoted 2 papers 3 months ago

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Paper • 2310.07161 • Published Oct 11, 2023 • 1

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156

upvoted 3 collections 3 months ago

Qwen3

84 items • Updated Aug 6 • 1.39k

Qwen2.5-1M

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Jul 21 • 125

OpenReasoning-Nemotron

Collection of models for OpenReasoning-Nemotron which are trained on 5M reasoning traces for Math, Code and Science. • 6 items • Updated about 24 hours ago • 44

upvoted 2 collections 5 months ago

Meta's Llama 3.3 models & evals

2 items • Updated Dec 13, 2024 • 83

Llama 4

Llama 4 release • 13 items • Updated Apr 29 • 653

upvoted an article 5 months ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 950

upvoted a collection 6 months ago

Cosmos

The collection of Cosmos models • 31 items • Updated about 24 hours ago • 298

upvoted an article 9 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.31k

upvoted a paper almost 2 years ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 159