
Leshem Choshen

borgr

https://ktilana.wixsite.com/leshem-choshen

AI & ML interests

Merging models, collaboratively improving pretraining, evaluation, understanding

Recent Activity

commented on a paper 18 days ago

Learning from Naturally Occurring Feedback

upvoted a collection about 1 month ago

BabyBabelLM

liked a model about 1 month ago

ibm-granite/granite-4.0-micro-base



upvoted a collection about 1 month ago

BabyBabelLM

Collection

A multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. • 45 items • Updated 19 days ago • 7

upvoted a paper 3 months ago

Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy

Paper • 2508.07485 • Published Aug 10 • 10

upvoted an article 3 months ago

The AI Evaluation Chart Crisis

Article • Aug 12

upvoted 3 papers 7 months ago

upvoted a paper 8 months ago

Scaling Analysis of Interleaved Speech-Text Language Models

Paper • 2504.02398 • Published Apr 3 • 31

upvoted an article 8 months ago

FeeL: Making Multilingual LMs Better, One Feedback Loop at a Time

Article • Mar 25

upvoted a paper 8 months ago

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

upvoted a paper 9 months ago

Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights

Paper • 2502.09619 • Published Feb 13 • 35

upvoted a collection 10 months ago

Dicta-LM 2.0 Collection

Collection

9 items • Updated Apr 27, 2024 • 19

upvoted a paper 12 months ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 21

upvoted 6 papers about 1 year ago

LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

Paper • 2410.10783 • Published Oct 14, 2024 • 27

SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

Paper • 2410.05057 • Published Oct 7, 2024 • 7

Acceptable Use Policies for Foundation Models

Paper • 2409.09041 • Published Aug 29, 2024 • 1

The Future of Open Human Feedback

Paper • 2408.16961 • Published Aug 15, 2024 • 22

The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

Paper • 2408.08291 • Published Aug 15, 2024 • 11

Learning from Naturally Occurring Feedback

Paper • 2407.10944 • Published Jul 15, 2024 • 4

upvoted 2 papers over 1 year ago

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Paper • 2407.13696 • Published Jul 18, 2024 • 5

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

Paper • 2407.00402 • Published Jun 29, 2024 • 23