In a Training Loop 🔄

Stefan Schweter PRO

stefan-it · https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨

Recent Activity

liked a model about 17 hours ago

nvidia/Nemotron-Orchestrator-8B

upvoted an article 1 day ago

How We Built a Semantic Highlight Model To Save Token Cost for RAG

Article • 3 days ago • 42

upvoted a collection 2 days ago

TranslateGemma

3 items • Updated 2 days ago • 128

upvoted 2 papers 3 days ago

It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models

Paper • 2601.08500 • Published 4 days ago • 1

TranslateGemma Technical Report

Paper • 2601.09012 • Published 4 days ago • 16

upvoted an article 10 days ago

NVIDIA brings agents to life with DGX Spark and Reachy Mini

Article • 13 days ago • 54

upvoted a paper 19 days ago

Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis

Paper • 2512.22100 • Published 22 days ago • 2

upvoted an article 22 days ago

The Optimal Architecture for Small Language Models

Article • 23 days ago • 108

upvoted a paper 30 days ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 15

upvoted a paper about 1 month ago

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published Dec 15, 2025 • 14

upvoted an article about 1 month ago

We Got Claude to Fine-Tune an Open Source LLM

Article • Dec 4, 2025 • 576

upvoted a paper about 1 month ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 291

upvoted an article about 2 months ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • Dec 1, 2025 • 274

upvoted a changelog about 2 months ago

Add a Status to your Hugging Face profile

Changelog • Nov 28, 2025 • 98

upvoted a paper about 2 months ago

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

Paper • 2511.21613 • Published Nov 26, 2025 • 2

upvoted a paper 2 months ago

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 95

upvoted an article 2 months ago

Building for an Open Future - our new partnership with Google Cloud

Article • Nov 13, 2025 • 46

upvoted a paper 2 months ago

Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements

Paper • 2511.05560 • Published Nov 4, 2025 • 1

upvoted a collection 2 months ago

Pre-training Dataset Samples

A collection of pre-training datasets samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 19 items • Updated 24 days ago • 18

upvoted a paper 2 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 128

upvoted an article 2 months ago

SYNTH: the new data frontier

Article • Nov 10, 2025 • 7