AI & ML interests

None defined yet.

Recent Activity

nroggendorff posted an update about 11 hours ago
Developing with ZeroGPU without a PRO account is painful. They give you so many requests at once, but then impose something like a 24-hour cooldown. I'd vote for fewer requests per batch, but a shorter cooldown.

Or just less of a cooldown, but I understand if that is not allowed.
lunarflu posted an update 7 days ago
The new King 👑 has arrived!

Moonshot AI is now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu posted an update 7 days ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
DmitryRyumin posted an update 8 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔍

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 11 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔍

📝 Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models, addressing misalignment and local-optima issues with a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 14 days ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔍

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde , K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 18 days ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔍

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
mrfakename posted an update 18 days ago
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.

Still very early, and it does have an issue with hallucinating, but results seem pretty good so far given how early it is into the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
DmitryRyumin posted an update 19 days ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔍

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for large soft labels. It uses a lightweight online image-to-label projector based on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized from text embeddings.

👥 Authors: @roseannelexie , @Huage001 , Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 20 days ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔍

📝 Description: This one-shot pruning method efficiently compresses trained networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
nroggendorff posted an update 21 days ago
Is it hot in here, or is it just me?
DmitryRyumin posted an update 21 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔍

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations, yielding clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
merve posted an update 26 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
nroggendorff posted an update 28 days ago
I love getting emails telling me when there's somebody else's active access token in one of my commit SHAs. HF should really only tell you if it is your token, otherwise I could just make a dataset with a bunch of random strings and wait for a valid token.
user,permission,token
nroggendorff,write,hf_...
pepper13,finegrained,hf_...
...,...,...
...

Also, don't comment about how unlikely this is. I've gotten a warning email about a token I 'leaked' at least four times.
In all cases, it has been in the digest hash.
m-ric posted an update about 1 month ago
Tokenization is one of the most important processes in AI - yet many would like to kill it 💀

What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.

โžก๏ธ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) are converted to IDs according to a predefined mapping: for instance "ization" could map to id 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!
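That mapping step is literally just a dictionary lookup. A toy sketch, using the made-up token IDs from the example above (real vocabularies hold tens of thousands of entries):

```python
# Toy token-to-ID mapping. These IDs are the illustrative ones from the
# example, not from any real tokenizer's vocabulary.
vocab = {"This": 1335, "is": 135, "token": 2980, "ization": 2438}

def encode(tokens):
    """Convert a list of string tokens into their integer IDs."""
    return [vocab[t] for t in tokens]

print(encode(["This", "is", "token", "ization"]))  # [1335, 135, 2980, 2438]
```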

Most tokenizers today use pre-specified mappings called "vocabularies", generally built with the Byte-Pair Encoding (BPE) compression algorithm, which learns from a large corpus of text an optimized split that efficiently encodes any text from the same distribution into a list of token IDs.
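The core BPE training loop fits in a few lines: repeatedly find the most frequent adjacent pair of symbols and fuse it into a new token. A simplified character-level sketch (real tokenizers operate on bytes, with much larger corpora and optimized data structures; the tiny corpus here is illustrative):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules by repeatedly fusing the most frequent
    adjacent pair of symbols across the corpus."""
    # Each word starts as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing occurrences of the chosen pair.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 4)
print(merges)  # the first learned merge is ('w', 'e')
```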

🤨 Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses; the prime example being that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc.) will not have the right tokens to handle Burmese, and so will be terribly inefficient at it.

Many alternative approaches have emerged as a result: for instance, "tokenizer-free tokenizers". One that I really liked is "entropy-based": it monitors the stream of text and triggers a split whenever the entropy increases too much, i.e. when something "surprising" happens.
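A toy version of that entropy-based splitting idea, using per-character surprisal (-log2 probability) estimated from unigram counts as a stand-in for a real language model; the unigram "model", the smoothing, and the threshold are all illustrative assumptions:

```python
import math
from collections import Counter

def surprisal_splits(text, reference, threshold=3.0):
    """Toy entropy-based segmentation: start a new segment whenever a
    character's surprisal under a unigram model exceeds the threshold.
    Real methods use a small LM's next-token entropy, not unigram counts."""
    counts = Counter(reference)
    total = sum(counts.values())

    def surprisal(ch):
        # Add-one smoothing so unseen characters get a finite surprisal.
        return -math.log2((counts.get(ch, 0) + 1) / (total + len(counts) + 1))

    segments, current = [], ""
    for ch in text:
        if current and surprisal(ch) > threshold:
            segments.append(current)  # "surprising" character: cut here
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments

# 'b' is unseen in the reference, so it is surprising and opens a new segment.
print(surprisal_splits("aaba", "aaaa", threshold=2.0))  # ['aa', 'ba']
```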

But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers
Severian posted an update about 1 month ago
New Technique to Deeply Poison AI on Images and Prove Creative Provenance

I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving 20-348% disruption of AI training convergence in benchmark tests.

Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.

You can see examples and learn more about how and WHY it works better than current methods:

https://severian-poisonous-shield-for-images.static.hf.space

If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.

This is not intended as a pure self-promotion post. I genuinely want to help creators and to gauge interest from different communities. I've spent the past year and a half building this from scratch, with new math and code, to try to solve this massive problem.