AI & ML interests

None defined yet.

Recent Activity

nroggendorff posted an update about 11 hours ago
Developing with ZeroGPU without a PRO account is painful. They give you so many requests at once, but then impose something like a 24-hour cooldown. I'd vote for fewer requests per batch, but a shorter cooldown.

Or just less of a cooldown, but I understand if that is not allowed.
lunarflu posted an update 7 days ago
The new King 👑 has arrived!

Moonshot AI is now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu posted an update 7 days ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
DmitryRyumin posted an update 8 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔍

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 11 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔍

📝 Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models, addressing misalignment and local-optima issues with a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 14 days ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔍

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde , K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 18 days ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔍

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
mrfakename posted an update 18 days ago
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.

Still very early, and it does have an issue with hallucinating, but results seem pretty good so far given how early it is into the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
DmitryRyumin posted an update 19 days ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔍

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for large soft labels. It uses a lightweight online image-to-label projector based on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized from text embeddings.

👥 Authors: @roseannelexie , @Huage001 , Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 20 days ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔍

📝 Description: This one-shot pruning method efficiently compresses trained networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
nroggendorff posted an update 21 days ago
Is it hot in here, or is it just me?
DmitryRyumin posted an update 21 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔍

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations, yielding clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
merve posted an update 26 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
nroggendorff posted an update 28 days ago
I love getting emails telling me when there's somebody else's active access token in one of my commit SHAs. HF should really only tell you if it is your token, otherwise I could just make a dataset with a bunch of random strings and wait for a valid token.
user,permission,token
nroggendorff,write,hf_...
pepper13,finegrained,hf_...
...,...,...
...

Also, don't comment about how unlikely this is. I've gotten a warning email about a token I 'leaked' at least four times.
In all cases, it has been in the digest hash.
m-ric posted an update about 1 month ago
Tokenization is one of the most important processes in AI - yet many would like to kill it 💀

What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.

โžก๏ธ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) are converted to IDs according to a predefined mapping: for instance "ization" could map to id 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!
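That mapping step is literally just a dictionary lookup. A toy sketch, using the made-up token IDs from the example above (real vocabularies hold tens of thousands of entries):

```python
# Toy token-to-ID mapping. These IDs are the illustrative ones from the
# example, not from any real tokenizer's vocabulary.
vocab = {"This": 1335, "is": 135, "token": 2980, "ization": 2438}

def encode(tokens):
    """Convert a list of string tokens into their integer IDs."""
    return [vocab[t] for t in tokens]

print(encode(["This", "is", "token", "ization"]))  # [1335, 135, 2980, 2438]
```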

Most tokenizers today use pre-specified mappings called "vocabularies", generally built with the Byte-Pair Encoding (BPE) compression algorithm, which learns from a large corpus of text an optimized split that efficiently encodes any text from the same distribution into a list of token IDs.
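The core BPE training loop fits in a few lines: repeatedly find the most frequent adjacent pair of symbols and fuse it into a new token. A simplified character-level sketch (real tokenizers operate on bytes, with much larger corpora and optimized data structures; the tiny corpus here is illustrative):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules by repeatedly fusing the most frequent
    adjacent pair of symbols across the corpus."""
    # Each word starts as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing occurrences of the chosen pair.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 4)
print(merges)  # the first learned merge is ('w', 'e')
```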

🤨 Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses; the prime example being that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc.) will not have the right tokens to handle Burmese, and so will be terribly inefficient at it.

Many alternative approaches have emerged as a result: for instance, "tokenizer-free tokenizers". One that I really liked is "entropy-based": it monitors the stream of text and triggers a split whenever the entropy increases too much, i.e. when something "surprising" happens.
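A toy version of that entropy-based splitting idea, using per-character surprisal (-log2 probability) estimated from unigram counts as a stand-in for a real language model; the unigram "model", the smoothing, and the threshold are all illustrative assumptions:

```python
import math
from collections import Counter

def surprisal_splits(text, reference, threshold=3.0):
    """Toy entropy-based segmentation: start a new segment whenever a
    character's surprisal under a unigram model exceeds the threshold.
    Real methods use a small LM's next-token entropy, not unigram counts."""
    counts = Counter(reference)
    total = sum(counts.values())

    def surprisal(ch):
        # Add-one smoothing so unseen characters get a finite surprisal.
        return -math.log2((counts.get(ch, 0) + 1) / (total + len(counts) + 1))

    segments, current = [], ""
    for ch in text:
        if current and surprisal(ch) > threshold:
            segments.append(current)  # "surprising" character: cut here
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments

# 'b' is unseen in the reference, so it is surprising and opens a new segment.
print(surprisal_splits("aaba", "aaaa", threshold=2.0))  # ['aa', 'ba']
```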

But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers
Severian posted an update about 1 month ago
New Technique to Deeply Poison AI on Images and Prove Creative Provenance

I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving 20-348% disruption of AI training convergence in benchmark tests.

Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.

You can see examples and learn more about how and WHY it works better than current methods:

https://severian-poisonous-shield-for-images.static.hf.space

If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.

This is not intended as a pure self-promotion post. I genuinely want to help creators and to gauge interest from different communities. I've spent the past year and a half building this from scratch, with new math and code, to try to solve this massive problem.