Recent Activity
nroggendorff posted an update about 11 hours ago
GeorgeBredis authored a paper 5 days ago
Post · 381
The #1 trending AI/ML dataset today!
Massive scale, diversity, and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles
Post · 304
The new King has arrived!
Moonshot AI's Kimi K2 Thinking is now the top model on Hugging Face!
moonshotai/Kimi-K2-Thinking
Post · 2533
You don't need 100 GPUs to train something amazing!
Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!
Check out the #1 trending space on Hugging Face:
HuggingFaceTB/smol-training-playbook
DmitryRyumin posted an update 8 days ago
Post · 1131
New Research Alert - ICCV 2025 (Poster)!
Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation
Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.
Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)
Repository: https://github.com/Jo-wang/TCA
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 11 days ago
Post · 2306
New Research Alert - ICCV 2025 (Oral)!
Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models. It addresses misalignment and local-optima issues using a binary local ordering map and pixel-wise linear regression.
Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)
Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
Build Failure (apt-get permission denied) in Dev Mode on Gradio + ZeroGPU Space · 4 · #26 opened 13 days ago by Archime
DmitryRyumin posted an update 14 days ago
Post · 2805
New Research Alert - ICCV 2025 (Oral)!
Title: Understanding Co-speech Gestures in-the-wild
Description: JEGAL is a tri-modal model that learns from gestures, speech, and text simultaneously, enabling devices to interpret co-speech gestures in the wild.
Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)
Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
Repository: https://github.com/Sindhu-Hegde/jegal
Video: https://www.youtube.com/watch?v=TYFOLKfM-rM
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 18 days ago
Post · 3919
New Research Alert - ICCV 2025 (Oral)!
Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models
Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.
Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)
GitHub Page: https://andrehuang.github.io/loftup-site
Repository: https://github.com/andrehuang/loftup
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #Cross-AttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
mrfakename posted an update 18 days ago
Post · 3374
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.
It's still very early and has an issue with hallucinating, but results seem pretty good so far given how early it is in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: mrfakename/EmoAct-MiMo
(Turn the sound on to hear the audio samples)
DmitryRyumin posted an update 19 days ago
Post · 1924
New Research Alert - ICCV 2025 (Oral)!
Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening
Description: The HeLlO framework is a new dataset distillation method that removes the need for storing large soft labels. It uses a lightweight online image-to-label projector based on CLIP, initialized with text embeddings and adapted with LoRA-style parameter-efficient tuning.
Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)
Video: https://www.youtube.com/watch?v=kAyK_3wskgA
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin authored a paper 19 days ago
DmitryRyumin posted an update 20 days ago
Post · 4780
New Research Alert - ICCV 2025 (Oral)!
Title: Variance-based Pruning for Accelerating and Compressing Trained Networks
Description: The one-shot pruning method efficiently compresses networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.
Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
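As a loose illustration of the general idea only (this sketch is not the paper's algorithm; it just shows the generic step of ranking units by activation variance on calibration data and keeping the most-varying ones, on a toy linear layer):

```python
# Generic variance-based ranking sketch: measure each output unit's
# activation variance on a calibration batch and keep the units with
# the highest variance. Illustrative only; not the paper's method.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(16, 8)
x = torch.randn(256, 16)                 # calibration inputs

with torch.no_grad():
    var = layer(x).var(dim=0)            # per-output-unit activation variance

keep = var.argsort(descending=True)[:4]  # retain the 4 most-varying units
pruned = torch.nn.Linear(16, 4)
with torch.no_grad():
    pruned.weight.copy_(layer.weight[keep])
    pruned.bias.copy_(layer.bias[keep])

print("kept units:", sorted(keep.tolist()))
```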
nroggendorff posted an update 21 days ago
DmitryRyumin posted an update 21 days ago
Post · 2972
New Research Alert - ICCV 2025 (Oral)!
Title: Token Activation Map to Visually Explain Multimodal LLMs
Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning, and multimodal alignment across models.
Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li
Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA
Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)
Repository: https://github.com/xmed-lab/TAM
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin
Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
Post · 5461
deepseek-ai/DeepSeek-OCR is out! My take:
> pretty insane that it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
nroggendorff posted an update 28 days ago
Post · 3327
I love getting emails telling me when there's somebody else's active access token in one of my commit SHAs. HF should really only tell you if it is your own token; otherwise, I could just make a dataset with a bunch of random strings and wait for a valid token to turn up.
Also, don't comment about how unlikely this is. I've gotten a warning email about a token I 'leaked' at least four times.
In all cases, it has been in the digest hash.
user,permission,token
nroggendorff,write,hf_...
pepper13,finegrained,hf_...
...,...,...
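For illustration, pattern-based secret scanning boils down to something like the sketch below. The regex is my own guess at a plausible token shape (a known prefix plus a run of alphanumerics), not Hugging Face's actual scanner or validation logic, which is exactly why a random string can trip it:

```python
# Sketch of pattern-based secret scanning: flag anything that merely
# *looks* like an access token. The regex is a guessed, illustrative
# pattern, not Hugging Face's real scanner.
import re

TOKEN_LIKE = re.compile(r"hf_[A-Za-z0-9]{30,}")

def flag_token_like_strings(text: str) -> list[str]:
    return TOKEN_LIKE.findall(text)

# A random string in a dataset can match the pattern by accident:
print(flag_token_like_strings("nroggendorff,write,hf_" + "x" * 34))
```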
Post · 571
Tokenization is one of the most important processes in AI - yet many would like to kill it.
What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.
For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) is converted to an ID according to a predefined mapping: for instance, "ization" could map to ID 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!
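Here is what that text-to-IDs round trip looks like in practice, as a minimal sketch using the transformers library (gpt2 is just an example checkpoint; actual IDs depend on the tokenizer's vocabulary, so the IDs quoted above are illustrative):

```python
# Minimal sketch of the text -> token IDs round trip with the
# Hugging Face transformers library. "gpt2" is just an example
# checkpoint; every tokenizer has its own vocabulary and IDs.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

ids = tok.encode("This is tokenization")
print(ids)                             # a list of integers (vocabulary-dependent)
print(tok.convert_ids_to_tokens(ids))  # the subword pieces behind those IDs
print(tok.decode(ids))                 # back to "This is tokenization"
```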
Most tokenizers today use pre-specified mappings called "vocabularies", generally built with the compression algorithm Byte-Pair Encoding (BPE), which learns from a large corpus of text an optimized split that efficiently encodes any text from the same distribution into a list of token IDs.
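A minimal sketch of how such a vocabulary is learned, using the Hugging Face tokenizers library (the corpus and vocab_size here are toy values; real vocabularies are trained on far larger text collections):

```python
# Minimal sketch: learning a BPE vocabulary with the Hugging Face
# tokenizers library. Corpus and vocab_size are toy values.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["This is tokenization", "Tokenization splits text into tokens"]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=200, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

enc = tokenizer.encode("This is tokenization")
print(enc.tokens)  # the learned subword split
print(enc.ids)     # the corresponding token IDs
```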
Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses; the prime example is that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc.) will not have the right tokens to approach Burmese, and so will be terribly inefficient at it.
Many alternative approaches have emerged as a result: for instance, "tokenizer-free tokenizers". One that I really liked is "entropy-based": it monitors the stream of text and triggers a split whenever the entropy increases too much, i.e. when something "surprising" happens. A toy sketch of this idea follows below.
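To make the entropy-based idea concrete, here is a toy sketch of my own (not any specific paper's algorithm): a character bigram model stands in for the learned predictor, and the split threshold is an arbitrary illustrative value.

```python
# Toy sketch of entropy-based splitting: start a new token whenever the
# next character is "surprising" under a simple bigram model. Real
# systems use a learned model; this only illustrates the idea.
import math
from collections import Counter

def train_bigram(corpus: str):
    pair_counts, ctx_counts = Counter(), Counter()
    for a, b in zip(corpus, corpus[1:]):
        pair_counts[(a, b)] += 1
        ctx_counts[a] += 1
    return pair_counts, ctx_counts

def surprisal(pairs, ctxs, a, b):
    # -log2 P(b | a) with add-one smoothing over a byte-sized alphabet
    return -math.log2((pairs[(a, b)] + 1) / (ctxs[a] + 256))

def entropy_split(text, pairs, ctxs, threshold=6.0):
    tokens, start = [], 0
    for i in range(1, len(text)):
        if surprisal(pairs, ctxs, text[i - 1], text[i]) > threshold:
            tokens.append(text[start:i])  # entropy spiked: close the token
            start = i
    tokens.append(text[start:])
    return tokens

pairs, ctxs = train_bigram("this is tokenization " * 50)
# Familiar text stays whole; unfamiliar transitions trigger splits.
print(entropy_split("this is tokenization", pairs, ctxs))
print(entropy_split("this is tokenizer", pairs, ctxs))
```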
But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers
Post · 328
New Technique to Deeply Poison AI on Images and Prove Creative Provenance
I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving 20-348% disruption in AI training convergence in benchmark tests.
Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.
You can see examples and learn more about how and WHY it works better than current methods:
https://severian-poisonous-shield-for-images.static.hf.space
If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. I'm still working on expanding it into a production-grade usable app.
This is not intended as a pure self-promotion post. I genuinely want to help creators and to gauge interest from different communities. I've spent the past year and a half building this from scratch, with new math and code, to try to solve this massive problem.