BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks
Abstract
BioBench is an open ecology vision benchmark that addresses the limitations of ImageNet-1K accuracy for scientific imagery by evaluating models on a diverse set of ecological tasks and modalities.
ImageNet-1K linear-probe transfer accuracy remains the default proxy for visual representation quality, yet it no longer predicts performance on scientific imagery. Across 46 modern vision model checkpoints, ImageNet top-1 accuracy explains only 34% of variance on ecology tasks and mis-ranks 30% of models above 75% accuracy. We present BioBench, an open ecology vision benchmark that captures what ImageNet misses. BioBench unifies 9 publicly released, application-driven tasks, 4 taxonomic kingdoms, and 6 acquisition modalities (drone RGB, web video, micrographs, in-situ and specimen photos, camera-trap frames), totaling 3.1M images. A single Python API downloads data, fits lightweight classifiers to frozen backbones, and reports class-balanced macro-F1 (plus domain metrics for FishNet and FungiCLEF); ViT-L models evaluate in 6 hours on an A6000 GPU. BioBench provides new signal for computer vision in ecology and a template recipe for building reliable AI-for-science benchmarks in any domain. Code and predictions are available at https://github.com/samuelstevens/biobench and results at https://samuelstevens.me/biobench.
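To make the headline metric concrete, here is a minimal sketch of class-balanced macro-F1 in plain Python. It is not taken from the BioBench codebase: the function name `macro_f1` and the toy labels are illustrative assumptions, and the benchmark's own implementation may differ in details such as how it handles classes missing from a split.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    Hypothetical helper for illustration; not the BioBench API.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    classes = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN). The denominator is never zero
    # because every class in the union appears at least once.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


# Toy imbalanced example: a model that always predicts the majority class.
y_true = ["deer", "deer", "deer", "deer", "lynx"]
y_pred = ["deer", "deer", "deer", "deer", "deer"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}, macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because each class contributes equally to the average, the rare class that is never predicted pulls the score down sharply even though plain accuracy looks high. That property is the usual reason to prefer macro-F1 on long-tailed ecological data.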
Community
"Across 46 modern vision transformer checkpoints, ImageNet top-1 accuracy explains only 34% of variance on ecology tasks and mis-ranks 30% of models above 75% accuracy" (emphasis mine)
Interactive results for all models and tasks are here: https://samuelstevens.me/biobench/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models (2025)
- Convolutional Neural Nets vs Vision Transformers: A SpaceNet Case Study with Balanced vs Imbalanced Regimes (2025)
- OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing (2025)
- FineVision: Open Data Is All You Need (2025)
- Unifying Vision-Language Latents for Zero-label Image Caption Enhancement (2025)
- Free-Grained Hierarchical Recognition (2025)
- Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks (2025)