zzfive's Collections
datasets
			
 
				
				
	
	
	
			
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21
			
 
	
	 
	
	
	
			
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17
			
 
	
	 
	
	
	
			
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16
			
 
	
	 
	
	
	
			
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31
			
 
	
	 
	
	
	
			
DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 54
			
 
	
	 
	
	
	
			
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Paper • 2406.11833 • Published • 63
			
 
	
	 
	
	
	
			
From Pixels to Prose: A Large Dataset of Dense Image Captions
Paper • 2406.10328 • Published • 18
			
 
	
	 
	
	
	
			
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Paper • 2406.11271 • Published • 21
			
 
	
	 
	
	
	
			
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Paper • 2406.13735 • Published • 6
			
 
	
	 
	
	
	
			
Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models
Paper • 2406.14599 • Published • 17
			
 
	
	 
	
	
	
			
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 104
			
 
	
	 
	
	
	
			
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity
Paper • 2406.17720 • Published • 8
			
 
	
	 
	
	
	
			
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Paper • 2407.02371 • Published • 54
			
 
	
	 
	
	
	
			
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild
Paper • 2406.19380 • Published • 50
			
 
	
	 
	
	
	
			
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Paper • 2407.03958 • Published • 22
			
 
	
	 
	
	
	
			
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Paper • 2407.06358 • Published • 19
			
 
	
	 
	
	
	
			
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Paper • 2407.10957 • Published • 24
			
 
	
	 
	
	
	
			
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
Paper • 2407.11144 • Published • 10
			
 
	
	 
	
	
	
			
Visual Text Generation in the Wild
Paper • 2407.14138 • Published • 9
			
 
	
	 
	
	
	
			
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Paper • 2407.19795 • Published • 11
			
 
	
	 
	
	
	
			
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Paper • 2408.00205 • Published • 6
			
 
	
	 
	
	
	
			
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Paper • 2408.02629 • Published • 15
			
 
	
	 
	
	
	
			
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Paper • 2408.02900 • Published • 30
			
 
	
	 
	
	
	
			
Diffusion Models as Data Mining Tools
Paper • 2408.02752 • Published • 14
			
 
	
	 
	
	
	
			
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Paper • 2408.03900 • Published • 10
			
 
	
	 
	
	
	
			
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Paper • 2408.04594 • Published • 15
			
 
	
	 
	
	
	
			
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads
Paper • 2407.18245 • Published • 12
			
 
	
	 
	
	
	
			
MovieSum: An Abstractive Summarization Dataset for Movie Screenplays
Paper • 2408.06281 • Published • 9
			
 
	
	 
	
	
	
			
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Paper • 2408.07089 • Published • 14
			
 
	
	 
	
	
	
		
Paper • 2408.05366 • Published • 14
			
 
	
	 
	
	
	
			
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
Paper • 2408.08441 • Published • 8
			
 
	
	 
	
	
	
			
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Paper • 2408.16176 • Published • 8
			
 
	
	 
	
	
	
			
ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution
Paper • 2408.15993 • Published • 8
			
 
	
	 
	
	
	
			
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
Paper • 2409.01437 • Published • 71
			
 
	
	 
	
	
	
			
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
Paper • 2409.00447 • Published • 3
			
 
	
	 
	
	
	
			
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Paper • 2407.17438 • Published • 26
			
 
	
	 
	
	
	
			
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper • 2411.04709 • Published • 26
			
 
	
	 
	
	
	
			
Improving the detection of technical debt in Java source code with an enriched dataset
Paper • 2411.05457 • Published • 2
			
 
	
	 
	
	
	
			
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Paper • 2411.05830 • Published • 21
			
 
	
	 
	
	
	
			
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Paper • 2411.07461 • Published • 23
			
 
	
	 
	
	
	
			
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
Paper • 2411.08380 • Published • 26
			
 
	
	 
	
	
	
			
RedPajama: an Open Dataset for Training Large Language Models
Paper • 2411.12372 • Published • 56
			
 
	
	 
	
	
	
			
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Paper • 2411.14794 • Published • 13
			
 
	
	 
	
	
	
			
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Paper • 2412.00927 • Published • 29
			
 
	
	 
	
	
	
			
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Paper • 2412.00947 • Published • 8
			
 
	
	 
	
	
	
			
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Paper • 2412.03304 • Published • 21
			
 
	
	 
	
	
	
			
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
Paper • 2412.04280 • Published • 14
			
 
	
	 
	
	
	
			
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 46
			
 
	
	 
	
	
	
			
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Paper • 2412.04626 • Published • 14
			
 
	
	 
	
	
	
			
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
Paper • 2412.08580 • Published • 45
			
 
	
	 
	
	
	
			
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5
			
 
	
	 
	
	
	
			
VisionArena: 230K Real World User-VLM Conversations with Preference Labels
Paper • 2412.08687 • Published • 13
			
 
	
	 
	
	
	
			
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training
Paper • 2501.08197 • Published • 9
			
 
	
	 
	
	
	
			
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
Paper • 2501.09653 • Published • 12
			
 
	
	 
	
	
	
			
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Paper • 2501.15907 • Published • 17
			
 
	
	 
	
	
	
			
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
Paper • 2501.15427 • Published • 6
			
 
	
	 
	
	
	
			
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
Paper • 2501.18511 • Published • 20
			
 
	
	 
	
	
	
			
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Paper • 2502.02589 • Published • 10
			
 
	
	 
	
	
	
			
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Paper • 2502.01720 • Published • 8
			
 
	
	 
	
	
	
			
Expect the Unexpected: FailSafe Long Context QA for Finance
Paper • 2502.06329 • Published • 132
			
 
	
	 
	
	
	
			
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
Paper • 2502.07870 • Published • 45
			
 
	
	 
	
	
	
			
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper • 2502.10391 • Published • 34
			
 
	
	 
	
	
	
			
REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation
Paper • 2502.13270 • Published • 6
			
 
	
	 
	
	
	
			
Audio-FLAN: A Preliminary Release
Paper • 2502.16584 • Published • 36
			
 
	
	 
	
	
	
			
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Paper • 2503.01739 • Published • 8
			
 
	
	 
	
	
	
			
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
Paper • 2503.00501 • Published • 12
			
 
	
	 
	
	
	
			
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Paper • 2503.02951 • Published • 33
			
 
	
	 
	
	
	
			
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Paper • 2503.07920 • Published • 101
			
 
	
	 
	
	
	
			
CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing
Paper • 2503.06940 • Published • 11
			
 
	
	 
	
	
	
			
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Paper • 2503.06053 • Published • 138
			
 
	
	 
	
	
	
			
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper • 2503.15055 • Published • 6
			
 
	
	 
	
	
	
			
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection
Paper • 2503.24115 • Published • 11
			
 
	
	 
	
	
	
			
LiveVQA: Live Visual Knowledge Seeking
Paper • 2504.05288 • Published • 14
			
 
	
	 
	
	
	
			
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44
			
 
	
	 
	
	
	
			
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Paper • 2504.11456 • Published • 12
			
 
	
	 
	
	
	
			
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Paper • 2504.09081 • Published • 16
			
 
	
	 
	
	
	
			
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93
			
 
	
	 
	
	
	
			
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Paper • 2504.13835 • Published • 38
			
 
	
	 
	
	
	
			
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
Paper • 2504.14655 • Published • 20
			
 
	
	 
	
	
	
			
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Paper • 2504.16511 • Published • 22
			
 
	
	 
	
	
	
			
Dynamic Camera Poses and Where to Find Them
Paper • 2504.17788 • Published • 6
			
 
	
	 
	
	
	
			
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Paper • 2505.00358 • Published • 26
			
 
	
	 
	
	
	
			
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
Paper • 2505.22523 • Published • 7
			
 
	
	 
	
	
	
			
Large Language Models for Data Synthesis
Paper • 2505.14752 • Published • 49
			
 
	
	 
	
	
	
			
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Paper • 2506.02096 • Published • 52
			
 
	
	 
	
	
	
			
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL
Paper • 2506.02338 • Published • 4
			
 
	
	 
	
	
	
			
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Paper • 2506.05209 • Published • 51
			
 
	
	 
	
	
	
			
Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery
Paper • 2506.05673 • Published • 10
			
 
	
	 
	
	
	
			
CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models
Paper • 2506.07463 • Published • 10
			
 
	
	 
	
	
	
			
Sekai: A Video Dataset towards World Exploration
Paper • 2506.15675 • Published • 64
			
 
	
	 
	
	
	
			
Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset
Paper • 2506.18851 • Published • 30
			
 
	
	 
	
	
	
			
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Paper • 2506.18095 • Published • 66
			
 
	
	 
	
	
	
			
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Paper • 2507.03483 • Published • 23
			
 
	
	 
	
	
	
			
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Paper • 2507.04009 • Published • 49
			
 
	
	 
	
	
	
			
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Paper • 2507.07095 • Published • 54
			
 
	
	 
	
	
	
			
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
Paper • 2507.09862 • Published • 49
			
 
	
	 
	
	
	
			
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
Paper • 2507.13563 • Published • 52
			
 
	
	 
	
	
	
			
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Paper • 2507.16812 • Published • 63
			
 
	
	 
	
	
	
			
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 34
			
 
	
	 
	
	
	
			
Multi-human Interactive Talking Dataset
Paper • 2508.03050 • Published • 9
			
 
	
	 
	
	
	
			
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper • 2508.04026 • Published • 158
			
 
	
	 
	
	
	
			
FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality
Paper • 2508.00109 • Published • 4
			
 
	
	 
	
	
	
			
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Paper • 2508.03542 • Published • 4
			
 
	
	 
	
	
	
			
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Paper • 2508.09987 • Published • 25
			
 
	
	 
	
	
	
			
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Paper • 2508.21148 • Published • 140
			
 
	
	 
	
	
	
			
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Paper • 2508.13618 • Published • 17
			
 
	
	 
	
	
	
			
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Paper • 2509.04292 • Published • 57
			
 
	
	 
	
	
	
			
Reverse-Engineered Reasoning for Open-Ended Generation
Paper • 2509.06160 • Published • 147
			
 
	
	 
	
	
	
			
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Paper • 2509.09680 • Published • 42
			
 
	
	 
	
	
	
			
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Paper • 2509.09676 • Published • 31
			
 
	
	 
	
	
	
			
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper • 2509.12201 • Published • 103
			
 
	
	 
	
	
	
			
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Paper • 2509.11362 • Published • 4
			
 
	
	 
	
	
	
			
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Paper • 2509.14638 • Published • 11
			
 
	
	 
	
	
	
			
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Paper • 2510.15742 • Published • 49
			
 
	
	 
	
	
	
			
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Paper • 2510.19808 • Published • 28
			
 
	
	 
	
	
	
			
FineVision: Open Data Is All You Need
Paper • 2510.17269 • Published • 61