- MM-VID: Advancing Video Understanding with GPT-4V(ision)
  Paper • 2310.19773 • Published • 20
- Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
  Paper • 2310.05863 • Published • 2
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 95
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
  Paper • 2311.10126 • Published • 10
Zach Mustafa PRO
Zmu
AI & ML interests
None yet
Recent Activity
- liked a Space 4 days ago: VetriVelRavi/ai-room-designer
- liked a Space 8 days ago: wcy1122/DreamOmni2-Edit
- liked a Space 8 days ago: prithivMLmods/Qwen3-VL-HF-Demo