jzr0065's Collections
MLLMs

updated Jul 24, 2024

  • DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

    Paper • 2407.08303 • Published Jul 11, 2024 • 19

  • Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Paper • 2407.07053 • Published Jul 9, 2024 • 47

  • PaliGemma: A versatile 3B VLM for transfer

    Paper • 2407.07726 • Published Jul 10, 2024 • 72

  • LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

    Paper • 2407.07895 • Published Jul 10, 2024 • 42

  • EVLM: An Efficient Vision-Language Model for Visual Understanding

    Paper • 2407.14177 • Published Jul 19, 2024 • 45

  • INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

    Paper • 2407.16198 • Published Jul 23, 2024 • 13