tsinghua-ee/video-SALMONN-2

Video-Text-to-Text

text-generation

text-generation-inference


Changli committed on Sep 28

Commit 4a13d74 · verified · 1 Parent(s): b91c40b

Update README.md

Files changed (1)

README.md +9 -1

@@ -18,4 +18,12 @@ library_name: transformers
 # video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
-Official model release of [video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models](https://github.com/bytedance/video-SALMONN-2)
+Official model release of [video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models](https://github.com/bytedance/video-SALMONN-2)
+[Github Link](https://github.com/bytedance/video-SALMONN-2)
+[Paper Link](https://arxiv.org/abs/2506.15220)
+## Results
+<img width="857" height="510" alt="image" src="https://github.com/user-attachments/assets/aca20b2e-1e68-4b44-a26b-03d5f070b213" />