Add model card for GUI-AIMA-3B

#1
by nielsr HF Staff - opened

This PR adds a comprehensive model card for the GUI-AIMA-3B model, as presented in the paper GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

It includes:

  • Essential metadata: license: cc-by-nc-4.0; pipeline_tag: image-text-to-text, for discoverability; and library_name: transformers, chosen based on compatibility evidence, for automated code snippet generation.
  • Direct links to the paper, the project page, and the GitHub repository.
  • A concise summary of the model from the paper's abstract, alongside key architectural and result images from the GitHub README.
  • The detailed "Main Results" section, including associated images, to highlight the model's performance.
  • A "Sample Usage" code snippet, extracted from eval/example_inference.py in the official GitHub repository, to help users get started easily.
  • The relevant BibTeX citation.
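
As a rough sketch of the metadata item above: on the Hub, these fields sit in a YAML front-matter block at the top of the model card's README.md. The Python snippet below only illustrates the resulting layout. The helper name render_front_matter is hypothetical and not part of any library; the field values are the ones listed in this PR.

```python
# Metadata values are taken from this PR's description.
metadata = {
    "license": "cc-by-nc-4.0",
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}


def render_front_matter(fields: dict[str, str]) -> str:
    """Render flat string fields as a YAML front-matter block (hypothetical helper)."""
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the block as it would appear at the top of README.md.
    print(render_front_matter(metadata))
```

The helper handles only flat string values, which is enough for the three fields named here; a real card with nested metadata would be better served by a YAML library.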

This model card improves the documentation and usability of the GUI-AIMA-3B model on the Hugging Face Hub.

Please review and merge this PR if everything looks good.

