AI & ML interests

None defined yet.

multimodalart posted an update about 1 month ago
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert an entire HF repo (Model, Dataset or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
multimodalart posted an update 5 months ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open sourced it!

I've built a live, real-time demo on Spaces:

multimodalart/self-forcing
linoyts posted an update 7 months ago
FramePack is hands down one of the best OS releases in video generation
- fully open sourced + amazing quality + reduced memory + improved speed
- but even more, it's gonna facilitate *soooo* many downstream applications,
like this version adapted for landscape rotation: https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape
radames posted an update over 1 year ago
Thanks to @OzzyGT for pushing the new Anyline preprocessor to https://github.com/huggingface/controlnet_aux. Now you can use the TheMistoAI/MistoLine ControlNet fully within Diffusers.

Here's a demo for you: radames/MistoLine-ControlNet-demo
Super resolution version: radames/Enhance-This-HiDiffusion-SDXL

from controlnet_aux import AnylineDetector
from PIL import Image

# Load the Anyline edge detector weights from the MistoLine repo
anyline = AnylineDetector.from_pretrained(
    "TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline"
).to("cuda")

# Run edge detection on the source image to get the ControlNet conditioning
source = Image.open("source.png")
result = anyline(source, detect_resolution=1280)
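
For context (not part of the original snippet), here's a rough sketch of feeding the resulting edge map into the MistoLine ControlNet through the standard Diffusers SDXL ControlNet pipeline; the base model ID, prompt, and the fp16 variant flag are assumptions based on the MistoLine model card:

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Load the MistoLine ControlNet (fp16 variant assumed to be available in the repo)
controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Condition generation on the Anyline edge map computed above
image = pipe(
    "a detailed ink illustration of a lighthouse",
    image=result,
    controlnet_conditioning_scale=0.7,
).images[0]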
radames posted an update over 1 year ago
At Google I/O 2024, we're collaborating with the Google Visual Blocks team (https://visualblocks.withgoogle.com) to release custom Hugging Face nodes. Visual Blocks for ML is a browser-based tool that allows users to create machine learning pipelines using a visual interface. We're launching nodes with Transformers.js, running models in the browser, as well as server-side nodes running Transformers pipeline tasks and LLMs using our hosted inference. With @Xenova @JasonMayes

You can learn more about it here https://huggingface.co/blog/radames/hugging-face-google-visual-blocks

Source-code for the custom nodes:
https://github.com/huggingface/visual-blocks-custom-components
radames posted an update over 1 year ago
AI-town now runs on Hugging Face Spaces with our API for LLMs and embeddings, including the open-source Convex backend, all in one container. Easy to duplicate and configure on your own.

Demo: radames/ai-town
Instructions: https://github.com/radames/ai-town-huggingface
multimodalart posted an update over 1 year ago
The first open Stable Diffusion 3-like architecture model is JUST out - but it is not SD3!

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model, trained with multilingual CLIP + multilingual T5 text encoders for English and Chinese understanding.

Try it out yourself here: https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper they claim to be SOTA among open-source models, based on human preference evaluation!
radames posted an update over 1 year ago
HiDiffusion SDXL now supports image-to-image, so I've created an "Enhance This" version using the latest ControlNet line art model, MistoLine. It's faster than DemoFusion.

Demo: radames/Enhance-This-HiDiffusion-SDXL

Older version based on DemoFusion radames/Enhance-This-DemoFusion-SDXL

New ControlNet SDXL that controls every line: TheMistoAI/MistoLine

HiDiffusion is compatible with diffusers and supports many SD models - https://github.com/megvii-research/HiDiffusion
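
As a rough sketch (based on the repo's README rather than the demo code; the base model ID and prompt are illustrative), HiDiffusion patches an existing diffusers pipeline in place:

import torch
from diffusers import StableDiffusionXLPipeline
from hidiffusion import apply_hidiffusion

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Patch the UNet so the pipeline can generate beyond its native resolution
apply_hidiffusion(pipe)

image = pipe("a photo of a lighthouse at sunset", height=2048, width=2048).images[0]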
pcuenq posted an update over 1 year ago
OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community in the Hub for anyone that wants to integrate inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and 6.5 with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental KV cache in swift-transformers. It's a bit tricky because the layers have a varying number of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook one seems to match what was used for training. Any ideas on this are welcome!
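
For reference, a rough sketch (not the conversion script itself) of the PyTorch-side setup these checkpoints start from; the model ID is Apple's Hub release, and the Llama 2 tokenizer repo is gated:

from transformers import AutoModelForCausalLM, AutoTokenizer

# OpenELM checkpoints ship custom modeling code, hence trust_remote_code;
# they reuse the Llama 2 tokenizer instead of shipping their own.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))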
radames posted an update over 1 year ago
I've built a custom component that integrates the Rerun web viewer with Gradio, making it easier to share your demos as Gradio apps.

Basic snippet
# pip install gradio_rerun gradio
import gradio as gr
from gradio_rerun import Rerun

# Upload one or more recording files and display them in the embedded Rerun web viewer
gr.Interface(
    inputs=gr.File(file_count="multiple", type="filepath"),
    outputs=Rerun(height=900),
    fn=lambda file_path: file_path,  # pass the uploaded file paths straight to the viewer
).launch()

More details here https://huggingface.co/spaces/radames/gradio_rerun
Source https://github.com/radames/gradio-rerun-viewer

Follow Rerun here: rerun
radames posted an update over 1 year ago
Here's a utility component for integrating your Gradio app with Hugging Face. This custom component enables you to search for models, spaces, datasets, and users.

pip install gradio_huggingfacehub_search

You can see it in action here: arcee-ai/mergekit-config-generator

And learn how to use it here: radames/gradio_huggingfacehub_search
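
Here's a minimal, hedged sketch of wiring it into a Blocks app; the HuggingfaceHubSearch class name and the search_type parameter are assumptions taken from the component's documentation linked above:

import gradio as gr
from gradio_huggingfacehub_search import HuggingfaceHubSearch  # assumed class name

with gr.Blocks() as demo:
    # Search box that autocompletes repo ids from the Hub
    repo_id = HuggingfaceHubSearch(
        label="Hub Model ID",
        search_type="model",  # assumption: also accepts "dataset", "space", "user"
    )
    selected = gr.Textbox(label="Selected repo")
    gr.Button("Use this repo").click(
        fn=lambda rid: rid, inputs=repo_id, outputs=selected
    )

demo.launch()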
radames posted an update over 1 year ago
Following up on @vikhyatk's Moondream2 update and @santiagomed's implementation on Candle, I quickly put together the WASM module so that you can try running the ~1.5GB quantized model in the browser. Perhaps the next step is to rewrite it using https://github.com/huggingface/ratchet and run it even faster with WebGPU, @FL33TW00D-HF.

radames/Candle-Moondream-2

ps: I have a collection of all Candle WASM demos here radames/candle-wasm-examples-650898dee13ff96230ce3e1f
radames posted an update over 1 year ago
Testing the new pix2pix-Turbo in real time, a very interesting GAN architecture that leverages the SD-Turbo model. Here I'm using the edge2image LoRA with single-step inference.

It's very interesting how the quality is comparable to ControlNet Canny, but in a single step. Looking forward to when they release the code: https://github.com/GaParmar/img2img-turbo/issues/1

I've been keeping a list of fast diffusion model pipelines together with this real-time websocket app. Have a look if you want to test it locally, or check out the demo here on Spaces.

radames/real-time-pix2pix-turbo

Github app:
https://github.com/radames/Real-Time-Latent-Consistency-Model/

You can also check the authors' img2img sketch model here:

gparmar/img2img-turbo-sketch

Refs:
One-Step Image Translation with Text-to-Image Models (2403.12036)

cc @gparmar @junyanz
multimodalart posted an update over 1 year ago
The Stable Diffusion 3 research paper broken down, including some overlooked details!

Model
- 2 base model variants mentioned: 2B and 8B sizes

- New architecture at all abstraction levels:
  - UNet out, Multimodal Diffusion Transformer in - bye, cross attention
  - Rectified flows for the diffusion process (see the toy sketch after this section)
  - Still a Latent Diffusion Model

- 3 text encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

- Dataset was deduplicated with SSCD, which helped with memorization (no more details about the dataset, though)

Variants
- A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
- An Instruct Edit 2B model was trained and learned how to do text replacement

Results
- State of the art in automated evals for composition and prompt understanding
- Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf