Talk to Qwen2Audio with Gradio and WebRTC ⚡️
Talk to OpenAI using their multimodal API
Segment objects in images using text prompts or scribbles