ICCV2023

community

Activity Feed

Recent Activity

YeolJoo submitted a paper 8 days ago

Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure

AdinaY authored a paper 9 days ago

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

BryanW authored a paper 9 days ago

RecTok: Reconstruction Distillation along Rectified Flow

AdinaY

posted an update 6 days ago

Post

391

Following up on LLaDA 2.0, the paper is now out on Daily Papers 🔥
It has sparked a lot of discussion in the community for showing how discrete diffusion LLMs can scale to 100B and run faster than traditional AR models.
LLaDA2.0: Scaling Up Diffusion Language Models to 100B (2512.15745)

Jianxiong

authored 2 papers 6 days ago

Veila: Panoramic LiDAR Generation from a Monocular RGB Image

Paper • 2508.03690 • Published Aug 5

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Paper • 2512.13604 • Published 9 days ago • 69

Nymbo

posted an update 6 days ago

Post

1673

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, so no new infrastructure is required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets
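
For readers curious what a discover action could look like internally, here is a minimal sketch. It assumes each skill is a sub-directory of a Skills root that holds a SKILL.md file with `name:` and `description:` lines in a leading `---` front-matter block. The helper names (`parse_front_matter`, `discover_skills`, `make_demo_root`) are hypothetical and are not taken from the Nymbo/Tools source.

```python
import tempfile
from pathlib import Path


def parse_front_matter(text: str) -> dict:
    """Parse simple `key: value` lines between leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover_skills(root: Path) -> list[dict]:
    """List every sub-directory of `root` that contains a SKILL.md file."""
    skills = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = parse_front_matter(skill_md.read_text(encoding="utf-8"))
        skills.append({
            # Fall back to the directory name if the front matter has no name.
            "name": meta.get("name", skill_md.parent.name),
            "description": meta.get("description", ""),
        })
    return skills


def make_demo_root() -> Path:
    """Build a throwaway Skills directory with one skill (demo data only)."""
    root = Path(tempfile.mkdtemp()) / "Skills"
    skill_dir = root / "music-downloader"
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(
        "---\n"
        "name: music-downloader\n"
        "description: Wraps yt-dlp for audio extraction\n"
        "---\n"
        "# Usage\n",
        encoding="utf-8",
    )
    (root / "not-a-skill").mkdir()  # no SKILL.md, so it is skipped
    return root


if __name__ == "__main__":
    print(discover_skills(make_demo_root()))
```

Keeping discovery to a directory scan is what lets skills ride on the existing file system root: adding a skill is just adding a folder.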

I've included a music-downloader skill as a working demo. It wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot

AdinaY

posted an update 9 days ago

Post

4456

Finch 💰 an enterprise-grade benchmark that measures whether AI agents can truly handle real world finance & accounting work.

FinWorkBench/Finch

✨ Built from real enterprise data (Enron + financial institutions), not synthetic tasks
✨ Tests end-to-end finance workflows
✨ Multimodal & cross-file reasoning
✨ Expert annotated (700+ hours) and genuinely hard

AdinaY

authored a paper 9 days ago

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

Paper • 2512.13168 • Published 10 days ago • 49

BryanW

authored a paper 9 days ago

RecTok: Reconstruction Distillation along Rectified Flow

Paper • 2512.13421 • Published 10 days ago • 4

Jianxiong

submitted a paper to Daily Papers 9 days ago

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Paper • 2512.13604 • Published 9 days ago • 69

Nymbo

posted an update 29 days ago

Post

4975

🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, and Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)
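
To make the gateway idea concrete, here is a minimal sketch of a single entry point that runs model-written Python with tools in scope and returns only what the code prints. It is an illustration under stated assumptions, not the Agent_Terminal implementation: `agent_terminal`, `TOOLS`, and `web_search_stub` are hypothetical names, and the stub returns canned strings instead of calling a real search backend.

```python
import contextlib
import io


def web_search_stub(query: str, max_results: int = 3) -> list[str]:
    """Hypothetical stand-in for a real search tool; returns canned results."""
    return [f"result {i + 1} for {query!r}" for i in range(max_results)]


# Tools are exposed to the executed code under these names (assumed mapping).
TOOLS = {"Web_Search": web_search_stub}


def agent_terminal(code: str) -> str:
    """Run model-written code with the tools in scope; return only printed text."""
    namespace = dict(TOOLS)  # fresh namespace per call
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)
        except Exception as exc:
            # Report failures as text so the model can read and react to them.
            print(f"Error: {type(exc).__name__}: {exc}")
    return buffer.getvalue()


if __name__ == "__main__":
    snippet = 'rows = Web_Search("current price of bitcoin", max_results=3)\nprint(len(rows))'
    print(agent_terminal(snippet))
```

Intermediate values such as `rows` stay inside the executed namespace, so only the printed summary would reach the model's context. Plain `exec` provides no isolation, so a real deployment would need a proper sandbox around this.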

Don't know what tools are available? The agent can discover them at runtime:

print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool
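
One way such runtime discovery could work is a registry lookup over tool names and docstrings. The sketch below is an assumption about the shape of the idea, not the server's code: `REGISTRY` and both stub tools are hypothetical, and only the `search_tools` and `usage` names come from the post.

```python
import inspect


def generate_image_stub(prompt: str, width: int = 512) -> str:
    """Generate an image from a text prompt (hypothetical stub)."""
    return f"image for {prompt!r} at {width}px"


def web_search_stub(query: str, max_results: int = 3) -> list:
    """Search the web and return result snippets (hypothetical stub)."""
    return []


# Assumed registry mapping public tool names to callables.
REGISTRY = {"Generate_Image": generate_image_stub, "Web_Search": web_search_stub}


def search_tools(keyword: str) -> list[str]:
    """Return names of tools whose name or docstring mentions the keyword."""
    needle = keyword.lower()
    return [
        name
        for name, fn in REGISTRY.items()
        if needle in name.lower() or needle in (inspect.getdoc(fn) or "").lower()
    ]


def usage(name: str) -> str:
    """Return the tool's signature followed by its docstring."""
    fn = REGISTRY[name]
    return f"{name}{inspect.signature(fn)}\n{inspect.getdoc(fn)}"


if __name__ == "__main__":
    print(search_tools("image"))
    print(usage("Generate_Image"))
```

Because documentation is fetched on demand, the model pays for a tool's full description only when it actually needs that tool.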

The individual direct tool calls are all still there, but they can be disabled if using the Agent_Terminal. Try it now - https://www.nymbo.net/nymbot

phyang

authored 2 papers about 1 month ago

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

Paper • 2502.17421 • Published Feb 24

AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery

Paper • 2507.16005 • Published Jul 21

AdinaY

posted an update about 2 months ago

Post

3334

Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license

AdinaY

posted an update about 2 months ago

Post

700

Chinese open-source AI in October wasn’t about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR: 3B
- PaddleOCR-VL: 0.9B
- Qwen3-VL: 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph: 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear: linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World-Model: one step forward to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the “world model” globally

✨ Audio & Speech + Video & Visual: released from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan delivery platform
- xiabs DreamOmni 2

Looking forward to what's next 🚀