AI & ML interests

Machine learning, deep learning, generative AI, LLMs

Recent Activity

salma-remyx
posted an update 24 days ago
We've built over 10K containerized reproductions of papers from arXiv!

Instead of spending all day trying to build an environment to test that new idea, just pull the Docker container from the Remyx registry.
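The images appear to follow a simple naming pattern: the example repository linked later on this page is `remyxai/2507.20613v1`, which is the arXiv ID under the `remyxai` namespace. The Python sketch below builds an image reference from an arXiv ID under that assumption. The helper name `image_ref`, the default `latest` tag, and the restriction to new-style IDs are illustrative assumptions, so check the repository's tags on Docker Hub before pulling.

```python
import re

# Assumed naming convention, inferred from the example image
# remyxai/2507.20613v1: <namespace>/<arxiv_id>.
# New-style arXiv IDs look like YYMM.NNNN or YYMM.NNNNN, optionally with a
# version suffix such as "v1". Old-style IDs (e.g. hep-th/9901001) are rejected.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def image_ref(arxiv_id: str, tag: str = "latest", namespace: str = "remyxai") -> str:
    """Return a Docker image reference for a paper (hypothetical helper)."""
    if not ARXIV_ID.fullmatch(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    # The tag is an assumption: check the repository's Tags page on Docker Hub.
    return f"{namespace}/{arxiv_id}:{tag}"


if __name__ == "__main__":
    # Pass the printed reference to `docker pull` to fetch the image.
    print(image_ref("2507.20613v1"))  # remyxai/2507.20613v1:latest
```

With the reference in hand, `docker pull <reference>` fetches the image locally.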

And with Remyx, you can start experimenting faster by generating a test PR in your codebase based on the ideas found in your paper of choice.

Hub: https://hub.docker.com/u/remyxai
Remyx docs: https://docs.remyx.ai/resources/ideate
Coming soon, explore reproduced papers with AG2 + Remyx: https://github.com/ag2ai/ag2/pull/2141
salma-remyx
posted an update about 1 month ago
The future is arriving too fast not to use programmatic discovery and replication.
Search arXiv → execute in 30 seconds with pre-built Docker environments

Check out our latest integration with AG2 to accelerate your discovery loop.
As easy as:
from remyxai.client.search import SearchClient
from autogen.coding import RemyxCodeExecutor

# Search by topic
papers = SearchClient().search(
    "data synthesis strategies",
    has_docker=True,  # Only papers with pre-built environments
    limit=10
)

# Use the top result's pre-built environment
executor = RemyxCodeExecutor(arxiv_id=papers[0].arxiv_id)

executor.explore(
    goal="Run a test with my model remyxai/SpaceThinker-Qwen2.5VL-3B",
    interactive=False  # Automated exploration
)


Tutorial: https://github.com/ag2ai/ag2/blob/4c6954e3959fe672980191f264e30d451bc23554/notebook/agentchat_remyx_executor.ipynb
PR: https://github.com/ag2ai/ag2/pull/2141
salma-remyx
posted an update about 2 months ago
Thanks again to @ag2 for hosting us at their Community Talks!
@terry-remyx walked us through a technical deep dive into GitRank, our automated pipeline that converts research papers with code into containerized, executable environments and generates specialized tests tailored to users' specific codebases.

In case you missed it...
Full recording: https://www.youtube.com/watch?v=N_FNfZ71s2I
Deck: https://docs.google.com/presentation/d/1S0q-wGCu2dliVWb9ykGKFz61jZKZI4ipxWBv73HOFBo/edit?usp=sharing
salma-remyx
posted an update about 2 months ago
We're joining the @ag2 team on Discord to present a deep dive into how we've used the framework to build GitRank in their Community Talks.

The GitRank pipeline is used to:
📰 power personalized paper recommendations
🐳 build environments as Docker Images
🎯 implement core-methods as PRs for your target repo

Don't miss it! Tomorrow, Sept 25 at 9:00 am PST: https://calendar.app.google/3soCpuHupRr96UaF8
salma-remyx
posted an update about 2 months ago
We've added intelligent full-text search across our pre-built Docker images for arXiv papers with ready-to-run code and papers straight from arXiv.

Natural language queries.
Semantic understanding.
One search to find both the paper AND the runnable code.

Try it today: https://engine.remyx.ai/resources/
Join us at Experiment 2025: https://experiment.remyx.ai
salma-remyx
posted an update about 2 months ago
Rolling Benchmarks - Evaluating AI Agents on Unseen GitHub Repos

Static benchmarks are prone to leaderboard hacking and training data contamination, so how about a dynamic/rolling benchmark?

By limiting submissions to only freshly published code, we could evaluate agents on their consistency over time using rolling averages, instead of rewarding agents that overfit to a static benchmark.
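One way to picture this scoring rule: keep each agent's score per time window (say, one week of newly published repos), count only the most recent k windows, rank by their mean, and report the spread as a consistency signal. This is a hypothetical Python sketch; `RollingLeaderboard`, the window definition, the choice of k, and the sample scores are assumptions, not part of any existing benchmark.

```python
from collections import deque
from statistics import mean, pstdev


class RollingLeaderboard:
    """Rank agents by their mean score over the most recent k windows.

    Hypothetical sketch: each "window" is one evaluation round on repos
    published after the agent was submitted (e.g. one week's worth).
    """

    def __init__(self, k: int = 4):
        self.k = k
        self.history: dict[str, deque] = {}

    def record(self, agent: str, window_score: float) -> None:
        # deque(maxlen=k) drops the oldest window once k scores are stored
        self.history.setdefault(agent, deque(maxlen=self.k)).append(window_score)

    def rolling_mean(self, agent: str) -> float:
        return mean(self.history[agent])

    def consistency(self, agent: str) -> float:
        # Population std. dev. across windows: lower means more consistent
        return pstdev(self.history[agent])

    def ranking(self) -> list[tuple[str, float]]:
        scored = ((agent, self.rolling_mean(agent)) for agent in self.history)
        return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    board = RollingLeaderboard(k=3)
    for score in [0.40, 0.55, 0.50, 0.60]:  # made-up scores
        board.record("agent-a", score)
    for score in [0.90, 0.20, 0.30]:
        board.record("agent-b", score)
    for agent, avg in board.ranking():
        print(agent, round(avg, 3), round(board.consistency(agent), 3))
```

The bounded window means an agent tuned to older repos loses its advantage as fresh windows push the old scores out.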

Can rolling benchmarks help us evaluate agents in a way that is more closely aligned with their real-world applications? Perhaps a new direction for agent evaluation?

Would love to hear what you think about this!
More on reddit: https://www.reddit.com/r/LocalLLaMA/comments/1nmvw7a/rolling_benchmarks_evaluating_ai_agents_on_unseen/
salma-remyx
posted an update about 2 months ago
Trustworthy AI evals have been an industry challenge for the last few years, so what's missing?
Causal Reasoning.

Model-based eval frameworks can't tell you whether your changes actually improved user outcomes; you need to take a systems-level approach.

At Remyx, we're building the intelligence layer for AI experimentation. Check out this example of how we're laying the scaffolding for controlled experiments that turn your hypotheses into insights about what drives performance in your application.
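As an illustration of what a controlled experiment measures, here is a minimal Python sketch comparing one outcome metric between a control arm and a treatment arm: the lift, a Welch-style t statistic, and a normal-approximation 95% interval. The function name `compare_variants` and the sample values are hypothetical and this is not the Remyx API; reading the lift causally only holds if users were randomly assigned to the arms.

```python
import math
from statistics import mean, variance


def compare_variants(control: list[float], treatment: list[float]) -> dict:
    """Compare one outcome metric between two randomly assigned arms.

    Hypothetical helper. Uses Welch's standard error (no equal-variance
    assumption) and a normal approximation for the 95% interval, which is
    reasonable only when both arms have a fair number of samples.
    """
    if len(control) < 2 or len(treatment) < 2:
        raise ValueError("each arm needs at least two observations")
    diff = mean(treatment) - mean(control)
    se = math.sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    if se == 0:
        raise ValueError("zero variance in both arms; nothing to test")
    z = 1.96  # two-sided 95% critical value of the standard normal
    return {
        "lift": diff,
        "t_stat": diff / se,
        "ci95": (diff - z * se, diff + z * se),
    }


if __name__ == "__main__":
    # Made-up per-user outcome values, for illustration only
    result = compare_variants([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
    print(result)
```

If the interval excludes zero, the difference is unlikely to be noise under these assumptions; a careful experiment would also fix the metric and sample size before collecting data.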

Check out the latest at Remyx in our docs: https://docs.remyx.ai
Try your first experiment today! https://engine.remyx.ai
salma-remyx
posted an update about 2 months ago
Mark your calendars for Thursday Sept 25th at 9am PST 📆
We're joining the @ag2 team on Discord to present a deep dive into how we've used the framework to build GitRank in their Community Talks.

The GitRank pipeline is used to:
📰 power personalized paper recommendations
🐳 build environments as Docker Images
🎯 implement core-methods as PRs for your target repo

Attached is a draft outlining what we plan to cover in the talk.
Would love to gather your feedback to make this insightful for all: https://docs.google.com/presentation/d/1S0q-wGCu2dliVWb9ykGKFz61jZKZI4ipxWBv73HOFBo/edit?usp=sharing
salma-remyx
posted an update about 2 months ago
Reproducing research code shouldn't take longer than reading the paper.
For papers that include code, setting up the right environment often means hours of dependency hell and configuration debugging.

At Remyx AI, we built an agent that automatically creates and tests Docker images for research papers, then shares them publicly so anyone can reproduce results with a single command.

We just submitted PR #908 to integrate this directly into arXiv Labs.

If you believe in making reproducible research accessible to everyone, give it a bump!: https://github.com/arXiv/arxiv-browse/pull/908
salma-remyx
posted an update 2 months ago
Search is such a fundamental part of content discovery, yet ends up overlooked or poorly implemented in so many apps we use every day.

We've built hundreds of Docker images for arXiv papers that come with a codebase, but with Docker Hub's search it's tough to find what you're looking for unless you happen to have the arXiv ID handy.

So we added full-text search over these resources, putting you that much closer to testing a promising new idea. More resources will be indexed soon!
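To make "full-text search" concrete, here is a toy Python sketch of keyword search over paper titles or abstracts using TF-IDF scoring. It is a stand-in technique, not necessarily what Remyx uses (other posts on this page describe semantic search, which would typically rank by embedding similarity instead); `TfidfIndex` and the sample titles are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Tiny in-memory keyword index over paper titles/abstracts (sketch)."""

    def __init__(self, docs: dict[str, str]):
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        n_docs = len(self.tf)
        # Document frequency: in how many documents each term appears
        df = Counter(term for counts in self.tf.values() for term in counts)
        # Smoothed IDF: rarer terms get a larger weight
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    def search(self, query: str, limit: int = 10) -> list[tuple[str, float]]:
        terms = tokenize(query)
        scores = {}
        for doc_id, counts in self.tf.items():
            total = sum(counts.values())
            if total == 0:
                continue
            # Sum of normalized term frequency x IDF over the query terms
            score = sum(counts[t] / total * self.idf.get(t, 0.0) for t in terms)
            if score > 0:
                scores[doc_id] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


if __name__ == "__main__":
    # Made-up titles keyed by placeholder IDs, not real papers
    index = TfidfIndex({
        "paper-a": "Data synthesis strategies for spatial reasoning in vision language models",
        "paper-b": "Rolling benchmarks for evaluating coding agents",
        "paper-c": "Synthetic data generation with diffusion",
    })
    print(index.search("data synthesis"))
```

Keyword scoring misses "synthetic" when the query says "synthesis", which is exactly the gap a semantic index is meant to close.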

Full Demo: https://www.youtube.com/watch?v=GjYReWbQZw8
Try it here!: https://engine.remyx.ai/resources
Join us at Experiment 2025!: https://experiment.remyx.ai
salma-remyx
posted an update 2 months ago
Most apps don't have great full-text search over their assets.

We've developed an agent to automate the environment building and testing of experimental codebases sourced from arXiv. We push these containerized reproductions daily to Docker Hub: https://hub.docker.com/u/remyxai

However, searching for them can be challenging unless you know the specific arXiv ID associated with each paper.

We are currently working on implementing a search feature in Remyx, which will make these assets easily discoverable and ready for testing 🔍 Stay tuned!

Discover your next best idea to experiment with here: https://engine.remyx.ai
salma-remyx
posted an update 2 months ago
Science is the vibe-killer

A critique of the state of the technology
Presenting an alternative vision for scaling the scientific method in AI engineering

https://remyxai.substack.com/p/vibes-dont-scale
salma-remyx
posted an update 2 months ago
The docs for GitRank are live! Follow along to see what you get:

📖 Daily personalized papers from arXiv matching your project context
👩‍💻 One-click PRs with complete implementation, tests, and docs
🚀 Parallel experimentation - test multiple ideas with ease

Your next great idea is probably in a paper you haven't had time to implement.

Try it today! http://docs.remyx.ai/resources/ideate
salma-remyx
posted an update 2 months ago
GitRank

We built an agent to surface and implement high-potential ideas for your repo, asynchronously generating containers, tests, and PRs so you can evaluate what works and double down on it.

Check out the demo: https://youtu.be/frgPsTclc1k

Come replicate and specialize a test for your repo! GitRank is live on Remyx.
Docs: https://docs.remyx.ai
App: https://engine.remyx.ai
Example PR here: https://github.com/smellslikeml/experimental-vqasynth/pull/727
salma-remyx
posted an update 3 months ago
Are you coming to SF this Fall?

Next week, we'll be at the AI Agent Builders Summit.
And in late October, GitHub Universe, ODSC West, and Experiment 2025.

We're sharing what we've learned while building agents that help you turn new research ideas from arXiv into PRs for your repo.

This summer, we've analyzed thousands of papers, ranking each for relevance to our work, before building hundreds of Docker images and opening hundreds of PRs for our repos.

Read more about PapersWithPRs: https://www.reddit.com/r/LocalLLaMA/comments/1mq7715/paperswithprs_dont_just_read_the_paper_replicate/

๐—”๐—œ ๐—”๐—ด๐—ฒ๐—ป๐˜ ๐—•๐˜‚๐—ถ๐—น๐—ฑ๐—ฒ๐—ฟ๐˜€ ๐—ฆ๐˜‚๐—บ๐—บ๐—ถ๐˜: https://luma.com/agents-world-tour-sf
๐—š๐—ถ๐˜๐—›๐˜‚๐—ฏ ๐—จ๐—ป๐—ถ๐˜ƒ๐—ฒ๐—ฟ๐˜€๐—ฒ: https://githubuniverse.com/
DISCOUNT CODE: TAKEMETOUNIVERSE
๐—˜๐˜…๐—ฝ๐—ฒ๐—ฟ๐—ถ๐—บ๐—ฒ๐—ป๐˜ ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฑ: https://luma.com/145xyuyw
salma-remyx
posted an update 3 months ago
๐—ฃ๐—ฎ๐—ฝ๐—ฒ๐—ฟ๐Ÿฎ๐—ฃ๐—ฅ๐˜€
Lately, we've been experimenting with recommending arXiv papers based on the context of what we're building in AI.
At the same time, we're using an agent to help automate the building and testing of Docker Images.

Check out the example here:
https://hub.docker.com/repository/docker/remyxai/2507.20613v1/general

Next, we're tasking our #ExperimentOps agent with opening PRs in a target repo to evaluate the core concepts from a new research paper in the context of your application and your KPIs.

Operationalize your Experimentation!
Find Your Frontier!
#BeAnExperimenter