anakin87 posted an update Aug 13
πŸ•΅οΈπŸŒ Building Browser Agents - notebook

No API? No problem.
Browser Agents can use websites like you do: click, type, wait, read.

📓 Step-by-step notebook: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/browser_agents.ipynb

🎥 In the video, the Agent:
- Goes to Hugging Face Spaces
- Finds black-forest-labs/FLUX.1-schnell
- Expands a short prompt ("my holiday on Lake Como") into a detailed image generation prompt
- Waits for the image
- Returns the image URL


## What else can it do?
Great for information gathering and summarization

πŸ—žοΈπŸ—žοΈ Compare news websites and create a table of shared stories with links
▢️ Find content creator social profiles from YouTube videos
πŸ›οΈ Find a product's price range on Amazon
πŸš‚ 🚌 Gather public transportation travel options


## How is it built?
πŸ—οΈ Haystack β†’ Agent execution logic
🧠 Google Gemini 2.5 Flash β†’ Good and fast LLM with a generous free tier
πŸ› οΈ Playwright MCP server β†’ Browser automation tools: navigate, click, type, wait...

Even without vision capabilities, this setup can get quite far.
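At its core, this wiring is a plain tool-calling loop: the LLM picks the next browser action, the tool executes it, and the observation is fed back until the task is done. Here is a minimal, self-contained Python sketch of that control flow. The `fake_llm_policy` and `fake_browser_tool` functions are hypothetical stand-ins for Gemini and the Playwright MCP tools; in the real setup, Haystack's Agent automates this loop for you.

```python
def fake_browser_tool(action: str, arg: str, page_state: dict) -> str:
    """Toy stand-in for Playwright MCP tools (navigate, click, type, wait, read)."""
    if action == "navigate":
        page_state["url"] = arg
        return f"Opened {arg}"
    if action == "type":
        page_state["input"] = arg
        return f"Typed '{arg}'"
    if action == "read":
        return page_state.get("result", "nothing yet")
    return "unknown action"


def fake_llm_policy(step: int):
    """Toy stand-in for the LLM: returns the next (action, arg), or None when done."""
    plan = [
        ("navigate", "https://huggingface.co/spaces"),
        ("type", "my holiday on Lake Como"),
        ("read", ""),
    ]
    return plan[step] if step < len(plan) else None


def run_agent(max_steps: int = 10) -> list[str]:
    """Agent execution loop: ask the 'LLM' for an action, run the tool,
    collect the observation, and stop when the policy has no next step."""
    page_state: dict = {}
    transcript: list[str] = []
    for step in range(max_steps):
        decision = fake_llm_policy(step)
        if decision is None:
            break
        action, arg = decision
        transcript.append(fake_browser_tool(action, arg, page_state))
    return transcript


print(run_agent())
```

The real LLM decides each action from the page contents it just read, rather than following a fixed plan, which is what lets the same loop handle arbitrary websites.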


## Next steps
- Try a local open model
- Move from the notebook to a real deployment
- Incorporate vision

And you? Have you built something similar? What's in your stack?
