Show HN: UI testing using multimodal LLMs (kodefreeze.com)
1 point by kodefreeze 40 days ago
Hi HN,

I built this tool to solve the "flakiness" problem in UI testing. Existing AI agents often struggle with precise interactions, while traditional frameworks (Selenium/Playwright) break whenever the DOM changes.

The Approach: Instead of relying on hard-coded selectors or pure computer vision, I’m using a multi-agent system powered by multimodal LLMs. We pass both the screenshot (pixels) and the browser context (network requests, console logs, etc.) to the model. This allows the agent to:

"See" the UI like a user and accurately map semantic intent ("Click the Signup button") to precise coordinates even if the layout shifts.

The goal is to mimic natural user behavior rather than following a predefined script. It handles exploratory testing and finds visual bugs that code-based assertions miss.
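As an example of the visual-bug side, a check can be phrased as a question about the rendered pixels rather than a DOM assertion. This reuses the hypothetical query_model helper from the sketch above and is only meant to illustrate the shape of such a check:

    def visual_check(page) -> str:
        # Ask the model to judge the screenshot the way a user would,
        # instead of asserting on specific DOM state.
        shot = base64.b64encode(page.screenshot()).decode()
        return query_model(
            prompt=("Inspect this screenshot for visual bugs: overlapping "
                    "elements, clipped text, broken images, unreadable contrast. "
                    'Reply "OK" if nothing looks wrong.'),
            screenshot_b64=shot,
            context={},
        )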

I’d love feedback on the implementation or to discuss the challenges of using LLMs for deterministic testing.


