The final step in UI test automation is checking for correct appearance. What seems to be a simple image-processing task makes even modern AI struggle: it surprisingly fails to correctly attribute many simple visual changes.
LLMs can explain images, but they only detect differences in features they were trained to recognize. Traditional image comparison libraries, on the other hand, require near-perfect pixel alignment and tolerate little distortion. This limitation is problematic in visual test automation, where screenshots of the new and previous versions must be compared. The goal is to identify how an element has changed, not just whether it did. Did it just move a bit, or did the content change, too? This talk covers common algorithms used in test automation, compares the performance of various AI tools, and explains why this task is hard.
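To make the brittleness concrete, here is a minimal pixel-diff sketch in OpenCV. The file names are placeholders, and both screenshots are assumed to have identical dimensions. Note how even a one-pixel shift of otherwise unchanged content is flagged as a large difference, because a raw pixel comparison has no notion of "moved":

```python
import cv2
import numpy as np

# Placeholder file names: two renders of the same page.
before = cv2.imread("screenshot_old.png")
after = cv2.imread("screenshot_new.png")

# Absolute per-pixel difference; requires identical image dimensions.
diff = cv2.absdiff(before, after)

# Flag pixels where any color channel differs by more than a small tolerance.
mask = np.any(diff > 10, axis=2)
print(f"{mask.mean():.2%} of pixels differ")

# A 1-pixel horizontal shift of identical content still lights up the diff.
shifted = np.roll(before, 1, axis=1)
shift_mask = np.any(cv2.absdiff(before, shifted) > 10, axis=2)
print(f"{shift_mask.mean():.2%} of pixels differ after a 1px shift")
```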
We will discuss:
- what multimodal foundation models can achieve and where they struggle
- why this task is easy for humans
- what tools are readily available in test automation suites such as Applitools, Cypress, or Playwright
- how to set up and train a neural network for this task using OpenCV and TensorFlow (a minimal sketch follows this list)
- what else can be done and where the limits are
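As a taste of the neural-network point above, here is a minimal sketch of one possible setup: stack the before/after screenshots into a single 6-channel input and train a small CNN to classify the change type. The class names, image size, and architecture are illustrative assumptions, not the talk's actual pipeline:

```python
import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras

IMG_SIZE = (128, 128)
NUM_CLASSES = 3  # assumed labels: unchanged / moved / content_changed

def load_pair(path_before: str, path_after: str) -> np.ndarray:
    """Read both screenshots with OpenCV and stack them into a 6-channel input."""
    before = cv2.resize(cv2.imread(path_before), IMG_SIZE).astype("float32") / 255.0
    after = cv2.resize(cv2.imread(path_after), IMG_SIZE).astype("float32") / 255.0
    return np.concatenate([before, after], axis=-1)  # shape: (128, 128, 6)

def build_model() -> keras.Model:
    """A small CNN over the stacked pair; big enough to demo, not to ship."""
    inputs = keras.Input(shape=(*IMG_SIZE, 6))
    x = inputs
    for filters in (16, 32, 64):
        x = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = keras.layers.MaxPooling2D()(x)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training, given arrays of stacked pairs and integer labels, would look like:
# model = build_model()
# model.fit(x_train, y_train, validation_split=0.1, epochs=10)
```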