Why AI doesn't work for element detection in software testing and beyond
First, let's talk about elements. What are they, and why should AI help me detect them? Elements are objects a user can interact with: on websites, in software, in systems, and so on. Classic user interface elements.
To automate your software testing, it would be helpful if you didn't have to do all the work yourself and the AI assisted you, saving you lots of time through element detection.
Sounds great, BUT … in reality, it doesn't work.
Now you are wondering why it shouldn't work when almost all the big players offer it.
Let's put it this way: everyone asks for artificial intelligence. People are willing to spend money on the latest and greatest AI has to offer. The easiest way to use AI is object detection. But AI doesn't automatically mean fancy features and added value. Sometimes AI is only involved for the sake of having AI.
The right question for you as a user would be: what do I get from the AI? And let's put aside the marketing material of the software test automation companies offering AI element detection. In theory, it is great. In reality, it's not possible to keep up with the promises.
And I'd like to show you why it is technically not possible. If a salesperson tells you otherwise, you should get suspicious.
There is no generic way a specific element looks. What is a button in one piece of software is a graphic in another. And what looks like a text block may be an interactive element in other applications. So you can't train a generic AI model. You can only have an AI model for specific applications.
Think of the Windows phone. What you see in the picture below seems like text, but these text snippets are links.
Another example is the bills on a cashier system: they are buttons but look like images. These are just some random examples to give you an idea.
In a corporate environment, you may have a style guide for the software in your company, so every text element looks the same, every button looks the same, every link looks the same … You could train an AI model for this specific application to detect these elements. But every company uses bought software as well. A Salesforce or Microsoft Dynamics button doesn't look according to your style guide. You would need to train the AI on these applications as well.
The elements are too different to have a general AI model that detects them in every piece of software you'd like to test.
And there is another catch.
Let's say we have an AI that detects that there is a button. Now the AI needs to know how to click the button. But the same could be true of a text block, where you also need to click. And that's a problem.
The elements are detected visually, without any interaction context. To create value from element detection, the AI would have to identify the possible interaction visually, not just the element's appearance.
Let's go back to the two examples from above. The context that it is a link and not text, or that the bill is a button and not an image, can't be provided by the AI, only by a human being. The sketch below makes this gap concrete.
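Here's a minimal Python sketch, with hypothetical coordinates and class names, of what a purely visual detector can return. Note that nothing in the output carries interaction semantics:

```python
from dataclasses import dataclass

@dataclass
class DetectedElement:
    """What a visual detector can realistically return: a bounding box
    and a guessed visual class with a confidence score."""
    x: int
    y: int
    width: int
    height: int
    visual_class: str   # e.g. "text", "image", "button"
    confidence: float

# Two detections matching the examples above (hypothetical values).
windows_phone_snippet = DetectedElement(40, 120, 200, 30, "text", 0.91)
cashier_bill = DetectedElement(300, 80, 120, 90, "image", 0.88)

# Both guesses are visually plausible, yet useless for automation:
# the "text" is actually a link and the "image" is actually a button.
# Nothing in the pixels tells the model which interaction is possible.
```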
Or let's discuss Tesla, another famous example of why it doesn't work. How long did it take you to learn to drive a car? Where I live, it's 12 to 35 hours. Tesla has been working on self-driving vehicles for approximately 14 years, their cars have already driven billions of kilometers, and they still don't drive autonomously. Even their partial self-driving is only authorized in the US.
Even Mercedes, with their Level 3 self-driving, can only drive autonomously up to 60 km/h, on good-weather days, on straight roads, and under further restrictions.
They put so much time, effort, and training into it, and they still can't drive autonomously.
And that's just one domain. Software applications come from many different domains, which makes them even more complex.
With a car, it is just about pointing four wheels in the right direction at the right speed.
Identifying software elements with an AI and knowing how to use them is far more complex.
Even though element detection by AI doesn't work and never will, augmented intelligence is another intelligent way to save you time. You combine intelligent algorithms with a human: the elements are detected autonomously, but not their type.
The AI detects the elements, and you tell the AI how to interact with them.
Our Visual Cues and Visual Sens algorithms detect the elements on the screen that might be relevant for the user. Then you select the elements you'd like to interact with and tell the AI whether it is a dropdown, a button, a checkbox, or whatever it is. And with that, it understands how to interact with the elements without further ado.
This way, you're much faster and have a generic way to use AI. And you don't have any of the problems mentioned above.
That is the best approach for this use case right now.
It doesn't matter in which direction the dropdown opens or whether it's scrollable. As the AI knows it is a dropdown, it can handle it, no matter how it works. A minimal sketch of this workflow follows below.
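To illustrate the idea, here is a minimal Python sketch of that augmented workflow. The names and interaction stubs are hypothetical (the actual Visual Cues and Visual Sens internals aren't shown here); the point is the division of labor: the algorithm finds the boxes, the human assigns the type, and one generic handler per type does the rest:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ElementType(Enum):
    BUTTON = "button"
    LINK = "link"
    DROPDOWN = "dropdown"
    CHECKBOX = "checkbox"

@dataclass
class Element:
    x: int
    y: int
    width: int
    height: int
    # Unknown after detection; assigned by a human, not guessed by the model.
    element_type: Optional[ElementType] = None

def click_center(element: Element) -> None:
    # Stand-in for a real driver call (e.g. a desktop or web automation click).
    cx = element.x + element.width // 2
    cy = element.y + element.height // 2
    print(f"click at ({cx}, {cy})")

def interact(element: Element, value: Optional[str] = None) -> None:
    """Dispatch the interaction based on the human-assigned type."""
    if element.element_type is None:
        raise ValueError("A human must label the element type first")
    if element.element_type is ElementType.DROPDOWN:
        # One generic handler covers every dropdown: open it, scroll in
        # whichever direction it expands, and pick the requested value.
        click_center(element)  # open the list
        print(f"scroll and select '{value}'")
    else:
        # Buttons, links, and checkboxes all resolve to a click.
        click_center(element)

# Usage: the detector found a box that looks like an image; the human
# labels it as a button, and the interaction now works.
bill = Element(300, 80, 120, 90)
bill.element_type = ElementType.BUTTON
interact(bill)
```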