Conf42 Machine Learning 2022 - Online

How Visual AI Makes Testing a Breeze


Abstract

Apps break. It's what they do. Everyone has seen skewed layouts, missing buttons, and overlapping text. Those visual bugs are such a pain because traditional test automation usually can't catch them, yet they carry serious reputational risk because appearances matter. The problems scripts can catch usually require complicated element locators and assertions, too. The best way to catch visual problems is to look at them with human eyes. People are good at quickly noticing things that don't look right.

If we can train an AI model to look for important visual differences between app snapshots, then we can automate visual testing! In this talk, I'll show how to apply AI-backed visual comparisons to end-to-end test automation. We'll transform traditional tests into much simpler scenarios that save time in both development and execution. You'll see how to make visual comparisons between baselines and updated snapshots. A picture is truly worth a thousand assertions. Ultimately, visual testing like this enables you to spend more time on proper test coverage and less time on automation implementation!

Summary

  • Andrew Knight is a developer advocate at Applitools and the director of Test Automation University, which offers several free online courses to help you learn testing and automation. Check out his blog and follow him on Twitter.
  • Visual AI can quickly flag visual differences that both scripts and manual testers might miss. Testing is interaction plus verification. Visual testing is easier than traditional test automation. It's something teams should do first before attempting to automate longer, more complicated tests with traditional techniques.
  • We are going to automate a web test together in Java with Selenium WebDriver. Then we will supercharge it with visual testing techniques using Applitools. First we need a web app to test.
  • Applitools uses visual AI to detect meaningful changes that humans would see. Pixel-to-pixel comparisons are inherently fragile; visual AI makes visual testing robust and practical. Traditional assertions can be completely replaced by one-line snapshot calls.
  • The VisualTest JUnit class has all the visual testing code from the demo. It captures a new snapshot, treats it as a checkpoint, and compares it against the baseline using visual AI to see if there are any differences. With the Applitools Ultrafast Grid, you can test against multiple browsers.
  • Pandy: Teams should do visual testing from the start. Visual testing simplifies implementation and execution while catching more problems. It offers the advantage of making functional testing easier. You don't need to be an expert in AI or ML to use it.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Thanks for attending my talk today. My name is Andrew Knight, but you can call me Pandy for short. I'm the Automation Panda. Check out my blog and follow me on Twitter. Just Google "Automation Panda" and I'm sure you'll find me. Currently I'm a developer advocate at Applitools, where I help folks get the most value out of their QA work with automated visual testing. I'm also the director of Test Automation University, which offers several online courses to help you learn testing and automation. And it's completely free. Please check it out. It's awesome. Now, you might be wondering what an automation panda is doing at a machine learning conference. Well, I want to share one way that AI is making a huge difference in the software testing and automation space, and that's visual AI. Traditionally, automated tests focused on things like text and attributes to make sure apps were correct. Unfortunately, that can miss lots of problems. Visual AI can quickly flag visual differences that both scripts and manual testers might miss. I'm guessing most of you in the audience have done some form of testing before, even if it's as simple as making a small code change and rerunning your program. That's awesome. There are all kinds of testing: unit tests, integration tests, end-to-end tests, web UI, REST API, mobile, load testing, performance testing, property-based testing, behavior-driven, data-driven. You name it, there's a test for it. But what is testing, in a nutshell? In simplest terms, testing is interaction plus verification. That's it. You do something and you make sure it works. Every kind of testing reduces to this formula. Manual testing accomplishes both interactions and verifications through direct human interaction. Somebody needs to bang on a keyboard to drive a test. Automation drives interactions and verifications with a script. We'd like to think that automation is so great because it doesn't need any human intervention, but we all know that's not true. Humans still need to write the tests, develop the scripts, and fix them when they break. Paradoxically, test automation isn't fully autonomous. Visual testing helps to change that. Although humans still need to figure out interactions, visual testing techniques make verifications autonomous. Tests take snapshots of their views and look for changes. Over time, they catch more kinds of problems while ironically simplifying test code. Unfortunately, many folks seem to have the impression that visual testing is an advanced technique that requires a high level of testing maturity to be valuable. That's not the case at all. In fact, I want to flip that script entirely. Visual testing is easier than traditional test automation. This isn't some bleeding-edge technology useful only to FAANG companies. It isn't out of reach for teams just starting their test automation journey. Visual testing makes functional testing easier and stronger. AI simplifies test development. It's something teams should do first, before attempting to automate longer, more complicated tests with traditional techniques. Big claims, right? Let's see what I mean. We are going to automate a web test together in Java with Selenium WebDriver using traditional interactions and verifications, and then we will supercharge it with visual testing techniques using Applitools. First we need a web app to test. We could test an app of any size, but I'm going to choose a small one for the sake of our demo. This is the Applitools demo site. It mimics a banking application.
You can try it yourself at demo.applitools.com. The login page has a logo, username and password fields, and a sign-in button. Since this is a demo site, you can enter any username and password to log in. After clicking the sign-in button, the main page loads. There's a lot of stuff on the main page. The top bar has the name of the app, a search field, and icons for your account. The main part of the page shows financial data. The left sidebar shows different account types. We could write a basic login test for this app in four steps: load the login page, verify that the login page loads correctly, log into the app, and finally verify that the main page loads correctly. This could be a smoke test. There's nothing fancy here. The trickiest part for automation would be deciding which elements to check on the loaded pages. We could automate this test in Java using Selenium WebDriver. Technically, we could automate it using any popular language and tool. Personally, right now I really like Playwright and Python, but based on different reports I've seen, Java is still one of the most popular languages for test automation, and Selenium WebDriver remains the most popular browser automation tool. JavaScript, C#, Python, and Ruby are other popular languages for test automation, and Cypress is a very popular alternative to Selenium. In Java, we can write a JUnit test class named LoginTest and create a test case method named login. This test case method calls four helper methods, one for each step. The first method, loadLoginPage, loads the demo app's login page in the browser. The second method, verifyLoginPage, verifies the appearance of five critical elements on the page: the logo, username field, password field, sign-in button, and the "remember me" checkbox. It waits for each of these elements to appear using a helper method named waitForAppearance. The third method, performLogin, enters a username and password and then clicks the sign-in button. So far, so good. Nothing too bad. These are all typical WebDriver calls. The fourth method, verifyMainPage, is a doozy. Remember all the things on that page? Well, they'll need several assertions to verify. Some assertions merely check the appearance of elements. Others need to perform text matching. For example, to check the banner at the top that says "your nearest branch closes in X minutes", we need to find the element, get its text, and then perform a regular expression match. Checking the account types and status fields requires getting lists of elements, mapping their text values, and transforming the resulting data for comparisons. Despite this heavy lifting, this method still doesn't check everything on the main page. Dates, amounts, and descriptions are all ignored as a risk-based tradeoff. We could add more assertions, but they would lengthen this method even more. They could also be difficult to write and become brittle over time. Tests just can't cover everything. If we run this login test against our app, it should pass without a problem. But what if the page changes? Here's a different version of the same page, with some slight visual differences. Can you see what they are? Let me go back and forth a few times for you to see. Will our login test still work? Will it pass or fail? Should it pass or fail? Looking at these two pages side by side makes comparison easier. The logos are different, and the sign-in buttons are different.
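To make that concrete, here is a minimal sketch of what the traditional LoginTest described above might look like with JUnit 5 and Selenium 4. The locators, credentials, and banner regex are illustrative assumptions about the demo site's markup, not the exact code shown in the talk:

```java
import java.time.Duration;
import org.junit.jupiter.api.*;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class LoginTest {
    private WebDriver driver;

    @BeforeEach
    public void startBrowser() {
        driver = new ChromeDriver();
    }

    @Test
    public void login() {
        loadLoginPage();
        verifyLoginPage();
        performLogin();
        verifyMainPage();
    }

    private void loadLoginPage() {
        driver.get("https://demo.applitools.com");
    }

    // Helper that waits up to 10 seconds for an element to become visible.
    private void waitForAppearance(By locator) {
        new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.visibilityOfElementLocated(locator));
    }

    private void verifyLoginPage() {
        // Five separate checks just to say "the login page looks loaded".
        waitForAppearance(By.cssSelector("div.logo-w"));              // logo
        waitForAppearance(By.id("username"));                         // username field
        waitForAppearance(By.id("password"));                         // password field
        waitForAppearance(By.id("log-in"));                           // sign-in button
        waitForAppearance(By.cssSelector("input.form-check-input"));  // remember-me box
    }

    private void performLogin() {
        // Any credentials work on the demo site.
        driver.findElement(By.id("username")).sendKeys("andy");
        driver.findElement(By.id("password")).sendKeys("i<3pandas");
        driver.findElement(By.id("log-in")).click();
    }

    private void verifyMainPage() {
        // One of many assertions: regex-match the "nearest branch" banner text.
        waitForAppearance(By.id("time"));
        String banner = driver.findElement(By.id("time")).getText();
        Assertions.assertTrue(banner.matches("Your nearest branch closes in:( \\d+[hms])+"));
        // ...plus many more assertions for the top bar, sidebar, and tables,
        // while dates, amounts, and descriptions still go unchecked.
    }

    @AfterEach
    public void quitBrowser() {
        driver.quit();
    }
}
```

Notice that every one of those checks passes or fails purely on page structure, which matters for what comes next.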
While I'd probably ask the developers about the sign-in button change, I'd categorically consider the logo change a bug. Unfortunately, as long as the page structure doesn't change, our login test will still pass. It wouldn't detect these changes. We probably wouldn't find out about these changes if we relied exclusively on traditional test automation. The step to verify that the login page loaded correctly only checks for the appearance of five elements by locators. These assertions will pass as long as these locators find elements somewhere on the page, regardless of where or how they appear or what they look like. Technically, this login page would still pass the test, even though we can clearly see it's broken. Traditional functional testing hinges on the most basic functionality of web pages: if it clicks, it works. It completely misses visuals. Those are huge test gaps. Adding more assertions probably won't catch these kinds of problems either. So what if we could visually inspect this page? That would easily catch any changes on the page. We take a baseline snapshot that we consider good, and every time we run our tests, we take a new checkpoint snapshot. Then we can compare the two side by side to detect any differences or any changes. This is what we call visual testing. If a picture is worth a thousand words, then a snapshot is worth a thousand assertions. Automated visual testing is what tools like Applitools do. One visual snapshot captures everything on the page. As a tester, you don't need to explicitly state what to check. A snapshot implicitly covers layout, color, size, shape, and styling. That's a huge advantage over traditional functional test automation. To be honest, testers have been doing visual testing since computers first had screens. Anyone can manually bang on a keyboard and look at the screen to see what changes. That's arguably the first kind of testing that anyone does. It's super valuable to take a quick glance at a page to see what's wrong. Humans can intuitively judge if a page is good or bad in a few seconds. Unfortunately, human reviews don't scale well. Modern apps have several screens worth checking, and continuous integration systems deploy changes multiple times a day. Humans make mistakes, they get tired, they miss things. They also have limited time. This reminds me of the legend of John Henry, a folk hero from the United States. As the legend goes, John Henry was a railroad worker on the Great Bend Tunnel along the C&O Railway in West Virginia. When the company bought a steam drill, John Henry competed against it head to head, with a ten-pound hammer in each hand, to see which could drill faster. John Henry drilled deeper than the steam engine could, technically winning the contest, but he died from exhaustion afterwards. The legend of John Henry serves as a parable that even the strongest, sharpest human is inevitably no match for a machine. To be relevant in a modern software shop, visual testing must be automated. But that's easier said than done. Programming a tool to capture snapshots and perform pixel-to-pixel comparisons isn't too difficult. But determining if those changes matter is. A good visual testing tool should ignore changes that don't matter, like small padding differences, and focus on the changes that do matter. Otherwise, human testers will need to review every single result, nullifying any benefit of automating visual tests. Take a look at these two pictures.
They show a cute underwater scene with an octopus in a garden. There are a total of ten differences between these two pictures. Can you find them? I'll give you a few seconds to look. Unfortunately, a pure pixel-to-pixel comparison doesn't find any of those changes. I ran those two pictures through Applitools using an exact pixel-to-pixel comparison, and this is what happened. Except for the white space on the sides, every pixel was flagged as different. As humans, we can clearly see that these images are very similar, but because they were a few pixels off on the side, automation failed to pinpoint meaningful differences. This is where AI really helps. Applitools uses visual AI to detect meaningful changes that humans would see and ignore inconsequential differences that just make noise. Here I used Applitools' strict comparison, which pinpointed each of the ten differences. Take a look. Did you find all ten yourself? Do you see any that you missed? I'll pause a moment for you to look. That's the second advantage of good automated visual testing. Visual AI, like what Applitools does, focuses on meaningful changes to avoid noise. Visual test results shouldn't waste testers' time over small pixel shifts or things a human wouldn't even notice. They should highlight what matters, like missing elements, different colors, or skewed layouts. Visual AI is a differentiator for visual testing tools. Pixel-to-pixel comparisons are inherently fragile. Visual AI makes visual testing robust and practical. So let's update our login test to do visual testing with Applitools. First, we need to create an Applitools account. Anyone can create one for free at the link I'm showing here. You can use your GitHub account or an email, and you don't even need a credit card. The account will come with an API key that must be set as an environment variable for testing. Next, we need to add the Applitools Eyes SDK to our project. Since we are using Selenium WebDriver with Java, we need to add the Applitools eyes-selenium-java3 Maven dependency to our POM file. Then we need to set up the Applitools configuration and runner for all tests in a suite. The VisualGridRunner will upload snapshots to the Applitools Ultrafast Grid, and the Configuration object sets the API key, the test batch name, and the browser configurations we want to test. This snippet shows configuration for Chrome, but we could test any other browser, like Safari, Firefox, or Edge, for genuine cross-browser testing. The setup for each individual test still needs a WebDriver object, but it also needs an Eyes object for capturing snapshots. Here we construct the Eyes object, hook it up to the runner, and set its configuration. Then we open our eyes to start taking snapshots. Opening requires the WebDriver object, the app name, and the test name. The interaction methods can remain unchanged, but we need to update the verification methods. Traditional assertions can be completely replaced by one-line snapshot calls. Take a look at verifyLoginPage: five lines reduced to one, and the visual snapshot technically has far greater coverage. The impact on verifyMainPage is far greater. One visual snapshot eliminates the need for several lines of assertions. This is the third major advantage visual testing has over traditional functional testing. Visual snapshots greatly simplify assertions.
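Pieced together, the visual version of the test might look like the sketch below, using the eyes-selenium-java3 SDK (Maven artifact com.applitools:eyes-selenium-java3). The app name, batch label, and credentials are placeholders, and exact package paths can vary between SDK versions:

```java
import com.applitools.eyes.BatchInfo;
import com.applitools.eyes.RectangleSize;
import com.applitools.eyes.selenium.BrowserType;
import com.applitools.eyes.selenium.Configuration;
import com.applitools.eyes.selenium.Eyes;
import com.applitools.eyes.selenium.fluent.Target;
import com.applitools.eyes.visualgrid.services.RunnerOptions;
import com.applitools.eyes.visualgrid.services.VisualGridRunner;
import org.junit.jupiter.api.*;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class VisualLoginTest {
    private static VisualGridRunner runner;
    private static Configuration config;
    private WebDriver driver;
    private Eyes eyes;

    @BeforeAll
    public static void setUpSuite() {
        // The runner uploads snapshots to the Ultrafast Grid;
        // up to 5 checkpoints may render concurrently.
        runner = new VisualGridRunner(new RunnerOptions().testConcurrency(5));
        config = new Configuration();
        config.setApiKey(System.getenv("APPLITOOLS_API_KEY"));
        config.setBatch(new BatchInfo("Visual Login Tests"));
        config.addBrowser(800, 600, BrowserType.CHROME);  // one browser for now
    }

    @BeforeEach
    public void setUpTest() {
        driver = new ChromeDriver();
        eyes = new Eyes(runner);          // hook the Eyes object up to the runner
        eyes.setConfiguration(config);
        eyes.open(driver, "ACME Bank", "Login Test", new RectangleSize(800, 600));
    }

    @Test
    public void login() {
        loadLoginPage();  // interactions stay the same...
        eyes.check(Target.window().fully().withName("Login page"));  // ...verifications become one-liners
        performLogin();
        eyes.check(Target.window().fully().withName("Main page"));
    }

    private void loadLoginPage() {
        driver.get("https://demo.applitools.com");
    }

    private void performLogin() {
        driver.findElement(By.id("username")).sendKeys("andy");
        driver.findElement(By.id("password")).sendKeys("i<3pandas");
        driver.findElement(By.id("log-in")).click();
    }

    @AfterEach
    public void tearDownTest() {
        eyes.closeAsync();   // send results without blocking the test
        driver.quit();
    }

    @AfterAll
    public static void tearDownSuite() {
        runner.getAllTestResults();  // wait for all grid renders to finish
    }
}
```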
Instead of spending hours deciding what to check, figuring out locators, and writing transformation logic, you can make one concise snapshot call and be done. As an engineer myself, I cannot overstate the cognitive load this removes from the automation coding process. I said it before and I'll say it again: if a picture is worth a thousand words, then a snapshot is worth a thousand assertions. So let's see visual testing in action. It's time to dive into code and run it. All right, so here is that Java project I mentioned. I have it opened in IntelliJ IDEA, and the VisualTest JUnit class has all the visual testing code that I just showed you. Just to show you again, we have our setup before the entire test suite that creates our runner and our configuration. We have it configured to run against one browser, Chrome, with an 800 by 600 viewport. Then we're using ChromeDriver, we're setting up our Eyes object with our configuration, and opening the eyes. The test case has four steps: loading the login page, verifying it, performing login, then verifying the main page. At the end we quit, and then we have the code for our interactions and verifications. So let's run this test to establish baseline snapshots. I already have it configured, so I'm just going to hit the run button, and now IntelliJ IDEA will run our test. It should take only about a minute. So right now it's running locally. Now it's going to upload the snapshot to the Applitools Ultrafast Grid. And if I switch over to the dashboard, right now it's empty. If I refresh, we can see that the test batch is now there. It has the name of the batch we gave it, and after, let's see, after 17 seconds, it captured both snapshots and marked them as new, green for passing. So we can see those are our new baselines. We have a baseline for the login page and a baseline for the main page. So now let's run that test again. On the second run, since it has baselines, the visual testing comparison will capture a new snapshot, treat it as a checkpoint, and compare it against the baseline using visual AI to see if there are any differences. So let's run again. Again, it'll take just a moment here. Running locally, now running in the dashboard. If we refresh the dashboard, we can see it's there, it's running, and it passed. So if we look at the checkpoints, we can see the baseline versus the checkpoint image, and they're the same. So nothing is highlighted as different. Same goes for the main page. Everything's the same, everything checks out. But now what happens if we introduce a visual difference like we saw before? So I'm going to run the test again, but this time I'm going to use that alternative login page, and we'll see what happens. So we'll give it a minute. I just launched it again. It's running locally, and now it should be uploading to the grid. So if I refresh, we can see there's the new batch and it's running. Give it just a few seconds. And this time we have a problem. The results show that the main page didn't have any significant changes, so it's still green, but the login page certainly did, and so it's marked yellow as unresolved. What that unresolved status means is that Applitools has detected a visual difference, and it's up to you as the tester to determine if that was a good change or a bad change. So when we pull up the side by side, we can see all the things that were highlighted as different.
We see the icon is different, we see the sign-in versus login button is different, and we see that the "remember me" check mark has shifted. So I'm going to say that this is a bad difference, and I'm going to give it a thumbs down. What that does is mark this particular snapshot as failed. So we save our changes. And what's really cool now is that if we were to run this test again, which I'll do here, I'll just click run and we'll wait for it. What should happen is that since we've already marked that kind of visual change as a failure, anytime that visual checkpoint appears again, Applitools should automatically mark it as a failure rather than us needing to come in here and repeatedly fail it. So we run again, and there it goes. And automatically, look at that: because it was the same type of failure, Applitools was smart enough to know. Boom, failed. So there's another really cool thing we can do with these visual snapshots. Let me jump back into the code and look at the browser configurations. Previously I only had one browser configured, but with Applitools, if you test in the Ultrafast Grid, you can actually test against multiple browsers. How is this possible? It's the magic of a snapshot. I've been careful to say snapshot and not screenshot. A snapshot captures everything on the page: the DOM, the HTML, the CSS, and the JavaScript. A screenshot is nothing more than a collection of pixels, which is static and doesn't change. But if you have a snapshot with all the stuff on the page, what you can do is rerender it on any browser configuration that you like. So even though you captured the page on, let's say, Chrome, you could rerender that snapshot on Safari, on Firefox, on IE, or maybe even on mobile browsers, which is really, really cool. So let's do that. We'll get rid of our one browser configuration, and I'll bring these back to the forefront. So here now we have five different desktop browser configurations, Chrome, Firefox, IE, Edge, and Safari, as well as five mobile emulated devices: iPhone X, Pixel, Galaxy, Nexus, and even an iPad. Each one has a different viewport size as well as a different orientation. What's really cool about this is you can test browsers and devices that aren't on your local machine. For example, I'm running on a Mac, so I'm not going to have IE 11 here, but I could run these tests visually against IE 11 in the Applitools test cloud. Really awesome stuff. So let's run this again just to see what happens, going back to the original working website. So we'll kick it off, and then let's flip over to the dashboard again and wait for results. When we do this, even though we specified ten browsers, it will still only run one browser on our local machine, which we decided was Chrome, and it'll take that snapshot. And here we go. You can see how it's testing all those checkpoints against the different browser configurations, Chrome, IE, Firefox, Edge, Safari, and it's doing it concurrently. In my code I have set a concurrency level of five, meaning five checkpoints can run in parallel. And already we're seeing the first one marked as passed, because that was the previous configuration that we had, but all the others should be marked as new because they would be new baseline images. And there we go. Everything passed. This is a really, really cool way to not only do visual testing, but also to achieve cross-browser testing.
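In code, going from one browser to ten is just a matter of adding entries to the shared Configuration object. Here's a sketch along those lines; the viewport sizes and device names are approximations of what was shown in the demo, not an exact copy:

```java
import com.applitools.eyes.selenium.BrowserType;
import com.applitools.eyes.selenium.Configuration;
import com.applitools.eyes.visualgrid.model.DeviceName;
import com.applitools.eyes.visualgrid.model.ScreenOrientation;

public class BrowserConfigs {
    public static Configuration addAllBrowsers(Configuration config) {
        // Five desktop browsers, each with its own viewport size.
        config.addBrowser(800, 600, BrowserType.CHROME);
        config.addBrowser(1600, 1200, BrowserType.FIREFOX);
        config.addBrowser(1024, 768, BrowserType.SAFARI);
        config.addBrowser(800, 600, BrowserType.IE_11);
        config.addBrowser(1600, 1200, BrowserType.EDGE_CHROMIUM);

        // Five emulated mobile devices, each with its own orientation.
        config.addDeviceEmulation(DeviceName.iPhone_X, ScreenOrientation.PORTRAIT);
        config.addDeviceEmulation(DeviceName.Pixel_2, ScreenOrientation.PORTRAIT);
        config.addDeviceEmulation(DeviceName.Galaxy_S5, ScreenOrientation.LANDSCAPE);
        config.addDeviceEmulation(DeviceName.Nexus_10, ScreenOrientation.LANDSCAPE);
        config.addDeviceEmulation(DeviceName.iPad_Pro, ScreenOrientation.LANDSCAPE);
        return config;
    }
}
```

The local run still drives only Chrome; the other nine configurations are rendered from the captured snapshot in the Ultrafast Grid.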
You run the test once on your machine or in a CI server, and then you leave the rest up to the Applitools test cloud to render it against all the different browser configurations. This is much, much faster than traditional cross-browser testing, where you would need to run the entire test from start to finish once for each of the different browser configs. If we look here, these ten tests finished in only 46 seconds. Typically, a web UI test takes about half a minute to a minute to run, so for ten tests you're talking several minutes. Here we were sub-minute: 46 seconds. Really, really cool stuff. Lightning-fast cross-browser testing is visual testing's fourth big advantage. To do cross-browser testing with traditional functional tests, each test must run on each browser configuration all the way through. With visual snapshots, each test runs only one time, and snapshots are rerendered on each target configuration, making tests faster and more reliable. Before I conclude this talk, there is one more thing I want y'all to consider: when a team should adopt visual testing. I can't tell you how many times folks have told me, "Andy, that visual testing thing looks so cool and so helpful, but I don't think my team will ever get there. We're just getting started, and we're new to automation, and automation is so hard, and I don't think we'll ever be mature enough to use a tool like Applitools." I just smack myself in the face, because visual testing makes automation easier. I really think teams should do visual testing from the start. Consider this strategy: start by automating a smoke test that navigates to different pages of an app and captures snapshots of each. The interaction code would be straightforward, and the snapshots are just one-liners. That would provide an immense amount of value for relatively little automation work. It's the 80/20 rule: 80% of the value for 20% of the work. Then later, when a team has more time or more maturity, they can expand the automation project with larger tests that use both traditional and visual assertions. Let the power of AI help you. Test automation is hard no matter what tool or language you use. Teams struggle to automate tests in time and to keep them running. Visual testing simplifies implementation and execution while catching more problems. It offers the advantage of making functional testing easier. It's not a technique only for those on the bleeding edge. It's here today, and it's accessible to anyone doing test automation. You don't need to be an expert in AI or ML to use it. Visual testing is a winning strategy. It has several advantages over traditional functional testing. Please note, however, that visual testing does not replace functional testing. Instead, it supercharges it. If you want to give it a try, you can sign up for a free account with Applitools, clone the example project I showed today, or any of our others, and run tests on your local machine. Thank you all for attending my talk today on visual testing. Again, my name is Pandy, and I'm the Automation Panda, developer advocate at Applitools and director of Test Automation University. Be sure to read my blog and follow me on Twitter. I always love to chat about testing and automation. Thanks again, and enjoy the rest of Conf42.
...

Andrew Knight

Developer Advocate @ Applitools



