Continuous Integration Strategies

Automate Your Automation

Hey awesome! You have some automation at your company, and your tests are being run for you.

This should free up some time, right?

Eh… maybe not.

As DevOps takes hold on many software teams, the need for “automating the automation” becomes apparent.

We need a way to start the automation for us, under certain conditions, and report whether the tests passed or failed, allowing us to figure out what the next step is.

This process is called Continuous Integration (CI), or Continuous Delivery (CD).

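People use the terms interchangeably, so I do too. (Strictly speaking, CI is about building and testing every change, and CD is about keeping the result ready to release.) It’s continuous, that’s the important part.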

Scope

CI/CD is a pretty big topic. There are lots of options out there, depending on what you use for version control, what operating system you run, what kind of product you support, and how you deploy it.

There are lots of resources online concerning these pieces. I’m just going to be talking about the part of it that concerns testing.

Why Not Just Run Automation Manually?

I know it might seem trivial to run tests, even if you have it boiled down to a script where all you have to do is type “run”. But it’s still a risk to run automation manually.

Why? Because we tend to put off doing simple stuff. If it’s really easy, other things can take precedence. And then it doesn’t get done.

It’s also a context switch. If you have to look up from deep testing to kick off a set of tests, then come back, you lose focus.

“What was I doing? Was it.. man. MAN. What was I doing??”

That is millions of dollars right there, I tell you what.

How Often Should You Run It?

Automation is almost useless unless you’re running it regularly.

CI is for giving fast feedback on the health of your product. Running your tests often will give a lot of primary and secondary benefits. Here are a few:

It Pesters People Into Responding

When a test fails, that failing test is a beacon telling someone that something needs to be fixed. Whether it’s the code or the test itself that’s broken, the quick feedback you get from running automation at regular intervals means you get a fix faster. The longer something’s broken, the harder it is to fix.

It Hardens Your Test Suite

Nobody’s perfect at writing tests. Some tests will fail due to problems with the test logic. Some will fail intermittently.

Maybe there’s a pattern to the failures. Without running the tests regularly, that pattern is hard to see.

It Prevents Bit Rot

The worst thing about picking up a project after it’s sat there for months is when you expect it to work (because it worked before and hasn’t been touched for months), and all the wheels fall off.

The technical term for this is “bit rot”. I mean, the tests aren’t rotting, but enough stuff has changed that now the tests don’t apply. It’s the tests trying to hit a target that has moved.

Running tests often will prevent this from happening–if tests start failing due to changed requirements then you can fix them when the change happens, not months (or years!) later when nobody remembers what the tests even do.

Strategies

For this post, let’s assume you have a large number of tests. Maybe 1000.

The technical term for this is “buttload”. This is a buttload of tests. It’s a lot. 

So how do we strategically run this buttload of tests, while still adhering to the “fail fast” methodology? Some strategies are outlined below.

Again, there are many tools out there, and this would be a huge post if I covered all of them. Although most of these have a Jenkins flavor, I’m sure some or all of these would apply elsewhere.

Categorize Your Tests

Most any test framework (JUnit, Cucumber, etc.) has a feature to let you group similar tests together under a custom name.

Instead of manually trying to pick out which tests to run, just tag them as a certain category, and run that category’s worth of tests.

You can add new tests to those categories, and the method of executing those tests wouldn’t need to change. It’s less to think about. Some good examples are smoke, integration and regression tests.

Why do this? It makes it easier to pick tests to run. It also makes for easier debugging.

For example, if you have 100 tests and run all of them, and they all fail, it would take time to look at each one. It’s highly likely that all the tests failed for the same reason, though (and let’s be honest: after the first, what, 20? you’d probably assume the same thing).

However, if you have some tests that you run first–like smoke tests–where that bug could be found first, that would save a ton of time.

If your tests fail at the Login section, then having a Login smoke test would tell you about that exact problem earlier. A good question to ask when deciding what goes in smoke: if this test fails, should anything else run?
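Here’s a minimal sketch of the tagging idea in plain Python. The `category` decorator, the `REGISTRY` dictionary, and the test names are all made up for illustration; a real framework like JUnit or Cucumber gives you its own way to tag tests, so treat this as the shape of the idea rather than something to copy.

```python
from collections import defaultdict

# Hypothetical registry: category name -> list of test functions.
REGISTRY = defaultdict(list)

def category(*names):
    """Tag a test function with one or more category names."""
    def decorator(func):
        for name in names:
            REGISTRY[name].append(func)
        return func
    return decorator

@category("smoke")
def test_login_page_loads():
    assert True  # placeholder for a real check

@category("regression")
def test_password_reset_email():
    assert True  # placeholder for a real check

@category("smoke", "regression")
def test_login_with_valid_user():
    assert True  # placeholder for a real check

def run_category(name):
    """Run every test tagged with `name`; return {test name: passed?}."""
    results = {}
    for test in REGISTRY[name]:
        try:
            test()
            results[test.__name__] = True
        except AssertionError:
            results[test.__name__] = False
    return results

smoke_results = run_category("smoke")
```

Adding a new smoke test is just a matter of tagging it. The call to `run_category("smoke")` never has to change.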

Chain Jobs

If you’ve categorized your tests, you can now put those categories in a particular order.

For example, people generally agree that smoke tests should be run first. A continuous integration tool can then run your more complex tests only after your smoke tests pass. If the smoke tests fail, the process stops.

This is called “chaining jobs”, and it’s a great practice to help with the “fail fast” methodology.

If something in the smoke tests would cause a failure, then it saves time in two ways:

  • That same failure would likely show up in a lot of other tests, so now you don’t have to weed through those tests to determine the same problem happened for all of them, and
  • If any of these tests fail, you don’t have to run anything else. With the proper reporting structure too, you’d find out quicker than having to wait for the entire buttload of tests to finish. If you can get the same answer in 10 or 50 tests, that’s much better.

You can also chain your tests to a precondition such as deploying a build, or compiling a package, or even running unit tests.
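To make the chaining idea concrete, here’s a small Python sketch. In real life your CI tool does this for you; the `run_chain` function and the stage names are hypothetical, and each stage is faked with a function that just returns pass or fail.

```python
def run_chain(stages):
    """Run stages in order and stop at the first failure.

    `stages` is a list of (name, job) pairs, where `job` is a callable
    that returns True on success. Stages after a failure are skipped.
    """
    report = []
    failed = False
    for name, job in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = job()
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report

# Hypothetical pipeline: the smoke stage fails, so nothing after it runs.
stages = [
    ("deploy build", lambda: True),
    ("smoke", lambda: False),
    ("integration", lambda: True),
    ("regression", lambda: True),
]
report = run_chain(stages)
for name, status in report:
    print(f"{name}: {status}")
```

The integration and regression stages never start, which is the whole point: you get the answer from the smallest set of tests that can give it.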

Visualize Results

It’s much easier to make sense of things when we have a graphical representation of test results than when we’re looking at a bunch of raw output.

I know, I know, looking at wads of data makes you feel like a hacker, and you get that dopamine hit from doing Nerd Stuff. But really, you want to be able to present what’s going on to people like BAs, other QAs, PMs or whoever else.

If the CEO can rock up to your location, and see how healthy the product is, that’s awesome.

Huge TVs are getting cheaper every day and they are great for showing build health. They’re also great for helping enforce accountability.

For Jenkins anyway (and apologies as I don’t know much about the other CI tools out there yet), there are many plugins available to take raw output and present it in a pleasant, human-consumable format.
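Even without a plugin, a little summarizing goes a long way. The sketch below is a toy in Python, not a stand-in for a real dashboard: the `health_board` function and the numbers fed into it are invented for illustration. It turns pass counts per category into one line each that a non-tester can read at a glance.

```python
def health_board(results):
    """Turn {category: (passed, total)} into one summary line per category."""
    lines = []
    for category, (passed, total) in results.items():
        # Scale the pass rate to a 10-character bar.
        filled = round(10 * passed / total) if total else 0
        bar = "#" * filled + "." * (10 - filled)
        status = "GREEN" if passed == total else "RED"
        lines.append(f"{category:<12} [{bar}] {passed}/{total} {status}")
    return lines

# Made-up numbers, purely for illustration.
board = health_board({"smoke": (12, 12), "regression": (18, 20)})
print("\n".join(board))
```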

Parallelize Your Tests

Why? Faster execution. Even if you categorize your tests, there may be chunks of tests that would still take a while to get through. And this is where you’re going to see huge benefits in CI.

I think you’d be hard pressed to find a machine/tool that DOES NOT know how to handle multiple threads. All the cool kids are doing it these days.

Is there a way for you to break your tests up into discrete pieces? Well, I probably don’t need to ask. I bet there is a way. Maybe run by feature, or run by function or something. But once you have them chopped up, try running those smaller chunks in parallel. The tests will finish even faster.
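Here’s a rough Python sketch of running chunks side by side, using the standard library’s `concurrent.futures.ThreadPoolExecutor`. The chunk names and test names are hypothetical, and each “test” is just a short sleep. One assumption worth saying out loud: this only works if the chunks don’t depend on each other or fight over the same data.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunks: feature name -> the tests in that chunk.
CHUNKS = {
    "login": ["test_valid_user", "test_locked_out"],
    "search": ["test_partial_match", "test_no_results", "test_paging"],
    "checkout": ["test_add_to_cart"],
}

def run_chunk(item):
    """Pretend to run one chunk; return (feature, number of tests run)."""
    feature, tests = item
    for _ in tests:
        time.sleep(0.01)  # stand-in for the time a real test takes
    return feature, len(tests)

def run_in_parallel(chunks, workers=3):
    """Run chunks on up to `workers` threads at once and collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_chunk, chunks.items()))

results = run_in_parallel(CHUNKS)
print(results)
```

With one worker per chunk, the whole run takes about as long as the slowest chunk instead of the sum of all of them.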

Tell, Don’t Wait To Be Asked

It’s better to let people know something failed than to make them go find out for themselves. I’m pretty sure most tools in the CI/CD arena can send you an email when a test fails.

When something bad happens, let people know about it. Not only will they be able to respond faster, but they’ll be held accountable more too.
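Your CI tool will most likely handle the emailing for you, but if you ever need to roll your own, the idea looks something like this Python sketch. The `build_failure_email` function, the addresses, and the subject format are all assumptions for illustration. Actually sending the message is left as a comment, since that depends on your mail server.

```python
from email.message import EmailMessage

def build_failure_email(job_name, failed_tests, recipients):
    """Build a notification for a failed job, or None if nothing failed."""
    if not failed_tests:
        return None  # nothing to tell anyone about
    msg = EmailMessage()
    msg["Subject"] = f"[CI] {job_name}: {len(failed_tests)} test(s) failed"
    msg["From"] = "ci@example.com"  # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    body = ["Failed tests:"] + [f"  - {name}" for name in failed_tests]
    msg.set_content("\n".join(body))
    return msg

message = build_failure_email("smoke", ["test_login"], ["qa@example.com"])
print(message["Subject"])

# To actually send it (assuming an SMTP server you are allowed to use):
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:
#       server.send_message(message)
```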

Quarantine Known Bugs

There may be a delay between when your test finds a bug and when the bug gets fixed. Meanwhile, each time the test runs, it will continue to fail.

I’ve had good success in the Cucumber space with moving scenarios into a feature called “known_bugs.feature” once a dev has been notified of the bug. Then, instead of having known bugs scattered throughout all the other features, I have them all in one place.

This solves 3 problems:

  • People know it’s being looked at, simply because it’s in that category–“known”,
  • If something in that feature fails, it’s highly likely the known bug is the cause, so less time needs to be spent looking at tests that probably failed for the exact same reason as last time, and,
  • If a failure happens in another feature, we know right away that hey, this is something new, so there’s less chance of inspecting the failure and finding out it’s that same thing that happened last run. Saves time.
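The same idea works outside of Cucumber too. Here’s a small Python sketch that sorts a run’s failures into “already reported” and “brand new”. The `KNOWN_BUGS` list, the ticket numbers, and the test names are hypothetical.

```python
# Hypothetical quarantine list: test name -> bug ticket it is waiting on.
KNOWN_BUGS = {
    "test_export_csv_unicode": "BUG-101",
    "test_report_totals_rounding": "BUG-117",
}

def triage(failed_tests, known_bugs):
    """Split failures into already-reported bugs and brand-new failures."""
    known = sorted(t for t in failed_tests if t in known_bugs)
    new = sorted(t for t in failed_tests if t not in known_bugs)
    return {"known": known, "new": new}

# Made-up failures from one run.
failures = ["test_search_paging", "test_export_csv_unicode"]
report = triage(failures, KNOWN_BUGS)
print(report)
```

Anything that lands in the “new” bucket is what deserves your attention first.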

Quarantine Unstable Tests

Sometimes your tests will act flaky–sometimes passing, sometimes not–and you might not have time to figure out what’s wrong with them.

Until you do, rather than keep having them fail (or not) out in your working tests, categorize them as unstable so you can work on them later.

Keep running them, of course, because when they pass they give good info. But if they fail, you’ll know from the category that the failure is likely due to the test being unstable. Plus, if they’re all grouped together, it’s easier to detect a pattern that could be causing those tests to be flaky.
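If you keep the pass/fail history from your regular runs, spotting candidates for the unstable category can be automated. This is a minimal Python sketch under that assumption; the `classify` function, the labels, and the sample history are made up for illustration.

```python
def classify(history):
    """Label each test from its run history (True = passed, oldest first)."""
    labels = {}
    for name, runs in history.items():
        if all(runs):
            labels[name] = "stable"
        elif not any(runs):
            labels[name] = "failing"
        else:
            # A mix of passes and failures: a candidate for quarantine.
            labels[name] = "unstable"
    return labels

# Made-up history from the last five runs.
history = {
    "test_login": [True, True, True, True, True],
    "test_upload_large_file": [True, False, True, True, False],
    "test_export_pdf": [False, False, False, False, False],
}
labels = classify(history)
print(labels)
```

A test that fails every time isn’t flaky, it’s broken (or it found a bug), so it gets its own label here.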

Run Known Bugs First

You know, I haven’t employed this strategy yet but it makes sense: why not just run the tests that found bugs, first?

They’ll probably fail, which is expected, but also, the more the tests are yelling about stuff, the quicker they’ll get fixed (hopefully!)

If you use this job as a precondition to other tests running, you’ll get quicker results than running all the tests.

Test Changed Code First

If a product was working fine before, and a change was made, it’s highly likely that any bug introduced is either right there in the changed code, or elsewhere and has been exposed by the new code.

You may have to get creative here, but if you can link changed code in your software repo to certain tests, run the tests that involve the changed code first. Then run the remaining tests.
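One way to get creative is to keep a mapping from source files to the tests that exercise them. The Python sketch below assumes you have such a mapping and a list of changed files from your version control system; the `TEST_MAP` contents, the file paths, and the `order_tests` function are all hypothetical.

```python
# Hypothetical mapping: source file -> tests that exercise it.
TEST_MAP = {
    "app/login.py": ["test_login_valid", "test_login_locked_out"],
    "app/search.py": ["test_search_partial_match"],
}

def order_tests(all_tests, changed_files, test_map):
    """Put tests that touch changed files first, then everything else."""
    affected = []
    for path in changed_files:
        for test in test_map.get(path, []):
            if test not in affected:
                affected.append(test)
    remaining = [t for t in all_tests if t not in affected]
    return affected + remaining

all_tests = [
    "test_checkout_total",
    "test_login_valid",
    "test_login_locked_out",
    "test_search_partial_match",
]
ordered = order_tests(all_tests, ["app/search.py"], TEST_MAP)
print(ordered)
```

Nothing gets dropped; the tests most likely to catch the new bug just get to go first.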

What’s All This For?

It’s for helping create (and enforce) an environment where any code is always ready to deploy, anytime.

As a dev, without continuous integration, you might not check in as often because you’re afraid to break the build. This causes a lot of technical debt after a while, and when you finally DO check in, it’s like chucking a boulder into a pond: it makes waves.

If, however, as a dev, you know that your code will be tested as soon as you check in, you probably won’t check in gigantic changes. It’s safe to check in even single-line changes and confirm that they work.

As a QA, obviously you don’t want to manually run those tests–that’s why you have automation in the first place. But you also don’t want to have to kick those tests off because the whole point of automation is to free up time for you to determine better test cases instead of running the same manual tests again.

If you’re spending that time running the automation and stuff, then you haven’t reclaimed that time, and you’re also context switching between testing and running tests.

There’s also a lot of strategy that can be programmed in, which takes some of the decisions off you, again freeing up time for you to do other fun stuff!

As you look in your organization and your company, keep your eyes open for opportunities to free up time with CI/CD. Where can you try out some of these tactics and gain some benefits? Share below!

–Fritz

5 thoughts on “Continuous Integration Strategies”

  1. Great post.

    Any suggestions on how someone who has not used Jenkins before can start experimenting with it, to set up a test run of a handful of automation tests?

    1. Great Post
      Please suggest how I can improve our negative scenarios. We usually develop scripts for the happy path, so we never find many defects with them; we find more defects when we do manual testing. Our automation scripts also take more than 16 hours to complete. How can we bring the execution time down, and how can we avoid the manual effort in this scenario?

      For example, we have 64 scripts, and running them daily takes 16 to 18 hours. What is the best way to reduce the execution time? Please suggest.

      1. Hey Sriveni,

        The easiest way to get test scripts to complete faster is to run them in parallel.

        If you’re already using Jenkins, try setting up multiple jobs, each responsible for running a few of the 64 scripts.

        You may want to group the scripts by execution time–have the really long ones by themselves, and group the quicker ones together. Ideally, the whole suite of 64 tests should complete in the time it takes to run your longest test.

        For negative scenario strategies, let me direct you to a couple of other posts: https://testzius.wordpress.com/2015/05/30/testing-is-like-fishing/ and https://testzius.wordpress.com/2015/06/05/tripwire-testing/.

        As you do more negative tests, you’ll probably notice something common about the bugs you find. And the first instinct is to write a test for every bug, but that creates a maintenance cost.

        So instead of writing a test for every bug, ask: What kind of bug am I looking for here? And then follow up with: Is there a way to write a single test for that -class- of bug?

        An example just happened at my client. We have a lot of pages with a grid of search results, and multiple columns worth of data–think “First Name”, “Last Name”, “City”, “State”, etc.

        There’s one search bar, and when you enter a term it fires off a hairy-looking database query to pull rows where one of the fields contains that term. It’s written with UX in mind though, so if I type “Michael”, I’m probably looking for rows where the first name is Michael.

        What kind of bugs am I looking for in this case? If I expected some rows to come back, maybe it’s because a new column added on the screen wasn’t added in the query too. Or maybe column names in the query are misspelled. Or if I’m supposed to be able to do partial matches, the query looks for an exact match instead.

        But for all bugs in that class of bugs, I’m still just doing a search, and making sure that all items in a particular column have an expected value. And that one test can be applied across multiple screens.

        So I think if you look at negative scenarios in terms of bug classes, instead of individual bugs, you’ll get where you want to go quicker.

        Hope this helps,
        Fritz
