A DevOps operating model shows its best qualities when most testing is done automatically. Aim to replace exploratory manual testing with automated testing wherever possible.
This can be a difficult sell to both business users and those in IT. Developers should already be familiar with the concept, but it may be brand new thinking to operators.
Most developers will have experience of writing test code, and of trying to maintain someone else’s tests. This can be easy if the tests are well written, but can grind development to a halt if not. If there are no tests at all, making changes can be extremely difficult and error prone.
Writing good unit tests is almost an art form; it is learned through regular practice. Using test-driven development to drive application code usually means that at least the code under test will be easy to write tests for, but that is not always the case.
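As a minimal sketch of what a well-written unit test looks like, here is a hypothetical pricing function tested with Python's built-in unittest module. The function and its discount rule are invented purely for illustration.

```python
import unittest

def discount_price(price: float, customer_years: int) -> float:
    """Apply a 10% loyalty discount for customers of five or more years.

    Hypothetical business rule, used only to illustrate the tests below."""
    if customer_years >= 5:
        return round(price * 0.9, 2)
    return price

class DiscountPriceTest(unittest.TestCase):
    # Each test name describes the behaviour, so a failure reads like a sentence.
    def test_loyal_customer_gets_ten_percent_off(self):
        self.assertEqual(discount_price(100.0, 5), 90.0)

    def test_new_customer_pays_full_price(self):
        self.assertEqual(discount_price(100.0, 1), 100.0)
```

Naming each test after the behaviour it checks is one habit that keeps someone else's tests maintainable: when a test fails, the name alone says which rule was broken.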
We cannot, however, expect business users to trust unit tests. They do not shed any light on what is going on under the hood. We need a different set of tests that demonstrate the behaviour of the system as a whole rather than at the individual unit level.
In my company, we have been growing our use of acceptance tests written in Cucumber syntax (given, when, then), trying to use the domain language that the users understand. Again, this is something of an art form; it can be difficult to get something practical, understandable, and technical enough to work.
Users need to be able to independently verify each part of the test scenario. They need to ensure the “given” is correctly defined, the “when” actually happens, and that the “then” is verified. They only have to do this when the test is first written, or when it is changed. Once that is done, they need to learn to trust the test.
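To make the shape of such a scenario concrete, here is a sketch of a given/when/then test written directly in Python, without a Cucumber runner. The ordering domain and the 10%-over-100 discount rule are hypothetical examples, not rules from the text.

```python
def test_discount_applied_to_large_order():
    # Given a customer order over the (assumed) discount threshold of 100
    order = {"total": 150.0, "discount": 0.0}

    # When the pricing rules are applied (10% off orders over 100 is an
    # invented rule for this sketch)
    if order["total"] > 100.0:
        order["discount"] = round(order["total"] * 0.10, 2)

    # Then the order carries a 10% discount the user can verify
    assert order["discount"] == 15.0
```

In a real Cucumber setup, the Given/When/Then comments would live in a plain-language feature file that business users read and sign off, with each step bound to code like the lines beneath it.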
I’m less familiar with how our operations team carry out changes, but my understanding is that they log onto a server, make a number of changes which may include a reboot, run some verification tests such as checking services are running, and then log the work in a manual log.
A good first step would be to automate some of the verification scripts that they use. Developers are in a good position to do this; perhaps writing PowerShell or Bash scripts to do the verification.
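One such verification, checking that a service is still listening on its expected port after a change, might be sketched in Python like this. The host and port are assumptions; substitute whatever the operations team actually checks.

```python
import socket

def service_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```

A handful of checks like this, run from one script after a change, already replaces a manual tick-list with something repeatable.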
The next step might be to have these tests running regularly. Why run them only after changing something? In theory they can be set to run on a schedule and log the results to a centralised location. This should be a shared location where the applications can also log.
A third step would be to build a dashboard with graphs or other visualisation techniques that show the state of the entire organisation. Give everyone in the team access: developers, operators, and QA. Extend coverage to the development and test boxes. Let business users see the results from testing environments and production.
Classic vs. Mockist Unit Testing vs. Acceptance
The mockist school of thought is that each unit, usually a method but possibly a class, should have tests written for it in isolation. Every collaborator outside the unit must be mocked.
One drawback is that if the method name changes, say during the regular refactoring that keeps code clean, then the tests will probably be misnamed unless someone spots it.
Another problem is that you may need to write a lot more code to mock the behaviour of the collaborating classes than you would if you allowed the collaborator to be a part of the code under test.
An alternative approach, called classic by Martin Fowler, is to test groups of logical functionality together. This could mean testing a group of methods together, or a class with some of its collaborators. It relaxes the strict definition of a unit, redefining it as a piece of functionality that runs as a whole.
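The two styles can be contrasted on the same hypothetical code: an OrderService that delegates tax calculation to a TaxCalculator collaborator. Both classes are invented for this sketch.

```python
import unittest
from unittest.mock import Mock

class TaxCalculator:
    def tax_for(self, amount: float) -> float:
        return round(amount * 0.2, 2)  # flat 20% tax, a hypothetical rule

class OrderService:
    def __init__(self, tax_calculator: TaxCalculator):
        self._tax = tax_calculator

    def total(self, amount: float) -> float:
        return round(amount + self._tax.tax_for(amount), 2)

class MockistStyleTest(unittest.TestCase):
    def test_total_asks_collaborator_for_tax(self):
        # Mockist: the collaborator is replaced, and the interaction is verified.
        tax = Mock(spec=TaxCalculator)
        tax.tax_for.return_value = 5.0
        self.assertEqual(OrderService(tax).total(100.0), 105.0)
        tax.tax_for.assert_called_once_with(100.0)

class ClassicStyleTest(unittest.TestCase):
    def test_total_with_real_collaborator(self):
        # Classic: the real collaborator runs as part of the unit under test.
        self.assertEqual(OrderService(TaxCalculator()).total(100.0), 120.0)
```

Note how the mockist test needed extra lines just to stand the collaborator up, and is coupled to the `tax_for` method name; the classic test is shorter but would not pinpoint which class broke if it failed.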
For more information, read Martin Fowler’s post Mocks Aren’t Stubs.
The Testing Pyramid
The above types of test will still be difficult for a business user to trust; they cannot sign off tests that mock out huge pieces of the system, such as databases. Behavioural testing can allow this by specifying tests in Gherkin, which defines the setup of the scenario, the action that takes place, and the result we will verify.
Acceptance tests are arguably more difficult to write, but serve as living documentation for the build.
A rule of thumb is that the number of tests should increase as you move down this list:
- Manual tests — aim to have the least number of these
- GUI automated tests
- Integration and DB tests
- Automated behavioural tests
- Unit tests (both mockist and classic) — have a lot of these
Tests at the top of the list take longer to execute than those at the bottom. Make sure that unit-level tests run fast, and ensure that there are many of them.
Slow-running tests tend not to be kicked off by developers. Fast unit tests, by contrast, make it easy to get into the habit of running them before committing code to source control.
Unit tests should run fast and should run automatically. Developers should be able to run them locally in a few minutes before committing. Automate the running of these tests so that commits to trunk are not accepted until the unit tests pass.
Acceptance tests may take longer to run, and should also be run automatically. The results need to be made publicly visible so that QA can see that the artefacts they pull into an environment have had the appropriate tests run.
The preference is to test one application at a time with acceptance tests, mocking certain dependencies to make testing practical. For example, if a service is used, mock its interface within the test code rather than trying to deploy the service onto a server just so the test can run.
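A sketch of that substitution in Python: a fake stands in for a remote service behind the same interface, so the acceptance test needs no deployed server. The exchange-rate domain and the canned rates are hypothetical.

```python
class ExchangeRateService:
    """Interface to a rate service; the real implementation would call
    over the network."""
    def rate(self, currency: str) -> float:
        raise NotImplementedError

class FakeExchangeRateService(ExchangeRateService):
    """In-memory stand-in with canned rates for the test scenario."""
    def rate(self, currency: str) -> float:
        return {"EUR": 1.1, "GBP": 1.3}[currency]

def convert(amount: float, currency: str, rates: ExchangeRateService) -> float:
    """Code under test: convert an amount using whichever service is injected."""
    return round(amount * rates.rate(currency), 2)

def test_conversion_uses_exchange_rate():
    # Given canned rates, when converting, then the result is deterministic
    assert convert(10.0, "GBP", FakeExchangeRateService()) == 13.0
```

Because `convert` depends only on the interface, the same code runs against the real service in production and the fake in tests, and the scenario stays fast and repeatable.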
As we move up the Testing Pyramid, there should be fewer tests, but each may take longer to run. GUI-based testing can be slow, requiring a desktop environment matching a user’s to be created, tested on, and torn down afterwards. Keep these tests broad rather than specific, trusting unit tests and acceptance tests to cover all the different branches involved.
Some level of integration testing may be necessary. Integration means different things depending on who you talk to; I define it as testing that the interaction between two distinct parts of the system works correctly.
For example, most applications use some sort of database. Connecting to one can be expensive (slow) and is not ideal for unit testing, but some level of testing is needed to ensure that database access is functioning correctly. Because these tests will naturally be slower, it makes sense to run them as part of the CI build.
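A minimal integration-test sketch: exercising real SQL against an in-memory SQLite database, so the test stays self-contained while still verifying that database access works. The schema and queries are hypothetical examples.

```python
import sqlite3
import unittest

class CustomerDatabaseTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test, so tests cannot affect each other
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def tearDown(self):
        self.db.close()

    def test_insert_and_read_back(self):
        # The interaction under test: SQL written by the app actually round-trips
        self.db.execute("INSERT INTO customers (name) VALUES (?)", ("Alice",))
        row = self.db.execute(
            "SELECT name FROM customers WHERE id = 1"
        ).fetchone()
        self.assertEqual(row[0], "Alice")
```

Pointing the same test at the production database engine in CI catches dialect and driver problems that an in-memory substitute would hide; the in-memory form is the fast lower bound.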
Manual testing should sit at the top of the pyramid, supplementing the automated tests already run. A nice model is that when QA pull an artefact into an environment, the continuous integration system creates an identical environment in which to run all the long-running tests. By the time QA have completed their manual testing, the automated results can be reviewed and the change signed off.
If your pyramid is upside down, with lots of heavy GUI testing at the top and few unit-level tests at the bottom, testing is likely to take a long time and become a major blocker to getting code released to production.
It may be possible to run the tests in the cloud, spinning up enough servers and running different parts of the test cycle on each in parallel, so that the overall time is reduced. Once the tests have been run, the servers can be released back to the cloud, keeping the cost quite low.
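The parallel idea can be sketched with Python's standard concurrency tools: shards of the suite run at the same time, so the wall-clock time approaches the longest shard rather than the sum. The shard names and durations are invented; in practice each worker would be a separate cloud server running a real suite.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_shard(name: str, duration: float) -> str:
    # Stand-in for actually executing one slice of the test cycle
    time.sleep(duration)
    return f"{name}: passed"

# Hypothetical shards with their (simulated) run times in seconds
shards = [("gui", 0.3), ("integration", 0.2), ("acceptance", 0.1)]

with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    # One worker per shard; total time is roughly max(durations), not sum
    results = list(pool.map(lambda shard: run_shard(*shard), shards))
```

The same fan-out/fan-in shape applies whether the workers are threads, containers, or cloud servers released when the run completes.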
This approach will buy you time. Use it to reverse the “ice cream cone” of testing and turn it into a solid pyramid. See these slides for some visualisations of the testing pyramid and the ice cream cone.
An important metric for your code base is how much of it is covered by test code. Eighty per cent is a good target to have. Do you know your code coverage statistics? Could you get the information easily? What direction is it tracking in?
We use a product called SonarQube for monitoring levels of test code coverage, as well as many other checks using static code analysis.
SonarQube allows you to define quality gates, for example requiring that new code checked into source control has a minimum level of test coverage. We get warnings whenever this rule is breached.
In this way we gradually build up a suite of automated tests. As you change code, add tests. New code must meet the target test coverage.
On the downside, we do not have the SonarQube build running on every commit to source control; rather, it runs once per day overnight, or on demand, because some of our behavioural and integration tests are slow. The feedback we get is therefore often a next-day affair.
Another issue is that our Continuous Integration build does not fail if a gate threshold is breached. We have further work to do here to tie everything together.
SonarQube has a plug-in for most IDEs so that developers can see issues directly highlighted in the code as they type. This helps clear up code smells and bugs as changes are made to existing code.
- Do you have adequate testing? Would you trust your automated tests alone before releasing a minor change to your production application?
- What is your test coverage for your code base? Don’t know? Then find out.
- How many of your tests could you show to a business user, confident that they would understand them and sign off on the product? If none, investigate BDD- and ATDD-style tests.
- Does your operations team have manual checks they run after making changes? Could these be automated? Could they be run continuously?
- Are your tests running automatically? Do you have a Continuous Integration build server? If not, get your tests running automatically on a dedicated server (or several).
- What process do you have to stop new code that lacks unit tests from being added to your source code? If you don’t have one, investigate SonarQube.