Given When Then

Automated behavioural tests provide living documentation and help to create a shared language between business experts and the development team.

Writing unit tests as part of a Test Driven Development practice is a great way to design and build libraries and applications that are testable as a first principle, with the side effects of reducing the number of bugs and lowering maintenance costs when the code needs to change. Business users get little direct benefit from these tests, however, as they are tied too closely to individual units and don't help explain what the code does. Another layer of tests above the unit level is needed to give some visibility into what a collection of classes and functions can do.

Enter Behaviour-Driven Development (BDD), which creates a shared language between developers and people with less technical skills, such as product owners, business analysts, subject matter experts or product managers. This shared language, or lexicon, can be useful for describing what a system does in terms of business outcomes. It also lends itself to writing system-level automated tests that verify these business requirements.

Cucumber and Gherkin

The idea is that all conditions of satisfaction, the "how we know we are done" of user stories, are written in a standard format known as Gherkin. The structure of each specification is always the same and will look like this:

Given <the system in a particular state>
When <some event occurs>
Then <a certain side effect occurs>

The key elements of the statement are that it is written in plain English, that it uses business language and avoids overuse of technical terms, and that it is a requirement shared between the business and the development team. Think of it as a conversation starter that the product owner writes and shares with all interested groups. It will be tweaked and changed until it represents what the software is required to do.

The structure of the statement is always the same and uses the three keywords Given, When and Then. These correspond to the three phases of any unit or integration test: setting up the test, running the code under test and verifying the results. Additional clauses beginning with And can be added if there is more than one thing to set up or to verify. Usually the When clause should have only a single action. A simple example with two statements in the Given section might be as follows:

Given a user, Bob, that is logged into the website
And they are on the homepage
When they click the settings icon
Then the settings page is displayed
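
To make the link with the three test phases concrete, here is a minimal Python sketch of the same scenario written as an ordinary test. The FakeSite class and its methods are hypothetical stand-ins invented for this illustration, not part of any real web framework.

```python
# Hypothetical in-memory stand-in for the website; not a real web framework.
class FakeSite:
    def __init__(self):
        self.logged_in_user = None
        self.current_page = None

    def log_in(self, name):
        self.logged_in_user = name

    def visit(self, page):
        self.current_page = page

    def click(self, element):
        # Only the settings icon is modelled in this sketch.
        if element == "settings icon" and self.logged_in_user is not None:
            self.current_page = "settings"


def test_settings_icon_opens_settings_page():
    # Given a user, Bob, that is logged into the website (set up)
    site = FakeSite()
    site.log_in("Bob")
    # And they are on the homepage (more set up)
    site.visit("homepage")

    # When they click the settings icon (run the code under test)
    site.click("settings icon")

    # Then the settings page is displayed (verify the result)
    assert site.current_page == "settings"
    return site
```

The comments carry the Gherkin lines, which shows how each clause lines up with one phase of the test.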

When teams use Gherkin to describe the desired behaviour of a system that is undergoing development, a shared language quickly becomes apparent. I recall one project where the developers on my team had never heard the word "tranche", which describes a slice of an insurance policy. Allowing our business users to help create the Gherkin descriptions prompted us to ask about the meaning of the word and to quickly adopt it when we spoke with them. Each story was documented with one or more Gherkin statements and there was consistency throughout the stories related to the feature. Since we used the terminology that the finance team used, anyone from that department was able to understand what the given-when-then statements meant.

Describing tests in terms that everyone can understand is great. There is less chance of making mistakes in development. The QA can more easily understand what they need to verify. The statements remain true (although some can become obsolete as features evolve) and can become a part of the documentation that describes how the software works.

The biggest advantage is that these statements can easily be converted into automated tests. That's where Cucumber comes into the picture. Cucumber is a testing library that allows each statement in the given-when-then triplets to be tied to actions in the code. It is available for a host of languages, including most JVM languages, Ruby and C#. Check out the Cucumber Installation page for details. There are many other testing frameworks that support BDD testing using Gherkin, by the way, but I'll be concentrating on Cucumber for this series.

Taking the example above, the first two lines would run a test web server for the application, create a user named Bob, carry out the actions to log them in and then render the homepage. The When line would be tied to clicking a certain part of the screen, the settings icon, by using the DOM object returned from rendering the webpage. With the click simulated, the final line would verify that a new page is rendered as a side effect of the action.

Each step needs code that binds the text of the statement to the interaction it describes. This can become tricky to do, especially when databases or other service calls are required. Code that has been written using TDD will generally be easier to tie into system tests using Cucumber due to its inherent testability.
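
The binding idea can be sketched in a few lines of plain Python. This is not Cucumber's API: the step decorator, the STEPS registry and run_scenario are hypothetical names made up for this illustration, and the "website" is just a dictionary. It only shows the general mechanism of matching each statement against a pattern and calling the function registered for it.

```python
import re

STEPS = []  # (compiled pattern, function) pairs; a hypothetical registry


def step(pattern):
    """Register a function as the handler for step text matching `pattern`."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


KEYWORDS = ("Given ", "When ", "Then ", "And ")


def run_scenario(text, context):
    """Run each line of a scenario through the first matching step function."""
    for line in text.strip().splitlines():
        line = line.strip()
        # Drop the leading keyword so only the step text is matched.
        for keyword in KEYWORDS:
            if line.startswith(keyword):
                line = line[len(keyword):]
                break
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                # Captured groups become arguments to the step function.
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {line}")
    return context


@step(r"a user, (\w+), that is logged into the website")
def logged_in_user(context, name):
    context["user"] = name


@step(r"they are on the homepage")
def on_homepage(context):
    context["page"] = "homepage"


@step(r"they click the settings icon")
def click_settings(context):
    context["page"] = "settings"


@step(r"the settings page is displayed")
def settings_displayed(context):
    assert context["page"] == "settings"


SCENARIO = """
Given a user, Bob, that is logged into the website
And they are on the homepage
When they click the settings icon
Then the settings page is displayed
"""
```

Calling run_scenario(SCENARIO, {}) walks the four statements in order. In a real project the step functions would drive a test web server and a rendered page rather than a dictionary, which is where most of the effort goes.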

Benefits and Drawbacks

Living Documentation

Tests that have been automated with Gherkin and Cucumber (or a similar product) should run upon every build of the system. As with unit tests, each time a developer checks in code and pushes to the shared repository, the Continuous Integration (CI) system should kick off the build and make sure that no tests are failing. As the tests can be read and understood by both the development team and the subject matter experts, tests written in this way serve to provide living documentation of the system.

Published Results

Cucumber can be configured to output the results from test runs in multiple ways. The idea might be for the CI system to pick up the output and to publish it so that everyone can view the results. QAs can use it to see what testing has been automatically done, focusing their exploratory testing in other places. Business users can see that specific scenarios are covered and request new ones for new test cases. Developers can quickly find out where in the code a bug has been introduced to cause a failed test.
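
As a rough illustration of what a CI job could do with that output, the Python sketch below condenses a report into one line per scenario. It assumes a report shaped like the output of Cucumber's JSON formatter (a list of features, each with elements that hold steps, each step with a result status). The sample data, the scenario names and the summarise function are invented for this example, so check the layout against the report your own version produces.

```python
import json

# Inline sample in the assumed report layout; all names are made up.
SAMPLE_REPORT = json.dumps([
    {
        "name": "Account settings",
        "elements": [
            {
                "type": "scenario",
                "name": "Open settings",
                "steps": [
                    {"keyword": "When ", "name": "they click the settings icon",
                     "result": {"status": "passed"}},
                    {"keyword": "Then ", "name": "the settings page is displayed",
                     "result": {"status": "passed"}},
                ],
            },
            {
                "type": "scenario",
                "name": "Change password",
                "steps": [
                    {"keyword": "When ", "name": "they submit a new password",
                     "result": {"status": "passed"}},
                    {"keyword": "Then ", "name": "a confirmation is displayed",
                     "result": {"status": "failed"}},
                ],
            },
        ],
    }
])


def summarise(report_json):
    """Return {scenario name: 'passed' or 'failed'} for every scenario."""
    summary = {}
    for feature in json.loads(report_json):
        for element in feature.get("elements", []):
            # Assumption: shared setup blocks are marked as "background".
            if element.get("type") == "background":
                continue
            statuses = [
                s.get("result", {}).get("status")
                for s in element.get("steps", [])
            ]
            passed = all(status == "passed" for status in statuses)
            summary[element["name"]] = "passed" if passed else "failed"
    return summary
```

A summary like this is the kind of thing that could be posted to a team channel or dashboard after each build.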

Tables of Inputs and Outputs

Gherkin allows tables of data to be used as input, although such tables are arguably more difficult to read. This allows a single Gherkin test to cover a variety of input samples. Each row of the table can be labelled, so if a single scenario fails the problem row can be identified in the test failure output.
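
The mechanics of a table-driven test can be sketched in Python as follows. The pipe-delimited layout and the angle-bracket placeholders mirror how Gherkin writes tables, but parse_table, expand_outline, the "case" label column and the premium example are all hypothetical choices made for this illustration.

```python
def parse_table(text):
    """Turn a pipe-delimited table into a list of dicts, one per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def expand_outline(steps, examples):
    """Yield (row label, concrete steps) for each row of the table."""
    for row in examples:
        concrete = []
        for template in steps:
            # Substitute every <column> placeholder with the row's value.
            for name, value in row.items():
                template = template.replace(f"<{name}>", value)
            concrete.append(template)
        # Assumption: the "case" column is the label shown on failure.
        yield row["case"], concrete


OUTLINE = [
    "Given a policy with a premium of <premium>",
    "When a discount of <discount> percent is applied",
    "Then the amount due is <due>",
]

EXAMPLES = """
| case        | premium | discount | due |
| no discount | 200     | 0        | 200 |
| half price  | 200     | 50       | 100 |
"""
```

Each row produces its own concrete scenario, and the label travels with it, so a failure can be reported against "half price" rather than against the table as a whole.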

Overly Verbose

Having too many Gherkin tests can be daunting if trying to understand the behaviour of the system. This counteracts the "Living Documentation" somewhat, as it can be difficult to scroll through hundreds of given-when-then tests to find the right one. When using Cucumber to automate business tests, I tend to lean towards the "less is more" approach. Certainly, use Gherkin to describe all conditions of satisfaction, but try to select only the most useful ones to automate, preferably for the most common actions, or the happy-path through the system.

Developers Only

While anyone who knows the subject matter should, with some practice, be able to create worthwhile and descriptive Gherkin statements, the promise that business users will be able to write automated tests using this format is false. While a business user might be able to copy an existing automated test to add a new scenario by changing some numbers, there are too many things stopping them from doing this as part of the regular development process. The tests need code to tie them to actions and observations, and understanding the regular expressions used to extract meaning is something developers are going to have to help with. Not to mention that the source code will usually not be available to, or editable by, anyone outside the development team anyway. By all means, involve business users in crafting Gherkin statements and in discussing and tweaking them, but never expect that they will write the tests themselves.

Update: Cucumber now supports what it calls Cucumber expressions, which can be used in place of regular expressions. In theory they are easier to use, but they will still need developer assistance to bind them to the code.
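
To give a feel for the difference, the Python sketch below translates a small subset of the Cucumber expression syntax into a regular expression. It handles only the {int}, {word} and {string} placeholders, and only double-quoted strings. The real implementation supports more than this, and compile_expression and match_step are hypothetical names for this illustration, not functions from Cucumber.

```python
import re

# Simplified subset of parameter types: (regex fragment, converter).
PARAMETER_TYPES = {
    "int": (r"(-?\d+)", int),
    "word": (r"(\S+)", str),
    "string": (r'"([^"]*)"', str),
}


def compile_expression(expression):
    """Translate e.g. 'I have {int} items' into a regex plus converters."""
    # Splitting on a capturing group alternates literal text and type names.
    parts = re.split(r"\{(\w+)\}", expression)
    regex, converters = "", []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text between placeholders
        else:
            pattern, convert = PARAMETER_TYPES[part]
            regex += pattern
            converters.append(convert)
    return re.compile(regex), converters


def match_step(expression, text):
    """Return the converted arguments, or None if the text does not match."""
    pattern, converters = compile_expression(expression)
    match = pattern.fullmatch(text)
    if match is None:
        return None
    return [convert(group) for convert, group in zip(converters, match.groups())]
```

A placeholder such as {int} is easier to read than its regular expression, and it hands the step function a number rather than a string, but someone still has to write the function that receives it.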

Conclusion

While unit tests are a key piece of any well-built software, they are intended for use by developers only and don't prove that the system as a whole works as it should. Behavioural testing expresses conditions of satisfaction using a standard format called Gherkin. These statements are discussed and negotiated between non-technical subject matter experts and the development team, which encourages a shared domain language, and they are a great way to describe system behaviour.

Cucumber allows behavioural tests to be automated by tying each statement to an action in the code. The tests, once written, will run with each build on the continuous integration system, with results published if so desired. In this way they act as living documentation for the system.