I’m a fan of ATDD. Whenever I start a new feature, I try to describe it in the form of BDD or SBE tests. Why is this important to me? It’s not only about having tests in the project but also about having control over the development. These tests, together with the code coverage metric, are my feedback loop over the implementation.
In the code review of the first commit, I ask my QA to review the test scenarios. This is a fast way to find out whether we understand the feature correctly. Let’s see it with an example.
Example feature
Create an API that allows its clients to manage BTC buy orders. Clients place orders to purchase BTC for fiat at the market price. The API does not create transactions on the Bitcoin blockchain; it simply stores the order information in its database for further processing.
- Creation of a Buy Order requires the following data at minimum:
  - currency – the currency (an ISO3 code, one of: EUR, GBP, USD)
  - amount – the amount of currency (0 < x < 1,000,000,000)
- Buy Order creation must be idempotent.
- The order stores the amount of BTC that the requested amount of fiat buys at the exchange rate. Use a precision of 8 decimal digits and always round up. Do not lose precision in the calculations (see the sketch after this list).
- The sum of the bitcoin amounts of all orders stored in the system must not exceed 100 BTC. The system must not allow creation of new orders that would violate this constraint.
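A quick sketch of that rounding rule, assuming a helper named round_up (the same name appears in the tests later in this post; its internals here are my illustration, not the post’s code):

```python
from decimal import ROUND_UP, Decimal


def round_up(value: Decimal, to_precision: int) -> Decimal:
    """Round value up to the given number of decimal digits."""
    exponent = Decimal(1).scaleb(-to_precision)  # 1E-8 for to_precision=8
    return value.quantize(exponent, rounding=ROUND_UP)


# 26.2145 EUR at a rate of 20000.0000 EUR/BTC is 0.001310725 BTC,
# which rounds up to 0.00131073 (a row from the examples table below):
assert round_up(
    Decimal("26.2145") / Decimal("20000.0000"), to_precision=8
) == Decimal("0.00131073")
```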
Initial project setup
Let’s first see how I prepare the project before feature implementation. An example pull request with the initial setup is here and here. For this topic, let’s focus on the testing setup. Besides the CI pipeline, I set a 100% code coverage requirement from the start, plus mutation testing that fails if any mutant survives the tests.
```ini
[coverage:run]
branch = True

[coverage:report]
fail_under = 100
omit =
    */migrations/*
    src/application/endpoints_*/api_pb2*.py
exclude_lines =
    pragma: no cover
    def __repr__
    def __str__
    @abstractmethod
    raise NotImplementedError
```
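The mutation-testing side of the setup is not shown above. As a rough sketch, with a tool like mutmut it can live in the same setup.cfg; treat the exact keys as an assumption that depends on the tool and version you use:

```ini
[mutmut]
paths_to_mutate = src/
runner = python -m pytest -x
```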
Why fail under 100% code coverage?
By 100% I don’t mean everything in the project; there are things that I don’t want to test. In the configuration, you can see that I omit DB migration files, and I exclude abstract methods of interfaces, string representations of objects, and raises of NotImplementedError (this will be explained later).
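To illustrate what those exclusions cover, here is the typical shape of an interface that the configuration ignores; the class name is made up for this example:

```python
from abc import ABC, abstractmethod
from decimal import Decimal


class ExchangeRates(ABC):  # hypothetical interface, for illustration only
    @abstractmethod  # matched by the "@abstractmethod" exclude line
    def btc_rate(self, currency: str) -> Decimal:
        raise NotImplementedError  # matched by "raise NotImplementedError"

    def __repr__(self) -> str:  # matched by the "def __repr__" exclude line
        return f"{type(self).__name__}()"
```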
The goal of this configuration is to get fast feedback whenever I write code that is not covered by my test scenarios. If it isn’t covered, I expect one of two reasons behind it. The first is that I missed some scenarios; in this case, I need to add them and ask for a review and discussion with my QA. This is very important, because I will find out about it in the middle of implementation, not in production. The second is that my code is not correct: it may be unreachable, or it may not change behavior (mutation tests cover that). So it is also good information about code quality.
Test scenarios
As I mentioned, the first reviewer will be QA, so I just need to describe the scenarios in a readable way. Gherkin is great for this, but I avoid using any extra frameworks. I believe that writing scenarios in a BDD way, but in plain Python, is good enough. So the first commit looks like this; below is only the testing part.
```python
from decimal import Decimal
from typing import Text

from hypothesis import HealthCheck, given, settings
from hypothesis.strategies import decimals
from pytest import mark

# round_up is the rounding helper sketched earlier; "ordering" is a
# BDD-style project fixture (one possible shape is sketched later).


class TestOrdering:
    @mark.xfail(raises=NotImplementedError, strict=True)
    @mark.parametrize("currency", ["EUR", "GBP", "USD"])
    def test_creating_an_order(self, currency: Text, ordering):
        ordering.when_creating_buy_order_with(currency=currency)
        ordering.assert_that_order_was_created()

    @mark.xfail(raises=NotImplementedError, strict=True)
    def test_created_order_uses_current_exchange_rate(self, ordering):
        ordering.given_1btc_exchange_rate(EUR=33681.3874)
        ordering.when_creating_buy_order_with(33681.3874, "EUR")
        ordering.assert_that_order_was_created(with_bitcoins=1)

    @mark.xfail(raises=NotImplementedError, strict=True)
    def test_summary_amount_of_orders_cannot_exceed_100btc(self, ordering):
        ordering.given_1btc_exchange_rate(EUR=100)
        ordering.given_created_order_with(bitcoins=50)
        ordering.given_created_order_with(bitcoins=50)
        ordering.when_creating_buy_order_with(2000, "EUR")
        ordering.expect_failure_for("Exceeded 100BTC ordering limit")

    @mark.xfail(raises=NotImplementedError, strict=True)
    @given(
        paid=decimals(min_value=0.0001, max_value=999.9999, places=4),
        exchange_rate=decimals(min_value=20000, max_value=90000, places=4),
    )
    @settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_bought_bitcoins_are_round_up_with_precision_of_8_digits(
        self,
        paid: Decimal,
        exchange_rate: Decimal,
        ordering,
    ):
        ordering.given_1btc_exchange_rate(EUR=exchange_rate)
        ordering.when_creating_buy_order_with(paid, "EUR")
        ordering.assert_that_order_was_created(
            with_bitcoins=round_up(paid / exchange_rate, to_precision=8),
        )
```
If it doesn’t look like BDD to you, see how it translates almost word for word into Gherkin.
```gherkin
Scenario Outline: Creating an Order
    When creating buy order with <currency>
    Then order was created

    Examples:
        | currency |
        | EUR      |
        | GBP      |
        | USD      |

Scenario: Created order used current exchange rate
    Given BTC exchange rate on 33681.3874 EUR
    When creating buy order with 33681.3874 EUR
    Then order was created with 1 BTC

Scenario: Summary amount of orders cannot exceed 100 bitcoins
    Given bitcoin exchange rate on 100 EUR
    And created order with 50 BTC
    And created order with 50 BTC
    When creating buy order with 2000 EUR
    Then creating buy order fails with "Exceeded 100BTC ordering limit"

Scenario Outline: Bought bitcoins are round up with precision of 8 digits
    Given BTC exchange rate on <rate> EUR
    When creating buy order with <paid> EUR
    Then order was created with <bought> BTC

    Examples:
        | rate       | paid     | bought     |
        | 20000.0026 | 361.1169 | 0.01805585 |
        | 20000.0004 | 26.3681  | 0.00131841 |
        | 20003.2924 | 26.3681  | 0.00131819 |
        | 20000.0001 | 26.3681  | 0.00131841 |
        | 26710.8865 | 26.2145  | 0.00098142 |
        | 20000.0183 | 26.2145  | 0.00131073 |
        | 26710.9047 | 26.2145  | 0.00098142 |
        | 20005.9632 | 462.4657 | 0.02311640 |
        | 20000.0240 | 39.4759  | 0.00197380 |
        | 20000.0183 | 26.2145  | 0.00131073 |
        | 20000.0000 | 26.2145  | 0.00131073 |
```
These are the same scenarios. Honestly, I have never had a problem showing the Python version to QA or even non-technical people; they could validate such scenarios just fine. What matters is following the BDD convention; the BDD framework itself, in my opinion, is not important.
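For completeness, here is one possible shape of the ordering fixture behind those Given/When/Then methods. The method names come from the tests above; the internals are my sketch of the idea, not the post’s actual implementation:

```python
from decimal import Decimal

from pytest import fixture


class OrderingDsl:
    """Thin BDD-style wrapper over the application under test."""

    def __init__(self, app):
        self.app = app
        self.result = None

    def given_1btc_exchange_rate(self, **rates):
        self.app.stub_exchange_rates(rates)  # e.g. {"EUR": 33681.3874}

    def given_created_order_with(self, bitcoins):
        self.app.insert_order(bitcoins=Decimal(bitcoins))

    def when_creating_buy_order_with(self, amount=100, currency="EUR"):
        self.result = self.app.create_buy_order(amount, currency)

    def assert_that_order_was_created(self, with_bitcoins=None):
        assert self.result.created
        if with_bitcoins is not None:
            assert self.result.bitcoins == Decimal(with_bitcoins)

    def expect_failure_for(self, message):
        assert self.result.error == message


@fixture
def ordering(app):  # "app" stands in for whatever drives the system
    return OrderingDsl(app)
```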
NotImplementedError usage
I want these ATDD tests, and other tests, to be committed separately from the code. But such tests will fail without the code. Decorating them with a bare pytest.mark.xfail can hide problems, because by default it accepts any outcome: a failure is reported as an expected failure, and an unexpected pass looks barely different. Used like that, it takes away the feedback these tests should give you.
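A tiny illustration of the problem with a bare xfail; whatever happens inside, the suite stays green:

```python
from pytest import mark


@mark.xfail
def test_with_bare_xfail():
    # A failure here is reported as XFAIL, a pass as XPASS,
    # and neither outcome fails the test run by default.
    ...
```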
To get maximum control and benefit from the tests I have created, I use a combination of NotImplementedError and pytest.mark.xfail.
In the tests I still use pytest.mark.xfail, but with additional configuration: @mark.xfail(raises=NotImplementedError, strict=True). I expect the test to fail with NotImplementedError every time. If it fails with any other exception, the test fails, and I get feedback that something is wrong. Also, thanks to strict=True, if the code passes without an error, the test fails as well; that is how I see the scenario is achieved and know I can remove the xfail decorator.
I implement the bare minimum of code, mostly a function signature with a body raising NotImplementedError. This shows the reviewer what my entry point for the implementation is and what I will implement in the next commit. You will see how it works in the next part.
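Putting the two together, the first commit looks roughly like this; create_buy_order is a hypothetical entry point, not the post’s exact function:

```python
from pytest import mark


def create_buy_order(request_id, amount, currency):
    raise NotImplementedError  # excluded from coverage, see the config above


@mark.xfail(raises=NotImplementedError, strict=True)
def test_creating_buy_order():
    # Reported as XFAIL while the stub raises NotImplementedError; once the
    # real implementation lands, strict=True turns the unexpected pass into
    # a test failure, reminding me to drop the decorator.
    create_buy_order(request_id="some-id", amount=100, currency="EUR")
```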
API proposition
Let’s assume that QA and the PO have reviewed and agreed on the proposed scenarios. In this task, I need to implement the API. This is something I want reviewed as soon as possible, but this time it is a strictly technical review, so I don’t need BDD-like tests. The next commit therefore contains the API tests and the API declaration: the minimum that needs to be coded for another developer to review the proposition.
Let’s start with the tests. These are about the API structure rather than the use cases, so they are more mechanical: checking calls, responses, and validations. At this level, the solution for idempotency is also introduced. I proposed adding a request_id parameter, which the API uses to verify whether it was called more than once, as sketched below.
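The idea behind request_id, as a hedged sketch (the repository interface here is illustrative, not from the post): the same request_id always resolves to the same order, so repeating a POST cannot create a duplicate.

```python
def create_order_idempotently(repository, request_id, amount, currency):
    # If this request_id was already processed, return the existing order
    # instead of creating a new one; the API then responds with the same
    # Location header for both calls.
    existing = repository.find_by_request_id(request_id)
    if existing is not None:
        return existing
    return repository.create(
        request_id=request_id, amount=amount, currency=currency
    )
```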
```python
from factory import DictFactory, Faker
from factory.fuzzy import FuzzyChoice


class ApiCreateBuyOrderRequestFactory(DictFactory):
    request_id = Faker("uuid4")
    amount = Faker("pyfloat", right_digits=4, min_value=1, max_value=400)
    currency = FuzzyChoice(["EUR", "GBP", "USD"])
```
```python
from random import random
from uuid import uuid4

from pytest import mark

# CreateBuyOrder is assumed to be an alias for the request factory above;
# api_client and CREATE_ORDER_URL come from the project's test setup.
CreateBuyOrder = ApiCreateBuyOrderRequestFactory


class TestCreateBuyOrderRequest:
    @mark.xfail(raises=NotImplementedError, strict=True)
    def test_after_creating_redirects_to_created_order(self, api_client):
        request = CreateBuyOrder()
        response = api_client.post(CREATE_ORDER_URL, json=request)
        assert response.status_code == 201
        order_url = response.headers["Location"]
        order = api_client.get(order_url).json()
        assert order["request_id"] == request["request_id"]

    @mark.xfail(raises=NotImplementedError, strict=True)
    def test_creating_order_is_idempotent(self, api_client):
        request = CreateBuyOrder()
        first = api_client.post(CREATE_ORDER_URL, json=request)
        second = api_client.post(CREATE_ORDER_URL, json=request)
        assert first.headers["Location"] == second.headers["Location"]

    @mark.parametrize(
        "request_id", ["ILLEGAL", uuid4().hex[:-3], "", random(), None]
    )
    def test_reject_when_no_uuid_id(self, api_client, request_id):
        # field overrides are passed straight to the factory
        request = CreateBuyOrder(request_id=request_id)
        response = api_client.post(CREATE_ORDER_URL, json=request)
        assert response.status_code == 422

    def test_reject_when_negative_amount(self, api_client):
        request = CreateBuyOrder(amount=-1)
        response = api_client.post(CREATE_ORDER_URL, json=request)
        assert response.status_code == 422

    def test_reject_when_amount_higher_than_1_000_000_000(self, api_client):
        request = CreateBuyOrder(amount=1_000_000_000)
        response = api_client.post(CREATE_ORDER_URL, json=request)
        assert response.status_code == 422

    @mark.parametrize("currency", ["PLN", "AUD", "XXX"])
    def test_reject_when_currency_not_eur_gbp_usd(self, api_client, currency):
        request = CreateBuyOrder(currency=currency)
        response = api_client.post(CREATE_ORDER_URL, json=request)
        assert response.status_code == 422
```
To make this work, I need to implement minimal code that raises NotImplementedError in the controller; you can see it below. The validations are handled by the framework, so those tests pass from the start, while the full functionality will be introduced in the next commits.
```python
from decimal import Decimal
from typing import Text
from uuid import UUID

from fastapi import APIRouter, Body
from pydantic import BaseModel, condecimal

# Currency is the project's enum of the supported fiat currencies.
router = APIRouter()


class BuyOrder(BaseModel):
    id: UUID
    request_id: UUID
    bitcoins: condecimal(decimal_places=8)
    bought_for: condecimal(decimal_places=4)
    currency: Currency


@router.get(
    "/{order_id}",
    name="orders:get_order",
    response_model=BuyOrder,
)
async def get_order(order_id: UUID) -> BuyOrder:
    raise NotImplementedError


class CreateBuyOrderError(BaseModel):
    detail: Text


class BuyOrderCreated(BaseModel):
    order_id: UUID
    location: Text


@router.post(
    "/",
    name="orders:create_order",
    status_code=201,
    response_model=BuyOrderCreated,
    responses={409: {"model": CreateBuyOrderError}},
)
async def create_order(
    request_id: UUID = Body(...),
    amount: condecimal(
        decimal_places=4,
        gt=Decimal(0),
        lt=Decimal(1_000_000_000),
    ) = Body(...),
    currency: Currency = Body(...),
) -> BuyOrderCreated:
    raise NotImplementedError
```
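To run the API tests against this stub, the router still has to be mounted in an application. A minimal wiring sketch, assuming FastAPI and its test client (the "/orders" prefix is my assumption):

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()
app.include_router(router, prefix="/orders")

# The api_client fixture in the tests above can simply wrap this client:
api_client = TestClient(app)
```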
This commit goes to another developer, who verifies the next crucial part of this feature. So with very minimal coding, I can get the most crucial parts reviewed (though not all of them: there is the tricky constraint of a maximum of 100 BTC across all orders, which won’t be covered in this post).
100% coverage as safety net
At this point, I’m pretty sure I have a good safety net for my feature. The acceptance test scenarios have been reviewed, the API call tests have been reviewed, and combined with the 100% test coverage configuration, I feel I’m in a very good spot.
From this moment on, if coverage drops, I get fast feedback and can react as soon as possible. Did I forget about some scenario? Did I implement code that won’t be used? Now I can react faster. The acceptance tests also document the feature.
These also make great tests for refactoring. As the system scales and the requirements change, you may want a different solution for these features. With this approach, you are prepared for refactoring and can be more agile.
Summary
It’s not about 100% code coverage, but about the whole approach to the first steps of the development workflow.
Decide up front what you want to be covered by tests. Put a 100% coverage rule on it, and don’t agonize over which percentage is the right one. Coverage is just a metric, not a quality indicator.
Start by reviewing the crucial parts at the beginning of development. To do this, write acceptance scenarios and a solution proposition as the first steps. This will save you trouble compared to discovering problems on the last day of development or in production.
Make this the default behavior, and turn the scenarios into automated tests. This way you gain both feature documentation and automated tests.
By writing the tests up front with 100% coverage, you gain easier development, easier refactoring, and a general safety net for your work. You have much more control over what is happening.